Disaster Recovery Glossary | Crispy Umbrella

A

Air Gap: A security measure in which a computer or network is physically isolated from unsecured networks, including the internet. Air-gapped backups protect against ransomware because malware on the production network cannot reach them.
Related: Backup, Ransomware Protection, Data Security

B

Business Continuity Plan (BCP): A comprehensive plan that outlines how a business will continue operating during and after an unplanned disruption. BCP covers all aspects of the business including personnel, facilities, communications, and IT systems.
Related: Disaster Recovery Plan, Business Impact Analysis, Crisis Management
Business Impact Analysis (BIA): A systematic process to determine and evaluate the potential effects of an interruption to critical business operations. BIA identifies time-sensitive functions and their resource dependencies.
Related: Business Continuity Plan, Risk Assessment, Recovery Time Objective

C

Cold Site: A backup facility with basic infrastructure (power, cooling, network connectivity) but no pre-installed computing equipment. Requires significant time to become operational after a disaster.
Related: Hot Site, Warm Site, Disaster Recovery Site

D

Disaster Recovery as a Service (DRaaS): A cloud computing service model that allows organizations to back up their data and IT infrastructure to a third-party cloud computing environment and to restore access to them after a disaster.
Related: Cloud Backup, Disaster Recovery, BaaS
Disaster Recovery Plan (DRP): A documented, structured approach that describes how an organization can quickly resume work after an unplanned incident. A DRP focuses specifically on IT systems and data recovery.
Related: Business Continuity Plan, Recovery Time Objective, Runbook

F

Failback

The process of returning operations to the primary system after a failover event, once the primary system has been restored and verified.

Related: Failover, Disaster Recovery

Failover

The process of automatically or manually switching to a redundant or standby system, server, or network when the primary system becomes unavailable due to failure or planned downtime.

Example: When the primary database server fails, automatic failover switches traffic to the standby server within seconds.

Related: High Availability, Redundancy, Failback
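
The selection step of a failover can be sketched in a few lines. This is a minimal illustration, not a production mechanism: the node names and the `is_healthy` callback are hypothetical, and a real system would also need to deal with timeouts, split-brain, and data consistency.

```python
from typing import Callable, Optional, Sequence


def choose_active(nodes: Sequence[str],
                  is_healthy: Callable[[str], bool]) -> Optional[str]:
    """Return the first healthy node in priority order, or None if all are down."""
    for node in nodes:
        if is_healthy(node):
            return node
    return None


# Hypothetical node names; the primary is listed first.
nodes = ["db-primary", "db-standby"]
down = {"db-primary"}  # simulate a failure of the primary

active = choose_active(nodes, lambda n: n not in down)
print(active)  # db-standby
```

Once the primary is repaired and removed from `down`, the same function selects it again, which corresponds to a failback.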

H

High Availability (HA)

A system design approach and associated service implementation that ensures a prearranged level of operational performance, usually expressed as uptime of 99.9% or higher over a given period.

Example: A high availability database cluster maintains 99.99% uptime by automatically failing over between nodes.

Related: Failover, Redundancy, Load Balancing
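
To make the uptime percentages concrete, the sketch below converts an availability target into a downtime budget. It assumes a 365-day year and counts all downtime against the target; the function name is illustrative only.

```python
def allowed_downtime_minutes(availability_pct: float,
                             period_days: float = 365.0) -> float:
    """Minutes of downtime permitted in the period for a given availability target."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - availability_pct / 100)


for pct in (99.9, 99.99):
    print(f"{pct}% -> {allowed_downtime_minutes(pct):.1f} minutes/year")
# 99.9% -> 525.6 minutes/year
# 99.99% -> 52.6 minutes/year
```

Under these assumptions, 99.9% allows roughly 8.8 hours of downtime per year, and each additional "nine" cuts the budget by a factor of ten.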

Hot Site

A fully operational offsite data center equipped with hardware and software, kept in sync with the primary site, that can take over operations immediately after a disaster.

I

Incident Response Plan (IRP): A set of instructions that helps IT staff detect, respond to, and recover from network security incidents, including data breaches, malware attacks, and system compromises.
Related: Disaster Recovery Plan, Security Operations

M

Mean Time Between Failures (MTBF): The predicted elapsed time between inherent failures of a system during operation. A higher MTBF indicates a more reliable system.
Related: Mean Time to Recovery, Reliability
Mean Time to Recovery (MTTR): The average time required to repair a failed component or system and restore it to operational status. A key metric for measuring DR effectiveness.
Related: Recovery Time Objective, Mean Time Between Failures
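
MTBF and MTTR are often combined into a steady-state availability estimate, availability = MTBF / (MTBF + MTTR). This formula is a common approximation and is not taken from this glossary. The sketch assumes both values are in the same unit, and the sample figures are made up for illustration.

```python
def steady_state_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Fraction of time the system is expected to be up: MTBF / (MTBF + MTTR)."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR must be non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Illustrative values: one failure per 1000 hours, one hour to recover.
print(f"{steady_state_availability(1000, 1):.4%}")  # 99.9001%
```

The formula shows that availability improves either by failing less often (a higher MTBF) or by recovering faster (a lower MTTR).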

R

Recovery Point Objective (RPO)

The maximum acceptable amount of data loss, measured in time. The RPO sets the minimum frequency of backups or replication and represents how much data the organization can afford to lose in a disaster scenario.

Example: A financial system with a 15-minute RPO requires transaction logs to be backed up at least every 15 minutes.
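
One way to check a backup schedule against an RPO is to find the longest window that no backup covers. The sketch below is a minimal illustration; the timestamps are invented, and it assumes a backup captures all data up to the moment it was taken.

```python
from datetime import datetime, timedelta
from typing import Sequence


def worst_case_data_loss(backup_times: Sequence[datetime],
                         now: datetime) -> timedelta:
    """Longest window with no backup: gaps between backups, and from the last backup to now."""
    if not backup_times:
        raise ValueError("at least one backup is required")
    points = sorted(backup_times) + [now]
    return max(later - earlier for earlier, later in zip(points, points[1:]))


rpo = timedelta(minutes=15)
backups = [
    datetime(2024, 1, 1, 12, 0),
    datetime(2024, 1, 1, 12, 15),
    datetime(2024, 1, 1, 12, 40),  # this backup ran 25 minutes after the previous one
]
now = datetime(2024, 1, 1, 12, 45)

gap = worst_case_data_loss(backups, now)
print(gap, "meets RPO" if gap <= rpo else "violates RPO")  # 0:25:00 violates RPO
```

A failure just before the 12:40 backup would have lost up to 25 minutes of data, which exceeds the 15-minute objective.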

Recovery Time Objective (RTO)

The maximum acceptable time that a system, application, or function can be down after a failure or disaster before the business impact becomes unacceptable. RTO is measured from the point of disruption to the point when the system is restored.

Example: An e-commerce site with a 4-hour RTO must be back online within 4 hours of any outage.
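
Whether an outage stayed within the RTO comes down to comparing measured downtime with the objective. The following sketch uses invented timestamps and a hypothetical helper name.

```python
from datetime import datetime, timedelta


def downtime(disrupted_at: datetime, restored_at: datetime) -> timedelta:
    """Elapsed time from the point of disruption to the point of restoration."""
    if restored_at < disrupted_at:
        raise ValueError("restoration cannot precede the disruption")
    return restored_at - disrupted_at


rto = timedelta(hours=4)
outage = downtime(datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 12, 30))
print(outage, outage <= rto)  # 3:30:00 True
```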

Redundancy

The duplication of critical components or functions of a system with the intention of increasing reliability. In DR, redundancy ensures backup resources are available when primary resources fail.

Related: High Availability, Failover, N+1 Redundancy

Replication

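The process of copying data from one location to another to ensure consistency between redundant resources. Replication can be synchronous (each write is confirmed on the replica before it is acknowledged) or asynchronous (changes are copied to the replica after a delay).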
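
The difference between the two modes can be shown with a toy in-memory model. This is a simplified sketch under stated assumptions: the class and method names are hypothetical, and real replication involves networks, acknowledgements, and failure handling that are omitted here.

```python
from collections import deque


class ReplicatedStore:
    """Toy model of a primary with one replica."""

    def __init__(self, synchronous: bool) -> None:
        self.synchronous = synchronous
        self.primary: list = []
        self.replica: list = []
        self.pending: deque = deque()  # writes not yet copied to the replica

    def write(self, record: str) -> None:
        self.primary.append(record)
        if self.synchronous:
            self.replica.append(record)  # copied before write() returns
        else:
            self.pending.append(record)  # copied later by ship_pending()

    def ship_pending(self) -> None:
        """Copy all queued writes to the replica (the delayed step in asynchronous mode)."""
        while self.pending:
            self.replica.append(self.pending.popleft())

    def records_at_risk(self) -> int:
        """Writes that would be lost if the primary failed right now."""
        return len(self.primary) - len(self.replica)


async_store = ReplicatedStore(synchronous=False)
for r in ("a", "b", "c"):
    async_store.write(r)
print(async_store.records_at_risk())  # 3
async_store.ship_pending()
print(async_store.records_at_risk())  # 0
```

In this model, a synchronous store never has records at risk, while an asynchronous store can lose whatever is still queued when the primary fails.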

Runbook

A compilation of routine procedures and operations that a system administrator or operator carries out. In disaster recovery, runbooks contain step-by-step instructions for recovery procedures.

T

Tabletop Exercise: A discussion-based drill where team members walk through a simulated disaster scenario to test the disaster recovery plan, identify gaps, and improve response procedures without actually executing the recovery.
Related: Disaster Recovery Testing, Simulation Test

W

Warm Site: A backup facility that has some pre-installed hardware and connectivity but is not fully operational. Offers a middle ground between hot and cold sites in terms of cost and recovery time.
Related: Hot Site, Cold Site, Disaster Recovery Site