Disaster Recovery Glossary
Clear definitions for disaster recovery, business continuity, and IT resilience terms.
A
- Air Gap
-
A security measure in which a computer or network is physically isolated from unsecured networks, including the internet. Because air-gapped backups cannot be reached over the network, ransomware on production systems cannot encrypt or delete them.
Related: Backup, Ransomware Protection, Data Security
B
- Business Continuity Plan (BCP)
-
A comprehensive plan that outlines how a business will continue operating during and after an unplanned disruption. BCP covers all aspects of the business including personnel, facilities, communications, and IT systems.
Related: Disaster Recovery Plan, Business Impact Analysis, Crisis Management
- Business Impact Analysis (BIA)
-
A systematic process to determine and evaluate the potential effects of an interruption to critical business operations. BIA identifies time-sensitive functions and their resource dependencies.
Related: Business Continuity Plan, Risk Assessment, Recovery Time Objective
C
- Cold Site
-
A backup facility with basic infrastructure (power, cooling, network connectivity) but no pre-installed computing equipment. A cold site takes significant time to become operational after a disaster because equipment must first be procured, installed, and configured.
Related: Hot Site, Warm Site, Disaster Recovery Site
D
- Disaster Recovery as a Service (DRaaS)
-
A cloud computing service model that allows organizations to back up their data and IT infrastructure in a third-party cloud computing environment and to recover from that environment after a disaster.
Related: Cloud Backup, Disaster Recovery, Backup as a Service (BaaS)
- Disaster Recovery Plan (DRP)
-
A documented, structured approach that describes how an organization can quickly resume work after an unplanned incident. A DRP focuses specifically on IT systems and data recovery.
Related: Business Continuity Plan, Recovery Time Objective, Runbook
F
- Failback
-
The process of returning operations to the primary system after a failover event, once the primary system has been restored and verified.
Related: Failover, Disaster Recovery
- Failover
-
The process of automatically or manually switching to a redundant or standby system, server, or network when the primary system becomes unavailable due to failure or planned downtime.
Example: When the primary database server fails, automatic failover switches traffic to the standby server within seconds.
Related: High Availability, Redundancy, Failback
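The failover decision can be illustrated with a minimal Python sketch. The `Node` class, the node names, and the `choose_active` function are hypothetical and exist only for this example; a real system would base the `healthy` flag on repeated health checks rather than a single value.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str      # hypothetical node label
    healthy: bool  # assumed result of the most recent health check


def choose_active(primary: Node, standby: Node) -> Node:
    """Return the node that should serve traffic."""
    if primary.healthy:
        return primary
    if standby.healthy:
        return standby  # failover: the primary is down, the standby takes over
    raise RuntimeError("no healthy node available")


primary = Node("db-primary", healthy=False)
standby = Node("db-standby", healthy=True)
print(choose_active(primary, standby).name)  # db-standby
```

Preferring the primary whenever it is healthy means the same function also models failback once the primary has been restored.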
H
- High Availability (HA)
-
A system design approach and associated service implementation that ensures an agreed level of operational performance, usually expressed as uptime over a given period. Targets are typically 99.9% uptime or higher.
Example: A high availability database cluster maintains 99.99% uptime by automatically failing over between nodes.
Related: Failover, Redundancy, Load Balancing
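To make uptime percentages concrete, the short Python sketch below converts an availability target into the downtime it permits per year. The function name is an assumption for illustration, and the calculation uses a 365-day year.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years are ignored


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Downtime per year permitted by an availability target."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


for target in (99.9, 99.99):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} min/year")
# 99.9% -> 525.6 min/year
# 99.99% -> 52.6 min/year
```

Each additional "nine" cuts the permitted downtime by a factor of ten, from roughly 8.8 hours per year at 99.9% to under an hour at 99.99%.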
- Hot Site
-
A fully operational offsite data center equipped with hardware and software, kept in sync with the primary site, that can take over operations immediately after a disaster.
Related: Cold Site, Warm Site, Disaster Recovery Site
I
- Incident Response Plan (IRP)
-
A set of instructions to help IT staff detect, respond to, and recover from network security incidents including data breaches, malware attacks, and system compromises.
Related: Disaster Recovery Plan, Security Operations
M
- Mean Time Between Failures (MTBF)
-
The predicted elapsed time between inherent failures of a repairable system during normal operation. A higher MTBF indicates a more reliable system.
Related: Mean Time to Recovery, Reliability
- Mean Time to Recovery (MTTR)
-
The average time required to repair a failed component or system and restore it to operational status. MTTR is a key metric for measuring DR effectiveness.
Related: Recovery Time Objective, Mean Time Between Failures
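MTBF and MTTR are often combined into a steady-state availability estimate, availability = MTBF / (MTBF + MTTR). The Python sketch below applies that common formula; the function names and the sample figures are illustrative assumptions, not measurements.

```python
def mean_time_to_recovery(repair_hours: list[float]) -> float:
    """Average repair duration across recorded incidents."""
    if not repair_hours:
        raise ValueError("at least one incident is required")
    return sum(repair_hours) / len(repair_hours)


def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability as a fraction between 0 and 1."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


mttr = mean_time_to_recovery([1.5, 2.0, 2.5])  # hypothetical incident log
print(mttr)                                    # 2.0
print(f"{availability(1000, mttr):.4%}")       # 99.8004%
```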
R
- Recovery Point Objective (RPO)
-
The maximum acceptable amount of data loss, measured in time. RPO determines the minimum frequency of backups or replication and represents how much recent data the organization can afford to lose in a disaster scenario.
Example: A financial system with a 15-minute RPO requires transaction logs to be backed up at least every 15 minutes.
Related: Recovery Time Objective, Backup, Replication
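As a rough illustration, an RPO check compares the time since the last good backup with the objective. The Python sketch below assumes hypothetical function names and timestamps.

```python
from datetime import datetime, timedelta


def data_loss_window(last_backup: datetime, failure: datetime) -> timedelta:
    """Time span of data that would be lost if the failure happened now."""
    return failure - last_backup


def meets_rpo(last_backup: datetime, failure: datetime, rpo: timedelta) -> bool:
    """True if the data lost since the last backup stays within the RPO."""
    return data_loss_window(last_backup, failure) <= rpo


rpo = timedelta(minutes=15)
last_backup = datetime(2024, 1, 1, 12, 0)  # hypothetical backup time

print(meets_rpo(last_backup, datetime(2024, 1, 1, 12, 10), rpo))  # True
print(meets_rpo(last_backup, datetime(2024, 1, 1, 12, 20), rpo))  # False
```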
- Recovery Time Objective (RTO)
-
The maximum acceptable time that a system, application, or function can be down after a failure or disaster before the business impact becomes unacceptable. RTO is measured from the point of disruption to the point when the system is restored.
Example: An e-commerce site with a 4-hour RTO must be back online within 4 hours of any outage.
Related: Recovery Point Objective, Disaster Recovery Plan, Failover
- Redundancy
-
The duplication of critical components or functions of a system with the intention of increasing reliability. In DR, redundancy ensures backup resources are available when primary resources fail.
Related: High Availability, Failover, N+1 Redundancy
- Replication
-
The process of copying data from one location to another to ensure consistency between redundant resources. Replication can be synchronous (each write is confirmed on the replica before it is acknowledged) or asynchronous (writes reach the replica after a delay).
Related: Backup, Recovery Point Objective, Data Mirroring
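The difference between the two modes can be shown with a toy in-memory model. The `Primary` and `Replica` classes below are hypothetical and only sketch the idea; real replication involves networks, logs, and acknowledgements.

```python
class Replica:
    def __init__(self) -> None:
        self.data: dict[str, str] = {}


class Primary:
    def __init__(self, replica: Replica, synchronous: bool) -> None:
        self.data: dict[str, str] = {}
        self.replica = replica
        self.synchronous = synchronous
        self.pending: list[tuple[str, str]] = []  # writes not yet on the replica

    def write(self, key: str, value: str) -> None:
        self.data[key] = value
        if self.synchronous:
            self.replica.data[key] = value    # copied before write() returns
        else:
            self.pending.append((key, value))  # copied later

    def ship_pending(self) -> None:
        """Apply queued writes to the replica (the asynchronous catch-up)."""
        for key, value in self.pending:
            self.replica.data[key] = value
        self.pending.clear()


sync_replica = Replica()
Primary(sync_replica, synchronous=True).write("order-1", "paid")
print(sync_replica.data)   # {'order-1': 'paid'}

async_replica = Replica()
async_primary = Primary(async_replica, synchronous=False)
async_primary.write("order-1", "paid")
print(async_replica.data)  # {} (the write is still pending)
async_primary.ship_pending()
print(async_replica.data)  # {'order-1': 'paid'}
```

Any writes still pending when the primary fails are lost, which is why asynchronous replication implies a non-zero RPO.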
- Runbook
-
A compilation of routine procedures and operations that a system administrator or operator carries out. In disaster recovery, runbooks contain step-by-step instructions for recovery procedures.
Related: Playbook, Standard Operating Procedure, Disaster Recovery Plan
T
- Tabletop Exercise
-
A discussion-based drill where team members walk through a simulated disaster scenario to test the disaster recovery plan, identify gaps, and improve response procedures without actually executing the recovery.
Related: Disaster Recovery Testing, Simulation Test
W
- Warm Site
-
A backup facility that has some pre-installed hardware and connectivity but is not fully operational. A warm site offers a middle ground between hot and cold sites in terms of cost and recovery time.
Related: Hot Site, Cold Site, Disaster Recovery Site