Describe disaster recovery strategies

Disaster Recovery Strategies – Cambridge International AS & A Level IT (9626)

1. What is Disaster Recovery (DR)?

Disaster recovery (DR) is the collection of policies, tools and procedures that enable an organisation to restore its IT systems, data and network services after a disruptive event such as hardware failure, natural disaster, cyber‑attack or human error. DR is a specialised part of business continuity that focuses on the technology layer.

2. Key Objectives of a DR Plan

  • Minimise downtime and loss of productivity.
  • Protect the integrity, confidentiality and availability of data.
  • Meet legal, contractual and regulatory obligations.
  • Provide a clear, documented process that can be followed under stress.
  • Support the wider business‑continuity strategy of the organisation.

3. Core DR Metrics

  • Recovery Time Objective (RTO) – the maximum acceptable period that a service can be unavailable.
  • Recovery Point Objective (RPO) – the maximum acceptable amount of data loss measured in time (e.g., an RPO of 15 minutes means backups must be taken at least every 15 minutes).
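
The two metrics can be checked against a real incident timeline. The sketch below is a minimal illustration, assuming three known timestamps (last good backup, failure, service restored); the function name, dictionary keys and sample times are invented for this example and are not part of the syllabus.

```python
from datetime import datetime, timedelta

def measure_recovery(last_backup, failure, restored, rpo_target, rto_target):
    """Return the achieved data-loss window and downtime with pass/fail flags."""
    data_loss = failure - last_backup   # work done since the last good copy is lost
    downtime = restored - failure       # how long the service was unavailable
    return {
        "data_loss": data_loss,
        "downtime": downtime,
        "rpo_met": data_loss <= rpo_target,
        "rto_met": downtime <= rto_target,
    }

# Hypothetical incident timeline
result = measure_recovery(
    last_backup=datetime(2024, 3, 1, 9, 50),
    failure=datetime(2024, 3, 1, 10, 0),
    restored=datetime(2024, 3, 1, 12, 30),
    rpo_target=timedelta(minutes=15),
    rto_target=timedelta(hours=2),
)
print(result["rpo_met"], result["rto_met"])
```

In this sample the last backup ran 10 minutes before the failure, so the 15‑minute RPO is met, but the 2.5‑hour outage misses the 2‑hour RTO.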

4. Syllabus Mapping – How DR Relates to the Cambridge IT Curriculum

Each syllabus topic (AS / A Level) is listed below with its link to disaster recovery.
1. Data processing & information
  • RPO/RTO define how much data may be lost and how long services can be down.
  • After a restore, data validation (checksum, hash‑check) ensures integrity, while data quality checks verify accuracy and completeness.
2. Hardware & software
  • Identify critical servers, storage devices, operating systems and virtualisation platforms.
  • Utility software – backup agents, imaging tools (e.g., Acronis, Clonezilla).
  • System software – OS‑level recovery options (Windows Recovery Environment, Linux initramfs), boot‑media creation.
  • Off‑the‑shelf vs. custom‑written backup solutions (topic 2.4).
3. Monitoring & control
  • Environmental sensors (temperature, fire, flood) and UPS monitoring feed alerts to the DR team.
  • Automated health‑checks (heartbeat, service‑status) trigger the DR activation procedure.
4. Algorithms & flowcharts

Decision‑making algorithms are expressed as flowcharts and pseudocode. Example:

IF incident_severity >= DR_THRESHOLD THEN
    CALL activate_DR_plan()
    IF primary_site_accessible = FALSE THEN
        IF RPO <= 15min THEN
            SELECT hot_site
        ELSEIF RPO <= 4h THEN
            SELECT warm_site
        ELSE
            SELECT cold_site
        ENDIF
    ENDIF
ENDIF
                

See the “Suggested Diagram – DR Process Flowchart” (section 11) for a visual version.

5. e‑Security
  • Encryption of backups (AES‑256, TLS for cloud storage).
  • Access‑control mechanisms – role‑based access, multi‑factor authentication for backup repositories.
  • Secure key‑management (hardware security modules, key‑rotation policies).
  • Air‑gapped media and ransomware‑hardening measures.
6. Digital divide
  • Low‑cost strategies (tape backups, cold sites) enable small schools, NGOs or developing‑country businesses to implement DR.
  • Discussion of equity: how lack of DR can widen the digital divide.
7. Expert systems & AI
  • Knowledge base – historic failure logs, component health thresholds.
  • Inference engine – AI model predicts imminent disk failure and automatically initiates a pre‑emptive replication.
  • Example: A data‑centre monitoring system that, after detecting a temperature rise, runs a rule‑based expert system to switch traffic to a hot site.
8. Spreadsheets
  • DR‑status register (see section 8.1).
  • Functions required by the syllabus: IF, VLOOKUP/XLOOKUP, data‑validation lists, conditional formatting, pivot tables and pivot charts.
9. Modelling
  • What‑if cost vs. RTO model (section 8.2) includes sensitivity analysis (changing bandwidth cost, storage price, data volume).
  • Students may use Excel Solver or a simple Monte‑Carlo simulation to explore risk.
10. Databases & file concepts
  • Backup of relational databases – full dump, transaction‑log backups, point‑in‑time recovery (PITR).
  • Use of ER diagrams to document backup metadata (tables, relationships, retention policies).
  • File concepts – versioning, immutable storage (WORM), and file‑system snapshots (e.g., ZFS, VSS).
11. Video & audio editing
  • Large media assets require high‑capacity storage and fast restore times; archival strategies (LTO tape, cloud object storage) are part of DR.
12. IT in society
  • DR failures can affect public services, banking and health care, highlighting social responsibility and ethical considerations.
13. Emerging technologies
  • Cloud DR, container orchestration, edge computing and blockchain‑based immutable backups.
14. Communications & networking
  • Redundant network links, DNS failover, VPN tunnels and SD‑WAN for remote‑site activation.
15. Project management
  • DR planning follows the same phases: initiation, planning, execution, monitoring & control, closure.
16. System life‑cycle
  • DR considerations are built into the design, implementation and maintenance stages of any system.
17. Data analysis
  • Log analysis identifies patterns that inform RPO settings and replication frequency.
18. Mail‑merge & document automation
  • Templates for DR incident reports, communication plans and stakeholder notifications.
19. Graphics & animation
  • Visual assets (e.g., marketing videos) are included in media‑asset DR policies.
20. Web programming
  • Version control, automated deployment pipelines and container snapshots aid rapid web‑service recovery.
21. Ethical, legal & environmental issues
  • Compliance with data‑protection law, environmental impact of off‑site storage, and ethical handling of personal data during recovery.

5. Types of Disaster Recovery Strategies

5.1 Backup‑Centric Strategies

  • Full backup – a complete copy of all data and system files.
  • Incremental backup – stores only the changes made since the previous backup (full or incremental).
  • Differential backup – stores all changes since the last full backup.
  • Each method impacts RTO, RPO and storage cost; students should calculate trade‑offs.
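
One way to calculate these trade‑offs is sketched below. It is a simplified model, assuming one initial full backup followed by one backup per day and a constant, non‑overlapping amount of changed data each day; the function name and sample figures are hypothetical.

```python
def backup_tradeoff(full_gb, daily_change_gb, days):
    """Compare storage used and restore-chain length after `days` daily
    backups that follow one initial full backup (simplified model)."""
    return {
        # a complete new copy is stored every day
        "full": {
            "storage_gb": full_gb * (days + 1),
            "sets_to_restore": 1,
        },
        # each incremental holds one day of changes;
        # a restore needs the full backup plus every incremental
        "incremental": {
            "storage_gb": full_gb + daily_change_gb * days,
            "sets_to_restore": 1 + days,
        },
        # the differential on day n holds n days of changes;
        # a restore needs the full backup plus only the latest differential
        "differential": {
            "storage_gb": full_gb + daily_change_gb * (days * (days + 1) // 2),
            "sets_to_restore": 2 if days else 1,
        },
    }

# Hypothetical figures: 500 GB system, 20 GB changed per day, six daily backups
for method, figures in backup_tradeoff(500, 20, 6).items():
    print(method, figures)
```

The pattern to look for: full backups cost the most storage but restore from a single set (shortest RTO), incrementals use the least storage but have the longest restore chain, and differentials sit between the two.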

5.2 Replication Strategies

  • Synchronous replication – each write is committed at the secondary site before it is acknowledged; this gives a near‑zero RPO but requires high‑bandwidth, low‑latency links.
  • Asynchronous replication – data is copied with a configurable delay; reduces bandwidth demand but introduces a non‑zero RPO.

5.3 Site‑Based Strategies

  • Cold site – a basic facility with power, cooling and networking but no pre‑installed hardware; equipment must be delivered and configured before recovery can start.
  • Warm site – hardware and basic software are installed; data is restored from backups.
  • Hot site – fully operational duplicate of the primary environment, kept in sync continuously (often via synchronous replication).

5.4 Cloud‑Based Strategies

  • IaaS failover – virtual machines are spun up in a cloud region; often combined with infrastructure‑as‑code tools (Terraform, AWS CloudFormation).
  • SaaS redundancy – critical applications are delivered by a provider that already offers multi‑region replication.
  • Hybrid cloud DR – on‑premises primary site + cloud secondary site for added resilience and cost flexibility.

6. Backup Methods & Media

Choosing the appropriate backup media influences recovery speed, cost and reliability.

Media | Capacity | Speed (read/write) | Cost (relative) | Typical use‑case
Magnetic tape (LTO) | 10 TB – 30 TB per cartridge | High sequential throughput (hundreds of MB/s on recent LTO generations), but access is sequential, so locating and restoring individual files is slow | Low | Long‑term archival, off‑site storage, compliance archives
External HDD / SSD | 1 TB – 8 TB (HDD) / 500 GB – 4 TB (SSD) | Fast (100 – 500 MB/s HDD, 500 – 3000 MB/s SSD) | Medium | Local daily/weekly backups, portable recovery kits
Network‑Attached Storage (NAS) | Scalable (up to dozens of TB) | Depends on network (1 GbE – 10 GbE) | Medium‑High | Centralised incremental backups for multiple workstations
Cloud object storage (e.g., Amazon S3, Azure Blob) | Virtually unlimited | Limited by internet bandwidth; can be accelerated with Direct Connect / ExpressRoute | Variable (pay‑as‑you‑go) | Off‑site, geographic redundancy, disaster‑proof archiving
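
To see how media speed feeds into RTO, restore time can be estimated as data volume divided by sustained throughput. The sketch below is a rough estimate only: the function name is hypothetical, the sustained rates are illustrative assumptions rather than vendor figures, and seek, mount and verification time are ignored.

```python
def restore_hours(data_tb, throughput_mb_s):
    """Estimated hours to read `data_tb` terabytes at a sustained rate in MB/s.
    Uses decimal units (1 TB = 1,000,000 MB)."""
    seconds = data_tb * 1_000_000 / throughput_mb_s
    return seconds / 3600

# Assumed sustained rates in MB/s (illustrative only)
media = [
    ("external HDD", 150),
    ("SATA SSD", 500),
    ("NAS over 1 GbE", 110),
    ("100 Mbit/s internet link", 12.5),
]
for name, rate in media:
    print(f"{name}: {restore_hours(2, rate):.1f} h to restore 2 TB")
```

Under these assumptions a 2 TB restore over a 100 Mbit/s link takes roughly 44 hours, which is why cloud‑only recovery may not satisfy a short RTO without a faster connection.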

7. Developing a Disaster Recovery Plan (DRP)

  1. Business Impact Analysis (BIA) – identify critical systems, data, and the financial impact of downtime.
  2. Define RTO & RPO for each critical service.
  3. Select technologies – backup software, replication tools, hardware, cloud services.
  4. Design recovery architecture – decide on cold, warm, hot or hybrid sites.
  5. Document procedures – step‑by‑step instructions, scripts, contact lists.
  6. Assign roles & responsibilities – DR manager, technical team, communications officer.
  7. Implement monitoring & alerting – health checks, log analysis, automated failover triggers.
  8. Schedule regular testing – tabletop exercises, partial restores, full failover drills.
  9. Review & update – after each test, after major system changes, or when business priorities shift.

8. Practical Classroom Activities

8.1 Spreadsheet DR‑Status Register

Students build a spreadsheet that tracks each backup job and demonstrates all required spreadsheet functions.

  • Columns: System, Backup Type, Last Run, Next Scheduled, Status (OK/Failed), RPO Met?, Owner.
  • Formulas:
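    • =IF(TODAY()-C2>RPO,"No","Yes") – fills the “RPO Met?” column; returns "No" when the last run (column C) is older than RPO, a named cell holding the limit in days.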
    • =VLOOKUP(A2,OwnerTable,2,FALSE) – pulls the responsible person.
  • Data‑validation lists for “Backup Type” (Full, Incremental, Differential).
  • Conditional formatting – red fill for failed status, green for OK.
  • Pivot table & chart – summarise how many systems meet their RPO per month.

8.2 Modelling Cost vs. RTO

Students create a simple linear cost model and perform a what‑if analysis.

Cost = FixedSiteCost + (BandwidthCost × 24 h / RPO) + (StorageCost × DataVolume)
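  • Use Excel Solver to minimise Cost while keeping the recovery objective within its target (the formula above varies with RPO; an RTO‑dependent term must be added before RTO can be constrained).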
  • Perform sensitivity testing by varying BandwidthCost and DataVolume.
  • Present results in a scatter plot (RTO on X‑axis, Cost on Y‑axis) and discuss trade‑offs.
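
The same what‑if analysis can be prototyped outside a spreadsheet. The sketch below follows the Cost formula above exactly as written, so cost varies with RPO; all parameter values and units are hypothetical and chosen only to show the shape of the curve.

```python
def dr_cost(fixed_site_cost, bandwidth_cost, rpo_hours, storage_cost, data_volume):
    """Cost = FixedSiteCost + (BandwidthCost x 24 h / RPO) + (StorageCost x DataVolume)."""
    transfers_per_day = 24 / rpo_hours   # a shorter RPO means more frequent transfers
    return fixed_site_cost + bandwidth_cost * transfers_per_day + storage_cost * data_volume

# Hypothetical baseline values
base = dict(fixed_site_cost=1000, bandwidth_cost=5, storage_cost=20, data_volume=10)

# What-if: tighten the RPO and watch the cost rise
for rpo in (24, 4, 1, 0.25):
    print(f"RPO {rpo} h -> cost {dr_cost(rpo_hours=rpo, **base)}")

# Sensitivity: scale BandwidthCost while holding RPO at 1 hour
for factor in (0.5, 1.0, 2.0):
    scenario = {**base, "bandwidth_cost": base["bandwidth_cost"] * factor}
    print(f"bandwidth x{factor} -> cost {dr_cost(rpo_hours=1, **scenario)}")
```

Because the bandwidth term is divided by RPO, cost rises steeply as the RPO approaches zero, which is the trade‑off students should bring out in their discussion.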

8.3 Algorithm Flowchart – DR Decision Process

Students design a flowchart and write the corresponding pseudocode (see section 4). The flowchart should include:

  1. Disaster detection (sensor/alert).
  2. Is severity ≥ activation threshold?
  3. Is primary site reachable?
  4. Can required RPO be met with existing backups?
  5. Select recovery site (cold / warm / hot / cloud).
  6. Execute restoration steps.
  7. Validate services and return to normal.

Standard symbols (oval start/stop, rectangle process, diamond decision) must be used and colour‑coded (e.g., red for “RTO exceeded”).
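
Once the flowchart and pseudocode are complete, the logic can be tested by translating it into a real language. The sketch below mirrors the section 4 pseudocode, using its 15‑minute and 4‑hour thresholds; the function and parameter names are assumptions made for this example.

```python
def select_recovery_site(incident_severity, dr_threshold,
                         primary_site_accessible, rpo_minutes):
    """Return the recovery site to use, or None when no alternate site is needed."""
    if incident_severity < dr_threshold:
        return None          # below the activation threshold: DR plan not activated
    if primary_site_accessible:
        return None          # plan activated, but recovery stays at the primary site
    if rpo_minutes <= 15:
        return "hot_site"
    elif rpo_minutes <= 4 * 60:
        return "warm_site"
    return "cold_site"

# Hypothetical test cases covering each branch
print(select_recovery_site(5, 3, False, 10))
print(select_recovery_site(5, 3, False, 120))
print(select_recovery_site(5, 3, False, 1440))
print(select_recovery_site(2, 3, False, 10))
```

Testing boundary values such as an RPO of exactly 15 minutes or 240 minutes is a useful check that each decision diamond in the flowchart has the correct comparison.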

9. Testing & Maintenance

  • Tabletop exercise – discussion‑based walk‑through; no systems are touched.
  • Partial restore test – restore a representative data set to verify backup integrity and restore procedures.
  • Full failover test – switch operations to the secondary site for a defined period; record actual RTO and RPO.
  • Post‑test review – capture lessons learned, update documentation, adjust RTO/RPO if required.
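
Backup integrity in a partial restore test is commonly verified by comparing a hash of the restored file with a hash recorded when the backup was taken. The sketch below is one possible approach using SHA‑256 from Python's standard library; the function names are hypothetical, and it assumes the original hash was stored alongside the backup.

```python
import hashlib

def sha256_of(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large backups do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def restore_is_intact(restored_path, expected_sha256):
    """True when the restored file's hash matches the value recorded at backup time."""
    return sha256_of(restored_path) == expected_sha256.strip().lower()
```

A matching hash shows that the file was restored bit‑for‑bit; it does not show that the data was accurate when it was backed up, so data quality checks are still needed.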

10. Comparison of Common Strategies

Strategy | Typical RTO | Typical RPO | Cost (relative) | Complexity | Best use‑case
Cold site | Hours to days | 24 h or more | Low | Low | Non‑critical systems, tight budgets, small organisations.
Warm site | Hours | 4–12 h | Medium | Medium | Mid‑critical services where moderate downtime is acceptable.
Hot site (synchronous replication) | Seconds to minutes | Near‑zero | High | High | Mission‑critical applications (banking, e‑commerce, health‑care).
Cloud IaaS failover | Minutes to hours (depends on bandwidth) | 5–30 min | Variable (pay‑as‑you‑go) | Medium‑High | Organisations needing scalability, geographic redundancy, or rapid provisioning.

11. Suggested Diagram – DR Process Flowchart

The diagram should illustrate the sequence below using standard flowchart symbols.

  1. Disaster detection (monitoring alert).
  2. Activate DR plan – notify DR manager.
  3. Impact assessment – determine severity and whether RTO is exceeded.
  4. Choose recovery site (cold, warm, hot, cloud) based on RPO and resource availability.
  5. Execute data restoration (restore from backup, start replication, spin‑up VMs).
  6. Validate services (functional testing, security checks, data integrity verification).
  7. Return to normal operations – de‑activate secondary site, update documentation.

Colour‑code decision diamonds (red = “RTO exceeded”, green = “Proceed”). The flowchart can be created with any diagramming tool (draw.io, Lucidchart) and inserted into student reports.

12. Key Take‑aways

  • Disaster recovery is a specialised subset of business continuity that concentrates on IT systems.
  • The choice of strategy depends on required RTO, RPO, budget, and the criticality of services.
  • Effective DR integrates with many syllabus topics – from data processing and security to project management and emerging technologies.
  • Regular testing, clear documentation and continuous review keep the plan reliable.
  • Combining multiple methods (e.g., local tape backups + cloud IaaS failover) often provides the best balance of cost, speed and resilience.
