Creating a Recovery Plan: Guide with Template for SMEs

TL;DR
  • A recovery plan describes the concrete steps to restore business operations in a defined sequence after an outage.
  • Recovery Time Objective (RTO) and Recovery Point Objective (RPO) determine how quickly recovery must happen and how much data loss is acceptable.
  • The plan is structured in phases: immediate measures, emergency operations, restoration, and normal operations. Each phase has its own responsible persons and success criteria.
  • Without a Business Impact Analysis, the recovery plan lacks its foundation. The BIA provides the prioritization; the recovery plan implements it operationally.
  • A recovery plan that has never been tested is not a plan. At least once annually, a test run should take place.

Why you need a recovery plan

Friday afternoon, just before the weekend. Your ERP system's central database gives up the ghost. Order processing stands still, production gets no more approvals, shipping does not know what goes where. Your IT team starts frantically trying to restore the system. But in what order? What has priority - the database, the mail server, or the VPN for the field sales team? Who decides? And at what point must executive management be informed?

These are precisely the questions a recovery plan answers. It is the operational document that describes how you systematically bring business operations back online after an outage. Not somehow and not by gut feeling, but in a defined sequence with clear responsibilities, time targets, and success criteria.

For mid-market companies, the topic has gained urgency in recent years. NIS2 explicitly requires business continuity management, ISO 27001 demands plans for recovery after security incidents, and every ransomware attack shows what happens when these plans are missing. The reality, however, often looks different: Many companies have backup concepts but no structured recovery plan. They can restore data but do not know in what order systems must be brought up, what dependencies exist, and who does what in each phase.

What exactly is a recovery plan?

A recovery plan (also known as a disaster recovery plan) is a documented process that describes how business operations are restored after a disruption or outage. It belongs to the overarching business continuity management (BCM) and is closely interlinked with the business impact analysis and emergency plans.

The recovery plan differs from other BCM documents through its operational character. While the BIA analyzes how critical each process is, and the emergency plan governs the initial response, the recovery plan goes one step further: It defines the concrete technical and organizational steps to get from emergency operations back to normal operations.

Delineation from related documents

| Document                       | Focus                                                   | Timing                                |
|--------------------------------|---------------------------------------------------------|---------------------------------------|
| Business Impact Analysis (BIA) | Assessment of business process criticality              | Before the incident (analysis)        |
| Emergency Plan                 | Immediate measures when a disruption occurs             | During the incident (response)        |
| Recovery Plan                  | Step-by-step restoration of normal operations           | After the initial response (recovery) |
| Emergency Handbook             | Overarching document with all plans, contacts, procedures | Entire lifecycle                    |

The recovery plan picks up where the emergency plan leaves off. The initial response is complete, the immediate danger is contained, and now it is about getting operations running again.

The connection: BIA, emergency plan, and recovery plan

These three documents form a logical chain, and the sequence is crucial. You cannot create a meaningful recovery plan without first having conducted a BIA, because the BIA provides the foundation for all prioritization decisions in recovery.

The BIA provides:

  • Which business processes are critical?
  • Which IT systems and assets support these processes?
  • How long may a process be down at most (RTO)?
  • How much data loss is tolerable (RPO)?
  • What dependencies exist between systems?

The emergency plan governs:

  • Who is notified when?
  • What immediate measures are initiated?
  • How are emergency operations maintained?
  • When is the recovery plan activated?

The recovery plan defines:

  • In what order are systems restored?
  • Who is responsible for each recovery step?
  • What resources are needed?
  • What are the success criteria for each step?
  • When are normal operations considered restored?

If you skip the BIA, your recovery plan is based on assumptions instead of facts. Perhaps you prioritize restoring the mail server while production control is actually more business-critical. Or you plan a 48-hour recovery time for the ERP system, although executive management expects it back after 8 hours. A thorough protection needs assessment also helps classify the criticality of individual assets.

Understanding RTO and RPO

Two metrics are absolutely central to every recovery plan: Recovery Time Objective (RTO) and Recovery Point Objective (RPO). Both come from the BIA but are operationally implemented in the recovery plan.

Recovery Time Objective (RTO)

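The RTO specifies how long a system, application, or business process may be down at most before the damage to the company becomes unacceptable. It is the target for how quickly restoration must be completed and should not be confused with the Maximum Tolerable Downtime (MTD), the absolute limit listed under the additional metrics below.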

An example: Your ERP system has an RTO of 4 hours. That means the recovery of the ERP system must be completed within 4 hours of the outage occurring. Anything beyond that causes damage that the company has assessed as no longer acceptable - whether through revenue losses, contractual penalties, or reputational damage.

Recovery Point Objective (RPO)

The RPO describes the maximum tolerable data loss, measured in time. It answers the question: How old may the data state I fall back to be without the damage becoming unacceptable?

An example: Your accounting system has an RPO of 1 hour. That means you may lose at most the data from the last hour. If your last backup is 4 hours old, you do not meet the RPO. The backup frequency must therefore match the RPO.
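
The check from this example can be expressed in a few lines of Python. This is a minimal sketch: the function name `meets_rpo` and the timestamps are assumptions made for illustration, not part of any backup tool.

```python
from datetime import datetime, timedelta


def meets_rpo(last_backup: datetime, outage: datetime, rpo: timedelta) -> bool:
    """Return True if the data lost since the last backup stays within the RPO."""
    data_loss_window = outage - last_backup
    return data_loss_window <= rpo


# Hypothetical outage time; RPO of 1 hour as in the accounting example.
outage = datetime(2025, 1, 10, 16, 0)
rpo = timedelta(hours=1)

# A backup that is 4 hours old violates the RPO, one that is 40 minutes old does not.
print(meets_rpo(outage - timedelta(hours=4), outage, rpo))
print(meets_rpo(outage - timedelta(minutes=40), outage, rpo))
```

Run against the real timestamps of the last successful backup, a check like this shows whether the backup frequency actually matches the RPO.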

Additional important metrics

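| Metric                           | Meaning                                                             | Example                   |
|----------------------------------|---------------------------------------------------------------------|---------------------------|
| RTO                              | Maximum downtime                                                    | ERP: 4 hours              |
| RPO                              | Maximum data loss                                                   | Accounting: 1 hour        |
| MTD (Maximum Tolerable Downtime) | Absolute maximum downtime before existential damage occurs          | Entire system: 72 hours   |
| WRT (Work Recovery Time)         | Time for functional testing and data validation after restoration   | ERP after restore: 2 hours |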

Important: RTO + WRT must not exceed the MTD. If you need 4 hours for the restore and 2 hours for validation, your actual recovery time is 6 hours. If the MTD is 8 hours, you still have buffer. If it is 4 hours, you have a problem.
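
The arithmetic behind this rule, as a small Python sketch. The function name `recovery_buffer_hours` is an assumption chosen for illustration; the numbers are the ones from the paragraph above.

```python
def recovery_buffer_hours(rto_hours: float, wrt_hours: float, mtd_hours: float) -> float:
    """Return MTD minus (RTO + WRT); a negative result means the MTD is exceeded."""
    return mtd_hours - (rto_hours + wrt_hours)


# 4 hours restore plus 2 hours validation, checked against two different MTDs.
print(recovery_buffer_hours(4, 2, 8))  # 2 hours of buffer remain
print(recovery_buffer_hours(4, 2, 4))  # -2: the MTD is exceeded by 2 hours
```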

Structure of a recovery plan

A good recovery plan follows a clear structure. It must be written so that someone who does not work with the topic daily can follow and execute the steps, because in an emergency the responsible IT manager might be on vacation and their deputy must be able to implement the plan.

1. Cover page and metadata

Every recovery plan starts with the basic information:

  • Document title and version: e.g., "Recovery Plan IT Infrastructure v2.3"
  • Scope: Which systems and processes does the plan cover?
  • Responsible person: Who is the plan owner?
  • Last review: When was the plan last updated?
  • Next review: When is the next revision scheduled?
  • Distribution list: Who has a copy of the plan?

2. Trigger conditions

The plan clearly defines under which circumstances it is activated. Not every disruption requires the recovery plan. A clear trigger definition prevents the plan from being unnecessarily activated for minor incidents while also preventing it from being activated too late during a real emergency.

Typical triggers:

  • Total failure of a business-critical system for more than [defined time]
  • Ransomware attack with encryption of production systems
  • Physical damage to the data center (fire, water, power outage > 4 hours)
  • Failure of the primary data center or cloud provider
  • Decision by the crisis team after assessing the situation

3. Roles and responsibilities

Who does what? The plan must answer this question unambiguously. For every step in the recovery, a responsible person is named, including a deputy.

| Role                | Responsibility                                 | Person                    | Deputy                    |
|---------------------|------------------------------------------------|---------------------------|---------------------------|
| Recovery Lead       | Overall coordination of recovery               | IT Manager                | Deputy IT Manager         |
| Infrastructure Team | Network, servers, storage                      | Admin team                | External service provider |
| Application Team    | ERP, databases, business applications          | Application administrator | Software vendor           |
| Communications      | Information to employees, customers, partners  | CEO / PR                  | Executive assistant       |
| Business Units      | Validation of restored systems                 | Department heads          | Deputies                  |

4. The four phases of recovery

A recovery plan is typically structured in four phases. Each phase has its own goals, activities, and completion criteria.

Phase 1: Immediate measures (0 to 2 hours)

In this phase, the focus is on stabilizing the situation and creating the foundations for recovery.

Activities:

  • Damage assessment: Which systems are affected? What data is available?
  • Initiate emergency operations: Activate manual processes, communicate workarounds
  • Assemble the recovery team and assign tasks
  • Check backup availability: Are backups intact? When was the last successful backup?
  • Activate communication plan: Inform employees, notify customers if needed

Completion criterion: Situation is assessed, team is ready, backup status is known.

Phase 2: Restore infrastructure (2 to 8 hours)

The technical foundation must be in place before applications can be restored. This phase focuses on network, servers, and storage.

Activities:

  • Check and restore network infrastructure (switches, firewalls, DNS, DHCP)
  • Bring up the hypervisor environment / cloud infrastructure
  • Restore storage systems and SAN
  • Restore Active Directory / LDAP (without AD, almost nothing works)
  • Activate monitoring to track progress

Completion criterion: Base infrastructure is functional, VMs can be started.

Phase 3: Restore applications (8 to 24 hours)

Now business applications are restored in the order defined by the BIA.

Activities:

  • Restore databases from backup and perform consistency checks
  • Restore ERP system and test basic functions
  • Restore email system
  • Restore additional applications per priority list
  • Have business units validate: Is the data correct? Do the processes work?
  • Check interfaces between systems

Completion criterion: All critical applications are running, business units have confirmed functionality.

Phase 4: Normalization (24 to 72 hours)

The final phase transitions from limited operations to normal operations.

Activities:

  • Restore non-critical systems
  • Enter accumulated data (orders, bookings that were processed manually during the outage)
  • Complete functional testing of all systems and interfaces
  • Reactivate backup routines and run the first backup
  • Document lessons learned
  • Update the plan if weaknesses were identified

Completion criterion: All systems are running in normal operation, backup routines are active, no open items.
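
The time windows of the four phases can serve as a simple progress check during recovery. The following Python sketch maps the hours since plan activation to the phase the plan expects. It assumes that a boundary hour belongs to the later phase and that anything beyond 72 hours is escalated; both are illustrative choices, not requirements from a standard.

```python
# (end of window in hours, phase name), taken from the four phases above.
PHASES = [
    (2, "Phase 1: Immediate measures"),
    (8, "Phase 2: Restore infrastructure"),
    (24, "Phase 3: Restore applications"),
    (72, "Phase 4: Normalization"),
]


def planned_phase(hours_since_activation: float) -> str:
    """Return the phase the plan expects at the given elapsed time."""
    if hours_since_activation < 0:
        raise ValueError("elapsed time must not be negative")
    for end_hour, name in PHASES:
        if hours_since_activation < end_hour:
            return name
    # Assumption: past the planned window, the recovery lead escalates.
    return "Beyond planned window: escalate"


print(planned_phase(1))
print(planned_phase(10))
```

If the team is still working on infrastructure when this lookup already reports Phase 3, the recovery lead knows the plan is behind schedule.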

5. Recovery steps per asset

The heart of the recovery plan is the detailed recovery steps for each critical asset. This is where it gets concrete and technical.

In ISMS Lite, you can link recovery steps per asset directly with BIA results and RTO/RPO values, so your recovery plan always stays up to date. For each asset, you document:

  • Asset name and description: e.g., "ERP system (SAP Business One on SQL Server)"
  • Criticality: Taken from the BIA (e.g., High)
  • RTO / RPO: Also from the BIA
  • Dependencies: Which other systems must be running first? (e.g., Active Directory, network, SQL Server)
  • Backup type and location: e.g., "Daily full backup to NAS + weekly offsite backup"
  • Recovery steps: Numbered instructions with concrete commands, paths, and parameters
  • Validation steps: How do you verify that recovery was successful?
  • Responsible person: Who performs the steps?
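
If you maintain these profiles in structured form rather than in a text document, the fields listed above map naturally onto a small record type. The following Python sketch is one possible shape; the class name `AssetRecoveryProfile` and the completeness check are assumptions for illustration and do not reflect the data model of any particular tool.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AssetRecoveryProfile:
    """One asset entry of the recovery plan, mirroring the fields listed above."""

    name: str
    criticality: str                # taken from the BIA, e.g. "High"
    rto_hours: float
    rpo_hours: Optional[float]      # None if no RPO applies to the asset
    dependencies: List[str] = field(default_factory=list)
    backup: str = ""
    recovery_steps: List[str] = field(default_factory=list)
    validation_steps: List[str] = field(default_factory=list)
    responsible: str = ""

    def missing_fields(self) -> List[str]:
        """Name the parts of the profile that are still empty."""
        required = {
            "backup": self.backup,
            "recovery_steps": self.recovery_steps,
            "validation_steps": self.validation_steps,
            "responsible": self.responsible,
        }
        return [label for label, value in required.items() if not value]


erp = AssetRecoveryProfile(
    name="ERP system (SAP Business One on SQL Server)",
    criticality="High",
    rto_hours=4,
    rpo_hours=1,
    dependencies=["Active Directory", "network", "SQL Server"],
    backup="Daily full backup to NAS + weekly offsite backup",
    recovery_steps=["Restore VM from backup"],
)
print(erp.missing_fields())  # ['validation_steps', 'responsible']
```

A completeness check of this kind is a cheap way to find profiles that would leave a deputy guessing in an emergency.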

Sample plan: IT infrastructure of a mid-market company

Let us take a concrete example: A machine builder with 150 employees, a local data center with VMware virtualization, and these critical systems:

Prioritized asset list

| Prio | Asset                             | RTO | RPO | Dependencies         |
|------|-----------------------------------|-----|-----|----------------------|
| 1    | Network (Firewall, Switches, DNS) | 1h  | -   | Power, cooling       |
| 2    | Active Directory / DNS            | 2h  | 1h  | Network              |
| 3    | VMware vSphere (Hypervisor)       | 2h  | -   | Network, storage     |
| 4    | Storage (SAN)                     | 2h  | -   | Network              |
| 5    | ERP System (SAP B1 + SQL Server)  | 4h  | 1h  | AD, VMware, storage  |
| 6    | Email (Exchange Online)           | 4h  | 0h  | Internet, AD         |
| 7    | File Server                       | 8h  | 4h  | AD, VMware, storage  |
| 8    | Production Control (MES)          | 8h  | 1h  | ERP, network         |
| 9    | VPN / Remote Access               | 12h | -   | Firewall, AD         |
| 10   | Telephony (VoIP)                  | 12h | -   | Network              |
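
A restoration sequence can be derived mechanically from such a list with a topological sort. The Python sketch below uses Kahn's algorithm and, among the assets whose dependencies are already satisfied, picks the one with the shortest RTO first (alphabetical order breaks ties). The data is taken from the asset list with two simplifying assumptions: external prerequisites (power, cooling, internet) are left out, and the firewall is treated as part of the network.

```python
import heapq

# name -> (RTO in hours, dependencies); shortened asset names are illustrative.
ASSETS = {
    "Network": (1, []),
    "Active Directory": (2, ["Network"]),
    "VMware": (2, ["Network", "Storage"]),
    "Storage": (2, ["Network"]),
    "ERP": (4, ["Active Directory", "VMware", "Storage"]),
    "Email": (4, ["Active Directory"]),
    "File Server": (8, ["Active Directory", "VMware", "Storage"]),
    "MES": (8, ["ERP", "Network"]),
    "VPN": (12, ["Network", "Active Directory"]),
    "Telephony": (12, ["Network"]),
}


def restore_order(assets):
    """Kahn's algorithm; among ready assets, the shortest RTO is restored first."""
    open_deps = {name: set(deps) for name, (_, deps) in assets.items()}
    ready = [(assets[name][0], name) for name, deps in open_deps.items() if not deps]
    heapq.heapify(ready)
    order = []
    while ready:
        _, current = heapq.heappop(ready)
        order.append(current)
        for name, deps in open_deps.items():
            if current in deps:
                deps.remove(current)
                if not deps:
                    heapq.heappush(ready, (assets[name][0], name))
    if len(order) != len(assets):
        raise ValueError("circular or unknown dependency in asset list")
    return order


for position, asset in enumerate(restore_order(ASSETS), start=1):
    print(position, asset)
```

On this data the sort places storage ahead of the hypervisor, because vSphere lists storage as a dependency. Telephony lands ahead of VPN only because of the alphabetical tie-break between two assets with the same RTO.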

Recovery steps for the ERP system (Priority 5)

Asset:              SAP Business One on Microsoft SQL Server 2022
Criticality:        High
RTO:                4 hours | RPO: 1 hour
Backup:             Hourly transaction log backup, daily full backup (Veeam), weekly offsite tape
Responsible:        Application administrator (Mr. Mueller) | Deputy: external SAP partner

Prerequisites (must be completed beforehand):

  • Active Directory is available
  • VMware host is operational
  • Storage is available and datastore mounted
  • Network connectivity between servers is established

Step-by-step recovery:

  1. Restore VM from Veeam backup (Instant VM Recovery for fastest start)
  2. Start VM and check network configuration (IP address, DNS, gateway)
  3. Check SQL Server service - is the service running? Are databases online?
  4. Apply transaction log backups to reach the most current state
  5. Check database integrity: Run DBCC CHECKDB
  6. Start SAP Business One services (SBO-Common, License Server, DI Server)
  7. Open SAP client and test login
  8. Spot-check by business unit: Recent orders, open items, inventory levels
  9. Check interfaces (web shop, production control, shipping)

Validation:

  • At least 3 users can log in and work
  • The last 10 orders before the outage are present
  • Inventory levels match the last known state
  • Interface data is correctly transferred

Estimated duration: 2.5 to 3.5 hours (within RTO of 4 hours)

Recovery steps for Active Directory (Priority 2)

Asset:              Active Directory Domain Services (2x Domain Controller, Windows Server 2022)
Criticality:        Critical (dependency for almost all systems)
RTO:                2 hours | RPO: 1 hour
Backup:             System State Backup daily, VM backup via Veeam
Responsible:        IT Administrator (Mr. Schmidt) | Deputy: IT systems provider

Step-by-step recovery:

  1. Restore primary domain controller from Veeam backup as VM
  2. Start VM; boot into Directory Services Restore Mode (DSRM) if needed
  3. Check DNS service - forward and reverse lookup zones correct?
  4. Test AD replication (if second DC is still available)
  5. If only one DC can be restored: Check FSMO roles and seize if necessary
  6. Perform test login with a domain user on a workstation
  7. Check group policies: Run gpresult on a test workstation

Validation:

  • Domain login works
  • DNS resolution of internal names works
  • FSMO roles are correctly assigned

Estimated duration: 1 to 1.5 hours

Recovery plan template

The following template can serve as a starting point for your own plan. It covers the most important areas and can be adapted to your specific environment.

Section 1: General information

Document title:      Recovery Plan [Area]
Version:             [x.x]
Scope:               [Which systems/processes]
Plan owner:          [Name, function]
Created on:          [Date]
Last review:         [Date]
Next review:         [Date]
Distribution:        [Who has access]

Section 2: Trigger conditions

The recovery plan is activated when:
[] Total failure of a high-criticality system > [X] hours
[] Ransomware attack with encryption of production systems
[] Data center failure (physical or logical)
[] Decision by crisis team
[] [Additional company-specific triggers]

Section 3: Recovery team contact list

| Role               | Name           | Phone (Mobile)   | Email                | Deputy         |
|--------------------|----------------|------------------|----------------------|----------------|
| Recovery Lead      | [Name]         | [Number]         | [Mail]               | [Name]         |
| Infrastructure     | [Name]         | [Number]         | [Mail]               | [Name]         |
| Applications       | [Name]         | [Number]         | [Mail]               | [Name]         |
| Ext. Provider      | [Company/Name] | [Number/Hotline] | [Mail]               | [Alternative]  |

Section 4: Asset recovery profile (per asset)

Asset:              [Name and description]
Criticality:        [Critical / High / Medium / Low]
RTO:                [Hours]
RPO:                [Hours]
Dependencies:       [Which systems must be running first]
Backup type:        [Full/Incremental/Differential, frequency]
Backup location:    [NAS/Tape/Cloud/Offsite]
Responsible:        [Name] | Deputy: [Name]

Recovery steps:
1. [Step with concrete details]
2. [Step with concrete details]
3. ...

Validation:
[] [Checkpoint 1]
[] [Checkpoint 2]
[] [Checkpoint 3]

Estimated duration: [X] hours

Section 5: Communication plan

| Timing              | Recipient            | Channel       | Content                              | Responsible    |
|---------------------|----------------------|---------------|--------------------------------------|----------------|
| Immediately         | Recovery team        | Phone/SMS     | Activate recovery plan               | Recovery Lead  |
| Within 1h           | Executive mgmt       | Phone         | Situation report, estimated duration | Recovery Lead  |
| Within 2h           | All employees        | SMS/Intranet  | Status and workaround tips           | Communications |
| As needed           | Customers/Partners   | Email/Phone   | Info about limitations               | CEO/Sales      |
| After recovery      | All stakeholders     | Email         | Systems available again              | Communications |
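
The time-bound rows of this communication plan can be turned into concrete deadlines once the plan is activated. A minimal Python sketch, with the function name `notification_deadlines` and the activation time chosen purely for illustration:

```python
from datetime import datetime, timedelta

# (deadline after activation, recipient, channel) for the time-bound table rows.
NOTIFICATIONS = [
    (timedelta(0), "Recovery team", "Phone/SMS"),
    (timedelta(hours=1), "Executive mgmt", "Phone"),
    (timedelta(hours=2), "All employees", "SMS/Intranet"),
]


def notification_deadlines(activated_at: datetime):
    """Return (due time, recipient, channel) for each time-bound notification."""
    return [
        (activated_at + offset, recipient, channel)
        for offset, recipient, channel in NOTIFICATIONS
    ]


# Hypothetical activation on a Friday afternoon at 16:00.
for due, recipient, channel in notification_deadlines(datetime(2025, 1, 10, 16, 0)):
    print(due.strftime("%H:%M"), recipient, channel)
```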

Common mistakes in recovery plans

From consulting practice and real incidents, certain patterns emerge that repeatedly lead to problems. These mistakes cost hours or days in an emergency - time you do not have.

Mistake 1: The plan exists but nobody knows it

A 40-page PDF sits somewhere on the file server that is currently unreachable. The IT department created the plan two years ago but neither presented it to the recovery team nor tested it in an exercise. In an emergency, everyone improvises.

Solution: Maintain the distribution list, print the plan physically and store it at a defined location (e.g., in the safe or with the managing director). Conduct a tabletop exercise at least once annually where the team walks through the plan.

Mistake 2: RTO and RPO are not aligned with the business

IT defines an RTO of 24 hours for the ERP system because that seems realistic. Executive management assumes everything will be running again after 4 hours. This discrepancy only surfaces in an emergency - and then it is too late for discussions.

Solution: RTO and RPO must be determined together with business units and executive management. The BIA is the right framework for this. IT provides the technical assessment of what is feasible, and management decides what risk level is acceptable.

Mistake 3: Dependencies are ignored

The plan calls for restoring the ERP system first because it has the highest business priority. But the ERP needs Active Directory for authentication, a running SQL Server, and network connectivity. Without these prerequisites, the ERP will not start, no matter how quickly you run the restore.

Solution: Document dependencies for every asset. The restoration sequence must follow these dependencies: first infrastructure, then platforms, then applications.

Mistake 4: The plan is too vague

"Restore server from backup" is not a recovery instruction. Which server? From which backup? Where is the backup? What software is needed for the restore? What credentials are required?

Solution: Document recovery steps in enough detail that a competent IT employee who does not administer the system daily can perform the restore. Concrete paths, commands, parameters, and credentials (reference to password manager) belong in the plan. Clean IT documentation is a prerequisite for this.

Mistake 5: No testing, no updates

The plan was created three years ago. Since then, the IT infrastructure has fundamentally changed: new hypervisor, different backup tool, different network structure. The plan describes an environment that no longer exists.

Solution: Review the recovery plan at least annually and update it with every significant change to IT infrastructure. Conduct a test run at least annually - whether as a tabletop exercise or as a technical restore test.
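
The review rule can be checked automatically, for example in a monthly reminder job. The following Python sketch assumes a review interval of 365 days and a manually maintained flag for infrastructure changes; the function name `review_overdue` is illustrative.

```python
from datetime import date, timedelta

REVIEW_INTERVAL = timedelta(days=365)  # "at least annually"


def review_overdue(last_review: date, today: date, infrastructure_changed: bool = False) -> bool:
    """A review is due after one year or after any significant infrastructure change."""
    return infrastructure_changed or (today - last_review) > REVIEW_INTERVAL


# A plan last reviewed three years ago is overdue; one reviewed six months ago is not,
# unless the infrastructure has changed significantly in the meantime.
print(review_overdue(date(2022, 3, 1), date(2025, 3, 1)))
print(review_overdue(date(2024, 9, 1), date(2025, 3, 1)))
print(review_overdue(date(2024, 9, 1), date(2025, 3, 1), infrastructure_changed=True))
```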

Creating your recovery plan in 7 steps

If you do not yet have a recovery plan, here is the pragmatic path for mid-market companies:

Step 1: Conduct BIA (or use results)

Without a BIA, there is no meaningful recovery plan. If you do not have a BIA yet, start there. If you already have one, use the results as input.

Step 2: Identify and prioritize critical assets

List all IT systems needed for the critical business processes identified in the BIA. Assign them the RTO/RPO values from the BIA.

Step 3: Map dependencies

Draw the dependencies between systems. What must run first for other systems to function? This yields the restoration sequence.

Step 4: Document recovery steps per asset

For each critical asset: How is it restored? What backups exist? How long does the restore take? Who performs it? Write the steps so a competent deputy can execute them.

Step 5: Define roles and responsibilities

Who is the recovery lead? Who handles infrastructure, who handles applications? Who communicates externally? For each role, you need a deputy.

Step 6: Create the communication plan

Who is informed when about what? Employees, executive management, customers, partners, possibly authorities (NIS2 reporting obligation).

Step 7: Test and iterate

Conduct an initial test run. This does not have to be a full technical test - a tabletop exercise is sufficient to start. Document the findings and improve the plan.

Keeping the plan accessible

A recovery plan that exists only digitally on the file server may not be accessible in an emergency - precisely because the file server may be part of the outage. Therefore the rule: The plan must be available through at least two independent channels.

Recommended availability:

  • Printed: A current version in the safe or at the IT manager's workstation. Sounds old-fashioned but works when everything else fails.
  • Cloud storage: A copy in a cloud service independent of your own infrastructure (e.g., SharePoint Online, Google Drive, or encrypted cloud storage).
  • Mobile: PDF on the company phone of the recovery lead and their deputy.
  • At the service provider: The external IT service provider should also have a current version.

This also includes ensuring that the contact details in the plan are current. If Mr. Mueller has not been with the company for three months and his name still appears as the person responsible for the ERP restore, even the best plan will not help.

PDF export and versioning

A recovery plan lives and changes. Every change to the IT infrastructure can affect the plan. Therefore, clean versioning is essential.

Best practices for versioning:

  • Version number in the document (e.g., v1.0, v1.1, v2.0)
  • Change history with date, author, and type of change
  • Major versions for fundamental changes (new system, new infrastructure)
  • Minor versions for updates (new phone numbers, adjusted recovery times)
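
The major/minor convention is easy to apply consistently if the next version number is derived rather than typed by hand. A minimal Python sketch, assuming version strings of the form "v1.2"; the function name `bump_version` is illustrative.

```python
def bump_version(version: str, fundamental_change: bool) -> str:
    """Return the next version: major bump for fundamental changes, minor otherwise."""
    major, minor = (int(part) for part in version.lstrip("v").split("."))
    if fundamental_change:
        return f"v{major + 1}.0"
    return f"v{major}.{minor + 1}"


print(bump_version("v1.0", fundamental_change=False))  # v1.1, e.g. new phone numbers
print(bump_version("v1.1", fundamental_change=True))   # v2.0, e.g. new infrastructure
```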

PDF export: Export the plan regularly as PDF for offline availability. The PDF should contain everything needed for recovery - no references to other documents that may not be accessible in an emergency. Appendices such as network diagrams, IP address lists, and credentials (encrypted) can be attached as annexes.

In ISMS Lite, you can generate recovery plans directly from BIA results. RTO/RPO values, dependencies, and responsible persons are automatically adopted, and the PDF export produces a print-ready document that you can distribute to relevant parties.

Connection with NIS2 and ISO 27001

If your company falls under NIS2 or is pursuing ISO 27001 certification, the recovery plan is not an optional document. Both frameworks explicitly require it.

NIS2 (Article 21, Paragraph 2c): Requires "business continuity, such as backup management and disaster recovery, and crisis management." A documented recovery plan is the evidence that you are meeting this requirement.

ISO 27001 (Annex A, Controls A.5.29 and A.5.30): The information security continuity controls require that recovery plans are created, implemented, and tested.

In both cases, it is not sufficient to simply have the plan. You must be able to demonstrate that it is regularly reviewed and tested. Auditors specifically ask for test protocols and the improvements derived from them.

Next steps

You now have a comprehensive overview of the structure, content, and creation of a recovery plan. The key points:

  1. Start with the BIA - without it, the foundation for all prioritization decisions is missing.
  2. Document recovery steps concretely - vague instructions help nobody in an emergency.
  3. Account for dependencies - the restoration sequence is not a wish list.
  4. Test the plan - an untested plan is a dangerous illusion of security.
  5. Keep it current - an outdated plan causes more harm in an emergency than having none at all.

Further reading

If you do not yet have the BIA as a foundation, read the business impact analysis article as the next step. If you are looking for a way to regularly test the plan, the tabletop exercise article will help you further.
