Backup Strategy and Restore Tests: Because Backups Alone Are Not Enough

TL;DR
  • The 3-2-1 rule forms the foundation: 3 copies, on 2 different media types, with 1 at an offsite location.
  • Immutable backups protect against ransomware. Attackers can encrypt production data, but they cannot delete or tamper with immutable backups during the retention period.
  • Restore tests are mandatory, not optional. A backup that has never been tested is an assumption, not protection. Test at least quarterly.
  • Test not only individual files but also complete system restores and bare-metal recovery. A real emergency is rarely a single deleted file.
  • Restore tests validate your RTO and RPO targets from the BIA. If the restore takes 8 hours and the RTO is 4 hours, you have a problem.

The Most Expensive Misconception in IT Security

"We have backups" is one of the most dangerous sentences in IT. Not because it is wrong, but because it conveys a sense of security that may be built on sand.

A backup is only worth as much as the restore it enables. A backup that has not been verified for recoverability in two years is not a safety net. It is a hope. And hope is not a strategy, certainly not when ransomware has encrypted the file server and management is asking how quickly everything will be running again.

The statistics paint a sobering picture. Frequently cited industry surveys put the share of failed restore attempts at roughly 30 to 40 percent, whether due to damaged backup media, inconsistent data, outdated backup software, or simply missing documentation of the restore process. In many cases, the error is only discovered when the backup is urgently needed.

On top of this comes a threat landscape that undermines classic backup concepts. Modern ransomware variants encrypt not only production data but specifically seek out backup repositories to destroy them as well. If your backups reside on network storage that the ransomware can reach, you may have no usable backup at all.

This article covers both sides: the backup strategy (what, how often, where) and the restore tests (does the restore work, and does it meet your requirements).

The 3-2-1 Rule: The Foundation

The 3-2-1 rule has been the gold standard for backup strategies for decades. It is simple, memorable, and covers the most common failure scenarios.

The Three Numbers

3 copies of your data: The original plus two backup copies. If one copy is damaged, you still have a second. The probability that three copies fail simultaneously — with independent storage — is negligibly low.

2 different media types: Backups are stored on at least two different storage media. For example: hard drive and tape, local storage and cloud, NAS and USB hard drive. The reason: if one media type has a systematic defect (e.g., a faulty firmware version across all identical NAS devices), not all copies are affected.

1 copy at an offsite location: At least one backup copy is physically separated from the production system. This protects against local disasters: fire in the server room, burst water pipe, break-in, or ransomware that encrypts all network-accessible storage.

3-2-1-1: The Extended Rule for the Ransomware Era

The classic 3-2-1 rule has one weakness: if all backup copies are online and reachable from the network, ransomware can theoretically reach all three. That is why the rule is often extended today with a fourth digit:

3-2-1-1: The additional 1 stands for an offline or immutable copy. At least one backup copy is either physically offline (e.g., a tape or USB hard drive that is not permanently connected) or stored immutably (immutable backup).
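The four conditions of the extended rule can be checked mechanically against a list of your copies. The following is a minimal sketch, not a finished tool; the `BackupCopy` fields and the medium labels are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackupCopy:
    name: str
    medium: str                 # illustrative labels, e.g. "san", "nas", "cloud", "tape"
    offsite: bool               # stored away from the production site
    offline_or_immutable: bool  # air-gapped or write-protected for a retention period


def check_3211(copies: list[BackupCopy]) -> dict[str, bool]:
    """Evaluate each of the four 3-2-1-1 conditions separately."""
    return {
        "three_copies": len(copies) >= 3,
        "two_media": len({c.medium for c in copies}) >= 2,
        "one_offsite": any(c.offsite for c in copies),
        "one_offline_or_immutable": any(c.offline_or_immutable for c in copies),
    }


copies = [
    BackupCopy("production data", "san", offsite=False, offline_or_immutable=False),
    BackupCopy("primary backup", "nas", offsite=False, offline_or_immutable=False),
    BackupCopy("cloud backup", "cloud", offsite=True, offline_or_immutable=True),
]
result = check_3211(copies)
```

Returning one flag per condition rather than a single yes/no makes it visible which part of the rule a given setup is missing.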

Implementation Example for Mid-Market Companies

| Copy | Medium | Location | Protection |
|---|---|---|---|
| Copy 1 (Production data) | SAN / local storage | Server room | RAID, UPS |
| Copy 2 (Primary backup) | NAS / backup storage | Server room (separate fire compartment) | Daily backup via Veeam |
| Copy 3 (Offsite backup) | Cloud storage (S3/Azure Blob) | Cloud provider data center | Encrypted, immutable |
| Copy 4 (Offline) | USB hard drive / RDX medium | Safe outside the building | Weekly rotation, physically separated |

Understanding Backup Types

Not every backup is the same. The three common backup types differ in speed, storage requirements, and restore complexity.

Full Backup

A full backup saves all data completely. Each backup is self-contained and can be restored independently of other backups.

Advantages:

  • Simplest restore: you only need a single backup set
  • Independent of previous backups
  • Fastest recovery

Disadvantages:

  • Highest storage requirement: each backup contains all data
  • Longest backup duration
  • May not be feasible overnight with large data volumes

Typical use: Weekly (e.g., on weekends when more time is available for the backup)

Incremental Backup

An incremental backup saves only the data that has changed since the last backup (whether full or incremental).

Advantages:

  • Lowest storage requirement per backup
  • Fastest backup duration
  • Ideal for daily or hourly backups

Disadvantages:

  • Restore requires the last full backup plus all subsequent incrementals
  • The more incrementals in the chain, the slower and more error-prone the restore
  • If one incremental in the chain is damaged, all subsequent ones are unusable

Typical use: Daily or more frequently, between full backups

Differential Backup

A differential backup saves all data that has changed since the last full backup. Unlike an incremental, it grows with each day because it always contains the entire difference from the last full backup.

Advantages:

  • Restore requires only the last full backup plus the last differential
  • Independent of the chain of daily backups
  • Good compromise between backup speed and restore simplicity

Disadvantages:

  • Growing storage requirement until the next full backup
  • Longer backup duration than incremental (but shorter than full)

Typical use: Daily, in combination with a weekly full backup

Comparison at a Glance

| Criterion | Full | Incremental | Differential |
|---|---|---|---|
| Backup duration | Long | Short | Medium |
| Storage requirement | High | Low | Medium (growing) |
| Restore duration | Fast | Slow (chain) | Medium |
| Restore complexity | Simple | Complex | Simple |
| Dependency on other backups | None | Chain of all incrementals | Only the last full |
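The difference in restore dependencies can be made concrete with a small function that determines which backup sets are needed to restore the latest state. This is a sketch for illustration; the labels and the `(label, kind)` representation are assumptions, not the data model of any backup product.

```python
def restore_chain(backups: list[tuple[str, str]]) -> list[str]:
    """Return the backup sets needed to restore the most recent state.

    `backups` is a chronological list of (label, kind) pairs, where kind is
    "full", "incremental", or "differential".
    """
    chain: list[str] = []
    for label, kind in backups:
        if kind == "full":
            # A full backup is self-contained and starts a new chain.
            chain = [label]
        elif not chain:
            raise ValueError(f"{label}: no full backup to build on")
        elif kind == "differential":
            # Contains everything since the last full, so earlier
            # incrementals and differentials are no longer needed.
            chain = [chain[0], label]
        elif kind == "incremental":
            # Contains only changes since the previous backup.
            chain.append(label)
        else:
            raise ValueError(f"{label}: unknown backup kind {kind!r}")
    return chain


incremental_week = [("Sun", "full"), ("Mon", "incremental"),
                    ("Tue", "incremental"), ("Wed", "incremental")]
differential_week = [("Sun", "full"), ("Mon", "differential"),
                     ("Tue", "differential"), ("Wed", "differential")]
```

For the incremental week, a restore on Wednesday needs all four sets; for the differential week, it needs only Sunday's full backup and Wednesday's differential.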

Common Combination Strategy

Most mid-market companies use a combination:

  • Weekly: Full backup (e.g., Sunday night)
  • Daily: Incremental or differential backup (e.g., at 2:00 AM)
  • Hourly (for critical systems): Transaction log backup or incremental snapshot

Which combination suits you depends on your RPO. You define the appropriate RPO values in the business impact analysis. If the RPO for accounting is 1 hour, you need at least hourly backups. If the RPO for the internal wiki is 24 hours, a daily backup suffices.

Retention Periods

How long you retain backups is not a purely technical question. Legal requirements, regulatory mandates, and your own protection needs determine the retention policy.

Legal Requirements (Germany)

| Regulation | Retention Period | Affected Data |
|---|---|---|
| HGB (§ 257) | 10 years | Commercial books, inventories, annual financial statements, accounting records |
| AO (§ 147) | 10 years | Books, records, accounting vouchers |
| AO (§ 147) | 6 years | Received and sent business correspondence |
| GDPR (DSGVO) | Only as long as necessary | Personal data (observe deletion obligations!) |
| GoBD | 10 years | Tax-relevant electronic documents |

Recommended Backup Retention

| Backup Type | Retention | Rationale |
|---|---|---|
| Daily backup | 30 days | Protection against accidental deletion, timely recovery |
| Weekly full backup | 3 months | Medium-term recovery, detecting gradual problems |
| Monthly full backup | 1 year | Long-term recovery, compliance |
| Annual full backup | 10 years | Legal retention requirements (HGB, AO) |
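A retention policy like this one is easy to express as a lookup plus an age check. The sketch below is an assumption-laden illustration: the tier names are made up, and months and years are approximated in days (3 months as 90 days, 10 years as 3650 days), which a real policy would define more precisely.

```python
from datetime import date

# Retention per backup tier in days; the day counts are approximations.
RETENTION_DAYS = {
    "daily": 30,
    "weekly": 90,     # about 3 months
    "monthly": 365,   # 1 year
    "annual": 3650,   # about 10 years
}


def is_expired(backup_date: date, tier: str, today: date) -> bool:
    """Return True if a backup of the given tier is past its retention period."""
    age_in_days = (today - backup_date).days
    return age_in_days > RETENTION_DAYS[tier]


today = date(2025, 6, 30)
old_daily_expired = is_expired(date(2025, 5, 1), "daily", today)     # 60 days old
old_weekly_expired = is_expired(date(2025, 5, 1), "weekly", today)   # 60 days old
```

The same backup date gives different answers per tier: a 60-day-old daily backup is due for deletion, a 60-day-old weekly full backup is not.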

GDPR Tension

The GDPR requires personal data to be deleted when the processing purpose ceases. This can conflict with long backup retention periods. If a customer requests deletion of their data and you have a 10-year-old annual backup containing their data, you must document this and re-execute the deletion in case of a restore. Complete deletion from backups is technically not feasible in most cases.

The pragmatic solution: document in your deletion concept that backups are exempt from immediate deletion, and ensure that deletion requirements are re-applied in case of a restore.

Immutable Backups: Protection Against Ransomware

Immutable backups are the most important advancement in backup strategy for the ransomware era. The principle: once written, backup data cannot be modified or deleted for a defined period. Not even by an administrator, and not even by ransomware that has gained admin rights.

Why Classic Backups Can Fail Against Ransomware

Modern ransomware groups know that companies can restore from backups. That is why the backup itself has become an attack target. Typical attack patterns:

  1. Deleting backup repositories: The attackers gain access to the backup server and delete all backups before starting encryption of the production systems.
  2. Uninstalling backup agents: The backup software is disabled or uninstalled so that no new backups are created.
  3. Encrypting backups: The backup files themselves are encrypted, just like production data.
  4. Delayed attack: The attackers remain in the network for weeks or months and wait until they have compromised the older backup generations as well.

How Immutable Backups Work

Immutable backups use mechanisms that protect the backup against any modification for a defined period after writing:

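Object Lock (S3-compatible cloud storage): Amazon S3 and S3-compatible on-premises solutions offer Object Lock, a WORM (Write Once, Read Many) mechanism. Azure Blob Storage provides the equivalent as immutable storage with time-based retention policies. In S3 Compliance mode (or with a locked Azure policy), once-written objects cannot be deleted or overwritten until the retention period expires, not even by the account owner. S3 Governance mode is weaker: users with a special permission can lift the lock. A detailed setup guide can be found in the article on immutable backups with S3 Object Lock.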
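To illustrate what a locked upload looks like on S3, the sketch below builds the arguments for an upload with a retention date. `ObjectLockMode` and `ObjectLockRetainUntilDate` are parameters of the S3 PutObject operation; the helper function, bucket name, key, and retention length are hypothetical. The example assumes a bucket that was created with Object Lock enabled and uses Compliance mode, the stricter of the two S3 modes.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def object_lock_put_args(bucket: str, key: str, body: bytes,
                         retention_days: int,
                         now: Optional[datetime] = None) -> dict:
    """Build keyword arguments for an S3 PutObject call with a WORM lock."""
    now = now or datetime.now(timezone.utc)
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        # COMPLIANCE: nobody can shorten or remove the lock before it expires.
        "ObjectLockMode": "COMPLIANCE",
        "ObjectLockRetainUntilDate": now + timedelta(days=retention_days),
    }


# Hypothetical names, for illustration only.
args = object_lock_put_args("example-backup-bucket", "daily/fileserver.bak",
                            b"backup data", retention_days=30)

# With boto3 installed and credentials configured, the upload would be:
#   import boto3
#   boto3.client("s3").put_object(**args)
```

In practice, backup software that supports Object Lock sets these values itself; the sketch only shows what happens underneath.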

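Hardened Linux Repository (Veeam): Veeam offers a hardened repository on a Linux server where backup files are flagged as immutable at the file system level for the retention period. The backup server connects with single-use, non-root credentials, so an attacker who compromises the backup server cannot delete the files. Because root on the repository host itself could clear the flag, that host must be locked down accordingly (for example, no SSH access in normal operation).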

Tape / Air Gap: The classic variant: a tape that is removed from the drive after writing and locked in a safe is immutable and air-gapped by definition. No network attack in the world can encrypt a tape in a safe.

Implementation Recommendation

| Approach | Complexity | Cost | Protection |
|---|---|---|---|
| Offline medium (tape, USB) | Low | Low | High (manual effort) |
| Cloud Object Lock (S3/Azure) | Medium | Medium | Very high |
| Hardened Linux Repository | Medium | Low-Medium | Very high |
| Combination of cloud + offline | Medium | Medium | Very high |

For mid-market companies, a combination is recommended: primary backup on local storage for fast restores, a replicated backup to the cloud with Object Lock for ransomware protection, and a weekly offline medium as the last line of defense.

Restore Tests: Why and How Often

Here we come to the core of this article. Creating backups is one half. Ensuring they work in an emergency is the other — and the one that gets neglected far more often.

Why Restore Tests Are Indispensable

Technical reasons:

  • Backup media can be physically damaged (bit rot, defective sectors)
  • Backup software can create faulty backups without reporting an error
  • Backup consistency (especially for databases) is not a given
  • After an update to the backup software or operating system, the restore may function differently

Organizational reasons:

  • Does the team know how to perform a restore? Is documentation available?
  • How long does a restore actually take? Does it match the RTO from the BIA?
  • Are the necessary credentials (encryption keys, passwords) available?
  • Does the restore work even when the primary responsible person is unavailable?

Regulatory reasons:

  • NIS2 requires that BCM measures (including backup) are regularly tested
  • ISO 27001 (A.8.13) requires that backup copies are regularly tested
  • Auditors specifically ask for test protocols for restore tests

Recommended Test Frequency

| Test Type | Frequency | Effort |
|---|---|---|
| Restore individual files/folders | Monthly | 30 minutes |
| Restore a complete VM or server | Quarterly | 2-4 hours |
| Database recovery with consistency check | Quarterly | 2-4 hours |
| Full disaster recovery (all critical systems) | Annually | 1-2 days |
| Bare-metal recovery on different hardware | Annually | 4-8 hours |

What and How to Test

Restore tests have different levels. Start with the simple tests and increase the complexity.

Level 1: Individual Files and Folders

What is tested: Can individual files be restored from the backup?
How: Randomly select 5-10 files of various ages from the backup. Restore them to an alternative storage location. Check whether the files are complete and readable.
Duration: 15-30 minutes
Success criterion: All selected files are fully recoverable and content is correct.

This is the simplest test, and it should run monthly. It reveals problems with the backup media and ensures that basic backup functionality is working. What it does not verify: whether a complete system can be restored and how long that takes.
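The "complete and readable" check can be automated by comparing checksums. The sketch below assumes that SHA-256 checksums were recorded at backup time (comparing against the live file would produce false alarms for files that have legitimately changed since). The function names and the mapping format are assumptions for illustration.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Compute the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(expected: dict[str, str], restore_root: Path) -> dict[str, str]:
    """Compare restored files against checksums recorded at backup time.

    `expected` maps a relative path to its recorded SHA-256 checksum.
    Returns a status per file: "ok", "missing", or "mismatch".
    """
    results = {}
    for relative_path, checksum in expected.items():
        restored = restore_root / relative_path
        if not restored.is_file():
            results[relative_path] = "missing"
        elif sha256_of(restored) != checksum:
            results[relative_path] = "mismatch"
        else:
            results[relative_path] = "ok"
    return results
```

A matching checksum shows the file was restored bit for bit. It does not replace opening a few files by hand, since a file that was already corrupt when it was backed up will still match its recorded checksum.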

Level 2: Complete VM or Server Restore

What is tested: Can a complete server or VM be restored from backup and started?
How: Select a non-production server or test VM. Restore it from the most recent backup in an isolated environment. Start the VM and verify that the operating system boots and applications start.
Duration: 2-4 hours (depending on data volume and infrastructure)
Success criterion: The VM starts, the operating system is functional, the application is reachable.

Important: Restore the VM in an isolated network to avoid conflicts with the production system (duplicate IP addresses, AD conflicts, etc.).

Level 3: Database Recovery with Consistency Check

What is tested: Can a database be consistently restored, including transaction log replay?
How: Restore the database (e.g., SQL Server, PostgreSQL) from backup in a test environment. Replay the transaction logs. Run a consistency check (e.g., DBCC CHECKDB for SQL Server). Spot-check data records.
Duration: 2-4 hours
Success criterion: Database is consistent, transaction logs were successfully replayed, spot checks show correct data.

This test is especially important for ERP systems and other business-critical applications whose databases are brought up to date through transaction log backups.

Level 4: Full Disaster Recovery

What is tested: Can all critical systems be restored in the correct order, and how long does the entire recovery take?
How: Simulate a total failure. Restore all critical systems according to the recovery plan: first infrastructure (AD, DNS), then platforms (hypervisor, storage), then applications (ERP, email). Measure the time for each step.
Duration: 1-2 days (can also be scheduled over a weekend)
Success criterion: All critical systems are restored within the defined RTO and are functional.

The full disaster recovery test is the most informative but also the most resource-intensive. It validates not only backup integrity but also the recovery plan, dependencies between systems, and actual recovery times. You should run this test at least once a year.
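The restore order follows from the dependencies between systems, and it can be derived rather than maintained by hand. The sketch below uses a topological sort from the Python standard library; the dependency map is a hypothetical example modeled on the layers named above (infrastructure, platforms, applications), not a statement about your environment.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each system lists what must be running first.
DEPENDS_ON = {
    "AD": set(),
    "DNS": set(),
    "storage": {"AD", "DNS"},
    "hypervisor": {"AD", "DNS"},
    "ERP": {"hypervisor", "storage"},
    "email": {"hypervisor", "storage"},
}


def restore_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Return a restore sequence in which every system follows its dependencies."""
    # TopologicalSorter raises graphlib.CycleError if the map contains a cycle.
    return list(TopologicalSorter(depends_on).static_order())


order = restore_order(DEPENDS_ON)
```

A circular dependency (for example, a domain controller that runs on a hypervisor which itself needs the domain to start) surfaces here as an error, which is exactly the kind of finding a DR test is meant to produce before the emergency.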

Level 5: Bare-Metal Recovery on Different Hardware

What is tested: Can a system be restored on hardware that differs from the original?
How: Restore a server from backup on a different physical server or in a cloud environment. Check whether drivers, network configuration, and applications work.
Duration: 4-8 hours
Success criterion: The system boots on the alternative hardware and is functional.

This test is relevant because in a real emergency, identical hardware may not be available. If the server is physically destroyed (fire, water), you need to be able to restore onto whatever is available. A detailed guide can be found in the article on bare-metal recovery.

RTO and RPO Validation Through Restore Tests

An often-overlooked aspect: restore tests validate not only whether the backup works but also whether recovery times and data loss match the requirements from the BIA.

RTO Validation

Measure the actual recovery time with every restore test and compare it to the RTO from the BIA.

Example:

| Asset | RTO (BIA) | Actual Restore Time | Status |
|---|---|---|---|
| Active Directory | 2h | 1.5h | OK |
| ERP (SAP B1) | 4h | 3h | OK |
| File server (2 TB) | 8h | 12h | Exceeded |
| MES system | 4h | 6h | Exceeded |

If the actual restore time exceeds the RTO, you have three options:

  1. Adjust the RTO (with management approval and assessment of the consequences)
  2. Improve the recovery infrastructure (faster storage, instant VM recovery, prepared standby systems)
  3. Reduce data volume (archiving, offloading old data)
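The comparison from the example table is simple enough to automate as part of the test evaluation. A minimal sketch, using the values from the example; the function name and data layout are assumptions:

```python
def rto_status(rto_hours: float, actual_hours: float) -> str:
    """Compare a measured restore time against the RTO from the BIA."""
    return "OK" if actual_hours <= rto_hours else "Exceeded"


# (RTO in hours, measured restore time in hours), taken from the example above.
measurements = {
    "Active Directory": (2, 1.5),
    "ERP (SAP B1)": (4, 3),
    "File server (2 TB)": (8, 12),
    "MES system": (4, 6),
}

report = {asset: rto_status(rto, actual)
          for asset, (rto, actual) in measurements.items()}
```

Keeping the measured values per test run also shows trends: a restore time that creeps toward the RTO as data grows is a warning sign even while the status still reads OK.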

RPO Validation

With every restore test, check what point-in-time you can actually recover to.

Questions for RPO validation:

  • When was the last successful backup? Does it match the planned backup frequency?
  • If transaction log backups are used: can they be successfully replayed?
  • How much data is lost between the last backup and the incident?
  • Does the actual data loss match the RPO from the BIA?

Example: Your ERP has an RPO of 1 hour. You make hourly transaction log backups. During the restore test, you find that the last transaction log backup was successful and you can restore to the state from 45 minutes ago. The RPO is met.

However, if you discover that the transaction log backups for the past three days were faulty (because the disk was full and no one noticed the warning), you have a serious problem: your actual RPO is 72 hours instead of 1 hour.
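Both scenarios can be reproduced with a small calculation over the backup log: the actual data loss is the time between the last successful backup and the incident. The sketch below assumes the log is available as a list of timestamps with a success flag; the timestamps are made up for illustration.

```python
from datetime import datetime, timedelta


def actual_data_loss(backup_log: list[tuple[datetime, bool]],
                     incident: datetime) -> timedelta:
    """Return the time span between the last successful backup and the incident."""
    successful = [ts for ts, ok in backup_log if ok and ts <= incident]
    if not successful:
        raise ValueError("no successful backup before the incident")
    return incident - max(successful)


def rpo_met(backup_log: list[tuple[datetime, bool]],
            incident: datetime, rpo: timedelta) -> bool:
    """Check whether the actual data loss stays within the RPO."""
    return actual_data_loss(backup_log, incident) <= rpo


incident = datetime(2025, 3, 10, 12, 0)

# Scenario 1: last transaction log backup succeeded 45 minutes before the incident.
healthy_log = [(datetime(2025, 3, 10, 10, 15), True),
               (datetime(2025, 3, 10, 11, 15), True)]

# Scenario 2: backups have been failing for three days.
faulty_log = [(datetime(2025, 3, 7, 12, 0), True),
              (datetime(2025, 3, 8, 12, 0), False),
              (datetime(2025, 3, 9, 12, 0), False),
              (datetime(2025, 3, 10, 11, 0), False)]
```

With an RPO of one hour, the first log passes and the second fails, because failed backup runs do not count toward the recoverable point in time.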

Documenting Restore Tests

Documentation serves two purposes: it is the evidence for auditors (NIS2, ISO 27001), and it is the basis for continuously improving your backup strategy. In ISMS Lite, you can document restore tests directly, compare them with the RPO values from the BIA, and set automatic reminders for upcoming tests.

Restore Test Protocol

For each test, you document:

RESTORE TEST PROTOCOL

Date:               [DD.MM.YYYY]
Conducted by:       [Name]
Reviewed by:        [Name]

Tested asset:       [Name and description]
Backup source:      [Which backup was used? Date/time of the backup]
Backup type:        [Full/Incremental/Differential/Transaction Log]
Restore target:     [Where was the restore performed? Production/test/isolated?]

TEST RESULT

Restore successful:          [ ] Yes    [ ] No    [ ] Partially
Restore duration:            [HH:MM]
Recovered data point:        [Date/time]

Consistency check:           [ ] Passed    [ ] Failed
Functional test:             [ ] Passed    [ ] Failed
Spot checks:                 [ ] Correct   [ ] Deviations found

RTO met:                     [ ] Yes (RTO: [X]h, Actual: [Y]h)    [ ] No
RPO met:                     [ ] Yes (RPO: [X]h, Data loss: [Y]h)  [ ] No

REMARKS / DEVIATIONS
[Free text: What was noticed? Problems? Improvement suggestions?]

MEASURES
[If problems were found: What measures are being initiated?]

Signature:          [Person who conducted the test]
Date:               [DD.MM.YYYY]

Test Matrix: Annual Plan

Create an annual plan that covers all critical assets and planned tests:

| Asset | Planned tests in calendar order (Jan to Dec) |
|---|---|
| ERP (Database) | VM, DB, VM, DR |
| Active Directory | VM, VM, DR |
| File server | Fil, Fil, Fil, Fil, DR |
| MES system | VM, VM, DR |
| Mail server | Fil, Fil, Fil, DR |

Legend: Fil = File restore, VM = VM restore, DB = Database restore, DR = Disaster recovery test

Common Problems with Restore Tests

Problem 1: No Isolated Test Environment

If you restore a server from backup into the production network, you risk IP conflicts, AD replication issues, and in the worst case, impairment of ongoing operations. Restore tests belong in an isolated network (separate VLAN, no connection to the production network).

Problem 2: Encryption Keys Not Available

Encrypted backups are useless if the encryption key has been lost. Store the keys at a separate location — ideally in a password manager and additionally in a sealed envelope in a safe. With every restore test, verify that the keys are available and functional.

Problem 3: Backup Software Version

The backup software was updated, but the restore catalog is incompatible. Or the backup was created with an older version that is no longer installed on the current backup server. With every backup software update, verify that older backups can still be restored.

Problem 4: Incomplete Backup

The backup reports "completed successfully," but individual files were skipped (open files, permission issues, paths too long). These warnings get buried in daily operations. During the restore test, it becomes apparent that certain data is missing.

Solution: Check daily backup reports not only for "success/failure" but also for warnings and skipped files. Automate this check if possible.
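How such an automated check looks depends entirely on the report format of your backup software. The sketch below assumes a hypothetical plain-text report in which skipped files appear as lines of the form `WARNING skipped: <path> (<reason>)`; the pattern needs to be adapted to the real format.

```python
import re

# Hypothetical report line format: "WARNING skipped: <path> (<reason>)"
SKIPPED_LINE = re.compile(r"^WARNING skipped: (?P<path>.+) \((?P<reason>[^()]*)\)$")


def skipped_files(report_text: str) -> list[tuple[str, str]]:
    """Extract (path, reason) pairs for every skipped file in a backup report."""
    found = []
    for line in report_text.splitlines():
        match = SKIPPED_LINE.match(line.strip())
        if match:
            found.append((match["path"], match["reason"]))
    return found


report = """\
INFO job nightly-fileserver started
WARNING skipped: D:/data/open.xlsx (file in use)
WARNING skipped: D:/hr/private.docx (access denied)
INFO job nightly-fileserver completed successfully
"""

skipped = skipped_files(report)
needs_attention = len(skipped) > 0
```

The example report ends with "completed successfully" and still yields two skipped files, which is the situation described above.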

Problem 5: Backup Duration Doesn't Fit the Backup Window

The daily full backup takes 10 hours, but the backup window (overnight, when systems have low load) is only 6 hours. The backup is aborted every night and is incomplete. In the restore test, it becomes apparent: the last complete backup is three weeks old.

Solution: Adapt the backup strategy to the available window. Incremental backups instead of full backups during the week, full backup only on weekends. Or speed up the backup infrastructure (faster network, faster storage, deduplication).
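Whether a backup fits its window can be estimated up front from data volume and sustained throughput. The figures below are assumptions chosen to land near the 10-hour example (3,600 GB at 100 MB/s, with about 5 percent daily change for the incremental); measure your own throughput rather than relying on nominal values.

```python
def backup_duration_hours(data_gb: float, throughput_mb_s: float) -> float:
    """Estimate backup duration from data volume and sustained throughput."""
    seconds = data_gb * 1024 / throughput_mb_s
    return seconds / 3600


def fits_window(data_gb: float, throughput_mb_s: float, window_hours: float) -> bool:
    """Check whether the estimated duration fits into the backup window."""
    return backup_duration_hours(data_gb, throughput_mb_s) <= window_hours


WINDOW_HOURS = 6

full_hours = backup_duration_hours(3600, 100)          # full backup, about 10 hours
incremental_hours = backup_duration_hours(180, 100)    # 5 % daily change, about half an hour

full_fits = fits_window(3600, 100, WINDOW_HOURS)
incremental_fits = fits_window(180, 100, WINDOW_HOURS)
```

The estimate ignores compression, deduplication, and job overhead, so treat it as a first check and confirm it against the durations your backup jobs actually report.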

Backup Monitoring and Alerting

Tests are one part. The other is daily monitoring to ensure that backups are actually running.

What You Should Monitor

| Metric | Threshold | Action on Breach |
|---|---|---|
| Backup job: success/failure | Any failure | Immediate notification, fix the error |
| Backup duration | > 150% of normal duration | Investigate cause (more data? Performance issue?) |
| Storage consumption (backup storage) | > 80% capacity | Expand capacity or clean up old backups |
| Last successful backup | > 24 hours ago (for daily backups) | Immediate escalation |
| Skipped files | > 0 | Investigate cause and fix |
| Replication to offsite/cloud | Failed | Immediate notification |
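The thresholds from the table translate directly into a rule check. The sketch below is illustrative only: the `BackupStatus` fields are assumptions about what your monitoring can deliver, and in practice these rules would usually live in the monitoring system rather than in a script.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass
class BackupStatus:
    job_succeeded: bool
    duration: timedelta
    normal_duration: timedelta
    storage_used_fraction: float    # 0.0 to 1.0
    age_of_last_success: timedelta
    skipped_files: int
    offsite_replication_ok: bool


def alerts(status: BackupStatus) -> list[str]:
    """Return one message per breached threshold from the monitoring table."""
    found = []
    if not status.job_succeeded:
        found.append("backup job failed")
    if status.duration > 1.5 * status.normal_duration:
        found.append("backup duration above 150% of normal")
    if status.storage_used_fraction > 0.80:
        found.append("backup storage above 80% capacity")
    if status.age_of_last_success > timedelta(hours=24):
        found.append("last successful backup older than 24 hours")
    if status.skipped_files > 0:
        found.append("files were skipped")
    if not status.offsite_replication_ok:
        found.append("offsite replication failed")
    return found


healthy = BackupStatus(True, timedelta(hours=2), timedelta(hours=2),
                       0.55, timedelta(hours=6), 0, True)
degraded = BackupStatus(True, timedelta(hours=4), timedelta(hours=2),
                        0.85, timedelta(hours=6), 3, True)
```

The degraded example reports a successful job and would pass a pure success/failure check, yet it breaches three thresholds.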

Who Receives the Alerts?

Backup alerts must not get buried in an overflowing inbox. Clearly define who receives the alerts and who is responsible for responding to failures.

A proven approach: automatic alerts via email and SMS to the responsible administrator, daily backup report via email to the IT manager, weekly summary report to IT management with success rate and open issues.

Backup Strategy and Regulation

NIS2

NIS2 explicitly names "backup management and disaster recovery" in Article 21(2)(c) as part of business continuity, which is one of the minimum risk-management measures. This means:

  • Backups must exist and be documented
  • The backup strategy must match the protection needs of the data
  • Restore processes must be tested
  • In an audit, you must provide evidence (backup concept, test protocols)

ISO 27001

ISO 27001, Annex A, Control A.8.13 (Information backup) requires that backup copies of information, software, and systems are maintained and regularly tested in accordance with the agreed topic-specific policy on backup.

The "regularly tested" is the crucial point. The control requires not only backups but evidence that they work. Auditors ask for restore test protocols and want to see that tests are conducted systematically and regularly. Anyone who hosts their own ISMS should note that self-hosted compliance systems place particular requirements on the backup strategy.

Next Steps

A solid backup strategy and regular restore tests are not a luxury but a basic requirement for any business continuity management. The key points summarized:

  1. Implement 3-2-1-1: three copies, two media types, one offsite, one offline or immutable. Check where your current strategy has gaps.
  2. Introduce immutable backups: if you haven't already, this is the single most important measure against ransomware.
  3. Create a restore test calendar: plan tests at various levels (files, VMs, databases, full DR) distributed throughout the year.
  4. Align with the RTO and RPO from the BIA: do restore times, backup frequency, and actual data loss during restore match the targets?
  5. Set up monitoring: daily monitoring of backup jobs, alerting on failures.
  6. Document: backup concept, restore test protocols, and action plan. You need this for internal improvement and for every auditor.

Further Reading

If you have not yet conducted the BIA that provides your RPO values, start with the article on business impact analysis. And if you want to test your entire emergency process, the article on tabletop exercises will help.
