Database Recovery: Point-in-Time vs Full Restore and When to Use Each
You get the call at 2 AM: the production database has a problem. Data has been corrupted, accidentally deleted, or modified by a runaway process. You need to recover. But how?
The answer depends on the nature of the incident, and choosing wrong can cost you hours of additional downtime or permanent data loss.
Full Database Restore
A full restore replaces the entire database with a backup copy. Everything in the database is rolled back to the state captured in the backup.
When to use full restore:
- Complete database corruption or hardware failure
- Ransomware encryption of the database files
- Database server is completely unrecoverable
- You need to migrate to new hardware
Advantages:
- Simple to execute — restore and go
- Well-tested (this is what most teams practice)
- Works regardless of the type of failure
Disadvantages:
- You lose all changes made after the backup was taken
- Recovery time is proportional to database size
- Requires enough storage for the full backup plus the live database during restore
- May require application downtime for the entire duration
Point-in-Time Recovery (PITR)
Point-in-time recovery restores the database to a specific moment by applying transaction logs to a full backup. You restore the full backup, then replay transaction logs up to the exact second you specify.
When to use PITR:
- Accidental data deletion (specific records or tables)
- Application bug that corrupted specific data
- Need to recover to a moment just before an incident
- Runaway process that modified data it shouldn't have
Advantages:
- Minimize data loss by recovering to just before the incident
- Can potentially preserve changes that happened after the incident on unaffected tables
- More precise than full restore
Disadvantages:
- Requires continuous transaction log backups (not just periodic full backups)
- More complex to execute correctly
- Recovery time includes full restore plus log replay
- You need to know the exact time of the incident
The Decision Framework
Ask these questions to determine the right approach:
- Is the entire database affected, or just specific data? Full corruption = full restore. Specific data = consider PITR.
- Do you know when the incident occurred? If yes, PITR to that moment. If no, you may need to restore multiple points and investigate.
- Are transaction logs available and intact? PITR requires unbroken transaction logs from the last full backup to the recovery point. If logs are missing or corrupted, full restore is your only option.
- How much data loss can you tolerate? Full restore loses everything since the last backup. PITR can recover to within seconds of the incident.
- How fast do you need to recover? Full restore of a large database can take hours. PITR adds log replay time on top of that. For very large databases, consider if you can use table-level restore instead.
Best Practices
Enable continuous transaction log shipping for every production database. The cost is minimal compared to the recovery flexibility it provides.
Test both recovery methods quarterly. Full restore tests are common. PITR tests are rare. Test both.
Document the decision criteria so on-call engineers don't have to figure out the right approach at 2 AM under pressure.
Maintain a recovery time estimate for each database at each recovery method. A 500 GB database might take 2 hours for full restore and 3 hours for PITR. Your team needs to know this in advance.
Keep multiple backup generations. If the most recent backup is corrupted, you need the one before it. Maintain at least 7 days of daily backups and 4 weeks of weekly backups.
The right recovery method depends on the situation. Make sure your team knows both and can execute either one under pressure.
Want More Data Protection Insights?
Listen to 300+ episodes of the Data Protection Gumbo podcast
Browse Episodes