Disaster Recovery Plan
Document ID: BCP-DRP-001 | Version: 1.0 | Effective: January 2026
1. Purpose
This plan establishes procedures for recovering Keshless systems and data following a disaster or major service disruption.
Scope: PostgreSQL database, GCP Cloud Storage, API secrets, Cloud Run services
2. Recovery Objectives
Recovery Time Objective (RTO)
| Category | RTO | Systems |
|---|---|---|
| Critical | 2 hours | Transaction processing, authentication |
| High | 4 hours | Dashboard, KYC processing |
| Medium | 8 hours | Reporting, analytics |
| Low | 24 hours | Non-essential features |
Recovery Point Objective (RPO)
| Data Type | RPO | Backup Schedule |
|---|---|---|
| PostgreSQL | 24 hours | Daily at 2:00 AM UTC |
| GCS Documents | 24 hours | Daily at 3:00 AM UTC |
| Secrets | 7 days | Weekly on Sundays at 4:00 AM UTC |
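The RPO table can be turned into a simple freshness check: a backup satisfies its RPO when its age does not exceed the listed window. The following is a minimal sketch, not part of the existing tooling; the function name `meetsRpo` and the example timestamps are illustrative assumptions.

```typescript
// RPO values from the table above, expressed in hours.
type DataType = "PostgreSQL" | "GCS Documents" | "Secrets";

const RPO_HOURS: Record<DataType, number> = {
  "PostgreSQL": 24,
  "GCS Documents": 24,
  "Secrets": 7 * 24,
};

// Returns true when the newest backup is recent enough to satisfy the RPO.
// `lastBackup` and `now` are supplied by the caller (hypothetical inputs).
function meetsRpo(dataType: DataType, lastBackup: Date, now: Date): boolean {
  const ageHours = (now.getTime() - lastBackup.getTime()) / 3600000;
  return ageHours >= 0 && ageHours <= RPO_HOURS[dataType];
}

// Example: a backup taken at 02:00 UTC and checked 30 hours later.
const backupAt = new Date("2026-01-10T02:00:00Z");
const checkedAt = new Date("2026-01-11T08:00:00Z");
const postgresOk = meetsRpo("PostgreSQL", backupAt, checkedAt); // false: 30 h > 24 h
const secretsOk = meetsRpo("Secrets", backupAt, checkedAt);     // true: 30 h <= 168 h
```

A check like this could run alongside the weekly automated backup verification so that a missed job is noticed before the RPO window is exceeded.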
3. Backup Infrastructure
Backup Schedule
| Job | Schedule | Description |
|---|---|---|
| PostgreSQL Daily | Daily 2:00 AM UTC | Full database export |
| PostgreSQL Monthly | 1st of month 2:00 AM UTC | Long-term retention |
| GCS Sync | Daily 3:00 AM UTC | Document storage sync |
| Secrets Backup | Sundays 4:00 AM UTC | Encrypted config backup |
| Cleanup | Mondays 5:00 AM | Remove expired backups |
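For reference, the schedule above could be written as standard five-field cron expressions. This is a sketch under two assumptions: the scheduler accepts standard cron syntax and evaluates it in UTC. The job keys are illustrative names, not identifiers from the actual scheduler configuration.

```typescript
// Five-field cron: minute hour day-of-month month day-of-week.
// Keys are hypothetical job names; times assume the scheduler runs in UTC.
const BACKUP_CRON: Record<string, string> = {
  postgresqlDaily: "0 2 * * *",   // every day at 02:00
  postgresqlMonthly: "0 2 1 * *", // 1st of each month at 02:00
  gcsSync: "0 3 * * *",           // every day at 03:00
  secretsBackup: "0 4 * * 0",     // Sundays at 04:00 (0 = Sunday)
  cleanup: "0 5 * * 1",           // Mondays at 05:00 (1 = Monday)
};
```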
Retention Policy
| Backup Type | Retention |
|---|---|
| PostgreSQL Daily | 2 years |
| PostgreSQL Monthly | 5 years |
| Secrets Weekly | 2 years |
| GCS Sync Logs | 90 days |
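The cleanup job needs to decide which backups have outlived their retention period. A minimal sketch of that decision follows; the type names and functions are assumptions for illustration, and years are applied as calendar years, which the policy does not specify.

```typescript
type BackupType =
  | "postgresql-daily"
  | "postgresql-monthly"
  | "secrets-weekly"
  | "gcs-sync-logs";

// Retention periods from the table above.
const RETENTION: Record<BackupType, { years: number; days: number }> = {
  "postgresql-daily": { years: 2, days: 0 },
  "postgresql-monthly": { years: 5, days: 0 },
  "secrets-weekly": { years: 2, days: 0 },
  "gcs-sync-logs": { years: 0, days: 90 },
};

// Computes the moment a backup becomes eligible for removal (UTC arithmetic).
function expiryDate(type: BackupType, createdAt: Date): Date {
  const retention = RETENTION[type];
  const expires = new Date(createdAt.getTime());
  expires.setUTCFullYear(expires.getUTCFullYear() + retention.years);
  expires.setUTCDate(expires.getUTCDate() + retention.days);
  return expires;
}

// True once `now` has reached or passed the expiry date.
function isExpired(type: BackupType, createdAt: Date, now: Date): boolean {
  return now.getTime() >= expiryDate(type, createdAt).getTime();
}
```

For example, a daily PostgreSQL backup created on 2026-01-15 would become eligible for cleanup on 2028-01-15 under this reading of the policy.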
Storage Location
GCP Cloud Storage: keshless-backups bucket
keshless-backups/
├── postgresql/daily/YYYY-MM-DD/
├── postgresql/monthly/YYYY-MM/
├── documents/
└── secrets/weekly/
4. Disaster Scenarios
| Scenario | Category | Recovery Approach |
|---|---|---|
| Single collection corruption | Minor | Restore specific collection |
| Database corruption | Major | Full PostgreSQL restore |
| KYC documents deleted | Major | Cloud Storage restore |
| API secrets compromised | Critical | Secrets restore + rotation |
| Complete data center failure | Catastrophic | Full system recovery |
| Ransomware attack | Catastrophic | Clean restore + forensics |
5. Recovery Procedures
PostgreSQL Recovery
| Command | Description |
|---|---|
| --list | List available backups |
| --date YYYY-MM-DD | Restore from specific date |
| --collection NAME | Restore single collection |
| --dry-run | Preview without changes |
| --drop | Drop existing before restore (DANGER) |
| --monthly | Use monthly backup |
Always run --dry-run first.
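One way to enforce the dry-run rule is to generate both invocations together, so the preview always precedes the real restore. The sketch below only assembles argument lists from the flags in the table; `RestoreOptions`, `buildRestoreArgs`, and `restorePlan` are hypothetical helpers, and how the flags combine (for example `--monthly` with `--date`) is an assumption to confirm against the restore script itself.

```typescript
interface RestoreOptions {
  date?: string;       // YYYY-MM-DD
  collection?: string; // restore a single collection
  monthly?: boolean;   // use the monthly backup
  drop?: boolean;      // DANGER: drops existing data before restore
  dryRun?: boolean;    // preview without changes
}

// Translates options into the command-line flags listed in the table above.
function buildRestoreArgs(opts: RestoreOptions): string[] {
  const args: string[] = [];
  if (opts.date !== undefined) {
    if (!/^\d{4}-\d{2}-\d{2}$/.test(opts.date)) {
      throw new Error("Invalid date, expected YYYY-MM-DD: " + opts.date);
    }
    args.push("--date", opts.date);
  }
  if (opts.collection !== undefined) args.push("--collection", opts.collection);
  if (opts.monthly) args.push("--monthly");
  if (opts.drop) args.push("--drop");
  if (opts.dryRun) args.push("--dry-run");
  return args;
}

// Returns the dry-run argument list first, then the real one.
function restorePlan(opts: RestoreOptions): string[][] {
  return [
    buildRestoreArgs({ ...opts, dryRun: true }),
    buildRestoreArgs({ ...opts, dryRun: false }),
  ];
}
```

An operator would review the output of the first invocation before running the second, especially when `--drop` is set.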
GCS Document Recovery
| Command | Description |
|---|---|
| --list | List available backups |
| --folder NAME | Restore specific folder |
| --all | Restore all folders |
| --key PATH | Restore specific file |
| --overwrite | Overwrite existing files |
Secrets Recovery
- Download encrypted backup from Cloud Storage
- Decrypt using SECRETS_ENCRYPTION_KEY (stored in password manager)
- Restore to Cloud Run environment
- Rotate any compromised secrets
CRITICAL
SECRETS_ENCRYPTION_KEY must be stored in a secure password manager. Without it, secrets cannot be decrypted.
6. Recovery Time Estimates
| Recovery Type | Estimated Time |
|---|---|
| Single collection | 5-15 minutes |
| Full PostgreSQL | 30-60 minutes |
| Single GCS folder | 15-30 minutes |
| Full GCS restore | 1-2 hours |
| Secrets restore | 15 minutes |
| Full system | 2-4 hours |
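These estimates can be compared against the RTO targets in section 2 to confirm that the worst case still fits the target. The sketch below encodes both tables; which recovery type applies to which RTO category is not stated in this plan, so the pairing is left to the caller and should be treated as an assumption.

```typescript
interface Estimate {
  minMinutes: number;
  maxMinutes: number;
}

// Estimated recovery times from the table above.
const ESTIMATES: Record<string, Estimate> = {
  "Single collection": { minMinutes: 5, maxMinutes: 15 },
  "Full PostgreSQL": { minMinutes: 30, maxMinutes: 60 },
  "Single GCS folder": { minMinutes: 15, maxMinutes: 30 },
  "Full GCS restore": { minMinutes: 60, maxMinutes: 120 },
  "Secrets restore": { minMinutes: 15, maxMinutes: 15 },
  "Full system": { minMinutes: 120, maxMinutes: 240 },
};

// RTO targets from section 2, in hours.
const RTO_HOURS = { Critical: 2, High: 4, Medium: 8, Low: 24 } as const;

// True when the upper bound of the estimate fits inside the RTO window.
function worstCaseFits(recovery: string, category: keyof typeof RTO_HOURS): boolean {
  const estimate = ESTIMATES[recovery];
  if (estimate === undefined) {
    throw new Error("Unknown recovery type: " + recovery);
  }
  return estimate.maxMinutes <= RTO_HOURS[category] * 60;
}
```

With these figures, `worstCaseFits("Full PostgreSQL", "Critical")` is true, while `worstCaseFits("Full system", "Critical")` is false because the 4-hour upper bound exceeds the 2-hour target.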
7. Full Disaster Recovery Sequence
- ASSESS - Determine damage extent, identify latest backup
- PREPARE - Access backup bucket, verify integrity
- RESTORE DATABASE - List, dry-run, execute
- RESTORE DOCUMENTS - Priority: kyc, selfies, vendor-kyc
- RESTORE SECRETS - Decrypt and update environment
- VERIFY - Health checks, test transactions
- RESUME - Deactivate emergency controls, monitor
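The sequence above is strictly ordered: each step should complete before the next begins, and a failure should halt the run so that services are not resumed on unverified data. A minimal sketch of that control flow follows. The handler functions are stubs supplied by the caller, and the synchronous signature is a simplification, since real restore steps would likely be asynchronous.

```typescript
const SEQUENCE = [
  "ASSESS",
  "PREPARE",
  "RESTORE DATABASE",
  "RESTORE DOCUMENTS",
  "RESTORE SECRETS",
  "VERIFY",
  "RESUME",
] as const;

type StepName = (typeof SEQUENCE)[number];
type StepHandlers = Record<StepName, () => boolean>;

interface RecoveryResult {
  completed: StepName[];
  failedAt: StepName | null;
}

// Runs each step in order and stops at the first handler that returns false.
function runRecovery(handlers: StepHandlers): RecoveryResult {
  const completed: StepName[] = [];
  for (const name of SEQUENCE) {
    if (!handlers[name]()) {
      return { completed, failedAt: name };
    }
    completed.push(name);
  }
  return { completed, failedAt: null };
}

// Example with stub handlers: every step succeeds except VERIFY.
const stubResult = runRecovery({
  "ASSESS": () => true,
  "PREPARE": () => true,
  "RESTORE DATABASE": () => true,
  "RESTORE DOCUMENTS": () => true,
  "RESTORE SECRETS": () => true,
  "VERIFY": () => false,
  "RESUME": () => true,
});
// stubResult.failedAt is "VERIFY", so RESUME never runs.
```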
Post-Recovery Actions
- [ ] Document incident timeline and actions
- [ ] Verify data integrity against manifest
- [ ] Notify regulators if required (within 72 hours)
- [ ] Conduct post-mortem within 5 days
- [ ] Update procedures based on lessons learned
8. DR Testing
| Test Type | Frequency |
|---|---|
| Backup verification | Weekly (automated) |
| Single collection restore | Monthly |
| Full restore drill | Quarterly |
| Full DR simulation | Annually |
9. Roles and Responsibilities
| Role | Responsibilities |
|---|---|
| DR Coordinator | Overall coordination (CTO) |
| Database Admin | PostgreSQL recovery |
| Infrastructure Lead | Cloud resources, secrets |
| Compliance Officer | Regulatory notification |
| Communications | Stakeholder updates |
10. Infrastructure Reference
GCP Resources
| Resource | Value |
|---|---|
| Project | contracts-470406 |
| Region | europe-west1 |
| Cloud SQL | eneza-40ab5:europe-west1:eneza-postgres |
| Cloud Run | keshless-api |
Storage Buckets
| Bucket | Purpose |
|---|---|
| keshless-documents | Primary documents |
| keshless-backups | Backup storage |
Quick Reference
┌─────────────────────────────────────────────────┐
│          DISASTER RECOVERY QUICK STEPS          │
├─────────────────────────────────────────────────┤
│  1. Activate emergency control (if needed)      │
│  2. List backups: restore-postgresql.ts --list  │
│  3. Dry run: --date YYYY-MM-DD --dry-run        │
│  4. Execute: --date YYYY-MM-DD [--drop]         │
│  5. Verify document counts                      │
│  6. Test authentication and transactions        │
│  7. Deactivate emergency control                │
└─────────────────────────────────────────────────┘
Document Control: Version 1.0 | January 2026 | IT Operations Team