Disaster Recovery
This guide covers procedures for recovering Keshless data from backups in case of data loss, corruption, or system failure.
Quick Reference
| Data Type | Backup Location | Restore Method |
|---|---|---|
| PostgreSQL | gs://keshless-backups/postgresql/ | psql for plain .sql.gz dumps (pg_restore needs an archive-format dump) |
| Documents | gs://keshless-documents/ | GCS copy |
| Secrets | gs://keshless-backups/secrets/ | Manual decryption |
Prerequisites
Before running any restore:
| Requirement | Description |
|---|---|
| GCS Access | Cloud Run service account credentials |
| Environment | .env with correct values |
| Database Access | Valid DATABASE_URL connection string |
| Encryption Key | SECRETS_ENCRYPTION_KEY for secrets restore |
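The checks above can be scripted as a pre-flight step. This is a minimal sketch. It assumes the variables are already loaded into the current shell (for example with `set -a; source .env; set +a`), that `gcloud` is authenticated, and that DATABASE_URL is a plain libpq connection URI that psql accepts. The function names `missing_vars` and `preflight_check` are hypothetical.

```bash
# Hypothetical pre-flight check for a restore session (source this file, then run preflight_check).

# Print each required variable that is unset or empty; return 1 if any are missing.
missing_vars() {
  local name missing=0
  for name in "$@"; do
    if [ -z "${!name:-}" ]; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}

preflight_check() {
  # Environment and key material
  missing_vars DATABASE_URL SECRETS_ENCRYPTION_KEY || return 1

  # GCS access: listing the backup prefix fails if the credentials lack read access
  if ! gcloud storage ls gs://keshless-backups/postgresql/ >/dev/null; then
    echo "cannot list gs://keshless-backups/postgresql/" >&2
    return 1
  fi

  # Database access: run a trivial query through the connection string.
  # ORM-specific query parameters in DATABASE_URL may be rejected by psql.
  if ! psql "$DATABASE_URL" -At -c 'SELECT 1' >/dev/null; then
    echo "cannot connect using DATABASE_URL" >&2
    return 1
  fi
  echo "preflight OK"
}
```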
PostgreSQL Restore
List Available Backups
```bash
# List daily backups
gcloud storage ls gs://keshless-backups/postgresql/daily/
# List monthly backups
gcloud storage ls gs://keshless-backups/postgresql/monthly/
# Check specific backup manifest
gcloud storage cat gs://keshless-backups/postgresql/daily/{date}/manifest.json
```
Download and Restore
```bash
# Download backup
gcloud storage cp gs://keshless-backups/postgresql/daily/{date}/keshless-daily.sql.gz ./
# Extract
gunzip keshless-daily.sql.gz
# Start Cloud SQL proxy
/tmp/cloud-sql-proxy contracts-470406:europe-west1:keshless-postgres --port=5442 &
# Restore to database
psql -h 127.0.0.1 -p 5442 -U keshless_admin -d keshless_prod < keshless-daily.sql
```
Restore Options
| Scenario | Command |
|---|---|
| Full restore | psql -d keshless_prod < backup.sql |
| Single table | pg_restore -t tablename -d keshless_prod backup.dump (archive-format dumps only; pg_restore cannot read plain .sql files) |
| Clean restore | Drop schema first, then restore |
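The clean-restore row can be sketched as a single function. This is a minimal sketch that assumes all application tables live in the `public` schema, the Cloud SQL proxy is already listening on `127.0.0.1:5442`, and the input is the gzipped plain-SQL daily dump. The name `clean_restore` is hypothetical. The function is destructive, so it requires the target database to be named explicitly. Run it against a test database first.

```bash
# Hypothetical helper: wipe the public schema, then replay a gzipped plain-SQL dump.
# DESTRUCTIVE: every object in the public schema of the target database is dropped.
clean_restore() {
  local dump_gz="$1" db="$2"
  [ -n "$db" ] || { echo "usage: clean_restore DUMP.sql.gz TARGET_DB" >&2; return 1; }
  local conn=(-h 127.0.0.1 -p 5442 -U keshless_admin -d "$db")

  # Refuse to start unless the dump exists and decompresses cleanly.
  gunzip -t "$dump_gz" 2>/dev/null || { echo "unreadable dump: $dump_gz" >&2; return 1; }

  # Drop every object in public and recreate the empty schema.
  psql "${conn[@]}" -v ON_ERROR_STOP=1 \
    -c 'DROP SCHEMA public CASCADE;' \
    -c 'CREATE SCHEMA public;' || return 1

  # Stream the dump without writing the uncompressed file to disk.
  # ON_ERROR_STOP makes psql exit non-zero on the first failing statement.
  gunzip -c "$dump_gz" | psql "${conn[@]}" -v ON_ERROR_STOP=1
}
```

Checking the archive with `gunzip -t` before dropping anything avoids wiping the schema when the downloaded file turns out to be truncated.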
Restore Notes
| Note | Description |
|---|---|
| Duplicate keys | Restoring into a database that still holds the data fails with duplicate-key errors |
| Clean restore | Drop and recreate schema first |
| Selective restore | Use pg_restore -t for specific tables (archive-format dumps only) |
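Because pg_restore only reads archive-format dumps (`pg_dump -Fc`, `-Fd` or `-Ft`), a single table cannot be extracted directly from the plain `.sql.gz` daily file. A common workaround is to load the dump into a scratch database and copy only the table across. This is a minimal sketch, assuming the proxy is on `127.0.0.1:5442`, `keshless_admin` may create databases, and the table no longer exists in the target (for example after an accidental drop). If the table still exists, the duplicate-key problem above applies. `restore_single_table` and the `restore_scratch_*` naming are hypothetical. If the function fails partway, the scratch database is left behind and must be dropped by hand.

```bash
# Hypothetical helper: recover one table from a plain-SQL daily dump via a scratch database.
restore_single_table() {
  local dump_gz="$1" table="$2" target_db="$3"
  [ -n "$dump_gz" ] && [ -n "$table" ] && [ -n "$target_db" ] \
    || { echo "usage: restore_single_table DUMP.sql.gz TABLE TARGET_DB" >&2; return 1; }
  local scratch_db="restore_scratch_$(date +%Y%m%d%H%M%S)"
  local conn=(-h 127.0.0.1 -p 5442 -U keshless_admin)

  # 1. Load the full dump into an empty scratch database.
  createdb "${conn[@]}" "$scratch_db" || return 1
  gunzip -c "$dump_gz" | psql "${conn[@]}" -d "$scratch_db" -v ON_ERROR_STOP=1 -q || return 1

  # 2. Dump only the requested table (definition + data) and replay it into the target.
  pg_dump "${conn[@]}" -t "$table" "$scratch_db" \
    | psql "${conn[@]}" -d "$target_db" -v ON_ERROR_STOP=1 || return 1

  # 3. Remove the scratch copy.
  dropdb "${conn[@]}" "$scratch_db"
}
```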
Document Restore
GCS Commands
```bash
# List documents
gcloud storage ls gs://keshless-documents/{folder}/
# Download single file
gcloud storage cp gs://keshless-documents/{folder}/{file} ./
# Restore entire folder
gcloud storage cp -r gs://keshless-documents/{folder}/ ./restored-{folder}/
```
Document Folders
| Folder | Contents |
|---|---|
| kyc/ | User KYC documents (ID cards, passports) |
| selfies/ | User verification selfies |
| vendor-kyc/ | Vendor verification documents |
| vendor-media/ | Vendor logos and media |
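To restore every folder in the table in one pass, the per-folder copy command can be looped. This is a minimal sketch that reuses the `./restored-{folder}/` destination from the command above. `DOCUMENT_FOLDERS` and `restore_all_documents` are hypothetical names.

```bash
# Hypothetical helper: download each document folder into ./restored-<folder>/.
DOCUMENT_FOLDERS=(kyc selfies vendor-kyc vendor-media)

restore_all_documents() {
  local folder failed=0
  for folder in "${DOCUMENT_FOLDERS[@]}"; do
    echo "restoring $folder/ ..."
    # Same command as the single-folder restore, one folder at a time.
    if ! gcloud storage cp -r "gs://keshless-documents/${folder}/" "./restored-${folder}/"; then
      echo "failed: $folder" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

A failed folder does not stop the loop. This lets the remaining folders finish, and the non-zero return status still flags the run for follow-up.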
Secrets Restore
Download Encrypted Backup
```bash
gcloud storage cp gs://keshless-backups/secrets/latest.json.encrypted ./
```
Decryption Process
- Load the encryption key from the password manager
- Run the decryption script with the key
- Write the decrypted secrets to a file
- Manually restore the values to Cloud Run and the .env files
Decryption Parameters
| Parameter | Value |
|---|---|
| Algorithm | AES-256-GCM |
| Key Derivation | SHA-256 hash of encryption key |
| IV | Base64 encoded in backup file |
| Auth Tag | Base64 encoded in backup file |
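The key-derivation row can be reproduced from the shell to confirm that the stored key yields a 32-byte AES-256 key. This is a minimal sketch. It assumes the key is hashed as its raw string with no trailing newline, since the parameters above do not specify the encoding. `openssl enc` does not support AEAD modes such as GCM, so the decryption itself still requires the project's decryption script or another tool with GCM support. `derive_aes_key_hex` is a hypothetical name.

```bash
# Hypothetical helper: derive the AES-256 key as the SHA-256 digest of the encryption key.
# printf '%s' avoids hashing a trailing newline, which would produce a different key.
derive_aes_key_hex() {
  local key="$1"
  if command -v sha256sum >/dev/null 2>&1; then
    printf '%s' "$key" | sha256sum | cut -d' ' -f1
  else
    # macOS ships shasum instead of sha256sum
    printf '%s' "$key" | shasum -a 256 | cut -d' ' -f1
  fi
}
```

For example, `derive_aes_key_hex "$SECRETS_ENCRYPTION_KEY"` should print 64 hex characters (32 bytes).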
Post-Decryption
| Step | Action |
|---|---|
| 1 | Review decrypted secrets |
| 2 | Update Cloud Run service configuration |
| 3 | Update local .env files |
| 4 | Update Secret Manager (if used) |
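For step 4, each decrypted value can be pushed as a new Secret Manager version. This is a minimal sketch with several assumptions: the decrypted output is a dotenv-style KEY=VALUE file, and each key already exists as a secret with the same name (`gcloud secrets versions add` fails for secrets that do not exist). Quoted values are passed through unchanged. `push_secrets_to_secret_manager` is a hypothetical name. Delete the decrypted file once the restore is done.

```bash
# Hypothetical helper: push KEY=VALUE pairs from a decrypted dotenv-style file
# into Secret Manager as new versions (secret name = KEY, assumed).
push_secrets_to_secret_manager() {
  local env_file="$1" line key value
  # "|| [ -n "$line" ]" also processes a final line that lacks a trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    # Skip blank lines, comments and lines without "=".
    case "$line" in ''|'#'*) continue ;; esac
    case "$line" in *=*) ;; *) continue ;; esac
    key="${line%%=*}"
    value="${line#*=}"
    # --data-file=- reads the payload from stdin; printf avoids adding a newline.
    printf '%s' "$value" | gcloud secrets versions add "$key" --data-file=- || return 1
  done < "$env_file"
}
```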
Full Disaster Recovery Drill
Pre-Drill Checklist
| Step | Verification |
|---|---|
| GCS Access | Can access backup bucket |
| Latest PostgreSQL | Backup exists and is recent |
| Documents | Files accessible in GCS |
| Secrets | Encrypted backup available |
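The "Latest PostgreSQL" check can be partly automated by listing the daily prefix and selecting the newest folder. This is a minimal sketch that makes two assumptions. First, the `{date}` folders are named `YYYY-MM-DD`, so that a lexical sort is also chronological (the guide does not state the format). Second, GNU `date` is available for the age calculation. The function names are hypothetical.

```bash
# Hypothetical helpers for the pre-drill "backup exists and is recent" check.

# Read `gcloud storage ls` output on stdin and print the newest YYYY-MM-DD folder name.
latest_backup_date() {
  grep -oE '/daily/[0-9]{4}-[0-9]{2}-[0-9]{2}/' \
    | sed -E 's#^/daily/([0-9-]+)/$#\1#' \
    | sort | tail -n 1
}

# Return 0 if the newest daily backup is at most MAX_AGE_DAYS old (default 1).
check_latest_backup() {
  local max_age_days="${1:-1}" latest age_days
  latest=$(gcloud storage ls gs://keshless-backups/postgresql/daily/ | latest_backup_date)
  if [ -z "$latest" ]; then
    echo "no daily backups found" >&2
    return 1
  fi
  # Whole days between the folder date (midnight UTC) and now; GNU date syntax.
  age_days=$(( ( $(date -u +%s) - $(date -u -d "$latest" +%s) ) / 86400 ))
  echo "latest daily backup: $latest (${age_days} day(s) old)"
  [ "$age_days" -le "$max_age_days" ]
}
```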
PostgreSQL Restore Test
| Step | Action |
|---|---|
| 1 | List available backups |
| 2 | Download latest daily backup |
| 3 | Restore to test database |
| 4 | Verify row counts match manifest |
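Step 4 can be scripted as a comparison between expected and actual counts. The manifest format is not documented here, so this sketch assumes the expected counts have been extracted from manifest.json into a text file with one `table count` pair per line. It queries the restored test database through the proxy on `127.0.0.1:5442` using exact `count(*)` queries. `compare_row_counts` is a hypothetical name.

```bash
# Hypothetical drill helper: compare expected row counts with the restored test database.
# expected_file (assumed format): one "table_name expected_count" pair per line.
compare_row_counts() {
  local expected_file="$1" db="$2" table expected actual mismatches=0
  while read -r table expected; do
    [ -z "$table" ] && continue
    # Table names are interpolated as-is, so only use a trusted manifest.
    # </dev/null keeps psql from consuming the loop's input.
    if ! actual=$(psql -h 127.0.0.1 -p 5442 -U keshless_admin -d "$db" -At \
        -c "SELECT count(*) FROM ${table}" </dev/null); then
      echo "ERROR    $table: query failed"
      mismatches=$((mismatches + 1))
      continue
    fi
    if [ "$actual" != "$expected" ]; then
      echo "MISMATCH $table expected=$expected actual=$actual"
      mismatches=$((mismatches + 1))
    fi
  done < "$expected_file"
  [ "$mismatches" -eq 0 ]
}
```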
Document Restore Test
| Step | Action |
|---|---|
| 1 | List documents in each folder |
| 2 | Download single file from each folder |
| 3 | Verify file integrity |
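The document test can be run as a loop over the four folders: list each folder, download the first object and confirm that the local copy is non-empty. This is a minimal sketch. It uses the `**` wildcard to list objects recursively, on the assumption that files may sit in nested subfolders. A non-empty file is only a weak integrity check, and comparing hashes with the stored object metadata would be stronger. `sample_document_restore` and the default `./dr-drill-docs` directory are hypothetical.

```bash
# Hypothetical drill helper: download one object per document folder and check it is non-empty.
sample_document_restore() {
  local workdir="${1:-./dr-drill-docs}" folder obj dest failed=0
  mkdir -p "$workdir" || return 1
  for folder in kyc selfies vendor-kyc vendor-media; do
    # First object under the folder; skip prefix entries, which end in "/".
    obj=$(gcloud storage ls "gs://keshless-documents/${folder}/**" | grep -v '/$' | head -n 1)
    if [ -z "$obj" ]; then
      echo "FAIL $folder: no objects listed" >&2; failed=1; continue
    fi
    dest="$workdir/${folder}-$(basename "$obj")"
    if gcloud storage cp "$obj" "$dest" && [ -s "$dest" ]; then
      echo "OK   $folder: $obj"
    else
      echo "FAIL $folder: download failed or file is empty" >&2; failed=1
    fi
  done
  return "$failed"
}
```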
Secrets Restore Test
| Step | Action |
|---|---|
| 1 | Download encrypted backup |
| 2 | Verify encryption key works |
| 3 | Decrypt and inspect contents |
Post-Drill
| Step | Action |
|---|---|
| 1 | Record any issues found |
| 2 | Update procedures if needed |
| 3 | Note time to complete each step |
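To record step 3 without a stopwatch, you can wrap each drill command in a small timing function that appends to a log. This is a minimal sketch. `timed_step`, `DRILL_LOG` and the default file name `dr-drill-timings.log` are hypothetical.

```bash
# Hypothetical wrapper: run a drill step, then append "<label> TAB <exit code> TAB <seconds>s" to a log.
DRILL_LOG="${DRILL_LOG:-dr-drill-timings.log}"

timed_step() {
  local label="$1"; shift
  local start end rc
  start=$(date +%s)
  "$@"
  rc=$?
  end=$(date +%s)
  printf '%s\t%s\t%ss\n' "$label" "$rc" "$((end - start))" >> "$DRILL_LOG"
  return "$rc"
}
```

For example: `timed_step "list daily backups" gcloud storage ls gs://keshless-backups/postgresql/daily/`. The wrapper returns the step's own exit code, so failures remain visible while the timing is still recorded.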
Estimated Recovery Times
| Recovery Type | Estimated Time |
|---|---|
| Single table restore | 5-15 minutes |
| Full PostgreSQL restore | 30-60 minutes |
| Single GCS folder restore | 15-30 minutes |
| Full GCS restore | 1-2 hours |
| Secrets restore | 15 minutes |
| Full system recovery | 2-4 hours |
Recovery Decision Matrix
| Scenario | Action |
|---|---|
| Single row missing | Restore specific table from daily backup |
| Database corruption | Full restore from last known good backup |
| Accidental table drop | Restore specific table from daily backup |
| Media files deleted | Restore from GCS using gcloud storage cp |
| API secrets compromised | Rotate secrets, restore config, redeploy |
| Complete data loss | Full PostgreSQL + GCS restore |
Emergency Procedure
| Step | Action |
|---|---|
| 1 | Notify stakeholders about the incident |
| 2 | Document the failure (what, when, impact) |
| 3 | Assess damage (which data is affected) |
| 4 | Execute recovery following this guide |
| 5 | Verify restoration using checksums and counts |
| 6 | Post-mortem to prevent recurrence |
Best Practices
| Practice | Description |
|---|---|
| Test regularly | Run DR drills quarterly |
| Verify backups | Check backup completion status |
| Monitor retention | Ensure old backups are cleaned up |
| Secure encryption key | Store in multiple secure locations |
| Document changes | Update this guide when procedures change |
| Practice restores | Team should be familiar with procedures |