
Backup & Restore

Backup and restore procedures for the database.

Automated Backups (RDS)

Configuration

Staging:
  • Backup window: 03:00-04:00 UTC
  • Retention: 7 days
  • Point-in-time recovery: No

Production:
  • Backup window: 03:00-04:00 UTC (low traffic)
  • Retention: 30 days
  • Point-in-time recovery: Yes (latest restorable time is typically within the last 5 minutes)
  • Cross-region backup: No (consider for DR)
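The settings above can be compared against what RDS actually reports. The following is a minimal sketch, assuming the input has the shape of one `DBInstances` entry from boto3's `describe_db_instances` (only `PreferredBackupWindow` and `BackupRetentionPeriod` are read). The `EXPECTED` table and the `config_drift` helper are hypothetical names introduced for illustration.

```python
# Expected settings per environment, copied from the configuration listed above.
EXPECTED = {
    "staging": {"PreferredBackupWindow": "03:00-04:00", "BackupRetentionPeriod": 7},
    "production": {"PreferredBackupWindow": "03:00-04:00", "BackupRetentionPeriod": 30},
}


def config_drift(environment, db_instance):
    """Return human-readable differences between expected and reported settings.

    `db_instance` is assumed to look like one `DBInstances` entry from
    `describe_db_instances`; keys that are absent are reported as None.
    """
    problems = []
    for key, expected in EXPECTED[environment].items():
        actual = db_instance.get(key)
        if actual != expected:
            problems.append(f"{key}: expected {expected!r}, got {actual!r}")
    return problems
```

An empty list means the instance matches the documented configuration; anything else can be logged or sent to the team channel.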

Verify Backups

# List automated backups
aws rds describe-db-snapshots \
  --db-instance-identifier prod-db \
  --snapshot-type automated

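# Show the most recent available backup (sorted explicitly rather than relying on result order)
aws rds describe-db-snapshots \
  --db-instance-identifier prod-db \
  --snapshot-type automated \
  --query 'sort_by(DBSnapshots[?Status==`available`], &SnapshotCreateTime)[-1].[DBSnapshotIdentifier,SnapshotCreateTime,Status]'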
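The same "most recent backup" lookup can be done in Python once the snapshot list has been fetched. This is a minimal sketch, assuming `snapshots` is the `DBSnapshots` list returned by boto3's `describe_db_snapshots`; the function name and the sample identifiers are illustrative only.

```python
from datetime import datetime, timezone


def latest_available_snapshot(snapshots):
    """Return the newest snapshot whose Status is 'available', or None.

    The result order of the API call is not relied on here; the newest
    entry is selected explicitly by SnapshotCreateTime.
    """
    candidates = [
        s for s in snapshots
        if s.get("Status") == "available" and "SnapshotCreateTime" in s
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda s: s["SnapshotCreateTime"])


# Illustrative data only; real snapshot identifiers will differ.
sample = [
    {"DBSnapshotIdentifier": "snap-b", "Status": "available",
     "SnapshotCreateTime": datetime(2026, 1, 20, 3, 5, tzinfo=timezone.utc)},
    {"DBSnapshotIdentifier": "snap-a", "Status": "available",
     "SnapshotCreateTime": datetime(2026, 1, 19, 3, 5, tzinfo=timezone.utc)},
    {"DBSnapshotIdentifier": "snap-c", "Status": "creating"},
]
print(latest_available_snapshot(sample)["DBSnapshotIdentifier"])  # snap-b
```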

Manual Snapshots

Create Snapshot

# Before a migration or other major change
SNAPSHOT_ID="prod-db-pre-migration-$(date +%Y%m%d-%H%M%S)"

aws rds create-db-snapshot \
  --db-instance-identifier prod-db \
  --db-snapshot-identifier "$SNAPSHOT_ID" \
  --tags Key=Purpose,Value=PreMigration Key=CreatedBy,Value=Manual

# Wait for completion; report success only if the wait itself succeeds
aws rds wait db-snapshot-completed \
  --db-snapshot-identifier "$SNAPSHOT_ID" \
  && echo "✅ Snapshot created successfully"
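When snapshots are created from a script rather than by hand, the identifier can be built and checked before calling the API. The sketch below assumes the naming scheme shown above and the documented snapshot identifier constraints (1 to 255 letters, digits or hyphens, starting with a letter, no trailing hyphen, no two consecutive hyphens). The helper names are hypothetical, and UTC is used for the timestamp, whereas the shell example uses the local clock.

```python
import re
from datetime import datetime, timezone

# 1-255 characters: letters, digits or hyphens, first character a letter.
_ID_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9-]{0,254}")


def is_valid_snapshot_id(identifier):
    """Check the identifier against the constraints listed in the lead-in."""
    return (
        _ID_PATTERN.fullmatch(identifier) is not None
        and not identifier.endswith("-")
        and "--" not in identifier
    )


def pre_migration_snapshot_id(instance_id, now=None):
    """Build '<instance>-pre-migration-YYYYMMDD-HHMMSS' and validate it."""
    now = now or datetime.now(timezone.utc)
    identifier = f"{instance_id}-pre-migration-{now:%Y%m%d-%H%M%S}"
    if not is_valid_snapshot_id(identifier):
        raise ValueError(f"invalid snapshot identifier: {identifier!r}")
    return identifier
```

Computing the identifier once and reusing it for both the create call and the waiter avoids the two commands drifting apart.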

List Snapshots

# All manual snapshots
aws rds describe-db-snapshots \
  --db-instance-identifier prod-db \
  --snapshot-type manual \
  --query 'DBSnapshots[].[DBSnapshotIdentifier,SnapshotCreateTime]' \
  --output table

Delete Snapshot

# Delete an old snapshot (frees storage and reduces cost)
aws rds delete-db-snapshot \
  --db-snapshot-identifier prod-db-old-snapshot

Restore from Snapshot

Restore to a New Instance

# Restore the snapshot to a new instance
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier prod-db-restored \
  --db-snapshot-identifier prod-db-pre-migration-20260120 \
  --db-instance-class db.t3.medium \
  --vpc-security-group-ids sg-xxxxx \
  --db-subnet-group-name prod-subnet-group

# Wait until the instance is available (typically 15-20 minutes)
aws rds wait db-instance-available \
  --db-instance-identifier prod-db-restored

# Get the endpoint
aws rds describe-db-instances \
  --db-instance-identifier prod-db-restored \
  --query 'DBInstances[0].Endpoint.Address'

Validate Restore

# Connect to the restored instance
psql -h prod-db-restored.xxxxx.rds.amazonaws.com \
  -U app_user -d app_db

-- Inside psql: verify the data
SELECT count(*) FROM users;
SELECT max(created_at) FROM orders;

# If everything checks out, the restored instance can take over as primary
# (requires a connection string change and downtime)
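The row-count checks above can be turned into a repeatable comparison. This is a minimal sketch, assuming the counts have already been collected from both instances (for example with the `SELECT count(*)` queries shown) into dictionaries keyed by table name. The function name and the `max_missing_fraction` tolerance are hypothetical; some shortfall is expected because the snapshot predates the live data.

```python
def compare_row_counts(source, restored, max_missing_fraction=0.01):
    """Return a list of problems found when comparing per-table row counts.

    `source` and `restored` map table name -> row count. A restored table
    may lag the source by at most `max_missing_fraction` of the source rows.
    """
    problems = []
    for table, source_count in source.items():
        restored_count = restored.get(table)
        if restored_count is None:
            problems.append(f"{table}: no count for restored instance")
            continue
        shortfall = source_count - restored_count
        if shortfall > source_count * max_missing_fraction:
            problems.append(f"{table}: {shortfall} rows fewer than source")
    return problems
```

An empty result suggests the restore is plausible; it does not replace application-level checks before cutting over.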

Point-in-Time Recovery

To restore to a specific point in time:

# Restore to 1 hour ago (uses GNU date syntax)
aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier prod-db \
  --target-db-instance-identifier prod-db-pitr-restored \
  --restore-time "$(date -u -d '1 hour ago' +%Y-%m-%dT%H:%M:%SZ)"

# Or use the latest restorable time
aws rds restore-db-instance-to-point-in-time \
  --source-db-instance-identifier prod-db \
  --target-db-instance-identifier prod-db-pitr-restored \
  --use-latest-restorable-time
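A requested restore time outside the restorable window is rejected by RDS, so it can help to check the timestamp before issuing the command. The sketch below assumes the caller already knows both bounds: the upper bound is available as `LatestRestorableTime` in `describe_db_instances`, while the lower bound is assumed to be supplied by the caller (for example, derived from the retention period). The function name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def restore_time_arg(target, earliest, latest):
    """Validate `target` against the window and format it for --restore-time."""
    if any(t.tzinfo is None for t in (target, earliest, latest)):
        raise ValueError("all timestamps must be timezone-aware")
    if not earliest <= target <= latest:
        raise ValueError(
            f"{target.isoformat()} is outside the restorable window "
            f"[{earliest.isoformat()}, {latest.isoformat()}]"
        )
    # Same UTC format as the shell example: YYYY-MM-DDTHH:MM:SSZ
    return target.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# Illustrative window; real bounds come from the RDS API.
latest = datetime(2026, 1, 20, 12, 0, tzinfo=timezone.utc)
earliest = latest - timedelta(days=30)
print(restore_time_arg(latest - timedelta(hours=1), earliest, latest))  # 2026-01-20T11:00:00Z
```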

Export to S3

For long-term backups or analytics:

# Export a snapshot to S3
aws rds start-export-task \
  --export-task-identifier prod-export-20260120 \
  --source-arn arn:aws:rds:us-east-1:123456:snapshot:prod-db-snapshot \
  --s3-bucket-name app-db-exports \
  --s3-prefix exports/2026/01/20/ \
  --iam-role-arn arn:aws:iam::123456:role/RDSExportRole \
  --kms-key-id arn:aws:kms:us-east-1:123456:key/xxxxx

# Output format: Parquet (optimized for analytics)

Backup Testing

Validate Backups Regularly

# Monthly: restore a backup into a test environment
aws rds restore-db-instance-from-db-snapshot \
  --db-instance-identifier test-restore-$(date +%Y%m) \
  --db-snapshot-identifier <latest-snapshot>

# Check integrity
python scripts/validate_backup.py --host test-restore-...

# Delete the test instance after validation
aws rds delete-db-instance \
  --db-instance-identifier test-restore-... \
  --skip-final-snapshot

Disaster Recovery

Cross-Region Replication

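# Copy a snapshot to another region (the call runs against the destination region;
# --source-region lets the CLI build the pre-signed URL needed for encrypted snapshots)
aws rds copy-db-snapshot \
  --source-db-snapshot-identifier arn:aws:rds:us-east-1:123456:snapshot:prod-snapshot \
  --target-db-snapshot-identifier prod-snapshot-dr \
  --source-region us-east-1 \
  --region us-west-2 \
  --kms-key-id arn:aws:kms:us-west-2:123456:key/xxxxx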

DR Procedure

RTO (Recovery Time Objective): 1 hour
RPO (Recovery Point Objective): 5 minutes

  1. Identify failure
  2. Promote read replica (if available) or restore from snapshot
  3. Update connection strings
  4. Verify application works
  5. Communicate to team
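Step 2 and the two objectives can be expressed as small helpers for use in drills or post-incident reviews. This is a sketch under stated assumptions: `read_replica_ids` is taken to be the `ReadReplicaDBInstanceIdentifiers` list from `describe_db_instances`, the timestamps are supplied by whoever runs the procedure, and all function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

RTO = timedelta(hours=1)     # Recovery Time Objective from this document
RPO = timedelta(minutes=5)   # Recovery Point Objective from this document


def choose_recovery_path(read_replica_ids):
    """Step 2: prefer promoting a read replica, otherwise restore from snapshot."""
    if read_replica_ids:
        return ("promote-read-replica", read_replica_ids[0])
    return ("restore-from-snapshot", None)


def objectives_met(failure_time, latest_restorable_time, service_restored_time):
    """Report whether the data-loss window and the downtime stayed within RPO/RTO."""
    data_loss = max(failure_time - latest_restorable_time, timedelta(0))
    downtime = service_restored_time - failure_time
    return {"rpo_met": data_loss <= RPO, "rto_met": downtime <= RTO}


# Illustrative timeline for a drill.
failure = datetime(2026, 1, 20, 12, 0, tzinfo=timezone.utc)
result = objectives_met(
    failure_time=failure,
    latest_restorable_time=failure - timedelta(minutes=3),
    service_restored_time=failure + timedelta(minutes=45),
)
print(result)  # {'rpo_met': True, 'rto_met': True}
```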

Monitoring

Backup Alarms

BackupAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmName: rds-backup-failed
    MetricName: BackupRetentionPeriodStorageUsed
    Namespace: AWS/RDS
    Statistic: Average
    Period: 86400  # 24 hours
    EvaluationPeriods: 1
    Threshold: 0
    ComparisonOperator: LessThanOrEqualToThreshold

Backup Age

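Alert if the latest automated backup is more than 25 hours old (`send_alert` is assumed to be defined elsewhere):

from datetime import datetime, timezone

import boto3

rds = boto3.client('rds')

def check_backup_age():
    # MaxRecords has a minimum of 20 and the result order is not relied on,
    # so fetch the snapshots and pick the newest one explicitly.
    snapshots = rds.describe_db_snapshots(
        DBInstanceIdentifier='prod-db',
        SnapshotType='automated'
    )['DBSnapshots']
    completed = [s for s in snapshots if 'SnapshotCreateTime' in s]
    if not completed:
        send_alert("No automated backups found for prod-db!")
        return

    latest = max(completed, key=lambda s: s['SnapshotCreateTime'])
    # boto3 returns timezone-aware datetimes, so compare against an aware "now"
    age_hours = (datetime.now(timezone.utc) - latest['SnapshotCreateTime']).total_seconds() / 3600

    if age_hours > 25:
        send_alert(f"Last backup is {age_hours:.1f} hours old!")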

Cost Optimization

  • Manual snapshots incur storage costs
  • Delete old snapshots that are no longer needed
  • Use lifecycle policies
  • Export to S3 Glacier for long-term retention
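Cleaning up old manual snapshots can be scripted as a selection step followed by the `delete-db-snapshot` calls shown earlier. The sketch below only does the selection. It assumes `snapshots` is the `DBSnapshots` list for manual snapshots; the 90-day age limit and the "always keep the newest 3" rule are hypothetical defaults, not a policy stated in this document.

```python
from datetime import datetime, timedelta, timezone


def snapshots_to_delete(snapshots, now, max_age_days=90, keep_latest=3):
    """Return identifiers of snapshots older than the cutoff.

    The newest `keep_latest` snapshots are never returned, regardless of age.
    """
    cutoff = now - timedelta(days=max_age_days)
    completed = [s for s in snapshots if "SnapshotCreateTime" in s]
    newest_first = sorted(
        completed, key=lambda s: s["SnapshotCreateTime"], reverse=True
    )
    return [
        s["DBSnapshotIdentifier"]
        for s in newest_first[keep_latest:]
        if s["SnapshotCreateTime"] < cutoff
    ]


# Illustrative data: snapshots taken 10, 100, 150 and 200 days before `today`.
today = datetime(2026, 1, 20, tzinfo=timezone.utc)
manual = [
    {"DBSnapshotIdentifier": f"snap-{days}d",
     "SnapshotCreateTime": today - timedelta(days=days)}
    for days in (200, 10, 150, 100)
]
print(snapshots_to_delete(manual, today))  # ['snap-200d']
```

Reviewing the returned list before deleting anything keeps the step reversible.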
