pgBackRest setup — Backblaze B2 archive + PITR
engineering-docs-operations-pgbackrest-setup · in engineering/docs/operations · org-wide · updated 2026-06-01 10:19
Frontmatter
- lang: en
- imported_at: 2026-06-01T10:19:43.334Z
- source_path: productgalaxy/docs/operations/pgbackrest-setup.md
- source_repo: productgalaxy
pgBackRest sidecar runs alongside Postgres on the same host. Full backup nightly, diff weekly (Sundays), WAL archived continuously. Retention: 30 full backups (30 days at one per night). Point-in-time restore is tested quarterly against a scratch DB.
0. Backblaze B2 setup (one-time, ~10 min)
- Backblaze B2 console → create a private bucket galaxy-pgbackrest-prod
- Buckets → bucket → Lifecycle settings → "Keep only the last 30 days of versions"
- App Keys → create an app key restricted to that bucket; scope = Read and Write
- Note the keyID, applicationKey, and endpoint (e.g. s3.us-west-002.backblazeb2.com). B2 shows the applicationKey only once.
Add to the VPS's /etc/galaxy/.env:
PGBACKREST_S3_BUCKET=galaxy-pgbackrest-prod
PGBACKREST_S3_ENDPOINT=s3.us-west-002.backblazeb2.com
PGBACKREST_S3_REGION=us-west-002
PGBACKREST_S3_KEY=<keyID>
PGBACKREST_S3_KEY_SECRET=<applicationKey>
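A quick sanity check before moving on, as a minimal sketch: check_pgbackrest_env is a hypothetical helper (not part of pgBackRest or this repo) that confirms every PGBACKREST_S3_* key listed above is present in the env file with a non-empty value.

```bash
#!/usr/bin/env bash
# check_pgbackrest_env: hypothetical helper, not part of pgBackRest.
# Returns 0 if every required key is set to a non-empty value in the env file;
# otherwise prints each missing/empty key to stderr and returns 1.
check_pgbackrest_env() {
  local env_file="$1" key missing=0
  local required=(
    PGBACKREST_S3_BUCKET
    PGBACKREST_S3_ENDPOINT
    PGBACKREST_S3_REGION
    PGBACKREST_S3_KEY
    PGBACKREST_S3_KEY_SECRET
  )
  for key in "${required[@]}"; do
    # "KEY=" must be followed by at least one character. The "=" right after
    # the name keeps PGBACKREST_S3_KEY from matching PGBACKREST_S3_KEY_SECRET.
    if ! grep -Eq "^${key}=.+" "$env_file" 2>/dev/null; then
      echo "missing or empty: ${key}" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage on the VPS:
#   check_pgbackrest_env /etc/galaxy/.env && echo "B2 env OK"
```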
1. Enable WAL archiving on Postgres
docker/postgres/postgresql-prod.conf:
# pgBackRest archive command — runs on every WAL segment switch.
archive_mode = on
archive_command = 'pgbackrest --stanza=galaxy archive-push %p'
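archive_timeout = 60 # force a segment switch every 60s if there was any activity since the last one; bounds data loss to ~60s plus upload time (switched segments are still full-size files but compress well)
# Other WAL settings.
max_wal_senders = 5 # streaming-replication connections; pgBackRest backups and archive-push don't use them
wal_level = replica # archive_mode = on requires replica or logical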
Mount this file in docker-compose.yml:
postgres:
  command: postgres -c config_file=/etc/postgresql/postgresql.conf
  volumes:
    - ./docker/postgres/postgresql-prod.conf:/etc/postgresql/postgresql.conf:ro
    - /data/postgres:/var/lib/postgresql/data
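Recreate Postgres so the new command and mount take effect: docker compose up -d postgres (docker compose restart does not pick up changes to docker-compose.yml). Verify: docker exec galaxy_postgres psql -U galaxy -c "SHOW archive_mode" → on.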
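SHOW archive_mode only proves the setting; it doesn't prove segments are reaching B2. A sketch of a stronger check, assuming psql is available in the galaxy_postgres container: read the standard pg_stat_archiver view as epoch seconds and treat archiving as healthy when something has been archived and no failure is newer than the last success. archiver_healthy is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# archiver_healthy: hypothetical helper. Input is one pg_stat_archiver row as
# "<last_archived_epoch>|<last_failed_epoch>", with 0 meaning "never"
# (the query below coalesces NULL timestamps to 0).
archiver_healthy() {
  local row="$1"
  local last_ok="${row%%|*}"    # text before the first "|"
  local last_fail="${row#*|}"   # text after the first "|"
  (( last_ok > 0 )) || return 1 # nothing archived yet
  (( last_ok >= last_fail ))    # no failure newer than the last success
}

# Usage against production (psql -At separates columns with "|"):
#   row=$(docker exec galaxy_postgres psql -U galaxy -At -c "SELECT
#     coalesce(extract(epoch FROM last_archived_time)::bigint, 0),
#     coalesce(extract(epoch FROM last_failed_time)::bigint, 0)
#     FROM pg_stat_archiver")
#   archiver_healthy "$row" && echo "archiving OK" || echo "archiving FAILING"
```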
2. pgBackRest sidecar in docker-compose.yml
pgbackrest:
  image: pgbackrest/pgbackrest:latest
  container_name: galaxy_pgbackrest
  restart: unless-stopped
  depends_on:
    postgres:
      condition: service_healthy
  environment:
    PGBACKREST_REPO1_S3_BUCKET: ${PGBACKREST_S3_BUCKET}
    PGBACKREST_REPO1_S3_ENDPOINT: ${PGBACKREST_S3_ENDPOINT}
    PGBACKREST_REPO1_S3_REGION: ${PGBACKREST_S3_REGION}
    PGBACKREST_REPO1_S3_KEY: ${PGBACKREST_S3_KEY}
    PGBACKREST_REPO1_S3_KEY_SECRET: ${PGBACKREST_S3_KEY_SECRET}
    PGBACKREST_REPO1_TYPE: s3
    PGBACKREST_REPO1_PATH: /
    PGBACKREST_REPO1_RETENTION_FULL: 30
    PGBACKREST_REPO1_RETENTION_DIFF: 4
    PGBACKREST_STANZA: galaxy
    PGBACKREST_PROCESS_MAX: 4
    PGBACKREST_LOG_LEVEL_CONSOLE: info
    PGBACKREST_PG1_HOST: postgres
    PGBACKREST_PG1_PORT: 5432
    PGBACKREST_PG1_USER: galaxy
    PGBACKREST_PG1_DATABASE: galaxy
    PGBACKREST_PG1_PATH: /var/lib/postgresql/data
  volumes:
    - /data/pgbackrest:/var/lib/pgbackrest
    - /data/postgres:/var/lib/postgresql/data:ro
Note: pgBackRest needs read access to the Postgres data directory for stanza and backup operations; that's the :ro mount. A restore writes into its target data directory, so that target must be mounted read-write.
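The environment: block relies on pgBackRest's mapping from option names to environment variables (PGBACKREST_ prefix, upper case, dashes become underscores). A small sketch of that mapping, handy when adding another option to the sidecar; opt_to_env is a hypothetical helper and needs bash 4+ for ${var^^}.

```bash
#!/usr/bin/env bash
# opt_to_env: hypothetical helper. Maps a pgBackRest option name to the
# environment-variable form used in the sidecar's environment: block.
opt_to_env() {
  local opt="${1//-/_}"                # repo1-s3-key-secret -> repo1_s3_key_secret
  printf 'PGBACKREST_%s\n' "${opt^^}"  # upper-case it (bash 4+)
}

# Example:
#   opt_to_env repo1-s3-uri-style   # prints PGBACKREST_REPO1_S3_URI_STYLE
```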
3. Bootstrap the stanza (one-time, ~3 min)
docker compose up -d pgbackrest
docker compose exec pgbackrest pgbackrest --stanza=galaxy stanza-create
docker compose exec pgbackrest pgbackrest --stanza=galaxy check
Both commands must end with INFO: <command> command end: completed successfully (one line for stanza-create, one for check), and check should also report that a WAL segment was successfully archived. If either fails, see Troubleshooting below.
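pgBackRest exits non-zero on failure, so the exit status is the primary signal. When the output has been captured (e.g. kept for a log), a minimal sketch for confirming the success line; pgbackrest_completed is a hypothetical helper that reads the captured output on stdin.

```bash
#!/usr/bin/env bash
# pgbackrest_completed: hypothetical helper. Reads captured pgBackRest output on
# stdin and succeeds only if it contains "<command> command end: completed
# successfully" for the named command (fixed-string match, not a regex).
pgbackrest_completed() {
  local cmd="$1"
  grep -Fq "${cmd} command end: completed successfully"
}

# Usage (keep the output, then check it):
#   out=$(docker compose exec -T pgbackrest pgbackrest --stanza=galaxy check 2>&1)
#   printf '%s\n' "$out" | pgbackrest_completed check && echo "check OK"
```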
4. First full backup
docker compose exec pgbackrest pgbackrest --stanza=galaxy --type=full backup
# blocks until done: ~5-10 min for a 1 GB DB; time grows roughly linearly with DB size
docker compose exec pgbackrest pgbackrest --stanza=galaxy info
Output should show full backup: <timestamp>F with status ok.
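To script the same check, a sketch that pulls the newest full-backup label out of the text output of pgbackrest info. It assumes the full backup: <label> line shape described above; labels start with a YYYYMMDD-HHMMSS timestamp, so the lexically largest one is the newest. latest_full_label is a hypothetical helper.

```bash
#!/usr/bin/env bash
# latest_full_label: hypothetical helper. Reads `pgbackrest info` text on stdin
# and prints the newest "full backup:" label; exits 1 if there is none.
latest_full_label() {
  awk '
    # Lines like "full backup: 20260601-030001F" (leading spaces are ignored).
    $1 == "full" && $2 == "backup:" && $3 > newest { newest = $3 }
    END { if (newest == "") exit 1; print newest }
  '
}

# Usage:
#   docker compose exec -T pgbackrest pgbackrest --stanza=galaxy info | latest_full_label
```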
5. Scheduled backups (systemd timers)
/etc/systemd/system/galaxy-pgbackrest-full.service:
[Unit]
Description=galaxy pgBackRest full backup
After=docker.service
[Service]
Type=oneshot
ExecStart=/usr/bin/docker compose -f /home/galaxy/productgalaxy/docker-compose.yml exec -T pgbackrest pgbackrest --stanza=galaxy --type=full backup
WorkingDirectory=/home/galaxy/productgalaxy
User=galaxy
/etc/systemd/system/galaxy-pgbackrest-full.timer:
[Unit]
Description=galaxy pgBackRest full backup (nightly 03:00 UTC)
[Timer]
OnCalendar=*-*-* 03:00:00 UTC
Persistent=true
[Install]
WantedBy=timers.target
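Create galaxy-pgbackrest-diff.service and galaxy-pgbackrest-diff.timer the same way for the diff (Sundays 04:00 UTC): change --type=full to --type=diff and set OnCalendar=Sun *-*-* 04:00:00 UTC.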
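A sketch that derives the diff units from the full ones instead of hand-editing copies. make_diff_units is a hypothetical helper; it assumes the full units match the text above and writes the diff pair next to them (run as root, then daemon-reload as below).

```bash
#!/usr/bin/env bash
# make_diff_units: hypothetical helper. Writes galaxy-pgbackrest-diff.{service,timer}
# next to the full units, switching the backup type and the schedule.
make_diff_units() {
  local dir="${1:-/etc/systemd/system}"
  # Service: same ExecStart but --type=diff; description updated to match.
  sed -e 's/--type=full/--type=diff/' \
      -e 's/full backup/diff backup/' \
      "$dir/galaxy-pgbackrest-full.service" > "$dir/galaxy-pgbackrest-diff.service"
  # Timer: Sundays 04:00 UTC instead of nightly 03:00. A timer activates the
  # .service with the same base name, so no Unit= line is needed.
  sed -e 's/full backup (nightly 03:00 UTC)/diff backup (Sundays 04:00 UTC)/' \
      -e 's/^OnCalendar=.*/OnCalendar=Sun *-*-* 04:00:00 UTC/' \
      "$dir/galaxy-pgbackrest-full.timer" > "$dir/galaxy-pgbackrest-diff.timer"
}

# Usage (as root):
#   make_diff_units /etc/systemd/system
```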
sudo systemctl daemon-reload
sudo systemctl enable --now galaxy-pgbackrest-full.timer galaxy-pgbackrest-diff.timer
systemctl list-timers 'galaxy-*'
6. Quarterly restore drill (no overwrite)
# Spin a scratch postgres container alongside production.
docker run -d --name galaxy_restore_scratch \
-e POSTGRES_PASSWORD=scratch \
-v /data/postgres-scratch:/var/lib/postgresql/data \
pgvector/pgvector:pg17
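# Stop it; restore from B2 over the scratch data dir. The sidecar can't see
# /data/postgres-scratch unless it's mounted, so run a one-off sidecar with it
# mounted read-write. --delta is needed because the dir already holds initdb output.
docker stop galaxy_restore_scratch
docker compose run --rm \
-v /data/postgres-scratch:/var/lib/postgresql/data-scratch \
pgbackrest pgbackrest \
--stanza=galaxy --type=time \
--target='<YYYY-MM-DD HH:MM:SS+00>' \
--target-action=promote \
--delta \
--pg1-path=/var/lib/postgresql/data-scratch \
restore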
# Start scratch, query row counts, compare to last verify snapshot
docker start galaxy_restore_scratch
docker exec galaxy_restore_scratch psql -U galaxy -d galaxy -c "SELECT count(*) FROM comments;"
# Tear down — DO NOT leave running.
docker rm -f galaxy_restore_scratch
rm -rf /data/postgres-scratch
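The --target placeholder needs a UTC timestamp in the shape shown. A sketch that builds one N minutes in the past; pitr_target is a hypothetical helper and relies on GNU date (-d), which BSD/macOS date does not accept in the same form.

```bash
#!/usr/bin/env bash
# pitr_target: hypothetical helper. Prints "YYYY-MM-DD HH:MM:SS+00" for a UTC
# time N minutes ago (default 15). Needs GNU date for -d.
pitr_target() {
  local minutes_ago="${1:-15}"
  date -u -d "${minutes_ago} minutes ago" '+%Y-%m-%d %H:%M:%S+00'
}

# Usage in the drill's restore command:
#   --target="$(pitr_target 30)"
```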
Record the drill timestamp + row counts in docs/operations/restore-drill-log.md. Drift > 0 rows = file an incident.
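A minimal sketch of the bookkeeping step. record_drill is a hypothetical helper, and the one-line-per-check log format is an assumption (the actual layout of restore-drill-log.md isn't specified here). It appends the result, prints the drift, and returns non-zero on any drift, which is the file-an-incident case.

```bash
#!/usr/bin/env bash
# record_drill: hypothetical helper. Appends one markdown list line per checked
# table to the drill log, prints the absolute row-count drift, and returns 1
# if the drift is non-zero.
record_drill() {
  local log="$1" table="$2" expected="$3" restored="$4"
  local drift=$(( restored - expected ))
  (( drift < 0 )) && drift=$(( -drift ))
  printf '%s\n' "- $(date -u '+%Y-%m-%dT%H:%M:%SZ') table=${table} expected=${expected} restored=${restored} drift=${drift}" >> "$log"
  echo "$drift"
  (( drift == 0 ))
}

# Usage after the scratch restore (expected count from the last verify snapshot):
#   record_drill docs/operations/restore-drill-log.md comments "$expected" "$restored" \
#     || echo "drift detected: file an incident"
```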
Troubleshooting
- archive command failed: exit code 78: pgBackRest can't reach the S3 endpoint. Check the PGBACKREST_REPO1_S3_* env vars; test s3cmd ls s3://<bucket> from the host with the same creds (with --host set to the B2 endpoint).
- stanza-create error: missing parameter pg1-host: docker network issue. Make sure the pgbackrest service is on the same network as postgres; add an explicit networks: block if not.
- WAL piling up on disk: archive_command is failing. It runs inside the Postgres container, so its errors land in the Postgres log: check docker logs galaxy_postgres 2>&1 | grep -i archive, and SELECT * FROM pg_stat_archiver (failed_count, last_failed_wal). Until fixed, Postgres WILL fill the disk (it won't recycle WALs that haven't been archived).
- Restore says chunk … not found in repo: a backup file was deleted by the bucket lifecycle policy. Relax the lifecycle rule so it can't delete files pgBackRest still references, and bump PGBACKREST_REPO1_RETENTION_FULL if you need older points.
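For the WAL-pile-up case, a sketch that measures the backlog directly. Postgres writes a <segment>.ready marker under pg_wal/archive_status/ for each segment awaiting archive and renames it to .done once archive_command succeeds, so a growing .ready count means archiving is stuck. pending_wal_count and wal_backlog_ok are hypothetical helpers; the default threshold of 10 is an arbitrary example.

```bash
#!/usr/bin/env bash
# pending_wal_count: hypothetical helper. Counts *.ready markers, i.e. WAL
# segments Postgres is still waiting to archive.
pending_wal_count() {
  local status_dir="$1"
  find "$status_dir" -maxdepth 1 -type f -name '*.ready' | wc -l
}

# wal_backlog_ok: hypothetical helper. Warns on stderr and returns 1 when more
# than MAX segments (default 10, an arbitrary example) are waiting.
wal_backlog_ok() {
  local status_dir="$1" max="${2:-10}" n
  n=$(pending_wal_count "$status_dir")
  if (( n > max )); then
    echo "WARN: ${n} WAL segments waiting to be archived in ${status_dir}" >&2
    return 1
  fi
  return 0
}

# Usage on the host, as a user that can read /data/postgres (e.g. root):
#   wal_backlog_ok /data/postgres/pg_wal/archive_status 10
```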