
Galaxy production deploy runbook

engineering-docs-operations-deploy-runbook · in engineering/docs/operations · org-wide · updated 2026-06-01 10:19

Frontmatter

lang: en
imported_at: 2026-06-01T10:19:43.141Z
source_path: productgalaxy/docs/operations/DEPLOY-RUNBOOK.md
source_repo: productgalaxy


A single, plain-English walkthrough for taking productgalaxy live on a Hetzner VPS, written for a non-technical operator working with the Claude Code assistant.

Before you start

You need:

  • A Hetzner Cloud account + one CCX23 (or larger) VPS provisioned with Ubuntu 24.04.
  • SSH access to the VPS as root (later we lock this down).
  • A domain (e.g. galaxy.example.com) with A/AAAA records for three hostnames, all pointing at the VPS IP:
    • galaxy.example.com — admin + API
    • mcp.galaxy.example.com — MCP server
    • docs.galaxy.example.com — reading site
  • A Backblaze B2 account + one bucket named galaxy-pgbackrest (encrypted backups).
  • One generated production secrets file at ./secrets/galaxy.yaml.enc (SOPS-encrypted; see "Secrets" below).
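
Before touching the VPS, it can help to confirm that the three hostnames already resolve to it. The following is a minimal sketch, assuming `dig` (from the dnsutils package) is available locally; the function names `galaxy_hosts` and `check_dns` are hypothetical helpers, not part of the repo, and only A records are checked.

```shell
# Print the three hostnames this runbook expects for a given base domain.
galaxy_hosts() {
  base=$1
  printf '%s\n' "$base" "mcp.$base" "docs.$base"
}

# Compare each hostname's A record with the expected VPS IPv4 address.
# Returns non-zero if any hostname does not resolve to that address.
check_dns() {
  base=$1 expected_ip=$2 status=0
  for host in $(galaxy_hosts "$base"); do
    if dig +short A "$host" | grep -qxF "$expected_ip"; then
      echo "ok       $host"
    else
      echo "MISMATCH $host"
      status=1
    fi
  done
  return "$status"
}

# Usage (replace the address with your VPS IP):
#   check_dns galaxy.example.com 203.0.113.10
```

AAAA records would need a second pass with `dig +short AAAA` and the IPv6 address.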

Step 1 — VPS bootstrap (one-off)

SSH into the box and run:

ssh root@<vps-ip>

apt update && apt -y upgrade
apt install -y ca-certificates curl gnupg lsb-release

# Docker Engine (from the official repo)
install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg \
  | gpg --dearmor -o /etc/apt/keyrings/docker.gpg
echo "deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] \
  https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" \
  > /etc/apt/sources.list.d/docker.list
apt update
apt install -y docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

# A non-root user that owns the deploy directory
useradd --create-home --shell /bin/bash --groups docker galaxy
mkdir -p /srv/galaxy /etc/galaxy/secrets /var/lib/galaxy/attachments
chown -R galaxy:galaxy /srv/galaxy /var/lib/galaxy
chmod 700 /etc/galaxy/secrets

# Firewall — only SSH + HTTP(S) inbound
ufw default deny incoming
ufw default allow outgoing
ufw allow 22/tcp
ufw allow 80/tcp
ufw allow 443/tcp
ufw allow 443/udp   # HTTP/3 over QUIC
ufw --force enable
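
To confirm the firewall ended up as intended, the four rules above can be checked against `ufw status`. This is a sketch only: `missing_ufw_rules` is a hypothetical helper, and it assumes the usual `ufw status` table layout where each rule line starts with the port and is followed by `ALLOW`.

```shell
# Read `ufw status` output on stdin and print every expected rule that is missing.
# Prints nothing when all four rules from the bootstrap step are present.
missing_ufw_rules() {
  status_text=$(cat)
  for rule in 22/tcp 80/tcp 443/tcp 443/udp; do
    # The rule must start a line and be followed by whitespace and ALLOW.
    if ! printf '%s\n' "$status_text" | grep -Eq "^${rule}[[:space:]]+ALLOW"; then
      echo "$rule"
    fi
  done
}

# Usage on the VPS:
#   ufw status | missing_ufw_rules
```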

Step 2 — Push the secrets

Locally, encrypt your production secrets with SOPS + age (or a YubiKey, or a KMS). The example file .env.production.example shows every key you need to set.

# Locally:
sops --age <your-pubkey> --encrypt secrets/galaxy.yaml > secrets/galaxy.yaml.enc

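# On first deploy, decrypt + push to the VPS (mode 600, owner root).
# sops infers the format from the file extension, and .enc is not one it
# recognises as YAML, so state the type explicitly:
sops -d --input-type yaml --output-type yaml secrets/galaxy.yaml.enc | ssh root@<vps-ip> \
  'umask 077 && cat > /etc/galaxy/secrets/galaxy.yaml && chmod 600 /etc/galaxy/secrets/galaxy.yaml'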

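The three pgBackRest secrets and the Postgres password are simpler one-line files:

echo "$B2_KEY"            | ssh root@<vps-ip> 'umask 077 && cat > /etc/galaxy/secrets/b2_key.txt'
echo "$B2_SECRET"         | ssh root@<vps-ip> 'umask 077 && cat > /etc/galaxy/secrets/b2_secret.txt'
# openssl wraps base64 output at 64 characters, so strip newlines to keep one line.
# This passphrase exists only on the VPS: store a copy somewhere safe, because
# encrypted backups cannot be restored without it.
openssl rand -base64 64 | tr -d '\n' | ssh root@<vps-ip> 'umask 077 && cat > /etc/galaxy/secrets/pgbackrest_cipher_pass.txt'
echo "$POSTGRES_PASSWORD" | ssh root@<vps-ip> 'umask 077 && cat > /etc/galaxy/secrets/postgres_password.txt'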
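
After pushing, it is worth checking two things on the VPS: that every secret file is mode 600, and that none is blank (an unset variable such as `$B2_KEY` makes `echo` write a file containing only a newline). The sketch below uses plain `find`; both function names are hypothetical.

```shell
# Print every regular file under the directory whose mode is not exactly 600.
loose_secret_files() {
  find "$1" -type f ! -perm 600 -print
}

# Print every regular file smaller than 2 bytes, i.e. empty or newline-only.
blank_secret_files() {
  find "$1" -type f -size -2c -print
}

# Usage on the VPS (both commands should print nothing):
#   loose_secret_files /etc/galaxy/secrets
#   blank_secret_files /etc/galaxy/secrets
```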

Step 3 — Pin image digests (12-Factor II + V)

Edit docker-compose.prod.yml and replace each @sha256:REPLACE_WITH_DIGEST_BEFORE_FIRST_PROD_DEPLOY with a real digest. To get them:

docker pull pgvector/pgvector:pg17 && docker inspect pgvector/pgvector:pg17 --format '{{index .RepoDigests 0}}'
docker pull caddy:2.10-alpine && docker inspect caddy:2.10-alpine --format '{{index .RepoDigests 0}}'
docker pull pgbackrest/pgbackrest:2.55 && docker inspect pgbackrest/pgbackrest:2.55 --format '{{index .RepoDigests 0}}'

Galaxy's own images (ghcr.io/sabaidea/galaxy-app, …-mcp, …-docs) build from this repo. Push them via GitHub Actions on every tagged release; the workflow stamps the digest into a PROD_TAG env var that the compose file reads.
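
A small guard can stop a deploy that still carries the placeholder. The sketch below greps the compose file for the literal placeholder string named above; `check_digests_pinned` is a hypothetical helper, not something the repo is known to ship.

```shell
placeholder='REPLACE_WITH_DIGEST_BEFORE_FIRST_PROD_DEPLOY'

# Print (with line numbers) every line that still carries the placeholder
# digest, and fail if any is found. Succeeds silently once all are replaced.
check_digests_pinned() {
  file=$1
  if grep -n "$placeholder" "$file"; then
    echo "unpinned images remain in $file" >&2
    return 1
  fi
  return 0
}

# Usage:
#   check_digests_pinned docker-compose.prod.yml
```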

Step 4 — First deploy

ssh galaxy@<vps-ip>
cd /srv/galaxy
git clone https://github.com/parhumm/productgalaxy.git .

# Build local images (first time only — subsequent deploys pull pre-built from GHCR)
docker compose -f docker-compose.yml -f docker-compose.prod.yml build

# Bring up postgres + pgbackrest only first
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d postgres pgbackrest

# Run migrations as a one-off process (12-Factor XII)
docker compose -f docker-compose.yml -f docker-compose.prod.yml run --rm app \
  node packages/db/dist/migrate.js

# Seed the canonical taxonomies + first admin user
docker compose -f docker-compose.yml -f docker-compose.prod.yml run --rm app \
  node packages/db/dist/seed/run.js

# Then bring up the rest
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d

Wait 60 seconds for healthchecks to settle, then:

docker compose -f docker-compose.yml -f docker-compose.prod.yml ps
# Every service that defines a healthcheck should report "(healthy)" in its status
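
Instead of a fixed 60-second wait, a poll loop can retry a probe until it passes or a deadline is reached. This is a generic sketch; `wait_until` is a hypothetical helper, and the usage line assumes the `/healthz?deep=1` endpoint listed under day-two operations is a suitable readiness probe.

```shell
# Retry a command until it succeeds or roughly <limit> seconds have elapsed.
# Arguments: limit in seconds, interval in seconds, then the command to run.
wait_until() {
  limit=$1 interval=$2
  shift 2
  elapsed=0
  until "$@"; do
    if [ "$elapsed" -ge "$limit" ]; then
      return 1
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 0
}

# Usage: poll the deep health endpoint every 5 seconds for up to 2 minutes.
#   wait_until 120 5 curl -fsS "https://galaxy.example.com/healthz?deep=1"
```

A passing probe only covers the services behind that endpoint, so the `ps` check above is still worth a look.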

Step 5 — Smoke-test from anywhere

Locally:

GALAXY_BASE=https://galaxy.example.com \
GALAXY_MCP_BASE=https://mcp.galaxy.example.com \
GALAXY_DOCS_BASE=https://docs.galaxy.example.com \
GALAXY_ADMIN_JWT="$(pnpm --filter @galaxy/db exec tsx tools/issue-admin-jwt.ts)" \
  ./scripts/smoke-test.sh

Expected: 12 green PASS lines, ending with ✓ smoke test passed.
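
If the smoke test runs unattended, its output can be checked mechanically. The sketch below assumes each passing check prints a line containing the literal text `PASS` and that the final line contains `smoke test passed`; the exact output format of `smoke-test.sh` is an assumption, and `check_smoke_output` is a hypothetical helper.

```shell
# Read smoke-test output on stdin; succeed only if it has the expected number
# of PASS lines (default 12) and its last line reports overall success.
check_smoke_output() {
  expected=${1:-12}
  output=$(cat)
  passes=$(printf '%s\n' "$output" | grep -c 'PASS' || true)
  if [ "$passes" -ne "$expected" ]; then
    echo "expected $expected PASS lines, saw $passes" >&2
    return 1
  fi
  printf '%s\n' "$output" | tail -n 1 | grep -q 'smoke test passed'
}

# Usage (with the GALAXY_* variables from above exported):
#   ./scripts/smoke-test.sh | check_smoke_output
```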

Step 6 — Verify against live data

Once data is imported (see docs/operations/IMPORTERS.md):

# Per-domain data parity (byte equality vs source files)
pnpm --filter @galaxy/importer-audits     run verify:data-parity
pnpm --filter @galaxy/importer-pm         run verify:data-parity
pnpm --filter @galaxy/importer-abtests    run verify:data-parity
pnpm --filter @galaxy/importer-comments   run verify:data-parity

# Domain invariants
pnpm --filter @galaxy/importer-audits     run verify:walkthrough-parity
pnpm --filter @galaxy/importer-audits     run verify:content-parity
pnpm --filter @galaxy/importer-pm         run verify:cross-link-parity
pnpm --filter @galaxy/importer-comments   run verify:insights-parity

# Integration suite (runs against the live DB)
pnpm --filter @galaxy/tests-integration   test

# E2E suite (requires Playwright browsers — one-off install on the deploy box)
pnpm --filter @galaxy/tests-e2e install:browsers
pnpm --filter @galaxy/tests-e2e           test

A green run on all of the above is the cutover gate for the legacy app teams.
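
Because the gate is "everything green", it can be convenient to run the checks in one pass and collect every failure rather than stop at the first. The following is a minimal sketch; `run_gate` is a hypothetical wrapper that reads "package script" pairs and calls `pnpm --filter <package> run <script>` (for the two test suites, `run test` is equivalent to the `test` shorthand used above).

```shell
# Read "<package> <script>" pairs from stdin, run each through pnpm, keep going
# after a failure, and return non-zero if any check failed.
run_gate() {
  failures=0
  while read -r pkg script; do
    [ -n "$pkg" ] || continue
    # </dev/null stops pnpm from consuming the list being read by the loop.
    if pnpm --filter "$pkg" run "$script" </dev/null; then
      echo "ok   $pkg $script"
    else
      echo "FAIL $pkg $script"
      failures=$((failures + 1))
    fi
  done
  [ "$failures" -eq 0 ]
}

# Usage (extend the list with the remaining checks from this step):
#   run_gate <<'EOF'
#   @galaxy/importer-audits verify:data-parity
#   @galaxy/importer-pm verify:data-parity
#   @galaxy/tests-integration test
#   EOF
```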

Step 7 — Day-two operations

| Task | Command |
| --- | --- |
| Roll a new release | Push a tag → GHA builds + pushes images to GHCR → on the box, run docker compose -f docker-compose.yml -f docker-compose.prod.yml pull, then the same command with up -d |
| Restore from backup | See pgbackrest-restore.md (full, diff, and point-in-time restores are all documented) |
| Rotate the admin JWT | pnpm --filter @galaxy/db exec tsx tools/issue-admin-jwt.ts > new.jwt, replace in secrets file, restart app + mcp |
| Read prod logs | docker compose logs -f app mcp (12-Factor XI: logs are stdout streams) |
| Health probe from a Slack alert | curl -fsS "https://galaxy.example.com/healthz?deep=1" |
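
Every production compose command in this runbook needs both `-f` files, which is easy to forget in day-two work. A small shell function, sketched below, keeps the pair in one place; `dc` and `roll_release` are hypothetical names, and the functions must be run from /srv/galaxy so the relative file names resolve.

```shell
# Wrap docker compose with both compose files used throughout this runbook.
dc() {
  docker compose -f docker-compose.yml -f docker-compose.prod.yml "$@"
}

# Pull the newly published images, then recreate any changed services.
roll_release() {
  dc pull && dc up -d
}

# Usage on the box:
#   cd /srv/galaxy
#   roll_release
#   dc logs -f app mcp
```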

Rollback

The deploy workflow keeps the previous image digest pinned in a tag file. Rollback is:

ssh galaxy@<vps-ip>
cd /srv/galaxy
git checkout HEAD~1 -- docker-compose.prod.yml   # restore the previous digest pin
docker compose -f docker-compose.yml -f docker-compose.prod.yml up -d
./scripts/smoke-test.sh

The DB never moves backward on rollback — migrations are append-only per CLAUDE.md §6. Restoring data is a separate pgBackRest PITR run, not a rollback.


Version history (1)

  • v1 · 2026-06-01 10:19 · "galaxy-docs importer: initial import"