SOPS + age secrets setup (deploy-time decryption)

engineering-docs-operations-sops-secrets-setup · in engineering/docs/operations · org-wide · updated 2026-06-01 10:19

Frontmatter

lang: en
imported_at: 2026-06-01T10:19:43.525Z
source_path: productgalaxy/docs/operations/sops-secrets-setup.md
source_repo: productgalaxy

Status: required for deploy-staging.yml, deploy-prod.yml, deploy-prod-rollback.yml. Owner: ops. Last reviewed: 2026-05-25.

Why this exists

CLAUDE.md §14 and ADR-003 §"Secrets" together ban two patterns:

  1. env_file: for prod secrets in Compose. docker inspect exposes every env var to anyone with Docker-socket access (which can include any container that has the socket mounted). Bug bounty reports from 2025-26 repeatedly catch teams here.
  2. Secrets in plain environment: blocks in prod Compose. Same problem, plus the values land in shell history, in ps output, and in docker compose config dumps.

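The only safe pattern on a single-host Compose stack is Compose secrets: file mounts, where each secret is a file mounted under /run/secrets/ inside the container and is visible only to the services that list it. Note that with plain Compose (no Swarm) a file-backed secret is a bind mount of the host file, so the host file's owner and mode decide who can read it; the tmpfs-backed variant is a Swarm feature. To get the values onto the host in the first place without committing plaintext to git, we use SOPS + age: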

  • secrets/{staging,prod}/galaxy.enc.yaml is encrypted-at-rest in git. Anyone with read access to the repo can see structure but not values.
  • At deploy time, the GitHub Actions runner decrypts the file using SOPS_AGE_KEY_STAGING or SOPS_AGE_KEY_PROD (GitHub Encrypted Secrets) and scps the plaintext yaml to /etc/galaxy/secrets/galaxy.yaml on the VPS (mode 600), then immediately shreds the runner-side copy.
  • Compose mounts that YAML as a Docker secret via the secrets: block in docker-compose.prod.yml. Containers read it from a file path (/run/secrets/galaxy_yaml), not from an env var.
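
The deploy-time flow in the bullets above can be sketched as a small shell helper. This is an illustration under assumptions, not the contents of the real deploy workflows: the function names and the host argument are hypothetical, and only the file paths and the SOPS_AGE_KEY_STAGING / SOPS_AGE_KEY_PROD names come from this document.

```shell
# Sketch only. Function names and the host argument are illustrative assumptions.

# Print the repo path of the encrypted file for an environment.
enc_path_for_env() {
  case "$1" in
    staging|prod) printf 'secrets/%s/galaxy.enc.yaml\n' "$1" ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# Print the name of the GitHub secret that holds the environment's age key.
age_key_var_for_env() {
  case "$1" in
    staging) echo SOPS_AGE_KEY_STAGING ;;
    prod)    echo SOPS_AGE_KEY_PROD ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# Decrypt on the runner, copy to the VPS with mode 600, then destroy the local copy.
# The body is a subshell so the EXIT trap fires when the function ends, even on error.
ship_secrets() (
  set -eu
  target_env="$1"; host="$2"
  enc_file="$(enc_path_for_env "$target_env")"
  key_var="$(age_key_var_for_env "$target_env")"
  # Indirect lookup of the key; key_var only ever comes from the fixed list above.
  eval "SOPS_AGE_KEY=\${$key_var:?age key not set}"
  export SOPS_AGE_KEY
  plain="$(mktemp)"                      # mktemp creates the file with mode 600
  trap 'shred -u "$plain" 2>/dev/null || rm -f "$plain"' EXIT
  sops --decrypt "$enc_file" > "$plain"
  scp -p "$plain" "$host:/etc/galaxy/secrets/galaxy.yaml"   # -p preserves the mode
  ssh "$host" chmod 600 /etc/galaxy/secrets/galaxy.yaml
)
```

A workflow step would call something like ship_secrets staging deploy@staging-host, where the host name is a placeholder. Keeping the plaintext in a mktemp file with an EXIT trap means a failed scp still removes the runner-side copy.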

What you'll set up

Component | Where it lives
sops + age binaries | local dev box + each VPS + GitHub Actions runners
Age key pairs (one per env) | dev box (private) + GitHub Secrets (private) + repo (public recipient in .sops.yaml)
.sops.yaml | repo root; maps secrets/<env>/*.enc.yaml → recipient
secrets/staging/galaxy.enc.yaml | repo, encrypted
secrets/prod/galaxy.enc.yaml | repo, encrypted
/etc/galaxy/secrets/galaxy.yaml | each VPS; decrypted at deploy time, mode 600

Prerequisites

  • Mac (or Linux) dev box with Homebrew or apt.
  • Repo write access.
  • A safe place to store private age keys (1Password / Bitwarden vault).

Steps — one-time on your dev box

1. Install sops + age

# macOS
brew install sops age

# Debian / Ubuntu (writing to /usr/local/bin needs root, hence sudo)
SOPS_VER=3.9.1
sudo curl -fsSL "https://github.com/getsops/sops/releases/download/v${SOPS_VER}/sops-v${SOPS_VER}.linux.amd64" \
  -o /usr/local/bin/sops && sudo chmod +x /usr/local/bin/sops
AGE_VER=1.2.0
curl -fsSL "https://github.com/FiloSottile/age/releases/download/v${AGE_VER}/age-v${AGE_VER}-linux-amd64.tar.gz" \
  | tar -xz -C /tmp
sudo mv /tmp/age/age /tmp/age/age-keygen /usr/local/bin/

sops --version && age --version

2. Generate one age key pair per environment

mkdir -p ~/.config/sops/age
cd ~/.config/sops/age

age-keygen -o galaxy-dev.key       # local dev (optional but useful)
age-keygen -o galaxy-staging.key
age-keygen -o galaxy-prod.key

# The output looks like:
#   # created: 2026-05-25T...
#   # public key: age1abcdef...
#   AGE-SECRET-KEY-1XYZ...
#
# - The PUBLIC key (age1...) is the recipient — safe to commit.
# - The SECRET key (AGE-SECRET-KEY-1...) MUST NEVER leave this directory
#   without ending up in (a) the team password manager AND (b) GitHub Secrets.

chmod 600 galaxy-*.key

Stash the secret keys in the team password manager (one vault entry per environment) BEFORE moving to step 3. If an age key is lost, files encrypted to it can no longer be decrypted, so every secret in galaxy.enc.yaml has to be regenerated and rotated, which is painful.
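
To collect the public recipients for step 3 without opening the key files by hand, a helper along these lines can read the "# public key:" comment that age-keygen writes. The function names are hypothetical; age-keygen -y <keyfile> is the tool's own way to print the recipient and is preferable when age is installed.

```shell
# Sketch only: helper names are assumptions, not part of the repo.

# Print the recipient (age1...) recorded in the "# public key:" comment of a key file.
age_recipient_from_comment() {
  sed -n 's/^# public key: \(age1[0-9a-z]*\)$/\1/p' "$1" | head -n 1
}

# Print "<env> <recipient>" for every galaxy-<env>.key in a directory.
list_recipients() {
  dir="${1:-$HOME/.config/sops/age}"
  for f in "$dir"/galaxy-*.key; do
    [ -e "$f" ] || continue                 # no key files: the glob stays literal
    env_name="$(basename "$f" .key)"
    env_name="${env_name#galaxy-}"
    printf '%s %s\n' "$env_name" "$(age_recipient_from_comment "$f")"
  done
}
```

Only the comment line is read, so the secret key line never appears in the output.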

3. Commit .sops.yaml to the repo root

See .sops.yaml — already created by Phase 6b. It tells sops which recipient to use for each file:

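# sops compiles these patterns with Go's RE2 engine, which rejects lookaheads
# such as (?!sops_). "Encrypt everything except keys starting with sops_" is
# therefore written as unencrypted_regex.
creation_rules:
  - path_regex: ^secrets/dev/.*\.enc\.yaml$
    unencrypted_regex: ^sops_
    age: <age public key for dev>
  - path_regex: ^secrets/staging/.*\.enc\.yaml$
    unencrypted_regex: ^sops_
    age: <age public key for staging>
  - path_regex: ^secrets/prod/.*\.enc\.yaml$
    unencrypted_regex: ^sops_
    age: <age public key for prod>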

Replace the placeholders with the public keys (age1...) you generated in step 2, then commit + push.
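
Because sops picks a rule by matching the file path against path_regex, it helps to check which rule a given path would hit before encrypting. The sketch below mirrors the three patterns from .sops.yaml with grep -E; the function name is hypothetical and the check is an approximation of sops's own matching, which as far as I know is done on the path relative to the directory holding .sops.yaml.

```shell
# Sketch only: approximates the path_regex lookup for the three rules in .sops.yaml.

# Print the environment whose rule matches the path; return 1 if none does.
sops_rule_for_path() {
  for env_name in dev staging prod; do
    if printf '%s\n' "$1" | grep -Eq "^secrets/${env_name}/.*\.enc\.yaml\$"; then
      printf '%s\n' "$env_name"
      return 0
    fi
  done
  return 1
}
```

For example, secrets/staging/galaxy.enc.yaml resolves to the staging rule, while a scratch path such as /tmp/tmp.abc/galaxy.yaml matches nothing, which is the case where sops refuses to encrypt for lack of a creation rule.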

4. Create the encrypted secret files

mkdir -p secrets/{staging,prod}

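# Run from the repo root so sops can find .sops.yaml.
# Build a plain YAML in a private scratch dir, encrypt with sops, delete plain.
# (mktemp -d is not guaranteed to be tmpfs; on Linux, mktemp -d -p /dev/shm keeps it in RAM.)
SCRATCH=$(mktemp -d)   # not named TMPDIR: that variable steers mktemp and other tools
cat > "$SCRATCH/galaxy.yaml" <<'YAML'
# productgalaxy staging secrets — DECRYPTED FORM, never committed.
database_url: "postgres://galaxy_app:CHANGEME@postgres:5432/galaxy"
mcp_db_url: "postgres://mcp_app:CHANGEME@postgres:5432/galaxy"
better_auth_secret: "REPLACE_WITH_RANDOM_64_CHAR"
oauth_signing_key: "REPLACE_WITH_RANDOM_64_CHAR"
docs_api_token: "REPLACE_WITH_RANDOM_32_CHAR"
b2_key_id: "B2_KEY_ID"
b2_application_key: "B2_APPLICATION_KEY"
pgbackrest_cipher_pass: "REPLACE_WITH_RANDOM_64_CHAR"
smoke_principal_jwt: "smoke principal long-lived JWT (rotate quarterly)"
YAML
# The scratch path matches no path_regex in .sops.yaml, so tell sops which path
# to use for the rule lookup (--filename-override, available in sops 3.9+).
sops --encrypt --filename-override secrets/staging/galaxy.enc.yaml \
  "$SCRATCH/galaxy.yaml" > secrets/staging/galaxy.enc.yaml
shred -u "$SCRATCH/galaxy.yaml"   # GNU coreutils; on macOS install coreutils and use gshred -u
rm -rf "$SCRATCH"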

# Repeat for prod with prod values.

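Verify the encrypted file looks like ciphertext: head secrets/staging/galaxy.enc.yaml should show each key followed by an ENC[AES256_GCM,data:...,iv:...,tag:...,type:str] blob, not the values above, and the file should end with a sops: metadata block. Commit + push.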
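
The eyeball check above can be automated before committing. This is a minimal sketch that assumes the flat key: value layout used in this document; the function name is hypothetical, and it only inspects top-level keys that appear before the sops: metadata block.

```shell
# Sketch only: assumes a flat YAML file with one top-level key per line.

# Return 0 if every top-level value before the "sops:" block is an ENC[...] blob
# and the sops metadata block is present; return 1 otherwise.
looks_encrypted() {
  awk '
    /^sops:/ { seen_meta = 1; exit }          # stop at the metadata block
    /^[A-Za-z_][A-Za-z0-9_]*:/ {
      keys++
      if ($0 !~ /^[A-Za-z_][A-Za-z0-9_]*: ENC\[AES256_GCM,/) bad++
    }
    END { exit !(seen_meta && keys > 0 && bad == 0) }
  ' "$1"
}
```

Run as looks_encrypted secrets/staging/galaxy.enc.yaml && git add secrets/staging/galaxy.enc.yaml, so a file that was redirected from a failed sops call never gets staged.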

Example decrypted secret shape

When sops --decrypt secrets/staging/galaxy.enc.yaml runs (either locally or in the deploy workflow), the result is:

database_url: "postgres://galaxy_app:s3cret@postgres:5432/galaxy"
mcp_db_url: "postgres://mcp_app:m3cret@postgres:5432/galaxy"
better_auth_secret: "f1c7d3...e9"     # 64 random chars
oauth_signing_key: "9a8b7c...41"
docs_api_token: "0e6d2a...c4"
b2_key_id: "K005abcdef..."
b2_application_key: "K005ghijklmn..."
pgbackrest_cipher_pass: "f3a7b1...d2"
smoke_principal_jwt: "eyJhbGc..."     # smoke principal long-lived JWT (rotate quarterly)

docker-compose.prod.yml mounts the WHOLE FILE as a single Docker secret (/run/secrets/galaxy_yaml) and each container reads the keys it needs via a tiny entrypoint that exports them as env vars inside the container's own namespace (not via Compose environment:, which would leak to docker inspect).
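
A minimal sketch of such an entrypoint follows. It is not the project's actual entrypoint: the function names are hypothetical, and the parser assumes the flat layout shown above (one key: "value" pair per line, no nesting, no inline comments on the real file).

```shell
#!/bin/sh
# Sketch only: hypothetical entrypoint helpers for a flat "key: value" YAML secret file.

# Print the value of one top-level key, with surrounding double quotes removed.
secret_value() {
  file="$1"; key="$2"
  sed -n "s/^${key}:[[:space:]]*//p" "$file" | head -n 1 \
    | sed -e 's/[[:space:]]*$//' -e 's/^"\(.*\)"$/\1/'
}

# Export each requested key as an upper-cased env var (database_url -> DATABASE_URL).
# Fails if a requested key is missing or empty.
export_secrets() {
  file="$1"; shift
  for key in "$@"; do
    value="$(secret_value "$file" "$key")"
    [ -n "$value" ] || { echo "missing secret: $key" >&2; return 1; }
    var="$(printf '%s' "$key" | tr '[:lower:]' '[:upper:]')"
    export "$var=$value"
  done
}

# Typical tail of the entrypoint (commented out in this sketch):
# export_secrets /run/secrets/galaxy_yaml database_url better_auth_secret
# exec "$@"
```

Each service passes only the keys it needs, so a container never holds secrets meant for another service in its environment.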

5. Stash the age private keys in GitHub Secrets

# from your dev box
gh secret set SOPS_AGE_KEY_STAGING < ~/.config/sops/age/galaxy-staging.key
gh secret set SOPS_AGE_KEY_PROD    < ~/.config/sops/age/galaxy-prod.key

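GitHub Encrypted Secrets are sealed with libsodium before they are stored and are only exposed to workflow runs that reference them. GitHub masks exact matches of the value in logs, but masking is best-effort: a transformed value (for example base64-encoded) is not masked, so never print a secret in a step.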

6. Install sops + age on each VPS

The deploy workflows decrypt on the runner and ship plaintext to the VPS, so the VPS doesn't strictly need sops. Install it anyway: it's used by the operator's emergency rollback runbook and by /galaxy:rollback.

# on each VPS: repeat the Debian / Ubuntu download commands from step 1.
# sops is generally not packaged in the stock apt repositories; age is on recent releases:
sudo apt-get install -y age

Rotation cadence

  • Age keys: rotate yearly minimum (calendar reminder in ops cycle). Rotation procedure:
    1. Generate a new key (age-keygen -o galaxy-prod-2027.key).
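    2. Add the new recipient to .sops.yaml next to the old one (comma-separated under age:) for a grace period.
    3. Run sops updatekeys secrets/prod/galaxy.enc.yaml to re-encrypt the file's data key to both recipients. Commit + push.
    4. Update SOPS_AGE_KEY_PROD in GitHub Secrets to the new private key.
    5. After 7 days, remove the old recipient from .sops.yaml; re-run sops updatekeys; commit + push. The old key can no longer decrypt new revisions, but it still decrypts older revisions in git history, so also rotate the secret values if the old key may have been exposed.
  • Individual secret values (DB passwords, B2 keys, JWT signing keys): rotate quarterly or immediately on personnel change. Use sops secrets/prod/galaxy.enc.yaml (with SOPS_AGE_KEY_FILE pointing at the prod key file, since sops only auto-loads keys.txt) to open an editor with the decrypted file, change values, and save; sops re-encrypts on close.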
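
The yearly cadence can be checked mechanically from the "# created:" comment that age-keygen writes into each key file. The sketch below is an assumption-laden helper, not an existing tool: the function names are hypothetical, and date -d is GNU date syntax (on macOS, gdate from coreutils).

```shell
# Sketch only: relies on the "# created:" line in the key file and on GNU date.

# Print the whole number of days between the key's creation time and "now".
# The optional second argument is "now" as epoch seconds (defaults to the current time).
key_age_days() {
  key_file="$1"
  now_epoch="${2:-$(date +%s)}"
  created="$(sed -n 's/^# created: //p' "$key_file" | head -n 1)"
  created_epoch="$(date -d "$created" +%s)"
  echo $(( (now_epoch - created_epoch) / 86400 ))
}

# Return 0 if the key is at least 365 days old and therefore due for rotation.
rotation_due() {
  [ "$(key_age_days "$1" "${2:-}")" -ge 365 ]
}
```

For instance, rotation_due ~/.config/sops/age/galaxy-prod.key && echo "rotate prod key" could run from a scheduled job as a backstop to the calendar reminder.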

Troubleshooting

Symptom | Likely cause
sops --decrypt returns "no key found" | SOPS_AGE_KEY (or SOPS_AGE_KEY_FILE) is not set, or the key doesn't match a recipient in the file
sops succeeds but YAML parse fails inside container | a regex rule in .sops.yaml (encrypted_regex / unencrypted_regex) matched a key it should not have; check .sops.yaml
permission denied reading /etc/galaxy/secrets/galaxy.yaml | deploy step left the file with the wrong owner or mode (expected 600); re-deploy
Workflow log shows secret value in plain | someone added echo $secret to a step; rotate immediately
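
For the first symptom, a quick way to tell "key not set" from "wrong key" is to compare the key file's public key with the recipients recorded in the encrypted file's sops: metadata. This sketch assumes the YAML layout sops writes for age recipients (lines of the form "- recipient: age1..."); the function name is hypothetical.

```shell
# Sketch only: compares a key file's public key with the recipients in a sops file.

# Return 0 if the key's recipient is listed in the encrypted file,
# 1 if it is not, 2 if the key file has no "# public key:" comment.
key_matches_file() {
  key_file="$1"; enc_file="$2"
  pub="$(sed -n 's/^# public key: //p' "$key_file" | head -n 1)"
  [ -n "$pub" ] || return 2
  grep -Fq "recipient: $pub" "$enc_file"
}
```

If key_matches_file ~/.config/sops/age/galaxy-staging.key secrets/staging/galaxy.enc.yaml fails, the file was encrypted to a different recipient and exporting the key will not help.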

Hard rules (do NOT bypass)

  • NEVER git commit an unencrypted *.yaml under secrets/. The .gitignore already excludes secrets/**/*.yaml (without .enc.); the pre-commit hook (Phase 6c) re-checks. If you bypass and push: rotate every value the file contained, then purge the file from history with git filter-repo and force-push. Treat the values as compromised even after the purge.
  • NEVER cat /etc/galaxy/secrets/galaxy.yaml in a Claude Code session. Settings.json denies Bash(cat:*galaxy.yaml*); if you find a way around the deny rule, file a bug — that's the three-lock pattern leaking.
  • NEVER paste a secret value into chat / a PR comment / a Slack message. Anything that ends up in a SaaS log is effectively compromised.
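
The first rule lends itself to a mechanical check. The following is a sketch of what such a check could look like, not the actual Phase 6c hook: the function names are hypothetical and the path test covers only the secrets/**/*.yaml pattern named above.

```shell
# Sketch only: not the real Phase 6c hook, just the path check it would need.

# Return 0 if the path is a plaintext YAML under secrets/ (must not be committed).
is_plaintext_secret_path() {
  case "$1" in
    secrets/*.enc.yaml) return 1 ;;   # encrypted form is allowed
    secrets/*.yaml)     return 0 ;;   # any other YAML under secrets/ is plaintext
    *)                  return 1 ;;
  esac
}

# Read staged paths on stdin; report offenders on stderr and return 1 if any exist.
reject_plaintext_secrets() {
  found=0
  while IFS= read -r path; do
    if is_plaintext_secret_path "$path"; then
      echo "refusing to commit plaintext secret: $path" >&2
      found=1
    fi
  done
  return "$found"
}

# In a pre-commit hook:
# git diff --cached --name-only --diff-filter=ACM | reject_plaintext_secrets
```

The encrypted pattern is tested first because secrets/*.yaml would otherwise also match the .enc.yaml files.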
