Cloudflare Tunnel + Access setup (CI → VPS auth)

engineering-docs-operations-cloudflare-tunnel-setup · in engineering/docs/operations · org-wide · updated 2026-06-01 10:19

Frontmatter

lang: en
imported_at: 2026-06-01T10:19:42.790Z
source_path: productgalaxy/docs/operations/cloudflare-tunnel-setup.md
source_repo: productgalaxy

Status: required for deploy-staging.yml, deploy-prod.yml, deploy-prod-rollback.yml. Owner: ops. Last reviewed: 2026-05-25.

Why this exists

GitHub Actions needs to reach the staging + prod VPSes to run haloy deploy and to push SOPS-decrypted secrets to /etc/galaxy/secrets/galaxy.yaml. The two obvious options — long-lived SSH deploy keys or a deploy-user password stored as a GitHub Secret — are both rejected by ADR-003 §"CI → VPS auth" and CLAUDE.md §14:

  • Long-lived SSH keys are the PocketOS class of incident: a broadly-scoped credential gets discovered in an unrelated file (a CLI config, a backup, a CI artifact) and grants instant root.
  • A deploy-user password is even worse — keystroke-loggable, no per-action scoping, never auto-rotated.

Cloudflare Tunnel + Cloudflare Access service tokens replace both:

  • The VPS runs cloudflared and originates an outbound tunnel to Cloudflare. No inbound SSH port is open (port 22 can be firewalled off entirely).
  • GitHub Actions authenticates to Cloudflare Access using a service token (sent as the CF-Access-Client-Id and CF-Access-Client-Secret headers), not an SSH key. The token is long-lived (see step 8) but narrowly scoped and revocable. Cloudflare proxies the SSH session to the VPS.
  • The service token is revocable in one click from the Cloudflare dashboard. Granting access doesn't require touching the VPS.
  • All sessions are logged in Cloudflare Access logs with the service token id — every deploy is attributable.
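A minimal sketch of what the CI side of this arrangement might look like, assuming the runner has cloudflared installed. The function name render_ssh_stanza and the usage lines are illustrative and are not taken from the workflows; the two TUNNEL_SERVICE_TOKEN_* variables are the environment equivalents of cloudflared's --service-token-id and --service-token-secret flags.

```shell
#!/bin/sh
# Print an ssh_config stanza that routes SSH for one host through
# `cloudflared access ssh`. ssh expands %h to the target hostname.
# cloudflared reads the service token from TUNNEL_SERVICE_TOKEN_ID and
# TUNNEL_SERVICE_TOKEN_SECRET in the environment, so no secret is written here.
render_ssh_stanza() {
  host="$1"   # e.g. ssh.staging.galaxy.example.com
  user="$2"   # e.g. deploy
  cat <<EOF
Host $host
  User $user
  ProxyCommand cloudflared access ssh --hostname %h
EOF
}

# Hypothetical usage inside a CI step (secret names as in step 9):
#   export TUNNEL_SERVICE_TOKEN_ID="$CLOUDFLARE_ACCESS_CLIENT_ID"
#   export TUNNEL_SERVICE_TOKEN_SECRET="$CLOUDFLARE_ACCESS_CLIENT_SECRET"
#   render_ssh_stanza "$STAGING_TUNNEL_HOST" "$STAGING_DEPLOY_USER" >> ~/.ssh/config
#   ssh "$STAGING_TUNNEL_HOST" hostname
```

Keeping the token in environment variables rather than in the generated file means the stanza itself is safe to log or cache.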

What you'll set up

| Component | Where it runs |
| --- | --- |
| cloudflared tunnel | each VPS (staging + prod), as a systemd unit |
| Cloudflare Tunnel routes | Cloudflare dashboard → Zero Trust → Networks → Tunnels |
| Cloudflare Access application | Cloudflare dashboard → Zero Trust → Access → Applications |
| Access service token | Cloudflare dashboard → Zero Trust → Access → Service Auth |
| GitHub Actions secrets | GitHub → repo → Settings → Secrets |

Prerequisites

  • A Cloudflare zone for galaxy.example.com already onboarded.
  • A Cloudflare Zero Trust team (free tier for up to 50 users covers this).
  • Root or sudo access on each VPS.
  • An ops device with cloudflared installed locally for the one-time setup.

Steps — VPS side (run on each VPS, staging + prod)

1. Install cloudflared

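# Debian / Ubuntu. The repo line below hardcodes the Debian 12 codename (bookworm);
# substitute your release codename (for example, the output of `lsb_release -cs`).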
curl -fsSL https://pkg.cloudflare.com/cloudflare-main.gpg \
  | sudo tee /usr/share/keyrings/cloudflare-main.gpg >/dev/null
echo 'deb [signed-by=/usr/share/keyrings/cloudflare-main.gpg] https://pkg.cloudflare.com/cloudflared bookworm main' \
  | sudo tee /etc/apt/sources.list.d/cloudflared.list
sudo apt-get update
sudo apt-get install -y cloudflared
cloudflared --version

2. Login + create the tunnel

# Opens a browser link to authorize cloudflared against your CF account.
sudo cloudflared tunnel login

# Create one tunnel per environment. Names are arbitrary but must be unique.
sudo cloudflared tunnel create galaxy-staging   # on staging VPS
sudo cloudflared tunnel create galaxy-prod      # on prod VPS

The create command outputs the tunnel UUID and writes credentials to /root/.cloudflared/<UUID>.json. Keep this file 0600; treat it like an SSH private key (it lives only on the VPS, never in git).
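The permission requirement can be checked mechanically. The following is a small sketch, assuming GNU coreutils stat (as on Debian/Ubuntu); the helper name creds_mode_ok is made up for illustration.

```shell
#!/bin/sh
# Succeed only if the file exists and its permission bits are exactly 600.
# `stat -c %a` is the GNU coreutils form and prints the octal mode.
creds_mode_ok() {
  file="$1"
  [ -f "$file" ] || return 1
  [ "$(stat -c %a "$file")" = "600" ]
}

# Hypothetical usage on the VPS (substitute the real UUID):
#   creds_mode_ok "/root/.cloudflared/$UUID.json" \
#     || sudo chmod 600 "/root/.cloudflared/$UUID.json"
```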

3. Configure the tunnel routes (/etc/cloudflared/config.yml)

# /etc/cloudflared/config.yml  — staging VPS example
tunnel: <UUID>                                  # the value from `tunnel create`
credentials-file: /root/.cloudflared/<UUID>.json

# Origin connections: cloudflared listens for tunnel-side requests on these
# hostnames and forwards to the local services. The first ingress rule that
# matches wins; the final `service: http_status:404` is the required catch-all.
ingress:
  # SSH for CI → VPS deploys (this is what GitHub Actions reaches via
  # `cloudflared access ssh --hostname ssh.staging.galaxy.example.com`).
  - hostname: ssh.staging.galaxy.example.com
    service: ssh://localhost:22

  # Public HTTP origins. Caddy listens on :80 / :443 inside the VPS; the
  # tunnel terminates TLS at Cloudflare's edge and re-originates HTTP to Caddy.
  - hostname: api-staging.galaxy.example.com
    service: http://localhost:80
  - hostname: mcp-staging.galaxy.example.com
    service: http://localhost:80
  - hostname: docs-staging.galaxy.example.com
    service: http://localhost:80

  # Catch-all
  - service: http_status:404

For prod, use ssh.galaxy.example.com, api.galaxy.example.com, mcp.galaxy.example.com, docs.galaxy.example.com (drop the .staging / -staging marker).
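The naming rule for the two environments can be written down as a small helper. This is only a sketch: the function name tunnel_hostnames is hypothetical, and the zone is the example zone used throughout this document.

```shell
#!/bin/sh
# Print the four tunnel hostnames for an environment, following the naming
# in this document: staging carries a "staging" marker, prod omits it.
tunnel_hostnames() {
  target="$1"
  zone="galaxy.example.com"
  case "$target" in
    staging)
      printf '%s\n' "ssh.staging.$zone" "api-staging.$zone" \
                    "mcp-staging.$zone" "docs-staging.$zone" ;;
    prod)
      printf '%s\n' "ssh.$zone" "api.$zone" "mcp.$zone" "docs.$zone" ;;
    *)
      echo "unknown environment: $target" >&2
      return 1 ;;
  esac
}
```

After editing /etc/cloudflared/config.yml, running cloudflared tunnel ingress validate on the VPS is a cheap way to catch a malformed rule list before restarting the service.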

4. Create DNS records that point at the tunnel

sudo cloudflared tunnel route dns galaxy-staging ssh.staging.galaxy.example.com
sudo cloudflared tunnel route dns galaxy-staging api-staging.galaxy.example.com
sudo cloudflared tunnel route dns galaxy-staging mcp-staging.galaxy.example.com
sudo cloudflared tunnel route dns galaxy-staging docs-staging.galaxy.example.com

This creates CNAMEs <hostname> → <UUID>.cfargotunnel.com. They're proxied (orange-clouded) automatically.
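Only the staging commands are listed above; the prod VPS needs the same four routes against galaxy-prod. One way to avoid typos is to generate the commands and review them before running. A sketch, with a hypothetical helper name print_route_dns:

```shell
#!/bin/sh
# Print (do not run) one `cloudflared tunnel route dns` command per hostname,
# so the list can be reviewed before it is executed.
print_route_dns() {
  tunnel="$1"
  shift
  for host in "$@"; do
    printf 'sudo cloudflared tunnel route dns %s %s\n' "$tunnel" "$host"
  done
}

# Hypothetical usage on the prod VPS:
#   print_route_dns galaxy-prod ssh.galaxy.example.com api.galaxy.example.com \
#     mcp.galaxy.example.com docs.galaxy.example.com
# Review the output, then pipe it to `sh` to execute.
```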

5. Run cloudflared as a systemd service

sudo cloudflared service install
sudo systemctl enable --now cloudflared
sudo systemctl status cloudflared   # expect "active (running)"

6. Firewall off port 22 from the public internet

Now that SSH is reachable only via the tunnel, close the public port:

sudo ufw deny 22/tcp comment "SSH via Cloudflare Tunnel only"
sudo ufw status

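Verify from off-network, keeping an existing SSH session open until the tunnel path is confirmed: ssh deploy@<VPS_IP> should hang and time out, while cloudflared access ssh --hostname ssh.staging.galaxy.example.com should reach sshd through the tunnel. Once the Access application from step 7 is in place, that command requires authentication: a browser login for humans, or the service token for CI (passed with --service-token-id and --service-token-secret).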

Steps — Cloudflare dashboard side

7. Create the Access application

Zero Trust → Access → Applications → Add an application → Self-hosted →

  • Application name: galaxy-staging-ssh (and a second galaxy-prod-ssh)
  • Session duration: 15 minutes (matches our short-lived JWT pattern)
  • Application domain: ssh.staging.galaxy.example.com
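  • Identity providers: GitHub (for human ops). CI does not use an identity provider; it authenticates with a service token through the policy below.

Add a policy:

  • Policy name: ci-deploy-bot
  • Action: Service Auth (an Allow policy would still send the request to an identity-provider login, which a service token cannot complete)
  • Selector: Service Token is galaxy-staging-ci (created below)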

For human ops, add a second policy:

  • Policy name: ops-engineers
  • Action: Allow
  • Selector: Include Emails ending in @sabaidea.com, and Require Country is your home country (Include rules are ORed, so the country check belongs under Require)

8. Create the service tokens

Zero Trust → Access → Service Auth → Create Service Token →

  • Token name: galaxy-staging-ci (and a second galaxy-prod-ci)
  • Duration: Non-expiring (we'll rotate yearly by hand — see below)

Save the displayed Client ID and Client Secret — the secret is shown only once. Stash them in your password manager too as a backup.

9. Add the tokens to GitHub Actions secrets

# from your dev box, with the gh CLI authenticated to the productgalaxy repo
gh secret set CLOUDFLARE_ACCESS_CLIENT_ID     --body 'XXXXXXXX.access'
gh secret set CLOUDFLARE_ACCESS_CLIENT_SECRET --body 'YYYY...'
gh secret set STAGING_TUNNEL_HOST             --body 'ssh.staging.galaxy.example.com'
gh secret set STAGING_DEPLOY_USER             --body 'deploy'
gh secret set STAGING_HALOY_TARGET            --body 'staging'
gh secret set PROD_TUNNEL_HOST                --body 'ssh.galaxy.example.com'
gh secret set PROD_DEPLOY_USER                --body 'deploy'
gh secret set PROD_HALOY_TARGET               --body 'prod'

Note: today we use the same CLOUDFLARE_ACCESS_CLIENT_ID / CLOUDFLARE_ACCESS_CLIENT_SECRET pair across staging and prod for simplicity, which means the token stored there must be allowed by the policies on both Access applications. One service token per VPS would be better; track that as a hardening follow-up. If you split them, name them CLOUDFLARE_ACCESS_CLIENT_ID_STAGING etc. and update the workflows.
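Passing the secret with --body leaves it in shell history. A possible alternative, sketched below, is to sanity-check the value and feed it to gh secret set on stdin, which gh reads when --body is omitted. The helper looks_like_client_id is hypothetical and only checks the ".access" suffix visible in the example above, not the token's validity.

```shell
#!/bin/sh
# Heuristic: an Access Client ID is non-empty and ends in ".access"
# (the shape shown in the step 9 example). This does not contact Cloudflare.
looks_like_client_id() {
  case "$1" in
    ?*.access) return 0 ;;
    *)         return 1 ;;
  esac
}

# Hypothetical usage, with CLIENT_ID already set in the environment:
#   looks_like_client_id "$CLIENT_ID" \
#     && printf '%s' "$CLIENT_ID" | gh secret set CLOUDFLARE_ACCESS_CLIENT_ID
```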

10. Verify

gh secret list | grep CLOUDFLARE
gh secret list | grep TUNNEL
gh secret list | grep DEPLOY
gh secret list | grep HALOY

Then trigger a workflow_dispatch on deploy-staging.yml with a known-good SHA. The configure cloudflared SSH proxy step should succeed; the haloy deploy step should produce its usual layer-diff output. If cloudflared reports 403, the service token policy on the Access application is wrong; re-check step 7. A 401 points instead to a revoked token or wrong Client ID / Client Secret values in the secrets from step 9 (see Troubleshooting).

Rotation cadence

  • Service tokens: rotate yearly (calendar reminder on the quarterly /jaan-to:detect-pack re-audit cadence).
  • Tunnel credentials (/root/.cloudflared/<UUID>.json): static; cloudflared does not re-issue this file on its own. No routine action is needed, but if the file may have been exposed, delete and recreate the tunnel, which changes the UUID.
  • DNS records: untouched unless the tunnel UUID changes.

To rotate a service token: Zero Trust → Access → Service Auth → … menu → Rotate (Refresh only extends the token's expiry and does not issue a new secret). Then immediately gh secret set the new values. Old tokens keep working for 5 minutes for in-flight requests; new tokens take effect within seconds.
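Because the tokens are created as non-expiring, nothing in Cloudflare will force the yearly rotation. A reminder check could be sketched as follows, assuming the token's creation date is recorded somewhere by hand (that record is not part of this setup) and GNU date is available; the function name rotation_status is illustrative.

```shell
#!/bin/sh
# Print "due" if at least 365 days separate the two dates, otherwise "ok".
# Dates are YYYY-MM-DD and are interpreted as midnight UTC.
# Relies on GNU `date -d` (Debian/Ubuntu).
rotation_status() {
  created="$1"
  today="$2"
  c=$(date -u -d "$created" +%s) || return 1
  t=$(date -u -d "$today" +%s) || return 1
  days=$(( (t - c) / 86400 ))
  if [ "$days" -ge 365 ]; then echo due; else echo ok; fi
}

# Hypothetical usage:
#   rotation_status 2025-06-01 "$(date -u +%F)"
```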

Troubleshooting

| Symptom | Likely cause |
| --- | --- |
| cloudflared access ssh returns 401 immediately | Service token revoked or wrong Client-Id/Client-Secret in env vars |
| cloudflared access ssh returns 403 | Access application policy doesn't match service token (check step 7) |
| Tunnel UP but ssh still times out | ingress rule pointing at wrong port; check /etc/cloudflared/config.yml |
| GitHub Actions step hangs in cloudflared access ssh-config | runner can't reach *.cloudflareaccess.com; very rare runner network issue, re-run |
| cloudflared tunnel run errors connection refused | sshd on the VPS not running (sudo systemctl status ssh) |
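For the "times out" and "connection refused" rows, it helps to confirm that something on the VPS is actually listening on the port named in the ingress rule. A sketch that parses ss -ltn output follows; the helper name listening_on is hypothetical, and the column layout assumed is that of iproute2's ss.

```shell
#!/bin/sh
# Read `ss -ltn` output on stdin; succeed if any LISTEN socket's local
# address (4th column) ends in ":<port>".
listening_on() {
  port="$1"
  awk -v suffix=":$port" '
    $1 == "LISTEN" {
      addr = $4
      if (substr(addr, length(addr) - length(suffix) + 1) == suffix) found = 1
    }
    END { exit(found ? 0 : 1) }
  '
}

# Hypothetical usage on the VPS:
#   ss -ltn | listening_on 22 || sudo systemctl status ssh
```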


Version history (1)

  • v1 · 2026-06-01 10:19 · "galaxy-docs importer: initial import"