Cloudflare Tunnel + Access setup (CI → VPS auth)
engineering-docs-operations-cloudflare-tunnel-setup · in engineering/docs/operations · org-wide · updated 2026-06-01 10:19
Frontmatter
- lang: en
- imported_at: 2026-06-01T10:19:42.790Z
- source_path: productgalaxy/docs/operations/cloudflare-tunnel-setup.md
- source_repo: productgalaxy
Status: required for deploy-staging.yml, deploy-prod.yml, deploy-prod-rollback.yml.
Owner: ops.
Last reviewed: 2026-05-25.
Why this exists
GitHub Actions needs to reach the staging + prod VPSes to run haloy deploy
and to push SOPS-decrypted secrets to /etc/galaxy/secrets/galaxy.yaml.
The two obvious options — long-lived SSH deploy keys or a deploy-user
password stored as a GitHub Secret — are both rejected by ADR-003 §"CI → VPS
auth" and CLAUDE.md §14:
- Long-lived SSH keys are the PocketOS class of incident: a broadly-scoped credential gets discovered in an unrelated file (a CLI config, a backup, a CI artifact) and grants instant root.
- A deploy-user password is even worse — keystroke-loggable, no per-action scoping, never auto-rotated.
Cloudflare Tunnel + Cloudflare Access service tokens replace both:
- The VPS runs cloudflared and originates an outbound tunnel to Cloudflare. No inbound SSH port is open (port 22 can be firewalled off entirely).
- GitHub Actions authenticates to Cloudflare Access using a service token (CF-Access-Client-Id + CF-Access-Client-Secret), not an SSH key. Cloudflare proxies the SSH session to the VPS.
- The service token is revocable in one click from the Cloudflare dashboard. Granting access doesn't require touching the VPS.
- All sessions are logged in Cloudflare Access logs with the service token id, so every deploy is attributable.
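Seen from the CI runner, the flow above amounts to an SSH client whose transport is cloudflared rather than a direct TCP connection. The sketch below is a minimal illustration, not the actual workflow step: the function name and output path are hypothetical, and it assumes cloudflared is installed on the runner and that `cloudflared access ssh` picks up the service token from the `TUNNEL_SERVICE_TOKEN_ID` / `TUNNEL_SERVICE_TOKEN_SECRET` environment variables.

```shell
# write_tunnel_ssh_config HOST USER OUTFILE   (hypothetical helper)
# Writes an OpenSSH config stanza that sends SSH for HOST through
# `cloudflared access ssh` instead of a direct connection to port 22.
write_tunnel_ssh_config() {
  host=$1
  user=$2
  outfile=$3
  cat > "$outfile" <<EOF
Host $host
  User $user
  ProxyCommand cloudflared access ssh --hostname %h
EOF
  # Keep the generated config private to the current user.
  chmod 600 "$outfile"
}

# Usage (assumed variable names; the values come from the CI secret store):
#   export TUNNEL_SERVICE_TOKEN_ID="$CLOUDFLARE_ACCESS_CLIENT_ID"
#   export TUNNEL_SERVICE_TOKEN_SECRET="$CLOUDFLARE_ACCESS_CLIENT_SECRET"
#   write_tunnel_ssh_config ssh.staging.galaxy.example.com deploy ./ssh_config
#   ssh -F ./ssh_config ssh.staging.galaxy.example.com true
```

Because the credential lives only in environment variables for the duration of the job, nothing key-like is written to the runner's disk.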
What you'll set up
| Component | Where it runs |
|---|---|
| cloudflared tunnel | each VPS (staging + prod), as a systemd unit |
| Cloudflare Tunnel routes | Cloudflare dashboard → Zero Trust → Networks → Tunnels |
| Cloudflare Access application | Cloudflare dashboard → Zero Trust → Access → Applications |
| Access service token | Cloudflare dashboard → Zero Trust → Access → Service Auth |
| GitHub Actions secrets | GitHub → repo → Settings → Secrets |
Prerequisites
- A Cloudflare zone for galaxy.example.com already onboarded.
- A Cloudflare Zero Trust team (free tier for up to 50 users covers this).
- Root or sudo access on each VPS.
- An ops device with cloudflared installed locally for the one-time setup.
Steps — VPS side (run on each VPS, staging + prod)
1. Install cloudflared
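# Debian / Ubuntu. "bookworm" below is the Debian 12 codename; substitute
# your release's codename (see `lsb_release -cs`) on other releases.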
curl -fsSL https://pkg.cloudflare.com/cloudflare-main.gpg \
| sudo tee /usr/share/keyrings/cloudflare-main.gpg >/dev/null
echo 'deb [signed-by=/usr/share/keyrings/cloudflare-main.gpg] https://pkg.cloudflare.com/cloudflared bookworm main' \
| sudo tee /etc/apt/sources.list.d/cloudflared.list
sudo apt-get update
sudo apt-get install -y cloudflared
cloudflared --version
2. Login + create the tunnel
# Opens a browser link to authorize cloudflared against your CF account.
sudo cloudflared tunnel login
# Create one tunnel per environment. Names are arbitrary but must be unique.
sudo cloudflared tunnel create galaxy-staging # on staging VPS
sudo cloudflared tunnel create galaxy-prod # on prod VPS
The create command outputs the tunnel UUID and writes credentials to
/root/.cloudflared/<UUID>.json. Keep this file 0600; treat it like an SSH
private key (it lives only on the VPS, never in git).
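The permission requirement above can be checked mechanically. This is a minimal sketch under stated assumptions: `check_cred_mode` is a hypothetical helper name, and it relies on GNU `stat -c`, which is what Debian and Ubuntu ship.

```shell
# check_cred_mode FILE   (hypothetical helper)
# Succeeds only if FILE exists and its permission bits are exactly 600.
check_cred_mode() {
  file=$1
  if [ ! -f "$file" ]; then
    echo "missing: $file" >&2
    return 1
  fi
  mode=$(stat -c '%a' "$file") || return 1
  if [ "$mode" != "600" ]; then
    echo "insecure mode $mode on $file (want 600)" >&2
    return 1
  fi
  echo "ok: $file"
}

# Usage, from a root shell on the VPS (the path uses the doc's placeholder):
#   check_cred_mode "/root/.cloudflared/<UUID>.json"
```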
3. Configure the tunnel routes (/etc/cloudflared/config.yml)
# /etc/cloudflared/config.yml (staging VPS example)
tunnel: <UUID>  # the value from `tunnel create`
credentials-file: /root/.cloudflared/<UUID>.json

# Origin connections: cloudflared listens for tunnel-side requests on these
# hostnames and forwards to the local services. The first ingress rule that
# matches wins; the final `service: http_status:404` is the required catch-all.
ingress:
  # SSH for CI → VPS deploys (this is what GitHub Actions reaches via
  # `cloudflared access ssh --hostname ssh.staging.galaxy.example.com`).
  - hostname: ssh.staging.galaxy.example.com
    service: ssh://localhost:22

  # Public HTTP origins. Caddy listens on :80 / :443 inside the VPS; the
  # tunnel terminates TLS at Cloudflare's edge and re-originates HTTP to Caddy.
  - hostname: api-staging.galaxy.example.com
    service: http://localhost:80
  - hostname: mcp-staging.galaxy.example.com
    service: http://localhost:80
  - hostname: docs-staging.galaxy.example.com
    service: http://localhost:80

  # Catch-all
  - service: http_status:404
For prod, drop the staging label from each hostname: use ssh.galaxy.example.com,
api.galaxy.example.com, mcp.galaxy.example.com, and docs.galaxy.example.com.
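Since the first matching ingress rule wins, it can be worth checking the file before starting the service. The sketch below wraps cloudflared's `tunnel ingress validate` and `tunnel ingress rule` subcommands; the wrapper name `check_ingress` is hypothetical, and passing the file via `--config` on the `tunnel` command is an assumption about how the config is located on your host.

```shell
# check_ingress CONFIG URL   (hypothetical helper)
# Validates the ingress section of CONFIG, then prints which rule URL matches.
check_ingress() {
  config=$1
  url=$2
  cloudflared tunnel --config "$config" ingress validate || return 1
  cloudflared tunnel --config "$config" ingress rule "$url"
}

# Usage on the staging VPS:
#   check_ingress /etc/cloudflared/config.yml https://api-staging.galaxy.example.com
```

A URL that falls through to the catch-all is a hint that a hostname is misspelled in the config.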
4. Create DNS records that point at the tunnel
sudo cloudflared tunnel route dns galaxy-staging ssh.staging.galaxy.example.com
sudo cloudflared tunnel route dns galaxy-staging api-staging.galaxy.example.com
sudo cloudflared tunnel route dns galaxy-staging mcp-staging.galaxy.example.com
sudo cloudflared tunnel route dns galaxy-staging docs-staging.galaxy.example.com
This creates CNAMEs <hostname> → <UUID>.cfargotunnel.com. They're proxied
(orange-clouded) automatically.
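The four commands above differ only in the hostname, so the same routing can be expressed as a loop, which also covers the prod hostnames from step 3. This is a sketch: `route_hostnames` is a hypothetical helper, and it assumes it runs in a root shell so that cloudflared finds the login certificate written by step 2.

```shell
# route_hostnames TUNNEL HOST...   (hypothetical helper)
# Creates one proxied DNS record per HOST pointing at TUNNEL.
# Stops at the first failure so a typo does not go unnoticed.
route_hostnames() {
  tunnel=$1
  shift
  for host in "$@"; do
    cloudflared tunnel route dns "$tunnel" "$host" || return 1
  done
}

# Usage on the prod VPS:
#   route_hostnames galaxy-prod \
#     ssh.galaxy.example.com api.galaxy.example.com \
#     mcp.galaxy.example.com docs.galaxy.example.com
```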
5. Run cloudflared as a systemd service
sudo cloudflared service install
sudo systemctl enable --now cloudflared
sudo systemctl status cloudflared # expect "active (running)"
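If this step is scripted rather than typed, a short polling loop can stand in for eyeballing the status output. A minimal sketch, assuming the unit is named cloudflared as above; `wait_for_unit` and its retry count are hypothetical.

```shell
# wait_for_unit UNIT [TRIES]   (hypothetical helper)
# Polls once per second until UNIT is active, giving up after TRIES attempts.
wait_for_unit() {
  unit=$1
  tries=${2:-10}
  while [ "$tries" -gt 0 ]; do
    if systemctl is-active --quiet "$unit"; then
      echo "$unit is active"
      return 0
    fi
    tries=$((tries - 1))
    sleep 1
  done
  echo "$unit did not become active" >&2
  return 1
}

# Usage:
#   wait_for_unit cloudflared 15 || sudo journalctl -u cloudflared -n 50
```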
6. Firewall off port 22 from the public internet
Once an SSH session over the tunnel has been confirmed to work, close the public port (doing this first risks locking yourself out):
sudo ufw deny 22/tcp comment "SSH via Cloudflare Tunnel only"
sudo ufw status
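Verify from off-network: ssh deploy@<VPS_IP> should hang or time out, while
SSH through cloudflared access ssh --hostname ssh.staging.galaxy.example.com
should still connect. Once the Access application from step 7 is in place, that
path also requires Access authentication (a browser login for humans, the
service token for CI).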
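The direct-connection half of that check can be scripted from any machine outside the VPS network. The sketch below is one way to do it, assuming bash and coreutils `timeout` are available; `port_open` is a hypothetical helper, and a dropped (firewalled) connection shows up as a timeout rather than a refusal.

```shell
# port_open HOST PORT   (hypothetical helper)
# Exits 0 if a TCP connection to HOST:PORT succeeds within 5 seconds,
# non-zero if it is refused or times out.
port_open() {
  timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

# Usage (VPS_IP is a placeholder for the server's public address):
#   if port_open "$VPS_IP" 22; then
#     echo "port 22 is still reachable from the internet" >&2
#   else
#     echo "port 22 is closed to direct connections"
#   fi
```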
Steps — Cloudflare dashboard side
7. Create the Access application
Zero Trust → Access → Applications → Add an application → Self-hosted →
- Application name: galaxy-staging-ssh (and a second galaxy-prod-ssh)
- Session duration: 15 minutes (matches our short-lived JWT pattern)
- Application domain: ssh.staging.galaxy.example.com
- Identity providers: GitHub (for human ops) + Service Auth (for CI)
Add a policy:
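- Policy name: ci-deploy-bot
- Action: Service Auth (an Allow policy sends service-token requests to the identity-provider login instead of admitting them)
- Selector: Service Token is galaxy-staging-ci (created below)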
For human ops, add a second policy:
- Policy name: ops-engineers
- Action: Allow
- Selector: Emails ending in @sabaidea.com + Country is your home country
8. Create the service tokens
Zero Trust → Access → Service Auth → Create Service Token →
- Token name: galaxy-staging-ci (and a second galaxy-prod-ci)
- Duration: Non-expiring (we'll rotate yearly by hand; see below)
Save the displayed Client ID and Client Secret — the secret is shown only once. Stash them in your password manager too as a backup.
9. Add the tokens to GitHub Actions secrets
# from your dev box, with the gh CLI authenticated to the productgalaxy repo
gh secret set CLOUDFLARE_ACCESS_CLIENT_ID --body 'XXXXXXXX.access'
gh secret set CLOUDFLARE_ACCESS_CLIENT_SECRET --body 'YYYY...'
gh secret set STAGING_TUNNEL_HOST --body 'ssh.staging.galaxy.example.com'
gh secret set STAGING_DEPLOY_USER --body 'deploy'
gh secret set STAGING_HALOY_TARGET --body 'staging'
gh secret set PROD_TUNNEL_HOST --body 'ssh.galaxy.example.com'
gh secret set PROD_DEPLOY_USER --body 'deploy'
gh secret set PROD_HALOY_TARGET --body 'prod'
Note: today we use the same CLOUDFLARE_ACCESS_CLIENT_ID /
CLOUDFLARE_ACCESS_CLIENT_SECRET pair across staging and prod for
simplicity (one service token per VPS would be even better; track that as a
hardening followup). If you split them, name them
CLOUDFLARE_ACCESS_CLIENT_ID_STAGING etc. and update the workflows.
10. Verify
gh secret list | grep CLOUDFLARE
gh secret list | grep TUNNEL
gh secret list | grep DEPLOY
gh secret list | grep HALOY
Then trigger a workflow_dispatch on deploy-staging.yml with a known-good
SHA. The configure cloudflared SSH proxy step should succeed; the
haloy deploy step should produce its usual layer-diff output. If cloudflared
reports 401 or 403, check the token values in the GitHub secrets and the
service token policy on the Access application (step 7); the Troubleshooting
table separates the two cases.
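The four grep checks can be folded into a single pass that names exactly which secrets are absent. This is a sketch, assuming `gh secret list` prints one secret per line with the name in the first column; `missing_secrets` is a hypothetical helper.

```shell
# missing_secrets NAME...   (hypothetical helper)
# Reads `gh secret list` output on stdin and prints each NAME that is absent.
# Returns non-zero if at least one name is missing.
missing_secrets() {
  listing=$(cat)
  missing=0
  for name in "$@"; do
    # Compare against the first whitespace-separated column, whole-line match.
    if ! printf '%s\n' "$listing" | awk '{print $1}' | grep -qxF "$name"; then
      echo "missing: $name"
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   gh secret list | missing_secrets \
#     CLOUDFLARE_ACCESS_CLIENT_ID CLOUDFLARE_ACCESS_CLIENT_SECRET \
#     STAGING_TUNNEL_HOST STAGING_DEPLOY_USER STAGING_HALOY_TARGET \
#     PROD_TUNNEL_HOST PROD_DEPLOY_USER PROD_HALOY_TARGET
```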
Rotation cadence
- Service tokens: rotate yearly (calendar reminder on the quarterly /jaan-to:detect-pack re-audit cadence).
- Tunnel credentials (/root/.cloudflared/<UUID>.json): re-issued automatically by cloudflared on certificate rollover; no action needed unless we change tunnel UUIDs.
- DNS records: untouched unless the tunnel UUID changes.
To rotate a service token: Zero Trust → Access → Service Auth → … menu →
Refresh. Then immediately gh secret set the new values. Old tokens
keep working for 5 minutes for in-flight requests; new tokens take effect
within seconds.
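Because the old values stop working soon after rotation, pushing both new values in one go helps avoid a half-updated pair. The sketch below feeds each value to `gh secret set` on stdin (which it reads when `--body` is omitted), so the secret does not land in shell history. `push_rotated_token` and the `NEW_CLIENT_ID` / `NEW_CLIENT_SECRET` variable names are hypothetical, and the gh CLI is assumed to be authenticated against the repo.

```shell
# push_rotated_token   (hypothetical helper)
# Reads the new token pair from NEW_CLIENT_ID / NEW_CLIENT_SECRET and
# updates both GitHub Actions secrets, stopping if the first update fails.
push_rotated_token() {
  if [ -z "${NEW_CLIENT_ID:-}" ] || [ -z "${NEW_CLIENT_SECRET:-}" ]; then
    echo "set NEW_CLIENT_ID and NEW_CLIENT_SECRET first" >&2
    return 1
  fi
  printf '%s' "$NEW_CLIENT_ID" | gh secret set CLOUDFLARE_ACCESS_CLIENT_ID || return 1
  printf '%s' "$NEW_CLIENT_SECRET" | gh secret set CLOUDFLARE_ACCESS_CLIENT_SECRET
}

# Usage (paste the values at the prompts instead of typing them inline):
#   read -r NEW_CLIENT_ID
#   read -r NEW_CLIENT_SECRET
#   push_rotated_token && unset NEW_CLIENT_ID NEW_CLIENT_SECRET
```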
Troubleshooting
| Symptom | Likely cause |
|---|---|
| cloudflared access ssh returns 401 immediately | Service token revoked or wrong Client-Id/Client-Secret in env vars |
| cloudflared access ssh returns 403 | Access application policy doesn't match service token (check step 7) |
| Tunnel UP but ssh still times out | ingress rule pointing at wrong port; check /etc/cloudflared/config.yml |
| GitHub Actions step hangs in cloudflared access ssh-config | runner can't reach *.cloudflareaccess.com; very rare runner network issue, re-run |
| cloudflared tunnel run errors connection refused | sshd on the VPS not running (sudo systemctl status ssh) |