ca-grow-ops-manager/docs/troubleshooting-dns-wildcards.md
fullsizemalt 28d8e9e4a2
docs: Add agent-optimized debugging workflows for DNS/Routing
2025-12-09 08:54:51 -08:00

Troubleshooting: The Cloudflare Wildcard Trap

Symptom: A new subdomain (e.g., newapp.runfoo.run) serves the wrong service (e.g., Alertmanager) or lands on a different server entirely, even though the local configuration (Traefik labels) is correct. Traefik logs show no evidence of the request ever reaching the server.

Root Cause: Missing A record + wildcard fallback. The Cloudflare DNS zone has a wildcard (*) record pointing to a legacy or otherwise different server IP. When you deploy a new app but forget to add its specific A record, the hostname matches the wildcard record instead, so DNS resolves it to the wrong server and the traffic never reaches yours.
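
One way to confirm the wildcard is answering is to resolve a random label that nobody ever created: it can only resolve through the wildcard, so if the new subdomain returns the same address, it has no record of its own. The sketch below assumes dig is installed and that the records are DNS Only (proxied records return Cloudflare edge addresses, which makes the comparison unreliable). The function names classify_dns and check_wildcard are hypothetical helpers, not part of the original tooling.

```shell
#!/bin/bash
# Pure comparison helper (hypothetical name).
#   $1 = IP returned for the hostname under test
#   $2 = IP returned for a random, never-created label in the same zone
#   $3 = IP the hostname is supposed to resolve to
classify_dns() {
  local actual="$1" probe="$2" expected="$3"
  if [ -z "$actual" ]; then
    echo "NO_RECORD"      # nothing resolved at all
  elif [ "$actual" = "$expected" ]; then
    echo "OK"             # a specific record exists and is correct
  elif [ -n "$probe" ] && [ "$actual" = "$probe" ]; then
    echo "WILDCARD"       # same answer as a name that cannot exist
  else
    echo "MISMATCH"       # wrong IP, but not the wildcard's
  fi
}

# Network wrapper (hypothetical name); requires dig.
#   $1 = full hostname, $2 = zone, $3 = expected server IP
check_wildcard() {
  local fqdn="$1" zone="$2" expected="$3" actual probe
  # Keep only IPv4 lines; dig +short may also print CNAME targets.
  actual=$(dig +short A "$fqdn" | grep -E '^[0-9.]+$' | head -n1)
  probe=$(dig +short A "wildcard-probe-$RANDOM.$zone" | grep -E '^[0-9.]+$' | head -n1)
  classify_dns "$actual" "$probe" "$expected"
}

# Example (not executed here):
#   check_wildcard newapp.runfoo.run runfoo.run 216.158.230.94
```

A result of WILDCARD means the specific A record is missing; MISMATCH suggests a record exists but points at the wrong address.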

Diagnosis Steps:

  1. Trace Request: Run curl -H "User-Agent: TRACE_TEST" https://your-domain.runfoo.run, then check docker logs traefik 2>&1 | grep TRACE_TEST. (This only works if Traefik access logging is enabled; it is off by default.)
    • Logs Found: Issue is local (Traefik config).
    • No Logs: Issue is upstream (DNS/Cloudflare).
  2. Verify DNS Resolution: Run host your-domain.runfoo.run and compare the returned IP with your target server IP (216.158.230.94 for Nexus-Vector).
    • Mismatch: The name is resolving through the wildcard record to a different server.
  3. Operational Matrix: Run the map_server.sh script from the next section on the server to audit its ingress state and confirm that the server itself is not at fault.
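
The trace in step 1 can be sketched as a small helper that tags a request with a unique marker and then looks for that marker in the Traefik logs. This is a minimal sketch under a few assumptions: the container is named traefik, access logging is enabled, and the access log format includes the User-Agent header. The function names trace_verdict and run_trace are hypothetical.

```shell
#!/bin/bash
# Reads log text on stdin and reports where the problem lives.
#   $1 = marker string that was sent as the User-Agent
trace_verdict() {
  local marker="$1"
  if grep -qF -- "$marker"; then
    echo "LOCAL"      # request reached Traefik: inspect router rules/labels
  else
    echo "UPSTREAM"   # request never arrived: inspect DNS/Cloudflare
  fi
}

# Sends one tagged request, then searches recent container logs for it.
#   $1 = URL to test, $2 = container name (defaults to "traefik")
run_trace() {
  local url="$1" container="${2:-traefik}"
  local marker="TRACE_TEST_$(date +%s)"   # unique per run to avoid stale matches
  curl -s -o /dev/null -H "User-Agent: $marker" "$url"
  sleep 1                                  # give the log line time to be written
  docker logs --since 1m "$container" 2>&1 | trace_verdict "$marker"
}

# Example (not executed here):
#   run_trace https://your-domain.runfoo.run
```

Using a timestamped marker rather than a fixed string avoids a false "LOCAL" verdict from an earlier test that is still present in the logs.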

Operational Matrix Script (map_server.sh)

Save this as map_server.sh and run it on the server to see what is actually listening and routing.

#!/bin/bash
echo "=== OPERATIONAL MATRIX: SERVER SIDE INGRESS ==="
echo "Generated at: $(date)"
echo "---------------------------------------------------"

echo "[1] NATIVE PORTS (Who owns Port 80/443?)"
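# Match only ports 80 and 443 exactly (a bare ':80' would also match :8080, :8000, ...)
sudo ss -tulpn | grep -E ':(80|443)[[:space:]]' | awk '{print $1, $5, $7}'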
echo ""

echo "[2] DOCKER CONTAINERS (Name + ID + Ports + Status)"
docker ps --format "table {{.Names}}\t{{.ID}}\t{{.Ports}}\t{{.Status}}"
echo ""

echo "[3] TRAEFIK ROUTERS (Label: traefik.http.routers.*.rule)"
# Print every label as "<container> <key>=<value>", then keep only router rules,
# so newly added routers show up without editing this script.
docker inspect $(docker ps -q) --format '{{range $k, $v := .Config.Labels}}{{printf "%s %s=%s\n" $.Name $k $v}}{{end}}' | grep -E 'traefik\.http\.routers\.[^.]+\.rule='
echo ""

echo "[4] NGINX PROXY (Environment: VIRTUAL_HOST)"
docker inspect $(docker ps -q) --format '{{.Name}} {{range $e := .Config.Env}}{{if ge (len $e) 12}}{{if eq (slice $e 0 12) "VIRTUAL_HOST"}} {{$e}} {{end}}{{end}}{{end}}' | grep VIRTUAL_HOST
echo ""

echo "---------------------------------------------------"
echo "MATRIX COMPLETE."

Solution

  1. Log in to Cloudflare.
  2. Add an A Record for the specific subdomain.
    • Name: subdomain (e.g., 777wolfpack)
    • Content: 216.158.230.94 (Nexus-Vector IP)
    • Proxy: DNS Only (Grey) or Proxied (Orange).
  3. Wait for propagation, then re-run host your-domain.runfoo.run and confirm it no longer returns the wildcard IP.
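
If you prefer to script step 2 instead of using the dashboard, the record can be created through the Cloudflare API's DNS records endpoint. The following is a sketch, not a tested deployment step: the environment variable names CF_ZONE_ID and CF_API_TOKEN and the function names are assumptions, and the token needs DNS edit permission on the zone.

```shell
#!/bin/bash
# Builds the JSON body for an A record (hypothetical helper).
#   $1 = full hostname, $2 = IPv4 address, $3 = proxied flag (true/false, default false)
# ttl 1 means "automatic" in the Cloudflare API.
a_record_payload() {
  local name="$1" ip="$2" proxied="${3:-false}"
  printf '{"type":"A","name":"%s","content":"%s","ttl":1,"proxied":%s}' \
    "$name" "$ip" "$proxied"
}

# Sends the create request (hypothetical helper).
# CF_ZONE_ID and CF_API_TOKEN are assumed to be exported by the caller.
create_a_record() {
  : "${CF_ZONE_ID:?set CF_ZONE_ID}" "${CF_API_TOKEN:?set CF_API_TOKEN}"
  curl -sS -X POST \
    "https://api.cloudflare.com/client/v4/zones/${CF_ZONE_ID}/dns_records" \
    -H "Authorization: Bearer ${CF_API_TOKEN}" \
    -H "Content-Type: application/json" \
    --data "$(a_record_payload "$@")"
}

# Example (not executed here):
#   create_a_record 777wolfpack.runfoo.run 216.158.230.94 false
```

Keeping the payload builder separate from the network call makes it easy to inspect the exact JSON before anything is sent.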