ca-grow-ops-manager/docs/troubleshooting-dns-wildcards.md
fullsizemalt 28d8e9e4a2
docs: Add agent-optimized debugging workflows for DNS/Routing
2025-12-09 08:54:51 -08:00


# Troubleshooting: The Cloudflare Wildcard Trap
**Symptom**: New subdomains (e.g., `newapp.runfoo.run`) redirect to the wrong service (e.g., Alertmanager) or to a different server entirely, even though the local configuration (Traefik labels) is correct. Traefik logs show **zero** evidence of the request reaching the server.

**Root Cause**: **Missing A-Record + Wildcard Fallback**.

Cloudflare DNS has a wildcard (`*`) record pointing to a *legacy* or *different* server IP. When you deploy a new app but forget to add its specific `A` record, the new hostname matches the wildcard instead, so Cloudflare resolves it to the wrong server.
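One way to confirm that the wildcard is answering for a name is to compare its resolution with that of a random label that certainly has no record of its own. The sketch below assumes `dig` is installed; the function names `resolve_ip`, `classify_resolution`, and `check_host` are illustrative and not part of any existing tooling.

```bash
#!/bin/bash
# Sketch: decide whether a hostname is being answered by the wildcard record.
# Assumption: `dig` is available. All function names here are hypothetical.

# Resolve a hostname to its first IPv4 address (prints nothing if there is none).
resolve_ip() {
  dig +short A "$1" | grep -E '^[0-9]+(\.[0-9]+){3}$' | head -n 1
}

# Classify a resolution result.
#   $1 = IP the hostname resolved to
#   $2 = IP of the server that should receive the traffic
#   $3 = IP that a random, never-configured label resolved to (the wildcard answer)
classify_resolution() {
  local actual="$1" expected="$2" wildcard="$3"
  if [ -z "$actual" ]; then
    echo "no-record"
  elif [ "$actual" = "$expected" ]; then
    echo "ok"
  elif [ -n "$wildcard" ] && [ "$actual" = "$wildcard" ]; then
    echo "wildcard"
  else
    echo "other"
  fi
}

# Wiring that performs real DNS lookups, e.g.:
#   check_host newapp.runfoo.run 216.158.230.94
check_host() {
  local host="$1" expected="$2"
  local zone="${host#*.}"                          # newapp.runfoo.run -> runfoo.run
  local probe="wildcard-probe-$RANDOM$RANDOM.$zone" # label assumed to have no record
  classify_resolution "$(resolve_ip "$host")" "$expected" "$(resolve_ip "$probe")"
}
```

A result of `wildcard` suggests the hostname has no record of its own. Proxied (orange-cloud) records resolve to Cloudflare edge addresses rather than the origin, so this comparison is only meaningful for DNS-only records.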
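
**Diagnosis Steps**:

1. **Trace Request**:
   `curl -H "User-Agent: TRACE_TEST" https://your-domain.runfoo.run`
   Check `docker logs traefik 2>&1 | grep TRACE_TEST`. This requires Traefik access logging to be enabled; it is off by default.
   - **Logs Found**: The request reached this server, so the issue is local (Traefik config).
   - **No Logs**: The request never arrived, so the issue is upstream (DNS/Cloudflare).
2. **Verify DNS Resolution**:
   `host your-domain.runfoo.run`
   Compare the returned IP with your target server IP (`216.158.230.94` for Nexus-Vector). If the record is proxied through Cloudflare (orange cloud), `host` returns Cloudflare edge IPs rather than the origin, so check the record in the Cloudflare dashboard instead.
   - **Mismatch**: The name is most likely resolving via the wildcard record.
3. **Operational Matrix**:
   Use the script below to audit the server's ingress state and prove the server is "innocent".
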
## Operational Matrix Script (`map_server.sh`)
Save this as `map_server.sh` and run it on the server to see what is actually running.
```bash
#!/bin/bash
echo "=== OPERATIONAL MATRIX: SERVER SIDE INGRESS ==="
echo "Generated at: $(date)"
echo "---------------------------------------------------"
echo "[1] NATIVE PORTS (Who owns Port 80/443?)"
# Match the local address column exactly, so ports such as 8080 or 4430 are not reported.
sudo ss -tulpn | awk '$5 ~ /:(80|443)$/ {print $1, $5, $7}'
echo ""
echo "[2] DOCKER CONTAINERS (Name + ID + Ports + Status)"
docker ps --format "table {{.Names}}\t{{.ID}}\t{{.Ports}}\t{{.Status}}"
echo ""
echo "[3] TRAEFIK ROUTERS (Label: traefik.http.routers.*.rule)"
# List every traefik.http.routers.<name>.rule label, not only a fixed set of router names.
for c in $(docker ps -q); do
  name=$(docker inspect --format '{{.Name}}' "$c")
  docker inspect --format '{{range $k, $v := .Config.Labels}}{{printf "%s=%s\n" $k $v}}{{end}}' "$c" \
    | grep -E '^traefik\.http\.routers\.[^.]+\.rule=' \
    | sed "s|^|$name |"
done
echo ""
echo "[4] NGINX PROXY (Environment: VIRTUAL_HOST)"
docker inspect $(docker ps -q) --format '{{.Name}} {{range $e := .Config.Env}}{{if ge (len $e) 12}}{{if eq (slice $e 0 12) "VIRTUAL_HOST"}} {{$e}} {{end}}{{end}}{{end}}' | grep VIRTUAL_HOST
echo ""
echo "---------------------------------------------------"
echo "MATRIX COMPLETE."
```
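
Once the matrix is generated, one way to show that the server side is configured is to check whether any router rule names the affected hostname. This is a minimal sketch: `has_router_for` is a hypothetical helper, and it assumes the rules quote hostnames in backticks, as in ``Host(`name`)``.

```bash
#!/bin/bash
# Sketch: check whether any router rule in the matrix output mentions a hostname.
# Assumption: rules quote hostnames in backticks, e.g. Host(`app.example.com`).
# `has_router_for` is a hypothetical helper, not part of map_server.sh.

# Reads matrix output on stdin; succeeds if a "rule=" line names the host in $1.
has_router_for() {
  local host="$1"
  grep -F 'rule=' | grep -qF "\`$host\`"
}
```

With the function defined in the current shell, `./map_server.sh | has_router_for newapp.runfoo.run && echo "router present"` reports whether a rule exists. If a rule is present but requests still never appear in the Traefik logs, the problem is upstream of the server.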
## Solution
1. Log in to Cloudflare.
2. Add an **A Record** for the specific subdomain.
   - **Name**: `subdomain` (e.g., `777wolfpack`)
   - **Content**: `216.158.230.94` (Nexus-Vector IP)
   - **Proxy**: DNS Only (Grey) or Proxied (Orange).
3. Wait for propagation, then re-run `host your-domain.runfoo.run` to confirm the name no longer resolves to the wildcard IP.
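
Waiting for propagation can be scripted as a polling loop. The following is a sketch under stated assumptions: `dig` is installed, the record is DNS-only (a proxied record resolves to Cloudflare edge IPs, not the origin), and `wait_for_dns` and `RESOLVER_CMD` are hypothetical names introduced for illustration.

```bash
#!/bin/bash
# Sketch: poll until a hostname resolves to the expected IP or attempts run out.
# Assumptions: `dig` is installed and the record is DNS-only (grey cloud).
# `wait_for_dns` and `RESOLVER_CMD` are hypothetical names.

# Command used to resolve a name; override it to test without network access.
RESOLVER_CMD="${RESOLVER_CMD:-dig +short A}"

# $1 = hostname, $2 = expected IP,
# $3 = max attempts (default 30), $4 = seconds between attempts (default 10)
wait_for_dns() {
  local host="$1" expected="$2" attempts="${3:-30}" delay="${4:-10}"
  local i=1 ip=""
  while [ "$i" -le "$attempts" ]; do
    # Intentionally unquoted so that "dig +short A" splits into command and arguments.
    ip=$($RESOLVER_CMD "$host" | head -n 1)
    if [ "$ip" = "$expected" ]; then
      echo "resolved: $host -> $ip (attempt $i)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "timeout: $host last resolved to '${ip:-nothing}', expected $expected" >&2
  return 1
}
```

For example, `wait_for_dns 777wolfpack.runfoo.run 216.158.230.94` checks every 10 seconds for up to 30 attempts and exits non-zero if the name never resolves to the Nexus-Vector IP.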