diff --git a/.agent/workflows/debug-dns-routing.md b/.agent/workflows/debug-dns-routing.md
new file mode 100644
index 0000000..6bc771e
--- /dev/null
+++ b/.agent/workflows/debug-dns-routing.md
@@ -0,0 +1,44 @@
+---
+description: Debug and fix Traefik routing issues where the wrong app (e.g., Alertmanager) is served, indicating an upstream DNS/Cloudflare Wildcard conflict.
+---
+
+# Debugging DNS & Routing Conflicts (The Wildcard Trap)
+
+If a subdomain (e.g., `777wolfpack.runfoo.run`) is serving the wrong application (like Alertmanager) and Traefik logs show NO activity for that domain, you are likely hitting a **Cloudflare Wildcard Fallback**.
+
+## Diagnosis Steps
+
+1. **Check Traefik Logs**:
+   `docker logs traefik --tail 50`
+   If you see requests for the domain (this requires Traefik access logs to be enabled), it's a local Traefik config issue.
+   If you see **ZERO requests**, traffic is not reaching this server.
+
+2. **Verify DNS**:
+   `host 777wolfpack.runfoo.run`
+   Compare the returned IP with the server's public IP.
+   - **Match**: Routing issue is local.
+   - **Mismatch**: You are likely hitting a Wildcard (`*`) record pointing to a different server. (A proxied record returns Cloudflare edge IPs instead, so check the record's content in the dashboard in that case.)
+
+3. **Run the Server Matrix**:
+   Use this script to audit which services actually own the ports and hostnames on this server.
+
+   ```bash
+   #!/bin/bash
+   # map_server.sh
+   echo "=== OPERATIONAL MATRIX ==="
+   echo "[1] NATIVE PORTS (Who owns 80/443?)"
+   sudo ss -tulpn | grep -E ':(80|443)[[:space:]]'
+   echo ""
+   echo "[2] VIRTUAL_HOST (Nginx Proxy Check)"
+   docker ps -q | xargs docker inspect --format '{{.Name}} {{range $e := .Config.Env}}{{if ge (len $e) 12}}{{if eq (slice $e 0 12) "VIRTUAL_HOST"}} {{$e}} {{end}}{{end}}{{end}}'
+   echo ""
+   echo "[3] TRAEFIK ROUTERS"
+   docker ps -q | xargs docker inspect --format '{{.Name}} {{range $k, $v := .Config.Labels}}{{if or (eq $k "traefik.http.routers.wolfpack-frontend.rule") (eq $k "traefik.http.routers.aspirant-dashboard.rule")}}{{$k}}={{$v}}{{end}}{{end}}'
+   ```
+
+## The Fix
+
+1. Go to **Cloudflare DNS**.
+2. Add a specific **A Record** for the missing subdomain.
+3. Point it to the **Correct Server IP**.
+4. Wait about a minute, then retest the subdomain.
diff --git a/docs/troubleshooting-dns-wildcards.md b/docs/troubleshooting-dns-wildcards.md
new file mode 100644
index 0000000..f09fd4a
--- /dev/null
+++ b/docs/troubleshooting-dns-wildcards.md
@@ -0,0 +1,59 @@
+# Troubleshooting: The Cloudflare Wildcard Trap
+
+**Symptom**: New subdomains (e.g., `newapp.runfoo.run`) serve an incorrect service (e.g., Alertmanager) or land on a different server entirely, despite correct local configuration (Traefik labels). Traefik logs show **zero** evidence of the request reaching the server.
+
+**Root Cause**: **Missing A-Record + Wildcard Fallback**.
+Cloudflare DNS has a Wildcard (`*`) record pointing to a *Legacy* or *Different* server IP. When you deploy a new app but forget to add its specific `A` record, the subdomain resolves through the Wildcard and traffic goes to the wrong server.
+
+**Diagnosis Steps**:
+
+1. **Trace Request**:
+   `curl -H "User-Agent: TRACE_TEST" https://your-domain.runfoo.run`
+   Check `docker logs traefik | grep TRACE_TEST` (this requires Traefik access logs to be enabled).
+   - **Logs Found**: Issue is local (Traefik config).
+   - **No Logs**: Issue is upstream (DNS/Cloudflare).
+2. **Verify DNS Resolution**:
+   `host your-domain.runfoo.run`
+   Compare the IP with your target server IP (`216.158.230.94` for Nexus-Vector).
+   - **Mismatch**: You are likely hitting the Wildcard IP. (A proxied record returns Cloudflare edge IPs instead, so check the record's content in the dashboard in that case.)
+3. **Operational Matrix**:
+   Use the following script to audit the server's ingress state and confirm that the server itself is configured correctly.
+
+## Operational Matrix Script (`map_server.sh`)
+
+Save this as `map_server.sh` and run it on the server to see what is actually running.
+
+```bash
+#!/bin/bash
+echo "=== OPERATIONAL MATRIX: SERVER SIDE INGRESS ==="
+echo "Generated at: $(date)"
+echo "---------------------------------------------------"
+
+echo "[1] NATIVE PORTS (Who owns Port 80/443?)"
+sudo ss -tulpn | grep -E ':(80|443)[[:space:]]' | awk '{print $1, $5, $7}'
+echo ""
+
+echo "[2] DOCKER CONTAINERS (Name + ID + Ports + Status)"
+docker ps --format "table {{.Names}}\t{{.ID}}\t{{.Ports}}\t{{.Status}}"
+echo ""
+
+echo "[3] TRAEFIK ROUTERS (Labels: traefik.http.routers.<name>.rule, router names hardcoded below)"
+docker inspect $(docker ps -q) --format '{{.Name}} {{range $k, $v := .Config.Labels}}{{if or (eq $k "traefik.http.routers.wolfpack-frontend.rule") (eq $k "traefik.http.routers.aspirant-dashboard.rule") (eq $k "traefik.http.routers.aspirant-api.rule")}}{{$k}}={{$v}}{{end}}{{end}}' | grep "rule="
+echo ""
+
+echo "[4] NGINX PROXY (Environment: VIRTUAL_HOST)"
+docker inspect $(docker ps -q) --format '{{.Name}} {{range $e := .Config.Env}}{{if ge (len $e) 12}}{{if eq (slice $e 0 12) "VIRTUAL_HOST"}} {{$e}} {{end}}{{end}}{{end}}' | grep VIRTUAL_HOST
+echo ""
+
+echo "---------------------------------------------------"
+echo "MATRIX COMPLETE."
+```
+
+## Solution
+
+1. Log in to Cloudflare.
+2. Add an **A Record** for the specific subdomain.
+   - **Name**: `subdomain` (e.g., `777wolfpack`)
+   - **Content**: `216.158.230.94` (Nexus-Vector IP)
+   - **Proxy**: DNS Only (Grey) or Proxied (Orange).
+3. Wait for propagation, then retest the subdomain.
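
The "Verify DNS" step in both documents is a manual comparison of `host` output against the expected server IP. A minimal sketch of how that comparison could be scripted follows. It is not part of this change: the file name `check_dns.sh` and the functions `classify_dns` and `resolve_ipv4` are hypothetical names introduced here for illustration, and the sketch assumes the `host` utility prints its usual `has address` lines.

```bash
#!/bin/bash
# check_dns.sh (hypothetical helper, not part of the patch above)

# classify_dns EXPECTED_IP [RESOLVED_IP...]
# Prints "match" if any resolved address equals the expected server IP,
# "no-answer" if no addresses were passed, and "mismatch" otherwise.
classify_dns() {
  local expected="$1"
  shift
  if [ "$#" -eq 0 ]; then
    echo "no-answer"
    return 0
  fi
  local ip
  for ip in "$@"; do
    if [ "$ip" = "$expected" ]; then
      echo "match"
      return 0
    fi
  done
  echo "mismatch"
}

# resolve_ipv4 HOSTNAME
# Prints the IPv4 addresses reported by `host`, one per line.
resolve_ipv4() {
  host -t A "$1" 2>/dev/null | awk '/has address/ {print $NF}'
}

# Example (needs network access, so it is left commented out):
#   classify_dns 216.158.230.94 $(resolve_ipv4 777wolfpack.runfoo.run)
```

A `mismatch` result is consistent with the wildcard fallback described above, but it is also what a record proxied through Cloudflare would produce, since those resolve to Cloudflare edge addresses rather than the origin. Treat the output as a hint to check the record in the dashboard, not as proof.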