docs: Add agent-optimized debugging workflows for DNS/Routing
parent 1cb84fa898
commit 28d8e9e4a2
2 changed files with 103 additions and 0 deletions
44
.agent/workflows/debug-dns-routing.md
Normal file
@@ -0,0 +1,44 @@
---
description: Debug and fix Traefik routing issues where the wrong app (e.g., Alertmanager) is served, indicating an upstream DNS/Cloudflare Wildcard conflict.
---

# Debugging DNS & Routing Conflicts (The Wildcard Trap)

If a subdomain (e.g., `777wolfpack.runfoo.run`) is serving the wrong application (like Alertmanager) and Traefik logs show NO activity for that domain, you are likely hitting a **Cloudflare Wildcard Fallback**.

## Diagnosis Steps

1. **Check Traefik Logs**:
   `docker logs traefik --tail 50`
   - If you see requests for the domain, it's a local Traefik config issue.
   - If you see **ZERO requests**, traffic is not reaching this server.

   Individual requests only show up here if Traefik's access log is enabled.

2. **Verify DNS**:
   `host 777wolfpack.runfoo.run`
   Compare the returned IP with the server's public IP.
   - **Match**: Routing issue is local.
   - **Mismatch**: You are hitting a Wildcard (`*`) record pointing to a different server.

   If the record is proxied through Cloudflare, `host` returns Cloudflare edge IPs instead, so this comparison is only conclusive for DNS-only records.

3. **Run the Server Matrix**:
   Use this script to audit exactly what the server thinks it is doing.

```bash
#!/bin/bash
# map_server.sh: audit what this server is actually listening on and routing.
echo "=== OPERATIONAL MATRIX ==="

# Match only ports 80 and 443 (a bare ':80' would also match :8080).
echo "[1] NATIVE PORTS (Who owns 80/443?)"
sudo ss -tulpn | grep -E ':(80|443)[[:space:]]'
echo ""

# List containers that set a VIRTUAL_HOST environment variable (nginx-proxy convention).
echo "[2] VIRTUAL_HOST (Nginx Proxy Check)"
docker ps -q | xargs -r docker inspect --format '{{.Name}} {{range $e := .Config.Env}}{{if ge (len $e) 12}}{{if eq (slice $e 0 12) "VIRTUAL_HOST"}} {{$e}} {{end}}{{end}}{{end}}'
echo ""

# Print the routing rule labels; adjust the router names to match your own services.
echo "[3] TRAEFIK ROUTERS"
docker ps -q | xargs -r docker inspect --format '{{.Name}} {{range $k, $v := .Config.Labels}}{{if or (eq $k "traefik.http.routers.wolfpack-frontend.rule") (eq $k "traefik.http.routers.aspirant-dashboard.rule")}}{{$k}}={{$v}}{{end}}{{end}}'
```
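
The DNS comparison from step 2 can be scripted so the match/mismatch verdict is not left to eyeballing. The following is a minimal sketch, not part of the original workflow: the script name `check_dns.sh` and the functions `classify_dns` and `resolve_a` are hypothetical, and it assumes the record is DNS-only (a proxied record resolves to Cloudflare edge IPs and will always report a mismatch).

```bash
#!/bin/bash
# check_dns.sh (hypothetical name): compare a subdomain's A records with the expected server IP.

# Prints "match" if the expected IP is among the resolved addresses, otherwise "mismatch".
# Arguments: $1 = expected IP, remaining arguments = resolved IPs.
classify_dns() {
  local expected="$1"
  shift
  local ip
  for ip in "$@"; do
    if [ "$ip" = "$expected" ]; then
      echo "match"
      return 0
    fi
  done
  echo "mismatch"
}

# Resolve A records with `host` and keep only the addresses.
resolve_a() {
  host -t A "$1" | awk '/has address/ {print $NF}'
}

# Runs only when called with exactly two arguments: <domain> <expected-ip>.
if [ "$#" -eq 2 ]; then
  # Word splitting is intentional here: one argument per resolved address.
  classify_dns "$2" $(resolve_a "$1")
fi
```

For example, `./check_dns.sh 777wolfpack.runfoo.run <server-public-ip>` prints `mismatch` when the subdomain is still falling through to the wildcard record.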

## The Fix

1. Go to **Cloudflare DNS**.
2. Add a specific **A Record** for the missing subdomain.
3. Point it to the **Correct Server IP**.
4. Wait about a minute for the change to propagate, then re-check with `host`.
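
Instead of waiting a fixed minute, the re-check can be polled. This is a rough sketch under stated assumptions: the script name `wait_for_dns.sh`, the `wait_for_dns` function, and the `RESOLVER` override are all hypothetical, and the loop only makes sense for a DNS-only record whose address should equal the server IP.

```bash
#!/bin/bash
# wait_for_dns.sh (hypothetical name): poll until a subdomain resolves to the expected IP.

# Lookup command; overridable so the loop can be exercised without real DNS.
RESOLVER="${RESOLVER:-host -t A}"

# Arguments: $1 = domain, $2 = expected IP,
#            $3 = max attempts (default 12), $4 = delay in seconds (default 5).
wait_for_dns() {
  local domain="$1" expected="$2" attempts="${3:-12}" delay="${4:-5}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    # Succeed only when the last field of some output line equals the expected IP exactly.
    if $RESOLVER "$domain" 2>/dev/null |
      awk -v ip="$expected" '$NF == ip {found = 1} END {exit !found}'; then
      echo "resolved after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
  done
  echo "still not resolving to $expected after $attempts attempts"
  return 1
}

# Runs only when called with at least <domain> <expected-ip>.
if [ "$#" -ge 2 ]; then
  wait_for_dns "$@"
fi
```

The exact-match comparison in `awk` avoids treating `203.0.113.1` as a match for `203.0.113.10`, which a plain substring search would do.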
59
docs/troubleshooting-dns-wildcards.md
Normal file
@@ -0,0 +1,59 @@
# Troubleshooting: The Cloudflare Wildcard Trap

**Symptom**: New subdomains (e.g., `newapp.runfoo.run`) serve an incorrect service (e.g., Alertmanager) or land on a different server entirely, despite correct local configuration (Traefik labels). Traefik logs show **zero** evidence of the request reaching the server.

**Root Cause**: **Missing A-Record + Wildcard Fallback**.
Cloudflare DNS has a Wildcard (`*`) record pointing to a *Legacy* or *Different* server IP. When you deploy a new app but forget to add its specific `A` record, the subdomain resolves through the Wildcard and traffic goes to the wrong server.

**Diagnosis Steps**:

1. **Trace Request**:
   `curl -H "User-Agent: TRACE_TEST" https://your-domain.runfoo.run`
   Check `docker logs traefik | grep TRACE_TEST`.
   - **Logs Found**: Issue is local (Traefik config).
   - **No Logs**: Issue is upstream (DNS/Cloudflare).
2. **Verify DNS Resolution**:
   `host your-domain.runfoo.run`
   Compare the IP with your target server IP (`216.158.230.94` for Nexus-Vector).
   - **Match**: DNS is correct; the issue is local.
   - **Mismatch**: You are hitting the Wildcard IP.

   If the record is proxied through Cloudflare, `host` returns Cloudflare edge IPs instead, so this comparison is only conclusive for DNS-only records.
3. **Operational Matrix**:
   Use the following script to audit the server's Ingress state and prove the server is "Innocent".

## Operational Matrix Script (`map_server.sh`)

Save this as `map_server.sh` and run it on the server to see what is really running.

```bash
#!/bin/bash
# map_server.sh: report which processes and containers handle ingress on this server.
echo "=== OPERATIONAL MATRIX: SERVER SIDE INGRESS ==="
echo "Generated at: $(date)"
echo "---------------------------------------------------"

# Match only ports 80 and 443 (a bare ':80' would also match :8080).
echo "[1] NATIVE PORTS (Who owns Port 80/443?)"
sudo ss -tulpn | grep -E ':(80|443)[[:space:]]' | awk '{print $1, $5, $7}'
echo ""

echo "[2] DOCKER CONTAINERS (Name + ID + Ports + Status)"
docker ps --format "table {{.Names}}\t{{.ID}}\t{{.Ports}}\t{{.Status}}"
echo ""

# Only the router names listed in the template are printed; adjust them to your own services.
echo "[3] TRAEFIK ROUTERS (Label: traefik.http.routers.*.rule)"
docker ps -q | xargs -r docker inspect --format '{{.Name}} {{range $k, $v := .Config.Labels}}{{if or (eq $k "traefik.http.routers.wolfpack-frontend.rule") (eq $k "traefik.http.routers.aspirant-dashboard.rule") (eq $k "traefik.http.routers.aspirant-api.rule")}}{{$k}}={{$v}}{{end}}{{end}}' | grep "rule="
echo ""

# Containers that set VIRTUAL_HOST are candidates for an nginx-proxy style front end.
echo "[4] NGINX PROXY (Environment: VIRTUAL_HOST)"
docker ps -q | xargs -r docker inspect --format '{{.Name}} {{range $e := .Config.Env}}{{if ge (len $e) 12}}{{if eq (slice $e 0 12) "VIRTUAL_HOST"}} {{$e}} {{end}}{{end}}{{end}}' | grep VIRTUAL_HOST
echo ""

echo "---------------------------------------------------"
echo "MATRIX COMPLETE."
```
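
The trace request from diagnosis step 1 can be wrapped in a small helper so the local-versus-upstream verdict is printed directly. This is a sketch only: the script name `trace_request.sh` and the function `classify_trace` are hypothetical, and it assumes the container is named `traefik` (as in the commands above) and that Traefik's access log is enabled and records the User-Agent.

```bash
#!/bin/bash
# trace_request.sh (hypothetical name): tag a request and look for it in the Traefik logs.

# Reads log lines on stdin and reports where the problem lies.
# Argument: $1 = marker string that was sent as the User-Agent.
classify_trace() {
  if grep -qF -- "$1"; then
    echo "local: request reached Traefik, check router rules and labels"
  else
    echo "upstream: request never reached this server, check DNS/Cloudflare"
  fi
}

# Runs only when called with exactly one argument: <domain>.
if [ "$#" -eq 1 ]; then
  # A per-run marker keeps older log lines from producing a false positive.
  marker="TRACE_TEST_$(date +%s)"
  curl -s -o /dev/null -H "User-Agent: $marker" "https://$1"
  # Give Traefik a moment to write the access log entry.
  sleep 1
  docker logs traefik --since 1m 2>&1 | classify_trace "$marker"
fi
```

Running `./trace_request.sh your-domain.runfoo.run` on the server prints the `upstream` line when the request is being answered by a different machine.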

## Solution

1. Log in to Cloudflare.
2. Add an **A Record** for the specific subdomain.
   - **Name**: `subdomain` (e.g., `777wolfpack`)
   - **Content**: `216.158.230.94` (Nexus-Vector IP)
   - **Proxy**: DNS Only (Grey) or Proxied (Orange).
3. Wait for propagation, then re-run `host` to confirm the subdomain resolves to the new record.
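
The record can also be created without the dashboard. The sketch below uses the Cloudflare v4 API's DNS record creation endpoint; the script name `add_a_record.sh`, the function `a_record_payload`, and the environment variables `CF_ZONE_ID` and `CF_API_TOKEN` are assumptions for illustration, and the token needs DNS edit permission on the zone. A `ttl` of `1` asks Cloudflare for its automatic TTL.

```bash
#!/bin/bash
# add_a_record.sh (hypothetical name): create the missing A record via the Cloudflare API.

# Builds the JSON body for an A record.
# Arguments: $1 = record name, $2 = IPv4 address, $3 = proxied (true|false, default false).
a_record_payload() {
  printf '{"type":"A","name":"%s","content":"%s","ttl":1,"proxied":%s}\n' \
    "$1" "$2" "${3:-false}"
}

# Runs only when called with at least <name> <ip>.
# CF_ZONE_ID and CF_API_TOKEN are assumed to be exported by the caller.
if [ "$#" -ge 2 ]; then
  curl -sS -X POST \
    "https://api.cloudflare.com/client/v4/zones/${CF_ZONE_ID}/dns_records" \
    -H "Authorization: Bearer ${CF_API_TOKEN}" \
    -H "Content-Type: application/json" \
    --data "$(a_record_payload "$@")"
fi
```

For example, `./add_a_record.sh 777wolfpack.runfoo.run 216.158.230.94 false` would create a DNS-only record; check the `success` field of the JSON response before moving on to the propagation check.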