diff --git a/docs/INFRASTRUCTURE_STANDARDS.md b/docs/INFRASTRUCTURE_STANDARDS.md
new file mode 100644
index 0000000..8a75f24
--- /dev/null
+++ b/docs/INFRASTRUCTURE_STANDARDS.md
@@ -0,0 +1,100 @@
+# Infrastructure Standards & Best Practices
+
+## Overview
+
+This document outlines the standardized infrastructure patterns and best practices for the **Elmeg** and **Ersen** ecosystem. It is intended for future agents and developers to ensure consistency and avoid common pitfalls during deployment and configuration.
+
+## 1. Routing & Load Balancing (Traefik)
+
+### Architecture
+
+The production server (`tangible-aacorn`) uses **Traefik** as the reverse proxy and load balancer. Unlike standard Docker deployments that often rely solely on container labels, this environment uses a **hybrid approach** with a strong preference for **static configuration files** for critical routing.
+
+### Critical Anti-Pattern: Docker Labels vs. Static Routes
+>
+> [!IMPORTANT]
+> **Do not rely solely on `docker-compose.yml` labels for routing public traffic on the production server.**
+>
+> While the Traefik Docker provider is enabled, the primary routing logic for `ersen.xyz`, `elmeg.xyz`, and their subdomains is managed via a **centralized static routes file**. Relying only on labels can lead to 404s if the static configuration takes precedence or if the Docker socket connection is flaky.
+
+### Standard Procedure: Adding a New Subdomain
+
+To expose a new service (e.g., `stats.elmeg.xyz`), you must:
+
+1. **Define the Service in Docker**: Ensure your container exposes the correct port internally (e.g., `3000`).
+2. **Update the Central Routes File**:
+   * **File Path**: `/srv/containers/ersen/traefik-routes.yml` (on `tangible-aacorn`)
+   * **Action**: Add a new `Router` and `Service` definition.
+
+#### Example Configuration (`traefik-routes.yml`)
+
+```yaml
+http:
+  routers:
+    # ... existing routers ...
+    my-new-service-router:
+      rule: "Host(`new.elmeg.xyz`)"
+      service: my-new-service
+      entryPoints:
+        - websecure
+      tls:
+        certResolver: letsencrypt
+
+  services:
+    # ... existing services ...
+    my-new-service:
+      loadBalancer:
+        servers:
+          - url: "http://container-name:port"
+```
+
+3. **Reloading**: Traefik is configured to watch this file (`--providers.file.watch=true`), so changes should apply automatically. If they do not, restart the Traefik container.
+
+## 2. Deployment Standards
+
+### Directory Structure
+
+All services reside in `/srv/containers/`.
+
+* **Repo Root**: `/srv/containers/` (e.g., `/srv/containers/elmeg-demo`)
+* **Ownership**: All files must be owned by the `deploy` user.
+  * Fix command: `sudo chown -R deploy:deploy /srv/containers/`
+
+### Git-Based Deployment
+
+We use a **Pull-based** workflow on the server:
+
+1. **Commit & Push** changes to the remote repository (Forgejo).
+2. **SSH into Server**: `ssh tangible-aacorn`
+3. **Pull & Restart**:
+
+   ```bash
+   cd /srv/containers/
+   git pull
+   docker compose up -d --build
+   ```
+
+### Docker Compose
+
+* **Version**: Start each file with the top-level `services` key. Omit the deprecated `version: '3.x'` line in new files.
+* **Networking**:
+  * Services sharing the same proxy must be on the same external network (usually `ersern_traefik-public` or `traefik`).
+  * Define this network as `external: true` at the bottom of `docker-compose.yml`.
+
+## 3. Environment Variables
+
+* **Production**: Use a `.env` file in the project root on the server.
+* **Security**: Never commit `.env` files to the repository.
+* **Variables**:
+  * **DATABASE_URL**: Full connection string for Prisma/Postgres.
+  * **INTERNAL_API_URL**: Used for SSR (Server-Side Rendering) to talk to the backend within the Docker network.
+  * **NEXT_PUBLIC_API_URL**: Used by the client-side browser to talk to the backend (via public URL).
+
+## 4. Troubleshooting Checklist
+
+If a service is returning 404:
+
+1. **Check DNS**: Is the subdomain pointing to the server IP?
+2. **Check Traefik Logs**: `docker logs ersen-traefik-1 --tail 50`
+3. **Verify Routes File**: Run `cat /srv/containers/ersen/traefik-routes.yml` and confirm that your route exists.
+4. **Verify Network**: Run `docker inspect` on the service container to confirm it is connected to the `traefik` network.