docs: add infrastructure standards and best practices

fullsizemalt 2025-12-24 10:11:12 -08:00
parent e94cb91010
commit 687e093ed9


@@ -0,0 +1,100 @@
# Infrastructure Standards & Best Practices
## Overview
This document outlines the standardized infrastructure patterns and best practices for the **Elmeg** and **Ersen** ecosystem. It is intended for future agents and developers to ensure consistency and avoid common pitfalls during deployment and configuration.
## 1. Routing & Load Balancing (Traefik)
### Architecture
The production server (`tangible-aacorn`) uses **Traefik** as the reverse proxy and load balancer. Unlike standard Docker deployments that often rely solely on container labels, this environment uses a **hybrid approach** with a strong preference for **static configuration files** for critical routing.
### Critical Anti-Pattern: Docker Labels vs. Static Routes
>
> [!IMPORTANT]
> **Do not rely solely on `docker-compose.yml` labels for routing public traffic on the production server.**
>
> While the Traefik Docker provider is enabled, the primary routing logic for `ersen.xyz`, `elmeg.xyz`, and their subdomains is managed via a **centralized static routes file**. Relying only on labels can lead to 404s if the static configuration takes precedence or if the Docker socket connection is flaky.
### Standard Procedure: Adding a New Subdomain
To expose a new service (e.g., `stats.elmeg.xyz`), you must:
1. **Define the Service in Docker**: Ensure your container exposes the correct port internally (e.g., `3000`).
2. **Update the Central Routes File**:
    * **File Path**: `/srv/containers/ersen/traefik-routes.yml` (on `tangible-aacorn`)
    * **Action**: Add a new `Router` and `Service` definition.
#### Example Configuration (`traefik-routes.yml`)
```yaml
http:
  routers:
    # ... existing routers ...
    my-new-service-router:
      rule: "Host(`new.elmeg.xyz`)"
      service: my-new-service
      entryPoints:
        - websecure
      tls:
        certResolver: letsencrypt
  services:
    # ... existing services ...
    my-new-service:
      loadBalancer:
        servers:
          - url: "http://container-name:port"
```
3. **Reloading**: Traefik is configured to watch this file (`--providers.file.watch=true`), so changes should apply automatically. If they do not, restart the Traefik container (`docker restart ersen-traefik-1`).
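
Before relying on the automatic reload, it can help to confirm that the new host actually appears in the routes file. The following is a minimal sketch, assuming the file layout shown above; `route_defined` is a hypothetical helper name and not part of Traefik.

```bash
# route_defined FILE HOST
# Succeeds if FILE contains a Host(`HOST`) rule. This is a plain-text
# search, not a YAML parse, so it only catches missing or misspelled hosts.
route_defined() {
  local file=$1 host=$2
  grep -Fq "Host(\`${host}\`)" "$file"
}
```

For example, `route_defined /srv/containers/ersen/traefik-routes.yml stats.elmeg.xyz && echo "route present"` prints a confirmation only when the rule is found. It does not prove that Traefik accepted the file, so the Traefik logs remain the authoritative check.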
## 2. Deployment Standards
### Directory Structure
All services reside in `/srv/containers/`.
* **Repo Root**: `/srv/containers/<project-name>` (e.g., `/srv/containers/elmeg-demo`)
* **Ownership**: All files must be owned by the `deploy` user.
    * Fix command: `sudo chown -R deploy:deploy /srv/containers/<project-name>`
### Git-Based Deployment
We use a **Pull-based** workflow on the server:
1. **Commit & Push** changes to the remote repository (Forgejo).
2. **SSH into Server**: `ssh tangible-aacorn`
3. **Pull & Restart**:
```bash
cd /srv/containers/<project-name>
git pull
docker compose up -d --build <service-name>
```
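
The pull-and-restart step can be wrapped in a small function so that a failed `git pull` never leads to a rebuild of stale code. This is a sketch under assumptions: `deploy` and the `DRY_RUN` variable are hypothetical names, and `--ff-only` is an addition to the plain `git pull` above so that a diverged server checkout fails loudly instead of creating a merge commit.

```bash
# deploy PROJECT SERVICE
# Pulls the latest commit and rebuilds one service, stopping at the first
# failure. Set DRY_RUN=1 to print the commands instead of running them.
deploy() {
  local project=$1 service=$2
  local dir="/srv/containers/${project}"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "cd ${dir}" "git pull --ff-only" \
      "docker compose up -d --build ${service}"
    return 0
  fi
  # Run in a subshell so the caller's working directory is left unchanged.
  (
    cd "$dir" && git pull --ff-only && docker compose up -d --build "$service"
  )
}
```

Running `DRY_RUN=1 deploy elmeg-demo web` (where `web` stands in for a real service name) shows the three commands without touching the server.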
### Docker Compose
* **Version**: Start new files with the top-level `services` key and omit the obsolete `version: '3.x'` line.
* **Networking**:
    * Services behind the same proxy must join the same external network (usually `ersern_traefik-public` or `traefik`).
    * Declare this network with `external: true` at the bottom of `docker-compose.yml`.
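
Compose does not create a network marked `external: true`, so `docker compose up` fails if the network is missing. On the production server the network presumably already exists; on a fresh host, a guard like the following sketch can create it first. `ensure_network` is a hypothetical helper name.

```bash
# ensure_network NAME
# Creates the Docker network NAME only if it does not already exist.
ensure_network() {
  local name=$1
  docker network inspect "$name" >/dev/null 2>&1 || docker network create "$name"
}
```

Calling `ensure_network traefik` before `docker compose up -d` is then safe to repeat, since an existing network is left untouched.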
## 3. Environment Variables
* **Production**: Use a `.env` file in the project root on the server.
* **Security**: Never commit `.env` files to the repository.
* **Variables**:
    * **DATABASE_URL**: Full connection string for Prisma/Postgres.
    * **INTERNAL_API_URL**: Used during SSR (Server-Side Rendering) to reach the backend inside the Docker network.
    * **NEXT_PUBLIC_API_URL**: Used by the browser (client side) to reach the backend via its public URL.
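
A quick pre-deploy check can catch a `.env` file that is missing one of these variables. The sketch below assumes simple `NAME=value` lines (no `export` prefix, no quoting rules); `missing_env_vars` is a hypothetical helper name.

```bash
# missing_env_vars FILE NAME...
# Prints each NAME that has no non-empty NAME=value line in FILE.
missing_env_vars() {
  local file=$1 name
  shift
  for name in "$@"; do
    grep -Eq "^${name}=.+" "$file" || printf '%s\n' "$name"
  done
}
```

For example, `missing_env_vars .env DATABASE_URL INTERNAL_API_URL NEXT_PUBLIC_API_URL` prints nothing when all three are set. To confirm the file is excluded from the repository, `git check-ignore -q .env || echo ".env is not ignored"` can be run from the project root.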
## 4. Troubleshooting Checklist
If a service is returning 404:
1. **Check DNS**: Is the subdomain pointing to the server IP?
2. **Check Traefik Logs**: `docker logs ersen-traefik-1 --tail 50`
3. **Verify Routes File**: Run `cat /srv/containers/ersen/traefik-routes.yml` and confirm that your router and service are present.
4. **Verify Network**: Inspect the service container to ensure it's connected to the proxy network (usually `ersern_traefik-public` or `traefik`).
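
The network check in the last step can be scripted with `docker inspect` and a Go template that lists the networks a container is attached to. This is a sketch, not an established tool on the server; `container_on_network` is a hypothetical helper name.

```bash
# container_on_network CONTAINER NETWORK
# Succeeds if CONTAINER is attached to NETWORK according to `docker inspect`.
container_on_network() {
  local container=$1 network=$2
  # Print one attached network name per line, then match the whole line.
  docker inspect -f '{{range $k, $v := .NetworkSettings.Networks}}{{println $k}}{{end}}' "$container" \
    | grep -Fxq "$network"
}
```

For example, `container_on_network <container-name> traefik || echo "not on the proxy network"` reports a container that Traefik cannot reach.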