Infrastructure Standards & Best Practices

Overview

This document outlines the standardized infrastructure patterns and best practices for the Elmeg and Ersen ecosystem. It is intended for future agents and developers to ensure consistency and avoid common pitfalls during deployment and configuration.

1. Routing & Load Balancing (Traefik)

Architecture

The production server (tangible-aacorn) uses Traefik as the reverse proxy and load balancer. Unlike standard Docker deployments that often rely solely on container labels, this environment uses a hybrid approach with a strong preference for static configuration files for critical routing.

Critical Anti-Pattern: Docker Labels vs. Static Routes

Important

Do not rely solely on docker-compose.yml labels for routing public traffic on the production server.

While the Traefik Docker provider is enabled, the primary routing logic for ersen.xyz, elmeg.xyz, and their subdomains is managed via a centralized static routes file. Relying only on labels can lead to 404s if the static configuration takes precedence or if the Docker socket connection is flaky.

Standard Procedure: Adding a New Subdomain

To expose a new service (e.g., stats.elmeg.xyz), you must:

  1. Define the Service in Docker: Ensure your container exposes the correct port internally (e.g., 3000).
  2. Update the Central Routes File:
    • File Path: /srv/containers/ersen/traefik-routes.yml (on tangible-aacorn)
    • Action: Add a new Router and Service definition.

Example Configuration (traefik-routes.yml)

http:
  routers:
    # ... existing routers ...
    my-new-service-router:
      rule: "Host(`new.elmeg.xyz`)"
      service: my-new-service
      entryPoints:
        - websecure
      tls:
        certResolver: letsencrypt

  services:
    # ... existing services ...
    my-new-service:
      loadBalancer:
        servers:
          - url: "http://container-name:port"

  3. Reloading: Traefik is configured to watch this file (--providers.file.watch=true), so changes should apply automatically. If not, restart the Traefik container.

2. Deployment Standards

Directory Structure

All services reside in /srv/containers/.

  • Repo Root: /srv/containers/<project-name> (e.g., /srv/containers/elmeg-demo)
  • Ownership: All files must be owned by the deploy user.
    • Fix command: sudo chown -R deploy:deploy /srv/containers/<project-name>
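
Before running the fix command, it can help to see which paths actually have the wrong owner. The following is a minimal sketch; the function name wrong_owner_paths is hypothetical, and deploy is the owner named above.

```shell
#!/usr/bin/env bash
# Sketch: list paths under a project directory that are NOT owned by the
# expected user. The function name is hypothetical, not part of any tooling.
wrong_owner_paths() {
  local dir="$1"
  local owner="${2:-deploy}"   # defaults to the deploy user from this section
  # find prints every path whose owner differs from $owner; no output means OK.
  find "$dir" ! -user "$owner"
}
```

For example, wrong_owner_paths /srv/containers/<project-name> prints nothing when ownership is already correct.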

Git-Based Deployment

We use a Pull-based workflow on the server:

  1. Commit & Push changes to the remote repository (Forgejo).

  2. SSH into Server: ssh tangible-aacorn

  3. Pull & Restart:

    cd /srv/containers/<project-name>
    git pull
    docker compose up -d --build <service-name>
    

Docker Compose

  • Version: Start the file with the top-level services key. Omit the obsolete version: '3.x' line in new files.
  • Networking:
    • Services sharing the same proxy must be on the same external network (usually ersen_traefik-public or traefik).
    • Define this network as external: true at the bottom of docker-compose.yml.
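
As a minimal sketch of the networking rule above, a service can join the shared proxy network as shown below. The service name stats, the image name, and the choice of traefik as the network name are assumptions for illustration; use whichever external network the Traefik container is actually attached to.

```yaml
services:
  stats:                          # hypothetical service name
    image: example/stats:latest   # hypothetical image
    expose:
      - "3000"                    # internal port only; not published on the host
    networks:
      - traefik

networks:
  traefik:
    external: true                # created outside this file; must already exist
```

Because the network is marked external, docker compose will not create it and will fail at startup if it does not already exist.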

3. Environment Variables

  • Production: Use a .env file in the project root on the server.
  • Security: Never commit .env files to the repository.
  • Variables:
    • DATABASE_URL: Full connection string for Prisma/Postgres.
    • INTERNAL_API_URL: Used for SSR (Server-Side Rendering) to talk to the backend within the Docker network.
    • NEXT_PUBLIC_API_URL: Used for the client-side browser to talk to the backend (via public URL).
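
A quick way to catch a missing variable before restarting a service is to scan the .env file for the three names listed above. This is only a sketch; the function name missing_env_vars is hypothetical, and it treats an empty value as missing.

```shell
#!/usr/bin/env bash
# Sketch: print the name of each required variable that is absent (or empty)
# in the given env file. The function name is hypothetical.
missing_env_vars() {
  local env_file="$1"
  local name
  for name in DATABASE_URL INTERNAL_API_URL NEXT_PUBLIC_API_URL; do
    # Present means: a line starting with NAME= followed by at least one character.
    if ! grep -Eq "^${name}=.+" "$env_file"; then
      echo "$name"
    fi
  done
}
```

For example, missing_env_vars /srv/containers/<project-name>/.env prints nothing when all three variables are set.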

4. Troubleshooting Checklist

If a service is returning 404:

  1. Check DNS: Is the subdomain pointing to the server IP?
  2. Check Traefik Logs: docker logs ersen-traefik-1 --tail 50
  3. Verify Routes File: Run cat /srv/containers/ersen/traefik-routes.yml and confirm that your router and service are defined.
  4. Verify Network: Inspect the service container to ensure it is connected to the same external network as Traefik.
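
The checklist can be sketched as a few shell helpers. The function names are hypothetical; the container name ersen-traefik-1 and the routes file path are the ones given in this document. The dig command is assumed to be installed on the machine where the DNS check runs.

```shell
#!/usr/bin/env bash
# Sketch of the 404 checklist as helper functions (names are hypothetical).

ROUTES_FILE="${ROUTES_FILE:-/srv/containers/ersen/traefik-routes.yml}"

# Step 1: print the address(es) the subdomain resolves to.
check_dns() {
  dig +short "$1"
}

# Step 2: show the most recent Traefik log lines.
traefik_logs() {
  docker logs ersen-traefik-1 --tail 50
}

# Step 3: succeed if the routes file contains a Host rule for the hostname.
route_defined() {
  grep -Fq "Host(\`$1\`)" "$ROUTES_FILE"
}

# Step 4: list the Docker networks a container is attached to.
container_networks() {
  docker inspect -f '{{range $k, $v := .NetworkSettings.Networks}}{{$k}} {{end}}' "$1"
}
```

For example, route_defined stats.elmeg.xyz || echo "no route" reports a missing entry, and container_networks <container-name> should include the network Traefik uses.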