# Backend (Rust)

## Architecture

- Handlers: thin. Only extract the request and format the response.
- Services: business logic.
- DTOs: strict separation between DB models and API structs.
- State: no global mutable state. Use `Arc<AppState>` via DI.
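
The layering above can be sketched with the standard library alone. This is a minimal illustration, not a prescribed layout: `AppState`, `UserRow`, `UserResponse`, `find_user`, and `get_user_handler` are hypothetical names, the in-memory `users` vector stands in for a database, and the `u16` error stands in for an HTTP status.

```rust
use std::sync::Arc;

/// Shared application state, built once at startup and injected (no globals).
/// `users` is a hypothetical in-memory stand-in for a real database handle.
struct AppState {
    users: Vec<UserRow>,
}

/// DB model: mirrors the stored row, including fields that must stay server-side.
#[derive(Clone)]
struct UserRow {
    id: u64,
    email: String,
    password_hash: String,
}

/// API struct (DTO): only the fields the client may see.
#[derive(Debug, PartialEq)]
struct UserResponse {
    id: u64,
    email: String,
}

impl From<UserRow> for UserResponse {
    fn from(row: UserRow) -> Self {
        // `password_hash` is deliberately dropped here.
        UserResponse { id: row.id, email: row.email }
    }
}

/// Service: business logic, with no knowledge of HTTP.
fn find_user(state: &AppState, id: u64) -> Option<UserRow> {
    state.users.iter().find(|u| u.id == id).cloned()
}

/// Handler: extract the input, call the service, format the response.
/// The `u16` error is a stand-in for an HTTP status code.
fn get_user_handler(state: Arc<AppState>, id: u64) -> Result<UserResponse, u16> {
    find_user(&state, id).map(UserResponse::from).ok_or(404)
}
```

In a real handler the `Arc<AppState>` would arrive through the framework's state extractor; the point of the sketch is that the handler body stays a one-liner and the DB row never reaches the response.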

## Error handling

- `unwrap()` BANNED in production
- Use `?` with mapped error types (thiserror/anyhow)
- `expect("context")` only with a justified reason
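
As a rough sketch of `?` with a mapped error type, the example below hand-writes what `thiserror` would normally derive, so it compiles with the standard library only. `ConfigError` and `parse_port` are hypothetical names chosen for illustration.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Domain error for a hypothetical configuration loader.
#[derive(Debug, PartialEq)]
enum ConfigError {
    Missing(&'static str),
    InvalidPort(ParseIntError),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::Missing(name) => write!(f, "missing setting {}", name),
            ConfigError::InvalidPort(e) => write!(f, "invalid port: {}", e),
        }
    }
}

impl std::error::Error for ConfigError {}

/// Lets `?` convert a `ParseIntError` into `ConfigError` automatically.
impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::InvalidPort(e)
    }
}

/// No `unwrap()`: every failure path is returned to the caller as a typed error.
fn parse_port(raw: Option<&str>) -> Result<u16, ConfigError> {
    let raw = raw.ok_or(ConfigError::Missing("PORT"))?;
    let port = raw.trim().parse::<u16>()?; // mapped through `From`
    Ok(port)
}
```

With `thiserror`, the `Display`, `Error`, and `From` impls would collapse into attributes on the enum; the control flow in `parse_port` stays the same.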

## Async

- No blocking operations inside `async` without `spawn_blocking`
- No `std::thread::sleep` in async handlers

## Abstraction

- Don't create Traits with a single implementation
- Refactor into a Trait only when a second implementation arrives
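
A small sketch of what "no trait yet" can look like in practice. `SmtpMailer` and `SignupService` are hypothetical names; the service depends on the concrete type because it is the only implementation that exists.

```rust
/// The only mailer that exists today, so it stays a concrete type.
struct SmtpMailer {
    from: String,
}

impl SmtpMailer {
    /// Renders a plain-text message; sending is out of scope for this sketch.
    fn render(&self, to: &str, body: &str) -> String {
        format!("From: {}\nTo: {}\n\n{}", self.from, to, body)
    }
}

/// Depends on `SmtpMailer` directly: no `trait Mailer`, no generics, no `dyn`.
struct SignupService {
    mailer: SmtpMailer,
}

impl SignupService {
    fn welcome_message(&self, email: &str) -> String {
        self.mailer.render(email, "Welcome!")
    }
}
```

If a second mailer shows up later, extracting a trait with a `render` method at that point is a mechanical change, and its shape is informed by two real implementations instead of one guess.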

## Logging

- No `println!` / `eprintln!`
- Use `tracing` or `log`

## Health endpoint

`GET /health` or `GET /v1/health`

Never hard-code the version. Read it from Cargo at compile time: `env!("CARGO_PKG_VERSION")`

---

## Connection pool (DB)

Mandatory rules for SeaORM / SQLx connection pools:

- **max_connections**: proportional to RAM. Hard ceiling: `RAM_MB / 10`. The recommended values below stay under that ceiling:
  - 256MB → max 20
  - 512MB → max 30
  - 1GB → max 50
- **min_connections**: 1–3 (no more, avoids unnecessary idle connections)
- **idle_timeout**: 60–120 seconds (NOT 300 s — idle connections cost memory)
- **max_lifetime**: 1800 seconds (30 min, NOT 1 hour)
- **acquire_timeout**: 5–10 seconds (NOT 30 s — fail fast when the pool is saturated)
- **connect_timeout**: 10 seconds max

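Correct example for 512MB (SeaORM `ConnectOptions`):

    use std::time::Duration;
    use sea_orm::{ConnectOptions, Database};

    // `database_url` is a placeholder for the connection string.
    let mut opt = ConnectOptions::new(database_url);
    opt.max_connections(30)
        .min_connections(2)
        .connect_timeout(Duration::from_secs(10))
        .acquire_timeout(Duration::from_secs(5))
        .idle_timeout(Duration::from_secs(120))
        .max_lifetime(Duration::from_secs(1800));
    let db = Database::connect(opt).await?;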
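
The sizing rules above can also be derived from the machine's RAM instead of being hard-coded. The following is a sketch under stated assumptions: `PoolSettings` and `pool_settings_for` are hypothetical names, the listed sizes (256MB, 512MB, 1GB, plus 2GB from the Fly.io sizing rules) are treated as tier thresholds, and RAM values between tiers fall back to the lower tier, which the rules themselves do not specify.

```rust
use std::time::Duration;

/// Plain-data pool configuration, independent of any ORM.
#[derive(Debug, PartialEq)]
struct PoolSettings {
    max_connections: u32,
    min_connections: u32,
    connect_timeout: Duration,
    acquire_timeout: Duration,
    idle_timeout: Duration,
    max_lifetime: Duration,
}

/// Picks pool settings for a machine with `ram_mb` megabytes of RAM.
fn pool_settings_for(ram_mb: u32) -> PoolSettings {
    // Recommended tiers; in-between sizes use the lower tier (assumption).
    let tier: u32 = if ram_mb >= 2048 {
        100
    } else if ram_mb >= 1024 {
        50
    } else if ram_mb >= 512 {
        30
    } else {
        20
    };
    // Never exceed RAM_MB / 10, and always allow at least one connection.
    let max_connections = tier.min(ram_mb / 10).max(1);

    PoolSettings {
        max_connections,
        min_connections: max_connections.min(2),
        connect_timeout: Duration::from_secs(10),
        acquire_timeout: Duration::from_secs(5),
        idle_timeout: Duration::from_secs(120),
        max_lifetime: Duration::from_secs(1800),
    }
}
```

The resulting struct can be fed into whichever pool builder is in use, so the numbers live in one place and stay testable without a database.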

---

## Request timeouts

Every API MUST have timeouts at the HTTP layers:

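- **Request timeout**: 30 seconds max per request (`tower::timeout::TimeoutLayer`, wrapped in `HandleErrorLayer` so the timeout error becomes an HTTP response)
- **TCP keepalive**: 60 seconds. `tcp_keepalive(...)` is a method of hyper 0.14's server builder, not of `axum::serve`; with `axum::serve`, configure keepalive on the socket itself (for example with the `socket2` crate)
- **HTTP keepalive**: cap idle server connections

Mandatory in main.rs:

    use std::time::Duration;
    use axum::{error_handling::HandleErrorLayer, http::StatusCode, BoxError};
    use tower::{timeout::TimeoutLayer, ServiceBuilder};

    let app = router.layer(
        ServiceBuilder::new()
            // axum requires infallible services: map the timeout error to a 408.
            .layer(HandleErrorLayer::new(|_: BoxError| async { StatusCode::REQUEST_TIMEOUT }))
            .layer(TimeoutLayer::new(Duration::from_secs(30))),
    );

    axum::serve(listener, app)
        .tcp_nodelay(true)
        .await?;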

---

## Graceful shutdown

Every API MUST handle SIGTERM to shut down cleanly:

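    use tokio::signal::unix::{signal, SignalKind};

    // Resolves on SIGTERM (sent by the platform) or SIGINT (Ctrl+C in a terminal).
    async fn shutdown_signal() {
        let mut sigterm = signal(SignalKind::terminate())
            .expect("failed to install SIGTERM handler");
        tokio::select! {
            _ = sigterm.recv() => {}
            _ = tokio::signal::ctrl_c() => {}
        }
    }

    axum::serve(listener, app)
        .with_graceful_shutdown(shutdown_signal())
        .await?;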

This ensures:

- In-flight requests complete before the server closes
- DB connections are released cleanly once `serve` returns and the pool is dropped
- Fly.io sees the process exit on its own instead of having to force-kill it

---

## Resource limits (Fly.io)

Sizing rules:

- **512MB RAM**: max 30 DB connections, request timeout 30 s
- **1GB RAM**: max 50 DB connections, request timeout 30 s
- **2GB RAM**: max 100 DB connections, request timeout 60 s

Always monitor:

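- Active TCP connections: `awk '$4 == "01"' /proc/net/tcp /proc/net/tcp6 | wc -l` (state `01` is ESTABLISHED; a plain `wc -l` on the file also counts the header line and listening sockets)
- Available memory: `grep MemAvailable /proc/meminfo` (`MemFree` ignores reclaimable cache and understates what is usable)
- If TCP connections > max_connections × 3 → there's likely a connection leak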
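
The leak check can be automated from inside the service. The sketch below assumes the Linux `/proc/net/tcp` layout, where the fourth whitespace-separated column (`st`) holds the socket state in hex and `01` means ESTABLISHED. `count_established` and `looks_like_connection_leak` are hypothetical helper names.

```rust
/// Counts ESTABLISHED sockets in text formatted like `/proc/net/tcp`.
fn count_established(proc_net_tcp: &str) -> usize {
    proc_net_tcp
        .lines()
        .skip(1) // header row
        .filter(|line| line.split_whitespace().nth(3) == Some("01"))
        .count()
}

/// Applies the rule of thumb: more than 3x the pool size suggests a leak.
fn looks_like_connection_leak(tcp_connections: usize, max_connections: usize) -> bool {
    tcp_connections > max_connections * 3
}

/// Reads the live table; returns an I/O error on platforms without procfs.
fn established_connections(path: &str) -> std::io::Result<usize> {
    let table = std::fs::read_to_string(path)?;
    Ok(count_established(&table))
}
```

A periodic task could call `established_connections("/proc/net/tcp")`, compare the result against the configured `max_connections`, and log a warning through `tracing` when the check trips. IPv6 sockets live in `/proc/net/tcp6` and would need a second call.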

---

## Advanced health endpoint

The health endpoint MUST report the connection-pool state:

    {
        "status": "ok",
        "version": "0.2.0",
        "uptime_seconds": 3600,
        "checks": {
            "database": {
                "status": "ok",
                "latency_ms": 5,
                "pool_size": 30,
                "pool_idle": 25,
                "pool_in_use": 5
            }
        }
    }

If `pool_idle < 2` or `pool_in_use` exceeds 80% of `pool_size`, raise an early warning.
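
The early-warning rule is simple enough to express as a pure function. This is an illustrative sketch: `PoolStats` and `pool_warning` are hypothetical names, and the fields mirror the JSON keys above.

```rust
/// Snapshot of the pool counters reported by the health endpoint.
#[derive(Debug, Clone, Copy)]
struct PoolStats {
    pool_size: u32,
    pool_idle: u32,
    pool_in_use: u32,
}

/// Returns a warning message when the pool is close to saturation.
fn pool_warning(stats: PoolStats) -> Option<&'static str> {
    if stats.pool_idle < 2 {
        return Some("pool_idle below 2");
    }
    // in_use / size > 0.8  <=>  in_use * 5 > size * 4 (integer math, no rounding).
    if u64::from(stats.pool_in_use) * 5 > u64::from(stats.pool_size) * 4 {
        return Some("pool_in_use above 80% of pool_size");
    }
    None
}
```

For the sample payload (`pool_size` 30, `pool_idle` 25, `pool_in_use` 5) the function returns `None`; at 25 connections in use out of 30 it reports the 80% threshold.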
