# AUDIT PROTOCOL: BACKEND RUST

**Target Stack:** Rust (Axum, SQLx, Tokio)
**Context:** Lean Code Philosophy
**Executor:** AI Agent / LLM
**Quality Target:** 90-95%

---

## 0. QUALITY PHILOSOPHY

Perfection does not exist. This audit targets **90-95% quality**:

- **90%** = Production-ready, no critical issues, minor improvements optional
- **95%** = Excellent, only cosmetic or future-proofing suggestions remain
- **100%** = Impossible; chasing it creates infinite loops

**Stop condition:** When score >= 90, the audit PASSES. Further improvements are optional and should be documented as "nice-to-have" for future iterations.

---

## 1. PRIME DIRECTIVE: LEAN CODE

Every line of code must justify its existence against:

1.  **Value**: Does this add direct user or business value?
2.  **Necessity**: Can this be done simpler? (KISS)
3.  **Uniqueness**: Is this logic repeated? (DRY)
4.  **Relevance**: Is this "Just in Case" code? (YAGNI) -> **DELETE IT**.

---

## 2. ARCHITECTURE & MODULARITY

### 2.1 Handlers (Controllers)

- **Rule**: Handlers must be "thin". They only extract requests and format responses.
- **Violation**: Complex business logic inside a handler function.
- **Acceptable**: Simple validation, ID generation, direct DB calls for CRUD operations.
- **Fix**: Move complex logic to a Service module only when complexity warrants it.

### 2.2 Data Transfer Objects (DTOs)

- **Rule**: Strict separation between Database Models (`FromRow` structs) and API DTOs (Request/Response structs).
- **Violation**: Returning a DB Entity directly as a JSON response.
- **Fix**: Create dedicated Response structs that derive `Serialize` and implement `From<Entity>`.
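
The separation can be sketched as follows. This is a minimal, dependency-free illustration: `UserEntity`, `UserResponse`, and their fields are hypothetical names, and the `FromRow` / `Serialize` derives are only mentioned in comments so the snippet compiles without SQLx or Serde.

```rust
/// Database model. In a real service this would also derive `sqlx::FromRow`.
#[derive(Debug, Clone)]
pub struct UserEntity {
    pub id: i64,
    pub email: String,
    pub password_hash: String, // internal column that must never reach the API
}

/// API response DTO. In a real service this would also derive `serde::Serialize`.
#[derive(Debug, PartialEq)]
pub struct UserResponse {
    pub id: i64,
    pub email: String,
}

impl From<UserEntity> for UserResponse {
    fn from(entity: UserEntity) -> Self {
        // Only whitelisted fields are copied; `password_hash` is dropped here.
        Self {
            id: entity.id,
            email: entity.email,
        }
    }
}
```

A handler would then return `UserResponse::from(entity)` (or `entity.into()`), so adding a column to the table cannot silently change the API payload.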

### 2.3 State Management

- **Rule**: No global mutable state (`static mut`, `lazy_static` with Mutex).
- **Fix**: Use dependency injection via `State<T>` or `Arc<AppState>` passed to the router.
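
A minimal sketch of the injection pattern, using only the standard library. `AppState`, `request_count`, and `handle_request` are hypothetical names chosen for illustration; in an Axum application the same `Arc<AppState>` would typically be registered with `Router::with_state` and received through the `State<T>` extractor.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

/// Hypothetical shared application state, created once at startup.
pub struct AppState {
    pub request_count: AtomicU64,
}

/// Handler-like function: it receives its state as an argument
/// instead of reading a `static mut` or a global `Mutex`.
pub fn handle_request(state: &Arc<AppState>) -> u64 {
    // `fetch_add` returns the previous value, so add 1 to get the new total.
    state.request_count.fetch_add(1, Ordering::Relaxed) + 1
}
```

Because the state is an explicit parameter, tests can build a fresh `AppState` per case rather than sharing one process-wide value.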

---

## 3. CODE HYGIENE & SAFETY

### 3.1 Error Handling

- **Rule**: `unwrap()` is **PROHIBITED** in production code paths.
- **Violation**: `result.unwrap()` or `option.unwrap()`.
- **Acceptable**: `expect("descriptive message")` for startup/initialization that MUST succeed.
- **Fix**: Propagate errors using `?` with `anyhow::Result` or custom error types.
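
As an illustration of the custom-error-type route, the sketch below replaces two would-be `unwrap()` calls with `?`. `ConfigError` and `parse_port` are hypothetical names; a project using `anyhow` could skip the enum and return `anyhow::Result<u16>` instead.

```rust
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical error type for configuration loading.
#[derive(Debug, PartialEq)]
pub enum ConfigError {
    Missing(&'static str),
    InvalidPort(ParseIntError),
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ConfigError::Missing(key) => write!(f, "missing config value: {}", key),
            ConfigError::InvalidPort(err) => write!(f, "invalid port: {}", err),
        }
    }
}

impl std::error::Error for ConfigError {}

// This conversion is what lets `?` turn a `ParseIntError` into a `ConfigError`.
impl From<ParseIntError> for ConfigError {
    fn from(err: ParseIntError) -> Self {
        ConfigError::InvalidPort(err)
    }
}

/// Parses a port without panicking on missing or malformed input.
pub fn parse_port(raw: Option<&str>) -> Result<u16, ConfigError> {
    let value = raw.ok_or(ConfigError::Missing("PORT"))?;
    let port = value.trim().parse::<u16>()?;
    Ok(port)
}
```

The caller decides how to react (return a 4xx/5xx, log, or abort at startup with `expect`), which is the point of propagating instead of unwrapping.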

### 3.2 Async Hygiene

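- **Rule**: No blocking operations (`std::fs`, heavy computation) inside `async` functions without `spawn_blocking`.
- **Violation**: `std::thread::sleep` inside an async handler.
- **Fix**: Use the async equivalents (`tokio::time::sleep`, `tokio::fs`) or move the work into `tokio::task::spawn_blocking`.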

### 3.3 Abstraction

- **Rule**: No Premature Abstraction.
- **Violation**: Creating a `Trait` that has only one implementation.
- **Fix**: Use a concrete struct. Refactor to a Trait only when a second implementation arises.

### 3.4 Macros

- **Rule**: Avoid macro magic unless it significantly reduces boilerplate (e.g., `#[derive]`).
- **Violation**: Custom macros that obscure control flow.

### 3.5 Dead Code

- **Rule**: No unused functions, structs, or imports.
- **Fix**: Delete it, or use `#[allow(dead_code)]` with a justification comment if it is intentionally reserved.

### 3.6 Connection Pool Sizing

- **Rule**: DB pool `max_connections` must be proportional to available RAM: `RAM_MB / 10` max.
- **Violation**: `max_connections(100)` on a 512MB machine (`512 / 10` caps it at 51).
- **Violation**: `idle_timeout > 120s` — idle connections waste memory.
- **Violation**: `acquire_timeout > 10s` — masks pool exhaustion instead of failing fast.
- **Violation**: `max_lifetime > 1800s` — stale connections accumulate.
- **Fix**: Apply limits from `stack/apis.md` Connection Pool section.
- **Severity**: **Critical** — causes OOM and connection exhaustion in production.
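
The limits above can be checked mechanically. The following is a sketch under stated assumptions: `PoolSettings`, `max_connections_for_ram`, and `audit_pool` are hypothetical helpers, not SQLx APIs, and the field names simply mirror the settings listed in this section.

```rust
use std::time::Duration;

/// Hypothetical mirror of the pool settings this section audits.
pub struct PoolSettings {
    pub max_connections: u32,
    pub idle_timeout: Duration,
    pub acquire_timeout: Duration,
    pub max_lifetime: Duration,
}

/// Upper bound from the rule above: RAM_MB / 10 (integer division).
pub fn max_connections_for_ram(ram_mb: u32) -> u32 {
    ram_mb / 10
}

/// Returns one message per violated limit; an empty vector means the settings pass.
pub fn audit_pool(settings: &PoolSettings, ram_mb: u32) -> Vec<&'static str> {
    let mut violations = Vec::new();
    if settings.max_connections > max_connections_for_ram(ram_mb) {
        violations.push("max_connections exceeds RAM_MB / 10");
    }
    if settings.idle_timeout > Duration::from_secs(120) {
        violations.push("idle_timeout > 120s");
    }
    if settings.acquire_timeout > Duration::from_secs(10) {
        violations.push("acquire_timeout > 10s");
    }
    if settings.max_lifetime > Duration::from_secs(1800) {
        violations.push("max_lifetime > 1800s");
    }
    violations
}
```

An auditor could fill `PoolSettings` from the values found in the pool builder call and report each returned message as a Critical finding.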

### 3.7 Request Timeouts

- **Rule**: Every API must have a global request timeout layer (`tower::timeout::TimeoutLayer`).
- **Violation**: No timeout layer → requests can hang forever, accumulating TCP connections.
- **Rule**: TCP keepalive must be enabled on the server socket.
- **Violation**: No `tcp_keepalive()` → dead connections never get cleaned up.
- **Fix**: Add `TimeoutLayer::new(Duration::from_secs(30))` and `tcp_keepalive(Some(Duration::from_secs(60)))`.
- **Severity**: **Critical** — direct cause of connection exhaustion (8000+ zombie TCP connections).

### 3.8 Graceful Shutdown

- **Rule**: API must handle SIGTERM/SIGINT with `with_graceful_shutdown()`.
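- **Violation**: No signal handler → abrupt termination leaks DB connections and drops in-flight requests.
- **Fix**: Pass a shutdown future to `with_graceful_shutdown()` that resolves on either `tokio::signal::ctrl_c()` (SIGINT) or `tokio::signal::unix::signal(SignalKind::terminate())` (SIGTERM).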
- **Severity**: **Major** — causes connection leaks on Fly.io restarts/deploys.

### 3.9 Resource Monitoring

- **Rule**: Health endpoint must report connection pool metrics (`pool_size`, `pool_idle`, `pool_in_use`).
- **Violation**: Health endpoint only checks DB latency, not pool saturation.
- **Rule**: If `pool_in_use > 80%` of `pool_size`, health should return warning status.
- **Fix**: Query pool stats via SQLx pool methods (`Pool::size()`, `Pool::num_idle()`) and include them in the health response.
- **Severity**: **Major** — without pool metrics, exhaustion is invisible until crash.
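
The 80% rule can be sketched as a pure function. `PoolStatus`, `PoolHealth`, and `pool_health` are hypothetical names; the two inputs are assumed to come from the pool's size and idle counters, and the struct would derive `Serialize` in a real health endpoint.

```rust
#[derive(Debug, PartialEq)]
pub enum PoolStatus {
    Healthy,
    Warning,
}

/// Hypothetical payload for the pool section of a health response.
#[derive(Debug, PartialEq)]
pub struct PoolHealth {
    pub pool_size: u32,
    pub pool_idle: u32,
    pub pool_in_use: u32,
    pub status: PoolStatus,
}

pub fn pool_health(pool_size: u32, pool_idle: u32) -> PoolHealth {
    // saturating_sub guards against a racy snapshot where idle > size.
    let pool_in_use = pool_size.saturating_sub(pool_idle);
    // Integer form of "in_use > 80% of size"; u64 avoids overflow.
    let saturated = u64::from(pool_in_use) * 100 > u64::from(pool_size) * 80;
    PoolHealth {
        pool_size,
        pool_idle,
        pool_in_use,
        status: if saturated { PoolStatus::Warning } else { PoolStatus::Healthy },
    }
}
```

Note that exactly 80% usage stays `Healthy`, since the rule is a strict "greater than".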

---

## 4. GENERAL QUALITY

### 4.1 Zombie Code

- **Rule**: No commented-out code.
- **Fix**: Delete it. Git history preserves it.

### 4.2 Logging

- **Rule**: No `println!` or `eprintln!`.
- **Fix**: Use structured logging (`tracing`).

### 4.3 SQL Queries

- **Rule**: Use parameterized queries. Never build SQL by interpolating user input into the query string.
- **Rule**: Use `QueryBuilder` with `push_bind` for dynamic filters so that user values stay bound parameters and cannot cause SQL injection.
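
To show the idea behind dynamic filters with bound parameters, here is a dependency-free sketch. It is not SQLx's API: `UserFilter` and `build_user_query` are hypothetical, and the table and column names are made up. With SQLx the same shape would be written using `QueryBuilder::push` for SQL fragments and `QueryBuilder::push_bind` for values.

```rust
/// Hypothetical optional filters coming from a request's query string.
#[derive(Default)]
pub struct UserFilter {
    pub email: Option<String>,
    pub role: Option<String>,
}

/// Builds the SQL text and the bind values separately.
/// User input only ever lands in the returned parameter list.
pub fn build_user_query(filter: &UserFilter) -> (String, Vec<String>) {
    let mut sql = String::from("SELECT id, email FROM users WHERE 1=1");
    let mut params: Vec<String> = Vec::new();

    if let Some(email) = &filter.email {
        params.push(email.clone());
        // Only a positional placeholder ($1, $2, ...) is written into the SQL.
        sql.push_str(&format!(" AND email = ${}", params.len()));
    }
    if let Some(role) = &filter.role {
        params.push(role.clone());
        sql.push_str(&format!(" AND role = ${}", params.len()));
    }
    (sql, params)
}
```

Even a hostile value such as `x' OR '1'='1` leaves the SQL text unchanged, because it travels as data rather than as part of the statement.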

---

## 5. AUDIT EXECUTION

1.  **Scan** `src/` recursively.
2.  **Identify** violations of the rules above.
3.  **Categorize** findings:
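    - **Critical**: Security issues, `unwrap()` in production code paths, SQL injection risk, connection or resource exhaustion (3.6, 3.7)
    - **Major**: Architecture violations, missing error handling, missing graceful shutdown or pool metrics (3.8, 3.9)
    - **Minor**: Dead code, cosmetic issues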
4.  **Score** the codebase (0-100):
    - < 80 = FAILED (critical issues must be fixed)
    - 80-89 = NEEDS WORK (major issues should be addressed)
    - 90-95 = PASSED (production-ready)
    - > 95 = EXCELLENT (no action required)
5.  **Report** findings with file:line references and specific fixes.
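
The score bands in step 4 can be expressed as a small lookup, which keeps the verdict consistent between audit runs. `Verdict` and `verdict` are hypothetical names used only for this sketch.

```rust
#[derive(Debug, PartialEq)]
pub enum Verdict {
    Failed,
    NeedsWork,
    Passed,
    Excellent,
}

/// Maps a 0-100 score to the bands listed in step 4.
/// Values above 100 are not validated here and fall into `Excellent`.
pub fn verdict(score: u8) -> Verdict {
    match score {
        0..=79 => Verdict::Failed,
        80..=89 => Verdict::NeedsWork,
        90..=95 => Verdict::Passed,
        _ => Verdict::Excellent,
    }
}
```

Any result of `Passed` or `Excellent` satisfies the stop condition from section 0; remaining findings are recorded as nice-to-have.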
