docs: security sweep, code review, Yerin design, Plan 6 brainstorm brief

- security-sweep-2026-06-01.md: fresh sweep of alpha.6 (1 fixed, findings + carry-overs) - code-review-2026-06-01.md: optimisation/cleanliness notes (pool error handler, O(n) bcrypt token scan, FTS index alignment, dup auth parsing) - yerin-security-agent.md: security-agent design + tool roadmap + Orthos role proposal - plan-6-brainstorm-brief.md: Sacred Valley widget inventory + open design questions - security-followups.md: marked the pending_changes CHECK finding RESOLVED Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 23:26:46 +10:00
parent 6c393d8069
commit afbf075d26
4 changed files with 323 additions and 0 deletions
--- a/docs/code-review-2026-06-01.md
+++ b/docs/code-review-2026-06-01.md
@@ -0,0 +1,62 @@
+# Void 2.0 — Code Review & Optimisation Notes (2026-06-01, alpha.6)
+
+Full read-through of `lib/` (~4.2k LOC), the public SPA, and the Python workers.
+Overall the codebase is in good shape: ~1:1 test-to-code ratio, consistent repo +
+audit pattern, clean tool registry, disciplined safe-DOM and SSRF handling. Items
+below are improvements, ordered by value. Severity ≠ urgency — most are "before
+scale / before more agents", not "broken now".
+
+## Correctness / robustness
+
+1. **`pool.js` has no error handler or statement timeout.** `lib/db/pool.js`
+   creates a `pg.Pool` with no `pool.on('error', …)`. An error on an *idle*
+   pooled client (e.g. DB restart, replication failover on the .215 cluster)
+   emits an `error` event with no listener → **process crash**. Add a logging
+   error handler. Also consider `statement_timeout` / `query_timeout` so a
+   pathological query can't pin a connection indefinitely (the pool is only
+   `max: 10`).
+
+2. **`applyPendingChange`'s `upsert` arm assumes a refs-shaped repo.** After
+   today's fix, `case 'upsert'` calls `repo.upsertByExternal(...)`. Only
+   `refsRepo` has that method; the action only originates from the refs route so
+   it's safe today, but it's an implicit coupling. Cheap guard: assert
+   `typeof repo.upsertByExternal === 'function'` and throw a clear
+   `ValidationError` otherwise, so a future entity emitting `upsert` fails loud,
+   not with a bare `TypeError`.
+
+## Performance / scale
+
+3. **`verifyToken` is O(n·bcrypt) per request.** (Also in the security sweep.)
+   Loads every non-revoked token and bcrypt-compares each. Fine at 1–2 agents,
+   quadratic-feeling as agents grow. Selector+verifier split makes it O(1). This
+   is the single highest-value optimisation once Yerin/others get their own tokens.
+
+4. **Hybrid-search FTS uses inline `to_tsvector(...)` expressions.** `search.js`
+   computes `to_tsvector('english', title || …)` at query time per branch. This
+   only avoids a seq scan if a **GIN index on the identical expression** exists
+   for each table. Verify the expression indexes match these exact expressions
+   (incl. the `coalesce(...)` shapes); a drift means silent seq scans. Worth a
+   `EXPLAIN` check on prod-sized data.
+
+## Cleanliness / maintainability
+
+5. **`owner.js` and `agent_auth.js` duplicate bearer parsing + owner check.**
+   Both split `Authorization`, both compare `OWNER_TOKEN`. Now that both use
+   `timingSafeStrEqual`, factor the "parse bearer + is-owner?" step into one
+   helper to keep the two in lockstep.
+
+6. **Doc/code name drift.** Completion docs reference `divertToPending` /
+   `applyPendingChange` — those are now the real names (good), but earlier notes
+   also referenced a `n()` export. Keep `docs/plan-*-complete.md` symbol names in
+   sync with the code; stale symbol names cost a grep on every revisit.
+
+7. **`context` tool `SELECT *`.** Project an explicit column list (see security
+   sweep LOW) — also makes the tool's output contract stable for the UI.
+
+## Things that are good (keep doing)
+- Tool registry (`createRegistry`) is a clean extension point — the new
+  `securityRegistry` reused it with zero changes.
+- Audit emission + recursive redaction is consistent across repos.
+- SSRF `safeFetch` is genuinely well built (DNS-pin TOCTOU defeat + per-hop
+  re-validation).
+- Test isolation on `void_test` with a prod-guard in `tests/helpers/setup.js`.