docs: security sweep, code review, Yerin design, Plan 6 brainstorm brief

- security-sweep-2026-06-01.md: fresh sweep of alpha.6 (1 fixed, findings + carry-overs)
- code-review-2026-06-01.md: optimisation/cleanliness notes (pool error handler,
  O(n) bcrypt token scan, FTS index alignment, dup auth parsing)
- yerin-security-agent.md: security-agent design + tool roadmap + Orthos role proposal
- plan-6-brainstorm-brief.md: Sacred Valley widget inventory + open design questions
- security-followups.md: marked the pending_changes CHECK finding RESOLVED

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
This commit is contained in:
root
2026-06-01 23:26:46 +10:00
parent 6c393d8069
commit afbf075d26
4 changed files with 323 additions and 0 deletions

View File

@@ -0,0 +1,62 @@
# Void 2.0 — Code Review & Optimisation Notes (2026-06-01, alpha.6)
Full read-through of `lib/` (~4.2k LOC), the public SPA, and the Python workers.
Overall the codebase is in good shape: ~1:1 test-to-code ratio, consistent repo +
audit pattern, clean tool registry, disciplined safe-DOM and SSRF handling. Items
below are improvements, ordered by value. Severity ≠ urgency — most are "before
scale / before more agents", not "broken now".
## Correctness / robustness
1. **`pool.js` has no error handler or statement timeout.** `lib/db/pool.js`
creates a `pg.Pool` with no `pool.on('error', …)`. An error on an *idle*
pooled client (e.g. DB restart, replication failover on the .215 cluster)
emits an `error` event with no listener → **process crash**. Add a logging
error handler. Also consider `statement_timeout` / `query_timeout` so a
pathological query can't pin a connection indefinitely (the pool is only
`max: 10`).
2. **`applyPendingChange`'s `upsert` arm assumes a refs-shaped repo.** After
today's fix, `case 'upsert'` calls `repo.upsertByExternal(...)`. Only
`refsRepo` has that method; the action only originates from the refs route so
it's safe today, but it's an implicit coupling. Cheap guard: assert
`typeof repo.upsertByExternal === 'function'` and throw a clear
`ValidationError` otherwise, so a future entity emitting `upsert` fails loud,
not with a bare `TypeError`.
## Performance / scale
3. **`verifyToken` is O(n·bcrypt) per request.** (Also in the security sweep.)
Loads every non-revoked token and bcrypt-compares each. Fine at 12 agents,
quadratic-feeling as agents grow. Selector+verifier split makes it O(1). This
is the single highest-value optimisation once Yerin/others get their own tokens.
4. **Hybrid-search FTS uses inline `to_tsvector(...)` expressions.** `search.js`
computes `to_tsvector('english', title || …)` at query time per branch. This
only avoids a seq scan if a **GIN index on the identical expression** exists
for each table. Verify the expression indexes match these exact expressions
(incl. the `coalesce(...)` shapes); a drift means silent seq scans. Worth a
`EXPLAIN` check on prod-sized data.
## Cleanliness / maintainability
5. **`owner.js` and `agent_auth.js` duplicate bearer parsing + owner check.**
Both split `Authorization`, both compare `OWNER_TOKEN`. Now that both use
`timingSafeStrEqual`, factor the "parse bearer + is-owner?" step into one
helper to keep the two in lockstep.
6. **Doc/code name drift.** Completion docs reference `divertToPending` /
`applyPendingChange` — those are now the real names (good), but earlier notes
also referenced a `n()` export. Keep `docs/plan-*-complete.md` symbol names in
sync with the code; stale symbol names cost a grep on every revisit.
7. **`context` tool `SELECT *`.** Project an explicit column list (see security
sweep LOW) — also makes the tool's output contract stable for the UI.
## Things that are good (keep doing)
- Tool registry (`createRegistry`) is a clean extension point — the new
`securityRegistry` reused it with zero changes.
- Audit emission + recursive redaction is consistent across repos.
- SSRF `safeFetch` is genuinely well built (DNS-pin TOCTOU defeat + per-hop
re-validation).
- Test isolation on `void_test` with a prod-guard in `tests/helpers/setup.js`.