Files

root afbf075d26 docs: security sweep, code review, Yerin design, Plan 6 brainstorm brief

- security-sweep-2026-06-01.md: fresh sweep of alpha.6 (1 fixed, findings + carry-overs)
- code-review-2026-06-01.md: optimisation/cleanliness notes (pool error handler,
  O(n) bcrypt token scan, FTS index alignment, dup auth parsing)
- yerin-security-agent.md: security-agent design + tool roadmap + Orthos role proposal
- plan-6-brainstorm-brief.md: Sacred Valley widget inventory + open design questions
- security-followups.md: marked the pending_changes CHECK finding RESOLVED

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-01 23:26:46 +10:00

3.3 KiB

Raw Blame History

Void 2.0 — Code Review & Optimisation Notes (2026-06-01, alpha.6)

Full read-through of lib/ (~4.2k LOC), the public SPA, and the Python workers. Overall the codebase is in good shape: ~1:1 test-to-code ratio, consistent repo + audit pattern, clean tool registry, disciplined safe-DOM and SSRF handling. Items below are improvements, ordered by value. Severity ≠ urgency — most are "before scale / before more agents", not "broken now".

Correctness / robustness

pool.js has no error handler or statement timeout. lib/db/pool.js creates a pg.Pool with no pool.on('error', …). An error on an idle pooled client (e.g. DB restart, replication failover on the .215 cluster) emits an error event with no listener → process crash. Add a logging error handler. Also consider statement_timeout / query_timeout so a pathological query can't pin a connection indefinitely (the pool is only max: 10).
applyPendingChange's upsert arm assumes a refs-shaped repo. After today's fix, case 'upsert' calls repo.upsertByExternal(...). Only refsRepo has that method; the action only originates from the refs route so it's safe today, but it's an implicit coupling. Cheap guard: assert typeof repo.upsertByExternal === 'function' and throw a clear ValidationError otherwise, so a future entity emitting upsert fails loud, not with a bare TypeError.

Performance / scale

verifyToken is O(n·bcrypt) per request. (Also in the security sweep.) Loads every non-revoked token and bcrypt-compares each. Fine at 1–2 agents, quadratic-feeling as agents grow. Selector+verifier split makes it O(1). This is the single highest-value optimisation once Yerin/others get their own tokens.
Hybrid-search FTS uses inline to_tsvector(...) expressions. search.js computes to_tsvector('english', title || …) at query time per branch. This only avoids a seq scan if a GIN index on the identical expression exists for each table. Verify the expression indexes match these exact expressions (incl. the coalesce(...) shapes); a drift means silent seq scans. Worth a EXPLAIN check on prod-sized data.

Cleanliness / maintainability

owner.js and agent_auth.js duplicate bearer parsing + owner check. Both split Authorization, both compare OWNER_TOKEN. Now that both use timingSafeStrEqual, factor the "parse bearer + is-owner?" step into one helper to keep the two in lockstep.
Doc/code name drift. Completion docs reference divertToPending / applyPendingChange — those are now the real names (good), but earlier notes also referenced a n() export. Keep docs/plan-*-complete.md symbol names in sync with the code; stale symbol names cost a grep on every revisit.
context tool SELECT *. Project an explicit column list (see security sweep LOW) — also makes the tool's output contract stable for the UI.

Things that are good (keep doing)

Tool registry (createRegistry) is a clean extension point — the new securityRegistry reused it with zero changes.
Audit emission + recursive redaction is consistent across repos.
SSRF safeFetch is genuinely well built (DNS-pin TOCTOU defeat + per-hop re-validation).
Test isolation on void_test with a prod-guard in tests/helpers/setup.js.

3.3 KiB Raw Blame History Unescape Escape