Files
Void-Homelab/docs/code-review-2026-06-01.md

3.7 KiB
Raw Permalink Blame History

Void 2.0 — Code Review & Optimisation Notes (2026-06-01, alpha.6)

Full read-through of lib/ (~4.2k LOC), the public SPA, and the Python workers. Overall the codebase is in good shape: ~1:1 test-to-code ratio, consistent repo + audit pattern, clean tool registry, disciplined safe-DOM and SSRF handling. Items below are improvements, ordered by value. Severity ≠ urgency — most are "before scale / before more agents", not "broken now".

Resolved 2026-06-02: #1 (pool error handler), #2 (upsert-arm guard), #3 (verifyToken O(1) selector+verifier), and #7 (context column allow-list) are done — see the security sweep doc + their tests. Remaining open: #4 (FTS index alignment — needs a prod EXPLAIN), #5 (dedupe bearer parsing), #6 (doc/code symbol drift).

Correctness / robustness

  1. pool.js has no error handler or statement timeout. lib/db/pool.js creates a pg.Pool with no pool.on('error', …). An error on an idle pooled client (e.g. DB restart, replication failover on the .215 cluster) emits an error event with no listener → process crash. Add a logging error handler. Also consider statement_timeout / query_timeout so a pathological query can't pin a connection indefinitely (the pool is only max: 10).

  2. applyPendingChange's upsert arm assumes a refs-shaped repo. After today's fix, case 'upsert' calls repo.upsertByExternal(...). Only refsRepo has that method; the action only originates from the refs route so it's safe today, but it's an implicit coupling. Cheap guard: assert typeof repo.upsertByExternal === 'function' and throw a clear ValidationError otherwise, so a future entity emitting upsert fails loud, not with a bare TypeError.

Performance / scale

  1. verifyToken is O(n·bcrypt) per request. (Also in the security sweep.) Loads every non-revoked token and bcrypt-compares each. Fine at 12 agents, quadratic-feeling as agents grow. Selector+verifier split makes it O(1). This is the single highest-value optimisation once Yerin/others get their own tokens.

  2. Hybrid-search FTS uses inline to_tsvector(...) expressions. search.js computes to_tsvector('english', title || …) at query time per branch. This only avoids a seq scan if a GIN index on the identical expression exists for each table. Verify the expression indexes match these exact expressions (incl. the coalesce(...) shapes); a drift means silent seq scans. Worth a EXPLAIN check on prod-sized data.

Cleanliness / maintainability

  1. owner.js and agent_auth.js duplicate bearer parsing + owner check. Both split Authorization, both compare OWNER_TOKEN. Now that both use timingSafeStrEqual, factor the "parse bearer + is-owner?" step into one helper to keep the two in lockstep.

  2. Doc/code name drift. Completion docs reference divertToPending / applyPendingChange — those are now the real names (good), but earlier notes also referenced a n() export. Keep docs/plan-*-complete.md symbol names in sync with the code; stale symbol names cost a grep on every revisit.

  3. context tool SELECT *. Project an explicit column list (see security sweep LOW) — also makes the tool's output contract stable for the UI.

Things that are good (keep doing)

  • Tool registry (createRegistry) is a clean extension point — the new securityRegistry reused it with zero changes.
  • Audit emission + recursive redaction is consistent across repos.
  • SSRF safeFetch is genuinely well built (DNS-pin TOCTOU defeat + per-hop re-validation).
  • Test isolation on void_test with a prod-guard in tests/helpers/setup.js.