docs: security sweep, code review, Yerin design, Plan 6 brainstorm brief

- security-sweep-2026-06-01.md: fresh sweep of alpha.6 (1 fixed, findings + carry-overs)
- code-review-2026-06-01.md: optimisation/cleanliness notes (pool error handler, O(n) bcrypt token scan, FTS index alignment, dup auth parsing)
- yerin-security-agent.md: security-agent design + tool roadmap + Orthos role proposal
- plan-6-brainstorm-brief.md: Sacred Valley widget inventory + open design questions
- security-followups.md: marked the pending_changes CHECK finding RESOLVED

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 23:26:46 +10:00
parent 6c393d8069
commit afbf075d26
4 changed files with 323 additions and 0 deletions
--- a/docs/security-sweep-2026-06-01.md
+++ b/docs/security-sweep-2026-06-01.md
@@ -0,0 +1,97 @@
+# Void 2.0 — Security Sweep (2026-06-01, alpha.6)
+
+Fresh security pass over the live alpha.6 surfaces. Severity is rated for the current
+deployment context: **single owner token, owner-only authn, one suggest-tier
+companion agent, LAN + Cloudflare-Access perimeter**. Several items move up in
+severity once additional agents or wider exposure land.
+
+Legend: ✅ fixed this pass · 🔧 recommended (needs your call / migration) · ℹ️ note.
+
+---
+
+## ✅ FIXED — Owner token compared in non-constant time  (was MEDIUM)
+`lib/auth/owner.js` and `lib/api/middleware/agent_auth.js` compared the bearer
+token with `===` / `!==`. String `===` short-circuits on the first differing
+byte, leaking token length and prefix through response timing — over enough
+samples this enables byte-by-byte recovery of `OWNER_TOKEN`.
+
+**Fix:** new `lib/auth/timingSafeStrEqual` (constant-time via
+`crypto.timingSafeEqual` with a length pre-check so it never throws on a
+length mismatch). Both auth paths now use it. Tests: `tests/auth/safe_compare.test.js`.
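A minimal sketch of what such a helper could look like, assuming it takes two strings and returns a boolean. The function name comes from the finding above; the body is an illustration, not the committed implementation. Note that the length pre-check still reveals whether the lengths match, which is usually acceptable for fixed-length tokens.

```javascript
const crypto = require('node:crypto');

// Illustrative helper: compares two strings without an early exit on the
// first differing byte. The body is an assumption, not the project's code.
function timingSafeStrEqual(a, b) {
  if (typeof a !== 'string' || typeof b !== 'string') return false;
  const bufA = Buffer.from(a, 'utf8');
  const bufB = Buffer.from(b, 'utf8');
  // crypto.timingSafeEqual throws if the buffers differ in length,
  // so reject mismatched lengths before calling it.
  if (bufA.length !== bufB.length) return false;
  return crypto.timingSafeEqual(bufA, bufB);
}
```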
+
+---
+
+## 🔧 HIGH — `verifyToken` does an O(n) bcrypt scan over every token
+`lib/db/repos/agents.js::verifyToken` loads **all** non-revoked agent tokens and
+runs `bcrypt.compare` against each (cost factor 12 ≈ 250 ms/compare).
+
+- **Auth latency scales linearly with token count.** With N agents/tokens, every
+  authenticated request pays N bcrypt comparisons.
+- **DoS lever:** an attacker who can hit an authenticated endpoint with a `vk_`-
+  prefixed token forces a full-table bcrypt scan per request.
+
+**Recommended fix (needs a migration — left for your sign-off):** give each token
+a non-secret lookup key. Store `token_id = first 8 chars of the random body`
+(or a separate indexed `selector` column), index it, and `bcrypt.compare` exactly
+the one row it points at. Keeps bcrypt's offline-cracking resistance while cutting
+verification to one indexed lookup plus a single `bcrypt.compare`, independent of
+token count. This is the standard "selector + verifier" split.
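The selector + verifier split can be sketched as follows. Everything here is hypothetical: the `vk_<selector>.<verifier>` token layout, the in-memory `Map` standing in for an indexed `selector` column, and `crypto.scryptSync` standing in for bcrypt so the sketch runs on the Node standard library alone.

```javascript
const crypto = require('node:crypto');

// Stand-in for a table with an indexed `selector` column (assumption).
const tokensBySelector = new Map();

// Stand-in for bcrypt: scrypt is used only to keep the sketch dependency-free.
function hashVerifier(verifier, salt) {
  return crypto.scryptSync(verifier, salt, 32);
}

function issueToken() {
  const selector = crypto.randomBytes(4).toString('hex');  // 8 chars, non-secret lookup key
  const verifier = crypto.randomBytes(24).toString('hex'); // secret part, stored only as a hash
  const salt = crypto.randomBytes(16);
  tokensBySelector.set(selector, {
    salt,
    hash: hashVerifier(verifier, salt),
    revoked: false,
  });
  return `vk_${selector}.${verifier}`;
}

function verifyToken(token) {
  // Malformed tokens are rejected before any expensive hashing happens.
  const match = /^vk_([0-9a-f]{8})\.([0-9a-f]{48})$/.exec(token);
  if (!match) return false;
  const row = tokensBySelector.get(match[1]); // single keyed lookup, no table scan
  if (!row || row.revoked) return false;
  const candidate = hashVerifier(match[2], row.salt);
  return crypto.timingSafeEqual(candidate, row.hash); // exactly one slow hash per request
}
```

With this shape, a request carrying an unknown selector costs a lookup and no hash at all, so the per-request cost no longer grows with the number of issued tokens. The early return does reveal whether a selector exists, which is acceptable only because the selector is treated as non-secret.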
+
+---
+
+## 🔧 HIGH — `void` DB role still has SUPERUSER  (carried over)
+Documented in `security-followups.md`: the `void` role was granted SUPERUSER so
+the test harness could `CREATE EXTENSION`. On prod (CT 311 → CT 310 DB) this is
+far more privilege than the app needs. Revoke on prod; create extensions once as
+a superuser during bootstrap, then run the app as a non-superuser role.
+
+---
+
+## 🔧 MEDIUM — Companion subprocess inherits the full server environment
+`lib/ai/claude_cli.js` clones `process.env` for the `claude` child and only
+deletes `ANTHROPIC_API_KEY` / `ANTHROPIC_AUTH_TOKEN`. The child therefore also
+inherits `OWNER_TOKEN`, `DB_PASS`/connection strings, and the Karakeep secrets.
+
+Today the companion is constrained to `mcp__void__*` tools only (built-ins like
+Bash/Read are stripped via `--tools`), so it has no primitive to read its own
+env — contained in practice. But it is one config slip (re-enabling a built-in)
+away from full secret exposure.
+
+**Recommended (defense in depth):** pass an explicit allow-list env to the child
+(HOME, PATH, the few `CLAUDE_*` / `VOID_CLAUDE_HOME` vars, and only the MCP
+server's own needs) rather than the whole environment.
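One way to build such an allow-list, as a sketch. The helper name `buildChildEnv` and the `CLAUDE_` prefix match are assumptions; a real list would probably name the specific `CLAUDE_*` variables and whatever the MCP server needs rather than allowing the whole prefix.

```javascript
// Variables copied by exact name (taken from the recommendation above).
const ALLOWED_EXACT = new Set(['HOME', 'PATH', 'VOID_CLAUDE_HOME']);
// Prefix match is an assumption; tighten to explicit names where possible.
const ALLOWED_PREFIXES = ['CLAUDE_'];

// Returns a fresh env object holding only allow-listed variables.
// Anything not listed (OWNER_TOKEN, DB_PASS, API keys) is dropped by default.
function buildChildEnv(parentEnv = process.env) {
  const childEnv = {};
  for (const [name, value] of Object.entries(parentEnv)) {
    if (value === undefined) continue;
    const allowed =
      ALLOWED_EXACT.has(name) ||
      ALLOWED_PREFIXES.some((prefix) => name.startsWith(prefix));
    if (allowed) childEnv[name] = value;
  }
  return childEnv;
}
```

The result would be passed as the `env` option of `child_process.spawn`, for example `spawn('claude', args, { env: buildChildEnv() })`. The design point is default-deny: a newly added secret stays out of the child unless someone lists it.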
+
+---
+
+## ℹ️ LOW — `context` tool returns `SELECT *` of the active entity
+`lib/ai/agent/tools/context.js` returns every column of the active row to the
+agent. For `resources` that includes `monitoring`/`metadata` JSON, which may hold
+connection hints or `vault_path` pointers. Not a secret-value leak today (the
+resolver keeps values out of the row), but project a column allow-list before
+Yerin (or any future agent) queries resource rows broadly.
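A column allow-list projection could look like the sketch below. The column names and the `projectRow` helper are hypothetical; the actual schema of `resources` is not shown in this report.

```javascript
// Per-entity allow-list of columns the agent may see (hypothetical names).
const CONTEXT_COLUMNS = new Map([
  ['resources', ['id', 'name', 'kind']],
]);

// Copies only allow-listed columns from a row. Entity types with no
// allow-list yield an empty object, so new tables are hidden by default.
function projectRow(entityType, row) {
  const allowed = CONTEXT_COLUMNS.get(entityType);
  if (!allowed) return {};
  const projected = {};
  for (const column of allowed) {
    if (Object.prototype.hasOwnProperty.call(row, column)) {
      projected[column] = row[column];
    }
  }
  return projected;
}
```

The same list could instead drive the SQL column list, so that `monitoring` and `metadata` are never fetched in the first place.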
+
+---
+
+## ✅ Reviewed and sound (no action)
+- **SSRF guard** (`lib/ingest/safe_fetch.js`): http/https-only, blocks loopback /
+  RFC1918 / link-local / CGNAT(100.64) / 0.0.0.0 / IPv6 ULA+link-local+v4-mapped,
+  validates **all** DNS records, pins resolved IPs into the undici dispatcher to
+  defeat TOCTOU rebinding, and re-validates every redirect hop. Solid.
+- **Karakeep webhook HMAC** (`lib/api/routes/ingest.js`): `timingSafeEqual` over
+  the raw body, guarded against length-mismatch throw, fails closed on missing
+  secret/sig. Good.
+- **Audit redaction** (`lib/db/repos/audit.js`): redacts token/password/api_key/
+  secret/authorization recursively at write time.
+- **Capability model** (`lib/auth/capability.js`): agents default-deny; mutations
+  require `write`+scope or fall to `suggest`; unknown actions deny.
+- **XSS**: safe-DOM invariant in `public/dom.js`; assistant markdown only via the
+  DOMPurify-sanitized path.
+- **Bearer-only API** (no cookies) ⇒ CSRF not applicable.
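For reference, the webhook check described in the list above follows a common pattern, sketched here. HMAC-SHA256, hex encoding of the signature, and the function name are assumptions; the report does not state which algorithm or encoding `lib/api/routes/ingest.js` uses.

```javascript
const crypto = require('node:crypto');

// Verifies an HMAC signature over the raw request body.
// Algorithm (SHA-256) and hex encoding are assumptions for this sketch.
function verifyWebhookSignature(rawBody, signatureHex, secret) {
  // Fail closed: a missing secret or signature is always a rejection.
  if (!secret || typeof signatureHex !== 'string' || signatureHex === '') return false;
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest();
  const provided = Buffer.from(signatureHex, 'hex');
  // timingSafeEqual throws on unequal lengths, so guard first.
  if (provided.length !== expected.length) return false;
  return crypto.timingSafeEqual(provided, expected);
}
```

The HMAC must be computed over the raw bytes as received, since re-serialising parsed JSON can change the byte sequence and break the comparison.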
+
+---
+
+## Carried over from `security-followups.md` (design decisions, your call)
+- **[HIGH]** Polymorphic `entity_tags`/`entity_links`/`attachments` lack `space_id`
+  (cross-tenant linkage possible at the DB layer). Defensible for single-tenant
+  alpha; gate at the REST/MCP layer until multi-tenant is load-bearing.
+- **[MEDIUM]** `tags.name` UNIQUE is global, not per-space.
+- **[MEDIUM]** No cascade on polymorphic parent delete.