docs: security sweep, code review, Yerin design, Plan 6 brainstorm brief

- security-sweep-2026-06-01.md: fresh sweep of alpha.6 (1 fixed, findings + carry-overs)
- code-review-2026-06-01.md: optimisation/cleanliness notes (pool error handler,
  O(n) bcrypt token scan, FTS index alignment, dup auth parsing)
- yerin-security-agent.md: security-agent design + tool roadmap + Orthos role proposal
- plan-6-brainstorm-brief.md: Sacred Valley widget inventory + open design questions
- security-followups.md: marked the pending_changes CHECK finding RESOLVED

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
root
2026-06-01 23:26:46 +10:00
parent 6c393d8069
commit afbf075d26
4 changed files with 323 additions and 0 deletions


@@ -0,0 +1,97 @@
# Void 2.0 — Security Sweep (2026-06-01, alpha.6)
Fresh security pass over the live alpha.6 surfaces. Severity is for the current
deployment context: **single owner token, owner-only authn, one suggest-tier
companion agent, LAN + Cloudflare-Access perimeter**. Several items move up in
severity once additional agents or wider exposure land.
Legend: ✅ fixed this pass · 🔧 recommended (needs your call / migration) · unmarked: note.
---
## ✅ FIXED — Owner token compared in non-constant time (was MEDIUM)
`lib/auth/owner.js` and `lib/api/middleware/agent_auth.js` compared the bearer
token with `===` / `!==`. String `===` short-circuits on the first differing
byte, so response time reveals whether the length matches and how many leading
bytes of a guess are correct. Over enough samples this enables byte-by-byte
recovery of `OWNER_TOKEN`.
**Fix:** new `lib/auth/timingSafeStrEqual` (constant-time via
`crypto.timingSafeEqual` with a length pre-check so it never throws on a
length mismatch). Both auth paths now use it. Tests: `tests/auth/safe_compare.test.js`.
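
For illustration, a minimal sketch of what such a helper might look like, assuming it takes two strings and returns a boolean. The body below is not copied from `lib/auth`; only `crypto.timingSafeEqual` and the length pre-check come from the description above.

```javascript
const crypto = require('node:crypto');

// Sketch only: the real helper in lib/auth may differ in detail.
function timingSafeStrEqual(a, b) {
  // Reject non-strings up front so Buffer.from never sees undefined/null.
  if (typeof a !== 'string' || typeof b !== 'string') return false;
  const bufA = Buffer.from(a, 'utf8');
  const bufB = Buffer.from(b, 'utf8');
  // crypto.timingSafeEqual throws on unequal byte lengths, so check first.
  // The early return reveals only whether the lengths match, not content.
  if (bufA.length !== bufB.length) return false;
  return crypto.timingSafeEqual(bufA, bufB);
}
```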
---
## 🔧 HIGH — `verifyToken` does an O(n) bcrypt scan over every token
`lib/db/repos/agents.js::verifyToken` loads **all** non-revoked agent tokens and
runs `bcrypt.compare` against each (cost factor 12 ≈ 250 ms/compare).
- **Auth latency scales linearly with token count.** With N agents/tokens, every
authenticated request pays N bcrypt comparisons (roughly N × 250 ms at cost
factor 12).
- **DoS lever:** an attacker who can hit an authenticated endpoint with any
`vk_`-prefixed token (it need not be valid) forces a full-table bcrypt scan per
request.
**Recommended fix (needs a migration — left for your sign-off):** give each token
a non-secret lookup key. Store `token_id = first 8 chars of the random body`
(or a separate indexed `selector` column), index it, and `bcrypt.compare` exactly
the one row it points at. Keeps bcrypt's offline-cracking resistance while making
verification O(1). This is the standard "selector + verifier" split.
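
A rough sketch of the selector + verifier lookup, under stated assumptions: the token layout (`vk_` + 8-character selector + remaining random characters), the row shape (`agentId`, `tokenHash`, `revoked`), and the injected `compareSync` argument are all hypothetical. A `Map` stands in for the indexed column, and `compareSync` stands in for the bcrypt check so the sketch runs without dependencies.

```javascript
// Hypothetical layout: "vk_" + 8-char selector + remaining random characters.
const PREFIX = 'vk_';
const SELECTOR_LEN = 8;

// Extract the non-secret lookup key, or null if the token is malformed.
function selectorOf(token) {
  if (typeof token !== 'string' || !token.startsWith(PREFIX)) return null;
  const body = token.slice(PREFIX.length);
  return body.length > SELECTOR_LEN ? body.slice(0, SELECTOR_LEN) : null;
}

// rowsBySelector stands in for an indexed `selector` column.
// compareSync stands in for the bcrypt comparison of token against hash.
function verifyToken(token, rowsBySelector, compareSync) {
  const selector = selectorOf(token);
  if (selector === null) return null;      // malformed: no hashing at all
  const row = rowsBySelector.get(selector);
  if (!row || row.revoked) return null;    // unknown selector: no hashing
  // Exactly one expensive comparison, regardless of how many tokens exist.
  return compareSync(token, row.tokenHash) ? row.agentId : null;
}
```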
---
## 🔧 HIGH — `void` DB role still has SUPERUSER (carried over)
Documented in `security-followups.md`: the `void` role was granted SUPERUSER so
the test harness could `CREATE EXTENSION`. On prod (CT 311 → CT 310 DB) this is
far more privilege than the app needs. Revoke on prod; create extensions once as
a superuser during bootstrap, then run the app as a non-superuser role.
---
## 🔧 MEDIUM — Companion subprocess inherits the full server environment
`lib/ai/claude_cli.js` clones `process.env` for the `claude` child and only
deletes `ANTHROPIC_API_KEY` / `ANTHROPIC_AUTH_TOKEN`. The child therefore also
inherits `OWNER_TOKEN`, `DB_PASS`/connection strings, and the Karakeep secrets.
Today the companion is constrained to `mcp__void__*` tools only (built-ins like
Bash/Read are stripped via `--tools`), so it has no primitive to read its own
env — contained in practice. But it is one config slip (re-enabling a built-in)
away from full secret exposure.
**Recommended (defense in depth):** pass an explicit allow-list env to the child
(HOME, PATH, the few `CLAUDE_*` / `VOID_CLAUDE_HOME` vars, and only the MCP
server's own needs) rather than the whole environment.
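
One way the allow-list could be built, as a sketch: the function name `buildChildEnv` and the exact variable names in the lists are assumptions, and whatever the MCP server itself needs would have to be added by hand.

```javascript
// Hypothetical allow-list; the exact names the child needs are an assumption.
const ALLOW_EXACT = new Set(['HOME', 'PATH', 'VOID_CLAUDE_HOME']);
const ALLOW_PREFIXES = ['CLAUDE_'];

// Build the child's environment from scratch instead of cloning and deleting,
// so a newly added server secret is excluded by default.
function buildChildEnv(parentEnv) {
  const childEnv = {};
  for (const [name, value] of Object.entries(parentEnv)) {
    if (value === undefined) continue;
    if (ALLOW_EXACT.has(name) || ALLOW_PREFIXES.some((p) => name.startsWith(p))) {
      childEnv[name] = value;
    }
  }
  return childEnv;
}

// Usage with child_process.spawn (arguments elided):
//   spawn('claude', args, { env: buildChildEnv(process.env) });
```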
---
## LOW — `context` tool returns `SELECT *` of the active entity
`lib/ai/agent/tools/context.js` returns every column of the active row to the
agent. For `resources` that includes `monitoring`/`metadata` JSON, which may hold
connection hints or `vault_path` pointers. Not a secret-value leak today (the
resolver keeps values out of the row), but project a column allow-list before
Yerin (or any future agent) queries resource rows broadly.
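
A column allow-list could be as small as the sketch below. The column names and the `projectRow` helper are hypothetical; the point is that anything not listed, including `monitoring` and `metadata`, never reaches the agent.

```javascript
// Hypothetical per-entity allow-lists; real column names will differ.
const CONTEXT_COLUMNS = new Map([
  ['resources', ['id', 'name', 'kind', 'status']],
]);

// Copy only allow-listed columns; unknown entity types yield an empty object.
function projectRow(entityType, row) {
  const allowed = CONTEXT_COLUMNS.get(entityType) ?? [];
  const out = {};
  for (const col of allowed) {
    if (Object.prototype.hasOwnProperty.call(row, col)) out[col] = row[col];
  }
  return out;
}
```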
---
## ✅ Reviewed and sound (no action)
- **SSRF guard** (`lib/ingest/safe_fetch.js`): http/https-only, blocks loopback /
RFC1918 / link-local / CGNAT(100.64) / 0.0.0.0 / IPv6 ULA+link-local+v4-mapped,
validates **all** DNS records, pins resolved IPs into the undici dispatcher to
defeat TOCTOU rebinding, and re-validates every redirect hop. Solid.
- **Karakeep webhook HMAC** (`lib/api/routes/ingest.js`): `timingSafeEqual` over
the raw body, guarded against length-mismatch throw, fails closed on missing
secret/sig. Good.
- **Audit redaction** (`lib/db/repos/audit.js`): redacts token/password/api_key/
secret/authorization recursively at write time.
- **Capability model** (`lib/auth/capability.js`): agents default-deny; mutations
require `write`+scope or fall to `suggest`; unknown actions deny.
- **XSS**: safe-DOM invariant in `public/dom.js`; assistant markdown only via the
DOMPurify-sanitized path.
- **Bearer-only API** (no cookies) ⇒ CSRF not applicable.
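
To make the SSRF bullet concrete, here is a sketch of the address-range check only, using Node's built-in `net.BlockList`. It is not the code in `lib/ingest/safe_fetch.js`: the function names are hypothetical, and v4-mapped handling, IP pinning in the dispatcher, and redirect re-validation are not shown.

```javascript
const net = require('node:net');

// Ranges named in the SSRF bullet above; a real guard may block more.
const blocked = new net.BlockList();
blocked.addSubnet('127.0.0.0', 8);       // loopback
blocked.addSubnet('10.0.0.0', 8);        // RFC1918
blocked.addSubnet('172.16.0.0', 12);     // RFC1918
blocked.addSubnet('192.168.0.0', 16);    // RFC1918
blocked.addSubnet('169.254.0.0', 16);    // link-local
blocked.addSubnet('100.64.0.0', 10);     // CGNAT
blocked.addAddress('0.0.0.0');
blocked.addAddress('::1', 'ipv6');       // IPv6 loopback
blocked.addSubnet('fc00::', 7, 'ipv6');  // IPv6 ULA
blocked.addSubnet('fe80::', 10, 'ipv6'); // IPv6 link-local

function isBlockedAddress(address) {
  const family = net.isIP(address);      // 0 = not an IP literal, else 4 or 6
  if (family === 0) return true;         // fail closed on anything unparsable
  return blocked.check(address, family === 6 ? 'ipv6' : 'ipv4');
}

// Every resolved record must pass; one blocked address rejects the host.
function allAddressesAllowed(addresses) {
  return addresses.length > 0 && addresses.every((a) => !isBlockedAddress(a));
}
```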
---
## Carried over from `security-followups.md` (design decisions, your call)
- **[HIGH]** Polymorphic `entity_tags`/`entity_links`/`attachments` lack `space_id`
(cross-tenant linkage possible at the DB layer). Defensible for single-tenant
alpha; gate at the REST/MCP layer until multi-tenant is load-bearing.
- **[MEDIUM]** `tags.name` UNIQUE is global, not per-space.
- **[MEDIUM]** No cascade on polymorphic parent delete.