diff --git a/docs/plan-3-complete.md b/docs/plan-3-complete.md
new file mode 100644
index 0000000..316854f
--- /dev/null
+++ b/docs/plan-3-complete.md
@@ -0,0 +1,66 @@
+# Plan 3 — Complete
+
+**Date:** 2026-06-01
+**Version:** 2.0.0-alpha.3
+**Tests:** 241 passing + 1 gated-skipped across 60 files
+**Commits on `main`:** ~30 covering Plan 3 (`fa47419` … head)
+**Snapshots:** `plan3_pre_phaseA_20260601_0322`, `plan3_phase_c_20260601_0353`, `plan3_complete_20260601_0400` on CT 310 + 311.
+
+## Scope delivered
+
+### Phase A — Queue harness + Jobs API
+- `lib/jobs/queue.js` singleton over pg-boss v10 with per-name `createQueue` dedup (deadlock fix).
+- `lib/jobs/index.js` worker registry; a trivial `echo` worker proved the harness end-to-end.
+- `lib/db/repos/jobs.js` unifies `pgboss.job` (partitioned per queue) and `pgboss.archive` for list/get views; provides `retry()` and `remove()`.
+- `/api/jobs` (owner-only) — list, get, retry, delete.
+- `#/jobs` SPA view stub + sidebar entry.
+
+### Phase B — Capture API + URL/blob workers
+- `lib/ingest/readability.js` — `@mozilla/readability` + `jsdom` wrapper.
+- `lib/ingest/blob_store.js` — sha256 content-addressed write, sharded `/`.
+- `lib/ingest/safe_fetch.js` — SSRF mitigations: http/https only; DNS resolution; loopback/RFC1918/link-local/CGNAT/metadata blocked; pinned-DNS undici dispatcher to defeat TOCTOU rebinding; per-hop re-validation on redirects.
+- `lib/jobs/workers/url.js` — fetch + readability extract → `refs` row; idempotent by `sha256(space_id + url)` stored as `refs.external_id`.
+- `lib/jobs/workers/blob.js` — content-addressed storage + image/pdf/file classification.
+- `POST /api/capture` + `POST /api/capture/upload`.
+
+### Phase C — Embeddings + hybrid search
+- `lib/ai/ollama.js` — thin `embedText()` wrapper with `padTo()` helper.
+- `lib/jobs/workers/embed.js` — embeds `pages`, `refs`, `source_docs`, and `conversations`; pads 768-dim vectors to 1024 dims; writes `embedding`; emits an audit entry with actor `worker`.
+- `lib/jobs/triggers.js` + repo additions — `create`/`update` in the pages, refs, and source_docs repos fire `triggerEmbed` with a singleton key.
+- `lib/db/repos/search.js` rewritten as hybrid search (FTS + pgvector ANN, fused with RRF at k=60), with graceful fallback to FTS-only when Ollama is unreachable.
+- `tests/integration/embed_live.test.js` — gated end-to-end test (skipped when Ollama is down).
+
+### Phase D — Karakeep webhook + drag-drop UI + Jobs UI
+- `lib/karakeep/client.js` — thin bearer-token bookmark fetch.
+- `lib/jobs/workers/karakeep.js` — fetches the bookmark, normalizes it to a `refs` row tagged `source_kind='karakeep'`, idempotent by `sha256(space_id + 'karakeep:' + bookmark_id)`.
+- `POST /api/ingest/karakeep` — HMAC-verified webhook; bypasses `agentOrOwner`. The raw body is captured via `express.json({ verify })` so the signature is checked against the exact bytes received.
+- `public/components/dropzone.js` + wiring on `#main`.
+- Full Jobs panel: rows grouped by state, retry/delete buttons, 10 s polling.
+
+## Security findings handled
+
+| Finding | Source | Resolution |
+|---|---|---|
+| SSRF on `ingest.url` worker (could fetch arbitrary URLs) | reviewer | `lib/ingest/safe_fetch.js` with IP-range blocklist + per-hop re-validation |
+| DNS-rebinding TOCTOU in `safe_fetch` | reviewer | Pinned undici dispatcher whose `lookup()` returns the validated IP |
+
+Plan 3 introduced no other reviewer findings to defer.
+
+## UI smoke test (manual walkthrough)
+
+- Drop a file onto the SPA's main panel after navigating to a Space → the upload is POSTed; the Jobs view shows the new `ingest.blob` job, followed by an `embed.text` job.
+- POST `/api/capture` from `curl` → the response carries `job_id` and `idempotency_key`; the SPA's Jobs view picks it up.
+- Karakeep webhook with a valid `X-Karakeep-Signature` → 202 + `job_id`. Bad signature → 401.
+
+## Open items for the user
+
+- **`KARAKEEP_API_TOKEN` + `KARAKEEP_DEFAULT_SPACE_ID`** — needed in `/opt/void-server/.env` on CT 311 before Karakeep webhooks do anything useful. `KARAKEEP_WEBHOOK_SECRET` must also be set there and must match the secret in Karakeep's webhook config.
+- **`BLOB_ROOT=/var/lib/void/blobs`** on CT 311 — create with `mkdir -p /var/lib/void/blobs && chown void:void /var/lib/void/blobs && chmod 750 /var/lib/void/blobs`. Add this to the bootstrap steps in the deploy README.
+- **`UPLOAD_TMP=/var/lib/void/uploads-tmp`** likewise.
+- alpha-3 is **not yet deployed** to CT 311; alpha-2 is still serving. `deploy/push.sh` works as-is once the env additions are in place.
+
+## What's left after Plan 3
+
+- **Plan 4** — heavy ingest (Whisper transcription, Tesseract OCR, yt-dlp, pdftotext) via the Python `void-workers` service.
+- **Plan 5** — companion chat in the right rail.
+- **Plan 6** — Sacred Valley widgets ported from Void 1.x.
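The Phase B `safe_fetch` blocklist (loopback, RFC1918, link-local, CGNAT, metadata) can be sketched with Node's built-in `net.BlockList`. This is a minimal illustration under stated assumptions, not the code in `lib/ingest/safe_fetch.js`. `isBlockedAddress` is a hypothetical name, and the input is assumed to be an IP literal that has already been resolved. The `0.0.0.0/8` and `fc00::/7` ranges are extras assumed here; they are not in the Phase B list.

```javascript
const net = require('node:net');

// Hypothetical blocklist mirroring the ranges named for safe_fetch,
// plus two assumed extras (0.0.0.0/8 and fc00::/7).
const blocked = new net.BlockList();
blocked.addSubnet('0.0.0.0', 8, 'ipv4');      // "this network" (assumed extra)
blocked.addSubnet('127.0.0.0', 8, 'ipv4');    // loopback
blocked.addSubnet('10.0.0.0', 8, 'ipv4');     // RFC1918
blocked.addSubnet('172.16.0.0', 12, 'ipv4');  // RFC1918 (172.16.x.x to 172.31.x.x)
blocked.addSubnet('192.168.0.0', 16, 'ipv4'); // RFC1918
blocked.addSubnet('169.254.0.0', 16, 'ipv4'); // link-local, includes 169.254.169.254 metadata
blocked.addSubnet('100.64.0.0', 10, 'ipv4');  // CGNAT (RFC 6598)
blocked.addAddress('::1', 'ipv6');            // IPv6 loopback
blocked.addSubnet('fe80::', 10, 'ipv6');      // IPv6 link-local
blocked.addSubnet('fc00::', 7, 'ipv6');       // IPv6 unique-local (assumed extra)

// Returns true when the resolved address must not be fetched.
// Anything that is not a valid IPv4/IPv6 literal fails closed.
function isBlockedAddress(ip) {
  const family = net.isIP(ip); // 4, 6, or 0 when not an IP literal
  if (family === 0) return true;
  return blocked.check(ip, family === 4 ? 'ipv4' : 'ipv6');
}
```

In this sketch the range check is only part of the mitigation. As the Phase B notes describe, it would run again on every redirect hop. The pinned undici dispatcher then makes the socket connect to the IP that was checked, not to a fresh DNS answer.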
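Phase C fuses the FTS and pgvector result lists with Reciprocal Rank Fusion at k=60. Each document scores the sum of 1/(k + rank) over every list it appears in, with ranks starting at 1. Below is a minimal sketch, assuming each input is an array of ids ordered best-first. `rrfFuse` is a hypothetical helper, not the actual `lib/db/repos/search.js` implementation.

```javascript
// Hypothetical sketch of Reciprocal Rank Fusion (RRF); not the project's code.
// rankedLists: arrays of ids, best match first, ids unique within each list.
function rrfFuse(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1; // RRF ranks start at 1
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first; Array.prototype.sort is stable, so ties keep first-seen order.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}

// Example: 'a' and 'c' appear in both lists, so they outrank 'b' and 'd'.
const fused = rrfFuse([
  ['a', 'b', 'c'], // FTS ranking
  ['c', 'a', 'd'], // vector ANN ranking
]);
console.log(fused.map((r) => r.id)); // [ 'a', 'c', 'b', 'd' ]
```

This also explains why the FTS-only fallback needs no special case. Passing just the FTS list returns it in its original order, because each score is then a strictly decreasing function of a single rank.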
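The Phase D webhook check computes an HMAC over the raw body captured with `express.json({ verify })` and could look roughly like the sketch below. It assumes the `X-Karakeep-Signature` header carries a hex-encoded HMAC-SHA256 of the raw body, keyed with `KARAKEEP_WEBHOOK_SECRET`. The header's actual encoding is not stated in these notes, and `verifySignature` is a hypothetical helper.

```javascript
const crypto = require('node:crypto');

// Hypothetical verifier. Assumption: the signature header is a hex-encoded
// HMAC-SHA256 of the raw request body, keyed with the webhook secret.
function verifySignature(rawBody, signatureHeader, secret) {
  if (typeof signatureHeader !== 'string' || !/^[0-9a-f]+$/i.test(signatureHeader)) {
    return false; // missing or non-hex header
  }
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest();
  const received = Buffer.from(signatureHeader, 'hex');
  // timingSafeEqual throws on a length mismatch, so reject wrong-length signatures first.
  if (received.length !== expected.length) return false;
  return crypto.timingSafeEqual(received, expected);
}

// Wiring sketch (Express): keep the raw bytes so the HMAC matches what was sent.
// app.use(express.json({ verify: (req, _res, buf) => { req.rawBody = buf; } }));
// In the route handler:
// if (!verifySignature(req.rawBody, req.get('X-Karakeep-Signature'), secret)) return res.sendStatus(401);
```

The sketch compares with `crypto.timingSafeEqual` instead of `===`. A plain string comparison can reveal through timing how many leading bytes of a forged signature were correct.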