docs: Plan 3 completion summary

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
root
2026-06-01 04:01:12 +10:00
parent a02a96ea5f
commit 837bf2a5b4

docs/plan-3-complete.md Normal file

@@ -0,0 +1,66 @@
# Plan 3 — Complete
**Date:** 2026-06-01
**Version:** 2.0.0-alpha.3
**Tests:** 241 passing + 1 gated-skipped across 60 files
**Commits on `main`:** ~30 covering Plan 3 (`fa47419` … head)
**Snapshots:** `plan3_pre_phaseA_20260601_0322`, `plan3_phase_c_20260601_0353`, `plan3_complete_20260601_0400` on CT 310 + 311.
## Scope delivered
### Phase A — Queue harness + Jobs API
- `lib/jobs/queue.js` singleton over pg-boss v10 with per-name `createQueue` dedup (deadlock fix).
- `lib/jobs/index.js` worker registry; trivial `echo` worker proved the harness end-to-end.
- `lib/db/repos/jobs.js` unifies `pgboss.job` (partitioned per queue) and `pgboss.archive` for list/get views; provides `retry()` and `remove()`.
- `/api/jobs` (owner-only) — list, get, retry, delete.
- `#/jobs` SPA view stub + sidebar entry.
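
The per-name `createQueue` dedup mentioned above might look roughly like the sketch below. It is an illustration, not the contents of `lib/jobs/queue.js`: it assumes only that `boss` exposes an async `createQueue(name)` method (as pg-boss v10 does), and `makeQueueEnsurer` and `ensureQueue` are hypothetical names.

```javascript
// Sketch: memoize queue creation per name so concurrent callers share one
// in-flight createQueue() call instead of racing each other.
// `boss` is any object with an async createQueue(name) method; the helper
// names here are hypothetical.
function makeQueueEnsurer(boss) {
  const pending = new Map(); // queue name -> Promise for its creation

  return function ensureQueue(name) {
    let promise = pending.get(name);
    if (!promise) {
      promise = Promise.resolve()
        .then(() => boss.createQueue(name))
        .catch((err) => {
          // Forget failed attempts so a later call can retry.
          pending.delete(name);
          throw err;
        });
      pending.set(name, promise);
    }
    return promise;
  };
}

module.exports = { makeQueueEnsurer };
```

Workers and producers would both call `ensureQueue(name)` before touching a queue, so each queue name triggers at most one creation call per process.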
### Phase B — Capture API + URL/blob workers
- `lib/ingest/readability.js` wraps `@mozilla/readability` + `jsdom`.
- `lib/ingest/blob_store.js` — sha256 content-addressed write, sharded `<sha-prefix>/<sha>`.
- `lib/ingest/safe_fetch.js` — SSRF mitigations: http/https only; DNS resolution; loopback/RFC1918/link-local/CGNAT/metadata blocked; pinned-DNS undici dispatcher to defeat TOCTOU rebinding; per-hop re-validation on redirects.
- `lib/jobs/workers/url.js` — fetch + readability extract → `refs` row; idempotent by `sha256(space_id + url)` stored as `refs.external_id`.
- `lib/jobs/workers/blob.js` — content-addressed storage + image/pdf/file classification.
- `POST /api/capture` + `POST /api/capture/upload`.
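
The two sha256 uses in this phase (content-addressed blob paths and the URL idempotency key) can be sketched as follows. This is a minimal illustration under stated assumptions: the 2-character shard prefix and the separator-free concatenation of `space_id` and `url` are guesses, and `sha256Hex`, `blobPath`, and `urlExternalId` are hypothetical names rather than the exports of `blob_store.js` or `url.js`.

```javascript
const crypto = require('node:crypto');
const path = require('node:path');

// Hex-encoded sha256 of a Buffer or string.
function sha256Hex(data) {
  return crypto.createHash('sha256').update(data).digest('hex');
}

// Content-addressed location: <root>/<sha-prefix>/<sha>.
// ASSUMPTION: a 2-character prefix; the summary only says the layout is
// sharded by a sha prefix.
function blobPath(root, content, prefixLen = 2) {
  const sha = sha256Hex(content);
  return { sha, file: path.join(root, sha.slice(0, prefixLen), sha) };
}

// Idempotency key for a URL capture: sha256(space_id + url).
// ASSUMPTION: plain string concatenation with no separator.
function urlExternalId(spaceId, url) {
  return sha256Hex(String(spaceId) + url);
}

module.exports = { sha256Hex, blobPath, urlExternalId };
```

Because the path depends only on the bytes, uploading the same file twice resolves to the same location, and re-capturing the same URL in the same Space yields the same `external_id`.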
### Phase C — Embeddings + hybrid search
- `lib/ai/ollama.js` — thin `embedText()` wrapper with `padTo()` helper.
- `lib/jobs/workers/embed.js` — embeds `pages/refs/source_docs/conversations`, pads 768 → 1024, writes `embedding`, emits `worker`-actor audit.
- `lib/jobs/triggers.js` + repo additions — pages/refs/source_docs `create`/`update` fire `triggerEmbed` with a singleton key.
- `lib/db/repos/search.js` rewritten to hybrid (FTS + pgvector ANN + RRF k=60) with graceful FTS-only fallback when Ollama is unreachable.
- `tests/integration/embed_live.test.js` — gated end-to-end test (skip if Ollama down).
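
As a rough illustration of the fusion step, Reciprocal Rank Fusion with k=60 scores each result as the sum of `1 / (k + rank)` over the lists it appears in. The sketch below is not the code in `lib/db/repos/search.js`: `rrfFuse` is a hypothetical name, and the `padTo` shown here assumes zero-padding, which the summary does not confirm.

```javascript
// Reciprocal Rank Fusion: score(id) = sum over lists of 1 / (k + rank),
// with rank starting at 1. Each input list is an array of ids, best first.
function rrfFuse(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) || 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1]) // highest fused score first
    .map(([id, score]) => ({ id, score }));
}

// Pad a vector to a fixed width, e.g. 768 -> 1024 dimensions.
// ASSUMPTION: padding with zeros; hypothetical stand-in for padTo().
function padTo(vector, width) {
  if (vector.length > width) {
    throw new RangeError('vector is longer than the target width');
  }
  return vector.concat(new Array(width - vector.length).fill(0));
}

module.exports = { rrfFuse, padTo };
```

Passing only the FTS list (for example when the embedding call fails) returns the FTS order unchanged, which is one simple way to get the FTS-only fallback behaviour described above.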
### Phase D — Karakeep webhook + drag-drop UI + Jobs UI
- `lib/karakeep/client.js` — thin bearer-token bookmark fetch.
- `lib/jobs/workers/karakeep.js` — fetches the bookmark, normalizes to a `refs` row tagged `source_kind='karakeep'`, idempotent by `sha256(space_id + 'karakeep:' + bookmark_id)`.
- `POST /api/ingest/karakeep` — HMAC-verified webhook; bypasses `agentOrOwner`. Raw body captured via `express.json({ verify })`.
- `public/components/dropzone.js` + wiring on `#main`.
- Full Jobs panel with state-grouped rows + retry/delete buttons + 10 s polling.
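
The webhook check depends on hashing the exact bytes received, which is why the raw body is captured through the `verify` option of `express.json`. A minimal sketch follows, assuming the signature is a hex-encoded HMAC-SHA256 of the raw body (the summary does not state the algorithm or encoding); `captureRawBody` and `verifySignature` are hypothetical names.

```javascript
const crypto = require('node:crypto');

// Shaped for express.json({ verify: captureRawBody }): Express calls
// verify(req, res, buf, encoding) with the raw bytes before JSON parsing.
function captureRawBody(req, _res, buf) {
  req.rawBody = buf;
}

// True when signatureHex equals HMAC-SHA256(secret, rawBody) in hex.
// ASSUMPTION: hex encoding and SHA-256; check the sender's actual format.
function verifySignature(rawBody, signatureHex, secret) {
  if (typeof signatureHex !== 'string') return false;
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest();
  const given = Buffer.from(signatureHex, 'hex');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  if (given.length !== expected.length) return false;
  return crypto.timingSafeEqual(given, expected);
}

module.exports = { captureRawBody, verifySignature };
```

A route handler would then pass `req.rawBody`, the `X-Karakeep-Signature` header, and `KARAKEEP_WEBHOOK_SECRET` to `verifySignature` and answer 401 when it returns false.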
## Security findings handled
| Finding | Source | Resolution |
|---|---|---|
| SSRF on `ingest.url` worker (fetch arbitrary URLs) | reviewer | `lib/ingest/safe_fetch.js` with IP-range blocklist + per-hop re-validation |
| DNS-rebinding TOCTOU in `safe_fetch` | reviewer | Pinned undici dispatcher whose `lookup()` returns the validated IP |

Reviewers raised no other findings in Plan 3, so nothing is deferred.
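
The IP-range blocklist from the SSRF row can be sketched with Node's built-in `net.BlockList`. This covers only the address check and the protocol allowlist; the pinned-DNS dispatcher and per-hop redirect re-validation are out of scope here. The exact ranges and the names `isBlockedAddress` and `isAllowedProtocol` are assumptions, not a copy of `lib/ingest/safe_fetch.js`.

```javascript
const net = require('node:net');

// ASSUMPTION: these ranges approximate the categories listed in the summary
// (loopback, RFC1918, link-local, CGNAT, metadata); the real list may differ.
const blocked = new net.BlockList();
blocked.addSubnet('0.0.0.0', 8, 'ipv4');      // "this network"
blocked.addSubnet('127.0.0.0', 8, 'ipv4');    // loopback
blocked.addSubnet('10.0.0.0', 8, 'ipv4');     // RFC1918
blocked.addSubnet('172.16.0.0', 12, 'ipv4');  // RFC1918
blocked.addSubnet('192.168.0.0', 16, 'ipv4'); // RFC1918
blocked.addSubnet('169.254.0.0', 16, 'ipv4'); // link-local, incl. 169.254.169.254 metadata
blocked.addSubnet('100.64.0.0', 10, 'ipv4');  // CGNAT
blocked.addAddress('::1', 'ipv6');            // IPv6 loopback
blocked.addSubnet('fc00::', 7, 'ipv6');       // IPv6 unique local
blocked.addSubnet('fe80::', 10, 'ipv6');      // IPv6 link-local

// True when the resolved address must not be fetched.
function isBlockedAddress(ip) {
  const family = net.isIP(ip); // 4, 6, or 0 when not an IP literal
  if (family === 0) return true; // fail closed on anything unparseable
  return blocked.check(ip, family === 4 ? 'ipv4' : 'ipv6');
}

// Only http: and https: URLs are eligible for fetching.
function isAllowedProtocol(urlString) {
  const { protocol } = new URL(urlString);
  return protocol === 'http:' || protocol === 'https:';
}

module.exports = { isBlockedAddress, isAllowedProtocol };
```

The check has to run against the resolved IP on every hop, and the connection has to use that same validated IP, which is what the pinned dispatcher in the second row is for.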
## UI smoke test (manual walkthrough)
- Drop a file onto the SPA's main panel after navigating to a Space → upload posts; Jobs view shows the new `ingest.blob` job, then `embed.text` arrives.
- POST `/api/capture` from `curl` → response carries `job_id` and `idempotency_key`; the SPA's `Jobs` view picks it up.
- Karakeep webhook (`X-Karakeep-Signature` valid) → 202 + `job_id`. Bad signature → 401.
## Open items for the user
- **`KARAKEEP_API_TOKEN` + `KARAKEEP_DEFAULT_SPACE_ID`** — needed in `/opt/void-server/.env` on CT 311 before Karakeep webhooks do anything useful. `KARAKEEP_WEBHOOK_SECRET` likewise must match Karakeep's webhook config.
- **`BLOB_ROOT=/var/lib/void/blobs`** on CT 311 — create with `mkdir -p /var/lib/void/blobs && chown void:void /var/lib/void/blobs && chmod 750 /var/lib/void/blobs`. Add to the deploy README's bootstrap.
- **`UPLOAD_TMP=/var/lib/void/uploads-tmp`** likewise.
- alpha-3 is **not yet deployed** to CT 311; alpha-2 is still serving. `deploy/push.sh` works as-is once the env additions are in place.
## What's left after Plan 3
- **Plan 4** — heavy ingest (Whisper transcription, Tesseract OCR, yt-dlp, pdftotext) via the Python `void-workers` service.
- **Plan 5** — companion chat in the right rail.
- **Plan 6** — Sacred Valley widgets ported from Void 1.x.