docs: Plan 3 completion summary
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
66
docs/plan-3-complete.md
Normal file
@@ -0,0 +1,66 @@
# Plan 3 — Complete

**Date:** 2026-06-01
**Version:** 2.0.0-alpha.3
**Tests:** 241 passing + 1 gated-skipped across 60 files
**Commits on `main`:** ~30 covering Plan 3 (`fa47419` … head)
**Snapshots:** `plan3_pre_phaseA_20260601_0322`, `plan3_phase_c_20260601_0353`, `plan3_complete_20260601_0400` on CT 310 + 311.

## Scope delivered

### Phase A — Queue harness + Jobs API
- `lib/jobs/queue.js` singleton over pg-boss v10 with per-name `createQueue` dedup (deadlock fix).
- `lib/jobs/index.js` worker registry; trivial `echo` worker proved the harness end-to-end.
- `lib/db/repos/jobs.js` unifies `pgboss.job` (partitioned per queue) and `pgboss.archive` for list/get views; provides `retry()` and `remove()`.
- `/api/jobs` (owner-only) — list, get, retry, delete.
- `#/jobs` SPA view stub + sidebar entry.

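The per-name `createQueue` dedup mentioned for `lib/jobs/queue.js` might look roughly like the sketch below. It assumes the goal is that concurrent callers asking for the same queue share a single `createQueue` call. `makeQueueEnsurer`, `ensureQueue`, and `fakeBoss` are hypothetical names used only for illustration, and the stub stands in for a real pg-boss instance.

```javascript
// Hypothetical helper; names are illustrative, not taken from lib/jobs/queue.js.
// `boss` is any object exposing an async createQueue(name), as pg-boss v10 does.
function makeQueueEnsurer(boss) {
  const pending = new Map(); // queue name -> in-flight or settled createQueue promise

  return function ensureQueue(name) {
    let p = pending.get(name);
    if (!p) {
      p = Promise.resolve()
        .then(() => boss.createQueue(name))
        .catch((err) => {
          // Forget failed attempts so a later caller can retry.
          pending.delete(name);
          throw err;
        });
      pending.set(name, p);
    }
    return p;
  };
}

// Stand-in for a real PgBoss instance, used only to demonstrate the dedup.
const calls = [];
const fakeBoss = {
  async createQueue(name) {
    calls.push(name);
  },
};
const ensureQueue = makeQueueEnsurer(fakeBoss);

// Two callers ask for 'echo' at the same time; only one createQueue call is made.
const done = Promise.all([
  ensureQueue('echo'),
  ensureQueue('echo'),
  ensureQueue('embed.text'),
]).then(() => calls);
```

Caching the promise rather than a boolean flag means a second caller that arrives while the first call is still in flight waits on the same promise instead of issuing its own.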
### Phase B — Capture API + URL/blob workers
- `lib/ingest/readability.js` — `@mozilla/readability` + `jsdom` wrapper.
- `lib/ingest/blob_store.js` — sha256 content-addressed write, sharded `<sha-prefix>/<sha>`.
- `lib/ingest/safe_fetch.js` — SSRF mitigations: http/https only; DNS resolution; loopback/RFC1918/link-local/CGNAT/metadata blocked; pinned-DNS undici dispatcher to defeat TOCTOU rebinding; per-hop re-validation on redirects.
- `lib/jobs/workers/url.js` — fetch + readability extract → `refs` row; idempotent by `sha256(space_id + url)` stored as `refs.external_id`.
- `lib/jobs/workers/blob.js` — content-addressed storage + image/pdf/file classification.
- `POST /api/capture` + `POST /api/capture/upload`.

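The content-addressed layout and the URL idempotency key from Phase B can both be expressed with Node's built-in `crypto` module. This is a minimal sketch: `sha256Hex`, `blobPath`, and `urlExternalId` are hypothetical names, and the two-character shard prefix is an assumption, since the summary only states `<sha-prefix>/<sha>`.

```javascript
const crypto = require('node:crypto');
const path = require('node:path');

// Hex-encoded SHA-256 of a string or Buffer.
function sha256Hex(data) {
  return crypto.createHash('sha256').update(data).digest('hex');
}

// Assumption: a two-character shard prefix; the real prefix length is not stated.
function blobPath(root, bytes, prefixLen = 2) {
  const sha = sha256Hex(bytes);
  return path.posix.join(root, sha.slice(0, prefixLen), sha);
}

// Idempotency key for a URL capture: sha256(space_id + url), as described for refs.external_id.
function urlExternalId(spaceId, url) {
  return sha256Hex(String(spaceId) + url);
}

// Identical bytes always map to the same path, so re-uploading a file is a no-op write.
const emptyBlob = blobPath('/var/lib/void/blobs', Buffer.alloc(0));
```

Because the key is derived only from the space and the URL, capturing the same URL twice in one space yields the same `external_id`, which is what lets the worker treat the second capture as a repeat.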
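The address blocklist behind `safe_fetch` could be approximated with Node's built-in `net.BlockList`. The ranges below follow the categories named in the Phase B bullet (loopback, RFC1918, link-local including the `169.254.169.254` metadata address, CGNAT); the exact set used by `lib/ingest/safe_fetch.js` is not shown in this summary, so treat the list and the `isBlockedAddress` name as assumptions.

```javascript
const net = require('node:net');

// Assumed blocklist; the real module may cover more special-purpose ranges.
const blocked = new net.BlockList();
blocked.addSubnet('0.0.0.0', 8, 'ipv4');      // "this network"
blocked.addSubnet('127.0.0.0', 8, 'ipv4');    // loopback
blocked.addSubnet('10.0.0.0', 8, 'ipv4');     // RFC1918
blocked.addSubnet('172.16.0.0', 12, 'ipv4');  // RFC1918
blocked.addSubnet('192.168.0.0', 16, 'ipv4'); // RFC1918
blocked.addSubnet('169.254.0.0', 16, 'ipv4'); // link-local, includes cloud metadata
blocked.addSubnet('100.64.0.0', 10, 'ipv4');  // CGNAT
blocked.addSubnet('::1', 128, 'ipv6');        // IPv6 loopback
blocked.addSubnet('fe80::', 10, 'ipv6');      // IPv6 link-local
blocked.addSubnet('fc00::', 7, 'ipv6');       // IPv6 unique local

// Returns true when a resolved address must not be fetched.
function isBlockedAddress(ip) {
  const family = net.isIP(ip); // 4, 6, or 0 when the input is not an IP literal
  if (family === 0) return true; // fail closed on anything unparseable
  return blocked.check(ip, family === 6 ? 'ipv6' : 'ipv4');
}
```

The check is meant to run on the resolved IP rather than the hostname, and again on every redirect hop, matching the per-hop re-validation described above.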
### Phase C — Embeddings + hybrid search
- `lib/ai/ollama.js` — thin `embedText()` wrapper with `padTo()` helper.
- `lib/jobs/workers/embed.js` — embeds `pages/refs/source_docs/conversations`, pads 768 → 1024, writes `embedding`, emits `worker`-actor audit.
- `lib/jobs/triggers.js` + repo additions — pages/refs/source_docs `create`/`update` fire `triggerEmbed` with a singleton key.
- `lib/db/repos/search.js` rewritten to hybrid (FTS + pgvector ANN + RRF k=60) with graceful FTS-only fallback when Ollama is unreachable.
- `tests/integration/embed_live.test.js` — gated end-to-end test (skip if Ollama down).

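The 768 → 1024 padding step might be as simple as the sketch below. It assumes zero-padding, which the summary does not state explicitly; the signature of the real `padTo()` in `lib/ai/ollama.js` may differ.

```javascript
// Assumed behaviour: right-pad with zeros up to the target dimension.
function padTo(vec, dim) {
  if (vec.length > dim) {
    throw new RangeError(`vector has ${vec.length} dimensions, more than target ${dim}`);
  }
  const out = new Array(dim).fill(0);
  for (let i = 0; i < vec.length; i++) {
    out[i] = vec[i];
  }
  return out;
}

// A 768-dimensional embedding widened to fit a 1024-dimensional vector column.
const padded = padTo(new Array(768).fill(0.5), 1024);
```

Zero-padding leaves dot products and L2 norms unchanged, so cosine similarity between two vectors padded this way equals the similarity of the originals.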
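Reciprocal rank fusion with k=60, as used by the hybrid search, scores each document as the sum of `1 / (k + rank)` over the result lists it appears in. The sketch below fuses in application code purely for illustration; the real `lib/db/repos/search.js` may well do this in SQL, and `rrfFuse` is a hypothetical name.

```javascript
// Fuse several ranked id lists (best first) into one list ordered by RRF score.
function rrfFuse(rankings, k = 60) {
  const scores = new Map();
  for (const ranking of rankings) {
    ranking.forEach((id, index) => {
      const rank = index + 1; // ranks are 1-based
      scores.set(id, (scores.get(id) || 0) + 1 / (k + rank));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}

const fts = ['a', 'b', 'c']; // full-text search order
const ann = ['b', 'd', 'a']; // vector nearest-neighbour order
const fused = rrfFuse([fts, ann]);

// With the ANN list missing (for example, Ollama unreachable), the FTS order is kept.
const ftsOnly = rrfFuse([fts]);
```

Here `b` ranks first because it sits near the top of both lists, while `d` (ANN only, rank 2) edges out `c` (FTS only, rank 3).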
### Phase D — Karakeep webhook + drag-drop UI + Jobs UI
- `lib/karakeep/client.js` — thin bearer-token bookmark fetch.
- `lib/jobs/workers/karakeep.js` — fetches the bookmark, normalizes to a `refs` row tagged `source_kind='karakeep'`, idempotent by `sha256(space_id + 'karakeep:' + bookmark_id)`.
- `POST /api/ingest/karakeep` — HMAC-verified webhook; bypasses `agentOrOwner`. Raw body captured via `express.json({ verify })`.
- `public/components/dropzone.js` + wiring on `#main`.
- Full Jobs panel with state-grouped rows + retry/delete buttons + 10 s polling.

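Verifying the webhook HMAC over the raw body could be sketched as follows. The hash algorithm (SHA-256) and the hex encoding of `X-Karakeep-Signature` are assumptions, as are the names `captureRawBody`, `signatureIsValid`, and `req.rawBody`. Only the `verify(req, res, buf)` callback shape comes from Express's documented `express.json()` option.

```javascript
const crypto = require('node:crypto');

// Shape matches the `verify(req, res, buf, encoding)` option of express.json():
// it stashes the exact bytes received before they are parsed as JSON.
function captureRawBody(req, _res, buf) {
  req.rawBody = Buffer.from(buf);
}

// Assumption: the signature header carries a hex-encoded HMAC-SHA256 of the raw body.
function signatureIsValid(rawBody, signatureHex, secret) {
  if (typeof signatureHex !== 'string') return false;
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest();
  const given = Buffer.from(signatureHex, 'hex');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return given.length === expected.length && crypto.timingSafeEqual(given, expected);
}

// Demonstration with a made-up secret and payload.
const secret = 'example-secret';
const req = {};
captureRawBody(req, null, Buffer.from('{"bookmark_id":"123"}'));
const goodSig = crypto.createHmac('sha256', secret).update(req.rawBody).digest('hex');
```

Signing the raw bytes rather than the re-serialized JSON matters because key order and whitespace can change after parsing, which would make a valid signature fail.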
## Security findings handled

| Finding | Source | Resolution |
|---|---|---|
| SSRF on `ingest.url` worker (fetch arbitrary URLs) | reviewer | `lib/ingest/safe_fetch.js` with IP-range blocklist + per-hop re-validation |
| DNS-rebinding TOCTOU in `safe_fetch` | reviewer | Pinned undici dispatcher whose `lookup()` returns the validated IP |

Plan 3 introduced no other reviewer findings to defer.

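The pinned `lookup()` from the second finding can be illustrated with a function that ignores the hostname and always answers with the IP that already passed validation. This sketch follows the callback contract of Node's `dns.lookup`. How it is wired into the undici dispatcher (presumably through the dispatcher's `connect` options) is an assumption, and `pinnedLookup` is a hypothetical name.

```javascript
const net = require('node:net');

// Build a dns.lookup-compatible function that always resolves to one pre-validated IP.
function pinnedLookup(validatedIp) {
  const family = net.isIP(validatedIp); // 4 or 6; 0 means not an IP literal
  if (family === 0) {
    throw new TypeError('pinnedLookup expects an IP literal');
  }
  return function lookup(_hostname, options, callback) {
    if (typeof options === 'function') {
      callback = options;
      options = {};
    }
    // Answer asynchronously, as dns.lookup does.
    process.nextTick(() => {
      if (options && options.all) {
        callback(null, [{ address: validatedIp, family }]);
      } else {
        callback(null, validatedIp, family);
      }
    });
  };
}

// The socket connects to the address that was checked, even if DNS changes afterwards.
const lookup = pinnedLookup('93.184.216.34');
```

Since the connection never performs a second resolution, an attacker who flips the DNS record between validation and connect gains nothing.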
## UI smoke test (manual walkthrough)

- Drop a file onto the SPA's main panel after navigating to a Space → the upload posts; the Jobs view shows the new `ingest.blob` job, followed by an `embed.text` job.
- POST `/api/capture` from `curl` → response carries `job_id` and `idempotency_key`; the SPA's `Jobs` view picks it up.
- Karakeep webhook (`X-Karakeep-Signature` valid) → 202 + `job_id`. Bad signature → 401.

## Open items for the user

- **`KARAKEEP_API_TOKEN` + `KARAKEEP_DEFAULT_SPACE_ID`** — needed in `/opt/void-server/.env` on CT 311 before Karakeep webhooks do anything useful. `KARAKEEP_WEBHOOK_SECRET` likewise must match Karakeep's webhook config.
- **`BLOB_ROOT=/var/lib/void/blobs`** on CT 311 — create with `mkdir -p /var/lib/void/blobs && chown void:void /var/lib/void/blobs && chmod 750 /var/lib/void/blobs`. Add to the deploy README's bootstrap.
- **`UPLOAD_TMP=/var/lib/void/uploads-tmp`** likewise: create it with the same owner and mode, and add it to the bootstrap.
- alpha-3 is **not yet deployed** to CT 311; alpha-2 is still serving. `deploy/push.sh` works as-is once the env additions are in place.

## What's left after Plan 3

- **Plan 4** — heavy ingest (Whisper transcription, Tesseract OCR, yt-dlp, pdftotext) via the Python `void-workers` service.
- **Plan 5** — companion chat in the right rail.
- **Plan 6** — Sacred Valley widgets ported from Void 1.x.