Plan 3 — Complete
Date: 2026-06-01
Version: 2.0.0-alpha.3
Tests: 241 passing + 1 gated-skipped across 60 files
Commits on main: ~30 covering Plan 3 (fa47419 … head)
Snapshots: plan3_pre_phaseA_20260601_0322, plan3_phase_c_20260601_0353, plan3_complete_20260601_0400 on CT 310 + 311.
Scope delivered
Phase A — Queue harness + Jobs API
- `lib/jobs/queue.js` singleton over pg-boss v10 with per-name `createQueue` dedup (deadlock fix).
- `lib/jobs/index.js` worker registry; trivial `echo` worker proved the harness end-to-end.
- `lib/db/repos/jobs.js` unifies `pgboss.job` (partitioned per queue) and `pgboss.archive` for list/get views; provides `retry()` and `remove()`.
- `/api/jobs` (owner-only): list, get, retry, delete.
- `#/jobs` SPA view stub + sidebar entry.
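The per-name `createQueue` dedup might look roughly like the sketch below. It is an illustration only, not the code in `lib/jobs/queue.js`: `makeQueueEnsurer`, `ensureQueue`, and `fakeBoss` are hypothetical names, and the only pg-boss call assumed is `createQueue(name)`. The idea is to keep one in-flight promise per queue name so that concurrent callers share a single `createQueue` call.

```javascript
// Hypothetical sketch: memoize one createQueue() promise per queue name so
// concurrent callers share a single call instead of issuing duplicates.
function makeQueueEnsurer(boss) {
  const pending = new Map(); // queue name -> Promise of boss.createQueue(name)

  return function ensureQueue(name) {
    let p = pending.get(name);
    if (!p) {
      p = Promise.resolve()
        .then(() => boss.createQueue(name))
        .catch((err) => {
          // Forget failed attempts so a later call can retry.
          pending.delete(name);
          throw err;
        });
      pending.set(name, p);
    }
    return p;
  };
}

// Stand-in for a started pg-boss instance; it only records the calls it gets.
const calls = [];
const fakeBoss = {
  async createQueue(name) {
    calls.push(name);
  },
};

const ensureQueue = makeQueueEnsurer(fakeBoss);
```

Calling `ensureQueue('ingest.url')` twice in a row reaches `fakeBoss.createQueue` once; a different name such as `embed.text` gets its own call.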
Phase B — Capture API + URL/blob workers
- `lib/ingest/readability.js`: `@mozilla/readability` + `jsdom` wrapper.
- `lib/ingest/blob_store.js`: sha256 content-addressed write, sharded `<sha-prefix>/<sha>`.
- `lib/ingest/safe_fetch.js`: SSRF mitigations (http/https only; DNS resolution; loopback/RFC1918/link-local/CGNAT/metadata blocked; pinned-DNS undici dispatcher to defeat TOCTOU rebinding; per-hop re-validation on redirects).
- `lib/jobs/workers/url.js`: fetch + readability extract → `refs` row; idempotent by `sha256(space_id + url)` stored as `refs.external_id`.
- `lib/jobs/workers/blob.js`: content-addressed storage + image/pdf/file classification.
- `POST /api/capture` + `POST /api/capture/upload`.
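As a rough illustration of the two sha256-based pieces above, the sketch below derives a sharded `<sha-prefix>/<sha>` path and a URL idempotency key. The function names are hypothetical, and the 2-character prefix length is an assumption; the notes only state that blobs are sharded by a sha prefix and that the key is `sha256(space_id + url)`.

```javascript
const crypto = require('node:crypto');
const path = require('node:path');

// Hex-encoded sha256 of a string or Buffer.
function sha256Hex(data) {
  return crypto.createHash('sha256').update(data).digest('hex');
}

// Relative blob path "<sha-prefix>/<sha>".
// Assumption: the prefix is the first 2 hex characters of the digest.
function blobRelPath(buf, prefixLen = 2) {
  const sha = sha256Hex(buf);
  return path.posix.join(sha.slice(0, prefixLen), sha);
}

// Idempotency key for the URL worker: sha256(space_id + url), by plain
// concatenation as written in the notes.
function urlExternalId(spaceId, url) {
  return sha256Hex(String(spaceId) + url);
}

const emptyBlobPath = blobRelPath(Buffer.alloc(0));
const keyA = urlExternalId('space-1', 'https://example.com/');
const keyB = urlExternalId('space-2', 'https://example.com/');
```

Identical bytes always map to the same path, so a repeated upload overwrites nothing new, and the same URL captured in two different spaces yields two different keys.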
Phase C — Embeddings + hybrid search
- `lib/ai/ollama.js`: thin `embedText()` wrapper with `padTo()` helper.
- `lib/jobs/workers/embed.js`: embeds `pages`/`refs`/`source_docs`/`conversations`, pads 768 → 1024, writes `embedding`, emits `worker`-actor audit.
- `lib/jobs/triggers.js` + repo additions: pages/refs/source_docs `create`/`update` fire `triggerEmbed` with a singleton key.
- `lib/db/repos/search.js` rewritten to hybrid (FTS + pgvector ANN + RRF k=60) with graceful FTS-only fallback when Ollama is unreachable.
- `tests/integration/embed_live.test.js`: gated end-to-end test (skip if Ollama down).
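The padding and fusion steps can be sketched in a few lines. This is a minimal illustration under assumptions, not the code in `lib/ai/ollama.js` or `lib/db/repos/search.js`: zero-padding is assumed for `padTo` (the notes only say "pads 768 → 1024"), and `rrfFuse` is a hypothetical name for standard reciprocal rank fusion, where each result scores the sum of `1 / (k + rank)` over the lists it appears in.

```javascript
// Pad an embedding with trailing zeros up to a fixed dimension.
// Appending zeros leaves dot products and vector norms unchanged.
function padTo(vec, dim) {
  if (vec.length > dim) {
    throw new RangeError(`vector has ${vec.length} dims, target is ${dim}`);
  }
  return [...vec, ...new Array(dim - vec.length).fill(0)];
}

// Reciprocal rank fusion over ranked lists of ids (best first).
// score(id) = sum of 1 / (k + rank), with rank starting at 1.
function rrfFuse(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, i) => {
      scores.set(id, (scores.get(id) || 0) + 1 / (k + i + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}

// Made-up result ids from the two retrievers.
const ftsHits = ['a', 'b', 'c'];
const annHits = ['b', 'd', 'a'];

const fused = rrfFuse([ftsHits, annHits]);
// FTS-only fallback: fusing a single list keeps its original order.
const ftsOnly = rrfFuse([ftsHits]);
```

With these inputs `b` ranks first because it is near the top of both lists, and passing only the FTS list models the fallback path when no embedding is available.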
Phase D — Karakeep webhook + drag-drop UI + Jobs UI
- `lib/karakeep/client.js`: thin bearer-token bookmark fetch.
- `lib/jobs/workers/karakeep.js`: fetches the bookmark, normalizes to a `refs` row tagged `source_kind='karakeep'`, idempotent by `sha256(space_id + 'karakeep:' + bookmark_id)`.
- `POST /api/ingest/karakeep`: HMAC-verified webhook; bypasses `agentOrOwner`. Raw body captured via `express.json({ verify })`.
- `public/components/dropzone.js` + wiring on `#main`.
- Full Jobs panel with state-grouped rows + retry/delete buttons + 10 s polling.
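A webhook signature check of this kind could be sketched as follows. The notes do not state the signature scheme, so the hex-encoded HMAC-SHA256 over the raw request body is an assumption, and `verifySignature` and `req.rawBody` are hypothetical names. The raw bytes matter because re-serializing parsed JSON may not reproduce the bytes that were signed.

```javascript
const crypto = require('node:crypto');

// Assumption: the header carries a hex-encoded HMAC-SHA256 of the raw body.
function verifySignature(rawBody, signatureHex, secret) {
  if (typeof signatureHex !== 'string') return false;
  const expected = crypto.createHmac('sha256', secret).update(rawBody).digest();
  const given = Buffer.from(signatureHex, 'hex');
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return given.length === expected.length && crypto.timingSafeEqual(given, expected);
}

// Capturing the raw body in Express (shown as a comment, not executed here):
//   app.use(express.json({
//     verify: (req, _res, buf) => { req.rawBody = buf; }, // rawBody: hypothetical field
//   }));

// Demo values; the secret and payload are made up.
const demoSecret = 'test-secret';
const demoBody = Buffer.from('{"bookmark_id":"abc"}');
const demoSig = crypto.createHmac('sha256', demoSecret).update(demoBody).digest('hex');

const accepted = verifySignature(demoBody, demoSig, demoSecret);
const rejected = verifySignature(Buffer.from('{}'), demoSig, demoSecret);
```

A handler built on this would answer 401 when the check returns `false` and enqueue the job otherwise.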
Security findings handled
| Finding | Source | Resolution |
|---|---|---|
| SSRF on `ingest.url` worker (fetch arbitrary URLs) | reviewer | `lib/ingest/safe_fetch.js` with IP-range blocklist + per-hop re-validation |
| DNS-rebinding TOCTOU in `safe_fetch` | reviewer | Pinned undici dispatcher whose `lookup()` returns the validated IP |
Plan 3 introduced no other reviewer findings to defer.
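
The two resolutions in the table can be illustrated with Node's built-in `net.BlockList` and a `lookup` function that always answers with an already-validated address. This is a sketch under assumptions, not the contents of `lib/ingest/safe_fetch.js`: the exact ranges, and the names `isBlockedAddress` and `pinnedLookup`, are illustrative.

```javascript
const net = require('node:net');

// Address ranges to refuse. The list is illustrative, not exhaustive.
const blocked = new net.BlockList();
blocked.addSubnet('0.0.0.0', 8, 'ipv4');      // "this network"
blocked.addSubnet('127.0.0.0', 8, 'ipv4');    // loopback
blocked.addSubnet('10.0.0.0', 8, 'ipv4');     // RFC1918
blocked.addSubnet('172.16.0.0', 12, 'ipv4');  // RFC1918
blocked.addSubnet('192.168.0.0', 16, 'ipv4'); // RFC1918
blocked.addSubnet('169.254.0.0', 16, 'ipv4'); // link-local, incl. 169.254.169.254 metadata
blocked.addSubnet('100.64.0.0', 10, 'ipv4');  // CGNAT
blocked.addAddress('::1', 'ipv6');            // IPv6 loopback
blocked.addSubnet('fe80::', 10, 'ipv6');      // IPv6 link-local
blocked.addSubnet('fc00::', 7, 'ipv6');       // IPv6 unique local

// True when the address must not be fetched. Anything that is not an IP
// literal is refused, because only resolved addresses should reach this check.
function isBlockedAddress(ip) {
  const family = net.isIP(ip); // 4, 6, or 0
  if (family === 0) return true;
  return blocked.check(ip, family === 6 ? 'ipv6' : 'ipv4');
}

// dns.lookup-compatible function that ignores the hostname and returns the
// address validated earlier, so the connection cannot be re-resolved to a
// different IP between the check and the connect.
function pinnedLookup(ip) {
  const family = net.isIP(ip);
  return function lookup(_hostname, options, callback) {
    if (typeof options === 'function') {
      callback = options;
      options = {};
    }
    process.nextTick(() => {
      if (options && options.all) callback(null, [{ address: ip, family }]);
      else callback(null, ip, family);
    });
  };
}

// Assumed wiring into undici (not executed here):
//   new Agent({ connect: { lookup: pinnedLookup(validatedIp) } })
```

On a redirect, the same resolve, check, and pin sequence would run again for the new host, which corresponds to the per-hop re-validation listed above.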
UI smoke (manual, captured by walking through)
- Drop a file onto the SPA's main panel after navigating to a Space → upload posts; Jobs view shows the new `ingest.blob` job, then `embed.text` arrives.
- POST `/api/capture` from `curl` → response carries `job_id` and `idempotency_key`; the SPA's `Jobs` view picks it up.
- Karakeep webhook (`X-Karakeep-Signature` valid) → 202 + `job_id`. Bad signature → 401.
Open items for the user
- `KARAKEEP_API_TOKEN` + `KARAKEEP_DEFAULT_SPACE_ID`: needed in `/opt/void-server/.env` on CT 311 before Karakeep webhooks do anything useful. `KARAKEEP_WEBHOOK_SECRET` likewise must match Karakeep's webhook config.
- `BLOB_ROOT=/var/lib/void/blobs` on CT 311: create with `mkdir -p /var/lib/void/blobs && chown void:void /var/lib/void/blobs && chmod 750 /var/lib/void/blobs`. Add to the deploy README's bootstrap. `UPLOAD_TMP=/var/lib/void/uploads-tmp` likewise.
- alpha-3 is not yet deployed to CT 311; alpha-2 is still serving. `deploy/push.sh` works as-is once the env additions are in place.
What's left after Plan 3
- Plan 4: heavy ingest (Whisper transcription, Tesseract OCR, yt-dlp, pdftotext) via the Python `void-workers` service.
- Plan 5: companion chat in the right rail.
- Plan 6: Sacred Valley widgets ported from Void 1.x.