diff --git a/CHANGELOG.md b/CHANGELOG.md
index b318afb..895e9e9 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -3,6 +3,50 @@
 All notable changes to Void 2.0 are documented here.
 Format: [Keep a Changelog](https://keepachangelog.com).
 
+## [2.0.0-alpha.3] — 2026-06-01
+
+### Added (Plan 3: Capture pipeline + hybrid search)
+- **pg-boss job queue** embedded in void-server (Node). Queue tables live
+  alongside Void's in the shared void2-db. Tests manage their own boss
+  lifecycle via `stopBoss()` / `waitForJob()` helpers.
+- `/api/jobs` (owner-only) — list / get / retry / delete with state and
+  name filters. Minimal `#/jobs` SPA view fronts it, polling every 10 s.
+- **`/api/capture`** POST — URL → `ingest.url` job. Idempotent by
+  `sha256(space_id + url)` stored as `refs.external_id`; duplicate POST
+  returns the existing `ref_id`.
+- **`/api/capture/upload`** — multipart file → `ingest.blob` job →
+  content-addressed `/var/lib/void/blobs//` →
+  `refs` row. Drag-drop in the SPA wired to the main panel; `space_id`
+  pre-filled from the last-viewed space.
+- **`ingest.url` worker** — `@mozilla/readability` + `jsdom` extract;
+  fetch protected by `lib/ingest/safe_fetch.js` (SSRF mitigations:
+  http(s) only; DNS-resolved hostnames checked against loopback /
+  RFC1918 / link-local / CGNAT / metadata; resolved IP pinned via an
+  undici dispatcher to defeat DNS rebinding; redirects re-validated).
+- **`ingest.blob` worker** — content-addressed storage,
+  image/pdf/file kind classification.
+- **`embed.text` worker** — Ollama `nomic-embed-text` (768 dims) padded
+  to `vector(1024)`; emits a `worker`-actor audit log entry.
+- **Repo-level triggers** — pages/refs/source_docs `create` and
+  `update` enqueue an `embed.text` job with a singleton key so rapid
+  edits coalesce. No-op when the queue is not running (tests).
+- **Hybrid `/api/search`** — FTS + pgvector ANN unioned with reciprocal
+  rank fusion (k=60). Vector branch silently skipped when Ollama times
+  out, leaving FTS-only results — graceful degrade.
+- **`/api/ingest/karakeep`** — HMAC-verified webhook. Enqueues
+  `ingest.karakeep` for `bookmark.created`; worker fetches the bookmark
+  via Karakeep's API, normalizes to a `refs` row tagged
+  `source_kind='karakeep'`.
+
+### Deferred (Plan 4+)
+
+- Python `void-workers` service for Whisper / Tesseract OCR / yt-dlp
+  (heavy ML).
+- AI Space/Project suggestion on capture.
+- Embedding chunks table (whole-doc embedding only in Plan 3).
+- pdftotext for born-digital PDFs.
+- `pg LISTEN/NOTIFY` real-time Jobs UI.
+
 ## [2.0.0-alpha.2] — 2026-06-01
 
 ### Added (Plan 2: API surface + UI shell)
diff --git a/package.json b/package.json
index ffd2e87..9c3d9c8 100644
--- a/package.json
+++ b/package.json
@@ -1,6 +1,6 @@
 {
   "name": "void-server",
-  "version": "2.0.0-alpha.2",
+  "version": "2.0.0-alpha.3",
   "type": "module",
   "private": true,
   "scripts": {
diff --git a/server.js b/server.js
index 5aa7917..a70234b 100644
--- a/server.js
+++ b/server.js
@@ -7,7 +7,7 @@ import * as queue from './lib/jobs/queue.js';
 import { registerWorkers } from './lib/jobs/index.js';
 import { router as ingestRouter } from './lib/api/routes/ingest.js';
 
-const VERSION = '2.0.0-alpha.2';
+const VERSION = '2.0.0-alpha.3';
 
 export function createApp() {
   const app = express();
diff --git a/tests/server.test.js b/tests/server.test.js
index e39bced..e7561ca 100644
--- a/tests/server.test.js
+++ b/tests/server.test.js
@@ -17,7 +17,7 @@ describe('server', () => {
     const res = await request(app).get('/health');
     expect(res.status).toBe(200);
     expect(res.body.db_ok).toBe(true);
-    expect(res.body.version).toBe('2.0.0-alpha.2');
+    expect(res.body.version).toBe('2.0.0-alpha.3');
   });
 
   it('GET /api/spaces without token returns 401', async () => {
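
Review note on the hybrid `/api/search` entry in the CHANGELOG hunk: the reciprocal rank fusion step (k=60) could look roughly like the sketch below. This is an illustration only, not the code in this PR. The function name `reciprocalRankFusion`, the `ref-*` ids, and the plain arrays standing in for the FTS and pgvector result lists are all hypothetical. Passing an empty list for the vector branch shows how an Ollama timeout would leave the FTS order untouched.

```javascript
// Hypothetical sketch of reciprocal rank fusion (RRF); names are not from the PR.
// Each input list is an array of ids ordered best-first.
// score(id) = sum over lists of 1 / (k + rank), where rank starts at 1.
function reciprocalRankFusion(rankedLists, k = 60) {
  const scores = new Map();
  for (const list of rankedLists) {
    list.forEach((id, index) => {
      const rank = index + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  // Highest fused score first.
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([id, score]) => ({ id, score }));
}

// Assumed example inputs: ids as they might come back from each branch.
const ftsHits = ['ref-a', 'ref-b', 'ref-c'];
const vectorHits = ['ref-b', 'ref-d'];

// Both branches available: 'ref-b' appears in both lists and rises to the top.
const fused = reciprocalRankFusion([ftsHits, vectorHits]);

// Vector branch skipped (empty list): the result keeps the FTS order.
const ftsOnly = reciprocalRankFusion([ftsHits, []]);

console.log(fused.map((r) => r.id), ftsOnly.map((r) => r.id));
```

Because RRF only uses ranks, the FTS and vector scores never need to be normalized against each other, which is presumably why it suits a union of two differently scaled result sets.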