# Changelog

All notable changes to Void 2.0 are documented here. Format: [Keep a Changelog](https://keepachangelog.com).

## [2.0.0-alpha.4] — 2026-06-01

### Added (Plan 4: Python void-workers)

- **`void-workers.service`** — Python 3.13 service alongside `void-server` on CT 311. psycopg-based pg-boss client matches Node's claim/finish semantics via `SELECT ... FOR UPDATE SKIP LOCKED`. Forces `client_encoding=UTF8` on every connection (void2-db cluster is SQL_ASCII).
- **`extract.pdf`** — `pdftotext -layout` first; per-page `pdftoppm` rasterization + Tesseract OCR fallback when extraction yields < 200 chars.
- **`extract.image`** — Tesseract OCR (English) for images stored in the blob store.
- **`ingest.video`** — `yt-dlp` metadata + audio extract + faster-whisper (`small.en` default). CUDA at startup; CPU fallback when HA failover to Z3 (no GPU) happens. URLs validated as http(s) and `--` separator passed to yt-dlp to defeat argv smuggling.
- **`sync.source_doc`** — fetches `upstream_url` via Python `safe_fetch` (port of the Node helper) + sha256-diffs against the prior `body_sha` in metadata; updates `body_text` only when content changed.
- **Node `blob.js`** fans out to `extract.pdf` / `extract.image` after creating PDF / image refs.
- **Node `capture.js`** routes `youtube.com` / `youtu.be` / `vimeo.com` URLs to `ingest.video` instead of `ingest.url`.
- **Daily cron** (`lib/cron/sync_source_docs.js`) enqueues `sync.source_doc` jobs at 03:00 local for every `source_docs` row with `sync_source='url'`.
- **CT 311 infrastructure**: resized to 6 cores / 8 GB RAM, NVIDIA RTX A2000 device-nodes passed through (shared with CT 102's Ollama).
- **`deploy/push-workers.sh`** + `deploy/void-workers.service` — push the workers package, chown to `voidworkers`, recreate the venv, install deps under `su voidworkers -c`, restart the unit.

## [2.0.0-alpha.3] — 2026-06-01

### Added (Plan 3: Capture pipeline + hybrid search)

- **pg-boss job queue** embedded in void-server (Node).
Queue tables live alongside Void's in the shared void2-db. Tests manage their own boss lifecycle via `stopBoss()` / `waitForJob()` helpers.
  - `/api/jobs` (owner-only) — list / get / retry / delete with state and name filters. Minimal `#/jobs` SPA view fronts it, polling every 10 s.
- **`/api/capture`** POST — URL → `ingest.url` job. Idempotent by `sha256(space_id + url)` stored as `refs.external_id`; duplicate POST returns the existing `ref_id`.
- **`/api/capture/upload`** — multipart file → `ingest.blob` job → content-addressed `/var/lib/void/blobs//` → `refs` row. Drag-drop in the SPA wired to the main panel; `space_id` pre-filled from the last-viewed space.
- **`ingest.url` worker** — `@mozilla/readability` + `jsdom` extract; fetch protected by `lib/ingest/safe_fetch.js` (SSRF mitigations: http(s) only; DNS-resolved hostnames checked against loopback / RFC1918 / link-local / CGNAT / metadata; resolved IP pinned via an undici dispatcher to defeat DNS rebinding; redirects re-validated).
- **`ingest.blob` worker** — content-addressed storage, image/pdf/file kind classification.
- **`embed.text` worker** — Ollama `nomic-embed-text` (768 dims) padded to `vector(1024)`; emits a `worker`-actor audit log entry.
- **Repo-level triggers** — pages/refs/source_docs `create` and `update` enqueue an `embed.text` job with a singleton key so rapid edits coalesce. No-op when the queue is not running (tests).
- **Hybrid `/api/search`** — FTS + pgvector ANN unioned with reciprocal rank fusion (k=60). Vector branch silently skipped when Ollama times out, leaving FTS-only results — graceful degrade.
- **`/api/ingest/karakeep`** — HMAC-verified webhook. Enqueues `ingest.karakeep` for `bookmark.created`; worker fetches the bookmark via Karakeep's API, normalizes to a `refs` row tagged `source_kind='karakeep'`.

### Deferred (Plan 4+)

- Python `void-workers` service for Whisper / Tesseract OCR / yt-dlp (heavy ML).
- AI Space/Project suggestion on capture.
- Embedding chunks table (whole-doc embedding only in Plan 3).
- pdftotext for born-digital PDFs.
- `pg LISTEN/NOTIFY` real-time Jobs UI.

## [2.0.0-alpha.2] — 2026-06-01

### Added (Plan 2: API surface + UI shell)

- REST routes for the full entity tree:
  - `/api/spaces`, `/api/projects`, `/api/tasks` (with project + space scoping)
  - `/api/pages` + page revisions + `/api/pages/:id/backlinks`
  - `/api/refs` + `/api/refs/upsert`
  - `/api/resources` + dependencies + change history
  - `/api/resources/:id/source-docs` + `/api/source-docs/:id/resync` (gated by `ENABLE_RESYNC`)
  - `/api/agents` (owner-only) + agent token mint/revoke
  - `/api/conversations` + nested `/messages`
  - `/api/tags` + entity-scoped attach/detach via `/api/:entity_type/:entity_id/tags`
  - `/api/links` (POST/GET from|to/DELETE) for polymorphic entity links
  - `/api/pending-changes` + approve/reject with dispatch table covering page/project/task/ref/resource/source_doc × create/update/delete
  - `/api/audit/entity/:type/:id` + `/api/audit/actor`
  - `/api/search` unified FTS across pages, refs, source docs, messages
- Agent bearer auth middleware + capability tiering: owner allow, agent `write+scope` → allow, agent `suggest` → 202 + pending row, else 403.
- Approve and reject emit explicit `approve` / `reject` entries in the audit log with the original agent id preserved in the diff.
- Static SPA shell served from `public/`:
  - Three-column Cradle aesthetic (blackflame palette, Cinzel display headings, Cormorant Garamond body)
  - Hash-based router with views for home / space / project / page / reference / resource / search / inbox / sacred valley
  - `dom.js` safe builders — no `innerHTML` on API data anywhere; the explicit `html:` opt-in is used only by the markdown editor's preview pane, which sanitizes with DOMPurify
  - Sidebar Spaces tree with lazy project expansion, bottom Navigate section, pending-count badge shared with the topbar bell via a tiny `state.js` event bus
  - Topbar: brand, capture modal stub, global search (Enter → `#/search?q=`), pending bell, owner toggle
  - Page editor: split-pane markdown via marked + DOMPurify, save PATCHes `/api/pages/:id`, backlinks card
  - Reference detail: media block (image / YouTube embed / link), summary, metadata table, tag attach/detach, linked-from list
  - Resource detail: status header, dependencies + source docs + runbook pages columns, change history
  - Inbox: pending changes grouped by agent, approve → navigate to the resulting entity
- Test coverage: 185 tests across 43 files (113 new for Plan 2 routes + search + GET / shell smoke).

### Security follow-ups (deferred)

- Polymorphic IDOR risk on entity_links / entity_tags / attachments — acceptable today since the entire API is owner-token gated and there is one tenant; see `docs/security-followups.md` for the tighten-now vs defer decision.
- `pending_changes.action` CHECK constraint blocks `'upsert'` / `'add_dependency'` / `'remove_dependency'` actions emitted by some routes' `divertToPending` paths. Latent — only fires when an agent at suggest tier hits those specific endpoints. Mitigation options documented in `docs/security-followups.md`.
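
For readers tracing when the suggest-tier path is taken, the capability tiering listed for this release (owner allowed, agent with `write` plus a matching scope allowed, agent with `suggest` answered with 202 and a pending row, everything else 403) can be sketched roughly as below. The real middleware is Node; this Python version is only an illustration, and the `Actor` shape, the `decide` and `status_for` helpers, and the idea that a scope is a set of space ids are all assumptions, not names from the codebase.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Actor:
    # Hypothetical actor shape; the real middleware's fields may differ.
    kind: str                      # "owner" or "agent"
    capability: str = ""           # "write", "suggest", or "" for none
    scopes: frozenset = field(default_factory=frozenset)  # assumed: space ids


def decide(actor: Actor, space_id: str) -> str:
    """Return "allow", "suggest", or "deny" for a mutating request."""
    if actor.kind == "owner":
        return "allow"
    if actor.kind == "agent":
        if actor.capability == "write" and space_id in actor.scopes:
            return "allow"
        if actor.capability == "suggest":
            # The change is diverted into a pending row for owner review.
            return "suggest"
    return "deny"


def status_for(outcome: str) -> Optional[int]:
    """Map an outcome to the HTTP status named in the changelog.

    "allow" returns None because the route handler picks its own status.
    """
    return {"suggest": 202, "deny": 403}.get(outcome)
```

Under these assumptions, an agent holding `write` for a different space falls through to `deny`, which matches the "else 403" wording; only `suggest`-tier agents reach the pending path.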
## [Unreleased]

### Added

- Initial repo scaffolding

### Added (Plan 1: Foundation)

- LXC provisioning for `void2-db` (Postgres 16 + pgvector) and `void2-app`
- Schema migrations 001-006 covering core, knowledge, resources, agents, cross-cutting, audit
- Repos with capability-checked `actor` parameter and audit trail
- Real audit log with redaction of sensitive keys (token, password, api_key, etc.)
- `pending_changes` table for agent suggestions awaiting owner approval
- Capability check module (allow / suggest / deny) for user vs agent actors
- Owner-token bearer auth
- Express server with `/health` and smoke `/api/spaces`
- Test coverage: 72 tests across migrations, repos, capability, owner middleware, server
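
To make the audit-log redaction entry above concrete, here is a minimal Python sketch of key-based redaction over a nested diff. It is not the project's implementation (the repos are Node): the `redact` helper, the `[REDACTED]` placeholder, and the case-insensitive exact-match rule are assumptions, and the key set holds only the three keys the changelog names, while the real list is longer ("etc.").

```python
# Only the keys named in the changelog; the real list is longer.
SENSITIVE_KEYS = {"token", "password", "api_key"}


def redact(value, placeholder="[REDACTED]"):
    """Return a copy of value with sensitive dict keys masked, recursively."""
    if isinstance(value, dict):
        return {
            key: placeholder
            if str(key).lower() in SENSITIVE_KEYS
            else redact(item, placeholder)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item, placeholder) for item in value]
    # Scalars (str, int, None, ...) pass through unchanged.
    return value
```

The function builds new containers rather than mutating its input, so the caller's original diff is left intact when the redacted copy is written to the log.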