# Changelog

All notable changes to Void 2.0 are documented here.

Format: [Keep a Changelog](https://keepachangelog.com).

## [2.0.0-alpha.4] - 2026-06-01

### Added (Plan 4: Python void-workers)

- **`void-workers.service`**: Python 3.13 service running alongside `void-server`
on CT 311. Its psycopg-based pg-boss client matches Node's claim/finish
semantics via `SELECT ... FOR UPDATE SKIP LOCKED`. Forces
`client_encoding=UTF8` on every connection, since the void2-db cluster is
`SQL_ASCII`.
- **`extract.pdf`**: `pdftotext -layout` first; falls back to per-page `pdftoppm`
rasterization + Tesseract OCR when extraction yields fewer than
200 characters.
- **`extract.image`**: Tesseract OCR (English) for images stored in
the blob store.
- **`ingest.video`**: `yt-dlp` metadata + audio extraction + faster-whisper
transcription (`small.en` by default). Selects CUDA at startup, with a CPU
fallback when an HA failover lands the service on Z3 (no GPU). URLs are
validated as http(s), and a `--` separator is passed to yt-dlp to defeat
argv smuggling.
- **`sync.source_doc`**: fetches `upstream_url` via Python `safe_fetch`
(a port of the Node helper) and sha256-diffs the body against the prior
`body_sha` in metadata; updates `body_text` only when the content changed.
- **Node `blob.js`** fans out to `extract.pdf` / `extract.image` after
creating PDF / image refs.
- **Node `capture.js`** routes `youtube.com` / `youtu.be` / `vimeo.com`
URLs to `ingest.video` instead of `ingest.url`.
- **Daily cron** (`lib/cron/sync_source_docs.js`) enqueues
`sync.source_doc` jobs at 03:00 local time for every `source_docs` row
with `sync_source='url'`.
- **CT 311 infrastructure**: resized to 6 cores / 8 GB RAM; NVIDIA
RTX A2000 device nodes passed through (shared with CT 102's Ollama).
- **`deploy/push-workers.sh`** + `deploy/void-workers.service`: push
the workers package, chown to `voidworkers`, recreate the venv, install
deps under `su voidworkers -c`, restart the unit.
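
A minimal Python sketch of the `ingest.video` URL guard, assuming the worker invokes yt-dlp through an argv list with no shell. `build_ytdlp_argv` is a hypothetical helper, and the audio flags shown (`-x`, `--audio-format`, `-o`) are standard yt-dlp options rather than the worker's confirmed invocation. The `--` ends option parsing, so even a value that slipped past validation is treated as a URL, never as a flag.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def build_ytdlp_argv(url: str, out_template: str) -> list[str]:
    """Build an argv list for yt-dlp audio extraction (hypothetical helper)."""
    parts = urlsplit(url)
    # Only absolute http(s) URLs with a host are accepted; file:, data:, and
    # bare strings such as "--exec=..." fail here.
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"refusing non-http(s) URL: {url!r}")
    return [
        "yt-dlp",
        "-x", "--audio-format", "wav",  # extract audio only (assumed flags)
        "-o", out_template,
        "--",                           # end of options: what follows is never parsed as a flag
        url,
    ]
```

Pass the list straight to `subprocess.run(argv, check=True)`; joining it into a shell string would reopen the injection path the separator closes.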
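
The change check in `sync.source_doc` can be sketched as a pure function. This assumes the previous digest is stored as a hex string under `metadata["body_sha"]` and that bodies are hashed as UTF-8; `plan_source_doc_update` and its return shape are hypothetical.

```python
from __future__ import annotations

import hashlib
from typing import Optional


def plan_source_doc_update(metadata: dict, fetched_body: str) -> Optional[dict]:
    """Return the fields to write, or None when the upstream body is unchanged.

    Hypothetical helper: assumes the prior digest is metadata["body_sha"] (hex)
    and that bodies are hashed as UTF-8.
    """
    new_sha = hashlib.sha256(fetched_body.encode("utf-8")).hexdigest()
    if metadata.get("body_sha") == new_sha:
        return None  # identical content: skip the UPDATE entirely
    # Build a new metadata dict rather than mutating the caller's copy.
    return {"body_text": fetched_body, "metadata": {**metadata, "body_sha": new_sha}}
```

Keeping the digest comparison separate from the database write makes the "unchanged" path trivially testable.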

## [2.0.0-alpha.3] - 2026-06-01

### Added (Plan 3: Capture pipeline + hybrid search)

- **pg-boss job queue** embedded in void-server (Node). Queue tables live
alongside Void's own tables in the shared void2-db. Tests manage their own boss
lifecycle via the `stopBoss()` / `waitForJob()` helpers.
- `/api/jobs` (owner-only): list / get / retry / delete with state and
name filters. A minimal `#/jobs` SPA view fronts it, polling every 10 s.
- **`/api/capture`** POST: URL → `ingest.url` job. Idempotent by
`sha256(space_id + url)` stored as `refs.external_id`; a duplicate POST
returns the existing `ref_id`.
- **`/api/capture/upload`**: multipart file → `ingest.blob` job →
content-addressed `/var/lib/void/blobs/<sha-prefix>/<sha>` →
`refs` row. Drag-and-drop in the SPA is wired to the main panel; `space_id`
is pre-filled from the last-viewed space.
- **`ingest.url` worker**: `@mozilla/readability` + `jsdom` extraction;
fetch protected by `lib/ingest/safe_fetch.js` (SSRF mitigations:
http(s) only; DNS-resolved hostnames checked against loopback /
RFC1918 / link-local / CGNAT / metadata ranges; resolved IP pinned via an
undici dispatcher to defeat DNS rebinding; redirects re-validated).
- **`ingest.blob` worker**: content-addressed storage,
image/pdf/file kind classification.
- **`embed.text` worker**: Ollama `nomic-embed-text` (768 dims) padded
to `vector(1024)`; emits a `worker`-actor audit log entry.
- **Repo-level triggers**: pages/refs/source_docs `create` and
`update` enqueue an `embed.text` job with a singleton key so rapid
edits coalesce. No-op when the queue is not running (tests).
- **Hybrid `/api/search`**: FTS + pgvector ANN results unioned with reciprocal
rank fusion (k=60). The vector branch is silently skipped when Ollama times
out, leaving FTS-only results (graceful degradation).
- **`/api/ingest/karakeep`**: HMAC-verified webhook. Enqueues
`ingest.karakeep` for `bookmark.created`; the worker fetches the bookmark
via Karakeep's API and normalizes it to a `refs` row tagged
`source_kind='karakeep'`.
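
Reciprocal rank fusion scores each result as the sum of 1 / (k + rank) over every ranked list it appears in, so items ranked well by both FTS and the vector search rise to the top without normalizing their incomparable raw scores. A Python sketch with k=60; `rrf_fuse` is illustrative, and the server's own implementation is not shown here. The graceful-degrade path falls out naturally: when the vector branch times out, pass only the FTS list and the fused order reduces to the FTS order.

```python
from __future__ import annotations

from collections import defaultdict


def rrf_fuse(ranked_lists: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse ranked id lists: score(d) = sum of 1 / (k + rank), ranks starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

For example, `rrf_fuse([["p1", "p2", "p3"], ["p3", "p1", "p4"]])` ranks `p1` first (1/61 + 1/62), then `p3` (1/63 + 1/61), then `p2` and `p4`, which each appear in only one list.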
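
The address check at the heart of `safe_fetch` can be sketched in Python with the standard `ipaddress` module. The network list mirrors the categories named above plus IPv6 equivalents; entries marked as assumptions in the comments are not in the changelog. DNS resolution and pinning are out of scope: a typical caller resolves once, rejects if any resolved address is blocked, and connects only to an address it checked (the job the undici dispatcher pin does on the Node side).

```python
from __future__ import annotations

import ipaddress

BLOCKED_NETS = [
    ipaddress.ip_network(cidr)
    for cidr in (
        "127.0.0.0/8", "::1/128",                         # loopback
        "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC1918 private
        "169.254.0.0/16", "fe80::/10",                    # link-local; covers 169.254.169.254 metadata
        "100.64.0.0/10",                                  # CGNAT (RFC 6598)
        "0.0.0.0/8",                                      # assumption: "this network", reaches localhost on Linux
        "fc00::/7",                                       # assumption: IPv6 unique-local, incl. fd00:ec2::254 metadata
    )
]


def is_blocked_address(ip: str) -> bool:
    """True if a resolved IP falls in a range a server-side fetch must not reach."""
    addr = ipaddress.ip_address(ip)
    # IPv4-mapped IPv6 (::ffff:10.0.0.1) would otherwise slip past the IPv4 ranges.
    if isinstance(addr, ipaddress.IPv6Address) and addr.ipv4_mapped is not None:
        addr = addr.ipv4_mapped
    # Membership across IP versions is simply False, so one list serves both.
    return any(addr in net for net in BLOCKED_NETS)
```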
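
A sketch of the webhook check, assuming a hex-encoded HMAC-SHA256 of the raw request body arrives in a request header. The header name and exact encoding Karakeep uses are not stated in this changelog, so treat `verify_webhook` and its inputs as hypothetical. Two details matter regardless of format: hash the raw bytes before any JSON parsing, and compare with `hmac.compare_digest` so timing does not reveal how much of a forged signature matched.

```python
import hashlib
import hmac


def verify_webhook(secret: bytes, raw_body: bytes, signature_hex: str) -> bool:
    """Check a hex HMAC-SHA256 signature over the exact request bytes (assumed scheme)."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    received = signature_hex.strip().lower()
    # Compare as bytes: compare_digest is constant-time and, unlike the str
    # variant, never raises on non-ASCII input from a hostile client.
    return hmac.compare_digest(expected.encode("ascii"), received.encode("utf-8"))
```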

### Deferred (Plan 4+)

- Python `void-workers` service for Whisper / Tesseract OCR / yt-dlp
(heavy ML).
- AI Space/Project suggestion on capture.
- Embedding chunks table (whole-doc embedding only in Plan 3).
- pdftotext for born-digital PDFs.
- `pg LISTEN/NOTIFY` real-time Jobs UI.

## [2.0.0-alpha.2] - 2026-06-01

### Added (Plan 2: API surface + UI shell)

- REST routes for the full entity tree:
  - `/api/spaces`, `/api/projects`, `/api/tasks` (with project + space scoping)
  - `/api/pages` + page revisions + `/api/pages/:id/backlinks`
  - `/api/refs` + `/api/refs/upsert`
  - `/api/resources` + dependencies + change history
  - `/api/resources/:id/source-docs` + `/api/source-docs/:id/resync` (gated by `ENABLE_RESYNC`)
  - `/api/agents` (owner-only) + agent token mint/revoke
  - `/api/conversations` + nested `/messages`
  - `/api/tags` + entity-scoped attach/detach via `/api/:entity_type/:entity_id/tags`
  - `/api/links` (POST; GET `from` / `to`; DELETE) for polymorphic entity links
  - `/api/pending-changes` + approve/reject, with a dispatch table covering
    page/project/task/ref/resource/source_doc × create/update/delete
  - `/api/audit/entity/:type/:id` + `/api/audit/actor`
  - `/api/search`: unified FTS across pages, refs, source docs, and messages
- Agent bearer auth middleware + capability tiering: owner → allow; agent
`write` + scope → allow; agent `suggest` → 202 + pending row; otherwise 403.
- Approve and reject emit explicit `approve` / `reject` entries in the
audit log, with the original agent id preserved in the diff.
- Static SPA shell served from `public/`:
  - Three-column Cradle aesthetic (blackflame palette, Cinzel display
    headings, Cormorant Garamond body)
  - Hash-based router with views for home / space / project / page /
    reference / resource / search / inbox / sacred valley
  - `dom.js` safe builders: no `innerHTML` on API data anywhere; the
    explicit `html:` opt-in is used only by the markdown editor's
    preview pane, which sanitizes with DOMPurify
  - Sidebar Spaces tree with lazy project expansion, a bottom Navigate
    section, and a pending-count badge shared with the topbar bell via a tiny
    `state.js` event bus
  - Topbar: brand, capture modal stub, global search (Enter →
    `#/search?q=`), pending bell, owner toggle
  - Page editor: split-pane markdown via marked + DOMPurify; save
    PATCHes `/api/pages/:id`; backlinks card
  - Reference detail: media block (image / YouTube embed / link),
    summary, metadata table, tag attach/detach, linked-from list
  - Resource detail: status header; dependencies, source docs, and
    runbook pages columns; change history
  - Inbox: pending changes grouped by agent; approve → navigate to the
    resulting entity
- Test coverage: 185 tests across 43 files (113 new for Plan 2 routes +
search + GET / shell smoke).
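
The capability tiering reduces to a small decision function. A Python sketch, assuming an agent's scope is a set of space ids and tiers are named `write` / `suggest`; `Actor` and `decide` are hypothetical names, and the real middleware is Node. Read literally, the rule denies a `write` agent acting outside its scope rather than downgrading it to a suggestion, and the sketch follows that reading.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Actor:
    kind: str                 # "user" (the owner) or "agent"
    tier: str = "none"        # hypothetical tier names: "write", "suggest", "none"
    scopes: set[str] = field(default_factory=set)  # hypothetical: space ids the agent may write to


def decide(actor: Actor, space_id: str) -> str:
    """Return "allow", "suggest" (store a pending_changes row, reply 202) or "deny" (reply 403)."""
    if actor.kind == "user":
        return "allow"
    if actor.tier == "write" and space_id in actor.scopes:
        return "allow"
    if actor.tier == "suggest":
        return "suggest"
    return "deny"
```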

### Security follow-ups (deferred)

- Polymorphic IDOR risk on entity_links / entity_tags / attachments:
acceptable today since the entire API is owner-token gated and there
is one tenant; see `docs/security-followups.md` for the tighten-now
vs. defer decision.
- `pending_changes.action` CHECK constraint blocks the `'upsert'` /
`'add_dependency'` / `'remove_dependency'` actions emitted by some
routes' `divertToPending` paths. Latent: it only fires when an agent at
suggest tier hits those specific endpoints. Mitigation options are
documented in `docs/security-followups.md`.

## [Unreleased]

### Added

- Initial repo scaffolding

### Added (Plan 1: Foundation)

- LXC provisioning for `void2-db` (Postgres 16 + pgvector) and `void2-app`
- Schema migrations 001-006 covering core, knowledge, resources, agents, cross-cutting, audit
- Repos with capability-checked `actor` parameter and audit trail
- Real audit log with redaction of sensitive keys (token, password, api_key, etc.)
- `pending_changes` table for agent suggestions awaiting owner approval
- Capability check module (allow / suggest / deny) for user vs agent actors
- Owner-token bearer auth
- Express server with `/health` and smoke `/api/spaces`
- Test coverage: 72 tests across migrations, repos, capability, owner middleware, server
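
A sketch of the audit-log redaction, assuming diffs are JSON-like dicts and lists and that matching is on exact, case-insensitive key names. Only the three keys named above are listed; the "etc." is not enumerated, so `SENSITIVE_KEYS` is an assumption to extend. The function returns a copy, leaving the caller's object untouched.

```python
from __future__ import annotations

from typing import Any

# Only the keys the changelog names; the real list ("etc.") is longer.
SENSITIVE_KEYS = {"token", "password", "api_key"}


def redact(value: Any, placeholder: str = "[REDACTED]") -> Any:
    """Recursively copy a JSON-like value, masking values stored under sensitive keys."""
    if isinstance(value, dict):
        return {
            key: placeholder if str(key).lower() in SENSITIVE_KEYS else redact(item, placeholder)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item, placeholder) for item in value]
    return value  # scalars are returned as-is
```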