
Changelog

All notable changes to Void 2.0 are documented here. Format: Keep a Changelog.

[2.0.0-alpha.4] — 2026-06-01

Added (Plan 4: Python void-workers)

  • void-workers.service — Python 3.13 service alongside void-server on CT 311. A psycopg-based pg-boss client mirrors Node's claim/finish semantics, claiming jobs with SELECT ... FOR UPDATE SKIP LOCKED. Every connection forces client_encoding=UTF8 because the void2-db cluster is SQL_ASCII.
  • extract.pdf: pdftotext -layout first; per-page pdftoppm rasterization + Tesseract OCR fallback when extraction yields < 200 chars.
  • extract.image — Tesseract OCR (English) for images stored in the blob store.
  • ingest.video: yt-dlp metadata + audio extraction + faster-whisper transcription (small.en by default). Detects CUDA at startup and falls back to CPU when an HA failover moves the service to Z3 (no GPU). URLs are validated as http(s), and a -- separator is passed to yt-dlp to defeat argv smuggling.
  • sync.source_doc — fetches upstream_url via Python safe_fetch (port of the Node helper) + sha256-diffs against the prior body_sha in metadata; updates body_text only when content changed.
  • Node blob.js fans out to extract.pdf / extract.image after creating PDF / image refs.
  • Node capture.js routes youtube.com / youtu.be / vimeo.com URLs to ingest.video instead of ingest.url.
  • Daily cron (lib/cron/sync_source_docs.js) enqueues sync.source_doc jobs at 03:00 local for every source_docs row with sync_source='url'.
  • CT 311 infrastructure: resized to 6 cores / 8 GB RAM, NVIDIA RTX A2000 device-nodes passed through (shared with CT 102's Ollama).
  • deploy/push-workers.sh + deploy/void-workers.service — push the workers package, chown to voidworkers, recreate the venv, install deps under su voidworkers -c, restart the unit.

[2.0.0-alpha.3] — 2026-06-01

Added (Plan 3)

  • pg-boss job queue embedded in void-server (Node). Queue tables live alongside Void's in the shared void2-db. Tests manage their own boss lifecycle via stopBoss() / waitForJob() helpers.
  • /api/jobs (owner-only) — list / get / retry / delete with state and name filters. A minimal #/jobs SPA view fronts it, polling every 10 s.
  • /api/capture POST — URL → ingest.url job. Idempotent by sha256(space_id + url) stored as refs.external_id; duplicate POST returns the existing ref_id.
  • /api/capture/upload — multipart file → ingest.blob job → content-addressed /var/lib/void/blobs/<sha-prefix>/<sha> → refs row. Drag-drop in the SPA wired to the main panel; space_id pre-filled from the last-viewed space.
  • ingest.url worker: @mozilla/readability + jsdom extract; fetch protected by lib/ingest/safe_fetch.js (SSRF mitigations: http(s) only; DNS-resolved hostnames checked against loopback / RFC1918 / link-local / CGNAT / metadata; resolved IP pinned via an undici dispatcher to defeat DNS rebinding; redirects re-validated).
  • ingest.blob worker — content-addressed storage, image/pdf/file kind classification.
  • embed.text worker — Ollama nomic-embed-text (768 dims) padded to vector(1024); emits a worker-actor audit log entry.
  • Repo-level triggers — pages/refs/source_docs create and update enqueue an embed.text job with a singleton key so rapid edits coalesce. No-op when the queue is not running (tests).
  • Hybrid /api/search — FTS + pgvector ANN unioned with reciprocal rank fusion (k=60). The vector branch is skipped silently when Ollama times out, leaving FTS-only results (graceful degradation).
  • /api/ingest/karakeep — HMAC-verified webhook. Enqueues ingest.karakeep for bookmark.created; worker fetches the bookmark via Karakeep's API, normalizes to a refs row tagged source_kind='karakeep'.
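
A rough Python sketch of five mechanisms from this release: the capture idempotency key, the SSRF address check, fitting the 768-dim embedding into vector(1024), reciprocal rank fusion with k=60, and webhook HMAC verification. It assumes zero padding, a blocked-range list limited to the ranges named above, and a hex HMAC-SHA256 over the raw request body (Karakeep's actual signature header and encoding are not specified in this changelog). Every function name is hypothetical; the real code is Node.

```python
import hashlib
import hmac
import ipaddress

RRF_K = 60        # fusion constant from the changelog
EMBED_DIM = 1024  # width of the vector(1024) column

# Only the ranges the safe_fetch entry names; the real helper may block more.
BLOCKED_NETS = [ipaddress.ip_network(n) for n in (
    "127.0.0.0/8", "::1/128",                         # loopback
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC 1918
    "169.254.0.0/16", "fe80::/10",                    # link-local, incl. 169.254.169.254 metadata
    "100.64.0.0/10",                                  # CGNAT
)]


def capture_external_id(space_id: str, url: str) -> str:
    """Idempotency key: sha256(space_id + url), stored as refs.external_id."""
    return hashlib.sha256((space_id + url).encode("utf-8")).hexdigest()


def is_blocked_ip(addr: str) -> bool:
    """True if a DNS-resolved address falls in a blocked range."""
    ip = ipaddress.ip_address(addr)
    # Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) so it is checked as IPv4.
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    # Membership across IP versions is simply False, so one mixed list works.
    return any(ip in net for net in BLOCKED_NETS)


def pad_embedding(vec: list[float], dim: int = EMBED_DIM) -> list[float]:
    """Zero-pad a 768-dim nomic-embed-text vector to the column width (zeros assumed)."""
    if len(vec) > dim:
        raise ValueError(f"embedding has {len(vec)} dims, column holds {dim}")
    return vec + [0.0] * (dim - len(vec))


def rrf_fuse(*rankings: list[str], k: int = RRF_K) -> list[str]:
    """Reciprocal rank fusion: score(d) = sum of 1 / (k + rank) over the lists containing d."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


def verify_webhook(secret: bytes, body: bytes, signature_hex: str) -> bool:
    """Constant-time check of an assumed hex HMAC-SHA256 over the raw body."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature_hex)
```

Zero padding leaves dot products and norms unchanged, so cosine distance between padded vectors equals the distance between the originals. With k=60, a document at rank 10 in both the FTS and ANN lists scores 2/70 ≈ 0.0286 and outranks one that is first in only one list (1/61 ≈ 0.0164); when the vector branch is skipped, fusing a single list just preserves the FTS order.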

Deferred (Plan 4+)

  • Python void-workers service for Whisper / Tesseract OCR / yt-dlp (heavy ML).
  • AI Space/Project suggestion on capture.
  • Embedding chunks table (whole-doc embedding only in Plan 3).
  • pdftotext for born-digital PDFs.
  • pg LISTEN/NOTIFY real-time Jobs UI.

[2.0.0-alpha.2] — 2026-06-01

Added (Plan 2: API surface + UI shell)

  • REST routes for the full entity tree:
    • /api/spaces, /api/projects, /api/tasks (with project + space scoping)
    • /api/pages + page revisions + /api/pages/:id/backlinks
    • /api/refs + /api/refs/upsert
    • /api/resources + dependencies + change history
    • /api/resources/:id/source-docs + /api/source-docs/:id/resync (gated by ENABLE_RESYNC)
    • /api/agents (owner-only) + agent token mint/revoke
    • /api/conversations + nested /messages
    • /api/tags + entity-scoped attach/detach via /api/:entity_type/:entity_id/tags
    • /api/links (POST/GET from|to/DELETE) for polymorphic entity links
    • /api/pending-changes + approve/reject with dispatch table covering page/project/task/ref/resource/source_doc × create/update/delete
    • /api/audit/entity/:type/:id + /api/audit/actor
    • /api/search unified FTS across pages, refs, source docs, messages
  • Agent bearer auth middleware + capability tiering: owner → allow; agent with write tier and matching scope → allow; agent with suggest tier → 202 + pending row; otherwise 403.
  • Approve and reject emit explicit approve / reject entries in the audit log with the original agent id preserved in the diff.
  • Static SPA shell served from public/:
    • Three-column Cradle aesthetic (blackflame palette, Cinzel display headings, Cormorant Garamond body)
    • Hash-based router with views for home / space / project / page / reference / resource / search / inbox / sacred valley
    • dom.js safe builders — no innerHTML on API data anywhere; the explicit html: opt-in is used only by the markdown editor's preview pane, which sanitizes with DOMPurify
    • Sidebar Spaces tree with lazy project expansion, bottom Navigate section, pending-count badge shared with the topbar bell via a tiny state.js event bus
    • Topbar: brand, capture modal stub, global search (Enter → #/search?q=), pending bell, owner toggle
    • Page editor: split-pane markdown via marked + DOMPurify, save PATCHes /api/pages/:id, backlinks card
    • Reference detail: media block (image / YouTube embed / link), summary, metadata table, tag attach/detach, linked-from list
    • Resource detail: status header, dependencies + source docs + runbook pages columns, change history
    • Inbox: pending changes grouped by agent, approve → navigate to the resulting entity
  • Test coverage: 185 tests across 43 files (113 new for Plan 2 routes + search + GET / shell smoke).

Security follow-ups (deferred)

  • Polymorphic IDOR risk on entity_links / entity_tags / attachments — acceptable today since the entire API is owner-token gated and there is one tenant; see docs/security-followups.md for the tighten-now vs defer decision.
  • pending_changes.action CHECK constraint blocks 'upsert' / 'add_dependency' / 'remove_dependency' actions emitted by some routes' divertToPending paths. Latent — only fires when an agent at suggest tier hits those specific endpoints. Mitigation options documented in docs/security-followups.md.

[Unreleased]

Added

  • Initial repo scaffolding

Added (Plan 1: Foundation)

  • LXC provisioning for void2-db (Postgres 16 + pgvector) and void2-app
  • Schema migrations 001-006 covering core, knowledge, resources, agents, cross-cutting, audit
  • Repos with capability-checked actor parameter and audit trail
  • Real audit log with redaction of sensitive keys (token, password, api_key, etc.)
  • pending_changes table for agent suggestions awaiting owner approval
  • Capability check module (allow / suggest / deny) for user vs agent actors
  • Owner-token bearer auth
  • Express server with /health and a smoke-test /api/spaces route
  • Test coverage: 72 tests across migrations, repos, capability, owner middleware, server
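
A minimal Python sketch of recursive key redaction for audit-log diffs. Only token, password, and api_key come from the entry above; secret and authorization stand in for its "etc.", and the [REDACTED] marker and case-insensitive matching are assumptions, not the shipped Node repo code.

```python
from typing import Any

# token / password / api_key are from the changelog; the rest are assumed extras.
SENSITIVE_KEYS = {"token", "password", "api_key", "secret", "authorization"}
REDACTED = "[REDACTED]"  # hypothetical marker


def redact(value: Any) -> Any:
    """Return a copy of value with sensitive keys masked at any nesting depth."""
    if isinstance(value, dict):
        return {
            k: REDACTED if isinstance(k, str) and k.lower() in SENSITIVE_KEYS else redact(v)
            for k, v in value.items()
        }
    if isinstance(value, list):
        return [redact(v) for v in value]
    # Scalars pass through unchanged.
    return value
```

Building a new structure instead of mutating in place means the caller's object is untouched, so the same diff can still be applied to the entity after the redacted copy is written to the audit log.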