# Changelog
All notable changes to Void 2.0 are documented here.
Format: [Keep a Changelog](https://keepachangelog.com).
## 2.0.0-alpha.20 — Page ordering + sectioned space view
- **Explicit page ordering** (`migration 020`, `lib/db/repos/pages.js`): pages gain a `position integer` column; `listBySpace` now orders `position, title` instead of alphabetical-only, with a covering index `(space_id, position, title)`. `position` is patchable via `PUT /api/pages/:id`. Backfills all rows to `0` (preserves prior title order until positions are set).
- **Sectioned page tree** (`public/views/space.js`): the flat pages table is replaced by a `parent_id`-grouped tree — top-level pages render as section headers with their children/grandchildren nested. Backward-compatible with flat (un-nested) spaces. Enables the Wiki to read as ordered, sectioned documentation rather than an alphabetical dump.
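
The ordering and grouping in this release can be sketched as follows. This is a minimal illustration in Python, not the code in `lib/db/repos/pages.js` or `public/views/space.js`; the `Page` shape and the `build_tree` helper are hypothetical, and only the `position, title` sort and the `parent_id` grouping come from the entries above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    id: str
    title: str
    position: int = 0          # backfilled to 0, so title breaks ties
    parent_id: Optional[str] = None


def build_tree(pages):
    """Group pages by parent_id, ordering siblings by (position, title)."""
    children = defaultdict(list)
    for page in sorted(pages, key=lambda p: (p.position, p.title)):
        children[page.parent_id].append(page)

    def walk(parent_id, depth):
        # Depth-first: a section header, then its children and grandchildren.
        for page in children.get(parent_id, []):
            yield depth, page.title
            yield from walk(page.id, depth + 1)

    return list(walk(None, 0))
```

A flat space (every `parent_id` empty) comes back as a single level in `position, title` order, which matches the backward-compatibility note.
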
## 2.0.0-alpha.19 — Whisper GPU sharing + mobile chat Send button + registry
- **Whisper on GPU with graceful CPU fallback** (`workers/void_workers/model.py`): the STT worker uses the in-container NVIDIA driver on the GPU node, and **falls back to CPU on any load failure** (e.g. shared-card VRAM exhaustion) so a transcription never hard-fails. (Passthrough alone gave device nodes but no `libcuda` — the matching userspace driver was installed inside CT 311; see [[gpu-cpu-fallback-for-ha]].)
- **Cooperative GPU sharing with Ollama** (`workers/void_workers/gpu.py`): before loading Whisper on CUDA, the worker asks the co-resident Ollama (CT 102, same A2000) to unload its models (`GET /api/ps` + `POST /api/generate keep_alive:0`) and waits for the card to clear; Ollama reloads on its next request. Best-effort, stdlib-only; toggle `OLLAMA_FREE_BEFORE_STT`, endpoint `OLLAMA_URL`.
- **Mobile chat Send button**: the agent composers (Companion, Yerin, Little Blue) gained a themed Send button — mobile soft keyboards have no reliable Enter-to-send. Wired via `wireAgentChat`'s `sendBtnEl`; Enter-to-send kept for desktop.
- **Service registry**: added **Chaptarr** (Readarr fork, ebooks + audiobooks; mediastack `chaptarr.hynesy.com`) to the homelab health band.
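
A rough, stdlib-only sketch of the two GPU behaviours above: asking Ollama to unload its models, then loading with a CPU fallback. The function names, the injectable `http` and `load` parameters, and the device tuple are assumptions for illustration; the wait for the card to clear is omitted. The `GET /api/ps` and `POST /api/generate` with `keep_alive: 0` calls are the ones named in the entry.

```python
import json
import urllib.request


def _http_json(url, payload=None, timeout=5):
    """GET (payload is None) or POST JSON, returning the decoded body."""
    data = None if payload is None else json.dumps(payload).encode()
    req = urllib.request.Request(url, data=data, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read() or b"{}")


def free_ollama(base_url, http=_http_json):
    """Best-effort: ask Ollama to unload every resident model."""
    unloaded = []
    try:
        for model in http(f"{base_url}/api/ps").get("models", []):
            http(f"{base_url}/api/generate", {"model": model["name"], "keep_alive": 0})
            unloaded.append(model["name"])
    except Exception:
        pass  # never block transcription on a failed courtesy call
    return unloaded


def load_with_fallback(load, devices=("cuda", "cpu")):
    """Try each device in order; return (model, device) for the first success."""
    last_error = None
    for device in devices:
        try:
            return load(device), device
        except Exception as exc:  # e.g. VRAM exhaustion on a shared card
            last_error = exc
    raise RuntimeError("model failed to load on every device") from last_error
```
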
## 2.0.0-alpha.18 — Plan 8b cutover: `void.hynesy.com` now serves Void 2
- **Go-live.** `void.hynesy.com` (CT 301 → Void 1) is repointed at **Void 2** (CT 311, `.216:3000`) at the Traefik edge. Void 1 is now **legacy** — CT 301 stays running untouched as an instant-rollback fallback; nothing is retired or renamed yet. The `-alpha` tag is intentionally **kept** pending owner sign-off.
- **CF Access multi-aud** (`lib/auth/cf_access.js`): `CF_ACCESS_AUD` now accepts a **comma-separated allow-list** so a request through *either* CF Access app — `void.hynesy.com` (aud `0e7190f4…`) or `void2-app.hynesy.com` (aud `a381f270…`) — is honoured as owner. Still fails closed; an unlisted aud is rejected. Prod env updated to carry both auds.
- Cutover is fully reversible: revert the Traefik `void` service URL to `http://192.168.1.11:2424` and `docker restart traefik`.
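
The multi-aud check can be pictured roughly like this. It is a Python sketch of the allow-list logic only (the real code is `lib/auth/cf_access.js`, and signature and email verification are not shown); `parse_allowed_auds` and `aud_allowed` are hypothetical names.

```python
def parse_allowed_auds(raw):
    """Split a comma-separated CF_ACCESS_AUD value into a set of audiences."""
    return {part.strip() for part in (raw or "").split(",") if part.strip()}


def aud_allowed(token_aud, allowed):
    """True only if the token's aud claim intersects the allow-list.

    A JWT `aud` claim may be a single string or a list of strings.
    An empty allow-list rejects everything, so the check fails closed.
    """
    claims = [token_aud] if isinstance(token_aud, str) else list(token_aud or [])
    return any(aud in allowed for aud in claims)
```
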
## 2.0.0-alpha.17 — Settings, project management, terminal, AI Usage, "The Void" space + UI polish
- **Settings** (`#/settings`): API tokens (mint/list/revoke), Agents list with an expandable **profile viewer** (persona/"soul" + capabilities/scopes via `GET /api/agents/:id/profile`), Orthos Mode placeholder.
- **Per-space project management**: Void-1-style expandable cards with inline **status**, **Details**, **Tasks**, **Linked references**, **↻ Research** (Eithan stub → `POST /api/projects/:id/research`), Edit/New modal, Delete-with-confirm. Migration 019 adds research fields; `GET /api/projects/:id/links` resolves linked pages/refs.
- **Terminal tab** (`#/terminal`): embedded blackflame `ttyd` → persistent `tmux`/`claude` on CT 300; works via Traefik (CF-Access) **and** the LAN IP (app proxies `/terminal` + its WebSocket to ttyd).
- **AI Usage** Sacred Valley card + `GET /api/ai-usage` — summarises the Homelab Monitor (Claude tokens + local OpenClaw/Ollama p50/p95).
- **"The Void" space**: Void 1.x / Void 2.0 / Void 3.0 as projects (tasks + linked references), charting the project's evolution.
- **Migration**: BookStack re-imported with Book/Chapter/Page hierarchy; Void 1 project `research_notes` backfilled.
- **UI**: page header actions (Edit/Revisions/Export), breadcrumbs, themed markdown tables, `Cache-Control: no-cache`, live sidebar active-sync, **hybrid sidebar** (Spaces/Agents/Navigate + active pill + agent dots), themed scrollbars + topbar, **+1 font bump**, Sentinel → **Yerin** (red).
## 2.0.0-alpha.16 — Little Blue + action framework (Agent Layer brick 2)
- **Little Blue**, the caretaker fix-it agent, is online at `#/little-blue`: chat + a manual Actions panel. She can **restart whitelisted services** and **power-manage Proxmox guests**: `safe` actions run on her word; `risky` ones queue for your approval.
- **Least-privilege action framework:** a version-controlled whitelist (`config/actions.json`), two server-side-enforced channels (scoped **Proxmox API token** + **SSH forced-command wrapper**), tiered approval, and a full `agent_actions` audit trail. Infra creds live ONLY in the main server; Little Blue's MCP child proposes actions via the local API with a scoped token — it can only name a whitelisted id, never a command.
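
A simplified sketch of the tiered whitelist, under stated assumptions: the JSON shape, the action ids, and the outcome strings are invented for illustration, since the schema of `config/actions.json` is not reproduced in this changelog. It shows only the rule that a caller names a whitelisted id and the server decides what happens.

```python
import json

# Hypothetical whitelist shape; the real config/actions.json schema may differ.
ACTIONS_JSON = """
{
  "restart-example": {"tier": "safe"},
  "shutdown-guest-example": {"tier": "risky"}
}
"""


def propose_action(action_id, whitelist):
    """Resolve a proposed action id to an outcome; callers never pass commands."""
    entry = whitelist.get(action_id)
    if entry is None:
        return "rejected"          # not whitelisted: fail closed
    if entry["tier"] == "safe":
        return "run"               # safe actions run immediately
    return "pending_approval"      # risky (or unknown tier) waits for the owner
```
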
## 2.0.0-alpha.15 — Yerin online (Agent Layer brick 1)
- **Yerin**, the read-only security agent, is now a usable agent: a global `#/sentinel` chat surface backed by her 5 security tools (audit/agents/pending/exposure/tokens). She investigates + reports; she never acts.
- Extracted the **shared agent-chat foundation**, `runAgentTurn` (backend) + `agent_chat` (frontend), now used by both Dross and Yerin. Personas live in `lib/ai/personas/`.
## 2.0.0-alpha.14 — MCP HTTP transport for external agents
- **MCP Streamable HTTP** at `/mcp`: external agents can connect over the network, authenticated by a Space-scoped Void agent bearer (owner / CF-Access identities are rejected here — external agents never inherit owner powers; CF Access service tokens gate the hostname at the edge).
- **Read + suggest-only:** a dedicated external registry exposes `search` / `read` / `context` + `propose_change` (which always routes to the pending-changes inbox, `applied:false`). Kept separate from Dross's registry so future companion tools never auto-leak.
- The `read` tool now **enforces Space membership** for bound callers; reads are hard-scoped to the agent's bound Space (client-supplied space args are ignored). Per-token rate limit + audit on every external tool call.
## 2.0.0-alpha.13 — Finer Sacred Valley tile scaling
- Cards now sit on a 12-column grid with a per-card width **−/+ stepper** (span 1–12) in edit mode, replacing the coarse S/M/L. "Small" defaults to 1/6 width (half its previous size) so clock/weather aren't oversized.
- Layout `sizes` now store an integer column span (legacy 's'/'m'/'l' still accepted).
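
One way to accept both the new integer spans and the legacy letters is sketched below. The mapping for `s` follows from the 1/6-width default on a 12-column grid; the values for `m` and `l`, and both helper names, are assumptions rather than the shipped behaviour.

```python
GRID_COLUMNS = 12
# "s" -> 2 follows from "Small defaults to 1/6 width"; "m" and "l" are assumed values.
LEGACY_SPANS = {"s": 2, "m": 4, "l": 6}


def normalize_span(size, default=2):
    """Return an integer column span in 1..12 for a stored size value."""
    if isinstance(size, str):
        size = LEGACY_SPANS.get(size.lower(), default)
    if not isinstance(size, int) or isinstance(size, bool):
        return default
    return max(1, min(GRID_COLUMNS, size))


def step_span(span, delta):
    """Apply one stepper click (-1 or +1), clamped to the grid."""
    return normalize_span(span + delta)
```
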
## 2.0.0-alpha.12 — Editable Sacred Valley layout
- "Edit layout" mode on the dashboard: per-card **resize** (S/M/L column span), **show/hide** (with a hidden-cards tray to re-add), clearer **drag-to-reorder** via a dedicated grip handle, and a **Reset** to defaults.
- All changes persist through the existing `/api/dashboard/layout` (order/sizes/hidden) — no backend changes.
## 2.0.0-alpha.11 — DB-backed service registry + LAN auto-discovery
- The health-band registry is now in Postgres (`monitored_services`, migration 015) instead of the hand-edited `config/services.json` — which becomes a one-time boot seed (auto-populated if the table is empty).
- Owner CRUD over the registry: `POST/PATCH/DELETE /api/health/services` (add/edit/enable/disable/remove); `GET /api/health/services` is now DB-backed.
- LAN auto-discovery: `discover.lan` pg-boss worker (pure-Node TCP sweep + HTTP-title probe, no nmap) + `POST /api/health/discover`. Found host:ports become **disabled `discovered` candidates** that never clobber curated entries; `GET /api/health/services/discovered` lists them.
- Dashboard: a "Scan" button + a "Discovered (N new)" section in Little Blue's band, with one-click promote.
## 2.0.0-alpha.10 — Cloudflare Access SSO as owner auth
- Browser requests through the CF tunnel no longer need the owner token copied onto each device: a cryptographically-verified Cloudflare Access JWT (`Cf-Access-Jwt-Assertion`) for an allow-listed email now counts as the owner (`lib/auth/cf_access.js`, wired into `agentOrOwner`).
- Security: verifies signature against the team JWKS + audience (app AUD) + email allow-list; the plain email header is never trusted alone. Fails closed → falls back to the owner token (LAN-direct `:3000` path and dev/tests unaffected).
- Opt-in via env: `CF_ACCESS_TEAM_DOMAIN`, `CF_ACCESS_AUD`, `CF_ACCESS_OWNER_EMAILS` (absent → feature disabled).
## 2.0.0-alpha.9 — Hardening pass (Void 3.0 quick wins)
- Security: prod `void` DB role revoked SUPERUSER (CT 310; `vector` marked trusted so the test harness still creates it as non-superuser). An app-process compromise no longer escalates to full-cluster compromise.
- Security: the `claude` companion subprocess now gets an explicit env allow-list (`buildChildEnv`) instead of the full `process.env`, so `OWNER_TOKEN`/`DATABASE_URL`/Karakeep/ANTHROPIC secrets no longer reach the CLI. MCP tools are unaffected (they get DB env via the explicit `--mcp-config`).
- Correctness: pending-change **approve** now claims the change (atomic `WHERE status='pending'`) *before* applying, and reopens it on apply failure — eliminates the re-approvable duplicate after a crash.
- Hardening: `/api/capture/upload` validates `space_id` (UUID + existence); pg pool gets a 30s `statement_timeout`.
- Ops: disabled the failing `syncoid-donatello` timer on Z (pools out pending parts).
- Deferred: per-space tag uniqueness needs a `space_id` column on `tags` → folded into the polymorphic-`space_id` project.
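
The claim-before-apply fix can be illustrated with a small sketch. It uses SQLite as a stand-in for Postgres so it runs anywhere; the `approved` status name, the table columns, and the `approve` signature are assumptions. The key point is the conditional `UPDATE ... WHERE status = 'pending'`, which only one caller can win.

```python
import sqlite3


def approve(conn, change_id, apply_change):
    """Claim a pending change atomically, apply it, and reopen it on failure."""
    cur = conn.execute(
        "UPDATE pending_changes SET status = 'approved' "
        "WHERE id = ? AND status = 'pending'",
        (change_id,),
    )
    conn.commit()
    if cur.rowcount != 1:
        return False  # someone else already claimed or resolved it
    try:
        apply_change(change_id)
    except Exception:
        # Reopen so the owner can retry instead of losing the change.
        conn.execute(
            "UPDATE pending_changes SET status = 'pending' WHERE id = ?",
            (change_id,),
        )
        conn.commit()
        raise
    return True
```
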
## 2.0.0-alpha.8 — Sacred Valley (Plan 6)
- Two-band #/sacred-valley dashboard: draggable data cards (clock, weather, host-perf, speedtest, jobs, inbox, search) with server-persisted layout (custom CSS-grid reorder, no resize).
- Little Blue Health band: config service registry, 60s pg-boss health checks, grouped status tiles, locally-cached service icons (no CDN leak).
- New endpoints: /api/dashboard/layout, /api/weather, /api/host, /api/speedtest/{history,run}, /api/health/{services,check}, /api/icons/:slug.png.
- Migrations 012 (dashboard_layout), 013 (speedtest_results), 014 (service_status).
## [2.0.0-alpha.7] — 2026-06-02
### Security & hardening
- **`pending_changes.action` CHECK fix** (migration 009): `upsert` is now a valid
suggestion action (approval dispatches to `refsRepo.upsertByExternal`); resource
dependency mutations (`add_dependency`/`remove_dependency`) are now owner-only.
- **Constant-time owner-token comparison** (`lib/auth/safe_compare.js`) — replaces
`===`, closing a timing side-channel on `OWNER_TOKEN`.
- **O(1) token verification** (migration 010): selector+verifier split replaces the
O(n) bcrypt scan over all tokens. New format `vk_<selector>.<verifier>`; legacy
tokens still verify. Dropped the useless `idx_agent_tokens_hash`.
- **`pool.js` error handler** — an idle pooled-client error no longer crashes the
process.
- **`context` tool** projects a safe column allow-list for resources (no
`monitoring`/`metadata` blobs); now also handles `resource` views.
- **`applyPendingChange`** guards the `upsert` arm (clear `ValidationError`).
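
The selector+verifier split and the constant-time comparison can be sketched together. This is an illustration in Python, not the code behind migration 010: the in-memory `store`, the helper names, and the use of SHA-256 for the verifier hash are assumptions (the changelog does not say which hash the new format uses). Only the `vk_<selector>.<verifier>` format is taken from the entry.

```python
import hashlib
import hmac
import secrets


def mint_token(store):
    """Create a token, storing only a hash of the verifier keyed by selector."""
    selector, verifier = secrets.token_hex(8), secrets.token_hex(24)
    store[selector] = hashlib.sha256(verifier.encode()).hexdigest()
    return f"vk_{selector}.{verifier}"


def verify_token(token, store):
    """One keyed lookup by selector, then a constant-time verifier check."""
    if not token.startswith("vk_") or "." not in token:
        return False
    selector, verifier = token[3:].split(".", 1)
    expected = store.get(selector)
    if expected is None:
        return False
    actual = hashlib.sha256(verifier.encode()).hexdigest()
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(actual, expected)
```

Because the selector is an exact-match key, verification costs one lookup plus one hash regardless of how many tokens exist.
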
### Added (Yerin — security agent)
- Read-only `securityRegistry` (`lib/ai/agent/tools/security/`) with five tools:
`audit_log`, `agent_inventory`, `pending_review`, `resource_exposure`,
`token_audit` — no secret material in any output.
- Migration 011 seeds the read-only `yerin` agent.
- The stdio MCP server selects its toolset via `VOID_TOOL_REGISTRY`
(`security` → Yerin's tools; default → Dross's companion tools).
## [2.0.0-alpha.6] — 2026-06-01
### Changed (Plan 5b: companion backend → Claude CLI subprocess)
- **Companion model backend switched from the Anthropic API to the `claude`
CLI subprocess**, authenticated by the owner's **Claude Max subscription**
(no API key — the Agent SDK can't use subscription auth headlessly, and Max
doesn't issue API keys). Mirrors Void 1.0's `lib/agent.js`: spawn `claude`
with `ANTHROPIC_API_KEY`/`ANTHROPIC_AUTH_TOKEN` stripped so it uses the
logged-in subscription. The CLI owns the agentic loop; the four companion
tools are exposed to it via a local **stdio MCP server** (`lib/mcp/`).
- `lib/ai/claude_cli.js` — spawns `claude --print --output-format stream-json
--include-partial-messages --append-system-prompt … (--session-id | --resume)
--mcp-config … --strict-mcp-config --tools … --allowedTools …`, maps stream-json
→ `{delta,tool,tool_result,result,error}`. Prompt fed via **stdin** (variadic
`--tools` would eat a positional). Multi-turn continuity via `--resume`.
- `lib/mcp/companion-stdio.js` — stdio MCP server re-exposing `companionRegistry`;
per-turn Space/agent context passed via env in the `--mcp-config`.
- `propose_change` now stamps the current Space onto created space-scoped
entities (model can't know the Space uuid).
- CT 311 runs the `claude` CLI (logged in as `void`, `HOME=/var/lib/void`).
- Built-in CLI tools (Bash/Read/Write/…) disabled via `--tools`; the companion
has only the four `mcp__void__*` tools.
- The old `@anthropic-ai/sdk` API-key path (`lib/ai/anthropic.js`, `runTurn`)
is retained in-tree but no longer the companion's execution path.
## [2.0.0-alpha.5] — 2026-06-01
### Added (Plan 5: Companion chat)
- **Right-rail companion chat** — an always-visible, per-Space AI assistant.
Label-led turns (YOU / Companion) with left/right alignment, live
tool-activity chips, streamed answers (markdown via DOMPurify), and inline
approve/reject draft cards. Loads its space's history on first paint via the
`space-active` state event.
- **Lean agent runtime** (`lib/ai/agent/runtime.js`) on the Anthropic SDK
directly — no Mastra. `runTurn` drives a tool-use loop (max-iteration
guarded), streams text deltas, and emits `tool` / `delta` / `draft` events.
`callModel` is injectable (the SSE endpoint takes a fake in tests, so the
suite never hits the network).
- **Extensible shared tool registry** (`lib/ai/agent/registry.js`) with four
v1 tools: `search` (hybrid FTS), `read`, `context` (resolves the active
view), and `propose_change`. Adding a tool is a one-line `registerTool`;
a future MCP server re-exposes the same defs.
- **`propose_change` never applies** — it only writes a `pending_changes` row,
capability-gated via `canAct` (default `suggest`). Prompt-injection
containment is structural: a poisoned document can at most produce a draft
the owner must approve. Drafts render inline in chat AND in the Inbox (same
row; approving from either flips it).
- **Companion API** — `GET /api/spaces/:id/companion` (history) and
`POST /api/spaces/:id/companion/turn` (SSE). One ambient conversation per
Space (`conversations.space_id` via migration 007); one assistant message
per turn with the tool trace + draft ids in `metadata`.
- **`@anthropic-ai/sdk`** dependency; key resolved via the `env:`/`file:`
`vault_path` resolver (`lib/ai/secret.js`) — Vaultwarden swap still deferred.
- Default model `claude-sonnet-4-6`, overridable per-agent (`agents.model`)
and via `ANTHROPIC_MODEL` — the seam for scope-C local personas.
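
A toy version of the guarded tool-use loop with an injectable model, written in Python for illustration. The reply shape (`{"text": ...}` or `{"tool": ..., "input": ...}`), the event tuples, and the default of eight iterations are assumptions; the real `runTurn` lives in `lib/ai/agent/runtime.js` and streams rather than returning a list.

```python
def run_turn(call_model, tools, messages, max_iterations=8):
    """Drive a tool-use loop until the model answers or the guard trips.

    call_model(messages) returns either {"text": str} or
    {"tool": name, "input": dict}; this reply shape is hypothetical.
    """
    events = []
    for _ in range(max_iterations):
        reply = call_model(messages)
        if "tool" not in reply:
            events.append(("delta", reply["text"]))
            return events
        result = tools[reply["tool"]](**reply["input"])
        events.append(("tool", reply["tool"]))
        # Feed the tool result back so the next model call can use it.
        messages = messages + [{"role": "tool", "name": reply["tool"], "content": result}]
    events.append(("error", "max iterations reached"))
    return events
```

Passing a scripted fake as `call_model` is what lets a test suite exercise the loop without any network access.
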
## [2.0.0-alpha.4] — 2026-06-01
### Added (Plan 4: Python void-workers)
- **`void-workers.service`** — Python 3.13 service alongside `void-server`
on CT 311. psycopg-based pg-boss client matches Node's claim/finish
semantics via `SELECT ... FOR UPDATE SKIP LOCKED`. Forces
`client_encoding=UTF8` on every connection (void2-db cluster is
SQL_ASCII).
- **`extract.pdf`** — `pdftotext -layout` first; per-page `pdftoppm`
rasterization + Tesseract OCR fallback when extraction yields
< 200 chars.
- **`extract.image`** — Tesseract OCR (English) for images stored in
the blob store.
- **`ingest.video`** — `yt-dlp` metadata + audio extract + faster-whisper
(`small.en` default). Uses CUDA at startup and falls back to CPU when HA
fails over to Z3 (no GPU). URLs are validated as http(s), and a `--`
separator is passed to yt-dlp to defeat argv smuggling.
- **`sync.source_doc`** — fetches `upstream_url` via Python `safe_fetch`
(port of the Node helper) + sha256-diffs against the prior body_sha
in metadata; updates body_text only when content changed.
- **Node `blob.js`** fans out to `extract.pdf` / `extract.image` after
creating PDF / image refs.
- **Node `capture.js`** routes `youtube.com` / `youtu.be` / `vimeo.com`
URLs to `ingest.video` instead of `ingest.url`.
- **Daily cron** (`lib/cron/sync_source_docs.js`) enqueues
`sync.source_doc` jobs at 03:00 local for every `source_docs` row
with `sync_source='url'`.
- **CT 311 infrastructure**: resized to 6 cores / 8 GB RAM, NVIDIA
RTX A2000 device-nodes passed through (shared with CT 102's Ollama).
- **`deploy/push-workers.sh`** + `deploy/void-workers.service` — push
the workers package, chown to `voidworkers`, recreate the venv, install
deps under `su voidworkers -c`, restart the unit.
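
The argv-smuggling defence for `ingest.video` can be sketched as below. The helper name is hypothetical and `--dump-json` is just one illustrative yt-dlp option; the two points taken from the entry are the http(s)-only check and the `--` separator before the URL.

```python
from urllib.parse import urlsplit


def build_ytdlp_argv(url):
    """Validate the URL scheme and place the URL after `--`."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"refusing non-http(s) URL: {url!r}")
    # Everything after `--` is treated as a positional argument, so a value
    # that starts with a dash can never be parsed as an option.
    return ["yt-dlp", "--dump-json", "--", url]
```

The list form is meant to be passed to `subprocess.run` without a shell, so the URL is never re-split or interpreted.
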
## [2.0.0-alpha.3] — 2026-06-01
### Added (Plan 3: Capture pipeline + hybrid search)
- **pg-boss job queue** embedded in void-server (Node). Queue tables live
alongside Void's in the shared void2-db. Tests manage their own boss
lifecycle via `stopBoss()` / `waitForJob()` helpers.
- `/api/jobs` (owner-only) — list / get / retry / delete with state and
name filters. Minimal `#/jobs` SPA view fronts it, polling every 10 s.
- **`/api/capture`** POST — URL → `ingest.url` job. Idempotent by
`sha256(space_id + url)` stored as `refs.external_id`; duplicate POST
returns the existing `ref_id`.
- **`/api/capture/upload`** — multipart file → `ingest.blob` job →
content-addressed `/var/lib/void/blobs/<sha-prefix>/<sha>` →
`refs` row. Drag-drop in the SPA wired to the main panel; `space_id`
pre-filled from the last-viewed space.
- **`ingest.url` worker** — `@mozilla/readability` + `jsdom` extract;
fetch protected by `lib/ingest/safe_fetch.js` (SSRF mitigations:
http(s) only; DNS-resolved hostnames checked against loopback /
RFC1918 / link-local / CGNAT / metadata; resolved IP pinned via an
undici dispatcher to defeat DNS rebinding; redirects re-validated).
- **`ingest.blob` worker** — content-addressed storage,
image/pdf/file kind classification.
- **`embed.text` worker** — Ollama `nomic-embed-text` (768 dims) padded
to `vector(1024)`; emits a `worker`-actor audit log entry.
- **Repo-level triggers** — pages/refs/source_docs `create` and
`update` enqueue an `embed.text` job with a singleton key so rapid
edits coalesce. No-op when the queue is not running (tests).
- **Hybrid `/api/search`** — FTS + pgvector ANN unioned with reciprocal
rank fusion (k=60). Vector branch silently skipped when Ollama times
out, leaving FTS-only results — graceful degrade.
- **`/api/ingest/karakeep`** — HMAC-verified webhook. Enqueues
`ingest.karakeep` for `bookmark.created`; worker fetches the bookmark
via Karakeep's API, normalizes to a `refs` row tagged
`source_kind='karakeep'`.
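
For reference, reciprocal rank fusion with k=60 scores each document as the sum of 1 / (k + rank) over the lists it appears in. The Python sketch below shows that arithmetic on plain id lists; the function name and the alphabetical tie-break are assumptions, and the real `/api/search` performs the fusion over FTS and pgvector results.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked id lists: score(d) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

With only the FTS list supplied (the degraded case when the vector branch is skipped), the fused order is simply the FTS order.
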
### Deferred (Plan 4+)
- Python `void-workers` service for Whisper / Tesseract OCR / yt-dlp
(heavy ML).
- AI Space/Project suggestion on capture.
- Embedding chunks table (whole-doc embedding only in Plan 3).
- pdftotext for born-digital PDFs.
- `pg LISTEN/NOTIFY` real-time Jobs UI.
## [2.0.0-alpha.2] — 2026-06-01
### Added (Plan 2: API surface + UI shell)
- REST routes for the full entity tree:
  - `/api/spaces`, `/api/projects`, `/api/tasks` (with project + space scoping)
  - `/api/pages` + page revisions + `/api/pages/:id/backlinks`
  - `/api/refs` + `/api/refs/upsert`
  - `/api/resources` + dependencies + change history
  - `/api/resources/:id/source-docs` + `/api/source-docs/:id/resync` (gated by `ENABLE_RESYNC`)
  - `/api/agents` (owner-only) + agent token mint/revoke
  - `/api/conversations` + nested `/messages`
  - `/api/tags` + entity-scoped attach/detach via `/api/:entity_type/:entity_id/tags`
  - `/api/links` (POST/GET from|to/DELETE) for polymorphic entity links
  - `/api/pending-changes` + approve/reject with dispatch table covering
    page/project/task/ref/resource/source_doc × create/update/delete
  - `/api/audit/entity/:type/:id` + `/api/audit/actor`
  - `/api/search` unified FTS across pages, refs, source docs, messages
- Agent bearer auth middleware + capability tiering: owner allow, agent
`write+scope` → allow, agent `suggest` → 202 + pending row, else 403.
- Approve and reject emit explicit `approve` / `reject` entries in the
audit log with the original agent id preserved in the diff.
- Static SPA shell served from `public/`:
  - Three-column Cradle aesthetic (blackflame palette, Cinzel display
    headings, Cormorant Garamond body)
  - Hash-based router with views for home / space / project / page /
    reference / resource / search / inbox / sacred valley
  - `dom.js` safe builders: no `innerHTML` on API data anywhere; the
    explicit `html:` opt-in is used only by the markdown editor's
    preview pane, which sanitizes with DOMPurify
  - Sidebar Spaces tree with lazy project expansion, bottom Navigate
    section, pending-count badge shared with the topbar bell via a tiny
    `state.js` event bus
  - Topbar: brand, capture modal stub, global search (Enter →
    `#/search?q=`), pending bell, owner toggle
  - Page editor: split-pane markdown via marked + DOMPurify, save
    PATCHes `/api/pages/:id`, backlinks card
  - Reference detail: media block (image / YouTube embed / link),
    summary, metadata table, tag attach/detach, linked-from list
  - Resource detail: status header, dependencies + source docs +
    runbook pages columns, change history
  - Inbox: pending changes grouped by agent, approve → navigate to the
    resulting entity
- Test coverage: 185 tests across 43 files (113 new for Plan 2 routes +
search + GET / shell smoke).
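
The capability tiering above reduces to a small decision function. This Python sketch is illustrative only: the actor dictionary shape and the outcome strings are assumptions, while the 202 and 403 status codes and the three outcomes come from the entry.

```python
# HTTP statuses named in the changelog for the non-allow outcomes.
STATUS = {"pending": 202, "deny": 403}


def decide(actor, required_scope):
    """Return "allow", "pending", or "deny" for a mutating request."""
    if actor.get("kind") == "owner":
        return "allow"
    if actor.get("kind") == "agent":
        if actor.get("capability") == "write" and required_scope in actor.get("scopes", ()):
            return "allow"
        if actor.get("capability") == "suggest":
            return "pending"   # diverted to a pending_changes row for owner review
    return "deny"
```
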
### Security follow-ups (deferred)
- Polymorphic IDOR risk on entity_links / entity_tags / attachments —
acceptable today since the entire API is owner-token gated and there
is one tenant; see `docs/security-followups.md` for the tighten-now
vs defer decision.
- `pending_changes.action` CHECK constraint blocks `'upsert'` /
`'add_dependency'` / `'remove_dependency'` actions emitted by some
routes' `divertToPending` paths. Latent — only fires when an agent at
suggest tier hits those specific endpoints. Mitigation options
documented in `docs/security-followups.md`.
## [Unreleased]
### Added
- Initial repo scaffolding
### Added (Plan 1: Foundation)
- LXC provisioning for `void2-db` (Postgres 16 + pgvector) and `void2-app`
- Schema migrations 001-006 covering core, knowledge, resources, agents, cross-cutting, audit
- Repos with capability-checked `actor` parameter and audit trail
- Real audit log with redaction of sensitive keys (token, password, api_key, etc.)
- `pending_changes` table for agent suggestions awaiting owner approval
- Capability check module (allow / suggest / deny) for user vs agent actors
- Owner-token bearer auth
- Express server with `/health` and smoke `/api/spaces`
- Test coverage: 72 tests across migrations, repos, capability, owner middleware, server
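
Audit-log redaction of sensitive keys might look roughly like the sketch below. The substring match, the `[REDACTED]` placeholder, and the function name are assumptions; the key list uses only the three names the entry spells out, and the real list is longer.

```python
SENSITIVE = ("token", "password", "api_key")  # the changelog lists these "etc."


def redact(value):
    """Return a copy with values under sensitive-looking keys masked."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]"
            if any(s in str(key).lower() for s in SENSITIVE)
            else redact(val)
            for key, val in value.items()
        }
    if isinstance(value, list):
        return [redact(item) for item in value]
    return value
```
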