docs: move void-v2 specs + plans into the repo

All Void 2.0 superpowers specs and implementation plans now live at docs/superpowers/{specs,plans}/ inside the repo. Previously they were at /project/docs/superpowers/, which was not under git.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 04:11:32 +10:00
parent 24ce601d94
commit 54ba68a11c
6 changed files with 7334 additions and 0 deletions
--- a/docs/superpowers/specs/2026-05-31-void-v2-design.md
+++ b/docs/superpowers/specs/2026-05-31-void-v2-design.md
@@ -0,0 +1,656 @@
+# Void 2.0 — Homelab Orchestrator & Knowledge Foundation
+
+**Status:** IN PROGRESS — brainstorming, not yet a complete design
+**Started:** 2026-05-31
+**Owner:** mrhynesy@gmail.com
+
+> This document is being filled in section by section as brainstorming progresses.
+> Sections below marked `[locked]` are user-approved decisions. Sections marked
+> `[pending]` are the remaining design work to complete before this becomes a
+> proper spec ready for the writing-plans skill.
+
+---
+
+## Vision [locked]
+
+Replace the current scattered homelab state (Void dashboard, Karakeep bookmarks,
+BookStack wiki, `/root/.claude/plans/*.md`, auto-memory entries, ad-hoc browser
+tab groups) with a single **Void 2.0** — a homelab orchestrator that:
+
+- Acts as the canonical home for projects, tasks, knowledge, and deployed-resource
+  state
+- Ingests websites, videos, PDFs, screenshots, and files into a unified library
+- Mirrors upstream documentation locally for offline + agent access
+- Surfaces all of it to Claude and local AI agents via MCP, with per-agent
+  permission tiers
+- Preserves the Void's Cradle-themed aesthetic and agent personas
+- Stays available during planned host maintenance via `pct migrate`
+  (no automatic failover)
+- Maintains privacy + security with selective remote access
+
+Primary capture pain being solved: **"multiple grouped Chrome tabs as a poor
+project-management substitute."** Void 2.0 makes that proper.
+
+---
+
+## Direction & HA Shape [locked]
+
+**Chosen direction:** Foundation-first Void 2.0 (Option 2 from initial framing).
+Not an evolution of Void — a clean rebuild with Void as the visible UI on top.
+
+**HA model:** Planned-maintenance only. User instructs the stack before host
+shutdown; Proxmox migrates the LXCs to another node in restart mode
+(`pct migrate --restart`, ~10-60s of downtime, since LXC containers cannot be
+live-migrated). No automatic failover, no quorum, no clustering complexity.
+
+**Infrastructure:**
+
+| LXC | Purpose | Stateful? |
+|---|---|---|
+| `void2-db` | Postgres + pgvector | Yes — the canonical store |
+| `void2-app` | Node API + Python workers + Void UI + cron | No (data in `void2-db`) |
+
+Future-improvements list (parked):
+- Build own bookmark capture front-end to replace Karakeep
+- Extract MCP server to its own LXC if it grows independently
+- True clustering if "instant failover" becomes a need
+
+---
+
+## Entity Map [locked]
+
+| Entity | Lives in | Contains / links to |
+|---|---|---|
+| **Space** | top-level | Projects, Tasks, Pages, Refs, SourceDocs, Conversations, Resources |
+| **Project** | a Space | Tasks (children); has-many Pages, Refs, SourceDocs, Conversations, Resources |
+| **Task** | a Space, optionally also a Project | Pages, Refs, Conversations |
+| **Page** (authored) | tagged | backlinks, attachments — your notes + AI-assisted commentary |
+| **Reference** (captured) | tagged | source URL, local snapshot, metadata — websites/videos/PDFs/files/images |
+| **Source Doc** (mirrored upstream) | bound to a Resource | version, last-synced, sync source — official docs from publisher |
+| **Conversation** | attaches to Space/Project/Task/Resource | Messages — first-class, multi-agent |
+| **Resource** (deployed service, rich) | a Space | dependencies, credentials refs, source docs, runbook pages, change history, monitoring config |
+
+**Relationships are explicit, not implied.** Any entity can attach to any other
+via typed links (`project_pages`, `task_refs`, `resource_source_docs`, etc.).
+
+---
+
+## Capture Pipeline [locked — day-one inputs]
+
+Day-one capture inputs:
+1. **URLs / bookmarks** — Karakeep stays as inbox; webhook flows new bookmarks
+   into Void 2.0 as References (with AI-suggested Project/Space tagging)
+2. **YouTube / web videos** — `yt-dlp` for metadata + transcript; local Whisper
+   if no transcript; AI summary + chapters via Ollama
+3. **PDFs / documents** — text extract or Tesseract OCR; AI summary; full text
+   indexed
+4. **Screenshots / images** — Tesseract OCR; AI summary
+5. **Generic files** — blob storage on host; indexed by name + tags
+
+All AI summarization runs against local Ollama (CT 102).
+
+---
+
+## Agent Model [locked]
+
+**Per-agent capability tiers.** Each AI agent (Claude, Mercy, Orthos, Dross,
+Eithan, Lindon, Yerin, Little Blue, future agents) has its own capability record.
+
+- **Default for all agents:** `read` + `suggest`. Agents can search/read
+  anything. Writes are *drafts* in a "pending changes" inbox the user approves.
+- **Promotable per agent:** `write` capability, scoped (e.g., Mercy gets
+  write-on-Pages but not Resources)
+- **Audit log:** every agent action recorded with `agent_id` + timestamp + diff
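The tier rules above can be read as a small routing decision made before any mutation. The following is a minimal sketch of that decision, not the project's implementation: the `Agent` shape, the `write_scopes` field, and the `route_write` helper are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical in-memory view of one agent's capability record."""
    slug: str
    capabilities: set[str] = field(default_factory=lambda: {"read", "suggest"})
    # Entity types the agent may write directly (assumed shape of "scopes").
    write_scopes: set[str] = field(default_factory=set)


def route_write(agent: Agent, entity_type: str) -> str:
    """Return 'apply' for a direct write or 'pending' for a draft awaiting approval."""
    if "write" in agent.capabilities and entity_type in agent.write_scopes:
        return "apply"
    if "suggest" in agent.capabilities:
        return "pending"
    raise PermissionError(f"{agent.slug} may not modify {entity_type}")


mercy = Agent("mercy", {"read", "suggest", "write"}, {"pages"})
print(route_write(mercy, "pages"), route_write(mercy, "resources"))
```

Under these assumptions a promoted agent writes directly only inside its scope, and everything else falls back to the pending-changes inbox.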
+
+MCP surface exposes Void 2.0 to Claude Code, Open WebUI, OpenClaw, and future
+agents through the same interface.
+
+---
+
+## Build Approach [locked]
+
+**Approach A — Greenfield modular monolith.**
+
+- New repo at `/project/src/void-v2`
+- Two processes on `void2-app` LXC:
+  - **`void-server`** (Node) — REST API + MCP + Void UI + cron + light ingest
+    (Karakeep webhook)
+  - **`void-workers`** (Python) — heavy ML ingest: yt-dlp, Whisper, Tesseract,
+    PDF extract, embeddings via Ollama
+- Postgres + pgvector on `void2-db` LXC
+- Copy across from current Void (without inheriting its structure): agent
+  persona files, blackflame theme CSS, Cradle naming, cron task list, schema
+  YAMLs as initial Resource seed data
+- Old Void on CT 301 keeps running until cutover; then archived
+
+---
+
+## Architecture Details [locked]
+
+### Two processes, one job queue, strict boundaries
+
+**`void-server` (Node)** owns: HTTP API, MCP server, Void UI, cron, agent
+runtime, light ingest (Karakeep webhook, manual paste). Internal layout:
+
+```
+lib/
+  db/          Postgres pool, migrations, repos/ (one file per entity)
+  api/         HTTP routes (thin — just call repos)
+  mcp/         MCP server, tool definitions, per-agent capability checks
+  ingest/      Karakeep webhook, manual capture
+  jobs/        Enqueue heavy work for workers (pg-boss client)
+  cron/        Scheduler + one file per task
+  agents/      Cradle persona runtime (Claude subprocess + Ollama via Mastra)
+```
+
+**Boundary rule:** HTTP and MCP both reach data only via `repos/`. No raw SQL in
+routes. Same repos enforce per-agent capability checks. This is what makes any
+later extraction (e.g., MCP as its own service) painless.
+
+**`void-workers` (Python)** owns heavy ML ingest. One worker per kind:
+`video.py` (yt-dlp + Whisper), `pdf.py` (pdftotext / Tesseract), `image.py`
+(Tesseract), `file.py` (blob + indexing), `sourcedoc.py` (mirror upstream docs).
+They poll the job queue, claim work, write results to DB.
+
+### Job queue: pg-boss
+
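+Postgres-backed. pg-boss itself is a Node library, so `void-server` uses it
+directly and the Python workers need a compatible client or direct SQL against
+the same job tables. We don't add Redis/RabbitMQ because the DB is already
+there. Failed jobs retry with backoff, then land in a dead-letter table.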
+
+**Redis rejected** — Postgres-on-local-LXC is sub-millisecond for indexed
+queries; the bottlenecks in Void 2.0 will be Ollama/Whisper/OCR (seconds–minutes),
+not the DB. Adding Redis would buy invisible perf wins at the cost of cache
+invalidation complexity and another LXC to manage. Reconsider only if profiling
+shows a specific bottleneck.
+
+### Caching, if needed
+
+- **In-process LRU** (JS `Map` with size cap) inside `void-server` for hot
+  lookups. Zero ops cost.
+- **Postgres `LISTEN`/`NOTIFY`** for real-time UI updates (transcription progress, etc.)
+  if/when we want them. Built into Postgres — no extra service.
+
+### Cron
+
+Lives only in `void-server` (single process — no leader election needed).
+Light tasks run in-process; heavy tasks enqueue worker jobs.
+
+### Audit log
+
+Append-only. Every mutating call (HTTP, MCP, cron, worker) writes one row:
+`actor_kind`, `actor_id`, `entity_type`, `entity_id`, `action`, `diff`,
+`occurred_at`. Powers: pending-changes inbox for agent drafts, Resource change
+history, "who did what when" forensics.
+
+---
+
+## Schema [locked]
+
+All ids `uuid` (`gen_random_uuid()`). All entities have `created_at` /
+`updated_at`. Vector columns are `vector(1024)` everywhere: embeddings from
+`nomic-embed-text` (768 dims) are padded with zeros, so a later swap to a
+1024-dim model is a re-embed pass, not a migration. Slugs are unique per Space.
+Single implicit user for now; audit columns store `actor_kind` + `actor_id` so
+multi-user is a non-breaking later migration.
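Zero-padding is safe for similarity search because the extra dimensions add nothing to the dot product or to either norm. A minimal sketch of the padding helper follows; `pad_embedding` and `cosine` are hypothetical names, and rejecting over-long vectors (rather than truncating) is an assumption made here.

```python
import math

TARGET_DIMS = 1024  # matches the vector(1024) column width


def pad_embedding(vec, target=TARGET_DIMS):
    """Right-pad an embedding with zeros up to the column width."""
    if len(vec) > target:
        raise ValueError(f"embedding has {len(vec)} dims, column holds {target}")
    return list(vec) + [0.0] * (target - len(vec))


def cosine(a, b):
    """Plain cosine similarity, used only to show that padding is neutral."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


a, b = [1.0, 2.0, 3.0], [3.0, 2.0, 1.0]
# The two printed values agree: padded zeros contribute nothing to the sums.
print(cosine(a, b), cosine(pad_embedding(a, 8), pad_embedding(b, 8)))
```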
+
+### Core entity tables
+
+| Table | Key columns |
+|---|---|
+| `spaces` | slug, name, description, theme |
+| `projects` | space_id, slug, name, status, started_at, completed_at |
+| `tasks` | space_id, project_id (nullable), title, body, status, priority, due_at, position |
+| `pages` | space_id, slug, title, body_md, body_html, parent_id, embedding |
+| `page_revisions` | page_id, body_md, edited_by, created_at |
+| `refs` | space_id, kind (`url\|video\|pdf\|image\|file`), source_url, title, summary, body_text, blob_path, metadata, embedding, source_kind, external_id |
+| `source_docs` | resource_id, name, upstream_url, version, format, sync_source, local_path, last_synced, embedding |
+| `resources` | space_id, slug, name, runtime_type (`lxc\|vm\|docker\|bare-metal`), host, url, version, status, monitoring (jsonb) |
+| `resource_dependencies` | resource_id, depends_on, kind |
+| `resource_credentials` | resource_id, label, vault_path, kind, notes |
+| `conversations` | title, agent_id, participants, summary, embedding |
+| `messages` | conversation_id, role, agent_id, body, metadata |
+| `agents` | slug, name, kind, model, persona_path, capabilities (jsonb), scopes (jsonb) |
+
+### Cross-cutting tables
+
+| Table | Purpose |
+|---|---|
+| `tags` | normalized tag list (name, description, color) |
+| `entity_tags` | (entity_type, entity_id, tag_id) — polymorphic tagging |
+| `entity_links` | (from_type, from_id, to_type, to_id, relation) — any-to-any linkage |
+| `attachments` | (entity_type, entity_id, filename, mime_type, blob_path, checksum) |
+| `audit_log` | append-only mutation history |
+| `pending_changes` | agent draft inbox awaiting approval |
+| `pg-boss` tables | managed by the queue lib |
+
+### Default lifecycle states
+
+- Project: `idea | active | paused | done | abandoned`
+- Task: `todo | doing | blocked | done`
+- Resource: `running | stopped | down | unknown`
+
+(State transitions and automation defined in the Status section, later.)
+
+### Search strategy
+
+- **Full-text** — Postgres `tsvector` + GIN on `pages.body_md`,
+  `refs.title+summary+body_text`, `source_docs.body_text`, `messages.body`.
+  One query, all knowledge types.
+- **Semantic** — pgvector HNSW indexes on `pages.embedding`, `refs.embedding`,
+  `source_docs.embedding`, `conversations.embedding`. Embeddings generated by
+  Ollama at write time, async via worker.
+- **Combined** — search API does FTS + vector in parallel, fuses with
+  reciprocal-rank fusion. Filters by Space, Project, tags, kind.
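Reciprocal-rank fusion scores each result by summing `1 / (k + rank)` over every ranked list it appears in. A minimal sketch, assuming each backend returns an ordered list of entity ids; the `rrf_fuse` name and the constant `k = 60` (a commonly used default) are assumptions, not values taken from this spec.

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked id lists into one list, best first."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


fts_hits = ["a", "b", "c"]      # full-text order
vector_hits = ["b", "d", "a"]   # semantic order
print(rrf_fuse([fts_hits, vector_hits]))
```

Because only ranks are used, the FTS score and the vector distance never need to be normalized against each other.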
+
+### Key design decisions
+
+1. **Polymorphic links over dedicated junction tables** — one `entity_links`
+   table instead of ~20 pairwise junctions. Loses Postgres-enforced FK
+   integrity on polymorphic columns; pays back in flexibility. Periodic
+   integrity-check query catches orphans.
+2. **Audit log is the only mutation history** — no per-entity history tables.
+   Powers pending-changes inbox, Resource change history, and forensics from
+   one mechanism.
+3. **`page_revisions` is the exception** — full markdown snapshots, not diffs.
+   Disk is cheap; debugging a corrupted page from a 12-step diff chain is not.
+4. **JSONB for variable shape** — `metadata` columns on `refs` (kind-specific),
+   `resources` (monitoring config), `agents` (capabilities, scopes). Add fields
+   without migrations.
+
+---
+
+## API Surface [locked]
+
+### REST (Void UI ↔ void-server)
+
+Standard CRUD per entity under `/api/`, JSON in/out, errors as
+`{error: {code, message, details}}`. Pagination via `?limit=&offset=`.
+
+Endpoint groups: spaces, projects, tasks, pages (+ revisions, backlinks),
+refs, source_docs (+ resync), resources (+ dependencies, changes),
+conversations (+ messages), agents, search (unified FTS + vector with RRF),
+tags, links, pending-changes (approve/reject), audit, capture
+(karakeep webhook, manual url, file upload, youtube), jobs (observability).
+
+**Auth:** Bearer token. Single owner token for the Void UI. Per-agent tokens in
+a separate `agent_tokens` table (hashed). Audit log records `actor_kind` +
+`actor_id` on every mutation.
+
+### MCP (AI agents ↔ void-server)
+
+Smaller, task-oriented surface — not full CRUD. Tools enforce per-agent
+capabilities; default-tier agents get writes routed to `pending_changes`.
+
+Initial tools:
+`void.search`, `void.get_entity`, `void.list_projects`, `void.list_tasks`,
+`void.related`, `void.read_conversation`, `void.resource_status`,
+`void.draft_page`, `void.draft_task`, `void.draft_ref`,
+`void.append_journal`, `void.suggest_link`, `void.update_entity`.
+
+**Transport:** both stdio (for a subprocess spawned by Claude Code) and HTTP/SSE
+(for Open WebUI, OpenClaw, remote agents). Same tool definitions, two
+transports. Capability checks happen in tool handlers, which call the same
+`repos/` as REST — one source of truth, two front doors.
+
+---
+
+## Capture Workers [locked]
+
+### Job kinds (one Python module per kind)
+
+`ingest.karakeep`, `ingest.url`, `ingest.youtube`, `ingest.video`,
+`ingest.pdf`, `ingest.image`, `ingest.file`, `sync.source_doc`, `embed.text`,
+`summarize.conversation`.
+
+### Job lifecycle
+
+```
+queued → claimed → running → done
+                 ↘ failed → retry (exp backoff: 10s, 60s, 5m) → dead-letter
+```
+
+Workers atomically claim via pg-boss, validate input, check idempotency,
+do work, write results in a transaction (entity row + audit log + downstream
+enqueues), mark done. Transient errors retry; permanent errors dead-letter
+immediately.
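One way to read the lifecycle above is as a pure function from "how many times has this job failed" to "what happens next". This is a sketch under that reading: `PermanentError`, `next_step`, and the interpretation of the schedule as three retries before dead-lettering are assumptions, and pg-boss's own retry settings are not modeled here.

```python
BACKOFF_SECONDS = (10, 60, 300)  # 10s, 60s, 5m from the lifecycle diagram


class PermanentError(Exception):
    """Hypothetical marker for errors that should never be retried."""


def next_step(failed_attempts, error):
    """Decide what follows a failure.

    failed_attempts counts failures so far, including the current one.
    Returns ("retry", delay_seconds) or ("dead-letter", None).
    """
    if isinstance(error, PermanentError):
        return ("dead-letter", None)
    if failed_attempts <= len(BACKOFF_SECONDS):
        return ("retry", BACKOFF_SECONDS[failed_attempts - 1])
    return ("dead-letter", None)


for n in range(1, 5):
    print(n, next_step(n, TimeoutError("ollama unreachable")))
```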
+
+### Idempotency
+
+Every job carries `idempotency_key`. For URL/Karakeep ingest:
+`key = sha256(source_url + space_id)`. If a successful job with that key
+exists, no-op.
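The key derivation for URL and Karakeep ingest can be sketched directly from the formula above. The function name is hypothetical, and UTF-8 encoding with a hex digest is an assumption about details the spec leaves open.

```python
import hashlib


def idempotency_key(source_url: str, space_id: str) -> str:
    """sha256(source_url + space_id), hex-encoded."""
    return hashlib.sha256((source_url + space_id).encode("utf-8")).hexdigest()


key = idempotency_key(
    "https://example.com/article",
    "00000000-0000-0000-0000-000000000001",
)
print(key)
```

The same URL captured into two different Spaces yields two keys, so each Space gets its own Reference while replays within one Space remain no-ops.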
+
+### Concurrency (per-kind queues)
+
+| Kind | Limit | Reason |
+|---|---|---|
+| `ingest.youtube`, `ingest.video` | **1** | Whisper GPU-bound on A2000 6GB |
+| `ingest.pdf`, `ingest.image` | 2 | Tesseract CPU-bound |
+| `ingest.url`, `ingest.karakeep`, `ingest.file` | 4 | Network/disk-bound |
+| `sync.source_doc` | 1 | One source at a time; don't hammer upstream |
+| `embed.text`, `summarize.conversation` | 2 | Ollama-bound |
+
+### Blob storage
+
+Content-addressed on local disk: `/var/lib/void/blobs/<sha-prefix>/<sha>`.
+Deduplicates identical files. ZFS dataset replicated to Leonardo via existing
+syncoid daily. MinIO is a future option, not day-one.
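A content-addressed layout means the storage path is a function of the bytes alone, which is where the deduplication comes from. A minimal sketch, assuming SHA-256 as the hash and a two-character prefix directory; the spec only says `<sha-prefix>/<sha>`, so both choices and the helper names are assumptions.

```python
import hashlib
from pathlib import Path

BLOB_ROOT = Path("/var/lib/void/blobs")
PREFIX_LEN = 2  # assumed length of <sha-prefix>


def blob_path(data: bytes, root: Path = BLOB_ROOT) -> Path:
    """Compute where a blob with this content lives."""
    digest = hashlib.sha256(data).hexdigest()
    return root / digest[:PREFIX_LEN] / digest


def store_blob(data: bytes, root: Path = BLOB_ROOT) -> Path:
    """Write the blob if it is new; identical content maps to the same file."""
    path = blob_path(data, root)
    if not path.exists():
        path.parent.mkdir(parents=True, exist_ok=True)
        tmp = path.with_suffix(".tmp")
        tmp.write_bytes(data)
        tmp.replace(path)  # rename into place so readers never see partial files
    return path
```

Calling `store_blob` twice with the same bytes returns the same path and writes only once.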
+
+### Dead-letter & monitoring
+
+pg-boss managed dead-letter table. Void UI "Jobs" panel shows pending,
+running, recent completions, dead-letter with retry/delete actions.
+
+### Downstream chaining
+
+Finished jobs enqueue more jobs in the same transaction (e.g., source doc
+sync → embed each chunk). Keeps everything resumable: if Ollama is down,
+the entity saves without embedding, embed retries later.
+
+---
+
+## UI / Orchestrator Shape [locked]
+
+### Shell
+
+Three columns, Cradle aesthetic preserved (blackflame palette, Cradle naming).
+
+- **Sidebar:** Spaces tree on top (collapsible, drag-to-reorder); global views
+  below — Sacred Valley, Agents, Inbox (pending changes with count), Resources
+  cross-space, full Search
+- **Main pane:** context-dependent view (Space, Project, Page editor, Reference
+  detail, Resource detail, Search, Sacred Valley, Inbox, Conversation)
+- **Right rail:** always-visible context-aware chat companion, collapsible to
+  slim tab. Agent scoped to current view; per-Space default agent. Drag-handle
+  to resize.
+- **Top bar:** universal capture button (paste/drop → AI suggests Space+Project
+  → confirm), global search, pending-changes bell with count, user/agent toggle
+
+### Views (main pane)
+
+| View | Purpose |
+|---|---|
+| Space | Overview of projects, tasks, refs, pages, resources in that space |
+| Project | Header (status/dates), Tasks, References, Pages, Conversations, Resources |
+| Page editor | Markdown editor with split preview, FTS in-page, attach upload |
+| Reference detail | Media preview + AI summary + metadata + tags + linked-from |
+| Resource detail | Health header + dependencies graph + Source Docs + runbook Pages + change history |
+| Search | Unified FTS + vector results, grouped by type, sidebar filters |
+| Sacred Valley | Current gridstack dashboard, preserved (weather, speedtest, host-perf, briefings, service health) |
+| Inbox | Pending changes grouped by agent, with diff viewer + approve/reject |
+| Conversation | Full-window chat when right-rail isn't enough |
+
+### Defaults
+
+- **Landing page:** last-viewed Space, falling back to a "Home" overview of
+  recent activity across all Spaces
+- **Sacred Valley:** kept as a named sidebar view (not the default homepage)
+- **Right-rail chat:** always visible, context-aware, collapsible
+- **Capture button:** paste-anything modal → AI infers kind (URL/file/text)
+  → suggests Space+Project from content + tags → user confirms or overrides
+
+### Pending Changes Inbox
+
+Items grouped by agent. Each shows entity-type icon + agent's reason + diff
+viewer + approve/reject. Approving runs the mutation through the same repo as
+a direct write would (single code path).
+
+---
+
+## Security & Auth [locked]
+
+### Authentication layers
+
+| Layer | Mechanism | Scope |
+|---|---|---|
+| Owner via browser/mobile | Cloudflare Access (Google IDP, restricted email) → CF Tunnel → Void 2.0 | Full owner |
+| AI agents via MCP | Bearer tokens, bcrypt-hashed in `agent_tokens`. Scoped by `agents.capabilities + scopes` | Per-agent tiered |
+| void2-app → void2-db | Dedicated Postgres user, limited grants, LAN-only | Service account |
+| void2-app → Ollama | LAN, no auth | LAN-only |
+
+### Remote-access boundary
+
+| Surface | Reachable how | Behind CF Access? |
+|---|---|---|
+| `void.hynesy.com` (UI) | CF Tunnel | Yes — Google auth, your email |
+| `mcp.void.hynesy.com` (MCP HTTP/SSE for remote agents) | CF Tunnel | Yes — CF Access Service Tokens |
+| Internal MCP (Claude Code, Open WebUI on CT 103) | Direct LAN | No — local |
+| Postgres | LAN-only, firewalled | n/a |
+
+### Secrets handling
+
+- Bootstrap secrets in `.env` files on each LXC, `chmod 600`, owned by service user
+- `resource_credentials.vault_path` is a *pointer string* (`env:NAME`,
+  `file:/path`, or future `vault:id`). Void 2.0 resolver reads from env or file.
+  Schema unchanged if/when we swap to Vaultwarden — only the resolver changes.
+- Agent tokens shown plaintext **once** at creation, then bcrypt-hashed.
+- No secrets in audit log (per-entity redaction before write).
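The pointer-string design can be sketched as a resolver that dispatches on the scheme before the first colon. `resolve_secret` is a hypothetical name, and stripping a trailing newline from file-backed secrets is an assumption; the `vault:` branch is left unimplemented because the spec defers it.

```python
import os
from pathlib import Path


def resolve_secret(vault_path: str) -> str:
    """Resolve a pointer such as 'env:NAME' or 'file:/path' to its secret value."""
    scheme, sep, rest = vault_path.partition(":")
    if not sep or not rest:
        raise ValueError(f"malformed vault_path: {vault_path!r}")
    if scheme == "env":
        return os.environ[rest]
    if scheme == "file":
        return Path(rest).read_text(encoding="utf-8").rstrip("\n")
    if scheme == "vault":
        # Placeholder for the deferred Vaultwarden backend.
        raise NotImplementedError("vault: pointers are not supported yet")
    raise ValueError(f"unknown vault_path scheme: {scheme!r}")
```

Swapping in a secrets store later would only add a branch here; the `vault_path` column keeps holding pointer strings.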
+
+### Privacy posture
+
+- All AI inference local by default (Ollama on CT 102)
+- Claude API calls cross to Anthropic — documented egress channel; PII flagging
+  not in v1
+- Audit log retains every mutation for forensics
+
+### Backup posture
+
+- ZFS daily syncoid replication of `void2-db` + blob datasets to Leonardo
+- Postgres `pg_dump` cron daily (restore-test friendly, independent of ZFS)
+- Encrypted ZFS datasets for any off-site replica targets later (Farm)
+
+### Out of scope (v1)
+
+mTLS between internal services, field-level encryption in DB, HSMs, PII
+detection before LLM egress.
+
+---
+
+## Future Improvements (deferred)
+
+These are intentionally **not** day-one work. Tracked so they don't get
+forgotten:
+
+- **Vaultwarden secrets store** — user explicitly asked to be reminded. Day-one
+  resolver was designed so this is a swap, not a schema change. See
+  [auto-memory: project_void_v2_vaultwarden_followup].
+- **Own bookmark capture front-end** to replace Karakeep
+- **MinIO** for blob storage (S3-compatible access from elsewhere)
+- **Extract MCP** to its own LXC if it grows independently
+- **True clustering / instant failover** (Patroni) if zero-downtime maintenance becomes needed
+- **PII detection** before Anthropic API egress
+- **Mobile-optimized capture flow** (PWA install, share-target intent on Android)
+- **Local STT** (Whisper) for voice notes as a capture kind
+- **RSS / email** ingest
+
+---
+
+## Naming & Versioning [locked]
+
+This project is **Void 2.0** — a full remaster of the existing Void
+(retroactively "Void 1.x") with the same Cradle aesthetic, expanded into a
+homelab orchestrator + canonical knowledge store. "Codex" is **not** a name —
+just a way we referenced the data-layer concept during brainstorming. There
+is no `Codex` brand or module; the data layer is `lib/db/` (including `lib/db/repos/`)
+inside `void-server`.
+
+### Repo / process / LXC naming
+
+- **Repo:** `/project/src/void-v2`
+- **Processes:** `void-server` (Node), `void-workers` (Python)
+- **LXCs during cutover:** `void2-db`, `void2-app` (the `2` suffix avoids
+  clashing with current CT 301 `void`). After CT 301 retirement: rename to
+  plain `void-db`, `void-app`.
+- **Domains:** `void.hynesy.com` (UI), `mcp.void.hynesy.com` (MCP HTTP/SSE)
+- **MCP tool prefix:** `void.search`, `void.draft_page`, etc.
+
+### Version strategy
+
+Semver: `MAJOR.MINOR.PATCH`.
+- **2.0.0** — initial Void 2.0 release after Void 1.x retirement
+- Minor bumps for added features, patch bumps for fixes
+- Major bumps reserved for architecture/schema changes that require migrations
+
+### CHANGELOG
+
+`CHANGELOG.md` at the root of `/project/src/void-v2`, following the [Keep a
+Changelog](https://keepachangelog.com) convention. Entry for **2.0.0**
+captures the differences from Void 1.x at a high level (architecture, schema,
+capture pipeline, agent model, naming). Subsequent releases get their own
+sections. Each entry: Added / Changed / Deprecated / Removed / Fixed.
+
+A separate `docs/VERSION_HISTORY.md` carries the **narrative** version
+history — when each release happened, the headline thinking behind it,
+deferred items rolled in, lessons. Lives alongside the design spec for
+long-term archaeology. Each `MAJOR.x.x` release gets a section.
+
+---
+
+## Migration / Cutover Plan [locked]
+
+### Existing data inventory
+
+| Source | Location | Volume | Maps to |
+|---|---|---|---|
+| Void 1.x SQLite | CT 301 | wiki_pages (~25), messages, projects, conversations | Void 2.0 `pages`, `messages` (grouped into `conversations`), `projects` |
+| BookStack | CT 104 MariaDB | ~17+ pages, hierarchy | `pages` (parent_id preserved); dedupe vs already-imported wiki_pages |
+| Karakeep | CT 100 | bookmarks + AI summaries + tags | `refs` (kind=url), `external_id` = karakeep id |
+| `/root/.claude/plans/*.md` | filesystem | 5 plan files | `pages` under each plan's Project |
+| Void 1.x agent personas | `/project/src/void/characters/` | 7 agents × 3 files | `agents.persona_path` |
+| Void 1.x schema YAMLs | `/project/src/void/schemas/` | 11 services | `resources` seed data + `resources.monitoring` jsonb |
+| Void 1.x code (theme, cron logic) | source | selective | Reused inside `void-server` |
+| Auto-memory entries | `/root/.claude/projects/-project/memory/*.md` | ~30 entries | **Mirrored** — see below |
+
+### Migration script structure
+
+Python migration tool in `void-workers/migrate/` with sub-commands:
+
+```
+void-migrate bookstack      --source-db <conn>
+void-migrate karakeep       --source-db <conn>
+void-migrate void1-sqlite   --source-db <path>
+void-migrate plans          --source-dir /root/.claude/plans/
+void-migrate memory         --source-dir /root/.claude/projects/-project/memory/
+void-migrate void1-schemas  --source-dir /project/src/void/schemas/
+void-migrate void1-personas --source-dir /project/src/void/characters/
+```
+
+Each command is **idempotent** — uses source IDs / file paths as `external_id`
+so re-runs upsert rather than duplicate.
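The idempotent re-run behaviour amounts to an upsert keyed on `external_id`. The sketch below uses the standard-library `sqlite3` module purely as a stand-in so it runs anywhere; the real target is Postgres, and the three-column `pages` table here is a simplified, hypothetical shape.

```python
import sqlite3


def upsert_page(conn, external_id, title, body_md):
    """Insert a page, or update it in place if external_id already exists."""
    conn.execute(
        """
        INSERT INTO pages (external_id, title, body_md)
        VALUES (?, ?, ?)
        ON CONFLICT (external_id) DO UPDATE
        SET title = excluded.title, body_md = excluded.body_md
        """,
        (external_id, title, body_md),
    )


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pages (external_id TEXT PRIMARY KEY, title TEXT, body_md TEXT)"
)
upsert_page(conn, "plans/example.md", "Example plan", "first import")
upsert_page(conn, "plans/example.md", "Example plan", "second import")
print(conn.execute("SELECT COUNT(*), MAX(body_md) FROM pages").fetchone())
```

Postgres accepts the same `INSERT ... ON CONFLICT ... DO UPDATE` form, so the pattern carries over with a different driver and placeholder style.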
+
+### Auto-memory: one-way mirror (files stay primary)
+
+Auto-memory files remain the source-of-truth — Claude Code's harness reads them
+directly across sessions. A worker mirrors them into Void 2.0 as Pages under a
+"Memory" Space:
+
+- Mirror runs on file change (inotify) and nightly as safety net
+- Pages get `external_id = file path`, idempotent upsert
+- Edits in Void 2.0 UI flow back to files via a `::memory-update` marker
+  (same pattern Path B established)
+- Auto-memory remains canonical; Void 2.0 view is searchable, MCP-readable,
+  visible in the UI
+
+### Cutover: stand up alongside, big-bang switch with grace period
+
+1. Build Void 2.0 on new LXCs (`void2-db`, `void2-app`) without touching CT 301
+2. Run migration scripts (read-only access to BookStack + Karakeep + Void 1.x DBs)
+3. Verify counts + spot-check content
+4. **Cutover day:** swap `void.hynesy.com` CF tunnel target from CT 301 to
+   `void2-app`
+5. **Grace period (30 days):** CT 301 stays read-only as fallback
+6. **Retire CT 301:** snapshot, stop, rename `void2-*` LXCs to `void-*`
+
+### Cron / scheduled task migration
+
+Existing Void 1.x cron (Dross briefing, Yerin alerts, Little Blue heal, hourly
+speedtest, Orthos council) ports directly to `void-server/lib/cron/tasks/`.
+Same logic, same timing, against Void 2.0's data.
+
+---
+
+## Testing Approach [locked]
+
+| Layer | Coverage | How |
+|---|---|---|
+| Unit | Repos, capability checks, helpers (slug gen, idempotency keys, embedding pad/truncate) | Node: vitest. Python: pytest. |
+| Integration | REST + MCP tools against a test DB | Postgres-in-docker; schema applied from migrations; reset per test |
+| E2E | Happy paths: create Space/Project, capture URL, search, approve pending change, attach ref | Playwright against running test instance |
+| Manual (runbook'd) | Capture workers (Whisper, OCR), agent runtime (Claude subprocess + Ollama), CF Access flows | `docs/testing/manual.md` — too heavy or external for CI |
+| Migration scripts | All `void-migrate` sub-commands | Fixture DBs for BookStack + Void 1.x + Karakeep; assert counts + spot-check content |
+
+**Coverage target:** ~70% on `lib/` modules. Lower on routes/UI — covered by
+integration + E2E instead. No coverage chasing.
+
+**CI:** GitHub Actions if you mirror to a remote; local pre-push hook otherwise.
+Runs unit + integration on every change to `void-server` or `void-workers`.
+
+---
+
+## Status / Lifecycle Model [locked]
+
+| Entity | States | Transitions | Automation |
+|---|---|---|---|
+| Project | `idea`, `active`, `paused`, `done`, `abandoned` | Free (any-to-any) | None; manual |
+| Task | `todo`, `doing`, `blocked`, `done` | Free | `done` sets `completed_at` |
+| Resource | `running`, `stopped`, `down`, `unknown` | Auto + manual override | Health check cron updates; manual override pins until `maintenance_until` |
+| Conversation | `open`, `summarized`, `archived` | Auto with overrides | `summarize.conversation` worker moves to `summarized` after 24h idle |
+| Reference | `ingested`, `indexed`, `enriched` | Worker-driven | Pipeline: capture → FTS indexed → embedded + AI summary done |
+| Pending Change | `pending`, `approved`, `rejected` | User-driven | None |
+
+**Free transitions** everywhere user-facing. Homelab work is rarely linear; the
+audit log captures every transition.
+
+**Resource status reconciliation:** health check cron writes `status` and
+`last_check`. Manual override during planned maintenance pins state until a
+`maintenance_until` timestamp.
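The pinning rule can be sketched as a small function the health-check cron would call before writing `status`. The function name and argument order are hypothetical; only `status` and `maintenance_until` come from the spec.

```python
from datetime import datetime, timezone


def reconcile_status(current, checked, maintenance_until, now):
    """Pick the status to store after a health check.

    current: status already stored (possibly a manual override)
    checked: status the health check just observed
    maintenance_until: datetime until which the override is pinned, or None
    """
    if maintenance_until is not None and now < maintenance_until:
        return current  # still inside the maintenance window, keep the pin
    return checked


now = datetime(2026, 6, 1, 4, 0, tzinfo=timezone.utc)
window_end = datetime(2026, 6, 1, 6, 0, tzinfo=timezone.utc)
print(reconcile_status("stopped", "down", window_end, now))
```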
+
+---
+
+## Pending Sections — to complete before this is plan-ready
+
+(All sections locked. Spec ready for user review.)
+
+---
+
+## Decision Log
+
+| Date | Decision | Why |
+|---|---|---|
+| 2026-05-30 | Foundation-first Void 2.0 over evolve-Void | Long-term HA requirement makes single-LXC SQLite a dead end |
+| 2026-05-30 | 2 LXCs, planned-migration HA | User confirmed instant failover not needed |
+| 2026-05-30 | Postgres + pgvector (no separate Qdrant) | Simpler — one DB does relational + vector |
+| 2026-05-30 | Three-tier Space → Project → Task with sibling tasks | Matches how user organizes; allows ad-hoc TODOs |
+| 2026-05-30 | Pages + References + Source Docs as three knowledge types | Authored vs captured vs upstream-mirrored are genuinely different |
+| 2026-05-30 | Conversations first-class, attach to other entities | "Create project from chat" + AI needs prior conversation context |
+| 2026-05-30 | Rich Resource entity (dependencies, creds refs, change history) | User wants real orchestrator, not just inventory |
+| 2026-05-30 | Keep Karakeep as bookmark inbox; webhook into Void 2.0 | Karakeep works; building own is a deferred improvement |
+| 2026-05-30 | Day-one capture: URLs, videos, PDFs, images, files | Full pipeline, no half-measures |
+| 2026-05-30 | Agents: read+suggest default, per-agent tiered promotion | Balance usefulness with safety |
+| 2026-05-30 | Greenfield Void 2.0 (Approach A), copy valuable bits from Void | Clean break from accumulated Void shape |
+| 2026-05-31 | Two-process layout (Node server + Python workers) on one LXC | Right-tool-per-job; Python for ML, Node for API/UI/cron |
+| 2026-05-31 | pg-boss job queue (not Redis/RabbitMQ) | Postgres is already there; one fewer service |
+| 2026-05-31 | Skip Redis cache | DB isn't the bottleneck; Ollama/Whisper/OCR are. Reconsider only if profiling shows it. |
+| 2026-05-31 | Audit log is append-only, polymorphic | One mechanism for change history + agent action tracking + pending-changes inbox |
+| 2026-05-31 | `vector(1024)` everywhere with zero-padding for 768-dim embeds | Model swap is a re-embed pass, not a DDL migration |
+| 2026-05-31 | Polymorphic `entity_links` over ~20 pairwise junction tables | Flexibility wins at this scale; periodic integrity check covers FK gap |
+| 2026-05-31 | Single implicit user; audit columns ready for multi-user later | Multi-user is a non-breaking migration if ever needed |
+| 2026-05-31 | MCP exposes task-oriented tools, not raw CRUD | Smaller surface for agents = safer + clearer semantics |
+| 2026-05-31 | MCP supports both stdio + HTTP/SSE | Covers Claude Code (stdio) and network agents (HTTP) without bridges |
+| 2026-05-31 | pg-boss with per-kind concurrency limits | GPU/CPU/network workloads have different parallelism needs |
+| 2026-05-31 | Idempotency keys on all ingest jobs | Webhook replays + manual retries shouldn't duplicate content |
+| 2026-05-31 | Content-addressed blob store; ZFS replicated via syncoid | Free dedup + your existing replication covers it |
+| 2026-05-31 | Whisper concurrency stays at 1 | Conservative; tune after deploy if A2000 has headroom |
+| 2026-05-31 | Three-column shell (sidebar / main / right-rail chat) | Matches orchestrator + chat-with-context workflow |
+| 2026-05-31 | Sacred Valley kept as sidebar view, not landing page | Frees landing for last-viewed Space; dashboard still one click away |
+| 2026-05-31 | Right-rail chat always visible, context-aware | Friction-free 'ask Mercy about this' across all views |
+| 2026-05-31 | Universal capture button with AI Space/Project suggestion | One capture surface for all content kinds; reduces friction over per-page add-ref |
+| 2026-05-31 | CF Access on UI + MCP-HTTP; LAN-direct for internal agents | Matches owner-via-internet + agent-on-LAN access patterns |
+| 2026-05-31 | Env+file vault_path resolver day-one; Vaultwarden swap later | Pragmatic start; resolver swap doesn't change schema |
+| 2026-05-31 | Agent tokens bcrypt-hashed, plaintext shown once | Standard bearer-token hygiene |
+| 2026-05-31 | mTLS / field-level encryption deferred from v1 | Single-trust-domain LAN homelab; ZFS-at-rest covers it for now |
+| 2026-05-31 | Renamed from "Codex" to **Void 2.0** | Preserve Cradle aesthetic + naming continuity from Void 1.x |
+| 2026-05-31 | CHANGELOG.md (Keep a Changelog) + VERSION_HISTORY.md (narrative) | User wants major-version comparison + readable narrative archaeology |
+| 2026-05-31 | Auto-memory: one-way mirror, files stay primary | Harness keeps working; knowledge stays unified |
+| 2026-05-31 | Big-bang cutover with 30-day grace period on CT 301 | Minimal complexity; safety net against forgotten data |
+| 2026-05-31 | Free state transitions; audit log records every change | Homelab work is rarely linear; don't over-validate |
+| 2026-05-31 | Test coverage target ~70% on lib/, manual runbook for ML/agent flows | Where automation cost exceeds value, document instead |
--- a/docs/superpowers/specs/2026-06-01-void-v2-plan3-capture.md
+++ b/docs/superpowers/specs/2026-06-01-void-v2-plan3-capture.md
@@ -0,0 +1,295 @@
+# Void 2.0 — Plan 3 Design Spec: Capture pipeline + hybrid search
+
+**Date:** 2026-06-01
+**Builds on:** Plan 1 (Foundation, complete) and Plan 2 (API + UI shell, complete, version 2.0.0-alpha.2).
+**Master spec:** `docs/superpowers/specs/2026-05-31-void-v2-design.md` — many decisions inherit from there.
+
+## Goal
+
+Wire the Plan 2 SPA's stub Capture button to a real ingest pipeline. Add a pg-boss-backed job queue, capture entry points (URL POST + Karakeep webhook + drag-drop attachment), a URL worker that turns links into `refs`, an embeddings worker that writes vectors into the existing `embedding` columns, and a hybrid FTS+vector search that replaces the Plan 2 FTS-only `/api/search`.
+
+## Out of scope (Plan 4 and later)
+
+- Whisper transcription, Tesseract OCR, yt-dlp video ingestion, scanned-PDF OCR.
+- The Python `void-workers` service. Plan 3 stays single-process Node.
+- AI Space/Project suggestion on capture (defer; capture takes explicit `space_id`).
+- Embedding chunks table — Plan 3 uses one whole-doc embedding per entity row; chunks land later once we can measure recall on a real corpus.
+- MCP server surface. Plan 5+.
+
+## Decisions locked by brainstorm
+
+| Question | Answer |
+|---|---|
+| Plan 3 slice | Node-side: pg-boss + `/api/capture` POST + Karakeep webhook + URL worker + embed.text worker + hybrid search + Jobs panel. Defers ML-heavy ingest to Plan 4. |
+| Capture entry points | `/api/capture` POST + Karakeep webhook + drag-drop upload. Inbound email skipped. |
+| Embedding granularity | Whole-doc per entity row. Add chunks table later. |
+| Search rollout | `/api/search` replaced in-place with hybrid (FTS + vector via RRF). Vector branch graceful-degrades to FTS-only if Ollama is down or the row lacks an embedding. |
+| AI Space/Project suggestion | Deferred. Capture requires `space_id`. SPA preselects the user's last-used space from `localStorage`. |
+| Jobs visibility | `GET /api/jobs?state=` + `POST /api/jobs/:id/retry` + `DELETE /api/jobs/:id` + a minimal `#/jobs` SPA view (table grouped by state, 10 s polling, retry/delete per row). |
+| Sequencing | Phase A → B → C → D (matches Plan 2 phasing). Each phase ends green and demoable. |
+
+## Architecture
+
+```
+                     ┌──────────────────────────────────────────┐
+                     │  void-server  (CT 311, Node, single proc)│
+                     │                                          │
+   /api/capture ───▶ │  routes/capture.js                       │
+   /api/ingest/      │  routes/ingest.js (Karakeep webhook)     │
+     karakeep ─────▶ │      │                                   │
+   drag-drop  ─────▶ │      ▼                                   │
+                     │  jobs/queue.js (pg-boss client)          │
+                     │      │                                   │
+                     │      ▼                                   │
+                     │  workers/  (in-process pollers)          │
+                     │   ├─ url.js                              │
+                     │   ├─ karakeep.js                         │
+                     │   ├─ embed.js   (Ollama HTTP)            │
+                     │   └─ blob.js    (drag-drop attachments)  │
+                     │      │                                   │
+                     │      ▼                                   │
+                     │  lib/db/repos/ (existing) + repos/jobs.js│
+                     │      │                                   │
+                     └──────┼───────────────────────────────────┘
+                            │
+              ┌─────────────┼──────────────┐
+              ▼             ▼              ▼
+       ┌──────────┐  ┌──────────────┐  ┌──────────────┐
+       │ Postgres │  │  Ollama      │  │ Blob FS      │
+       │ (CT 310, │  │  (CT 102,    │  │ /var/lib/    │
+       │ pgvector │  │   nomic-     │  │  void/blobs/ │
+       │ + pgboss │  │   embed-text)│  │              │
+       │ tables)  │  └──────────────┘  └──────────────┘
+       └──────────┘
+```
+
+**Process model.** Workers and HTTP handlers share the void-server Node process. pg-boss polls Postgres on its own interval; HTTP requests enqueue jobs and return immediately with a `job_id`. No separate worker process — that's Plan 4 when the Python service arrives.
+
+**External dependencies.** Postgres (already there), Ollama on CT 102 at `http://192.168.1.185:11434` (running, `nomic-embed-text` pulled, 768-dim embeddings verified 2026-06-01). Graceful-degrade still applies if it goes down later. Blob storage is local FS on CT 311's root pool, content-addressed.
+
+**No new entity tables.** refs / pages / source_docs / attachments are reused. The `embedding vector(1024)` columns exist from Plan 1 (migration 002 + 004). pg-boss creates its own schema (`pgboss.*`) on first run.
+
+## Phase A — Queue + worker harness + Jobs API
+
+**New files:**
+- `lib/jobs/queue.js` — singleton pg-boss client; `start()`, `enqueue(name, data, opts)`, `subscribe(name, handler, opts)`.
+- `lib/jobs/index.js` — registers all worker handlers on start; called from `server.js` boot.
+- `lib/jobs/workers/echo.js` — trivial worker used to prove the harness. Removed at end of Phase D.
+- `lib/api/routes/jobs.js` — `GET /api/jobs?state=`, `GET /api/jobs/:id`, `POST /api/jobs/:id/retry`, `DELETE /api/jobs/:id`. Owner-only.
+- `tests/jobs/queue.test.js` — pg-boss roundtrip: enqueue → handler runs → result.
+- `tests/api/jobs.test.js` — list/retry/delete via HTTP.
+
+**Modify:**
+- `server.js` — call `jobs.start()` on boot, `jobs.shutdown()` on SIGTERM.
+- `package.json` — add `pg-boss@^10`.
+- `lib/api/index.js` — mount `/api/jobs`.
+- `public/router.js` + `public/app.js` + add `public/views/jobs.js` — minimal Jobs view (placeholder for now; fleshed in Phase D).
+
+**pg-boss config.** One pg-boss instance per process. Uses the existing `DATABASE_URL`. Default schema name (`pgboss`). `newJobCheckIntervalSeconds: 2` (alpha-tier; tighten later if needed). `archiveCompletedAfterSeconds: 86_400` (archive after 1 day). `deleteAfterDays: 7`.
+
+**Concurrency limits** per the master spec, surfaced via `subscribe(name, handler, {teamSize, teamConcurrency})`:
+
+| Worker name | Team size | Reason |
+|---|---|---|
+| `ingest.url` | 4 | Network-bound |
+| `ingest.karakeep` | 4 | Network-bound |
+| `ingest.blob` | 2 | Disk + sha256 hashing |
+| `embed.text` | 2 | Ollama-bound (single GPU on CT 102) |
+
+**Retry policy.** Per-worker `retryLimit: 5`, `retryBackoff: true`, `retryDelay: 10` (seconds). Effective backoff sequence: 10 s, 20 s, 40 s, 80 s, 160 s (about 5 min cumulative), then dead-letter. The master spec called out 10 s / 60 s / 5 m, but pg-boss only exposes exponential backoff with a base delay; the resulting curve is close enough.
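+
+The nominal schedule above can be reproduced with a small sketch. It assumes plain doubling from the base delay with no jitter (the queue may add some, so treat the values as approximate); `backoffSchedule` is a hypothetical helper, not part of pg-boss.
+
+```javascript
+// Sketch only: nominal exponential backoff, doubling from a base delay.
+// Assumption: no jitter is applied, so real delays may differ slightly.
+function backoffSchedule(retryDelaySeconds, retryLimit) {
+  const delays = [];
+  for (let attempt = 0; attempt < retryLimit; attempt++) {
+    delays.push(retryDelaySeconds * 2 ** attempt);
+  }
+  return delays;
+}
+
+const delays = backoffSchedule(10, 5);
+const total = delays.reduce((sum, d) => sum + d, 0);
+console.log(delays, total);
+```
+
+With `retryDelay: 10` and `retryLimit: 5` the delays sum to 310 s, so a job that keeps failing dead-letters after roughly 5 minutes.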
+
+**Dead-letter.** pg-boss's archive table (`pgboss.archive`) keeps failed jobs. `/api/jobs?state=failed` queries it. Manual retry copies to active.
+
+**Commit:** `feat(jobs): pg-boss harness + Jobs API`.
+
+## Phase B — Capture API + URL worker + blob storage
+
+**Capture POST.** `POST /api/capture` (owner or agent with write tier):
+
+```json
+{
+  "space_id": "uuid",
+  "url": "https://example.com/article",
+  "hint": { "project_id": "uuid?", "title": "string?", "tags": ["string"] }
+}
+```
+
+Response 202 with `{ job_id, idempotency_key, ref_id?: uuid }`. Idempotency key is `sha256(space_id + url)`. If a ref already exists for that key, the response carries the existing `ref_id` and `job_id: null` (no new job enqueued).
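+
+A minimal sketch of the key derivation, assuming plain string concatenation of `space_id` and `url` and a hex digest. `idempotencyKey` is a hypothetical helper name, not an existing function in the repo.
+
+```javascript
+const { createHash } = require('node:crypto');
+
+// Hypothetical helper: hex sha256 over space_id + url, concatenated as-is.
+function idempotencyKey(spaceId, url) {
+  return createHash('sha256').update(spaceId + url, 'utf8').digest('hex');
+}
+
+const key = idempotencyKey(
+  '00000000-0000-0000-0000-000000000001', // placeholder space_id
+  'https://example.com/article'
+);
+console.log(key.length); // 64 hex characters
+```
+
+Because the URL is hashed verbatim, `https://example.com/a` and `https://example.com/a/` produce different keys. Whether to normalize URLs before hashing is not decided here.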
+
+**URL worker.** `lib/jobs/workers/url.js` for `ingest.url`:
+
+1. Compute idempotency key. If a `refs` row already exists with `source_kind='url'` and `external_id=<key>`, return its id.
+2. `fetch(url)` with `User-Agent: void-ingest/2.0` and 15 s timeout.
+3. Run readability extraction (npm `@mozilla/readability` + `jsdom`). Pull `title`, `byline`, `excerpt`, `textContent`, `siteName`.
+4. Insert a `refs` row: `kind='url'`, `source_url=url`, `title=readability.title`, `summary=readability.excerpt`, `body_text=readability.textContent` (truncate to 200 kB), `source_kind='url'`, `external_id=<idempotency_key>`, `metadata={ site_name, byline, content_length }`.
+5. Return the ref. Embedding is handled by Phase C's repo-level trigger that wraps `refs.create`; in Phase B alone the ref simply lacks an embedding until Phase C ships.
+
+**Drag-drop.** `POST /api/capture/upload` (multipart, owner or agent write):
+
+- Field `file` — the binary.
+- Field `space_id` — required.
+- Field `meta` (json) — optional `{ title, kind, tags }`.
+
+Multer stages uploads in `/var/lib/void/uploads-tmp/` (size cap 100 MB per file) and the worker moves the file into the content-addressed blob store on success.
+
+Worker `ingest.blob`:
+
+1. Stream the upload to a temp file. Hash with sha256 as it streams.
+2. If `/var/lib/void/blobs/<sha-prefix>/<sha>` exists, this is a duplicate; reuse the existing path.
+3. Otherwise move the temp file into place.
+4. Determine `kind` from `Content-Type` / extension: `image` for image/*, `pdf` for application/pdf, `file` for everything else. Video/audio fall through to `file` in Plan 3 (Plan 4 picks them up).
+5. Insert a `refs` row via `refs.create`: `kind=<derived>`, `blob_path=<path>`, `title=filename || sha`, plus metadata.
+6. Return the ref. Phase C's trigger on `refs.create` picks up the embed automatically; in Phase B alone, no embed runs.
+
+**Blob storage.** New directory `/var/lib/void/blobs/` on CT 311, owned by `void:void`, mode 750. Layout `<first-2-chars-of-sha>/<full-sha>`. Deploy bootstrap step adds the dir creation. Already on `localzfs` so replication picks it up.
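+
+The `<first-2-chars-of-sha>/<full-sha>` layout can be sketched as follows. `blobPathFor` and `sha256Hex` are hypothetical names for illustration; the real `lib/ingest/blob_store.js` may be shaped differently and would hash a stream rather than a buffer.
+
+```javascript
+const { createHash } = require('node:crypto');
+const path = require('node:path');
+
+const BLOB_ROOT = '/var/lib/void/blobs';
+
+// Hex sha256 of an in-memory buffer (a stream-based version would call
+// hash.update() per chunk instead).
+function sha256Hex(buffer) {
+  return createHash('sha256').update(buffer).digest('hex');
+}
+
+// Map a digest to <root>/<first two hex chars>/<full digest>.
+function blobPathFor(sha, root = BLOB_ROOT) {
+  if (!/^[0-9a-f]{64}$/.test(sha)) {
+    throw new Error('expected a lowercase hex sha256 digest');
+  }
+  return path.posix.join(root, sha.slice(0, 2), sha);
+}
+
+const sha = sha256Hex(Buffer.from('hello'));
+console.log(blobPathFor(sha));
+```
+
+Two uploads with identical bytes resolve to the same path, which is what makes the duplicate check in the `ingest.blob` worker a simple existence test.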
+
+**Files:**
+- `lib/api/routes/capture.js` — both endpoints + multer config.
+- `lib/jobs/workers/url.js`, `lib/jobs/workers/blob.js`.
+- `lib/ingest/readability.js` — wraps `@mozilla/readability` for testability.
+- `lib/ingest/blob_store.js` — sha + path resolution + write.
+- `tests/api/capture.test.js`, `tests/jobs/workers/url.test.js`, `tests/jobs/workers/blob.test.js`.
+
+**Deps to add:** `@mozilla/readability`, `jsdom`, `multer` (`pg-boss` was already added in Phase A).
+
+**Commit:** `feat(jobs): capture API + URL + blob workers`.
+
+## Phase C — Embeddings + hybrid search
+
+**Ollama client.** `lib/ai/ollama.js`:
+
+```js
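+const OLLAMA_URL = process.env.OLLAMA_URL ?? 'http://192.168.1.185:11434';
+
+// Carries the HTTP status so callers can tell Ollama failures apart.
+class OllamaError extends Error {
+  constructor(status, body) {
+    super(`Ollama responded ${status}: ${body}`);
+    this.name = 'OllamaError';
+    this.status = status;
+  }
+}
+
+async function embedText(text, model = 'nomic-embed-text') {
+  const res = await fetch(`${OLLAMA_URL}/api/embeddings`, {
+    method: 'POST',
+    headers: { 'Content-Type': 'application/json' },
+    body: JSON.stringify({ model, prompt: text }),
+    signal: AbortSignal.timeout(60_000), // abort if Ollama stalls
+  });
+  if (!res.ok) throw new OllamaError(res.status, await res.text());
+  const j = await res.json();
+  return j.embedding; // 768-dim for nomic-embed-text
+}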
+```
+
+`OLLAMA_URL` env var, default `http://192.168.1.185:11434`. The 768-dim vector is zero-padded to 1024 to match the `vector(1024)` column (per master spec, eases later model swap).
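+
+A sketch of the zero-padding step, with a check that padding leaves cosine similarity unchanged (the extra dimensions contribute zero to both the dot product and the norms). `zeroPad`, `cosine` and `toVectorLiteral` are hypothetical helper names; the bracketed literal is the text form pgvector accepts for a `vector` value.
+
+```javascript
+// Pad a shorter embedding with trailing zeros up to the column width.
+function zeroPad(vec, dim = 1024) {
+  if (vec.length > dim) {
+    throw new RangeError(`vector has ${vec.length} dims, column holds ${dim}`);
+  }
+  return vec.concat(new Array(dim - vec.length).fill(0));
+}
+
+// Plain cosine similarity, used here only to show padding is harmless.
+function cosine(a, b) {
+  let dot = 0, na = 0, nb = 0;
+  for (let i = 0; i < a.length; i++) {
+    dot += a[i] * b[i];
+    na += a[i] * a[i];
+    nb += b[i] * b[i];
+  }
+  return dot / Math.sqrt(na * nb);
+}
+
+// Text form for a SQL parameter cast to ::vector.
+function toVectorLiteral(vec) {
+  return `[${vec.join(',')}]`;
+}
+
+const a = [0.1, 0.2, 0.3];
+const b = [0.3, 0.1, 0.2];
+console.log(cosine(a, b), cosine(zeroPad(a, 8), zeroPad(b, 8)));
+```
+
+This only holds while every stored vector and every query vector come from the same model and are padded the same way; mixing models in one column would make distances meaningless.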
+
+**Embed worker.** `embed.text` job payload `{ entity_type, entity_id }`. Worker:
+
+1. Load the entity row.
+2. Build the embedding string:
+   - `page`: `${title}\n\n${body_md}`, truncated to ~6 k characters (≈ 1.5 k tokens; well under nomic's 8 k context).
+   - `ref`: `${title || ''}\n${summary || ''}\n${body_text || ''}`, same truncation.
+   - `source_doc`: `${name}\n${body_text || ''}`.
+   - `conversation`: `${title || ''}\n${summary || ''}` — short by design; conversations get richer treatment in Plan 5.
+3. Call `embedText`. On `OllamaError` or fetch timeout, throw — pg-boss retry kicks in with exponential backoff.
+4. Zero-pad to 1024, UPDATE the entity's `embedding` column.
+5. Emit an audit log entry `(actor_kind='worker', action='update', entity_type, entity_id, diff={embedding:'updated'})`.
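+
+Step 2 of the worker could look roughly like the sketch below. `buildEmbedInput` is a hypothetical name, and applying the 6 k character cap to every entity type is an assumption (the text above only states it for `page` and `ref`).
+
+```javascript
+const MAX_CHARS = 6000; // ~6 k characters, per the truncation rule above
+
+// Build the string sent to the embedding model for one entity row.
+function buildEmbedInput(entityType, row) {
+  let text;
+  switch (entityType) {
+    case 'page':
+      text = `${row.title}\n\n${row.body_md}`;
+      break;
+    case 'ref':
+      text = `${row.title || ''}\n${row.summary || ''}\n${row.body_text || ''}`;
+      break;
+    case 'source_doc':
+      text = `${row.name}\n${row.body_text || ''}`;
+      break;
+    case 'conversation':
+      text = `${row.title || ''}\n${row.summary || ''}`;
+      break;
+    default:
+      throw new Error(`not an embeddable entity type: ${entityType}`);
+  }
+  // Assumption: the cap applies uniformly to all entity types.
+  return text.slice(0, MAX_CHARS);
+}
+
+console.log(buildEmbedInput('ref', { title: 'Example', summary: 'Short', body_text: null }));
+```
+
+Keeping this as a pure function makes it easy to unit-test without Ollama or a database.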
+
+**Re-embed triggers.** Write paths (`repo.create`, `repo.update`) for pages/refs/source_docs already exist. Add a small `lib/jobs/triggers.js` that wraps these — after a successful create/update of an embeddable entity, enqueue `embed.text` with a singleton key `${entity_type}:${entity_id}` so rapid re-edits coalesce. The trigger is called from repo level so MCP and cron paths get it too.
+
+**Hybrid search.** Rewrite `lib/db/repos/search.js::fts` into `search.hybrid({ q, space_id?, kinds?, limit, offset })`:
+
+1. FTS branch: current Plan 2 query unchanged, returns up to `limit * 3` results with `ts_rank`.
+2. Vector branch: embed `q` via Ollama with a 5 s timeout (search must stay snappy) and zero-pad the result to 1024 like the stored vectors. For each kind, run an ANN query against the matching table's `embedding` column using the HNSW index (`<=>` cosine distance). Returns up to `limit * 3` per kind. If Ollama times out or errors, skip this branch entirely: log a `search.vector_skipped` event and continue with FTS-only.
+3. RRF fusion: for each unique `(kind, id)`, sum `1 / (60 + rank_fts) + 1 / (60 + rank_vec)`. The `60` constant matches the canonical RRF paper. Sort by fused score descending, then take the half-open slice `[offset, offset + limit)`.
+4. Vector-only rows (no FTS match) and FTS-only rows (no embedding yet) both participate; missing rank is treated as infinity, giving `1 / inf = 0` from that branch.
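+
+The fusion step can be sketched as a pure function. It assumes each branch arrives as an array already ordered best-first, that ranks are 1-based, and that the per-kind vector results were merged into one ordered list beforehand; `rrfFuse` is a hypothetical name.
+
+```javascript
+// Reciprocal rank fusion over two ranked lists of { kind, id, ... } rows.
+// A row missing from one branch simply gets no contribution from it,
+// which matches treating the missing rank as infinity.
+function rrfFuse(ftsRows, vecRows, { k = 60, limit = 20, offset = 0 } = {}) {
+  const fused = new Map();
+  const addBranch = (rows) => {
+    rows.forEach((row, index) => {
+      const key = `${row.kind}:${row.id}`;
+      const entry = fused.get(key) ?? { ...row, rank: 0 };
+      entry.rank += 1 / (k + index + 1); // index is 0-based, rank is 1-based
+      fused.set(key, entry);
+    });
+  };
+  addBranch(ftsRows);
+  addBranch(vecRows);
+  return [...fused.values()]
+    .sort((x, y) => y.rank - x.rank)
+    .slice(offset, offset + limit);
+}
+
+const fts = [{ kind: 'ref', id: 'a' }, { kind: 'ref', id: 'b' }];
+const vec = [{ kind: 'ref', id: 'b' }, { kind: 'page', id: 'c' }];
+console.log(rrfFuse(fts, vec).map((r) => `${r.kind}:${r.id}`));
+```
+
+In this example `ref:b` appears in both branches and therefore outranks `ref:a` (first in FTS only), which in turn outranks `page:c` (second in vector only).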
+
+Result shape unchanged: `{ kind, id, space_id, title_or_snippet, rank }`. The `rank` field now carries the fused RRF score.
+
+**Files:**
+- `lib/ai/ollama.js` (new).
+- `lib/jobs/workers/embed.js` (new).
+- `lib/jobs/triggers.js` (new).
+- `lib/db/repos/search.js` (rewrite).
+- `tests/ai/ollama.test.js` — fetch mock.
+- `tests/jobs/workers/embed.test.js` — fetch mock; verifies zero-pad + audit.
+- `tests/repos/search.test.js` (existing) — extended with vector-fixture rows + RRF assertions.
+
+**Embedding-test strategy.** Tests insert fixture vectors directly (no Ollama needed). One integration test under `tests/integration/embed_live.test.js` hits a real Ollama, marked `skip()` if `OLLAMA_URL` is unreachable.
+
+**Repos that emit triggers:** pages.create, pages.update, refs.create, refs.update, refs.upsertByExternal, source_docs.create, source_docs.update. Conversation embeds are summary-only and re-fire when `setSummary` is called.
+
+**Commit:** `feat(jobs): embed worker + hybrid search`.
+
+## Phase D — Karakeep webhook + drag-drop UI + Jobs UI
+
+**Karakeep webhook.** `POST /api/ingest/karakeep`. Authenticated by `X-Karakeep-Signature: sha256=<hex>` HMAC of the raw body with `KARAKEEP_WEBHOOK_SECRET` env. If the signature is missing or wrong: 401.
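+
+A sketch of the signature check, assuming the header carries a hex-encoded HMAC-SHA256 of the exact request bytes. `verifyKarakeepSignature` is a hypothetical helper; the route would need access to the raw body before any JSON parsing, since re-serialized JSON may not match byte for byte.
+
+```javascript
+const { createHmac, timingSafeEqual } = require('node:crypto');
+
+// Returns true only when the header matches HMAC-SHA256(secret, rawBody).
+function verifyKarakeepSignature(rawBody, header, secret) {
+  if (typeof header !== 'string' || !header.startsWith('sha256=')) return false;
+  const expected = createHmac('sha256', secret).update(rawBody).digest();
+  const given = Buffer.from(header.slice('sha256='.length), 'hex');
+  // timingSafeEqual throws on unequal lengths, so compare lengths first.
+  return given.length === expected.length && timingSafeEqual(given, expected);
+}
+
+const body = Buffer.from('{"event":"bookmark.created","bookmark_id":"123","tags":[]}');
+const good = 'sha256=' + createHmac('sha256', 'test-secret').update(body).digest('hex');
+console.log(verifyKarakeepSignature(body, good, 'test-secret'));
+```
+
+Using `timingSafeEqual` instead of `===` avoids leaking how many leading characters of a forged signature were correct.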
+
+Payload (Karakeep's webhook shape, normalized): `{ event, bookmark_id, tags }`.
+
+For `event === 'bookmark.created'`:
+1. Look up the existing space-mapping from env: `KARAKEEP_DEFAULT_SPACE_ID` (a UUID). Future work: per-tag space routing.
+2. Enqueue `ingest.karakeep` with `{ bookmark_id, space_id }`.
+
+`ingest.karakeep` worker:
+1. Fetch the bookmark via Karakeep's API: `GET https://karakeep.hynesy.com/api/v1/bookmarks/{bookmark_id}` with `KARAKEEP_API_TOKEN`.
+2. Build the same payload an `ingest.url` job would use (URL + title + tags) and call the URL handler directly. Tags propagate to the `entity_tags` table via repo.
+3. If Karakeep returns 404 (bookmark deleted), mark the job done — no error.
+
+**Drag-drop UI.** `public/components/dropzone.js` — wraps a target element, intercepts drag events, POSTs each file to `/api/capture/upload`, shows toast progress. Wire onto `<main>` so dropping anywhere in the main area works. Pre-fills `space_id` with `localStorage.last_space_id` (set when the user navigates to a space view).
+
+**Jobs UI fill-in.** Expand `public/views/jobs.js`:
+- Group rows by `state` (active / completed / failed).
+- Each row: `id (8 chars)`, `name`, `state`, relative `created_at`, `last_error?`, action buttons.
+- Polls `/api/jobs?state=active,failed` every 10 s.
+- Retry button sends `POST /api/jobs/:id/retry`; delete button sends `DELETE /api/jobs/:id`.
+
+**Files:**
+- `lib/api/routes/ingest.js`.
+- `lib/jobs/workers/karakeep.js`.
+- `lib/karakeep/client.js` — thin wrapper.
+- `public/components/dropzone.js`.
+- `public/views/jobs.js` (expand).
+- `tests/api/ingest.test.js` — HMAC check, valid/invalid signature.
+- `tests/jobs/workers/karakeep.test.js` — Karakeep API mocked via fetch interceptor.
+
+**Commit:** `feat(jobs): Karakeep webhook + drag-drop + Jobs UI`.
+
+## Error handling & idempotency
+
+- **Idempotency keys.** URL and Karakeep workers compute `sha256(space_id + url)` (URL) or `sha256(space_id + 'karakeep:' + bookmark_id)` (Karakeep). Stored as `refs.external_id` with `source_kind` set to `'url'` or `'karakeep'`. The unique index `idx_refs_external_unique` already enforces this from Plan 1. A duplicate ingest finds the existing ref and short-circuits.
+- **Singleton embed jobs.** pg-boss `singletonKey: '${entity_type}:${entity_id}'` so rapid edits coalesce into one pending embed. If a job is already in-flight when a new edit lands, a follow-up is enqueued.
+- **Capture rate limit.** Out of scope. The `agentOrOwner` gate is enough at single-user scale.
+- **Ollama down.** Embed jobs throw and retry under pg-boss backoff. After dead-letter (≈ 5 min cumulative), the entity stays without an embedding; hybrid search falls back to FTS for those rows. Once the operator restores Ollama, they either `POST /api/jobs/:id/retry` or wait for the periodic re-embed cron planned for a future phase.
+- **Karakeep down.** The webhook still accepts and enqueues. The worker dead-letters once retries are exhausted, and the operator replays the job (and its tag mapping) manually.
+- **Blob upload partial.** Stream to temp; rename on success only. Failed uploads leave a temp file behind; a daily cron in Plan 4 sweeps temp files older than 24 h.
+
+## Observability
+
+- Pino structured logs already in place. New log keys: `job_id`, `job_name`, `entity_type`, `entity_id`, `idempotency_key`, `outcome`.
+- `/api/jobs` is the operator surface; the SPA Jobs view fronts it.
+- pg-boss's archive table is the source of truth for completed/failed jobs; no separate audit needed for job lifecycle (the audit log captures entity-level changes the workers cause).
+
+## Testing strategy
+
+- **Unit:** workers and the Ollama client get unit tests with `fetch` mocked (vitest's `vi.fn`).
+- **Repo:** `tests/repos/search.test.js` extended; new `tests/repos/jobs.test.js` covers `pg-boss`-backed list/retry helpers.
+- **API:** capture, ingest, jobs routes via supertest. HMAC signature pass/fail. Idempotency on second capture of the same URL.
+- **Integration (gated):** one test that hits real Ollama; auto-skipped if `OLLAMA_URL` is unreachable. Real pg-boss roundtrips happen inside the existing test DB, using `resetDb` plus `await boss.stop()` (on the pg-boss instance) between suites to avoid cross-talk.
+- **No new vitest config.** `fileParallelism: false` already in place from Plan 1 — pg-boss is happier serialized too.
+
+## Migrations
+
+- **No new SQL migrations from Void.** pg-boss creates its own schema on first `start()`.
+- One-time CT 311 ops: create `/var/lib/void/blobs/` and chown `void:void`.
+
+## Deploy delta
+
+- `.env` adds `OLLAMA_URL`, `KARAKEEP_WEBHOOK_SECRET`, `KARAKEEP_API_TOKEN`, `KARAKEEP_API_URL`, `KARAKEEP_DEFAULT_SPACE_ID`. Documented in `deploy/README.md`.
+- `deploy/push.sh` unchanged (rsync still works).
+- Snapshot CT 310 + 311 before deploying Plan 3 (standing rule). The Phase A first-deploy is the "major update" — pg-boss creates new tables in the shared DB.
+
+## Known follow-ups (not Plan 3)
+
+- AI Space/Project suggestion on capture.
+- Embedding chunks table.
+- pdf-text-extract for born-digital PDFs (Plan 4 likely handles this with Tesseract too).
+- Per-tag Karakeep → Space routing instead of one default space.
+- Recurring re-embed cron for rows where `embedding IS NULL`.
+- Real-time Jobs UI via `pg LISTEN/NOTIFY` instead of polling.
+
+## Open items for the user
+
+- **Karakeep secrets.** Plan 3 Phase D needs `KARAKEEP_API_TOKEN` (issued from Karakeep settings) and a chosen `KARAKEEP_DEFAULT_SPACE_ID`. Surfaceable when the phase starts.
+- **The 29-day-old `knowledge_pipeline` memory** (Karakeep → Qdrant → MCP) is now superseded by Void 2.0's pgvector-only architecture. After Plan 3 ships, that memory should be marked obsolete or deleted to avoid future-me reading it as authoritative.