Void-Homelab/docs/superpowers/specs/2026-05-31-void-v2-design.md
root 54ba68a11c docs: move void-v2 specs + plans into the repo
All Void 2.0 superpowers specs and implementation plans now live at
docs/superpowers/{specs,plans}/ inside the repo. Previously they were
at /project/docs/superpowers/ which was not under git.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 04:11:32 +10:00

# Void 2.0 — Homelab Orchestrator & Knowledge Foundation
**Status:** IN PROGRESS — brainstorming, not yet a complete design
**Started:** 2026-05-31
**Owner:** mrhynesy@gmail.com
> This document is being filled in section by section as brainstorming progresses.
> Sections below marked `[locked]` are user-approved decisions. Sections marked
> `[pending]` are the remaining design work to complete before this becomes a
> proper spec ready for the writing-plans skill.
---
## Vision [locked]
Replace the current scattered homelab state (Void dashboard, Karakeep bookmarks,
BookStack wiki, `/root/.claude/plans/*.md`, auto-memory entries, ad-hoc browser
tab groups) with a single **Void 2.0** — a homelab orchestrator that:
- Acts as the canonical home for projects, tasks, knowledge, and deployed-resource
state
- Ingests websites, videos, PDFs, screenshots, and files into a unified library
- Mirrors upstream documentation locally for offline + agent access
- Surfaces all of it to Claude and local AI agents via MCP, with per-agent
permission tiers
- Preserves the Void's Cradle-themed aesthetic and agent personas
- Stays available during planned host maintenance via `pct migrate`
(no automatic failover)
- Maintains privacy + security with selective remote access
Primary capture pain being solved: **"multiple grouped Chrome tabs as a poor
project-management substitute."** Void 2.0 makes that proper.
---
## Direction & HA Shape [locked]
**Chosen direction:** Foundation-first Void 2.0 (Option 2 from initial framing).
Not an evolution of Void — a clean rebuild with Void as the visible UI on top.
**HA model:** Planned-maintenance only. User instructs the stack before host
shutdown; Proxmox migrates the LXCs to another node in restart mode
(`pct migrate --restart`, ~10-60s pause), since containers cannot be
live-migrated. No automatic failover, no quorum, no clustering complexity.
**Infrastructure:**
| LXC | Purpose | Stateful? |
|---|---|---|
| `void2-db` | Postgres + pgvector | Yes — the canonical store |
| `void2-app` | Node API + Python workers + Void UI + cron | No (data in `void2-db`) |
Future-improvements list (parked):
- Build own bookmark capture front-end to replace Karakeep
- Extract MCP server to its own LXC if it grows independent
- True clustering if "instant failover" becomes a need
---
## Entity Map [locked]
| Entity | Lives in | Contains / links to |
|---|---|---|
| **Space** | top-level | Projects, Tasks, Pages, Refs, SourceDocs, Conversations, Resources |
| **Project** | a Space | Tasks (children); has-many Pages, Refs, SourceDocs, Conversations, Resources |
| **Task** | a Space, optionally also a Project | Pages, Refs, Conversations |
| **Page** (authored) | tagged | backlinks, attachments — your notes + AI-assisted commentary |
| **Reference** (captured) | tagged | source URL, local snapshot, metadata — websites/videos/PDFs/files/images |
| **Source Doc** (mirrored upstream) | bound to a Resource | version, last-synced, sync source — official docs from publisher |
| **Conversation** | attaches to Space/Project/Task/Resource | Messages — first-class, multi-agent |
| **Resource** (deployed service, rich) | a Space | dependencies, credentials refs, source docs, runbook pages, change history, monitoring config |
**Relationships are explicit, not implied.** Any entity can attach to any other
via typed links (`project_pages`, `task_refs`, `resource_source_docs`, etc.).
---
## Capture Pipeline [locked — day-one inputs]
Day-one capture inputs:
1. **URLs / bookmarks** — Karakeep stays as inbox; webhook flows new bookmarks
into Void 2.0 as References (with AI-suggested Project/Space tagging)
2. **YouTube / web videos:** `yt-dlp` for metadata + transcript; local Whisper
if no transcript; AI summary + chapters via Ollama
3. **PDFs / documents** — text extract or Tesseract OCR; AI summary; full text
indexed
4. **Screenshots / images** — Tesseract OCR; AI summary
5. **Generic files** — blob storage on host; indexed by name + tags
All AI summarization runs against local Ollama (CT 102).
---
## Agent Model [locked]
**Per-agent capability tiers.** Each AI agent (Claude, Mercy, Orthos, Dross,
Eithan, Lindon, Yerin, Little Blue, future agents) has its own capability record.
- **Default for all agents:** `read` + `suggest`. Agents can search/read
anything. Writes are *drafts* in a "pending changes" inbox the user approves.
- **Promotable per agent:** `write` capability, scoped (e.g., Mercy gets
write-on-Pages but not Resources)
- **Audit log:** every agent action recorded with `agent_id` + timestamp + diff
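
The tier rule above can be sketched as a single routing decision. This is a minimal illustration, not the spec's implementation: `Agent`, `write_scopes`, and `route_write` are hypothetical names, and the real capability record lives in the `agents` table as JSONB.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    slug: str
    # Entity types this agent may write directly (hypothetical shape).
    # Empty means the default tier: read + suggest only.
    write_scopes: set = field(default_factory=set)


def route_write(agent: Agent, entity_type: str) -> str:
    """Return 'apply' for a scoped direct write, else 'pending' (draft inbox)."""
    return "apply" if entity_type in agent.write_scopes else "pending"


mercy = Agent("mercy", {"page"})   # promoted: write-on-Pages only
dross = Agent("dross")             # default tier

print(route_write(mercy, "page"))      # direct write
print(route_write(mercy, "resource"))  # routed to pending changes
print(route_write(dross, "page"))      # routed to pending changes
```

Keeping the decision in one function means REST, MCP, and cron callers cannot disagree about whether a write is a draft.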
MCP surface exposes Void 2.0 to Claude Code, Open WebUI, OpenClaw, and future
agents through the same interface.
---
## Build Approach [locked]
**Approach A — Greenfield modular monolith.**
- New repo at `/project/src/void-v2`
- Two processes on `void2-app` LXC:
  - **`void-server`** (Node): REST API + MCP + Void UI + cron + light ingest
    (Karakeep webhook)
  - **`void-workers`** (Python): heavy ML ingest (yt-dlp, Whisper, Tesseract,
    PDF extract, embeddings via Ollama)
- Postgres + pgvector on `void2-db` LXC
- Copy across from current Void (without inheriting its structure): agent
persona files, blackflame theme CSS, Cradle naming, cron task list, schema
YAMLs as initial Resource seed data
- Old Void on CT 301 keeps running until cutover; then archived
---
## Architecture Details [locked]
### Two processes, one job queue, strict boundaries
**`void-server` (Node)** owns: HTTP API, MCP server, Void UI, cron, agent
runtime, light ingest (Karakeep webhook, manual paste). Internal layout:
```
lib/
  db/      Postgres pool, migrations, repos/ (one file per entity)
  api/     HTTP routes (thin, just call repos)
  mcp/     MCP server, tool definitions, per-agent capability checks
  ingest/  Karakeep webhook, manual capture
  jobs/    Enqueue heavy work for workers (pg-boss client)
  cron/    Scheduler + one file per task
  agents/  Cradle persona runtime (Claude subprocess + Ollama via Mastra)
```
**Boundary rule:** HTTP and MCP both reach data only via `repos/`. No raw SQL in
routes. Same repos enforce per-agent capability checks. This is what makes any
later extraction (e.g., MCP as its own service) painless.
**`void-workers` (Python)** owns heavy ML ingest. One worker per kind:
`video.py` (yt-dlp + Whisper), `pdf.py` (pdftotext / Tesseract), `image.py`
(Tesseract), `file.py` (blob + indexing), `sourcedoc.py` (mirror upstream docs).
They poll the job queue, claim work, write results to DB.
### Job queue: pg-boss
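Postgres-backed. pg-boss itself is a Node library, so `void-server` enqueues
through it directly; the Python workers need a compatible client or direct SQL
against its tables (choice to be confirmed). We don't add Redis/RabbitMQ
because the DB is already there. Failed jobs retry with backoff, then land in a
dead-letter table.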
**Redis rejected** — Postgres-on-local-LXC is sub-millisecond for indexed
queries; the bottlenecks in Void 2.0 will be Ollama/Whisper/OCR (seconds to minutes),
not the DB. Adding Redis would buy invisible perf wins at the cost of cache
invalidation complexity and another LXC to manage. Reconsider only if profiling
shows a specific bottleneck.
### Caching, if needed
- **In-process LRU** (JS `Map` with size cap) inside `void-server` for hot
lookups. Zero ops cost.
- **`pg LISTEN/NOTIFY`** for real-time UI updates (transcription progress, etc.)
if/when we want them. Built into Postgres — no extra service.
### Cron
Lives only in `void-server` (single process — no leader election needed).
Light tasks run in-process; heavy tasks enqueue worker jobs.
### Audit log
Append-only. Every mutating call (HTTP, MCP, cron, worker) writes one row:
`actor_kind`, `actor_id`, `entity_type`, `entity_id`, `action`, `diff`,
`occurred_at`. Powers: pending-changes inbox for agent drafts, Resource change
history, "who did what when" forensics.
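
As a rough sketch of the row described above: the column names come from the spec, but the `diff` shape (`{"field": {"from": ..., "to": ...}}`) and the helper names `field_diff` and `make_audit_row` are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AuditRow:
    actor_kind: str      # e.g. "user", "agent", "cron", "worker"
    actor_id: str
    entity_type: str
    entity_id: str
    action: str
    diff: dict
    occurred_at: datetime


def field_diff(before: dict, after: dict) -> dict:
    """Record only the fields whose values changed (assumed diff shape)."""
    keys = sorted(set(before) | set(after))
    return {
        k: {"from": before.get(k), "to": after.get(k)}
        for k in keys
        if before.get(k) != after.get(k)
    }


def make_audit_row(actor_kind, actor_id, entity_type, entity_id,
                   action, before, after) -> AuditRow:
    return AuditRow(actor_kind, actor_id, entity_type, entity_id, action,
                    field_diff(before, after), datetime.now(timezone.utc))


row = make_audit_row("agent", "mercy", "task", "t1", "update",
                     {"status": "todo", "title": "x"},
                     {"status": "done", "title": "x"})
print(row.diff)
```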
---
## Schema [locked]
All ids `uuid` (`gen_random_uuid()`). All entities have `created_at` /
`updated_at`. Vector columns are `vector(1024)` everywhere — embeddings from
`nomic-embed-text` (768 dims) padded with zeros so model swap to a 1024-dim
model is a re-embed pass, not a migration. Slugs unique per-Space.
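
A minimal sketch of the zero-padding step, assuming embeddings arrive as plain lists of floats. `pad_embedding` and `EMBEDDING_DIMS` are hypothetical names. Appending zeros leaves dot products and vector norms unchanged, so cosine and L2 comparisons between padded 768-dim vectors behave as they did before padding.

```python
EMBEDDING_DIMS = 1024  # width of the vector(1024) columns


def pad_embedding(values, dims=EMBEDDING_DIMS):
    """Right-pad a shorter embedding with zeros; reject one that is too long."""
    if len(values) > dims:
        raise ValueError(
            f"embedding has {len(values)} dims, column holds {dims}"
        )
    return list(values) + [0.0] * (dims - len(values))


padded = pad_embedding([0.5] * 768)
print(len(padded))  # 1024
```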
Single implicit user for now; audit columns store `actor_kind` + `actor_id` so
multi-user is a non-breaking later migration.
### Core entity tables
| Table | Key columns |
|---|---|
| `spaces` | slug, name, description, theme |
| `projects` | space_id, slug, name, status, started_at, completed_at |
| `tasks` | space_id, project_id (nullable), title, body, status, priority, due_at, position |
| `pages` | space_id, slug, title, body_md, body_html, parent_id, embedding |
| `page_revisions` | page_id, body_md, edited_by, created_at |
| `refs` | space_id, kind (`url\|video\|pdf\|image\|file`), source_url, title, summary, body_text, blob_path, metadata, embedding, source_kind, external_id |
| `source_docs` | resource_id, name, upstream_url, version, format, sync_source, local_path, last_synced, embedding |
| `resources` | space_id, slug, name, runtime_type (`lxc\|vm\|docker\|bare-metal`), host, url, version, status, monitoring (jsonb) |
| `resource_dependencies` | resource_id, depends_on, kind |
| `resource_credentials` | resource_id, label, vault_path, kind, notes |
| `conversations` | title, agent_id, participants, summary, embedding |
| `messages` | conversation_id, role, agent_id, body, metadata |
| `agents` | slug, name, kind, model, persona_path, capabilities (jsonb), scopes (jsonb) |
### Cross-cutting tables
| Table | Purpose |
|---|---|
| `tags` | normalized tag list (name, description, color) |
| `entity_tags` | (entity_type, entity_id, tag_id) — polymorphic tagging |
| `entity_links` | (from_type, from_id, to_type, to_id, relation) — any-to-any linkage |
| `attachments` | (entity_type, entity_id, filename, mime_type, blob_path, checksum) |
| `audit_log` | append-only mutation history |
| `pending_changes` | agent draft inbox awaiting approval |
| `pg-boss` tables | managed by the queue lib |
### Default lifecycle states
- Project: `idea | active | paused | done | abandoned`
- Task: `todo | doing | blocked | done`
- Resource: `running | stopped | down | unknown`
(State transitions and automation defined in the Status section, later.)
### Search strategy
- **Full-text** — Postgres `tsvector` + GIN on `pages.body_md`,
`refs.title+summary+body_text`, `source_docs.body_text`, `messages.body`.
One query, all knowledge types.
- **Semantic** — pgvector HNSW indexes on `pages.embedding`, `refs.embedding`,
`source_docs.embedding`, `conversations.embedding`. Embeddings generated by
Ollama at write time, async via worker.
- **Combined** — search API does FTS + vector in parallel, fuses with
reciprocal-rank fusion. Filters by Space, Project, tags, kind.
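
Reciprocal-rank fusion can be sketched in a few lines. This assumes each search backend returns an ordered list of entity ids; the constant `k = 60` is a commonly used default rather than something the spec fixes, and `rrf_fuse` is a hypothetical name.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse ranked id lists: score(id) = sum over lists of 1 / (k + rank)."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, entity_id in enumerate(ranking, start=1):
            scores[entity_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by id for a stable order.
    return sorted(scores, key=lambda e: (-scores[e], e))


fts_hits = ["a", "b", "c"]      # full-text ranking
vector_hits = ["b", "c", "d"]   # semantic ranking
print(rrf_fuse([fts_hits, vector_hits]))  # ['b', 'c', 'a', 'd']
```

Ids that appear in both lists accumulate two terms, which is why `b` and `c` outrank `a` even though `a` is first in the full-text list.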
### Key design decisions
1. **Polymorphic links over dedicated junction tables** — one `entity_links`
table instead of ~20 pairwise junctions. Loses Postgres-enforced FK
integrity on polymorphic columns; pays back in flexibility. Periodic
integrity-check query catches orphans.
2. **Audit log is the only mutation history** — no per-entity history tables.
Powers pending-changes inbox, Resource change history, and forensics from
one mechanism.
3. **`page_revisions` is the exception** — full markdown snapshots, not diffs.
Disk is cheap; debugging a corrupted page from a 12-step diff chain is not.
4. **JSONB for variable shape:** `metadata` columns on `refs` (kind-specific),
`resources` (monitoring config), `agents` (capabilities, scopes). Add fields
without migrations.
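
The periodic integrity check from decision 1 could look roughly like the following. This is an in-memory stand-in: in practice the check would be a SQL query per entity type, and `find_orphan_links` plus the dict shapes are hypothetical.

```python
def find_orphan_links(links, existing):
    """Return links whose from- or to-side points at a missing entity.

    links:    iterable of dicts with from_type, from_id, to_type, to_id
    existing: dict mapping entity type -> set of ids that currently exist
    """
    orphans = []
    for link in links:
        for side in ("from", "to"):
            known_ids = existing.get(link[f"{side}_type"], set())
            if link[f"{side}_id"] not in known_ids:
                orphans.append(link)
                break  # one missing side is enough to flag the link
    return orphans


existing = {"project": {"p1"}, "page": {"pg1"}}
links = [
    {"from_type": "project", "from_id": "p1", "to_type": "page", "to_id": "pg1"},
    {"from_type": "project", "from_id": "p1", "to_type": "page", "to_id": "gone"},
]
print(find_orphan_links(links, existing))
```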
---
## API Surface [locked]
### REST (Void UI ↔ void-server)
Standard CRUD per entity under `/api/`, JSON in/out, errors as
`{error: {code, message, details}}`. Pagination via `?limit=&offset=`.
Endpoint groups: spaces, projects, tasks, pages (+ revisions, backlinks),
refs, source_docs (+ resync), resources (+ dependencies, changes),
conversations (+ messages), agents, search (unified FTS + vector with RRF),
tags, links, pending-changes (approve/reject), audit, capture
(karakeep webhook, manual url, file upload, youtube), jobs (observability).
**Auth:** Bearer token. Single owner token for the Void UI. Per-agent tokens in
a separate `agent_tokens` table (hashed). Audit log records `actor_kind` +
`actor_id` on every mutation.
### MCP (AI agents ↔ void-server)
Smaller, task-oriented surface — not full CRUD. Tools enforce per-agent
capabilities; default-tier agents get writes routed to `pending_changes`.
Initial tools:
`void.search`, `void.get_entity`, `void.list_projects`, `void.list_tasks`,
`void.related`, `void.read_conversation`, `void.resource_status`,
`void.draft_page`, `void.draft_task`, `void.draft_ref`,
`void.append_journal`, `void.suggest_link`, `void.update_entity`.
**Transport:** both stdio (for Claude Code spawned subprocess) and HTTP/SSE
(for Open WebUI, OpenClaw, remote agents). Same tool definitions, two
transports. Capability checks happen in tool handlers, which call the same
`repos/` as REST — one source of truth, two front doors.
---
## Capture Workers [locked]
### Job kinds (one Python module per kind)
`ingest.karakeep`, `ingest.url`, `ingest.youtube`, `ingest.video`,
`ingest.pdf`, `ingest.image`, `ingest.file`, `sync.source_doc`, `embed.text`,
`summarize.conversation`.
### Job lifecycle
```
queued → claimed → running → done
                           ↘ failed → retry (exp backoff: 10s, 60s, 5m) → dead-letter
```
Workers atomically claim via pg-boss, validate input, check idempotency,
do work, write results in a transaction (entity row + audit log + downstream
enqueues), mark done. Transient errors retry; permanent errors dead-letter
immediately.
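
The retry schedule can be sketched as a small lookup. This assumes the three listed delays mean three retries before dead-lettering, which the spec implies but does not state; `next_retry_delay` is a hypothetical helper, and in practice the queue library's own retry settings would carry these values.

```python
from typing import Optional

RETRY_DELAYS_SECONDS = (10, 60, 300)  # 10s, 60s, 5m


def next_retry_delay(failed_attempts: int, permanent: bool = False) -> Optional[int]:
    """Seconds to wait before the next retry, or None to dead-letter the job."""
    if failed_attempts < 1:
        raise ValueError("failed_attempts counts failures and starts at 1")
    if permanent:
        return None  # permanent errors skip retries entirely
    if failed_attempts <= len(RETRY_DELAYS_SECONDS):
        return RETRY_DELAYS_SECONDS[failed_attempts - 1]
    return None  # retries exhausted


for attempt in range(1, 5):
    print(attempt, next_retry_delay(attempt))
```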
### Idempotency
Every job carries `idempotency_key`. For URL/Karakeep ingest:
`key = sha256(source_url + space_id)`. If a successful job with that key
exists, no-op.
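
A minimal sketch of the key derivation, following the formula above literally. The hex encoding of the digest and the `should_skip` helper are assumptions; the set of completed keys stands in for a lookup against finished jobs.

```python
import hashlib


def idempotency_key(source_url: str, space_id: str) -> str:
    """sha256(source_url + space_id), hex-encoded (encoding is an assumption)."""
    return hashlib.sha256((source_url + space_id).encode("utf-8")).hexdigest()


def should_skip(key: str, succeeded_keys: set) -> bool:
    """A job is a no-op when a successful job with the same key exists."""
    return key in succeeded_keys


key = idempotency_key("https://example.com/post", "space-1")
done = {key}
print(should_skip(key, done))  # True: a replayed webhook does nothing
```

Because `space_id` is part of the key, the same URL captured into two different Spaces produces two References rather than one.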
### Concurrency (per-kind queues)
| Kind | Limit | Reason |
|---|---|---|
| `ingest.youtube`, `ingest.video` | **1** | Whisper GPU-bound on A2000 6GB |
| `ingest.pdf`, `ingest.image` | 2 | Tesseract CPU-bound |
| `ingest.url`, `ingest.karakeep`, `ingest.file` | 4 | Network/disk-bound |
| `sync.source_doc` | 1 | One source at a time; don't hammer upstream |
| `embed.text`, `summarize.conversation` | 2 | Ollama-bound |
### Blob storage
Content-addressed on local disk: `/var/lib/void/blobs/<sha-prefix>/<sha>`.
Deduplicates identical files. ZFS dataset replicated to Leonardo via existing
syncoid daily. MinIO is a future option, not day-one.
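
The content-addressed layout can be illustrated as follows. The spec only says `<sha-prefix>/<sha>`; SHA-256 and a two-character prefix are assumptions here, and `blob_path` is a hypothetical helper.

```python
import hashlib
from pathlib import PurePosixPath

BLOB_ROOT = PurePosixPath("/var/lib/void/blobs")


def blob_path(content: bytes, prefix_len: int = 2) -> PurePosixPath:
    """Derive the storage path from the content hash (assumed SHA-256)."""
    digest = hashlib.sha256(content).hexdigest()
    return BLOB_ROOT / digest[:prefix_len] / digest


# Identical bytes always map to the same path, which is where dedup comes from.
print(blob_path(b"hello") == blob_path(b"hello"))  # True
print(blob_path(b""))
```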
### Dead-letter & monitoring
pg-boss managed dead-letter table. Void UI "Jobs" panel shows pending,
running, recent completions, dead-letter with retry/delete actions.
### Downstream chaining
Finished jobs enqueue more jobs in the same transaction (e.g., source doc
sync → embed each chunk). Keeps everything resumable: if Ollama is down,
the entity saves without embedding, embed retries later.
---
## UI / Orchestrator Shape [locked]
### Shell
Three columns, Cradle aesthetic preserved (blackflame palette, Cradle naming).
- **Sidebar:** Spaces tree on top (collapsible, drag-to-reorder); global views
below — Sacred Valley, Agents, Inbox (pending changes with count), Resources
cross-space, full Search
- **Main pane:** context-dependent view (Space, Project, Page editor, Reference
detail, Resource detail, Search, Sacred Valley, Inbox, Conversation)
- **Right rail:** always-visible context-aware chat companion, collapsible to
slim tab. Agent scoped to current view; per-Space default agent. Drag-handle
to resize.
- **Top bar:** universal capture button (paste/drop → AI suggests Space+Project
→ confirm), global search, pending-changes bell with count, user/agent toggle
### Views (main pane)
| View | Purpose |
|---|---|
| Space | Overview of projects, tasks, refs, pages, resources in that space |
| Project | Header (status/dates), Tasks, References, Pages, Conversations, Resources |
| Page editor | Markdown editor with split preview, FTS in-page, attach upload |
| Reference detail | Media preview + AI summary + metadata + tags + linked-from |
| Resource detail | Health header + dependencies graph + Source Docs + runbook Pages + change history |
| Search | Unified FTS + vector results, grouped by type, sidebar filters |
| Sacred Valley | Current gridstack dashboard, preserved (weather, speedtest, host-perf, briefings, service health) |
| Inbox | Pending changes grouped by agent, with diff viewer + approve/reject |
| Conversation | Full-window chat when right-rail isn't enough |
### Defaults
- **Landing page:** last-viewed Space, falling back to a "Home" overview of
recent activity across all Spaces
- **Sacred Valley:** kept as a named sidebar view (not the default homepage)
- **Right-rail chat:** always visible, context-aware, collapsible
- **Capture button:** paste-anything modal → AI infers kind (URL/file/text)
→ suggests Space+Project from content + tags → user confirms or overrides
### Pending Changes Inbox
Items grouped by agent. Each shows entity-type icon + agent's reason + diff
viewer + approve/reject. Approving runs the mutation through the same repo as
a direct write would (single code path).
---
## Security & Auth [locked]
### Authentication layers
| Layer | Mechanism | Scope |
|---|---|---|
| Owner via browser/mobile | Cloudflare Access (Google IDP, restricted email) → CF Tunnel → Void 2.0 | Full owner |
| AI agents via MCP | Bearer tokens, bcrypt-hashed in `agent_tokens`. Scoped by `agents.capabilities + scopes` | Per-agent tiered |
| void2-app → void2-db | Dedicated Postgres user, limited grants, LAN-only | Service account |
| void2-app → Ollama | LAN, no auth | LAN-only |
### Remote-access boundary
| Surface | Reachable how | Behind CF Access? |
|---|---|---|
| `void.hynesy.com` (UI) | CF Tunnel | Yes — Google auth, your email |
| `mcp.void.hynesy.com` (MCP HTTP/SSE for remote agents) | CF Tunnel | Yes — CF Access Service Tokens |
| Internal MCP (Claude Code, Open WebUI on CT 103) | Direct LAN | No — local |
| Postgres | LAN-only, firewalled | n/a |
### Secrets handling
- Bootstrap secrets in `.env` files on each LXC, `chmod 600`, owned by service user
- `resource_credentials.vault_path` is a *pointer string* (`env:NAME`,
`file:/path`, or future `vault:id`). Void 2.0 resolver reads from env or file.
Schema unchanged if/when we swap to Vaultwarden — only the resolver changes.
- Agent tokens shown plaintext **once** at creation, then bcrypt-hashed.
- No secrets in audit log (per-entity redaction before write).
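
The pointer-string resolver might be sketched like this. The `env:` and `file:` schemes are from the list above; `resolve_secret`, the optional `environ` argument, and stripping surrounding whitespace from file contents are assumptions for illustration.

```python
import os
from pathlib import Path


def resolve_secret(pointer: str, environ=None) -> str:
    """Resolve an 'env:NAME' or 'file:/path' pointer to its secret value."""
    scheme, sep, target = pointer.partition(":")
    if not sep or not target:
        raise ValueError(f"malformed vault_path pointer: {pointer!r}")
    if scheme == "env":
        env = os.environ if environ is None else environ
        if target not in env:
            raise KeyError(f"environment variable {target!r} is not set")
        return env[target]
    if scheme == "file":
        return Path(target).read_text(encoding="utf-8").strip()
    # 'vault:' and anything else is future work; only this function changes.
    raise NotImplementedError(f"unsupported pointer scheme: {scheme!r}")


print(resolve_secret("env:DB_PASSWORD", {"DB_PASSWORD": "s3cret"}))
```

Since callers only ever see the pointer string and the resolved value, adding a `vault:` branch later touches this function and nothing in the schema.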
### Privacy posture
- All AI inference local by default (Ollama on CT 102)
- Claude API calls cross to Anthropic — documented egress channel; PII flagging
not in v1
- Audit log retains every mutation for forensics
### Backup posture
- ZFS daily syncoid replication of `void2-db` + blob datasets to Leonardo
- Postgres `pg_dump` cron daily (restore-test friendly, independent of ZFS)
- Encrypted ZFS datasets for any off-site replica targets later (Farm)
### Out of scope (v1)
mTLS between internal services, field-level encryption in DB, HSMs, PII
detection before LLM egress.
---
## Future Improvements (deferred)
These are intentionally **not** day-one work. Tracked so they don't get
forgotten:
- **Vaultwarden secrets store** — user explicitly asked to be reminded. Day-one
resolver was designed so this is a swap, not a schema change. See
[auto-memory: project_void_v2_vaultwarden_followup].
- **Own bookmark capture front-end** to replace Karakeep
- **MinIO** for blob storage (S3-compatible access from elsewhere)
- **Extract MCP** to its own LXC if it grows independently
- **True clustering / instant failover** (Patroni) if zero-downtime maintenance becomes needed
- **PII detection** before Anthropic API egress
- **Mobile-optimized capture flow** (PWA install, share-target intent on Android)
- **Local STT** (Whisper) for voice notes as a capture kind
- **RSS / email** ingest
---
## Naming & Versioning [locked]
This project is **Void 2.0** — a full remaster of the existing Void
(retroactively "Void 1.x") with the same Cradle aesthetic, expanded into a
homelab orchestrator + canonical knowledge store. "Codex" is **not** a name —
just a way we referenced the data-layer concept during brainstorming. There
is no `Codex` brand or module; the data layer is `lib/db/` (including
`lib/db/repos/`) inside `void-server`.
### Repo / process / LXC naming
- **Repo:** `/project/src/void-v2`
- **Processes:** `void-server` (Node), `void-workers` (Python)
- **LXCs during cutover:** `void2-db`, `void2-app` (the `2` suffix avoids
clashing with current CT 301 `void`). After CT 301 retirement: rename to
plain `void-db`, `void-app`.
- **Domains:** `void.hynesy.com` (UI), `mcp.void.hynesy.com` (MCP HTTP/SSE)
- **MCP tool prefix:** `void.search`, `void.draft_page`, etc.
### Version strategy
Semver: `MAJOR.MINOR.PATCH`.
- **2.0.0** — initial Void 2.0 release after Void 1.x retirement
- Minor bumps for added features, patch bumps for fixes
- Major bumps reserved for architecture/schema changes that require migrations
### CHANGELOG
`CHANGELOG.md` at the root of `/project/src/void-v2`, following the [Keep a
Changelog](https://keepachangelog.com) convention. Entry for **2.0.0**
captures the differences from Void 1.x at a high level (architecture, schema,
capture pipeline, agent model, naming). Subsequent releases get their own
sections. Each entry: Added / Changed / Deprecated / Removed / Fixed.
A separate `docs/VERSION_HISTORY.md` carries the **narrative** version
history — when each release happened, the headline thinking behind it,
deferred items rolled in, lessons. Lives alongside the design spec for
long-term archaeology. Each `MAJOR.x.x` release gets a section.
---
## Migration / Cutover Plan [locked]
### Existing data inventory
| Source | Location | Volume | Maps to |
|---|---|---|---|
| Void 1.x SQLite | CT 301 | wiki_pages (~25), messages, projects, conversations | Void 2.0 `pages`, `messages` (grouped into `conversations`), `projects` |
| BookStack | CT 104 MariaDB | ~17+ pages, hierarchy | `pages` (parent_id preserved); dedupe vs already-imported wiki_pages |
| Karakeep | CT 100 | bookmarks + AI summaries + tags | `refs` (kind=url), `external_id` = karakeep id |
| `/root/.claude/plans/*.md` | filesystem | 5 plan files | `pages` under each plan's Project |
| Void 1.x agent personas | `/project/src/void/characters/` | 7 agents × 3 files | `agents.persona_path` |
| Void 1.x schema YAMLs | `/project/src/void/schemas/` | 11 services | `resources` seed data + `resources.monitoring` jsonb |
| Void 1.x code (theme, cron logic) | source | selective | Reused inside `void-server` |
| Auto-memory entries | `/root/.claude/projects/-project/memory/*.md` | ~30 entries | **Mirrored** — see below |
### Migration script structure
Python migration tool in `void-workers/migrate/` with sub-commands:
```
void-migrate bookstack --source-db <conn>
void-migrate karakeep --source-db <conn>
void-migrate void1-sqlite --source-db <path>
void-migrate plans --source-dir /root/.claude/plans/
void-migrate memory --source-dir /root/.claude/projects/-project/memory/
void-migrate void1-schemas --source-dir /project/src/void/schemas/
void-migrate void1-personas --source-dir /project/src/void/characters/
```
Each command is **idempotent** — uses source IDs / file paths as `external_id`
so re-runs upsert rather than duplicate.
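
The upsert-by-`external_id` pattern can be demonstrated with an in-memory SQLite database standing in for Postgres (both accept `INSERT ... ON CONFLICT ... DO UPDATE`). The two-column `refs` table and the helper names are simplifications for illustration, not the real schema.

```python
import sqlite3


def open_demo_db() -> sqlite3.Connection:
    """In-memory stand-in for the target database (assumption: SQLite >= 3.24)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE refs (external_id TEXT PRIMARY KEY, title TEXT)")
    return conn


def upsert_ref(conn: sqlite3.Connection, external_id: str, title: str) -> None:
    """Insert a row, or update it in place when external_id already exists."""
    conn.execute(
        "INSERT INTO refs (external_id, title) VALUES (?, ?) "
        "ON CONFLICT(external_id) DO UPDATE SET title = excluded.title",
        (external_id, title),
    )


conn = open_demo_db()
upsert_ref(conn, "karakeep:42", "First import")
upsert_ref(conn, "karakeep:42", "Re-run with edited title")
print(conn.execute("SELECT COUNT(*), MAX(title) FROM refs").fetchone())
```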
### Auto-memory: one-way mirror (files stay primary)
Auto-memory files remain the source-of-truth — Claude Code's harness reads them
directly across sessions. A worker mirrors them into Void 2.0 as Pages under a
"Memory" Space:
- Mirror runs on file change (inotify) and nightly as safety net
- Pages get `external_id = file path`, idempotent upsert
- Edits in Void 2.0 UI flow back to files via a `::memory-update` marker
(same pattern Path B established)
- Auto-memory remains canonical; Void 2.0 view is searchable, MCP-readable,
visible in the UI
### Cutover: stand up alongside, big-bang switch with grace period
1. Build Void 2.0 on new LXCs (`void2-db`, `void2-app`) without touching CT 301
2. Run migration scripts (read-only access to BookStack + Karakeep + Void 1.x DBs)
3. Verify counts + spot-check content
4. **Cutover day:** swap `void.hynesy.com` CF tunnel target from CT 301 to
`void2-app`
5. **Grace period (30 days):** CT 301 stays read-only as fallback
6. **Retire CT 301:** snapshot, stop, rename `void2-*` LXCs to `void-*`
### Cron / scheduled task migration
Existing Void 1.x cron (Dross briefing, Yerin alerts, Little Blue heal, hourly
speedtest, Orthos council) ports directly to `void-server/lib/cron/tasks/`.
Same logic, same timing, against Void 2.0's data.
---
## Testing Approach [locked]
| Layer | Coverage | How |
|---|---|---|
| Unit | Repos, capability checks, helpers (slug gen, idempotency keys, embedding pad/truncate) | Node: vitest. Python: pytest. |
| Integration | REST + MCP tools against a test DB | Postgres-in-docker; schema applied from migrations; reset per test |
| E2E | Happy paths: create Space/Project, capture URL, search, approve pending change, attach ref | Playwright against running test instance |
| Manual (runbook'd) | Capture workers (Whisper, OCR), agent runtime (Claude subprocess + Ollama), CF Access flows | `docs/testing/manual.md` — too heavy or external for CI |
| Migration scripts | All `void-migrate` sub-commands | Fixture DBs for BookStack + Void 1.x + Karakeep; assert counts + spot-check content |
**Coverage target:** ~70% on `lib/` modules. Lower on routes/UI — covered by
integration + E2E instead. No coverage chasing.
**CI:** GitHub Actions if you mirror to a remote; local pre-push hook otherwise.
Runs unit + integration on every change to `void-server` or `void-workers`.
---
## Status / Lifecycle Model [locked]
| Entity | States | Transitions | Automation |
|---|---|---|---|
| Project | `idea`, `active`, `paused`, `done`, `abandoned` | Free (any-to-any) | None; manual |
| Task | `todo`, `doing`, `blocked`, `done` | Free | `done` sets `completed_at` |
| Resource | `running`, `stopped`, `down`, `unknown` | Auto + manual override | Health check cron updates; manual override pins until `maintenance_until` |
| Conversation | `open`, `summarized`, `archived` | Auto with overrides | `summarize.conversation` worker moves to `summarized` after 24h idle |
| Reference | `ingested`, `indexed`, `enriched` | Worker-driven | Pipeline: capture → FTS indexed → embedded + AI summary done |
| Pending Change | `pending`, `approved`, `rejected` | User-driven | None |
**Free transitions** everywhere user-facing. Homelab work is rarely linear; the
audit log captures every transition.
**Resource status reconciliation:** health check cron writes `status` and
`last_check`. Manual override during planned maintenance pins state until a
`maintenance_until` timestamp.
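
A sketch of the reconciliation rule, assuming the health check passes in the observed status and the current time. `reconcile_status` is a hypothetical name, and treating an expired `maintenance_until` as "no pin" is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def reconcile_status(current: str, observed: str, now: datetime,
                     maintenance_until: Optional[datetime]) -> str:
    """Keep the pinned status during maintenance; otherwise trust the check."""
    if maintenance_until is not None and now < maintenance_until:
        return current
    return observed


now = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
pin = now + timedelta(hours=2)

print(reconcile_status("stopped", "down", now, pin))   # pinned
print(reconcile_status("stopped", "down", now, None))  # health check wins
```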
---
## Pending Sections — to complete before this is plan-ready
(All sections locked. Spec ready for user review.)
---
## Decision Log
| Date | Decision | Why |
|---|---|---|
| 2026-05-30 | Foundation-first Void 2.0 over evolve-Void | Long-term HA requirement makes single-LXC SQLite a dead end |
| 2026-05-30 | 2 LXCs, planned-migration HA | User confirmed instant failover not needed |
| 2026-05-30 | Postgres + pgvector (no separate Qdrant) | Simpler — one DB does relational + vector |
| 2026-05-30 | Three-tier Space → Project → Task with sibling tasks | Matches how user organizes; allows ad-hoc TODOs |
| 2026-05-30 | Pages + References + Source Docs as three knowledge types | Authored vs captured vs upstream-mirrored are genuinely different |
| 2026-05-30 | Conversations first-class, attach to other entities | "Create project from chat" + AI needs prior conversation context |
| 2026-05-30 | Rich Resource entity (dependencies, creds refs, change history) | User wants real orchestrator, not just inventory |
| 2026-05-30 | Keep Karakeep as bookmark inbox; webhook into Void 2.0 | Karakeep works; building own is a deferred improvement |
| 2026-05-30 | Day-one capture: URLs, videos, PDFs, images, files | Full pipeline, no half-measures |
| 2026-05-30 | Agents: read+suggest default, per-agent tiered promotion | Balance usefulness with safety |
| 2026-05-30 | Greenfield Void 2.0 (Approach A), copy valuable bits from Void | Clean break from accumulated Void shape |
| 2026-05-31 | Two-process layout (Node server + Python workers) on one LXC | Right-tool-per-job; Python for ML, Node for API/UI/cron |
| 2026-05-31 | pg-boss job queue (not Redis/RabbitMQ) | Postgres is already there; one fewer service |
| 2026-05-31 | Skip Redis cache | DB isn't the bottleneck; Ollama/Whisper/OCR are. Reconsider only if profiling shows it. |
| 2026-05-31 | Audit log is append-only, polymorphic | One mechanism for change history + agent action tracking + pending-changes inbox |
| 2026-05-31 | `vector(1024)` everywhere with zero-padding for 768-dim embeds | Model swap is a re-embed pass, not a DDL migration |
| 2026-05-31 | Polymorphic `entity_links` over ~20 pairwise junction tables | Flexibility wins at this scale; periodic integrity check covers FK gap |
| 2026-05-31 | Single implicit user; audit columns ready for multi-user later | Multi-user is a non-breaking migration if ever needed |
| 2026-05-31 | MCP exposes task-oriented tools, not raw CRUD | Smaller surface for agents = safer + clearer semantics |
| 2026-05-31 | MCP supports both stdio + HTTP/SSE | Covers Claude Code (stdio) and network agents (HTTP) without bridges |
| 2026-05-31 | pg-boss with per-kind concurrency limits | GPU/CPU/network workloads have different parallelism needs |
| 2026-05-31 | Idempotency keys on all ingest jobs | Webhook replays + manual retries shouldn't duplicate content |
| 2026-05-31 | Content-addressed blob store; ZFS replicated via syncoid | Free dedup + your existing replication covers it |
| 2026-05-31 | Whisper concurrency stays at 1 | Conservative; tune after deploy if A2000 has headroom |
| 2026-05-31 | Three-column shell (sidebar / main / right-rail chat) | Matches orchestrator + chat-with-context workflow |
| 2026-05-31 | Sacred Valley kept as sidebar view, not landing page | Frees landing for last-viewed Space; dashboard still one click away |
| 2026-05-31 | Right-rail chat always visible, context-aware | Friction-free 'ask Mercy about this' across all views |
| 2026-05-31 | Universal capture button with AI Space/Project suggestion | One capture surface for all content kinds; reduces friction over per-page add-ref |
| 2026-05-31 | CF Access on UI + MCP-HTTP; LAN-direct for internal agents | Matches owner-via-internet + agent-on-LAN access patterns |
| 2026-05-31 | Env+file vault_path resolver day-one; Vaultwarden swap later | Pragmatic start; resolver swap doesn't change schema |
| 2026-05-31 | Agent tokens bcrypt-hashed, plaintext shown once | Standard bearer-token hygiene |
| 2026-05-31 | mTLS / field-level encryption deferred from v1 | Single-trust-domain LAN homelab; ZFS-at-rest covers it for now |
| 2026-05-31 | Renamed from "Codex" to **Void 2.0** | Preserve Cradle aesthetic + naming continuity from Void 1.x |
| 2026-05-31 | CHANGELOG.md (Keep a Changelog) + VERSION_HISTORY.md (narrative) | User wants major-version comparison + readable narrative archaeology |
| 2026-05-31 | Auto-memory: one-way mirror, files stay primary | Harness keeps working; knowledge stays unified |
| 2026-05-31 | Big-bang cutover with 30-day grace period on CT 301 | Minimal complexity; safety net against forgotten data |
| 2026-05-31 | Free state transitions; audit log records every change | Homelab work is rarely linear; don't over-validate |
| 2026-05-31 | Test coverage target ~70% on lib/, manual runbook for ML/agent flows | Where automation cost exceeds value, document instead |