# void-migrate — Design (Plan 8a)
**Date:** 2026-06-04 · **Component:** Void 2.0 · **Phase:** Plan 8 (Migration & Cutover), sub-project **8a** of 2 · **Status:** Approved (design)
## Goal
A re-runnable Node CLI that imports the data-bearing Void 1.x sources into Void 2 — preserving everything that would be lost when CT 301 / CT 104 / Karakeep retire — so the **8b cutover** (separate, gated) can swap `void.hynesy.com` to void2 and drop `-alpha`.
## Scope (locked)
- **In:** `void1-sqlite` (Void 1 core), `bookstack`, `karakeep`, `plans`.
- **Out:** schemas (already replaced by the DB-backed service registry), personas (now a code module), memory-mirror (a separate ongoing-sync subsystem; the memory files stay canonical on disk regardless).
- **8b cutover is out of this spec** — verify → tunnel swap → grace → retire CT 301 → rename → drop `-alpha` → tag `2.0.0`.
## Decisions (locked)
1. **Node CLI in the void-2 repo**, reusing the app's repos + validation (not Python-via-API).
2. **Idempotency via a `migration_map` table** — re-running never duplicates. (Refs already have `upsertByExternal` — bookmarks use it.)
3. **One Space per source:** `void1`, `wiki` (BookStack), `bookmarks` (Karakeep), `plans`.
4. **Read paths:** Void 1 via `node:sqlite` (Node 22 built-in, on a copied `void.db`); BookStack + Karakeep via their REST APIs (tokens); plans via the filesystem.
## Sources (verified)
- **Void 1 (CT 301, `/opt/void/data/void.db`, SQLite):** tables incl. `wiki_pages` (+`wiki_revisions`/`wiki_backlinks`), `projects` (+`project_tasks`, `project_journal`, `project_pages`), `conversations` + `messages`. (`mastra.db` = old agent layer, not migrated.)
- **BookStack (CT 104, MariaDB):** ~17 pages under books/chapters — pulled via the BookStack REST API.
- **Karakeep (CT 100, containerized + REST API + Meilisearch):** bookmarks — pulled via the Karakeep REST API.
- **Plans:** `/root/.claude/plans/*.md` on the dev host.
## Architecture
`migrate/` CLI: `node migrate <source> [--dry-run]` and `node migrate verify`. Each importer reads its origin, maps to Void 2 entities, writes through repos. Before creating a non-ref entity it checks `migration_map(source, source_id, entity_type)`; after creating, it records the new id. `--dry-run` logs intended writes without performing them.
```
node migrate void1 --dry-run # preview
node migrate void1 # import → Space 'void1'
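node migrate bookstack # import → Space 'wiki'
node migrate karakeep # import → Space 'bookmarks'
node migrate plans # import → Space 'plans'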
node migrate verify # source vs migrated counts per type
```
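
The check, create, record flow can be sketched as below. This is a minimal illustration, not the actual importer: `createMigrationMap` is a hypothetical in-memory stand-in for the `migration_map` repo, and `importOne` and its `create` callback are assumed names.

```javascript
// Hypothetical in-memory stand-in for the migration_map repo; the real one
// would be backed by the migration_map table.
function createMigrationMap() {
  const rows = new Map();
  const key = (source, srcId, type) => JSON.stringify([source, srcId, type]);
  return {
    seen: (source, srcId, type) => rows.get(key(source, srcId, type)) ?? null,
    record: (source, srcId, type, entityId) => {
      rows.set(key(source, srcId, type), entityId);
    },
  };
}

// Import one source row as one entity. `create` is an assumed callback that
// writes through a repo and returns the new entity id.
function importOne(map, { source, srcId, type, create, dryRun = false }) {
  const existing = map.seen(source, srcId, type);
  if (existing !== null) return { action: "skipped", entityId: existing };
  if (dryRun) {
    console.log(`[dry-run] would create ${type} for ${source}/${srcId}`);
    return { action: "would-create", entityId: null };
  }
  const entityId = create();
  map.record(source, srcId, type, entityId); // only after a successful create
  return { action: "created", entityId };
}

// Usage: the second run finds the map row and creates nothing.
const map = createMigrationMap();
let nextId = 1;
const job = { source: "void1", srcId: "wiki_pages:7", type: "page", create: () => nextId++ };
const first = importOne(map, job);
const second = importOne(map, job);
console.log(first.action, second.action);
```

Because the dry-run branch returns before `create` is called, a preview run leaves both the entities and the map untouched.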
## Components
| File | Responsibility |
|---|---|
| `lib/db/migrations/018_migration_map.sql` | `migration_map(id, source, source_id, entity_type, entity_id, created_at)`, `UNIQUE(source, source_id, entity_type)`. |
| `lib/db/repos/migration_map.js` | `seen(source, srcId, type)` → existing entity id or null; `record(source, srcId, type, entityId)`. |
| `migrate/cli.js` | Arg parse + dispatch (`void1`/`bookstack`/`karakeep`/`plans`/`verify`), `--dry-run`. |
| `migrate/spaces.js` | `ensureSpace(slug, name)` — idempotent by slug; returns space id. |
| `migrate/sources/void1.js` | Open copied `void.db` (`node:sqlite`); map `wiki_pages`→pages, `projects`→projects (+`project_tasks`→tasks, `project_journal`→pages), `conversations`+`messages`→conversations+messages. → `void1`. |
| `migrate/sources/bookstack.js` | BookStack API (`BOOKSTACK_URL`, `BOOKSTACK_TOKEN_ID`, `BOOKSTACK_TOKEN_SECRET`) → pages; book/chapter become parent pages. → `wiki`. |
| `migrate/sources/karakeep.js` | Karakeep API (`KARAKEEP_URL`, `KARAKEEP_TOKEN`) → refs `kind=url` via `refs.upsertByExternal` (`source_kind='karakeep'`, `external_id`=bookmark id). → `bookmarks`. |
| `migrate/sources/plans.js` | Read `PLANS_DIR/*.md` → pages (title = first `# heading` or filename; body = file). → `plans`. |
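
As a rough sketch of the arg parse in `migrate/cli.js`, assuming a hypothetical `parseArgs` helper that receives `process.argv.slice(2)`. The real CLI may differ, for example in whether `verify` accepts `--dry-run`.

```javascript
const COMMANDS = ["void1", "bookstack", "karakeep", "plans", "verify"];

// Parse the arguments that follow `node migrate`.
// `parseArgs` is an illustrative name, not taken from the repo.
function parseArgs(argv) {
  const flags = argv.filter((a) => a.startsWith("--"));
  const positional = argv.filter((a) => !a.startsWith("--"));
  const unknownFlag = flags.find((f) => f !== "--dry-run");
  if (unknownFlag) throw new Error(`Unknown flag: ${unknownFlag}`);
  if (positional.length !== 1 || !COMMANDS.includes(positional[0])) {
    throw new Error(`Usage: node migrate <${COMMANDS.join("|")}> [--dry-run]`);
  }
  return { command: positional[0], dryRun: flags.includes("--dry-run") };
}

console.log(parseArgs(["void1", "--dry-run"]));
console.log(parseArgs(["verify"]));
```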
## Field mappings
- **void1 `wiki_pages`** → page: `title`, `body_md` (from the content/markdown column), `slug` (slugified title), space `void1`. Map key `(void1, wiki_pages:<id>, page)`.
- **void1 `projects`** → project (`name`, `description`, `status`); `project_tasks`→tasks (`title`, `status`, `project_id`); `project_journal`→pages (journal entries as pages, linked by title).
- **void1 `conversations`/`messages`** → conversations (`title`, space `void1`) + messages (`role`, `body`, ordered).
- **bookstack page** → page (`title`, `body_md` from HTML→markdown or the raw markdown field, `parent_id` = the migrated book/chapter page). Map key `(bookstack, page:<id>, page)`.
- **karakeep bookmark** → ref (`kind='url'`, `url`, `title`, `source_kind='karakeep'`, `external_id`=bookmark id) via `upsertByExternal`.
- **plan file** → page (`title`, `body_md`=file contents). Map key `(plans, <filename>, page)`.
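
Two of these mappings, sketched as pure functions. The helper names, the exact slug rule, the `content` column name, and stripping `.md` for the filename fallback are all assumptions for illustration; the spec only fixes the target fields and map keys.

```javascript
// Assumed slug rule: lowercase, runs of non-alphanumerics become one hyphen.
function slugify(title) {
  return title
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// void1 wiki_pages row → page fields plus its migration_map key.
// `row.content` is an assumed column name.
function mapWikiPage(row) {
  return {
    page: { title: row.title, body_md: row.content, slug: slugify(row.title), space: "void1" },
    mapKey: { source: "void1", sourceId: `wiki_pages:${row.id}`, entityType: "page" },
  };
}

// Plan file → page fields: title is the first `# heading`, else the filename
// (extension stripped here as an assumption).
function mapPlanFile(filename, contents) {
  const match = contents.match(/^#[ \t]+(.+)$/m);
  const title = match ? match[1].trim() : filename.replace(/\.md$/, "");
  return {
    page: { title, body_md: contents, space: "plans" },
    mapKey: { source: "plans", sourceId: filename, entityType: "page" },
  };
}

console.log(mapWikiPage({ id: 3, title: "Home Lab: Network Notes!", content: "..." }));
console.log(mapPlanFile("cutover.md", "# Cutover Plan\n\nSteps..."));
```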
## Idempotency / dry-run / verify
Non-ref entities: `migration_map.seen` short-circuits a re-run. Refs: `upsertByExternal`. `--dry-run` performs reads + mapping but no writes (logs a per-type would-create count). `verify` queries each source for its counts and `migration_map`/refs for migrated counts, printing a per-source table so completeness is auditable before cutover.
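
One possible shape for the `verify` comparison, assuming both sides have already been reduced to `{ source: { type: count } }` objects. `buildVerifyRows` is a hypothetical helper and the counts are illustrative, not measured.

```javascript
// Build one row per (source, type), comparing source counts to migrated counts.
function buildVerifyRows(sourceCounts, migratedCounts) {
  const rows = [];
  for (const [source, types] of Object.entries(sourceCounts)) {
    for (const [type, expected] of Object.entries(types)) {
      const migrated = migratedCounts[source]?.[type] ?? 0;
      rows.push({ source, type, expected, migrated, ok: migrated === expected });
    }
  }
  return rows;
}

// Illustrative numbers only: one project is missing on the migrated side.
const verifyRows = buildVerifyRows(
  { void1: { page: 12, project: 3 }, bookmarks: { ref: 40 } },
  { void1: { page: 12, project: 2 }, bookmarks: { ref: 40 } }
);
console.table(verifyRows);
const complete = verifyRows.every((r) => r.ok);
console.log(complete ? "complete" : "incomplete");
```

Rows that were deliberately skipped as bad would also show up as a shortfall here, so the skipped counts from the import summary are worth reading alongside this table.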
## Error handling
A bad row is logged and skipped (migration continues); the summary reports skipped counts. Source connection failures (API/DB) abort that source with a clear message, leaving prior sources' results intact. No entity is left half-mapped: the map row is recorded only after a successful entity create, in the same code path.
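
The skip-versus-abort distinction might look like the following. `runSource`, `readRows`, and `importRow` are assumed names for this sketch, and the rows are made up.

```javascript
// Run one source: a failure to read the source aborts only this source;
// a bad row is logged, counted, and skipped.
function runSource(name, readRows, importRow) {
  const summary = { source: name, created: 0, skipped: 0, aborted: false };
  let rows;
  try {
    rows = readRows(); // connection/API failure surfaces here
  } catch (err) {
    console.error(`[${name}] aborted: ${err.message}`);
    summary.aborted = true;
    return summary;
  }
  for (const row of rows) {
    try {
      importRow(row); // creates the entity, then records the map row
      summary.created += 1;
    } catch (err) {
      console.error(`[${name}] skipped row ${row.id}: ${err.message}`);
      summary.skipped += 1;
    }
  }
  return summary;
}

// Usage with hypothetical rows: the row without a title is skipped,
// and the unreachable source is aborted without affecting the first run.
const importTitled = (row) => {
  if (!row.title) throw new Error("missing title");
};
const okRun = runSource("void1", () => [{ id: 1, title: "A" }, { id: 2 }], importTitled);
const deadRun = runSource("bookstack", () => { throw new Error("connection refused"); }, importTitled);
console.log(okRun, deadRun);
```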
## Testing (vitest, serial — `fileParallelism:false`)
- **`migration_map`**: `seen` null→id after `record`; unique prevents dupes.
- **`spaces.ensureSpace`**: creates once, reuses.
- **void1**: a fixture `void.db` (built in-test via `node:sqlite`) with a couple of wiki_pages/projects/tasks/conversations → assert mapped entities + **idempotency** (run importer twice → counts unchanged).
- **bookstack/karakeep**: injected fetch returning fixture API payloads → assert pages/refs created + idempotency.
- **plans**: a fixture `.md` dir → assert pages.
- **verify**: returns the expected per-source counts from seeded data.
## Out of scope (YAGNI)
Live two-way sync; migrating FTS/revisions/usage_log/sessions/api_tokens; HTML fidelity beyond a basic BookStack→markdown conversion; the memory mirror; the 8b cutover ops.