Commit Graph

239 Commits

Author SHA1 Message Date
root
65fd71dc0d fix(workers): yt-dlp argv injection — scheme check + -- separator
The url passed to yt-dlp is user-controllable (via /api/capture). Any
string starting with '-' would be parsed as a flag (e.g.
--config-location=/etc/passwd). Mitigations:
1. Validate scheme is http(s) and hostname is present before subprocess.
2. Pass `--` to yt-dlp so it stops flag parsing before the positional
   URL.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 10:11:57 +10:00
root
b10b68582d feat(api): capture routes YouTube/Vimeo URLs to ingest.video
POST /api/capture with a youtube.com / youtu.be / vimeo.com URL
enqueues ingest.video (Python worker) instead of ingest.url
(Node worker). Detection by URL hostname; idempotency_key + response
shape unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 10:08:16 +10:00
root
1ba7aae439 feat(workers): ingest.video via yt-dlp + Whisper
yt-dlp pulls metadata (title, description, uploader, thumbnail) and
bestaudio (opus). faster-whisper transcribes; audio file removed after.
Creates a refs row with kind='video' and source_kind='youtube' for
YouTube URLs, generic 'video' otherwise. Idempotent on
sha256(space_id + url) via refs.external_id.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 10:07:33 +10:00
root
e64f1345f6 feat(workers): whisper loader with CUDA detect + CPU fallback
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 10:06:50 +10:00
root
2adeae555d fix(deploy): push-workers.sh chowns + preserves .env
Rsync ran as root over SSH so files landed root-owned, but workers run
as voidworkers — the service couldn't even reach the venv binary.
Now: chown -R voidworkers after rsync, run venv create + pip install
under `su voidworkers -c`. Also excludes .env, .gitignore, .pytest_cache
so they survive across deploys.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 10:06:29 +10:00
root
3d82f0e5d5 feat(jobs): blob worker fans out to extract.pdf / extract.image
After creating a ref, the Node-side ingest.blob worker enqueues a
follow-up job for the Python void-workers (Plan 4) to OCR / extract
text. Other kinds (file) get no follow-up.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 09:34:06 +10:00
root
f2035c1de6 feat(workers): extract.image via Tesseract
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 05:00:21 +10:00
root
1f0e9a5f1b feat(workers): extract.pdf with Tesseract fallback
pdftotext first; falls back to per-page pdftoppm rasterization +
Tesseract OCR when the extracted text is < 200 chars. Updates
refs.body_text + metadata.extract.{method,chars} via the repo shim;
audit entry emitted with actor_kind='worker'.

born_digital.pdf fixture padded so pdftotext yields > 200 chars and
the test exercises the pdftotext path, not the OCR fallback.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 04:59:53 +10:00
root
bbb08a677e test(workers): pdf/image test fixtures
born_digital.pdf (pdftotext extractable), scanned.pdf (image-only, OCR
fallback target), eng_text.png (clear Tesseract-readable text).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 04:57:41 +10:00
root
2a6f7f88ef feat(workers): systemd unit + push-workers.sh
Deploy README extended with workers bootstrap + note on the void2-db
SQL_ASCII cluster requiring client_encoding=UTF8 on Python clients.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 04:46:58 +10:00
root
fba1ce48e4 feat(workers): runner loop + echo handler
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 04:43:52 +10:00
root
3e1dcbb7f8 feat(workers): pgboss claim/complete/fail via psycopg
Adds the Boss class — SELECT … FOR UPDATE SKIP LOCKED to atomically
claim, UPDATE state on completion. Retry semantics match pg-boss:
exponential backoff via retry_count / retry_delay / retry_backoff.

Forces client_encoding=UTF8 on every connection. The void2-db cluster
was initialized as SQL_ASCII so psycopg refuses to decode text by
default; UTF8 client_encoding works because the data is already UTF-8.
Node's pg lib is more forgiving and didn't surface this.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 04:43:26 +10:00
root
6e3798f6d1 feat(workers): Python skeleton + config + structlog
Plan 4 Phase A scaffolding. void-workers package at /workers/, sibling
of /lib/. pyproject.toml pins Python 3.12 with separate extras for
pdf / image / video / test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 04:41:33 +10:00
root
c4663992ec docs: Plan 4 implementation plan
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 04:39:55 +10:00
root
7514d9bee6 docs: Plan 4 design spec (Python void-workers)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 04:33:48 +10:00
root
54ba68a11c docs: move void-v2 specs + plans into the repo
All Void 2.0 superpowers specs and implementation plans now live at
docs/superpowers/{specs,plans}/ inside the repo. Previously they were
at /project/docs/superpowers/ which was not under git.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 04:11:32 +10:00
root
24ce601d94 fix(ingest): pinnedDispatcher lookup must use undici array form
cb(null, address, family) was returning Invalid IP address: undefined
under undici v6. Returning the full records array (each {address, family})
gives undici what it expects and lets it pick the best family.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 04:10:47 +10:00
root
837bf2a5b4 docs: Plan 3 completion summary
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 04:01:12 +10:00
root
a02a96ea5f chore: version 2.0.0-alpha.3 + changelog
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 04:00:32 +10:00
root
2ad4a32b3a feat(ui): Jobs panel with retry/delete + 10s polling
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:56:52 +10:00
root
063c29a835 feat(ui): drag-drop capture onto the main panel
Drops into #main POST /api/capture/upload one file at a time, with
space_id pre-filled from localStorage.last_space_id (set whenever the
space view renders).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:56:30 +10:00
root
d7f9bde5e9 feat(api): karakeep webhook (HMAC-verified)
POST /api/ingest/karakeep accepts Karakeep webhook payloads. HMAC
signature on the raw body captured by express.json's verify hook.
Mounted on app before mountApi so it bypasses agentOrOwner — the
shared secret IS the auth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:55:57 +10:00
root
d1e986bc9c feat(jobs): ingest.karakeep worker
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:55:03 +10:00
root
de1d7e3476 feat(karakeep): bookmark fetch client
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:54:21 +10:00
root
62ac022f65 test(ai): live ollama embed integration (gated)
Auto-skips when CT 102 / OLLAMA_URL is unreachable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:50:49 +10:00
root
f116811dda feat(search): hybrid FTS + vector with RRF + graceful Ollama fallback
Replaces FTS-only /api/search in place. RRF (k=60) fuses ts_rank and
pgvector cosine distance rankings. Vector branch silently skipped when
Ollama times out / errors, keeping search snappy and resilient.

Messages have no embeddings in Plan 3, so they participate in the FTS
branch only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:50:33 +10:00
root
99ab1ffb70 fix(ingest): pin resolved IP into safe_fetch to defeat DNS-rebinding
Replaces the validate-then-call-fetch pattern (which left a TOCTOU
window where the OS resolver could return a different IP at connect
time) with an undici Agent dispatcher whose lookup() returns the IP we
already validated. Same hardening on every redirect hop.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:48:52 +10:00
root
e558be49a9 feat(jobs): repo-level embed triggers (pages/refs/source_docs)
create/update on embeddable repos enqueue embed.text with a singleton
key that coalesces rapid edits. No-op when the queue is not running
(server tests construct createApp without booting pg-boss).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:48:03 +10:00
root
37b7753360 feat(jobs): embed.text worker (Ollama → vector(1024))
Pads nomic-embed-text's 768 dims to 1024 zeros so a later 1024-dim model
swap is a re-embed, not a migration (per master spec).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:43:57 +10:00
root
5799ea663e feat(ai): ollama embed-text wrapper
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:43:27 +10:00
root
afc20712cb feat(api): capture POST + upload + SSRF-safe URL fetch
safe_fetch.js validates URLs before fetch: rejects non-http(s), literal
or DNS-resolved loopback / RFC1918 / link-local / CGNAT / metadata
addresses; follows redirects manually with the same checks on each hop.
Test fixtures gate the check with VOID_INGEST_ALLOW_PRIVATE for offline
fixtures that hit 127.0.0.1.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:42:54 +10:00
root
eceebd2947 feat(jobs): ingest.blob worker (content-addressed)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:36:15 +10:00
root
3ccfd20b5f feat(jobs): ingest.url worker (fetch + readability + idempotent ref)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:35:44 +10:00
root
6e973404e9 feat(ingest): content-addressed blob store
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:35:06 +10:00
root
c6e72e93d5 feat(ingest): readability wrapper
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:34:51 +10:00
root
8d2afcd040 chore(deps): @mozilla/readability + jsdom + multer for ingest
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:34:35 +10:00
root
6d42c7b440 feat(ui): jobs view stub + sidebar entry
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:32:16 +10:00
root
ec8517a82c feat(api): jobs routes (list/get/retry/delete, owner-only)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:29:52 +10:00
root
57efa4cbaa feat(jobs): jobs repo (list/getById/retry/remove)
Unifies pgboss.job (current, per-queue partitioned) and pgboss.archive
under one SELECT for operator views. retry promotes archived rows back
into the active partition.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:29:03 +10:00
root
53ffd705c4 feat(jobs): echo worker + CLI bootstrap
Job queue starts only in the CLI gate (not inside createApp), so tests
manage their own queue lifecycle. waitForJob() takes a (name, id) pair
to match pg-boss v10's getJobById signature.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:28:06 +10:00
root
17a13dddb8 feat(jobs): pg-boss singleton client
Per-name ensureQueue promise dedup so concurrent enqueue+subscribe
on the same queue do not race createQueue (Postgres deadlock).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:26:37 +10:00
root
2d7c6993c2 chore(deps): add pg-boss ^10
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:24:22 +10:00
root
fa47419cbd docs: Plan 2 completion summary
22/22 tasks landed; 185 tests; 10 commits; SPA renders end-to-end
including the agent suggest -> owner approve flow. Captures the UI
smoke matrix, security findings handled, and what's deferred to
Plans 3-6.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 02:27:58 +10:00
root
8ae9bced24 chore: version 2.0.0-alpha.2 + changelog
Search view: read ?q from hash, call /api/search, group hits by kind
with rank + space_id; sidebar filters for kinds and space_id; updates
on Enter or filter change.

Bumps package.json + server.js VERSION to 2.0.0-alpha.2 and pins the
/health version assertion to match.

CHANGELOG: full Plan 2 entry covering API surface, capability tiering,
audit chain extension (approve/reject events), and the SPA shell.

Security: adds safeHref() to dom.js and applies it everywhere an
API-supplied URL becomes href / src (reference media block + reference
source_url anchor + resource url anchor). javascript: and other
non-http(s)/mailto schemes from agent-suggested content can no longer
execute in the owner's browser.

Plan 2 surface is feature-complete: 22/22 tasks landed, 185 tests
across 43 files, SPA renders end-to-end including the suggest -> approve
agent flow.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 02:26:56 +10:00
root
cd05dfd130 feat(ui): resource + inbox views
Resource: status header (status + runtime_type + host + clickable URL),
three-column row of Dependencies (with add-by-UUID form) / Source docs /
Runbook pages (filters /api/links/to/resource/:id for from_type=page
and relation=runbook, then fetches each page title). Change history
card pulls /api/resources/:id/changes.

Inbox: groups pending changes by agent. Each row shows entity type
badge + action + reason, with a collapsible payload disclosure.
Approve calls /api/pending-changes/:id/approve and, if the response
carries entity_id, navigates to the resulting detail view. Reject just
re-fetches. Both update the shared pending-count emitted to state.js
so the sidebar and topbar badges drop immediately.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 02:21:18 +10:00
root
ee582640ea feat(ui): page editor + reference detail
Page view: header + split-pane markdown editor (textarea on left,
marked + DOMPurify rendered preview on right) + backlinks card pulling
/api/pages/:id/backlinks. Save calls PATCH /api/pages/:id with body_md
and surfaces the resulting updated_at as a timestamp.

Reference detail: media block (image preview / YouTube embed via
youtube-nocookie / link fallback), summary card, metadata table, tags
card with attach/detach (creates the tag idempotently then attaches),
linked-from card from /api/links/to/ref/:id.

marked + DOMPurify vendored to public/vendor as ESM. The markdown
editor uses the explicit html: opt-in on dom.js's preview element
only — all other text comes from textContent.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 02:19:23 +10:00
root
ea5a99acff feat(ui): space + project + home views
Home: recent activity feed from /api/audit/actor?limit=20 with relative
timestamps and entity-typed links into detail views.

Space: header + three-column row of Projects / Open tasks (status=todo) /
Recent pages + refs cards. Status badges on projects and tasks use the
shared .status palette.

Project: header (status + start/complete dates), Tasks card with inline
status badges that cycle todo->doing->blocked->done on click (PATCH
/api/tasks/:id), Pages in space card, Add-task inline form bound to
project_id.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 02:17:16 +10:00
root
e5b2181ee3 feat(ui): sidebar + topbar + rightrail components
Sidebar: Spaces tree with lazy-expand to projects on caret click; bottom
Navigate section with Sacred Valley / Search / Inbox + placeholders for
Agents and Resources greyed out as later. Inbox item carries a
pending-count badge that wires to state.js so the topbar bell and the
sidebar share one poll.

Topbar: brand, + Capture button (modal stub for Plan 3 capture queue),
global search input (Enter -> /search?q=), pending Inbox bell with
matching badge, Owner toggle (stub for agent-switching post-Plan-2).

Rightrail remains the T17 collapsible companion placeholder.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 02:14:44 +10:00
root
59ad86425d feat(ui): static shell + router + api wrapper
Three-column grid (sidebar / main / right rail) with Cradle aesthetic:
blackflame accent on Cinzel display headings + Cormorant Garamond body
in cards, system UI for chrome. Hash-based router covers all entity
routes plus search, inbox, sacred-valley. api.js stores OWNER_TOKEN in
localStorage and prompts via a modal on 401. dom.js provides safe el()
+ mount() builders so no component ever assigns innerHTML from API data
(the only exception is an explicit, scary-named html: opt-in for
sanitizer output, used later by the markdown editor).

state.js is a tiny event bus for shared chrome state (pending count).
Components and views are loaded as ES modules — sidebar / topbar /
rightrail + 9 view stubs that the later Phase E tasks fill in.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 02:12:18 +10:00
root
69e26ada98 feat(api): unified FTS search
Single GET /api/search?q=&space_id=&kinds=&limit=&offset= unions FTS
hits across pages / refs / source_docs / messages with a `kind`
discriminator and ts_rank ordering. Each branch's to_tsvector matches
the GIN index expression on its source table so indexes are used.
Messages have no space_id and are excluded when a space filter is set.
Hybrid vector / RRF lands in Plan 3.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 02:04:57 +10:00