Commit Graph

200 Commits

Author SHA1 Message Date
root
d0d61575e3 feat(ai): vault_path secret resolver (env:/file:)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 18:04:21 +10:00
root
5f601c1a3c chore(deps): add @anthropic-ai/sdk for companion runtime
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 18:03:12 +10:00
root
31fb859fa4 docs(plan5): companion chat implementation plan (16 TDD tasks)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 18:01:01 +10:00
root
1cc2abf95c docs(plan5): companion chat design spec
Scope B (knowledge assistant + drafting via pending_changes approval chain),
lean Anthropic-SDK runtime (supersedes the top-level spec's Mastra wording),
extensible shared tool registry (search/read/propose_change/context), per-Space
ambient companion, SSE turn lifecycle, inline draft card synced with the Inbox,
structural prompt-injection containment. Ignore .superpowers/ brainstorm dir.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 17:49:08 +10:00
root
941df0d0d2 fix(deploy): point deploy targets at CT 311 new IP .216
Post-outage recovery: a rogue OpenWrt device seized 192.168.1.13 after the
power-cut reboot, ARP-poisoning the LAN so CT 311 was unreachable despite being
healthy. Renumbered CT 311 .13 -> .216 (out of the conflict-prone low range,
next to the DB at .215). Update push.sh + push-workers.sh defaults to
root@192.168.1.216; push.sh no longer defaults to the void2-app hostname (that
resolves to the Cloudflare tunnel and can't carry SSH).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-01 17:49:08 +10:00
root
6cba2e82da fix(deploy): exclude venv/ from push-workers rsync
The prod venv at /opt/void-workers/venv was being deleted on every
push because rsync --delete saw no matching dir in the source (which
has .venv/, not venv/). Now both names are excluded.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 11:04:21 +10:00
root
a8b2cddcf5 fix(workers): safe_fetch pins IP + manual redirect re-validation
Two real findings from the security reviewer:

1. urllib auto-follows 3xx redirects via the default HTTPRedirectHandler.
   The previous code's hop loop never ran — urllib silently followed.
   Replaced with http.client + a manual hop loop. Every hop re-runs
   _validate_url, so an open-redirect to 127.0.0.1 / RFC1918 / metadata
   gets caught on the second hop.

2. DNS TOCTOU — _resolve() validated but urllib.request re-resolved on
   connect. Now the connection is pinned to the validated IP via a
   PinnedHTTPConn / PinnedHTTPSConn subclass that overrides connect() to
   bind socket.create_connection to (addr, port). For HTTPS, TLS
   server_hostname is set to the original host so SNI + cert
   verification still work against the named host while the TCP
   destination is the pinned IP.

Tests added: redirect-to-loopback short-circuits at validation;
too-many-redirects exhausts max_hops; 2xx returns body; non-2xx raises.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 10:28:55 +10:00
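The two fixes in a8b2cddcf5 can be pictured with a minimal Python sketch. This is an illustration, not the repository's code: `validate_url`, `fetch`, the injected `resolve` and `request_once` callables, and the `PinnedHTTPConnection` name are all assumptions made for the example (the commit names `_validate_url`, `PinnedHTTPConn`, and `PinnedHTTPSConn`, whose bodies are not shown here).

```python
import http.client
import ipaddress
import socket
from urllib.parse import urljoin, urlsplit

REDIRECTS = {301, 302, 303, 307, 308}


class UnsafeURL(ValueError):
    """Raised when a URL fails the scheme or address checks."""


def validate_url(url, resolve):
    """Return (hostname, ip) if url is http(s) and resolves to a public IP."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise UnsafeURL(f"bad scheme or host: {url!r}")
    ip = ipaddress.ip_address(resolve(parts.hostname))
    if not ip.is_global:  # rejects loopback, RFC1918, link-local, CGNAT
        raise UnsafeURL(f"{parts.hostname} resolves to non-public {ip}")
    return parts.hostname, str(ip)


class PinnedHTTPConnection(http.client.HTTPConnection):
    """Dials a pre-validated IP while keeping `host` for the Host header."""

    def __init__(self, host, pinned_ip, **kwargs):
        super().__init__(host, **kwargs)
        self.pinned_ip = pinned_ip

    def connect(self):
        # No second DNS lookup: connect to the address that was validated.
        self.sock = socket.create_connection(
            (self.pinned_ip, self.port), self.timeout
        )


def fetch(url, resolve, request_once, max_hops=5):
    """Follow redirects manually, re-validating the target on every hop."""
    for _ in range(max_hops + 1):
        _host, ip = validate_url(url, resolve)
        status, location, body = request_once(url, ip)
        if status in REDIRECTS and location:
            url = urljoin(url, location)
            continue
        if 200 <= status < 300:
            return body
        raise RuntimeError(f"HTTP {status} from {url}")
    raise RuntimeError("too many redirects")
```

In this sketch `request_once(url, ip)` is a hypothetical callable that would open a `PinnedHTTPConnection` to `ip` and return `(status, location, body)`; injecting it keeps the hop loop testable without network access.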
root
7707b7eb00 chore: version 2.0.0-alpha.4 + changelog + plan-4 completion doc
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 10:25:31 +10:00
root
13fac102dd feat(cron): daily sync.source_doc enqueue
node-cron schedules runSync at 03:00 local time; runSync enqueues
sync.source_doc for every source_docs row with sync_source='url'.
Started from server.js's CLI gate alongside the job queue.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 10:14:07 +10:00
root
8fa7f71694 feat(workers): sync.source_doc with sha256 diff
Fetches upstream URL via safe_fetch, sha256-diffs against the prior
body_sha stored in metadata, updates body_text + last_synced only when
content changed. Unchanged syncs just touch last_synced.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 10:13:27 +10:00
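The sha256-diff decision described in 8fa7f71694 might look roughly like the sketch below. The function name `plan_sync` and the shape of the returned dict are assumptions for illustration; only the `body_sha`, `body_text`, and `last_synced` names come from the commit message.

```python
import hashlib
from datetime import datetime, timezone


def plan_sync(prior_sha, fetched_body, now=None):
    """Decide which fields to write after fetching the upstream body (bytes)."""
    now = now or datetime.now(timezone.utc)
    new_sha = hashlib.sha256(fetched_body).hexdigest()
    update = {"last_synced": now}  # always touched, changed or not
    if new_sha != prior_sha:
        # Content changed: refresh the text and remember the new digest.
        update["body_text"] = fetched_body.decode("utf-8", errors="replace")
        update["body_sha"] = new_sha
    return update
```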
root
cd1d69c689 feat(workers): safe_fetch Python port
Mirrors lib/ingest/safe_fetch.js. Same scheme + IP-range checks and
VOID_INGEST_ALLOW_PRIVATE env gate. Used by sync.source_doc and any
future Python workers that fetch user-controlled URLs.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 10:12:47 +10:00
root
65fd71dc0d fix(workers): yt-dlp argv injection — scheme check + -- separator
The url passed to yt-dlp is user-controllable (via /api/capture). Any
string starting with '-' would be parsed as a flag (e.g.
--config-location=/etc/passwd). Mitigations:
1. Validate scheme is http(s) and hostname is present before subprocess.
2. Pass `--` to yt-dlp so it stops flag parsing before the positional
   URL.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 10:11:57 +10:00
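A minimal sketch of the two mitigations in 65fd71dc0d, assuming a hypothetical helper named `build_ytdlp_argv`. The `--dump-json` flag is only a stand-in for whatever options the worker actually passes.

```python
from urllib.parse import urlsplit


def build_ytdlp_argv(url):
    """Build an argv list in which `url` can never be parsed as a flag."""
    parts = urlsplit(url)
    # Mitigation 1: require an http(s) scheme and a hostname.
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"refusing non-http(s) URL: {url!r}")
    # Mitigation 2: `--` ends option parsing, so the URL stays positional.
    return ["yt-dlp", "--dump-json", "--", url]
```

Passing the result to `subprocess.run(argv)` as a list (no shell) keeps the URL a single argument.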
root
b10b68582d feat(api): capture routes YouTube/Vimeo URLs to ingest.video
POST /api/capture with a youtube.com / youtu.be / vimeo.com URL
enqueues ingest.video (Python worker) instead of ingest.url
(Node worker). Detection by URL hostname; idempotency_key + response
shape unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 10:08:16 +10:00
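Hostname-based routing as described in b10b68582d could be sketched as follows. The route itself lives in the Node API, so this Python version is illustrative only; `queue_for` is a hypothetical name and matching subdomains such as `www.` is an assumption.

```python
from urllib.parse import urlsplit

VIDEO_HOSTS = {"youtube.com", "youtu.be", "vimeo.com"}


def queue_for(url):
    """Pick the job queue for a captured URL based on its hostname."""
    host = (urlsplit(url).hostname or "").lower()
    is_video = host in VIDEO_HOSTS or any(
        host.endswith("." + h) for h in VIDEO_HOSTS
    )
    return "ingest.video" if is_video else "ingest.url"
```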
root
1ba7aae439 feat(workers): ingest.video via yt-dlp + Whisper
yt-dlp pulls metadata (title, description, uploader, thumbnail) and
bestaudio (opus). faster-whisper transcribes; audio file removed after.
Creates a refs row with kind='video' and source_kind='youtube' for
YouTube URLs, generic 'video' otherwise. Idempotent on
sha256(space_id + url) via refs.external_id.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 10:07:33 +10:00
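The idempotency key in 1ba7aae439 is stated as sha256(space_id + url). A small sketch, assuming plain string concatenation and a UTF-8 encoding, with an in-memory dict standing in for the refs table:

```python
import hashlib


def video_external_id(space_id, url):
    """Derive a stable key from the Space and the URL (assumed: plain concat)."""
    return hashlib.sha256(f"{space_id}{url}".encode("utf-8")).hexdigest()


def upsert_ref(refs, space_id, url, row):
    """Insert `row` once per (space_id, url); later calls return the first row."""
    key = video_external_id(space_id, url)
    return refs.setdefault(key, row)
```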
root
e64f1345f6 feat(workers): whisper loader with CUDA detect + CPU fallback
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 10:06:50 +10:00
root
2adeae555d fix(deploy): push-workers.sh chowns + preserves .env
Rsync ran as root over SSH so files landed root-owned, but workers run
as voidworkers — the service couldn't even reach the venv binary.
Now: chown -R voidworkers after rsync, run venv create + pip install
under `su voidworkers -c`. Also excludes .env, .gitignore, .pytest_cache
so they survive across deploys.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 10:06:29 +10:00
root
3d82f0e5d5 feat(jobs): blob worker fans out to extract.pdf / extract.image
After creating a ref, the Node-side ingest.blob worker enqueues a
follow-up job for the Python void-workers (Plan 4) to OCR / extract
text. Other kinds (file) get no follow-up.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 09:34:06 +10:00
root
f2035c1de6 feat(workers): extract.image via Tesseract
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 05:00:21 +10:00
root
1f0e9a5f1b feat(workers): extract.pdf with Tesseract fallback
pdftotext first; falls back to per-page pdftoppm rasterization +
Tesseract OCR when the extracted text is < 200 chars. Updates
refs.body_text + metadata.extract.{method,chars} via the repo shim;
audit entry emitted with actor_kind='worker'.

born_digital.pdf fixture padded so pdftotext yields > 200 chars and
the test exercises the pdftotext path, not the OCR fallback.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 04:59:53 +10:00
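The pdftotext-then-OCR fallback in 1f0e9a5f1b can be sketched with the two extractors injected as callables. `extract_pdf`, `run_pdftotext`, `run_ocr`, and the method labels are hypothetical; the 200-character threshold and the `metadata.extract.{method,chars}` shape come from the commit message.

```python
MIN_CHARS = 200


def extract_pdf(path, run_pdftotext, run_ocr, min_chars=MIN_CHARS):
    """Try the text layer first; fall back to OCR when it yields too little."""
    text = run_pdftotext(path)
    method = "pdftotext"
    if len(text) < min_chars:
        # Likely a scanned, image-only PDF: rasterize and OCR instead.
        text = run_ocr(path)
        method = "tesseract"
    return {"body_text": text, "extract": {"method": method, "chars": len(text)}}
```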
root
bbb08a677e test(workers): pdf/image test fixtures
born_digital.pdf (pdftotext extractable), scanned.pdf (image-only, OCR
fallback target), eng_text.png (clear Tesseract-readable text).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 04:57:41 +10:00
root
2a6f7f88ef feat(workers): systemd unit + push-workers.sh
Deploy README extended with workers bootstrap + note on the void2-db
SQL_ASCII cluster requiring client_encoding=UTF8 on Python clients.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 04:46:58 +10:00
root
fba1ce48e4 feat(workers): runner loop + echo handler
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 04:43:52 +10:00
root
3e1dcbb7f8 feat(workers): pgboss claim/complete/fail via psycopg
Adds the Boss class — SELECT … FOR UPDATE SKIP LOCKED to atomically
claim, UPDATE state on completion. Retry semantics match pg-boss:
exponential backoff via retry_count / retry_delay / retry_backoff.

Forces client_encoding=UTF8 on every connection. The void2-db cluster
was initialized as SQL_ASCII so psycopg refuses to decode text by
default; UTF8 client_encoding works because the data is already UTF-8.
Node's pg lib is more forgiving and didn't surface this.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 04:43:26 +10:00
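For readers unfamiliar with the claim pattern in 3e1dcbb7f8, here is a hedged sketch. The table and column names in `CLAIM_SQL` are placeholders rather than the actual pg-boss schema, and `next_delay` shows plain doubling; pg-boss's real backoff formula may differ (for example by adding jitter).

```python
# Placeholder schema: a generic `jobs` table, not pgboss.job.
CLAIM_SQL = """
UPDATE jobs
   SET state = 'active', started_on = now()
 WHERE id = (
         SELECT id FROM jobs
          WHERE name = %s AND state = 'created' AND start_after <= now()
          ORDER BY created_on
          LIMIT 1
          FOR UPDATE SKIP LOCKED
       )
RETURNING id, data
"""


def next_delay(retry_count, retry_delay, retry_backoff):
    """Seconds to wait before the next attempt (assumed: simple doubling)."""
    if not retry_backoff:
        return retry_delay
    return retry_delay * (2 ** retry_count)
```

`SKIP LOCKED` lets concurrent workers run the same statement without blocking each other: a row already locked by another worker is skipped rather than waited on.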
root
6e3798f6d1 feat(workers): Python skeleton + config + structlog
Plan 4 Phase A scaffolding. void-workers package at /workers/, sibling
of /lib/. pyproject.toml pins Python 3.12 with separate extras for
pdf / image / video / test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 04:41:33 +10:00
root
c4663992ec docs: Plan 4 implementation plan
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 04:39:55 +10:00
root
7514d9bee6 docs: Plan 4 design spec (Python void-workers)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 04:33:48 +10:00
root
54ba68a11c docs: move void-v2 specs + plans into the repo
All Void 2.0 superpowers specs and implementation plans now live at
docs/superpowers/{specs,plans}/ inside the repo. Previously they were
at /project/docs/superpowers/ which was not under git.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 04:11:32 +10:00
root
24ce601d94 fix(ingest): pinnedDispatcher lookup must use undici array form
Calling cb(null, address, family) failed with "Invalid IP address: undefined"
under undici v6. Returning the full records array (each {address, family})
gives undici what it expects and lets it pick the best family.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 04:10:47 +10:00
root
837bf2a5b4 docs: Plan 3 completion summary
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 04:01:12 +10:00
root
a02a96ea5f chore: version 2.0.0-alpha.3 + changelog
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 04:00:32 +10:00
root
2ad4a32b3a feat(ui): Jobs panel with retry/delete + 10s polling
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:56:52 +10:00
root
063c29a835 feat(ui): drag-drop capture onto the main panel
Files dropped onto #main are POSTed to /api/capture/upload one at a time, with
space_id pre-filled from localStorage.last_space_id (set whenever the
space view renders).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:56:30 +10:00
root
d7f9bde5e9 feat(api): karakeep webhook (HMAC-verified)
POST /api/ingest/karakeep accepts Karakeep webhook payloads. HMAC
signature on the raw body captured by express.json's verify hook.
Mounted on app before mountApi so it bypasses agentOrOwner — the
shared secret IS the auth.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:55:57 +10:00
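HMAC verification over the raw request body, as in d7f9bde5e9, might be sketched like this. The real handler is Express middleware; this Python version is illustrative, and the choice of SHA-256 with a hex-encoded signature is an assumption, since the commit does not state the algorithm or header format.

```python
import hashlib
import hmac


def verify_signature(secret, raw_body, signature_hex):
    """Return True if signature_hex is the HMAC-SHA256 of raw_body under secret."""
    expected = hmac.new(secret, raw_body, hashlib.sha256).hexdigest()
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), signature_hex.encode())
```

The signature has to be computed over the exact bytes received, which is why the commit captures the raw body before JSON parsing.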
root
d1e986bc9c feat(jobs): ingest.karakeep worker
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:55:03 +10:00
root
de1d7e3476 feat(karakeep): bookmark fetch client
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:54:21 +10:00
root
62ac022f65 test(ai): live ollama embed integration (gated)
Auto-skips when CT 102 / OLLAMA_URL is unreachable.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:50:49 +10:00
root
f116811dda feat(search): hybrid FTS + vector with RRF + graceful Ollama fallback
Replaces FTS-only /api/search in place. RRF (k=60) fuses ts_rank and
pgvector cosine distance rankings. Vector branch silently skipped when
Ollama times out / errors, keeping search snappy and resilient.

Messages have no embeddings in Plan 3, so they participate in the FTS
branch only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:50:33 +10:00
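Reciprocal Rank Fusion with k=60, as named in f116811dda, can be illustrated in a few lines. `rrf_fuse` is a hypothetical helper operating on in-memory lists; the actual search presumably fuses the rankings in SQL or in the Node layer.

```python
def rrf_fuse(rankings, k=60):
    """Fuse ranked lists of ids: score(d) = sum over lists of 1 / (k + rank)."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], str(d)))
```

When the vector branch is skipped, passing an empty list for it leaves the FTS order unchanged, which matches the graceful-fallback behaviour the commit describes.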
root
99ab1ffb70 fix(ingest): pin resolved IP into safe_fetch to defeat DNS-rebinding
Replaces the validate-then-call-fetch pattern (which left a TOCTOU
window where the OS resolver could return a different IP at connect
time) with an undici Agent dispatcher whose lookup() returns the IP we
already validated. Same hardening on every redirect hop.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:48:52 +10:00
root
e558be49a9 feat(jobs): repo-level embed triggers (pages/refs/source_docs)
create/update on embeddable repos enqueue embed.text with a singleton
key that coalesces rapid edits. No-op when the queue is not running
(server tests construct createApp without booting pg-boss).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:48:03 +10:00
root
37b7753360 feat(jobs): embed.text worker (Ollama → vector(1024))
Zero-pads nomic-embed-text's 768 dims to 1024 so a later 1024-dim model
swap is a re-embed, not a migration (per master spec).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:43:57 +10:00
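The padding step in 37b7753360 is small enough to sketch directly. `pad_embedding` is a hypothetical name; the 768 and 1024 dimensions come from the commit message.

```python
def pad_embedding(vec, target_dim=1024):
    """Right-pad an embedding with zeros so it fits a vector(target_dim) column."""
    if len(vec) > target_dim:
        raise ValueError(f"embedding has {len(vec)} dims, max is {target_dim}")
    return list(vec) + [0.0] * (target_dim - len(vec))
```

Appending zeros changes neither the dot product nor the norms of two padded vectors, so cosine distance between them is the same as between the originals.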
root
5799ea663e feat(ai): ollama embed-text wrapper
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:43:27 +10:00
root
afc20712cb feat(api): capture POST + upload + SSRF-safe URL fetch
safe_fetch.js validates URLs before fetch: rejects non-http(s), literal
or DNS-resolved loopback / RFC1918 / link-local / CGNAT / metadata
addresses; follows redirects manually with the same checks on each hop.
Test fixtures gate the check with VOID_INGEST_ALLOW_PRIVATE for offline
fixtures that hit 127.0.0.1.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:42:54 +10:00
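The address classes rejected in afc20712cb can be written out as an explicit network list. This is a Python illustration rather than a transcription of safe_fetch.js: `is_blocked` is a hypothetical name, the exact network list is an assumption based on the categories the commit names, and treating `VOID_INGEST_ALLOW_PRIVATE=1` as the enabling value is also an assumption.

```python
import ipaddress
import os

BLOCKED = [ipaddress.ip_network(n) for n in (
    "127.0.0.0/8",      # loopback
    "10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16",  # RFC1918
    "169.254.0.0/16",   # link-local, includes 169.254.169.254 metadata
    "100.64.0.0/10",    # CGNAT
    "::1/128", "fc00::/7", "fe80::/10",  # IPv6 loopback, ULA, link-local
)]


def is_blocked(addr, env=os.environ):
    """Return True if addr falls in a range that fetches must not reach."""
    if env.get("VOID_INGEST_ALLOW_PRIVATE") == "1":
        return False  # test-only escape hatch for offline fixtures
    ip = ipaddress.ip_address(addr)
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped  # unwrap ::ffff:a.b.c.d before checking
    return any(ip in net for net in BLOCKED)
```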
root
eceebd2947 feat(jobs): ingest.blob worker (content-addressed)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:36:15 +10:00
root
3ccfd20b5f feat(jobs): ingest.url worker (fetch + readability + idempotent ref)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:35:44 +10:00
root
6e973404e9 feat(ingest): content-addressed blob store
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:35:06 +10:00
root
c6e72e93d5 feat(ingest): readability wrapper
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:34:51 +10:00
root
8d2afcd040 chore(deps): @mozilla/readability + jsdom + multer for ingest
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:34:35 +10:00
root
6d42c7b440 feat(ui): jobs view stub + sidebar entry
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:32:16 +10:00
root
ec8517a82c feat(api): jobs routes (list/get/retry/delete, owner-only)
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:29:52 +10:00
root
57efa4cbaa feat(jobs): jobs repo (list/getById/retry/remove)
Unifies pgboss.job (current, per-queue partitioned) and pgboss.archive
under one SELECT for operator views. retry promotes archived rows back
into the active partition.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-06-01 03:29:03 +10:00