- driver: tool_results arrive as type:'user' content blocks (not bare); parse them
- route: tool_result content is a JSON string; parse it for pending_change_id → draft event
- propose_change: inject ctx.space_id into create payloads (model can't know the uuid; tables require it)
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
claude resolves the MCP server command against the child env (no PATH), so a
bare 'node' failed to spawn (status:failed). Use process.execPath. Also pass
--tools to drop claude's built-ins (Bash/Read/Write/…) — companion gets only
the four mcp__void__* tools.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replaces the runTurn/callModel/Anthropic-API-key path in POST /turn with
runClaudeTurn (claude CLI) backed by a per-turn MCP config that spawns
companion-stdio.js. Extracts pending_change_id from tool_result events
defensively (structuredContent → text-JSON fallback). Rewrites companion
test to inject fake-claude-draft.js via app.locals.claudeExe.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
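A minimal sketch of the defensive extraction described in the commit above (structuredContent first, then a text-JSON fallback). It is written in Python for illustration only; the route itself is Node. The event shape is an assumption: a dict with an optional `structuredContent` object and a `content` field that is either a JSON string or a list of `{type: 'text', text: ...}` blocks. `extract_pending_change_id` is a hypothetical name.

```python
import json


def extract_pending_change_id(tool_result):
    """Return pending_change_id from a tool_result event, or None."""
    # Preferred source: a structured payload, if the event carries one.
    structured = tool_result.get("structuredContent")
    if isinstance(structured, dict) and structured.get("pending_change_id"):
        return structured["pending_change_id"]

    # Fallback: text content that is itself JSON.
    # Assumed shapes: a bare JSON string, or a list of text blocks.
    content = tool_result.get("content")
    if isinstance(content, str):
        candidates = [content]
    elif isinstance(content, list):
        candidates = [
            block.get("text")
            for block in content
            if isinstance(block, dict) and block.get("type") == "text"
        ]
    else:
        candidates = []

    for text in candidates:
        if not isinstance(text, str):
            continue
        try:
            parsed = json.loads(text)
        except json.JSONDecodeError:
            continue  # not JSON: skip it rather than fail the turn
        if isinstance(parsed, dict) and parsed.get("pending_change_id"):
            return parsed["pending_change_id"]
    return None
```

Returning None instead of raising keeps a malformed tool_result from aborting the turn; the caller simply emits no draft event.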
Implements runClaudeTurn() — spawns the claude CLI for a single companion
turn using subscription/OAuth auth (strips ANTHROPIC_API_KEY +
ANTHROPIC_AUTH_TOKEN from child env), streaming normalised events (delta,
tool, tool_result, result, error) via onEvent callback.
Includes hermetic test + fake-claude.js fixture that mimics real 2.1.159
stream-json output; zero network/CLI calls in the test suite.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
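The env stripping described in the commit above can be sketched as follows. This is a Python illustration of the idea, not the Node implementation; `build_child_env` is a hypothetical helper name, and only the two variable names come from the commit message.

```python
import os

# Variables removed so the child falls back to subscription/OAuth auth.
STRIPPED_VARS = ("ANTHROPIC_API_KEY", "ANTHROPIC_AUTH_TOKEN")


def build_child_env(parent_env=None):
    """Return a copy of the parent environment without API-key credentials."""
    env = dict(os.environ if parent_env is None else parent_env)
    for name in STRIPPED_VARS:
        env.pop(name, None)  # absent variables are fine
    return env
```

The result would be passed as the `env` argument when spawning the child process, so the parent's own environment is left untouched.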
Post-outage recovery: a rogue OpenWrt device seized 192.168.1.13 after the
power-cut reboot, ARP-poisoning the LAN so CT 311 was unreachable despite being
healthy. Renumbered CT 311 .13 -> .216 (out of the conflict-prone low range,
next to the DB at .215). Update push.sh + push-workers.sh defaults to
root@192.168.1.216; push.sh no longer defaults to the void2-app hostname (that
resolves to the Cloudflare tunnel and can't carry SSH).
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The prod venv at /opt/void-workers/venv was being deleted on every
push because rsync --delete saw no matching dir in the source (which
has .venv/, not venv/). Now both names are excluded.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Two real findings from the security reviewer:
1. urllib auto-follows 3xx redirects via the default HTTPRedirectHandler.
The previous code's hop loop never ran — urllib silently followed.
Replaced with http.client + a manual hop loop. Every hop re-runs
_validate_url, so an open-redirect to 127.0.0.1 / RFC1918 / metadata
gets caught on the second hop.
2. DNS TOCTOU — _resolve() validated but urllib.request re-resolved on
connect. Now the connection is pinned to the validated IP via a
PinnedHTTPConn / PinnedHTTPSConn subclass that overrides connect() to
bind socket.create_connection to (addr, port). For HTTPS, TLS
server_hostname is set to the original host so SNI + cert
verification still work against the named host while the TCP
destination is the pinned IP.
Tests added: redirect-to-loopback short-circuits at validation;
too-many-redirects exhausts max_hops; 2xx returns body; non-2xx raises.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
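A sketch of the two fixes in the commit above, under stated assumptions: every hop is re-validated before it is requested, and the HTTPS connection dials the validated IP while presenting the original hostname for SNI and certificate checks. The names `validate_url`, `fetch`, `send`, and `resolve` are hypothetical; the set of blocked ranges and the exact meaning of `max_hops` are guesses, not the project's actual rules. `send(url, ip)` stands in for the real request so the loop can be exercised without a network.

```python
import http.client
import ipaddress
import socket
import ssl
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


class UnsafeURL(ValueError):
    pass


def validate_url(url, resolve):
    """Return the validated IP for url, or raise UnsafeURL."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise UnsafeURL(f"bad scheme or host: {url!r}")
    addr = ipaddress.ip_address(resolve(parts.hostname))
    if (addr.is_private or addr.is_loopback or addr.is_link_local
            or addr.is_multicast or addr.is_reserved or addr.is_unspecified):
        raise UnsafeURL(f"{parts.hostname} resolves to blocked address {addr}")
    return str(addr)


class PinnedHTTPSConnection(http.client.HTTPSConnection):
    """HTTPS connection whose TCP destination is a pre-validated IP."""

    def __init__(self, host, pinned_ip, port=None, timeout=10):
        self._ctx = ssl.create_default_context()
        super().__init__(host, port=port, timeout=timeout, context=self._ctx)
        self.pinned_ip = pinned_ip

    def connect(self):
        # Dial the pinned IP, but verify TLS against the named host.
        sock = socket.create_connection((self.pinned_ip, self.port), self.timeout)
        self.sock = self._ctx.wrap_socket(sock, server_hostname=self.host)


def fetch(url, send, resolve, max_hops=5):
    """Follow redirects manually, validating every hop before requesting it."""
    for _ in range(max_hops + 1):
        ip = validate_url(url, resolve)
        status, headers, body = send(url, ip)
        if status in REDIRECT_CODES:
            location = headers.get("Location")
            if not location:
                raise RuntimeError("redirect without Location header")
            url = urljoin(url, location)  # handles relative redirects
            continue
        if 200 <= status < 300:
            return body
        raise RuntimeError(f"HTTP {status}")
    raise RuntimeError("too many redirects")
```

Because validation happens at the top of the loop, a redirect to a loopback or private address fails before any request is sent to it.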
node-cron schedules runSync at 03:00 local time; runSync enqueues
sync.source_doc for every source_docs row with sync_source='url'.
Started from server.js's CLI gate alongside the job queue.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Fetches upstream URL via safe_fetch, sha256-diffs against the prior
body_sha stored in metadata, updates body_text + last_synced only when
content changed. Unchanged syncs just touch last_synced.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
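The change-detection step in the commit above might look roughly like this. It is a sketch that operates on a plain dict rather than a database row; `apply_sync` is a hypothetical name, while `body_text`, `last_synced`, and `metadata.body_sha` follow the commit message.

```python
import hashlib
from datetime import datetime, timezone


def apply_sync(row, fetched_body, now=None):
    """Return (updated_row, changed) for a freshly fetched body."""
    now = now or datetime.now(timezone.utc)
    metadata = row.get("metadata") or {}
    new_sha = hashlib.sha256(fetched_body.encode("utf-8")).hexdigest()
    changed = metadata.get("body_sha") != new_sha

    updated = dict(row, last_synced=now)  # always touch last_synced
    if changed:
        updated["body_text"] = fetched_body
        updated["metadata"] = {**metadata, "body_sha": new_sha}
    return updated, changed
```

Comparing digests rather than bodies means only the previous hash has to be read from metadata to decide whether a write is needed.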
Mirrors lib/ingest/safe_fetch.js. Same scheme + IP-range checks and
VOID_INGEST_ALLOW_PRIVATE env gate. Used by sync.source_doc and any
future Python workers that fetch user-controlled URLs.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
The url passed to yt-dlp is user-controllable (via /api/capture). Any
string starting with '-' would be parsed as a flag (e.g.
--config-location=/etc/passwd). Mitigations:
1. Validate scheme is http(s) and hostname is present before subprocess.
2. Pass `--` to yt-dlp so it stops flag parsing before the positional
URL.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
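Both mitigations from the commit above fit in a small argv builder. This is a sketch: `build_ytdlp_argv` and its `flags` parameter are hypothetical, and the worker's real flag set is not shown here.

```python
from urllib.parse import urlsplit


def build_ytdlp_argv(url, flags=()):
    """Build a yt-dlp argv in which the URL can never be read as a flag."""
    parts = urlsplit(url)
    # Mitigation 1: only http(s) URLs with a hostname reach the subprocess.
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"refusing non-http(s) URL: {url!r}")
    # Mitigation 2: "--" ends option parsing before the positional URL.
    return ["yt-dlp", *flags, "--", url]
```

Passing the resulting list to `subprocess.run` without `shell=True` also keeps the URL out of shell interpretation.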
POST /api/capture with a youtube.com / youtu.be / vimeo.com URL
enqueues ingest.video (Python worker) instead of ingest.url
(Node worker). Detection by URL hostname; idempotency_key + response
shape unchanged.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
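Hostname-based routing as described in the commit above could be sketched like this. The route itself lives in Node, so this Python version is illustrative only; `job_name_for_url` is a hypothetical name, and treating subdomains such as `www.youtube.com` as matches is an assumption the commit does not confirm.

```python
from urllib.parse import urlsplit

VIDEO_HOSTS = ("youtube.com", "youtu.be", "vimeo.com")


def job_name_for_url(url):
    """Pick the queue name from the URL's hostname."""
    host = (urlsplit(url).hostname or "").lower()
    for video_host in VIDEO_HOSTS:
        # Assumption: subdomains (www., m., ...) count as the same site.
        if host == video_host or host.endswith("." + video_host):
            return "ingest.video"
    return "ingest.url"
```

Matching on the parsed hostname, rather than searching the raw string, avoids misrouting URLs that merely mention a video site in their path or query.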
yt-dlp pulls metadata (title, description, uploader, thumbnail) and
bestaudio (opus). faster-whisper transcribes the audio, and the audio
file is removed afterwards.
Creates a refs row with kind='video' and source_kind='youtube' for
YouTube URLs, generic 'video' otherwise. Idempotent on
sha256(space_id + url) via refs.external_id.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
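The idempotency key from the commit above is cheap to compute. A sketch, assuming plain string concatenation with no separator (the commit only says `sha256(space_id + url)`) and UTF-8 encoding; `video_external_id` is a hypothetical name.

```python
import hashlib


def video_external_id(space_id, url):
    """Deterministic key stored in refs.external_id to deduplicate ingests."""
    # Assumption: space_id and url are concatenated as-is, then UTF-8 encoded.
    return hashlib.sha256((space_id + url).encode("utf-8")).hexdigest()
```

Re-submitting the same URL in the same space yields the same key, so the worker can detect the existing refs row instead of creating a duplicate.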
Rsync ran as root over SSH so files landed root-owned, but workers run
as voidworkers — the service couldn't even reach the venv binary.
Now: chown -R voidworkers after rsync, run venv create + pip install
under `su voidworkers -c`. Also excludes .env, .gitignore, .pytest_cache
so they survive across deploys.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
After creating a ref, the Node-side ingest.blob worker enqueues a
follow-up job for the Python void-workers (Plan 4) to OCR / extract
text. Other kinds (file) get no follow-up.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
pdftotext first; falls back to per-page pdftoppm rasterization +
Tesseract OCR when the extracted text is < 200 chars. Updates
refs.body_text + metadata.extract.{method,chars} via the repo shim;
audit entry emitted with actor_kind='worker'.
born_digital.pdf fixture padded so pdftotext yields > 200 chars and
the test exercises the pdftotext path, not the OCR fallback.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
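The fallback logic in the commit above can be sketched as a function with injectable extractors. The 200-character threshold comes from the commit; the `method` values ("pdftotext", "ocr"), the function names, and the `ocr` callable (standing in for the pdftoppm + Tesseract pipeline) are assumptions for illustration.

```python
import subprocess

MIN_TEXT_CHARS = 200  # below this, the PDF is treated as scanned


def run_pdftotext(pdf_path):
    """Run poppler's pdftotext, writing the extracted text to stdout ('-')."""
    done = subprocess.run(
        ["pdftotext", pdf_path, "-"],
        check=True, capture_output=True, text=True,
    )
    return done.stdout


def extract_text(pdf_path, pdftotext=run_pdftotext, ocr=None):
    """Try the text layer first; fall back to OCR when it is too short."""
    text = pdftotext(pdf_path)
    method = "pdftotext"
    if len(text) < MIN_TEXT_CHARS and ocr is not None:
        text = ocr(pdf_path)  # hypothetical: rasterize pages, run Tesseract
        method = "ocr"
    return {"body_text": text, "extract": {"method": method, "chars": len(text)}}
```

Injecting the extractors lets a test cover both branches without poppler or Tesseract installed, which is the same reason the born-digital fixture has to exceed the threshold.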
Deploy README extended with workers bootstrap + note on the void2-db
SQL_ASCII cluster requiring client_encoding=UTF8 on Python clients.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds the Boss class — SELECT … FOR UPDATE SKIP LOCKED to atomically
claim, UPDATE state on completion. Retry semantics match pg-boss:
exponential backoff via retry_count / retry_delay / retry_backoff.
Forces client_encoding=UTF8 on every connection. The void2-db cluster
was initialized as SQL_ASCII so psycopg refuses to decode text by
default; UTF8 client_encoding works because the data is already UTF-8.
Node's pg lib is more forgiving and didn't surface this.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
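For readers unfamiliar with the claim pattern in the commit above, here is a sketch of the two pieces: an atomic claim query and a backoff calculation. The table and column names in `CLAIM_SQL` are hypothetical, not the project's schema, and the backoff is a simplified doubling rule; pg-boss's exact formula (for example any jitter) is not reproduced here.

```python
# Hypothetical schema: job(id, name, state, start_after, created_on, data).
CLAIM_SQL = """
WITH next_job AS (
    SELECT id
    FROM job
    WHERE name = %(name)s
      AND state IN ('created', 'retry')
      AND start_after <= now()
    ORDER BY created_on
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
UPDATE job
SET state = 'active'
FROM next_job
WHERE job.id = next_job.id
RETURNING job.id, job.data
"""


def next_retry_delay(retry_count, retry_delay, retry_backoff):
    """Seconds to wait before the next attempt.

    Simplified assumption: with backoff enabled the delay doubles per
    retry; otherwise it is constant.
    """
    if not retry_backoff:
        return retry_delay
    return retry_delay * (2 ** retry_count)
```

`SKIP LOCKED` makes concurrent workers pass over rows another transaction has already locked, so two workers never claim the same job and none of them blocks waiting.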
Plan 4 Phase A scaffolding. void-workers package at /workers/, sibling
of /lib/. pyproject.toml pins Python 3.12 with separate extras for
pdf / image / video / test.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>