Whisper (CT 311) and Ollama (CT 102) share one A2000. Before loading Whisper on CUDA, ask Ollama to unload its models (GET /api/ps then POST /api/generate keep_alive:0) and wait for the card to clear, so the GPU load has headroom. Best-effort and stdlib-only; Ollama reloads cooperatively, and the existing CUDA->CPU fallback covers any failure. Toggle via OLLAMA_FREE_BEFORE_STT; endpoint via OLLAMA_URL. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
void-workers
Python ML ingest service alongside void-server (Node). Sibling of lib/ in the void-v2 repo.
Local dev
cd workers
python3.12 -m venv .venv
. .venv/bin/activate
pip install -e ".[all]"
export DATABASE_URL="postgres://..."
python -m void_workers.runner
Tests
pip install -e ".[test,all]"
DATABASE_URL="postgres://..." pytest -v
See ../docs/superpowers/plans/2026-06-01-void-v2-plan4-workers.md for the full plan and ../docs/superpowers/specs/2026-06-01-void-v2-plan4-workers.md for the design.