Void-Homelab

Hynes/Void-Homelab

Commit Graph

Author

SHA1

Message

Date

root

a9191cee00

feat(workers): free Ollama VRAM before loading Whisper on the GPU

Whisper (CT 311) and Ollama (CT 102) share one A2000. Before loading
Whisper on CUDA, ask Ollama to unload its models (GET /api/ps then POST
/api/generate keep_alive:0) and wait for the card to clear, so the GPU
load has headroom. Best-effort and stdlib-only; Ollama reloads
cooperatively, and the existing CUDA->CPU fallback covers any failure.
Toggle via OLLAMA_FREE_BEFORE_STT; endpoint via OLLAMA_URL.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-05 21:12:05 +10:00
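
The unload sequence described in this commit can be sketched roughly as follows. This is not the repository's code, only a stdlib-only illustration under stated assumptions: the function names (`free_ollama_vram`, `_http_json`), the default URL, the timeout values, and the accepted values of `OLLAMA_FREE_BEFORE_STT` are all hypothetical, and "wait for the card to clear" is approximated by polling `/api/ps` until no models are listed, which may differ from how the actual worker checks VRAM.

```python
import json
import os
import time
import urllib.request


def _http_json(url, payload=None, timeout=5.0):
    """GET when payload is None, otherwise POST the payload as JSON."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8") or "{}")


def free_ollama_vram(base_url=None, http=_http_json, wait_s=10.0,
                     poll_s=0.5, sleep=time.sleep):
    """Best-effort unload. Returns True only if Ollama reports no loaded models."""
    # Assumed toggle semantics: enabled unless explicitly set to 0/false/no.
    if os.environ.get("OLLAMA_FREE_BEFORE_STT", "1").lower() in ("0", "false", "no"):
        return False
    # Assumed default endpoint; the commit only says it comes from OLLAMA_URL.
    base = (base_url or os.environ.get("OLLAMA_URL", "http://localhost:11434")).rstrip("/")
    try:
        # Step 1: list the models currently resident in memory.
        loaded = http(base + "/api/ps").get("models", [])
        # Step 2: ask Ollama to drop each one by setting keep_alive to 0.
        for model in loaded:
            http(base + "/api/generate",
                 {"model": model["name"], "keep_alive": 0, "stream": False})
        # Step 3: poll until nothing is listed or the deadline passes.
        deadline = time.monotonic() + wait_s
        while True:
            if not http(base + "/api/ps").get("models", []):
                return True
            if time.monotonic() >= deadline:
                return False
            sleep(poll_s)
    except Exception:
        # Never raise: the caller still has the CUDA->CPU fallback.
        return False
```

Swallowing every exception keeps the helper strictly advisory: if Ollama is unreachable or slow, the Whisper load simply proceeds and relies on the existing fallback.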

root

3c028fed5a

fix(workers): graceful GPU→CPU fallback for Whisper at load time

cuda_available() only covers "no GPU present". On a shared card the GPU
can exist but fail to load the model (VRAM exhausted by another process
e.g. Ollama). Try CUDA first, fall back to a CPU model on any load
error instead of crashing the transcription job. Supports HA portability
(node without GPU) and a contended GPU. Adds GPU-path + fallback tests.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

2026-06-05 08:04:14 +10:00
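
A minimal sketch of the load-time fallback this commit describes, assuming a hypothetical `load_model(device)` callable that wraps whichever Whisper library the worker uses and a `cuda_available()` probe like the one named in the message. The function name `load_whisper_with_fallback` and the returned `(model, device)` tuple are illustrative choices, not the repository's actual interface.

```python
import logging

log = logging.getLogger("whisper_loader")


def load_whisper_with_fallback(load_model, cuda_available):
    """Try CUDA first; fall back to CPU on any load error.

    load_model: hypothetical callable taking a device string ("cuda" or "cpu").
    cuda_available: callable returning True when a GPU is present.
    Returns (model, device_used).
    """
    if cuda_available():
        try:
            return load_model("cuda"), "cuda"
        except Exception as exc:
            # The GPU exists but the load failed, e.g. VRAM held by another process.
            log.warning("CUDA load failed (%s); falling back to CPU", exc)
    # Either no GPU is present or the GPU load failed.
    return load_model("cpu"), "cpu"
```

Catching a broad `Exception` is deliberate here: out-of-memory errors surface as different exception types depending on the backend, and the commit's stated goal is to avoid crashing the transcription job on any load error.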

root

e64f1345f6

feat(workers): whisper loader with CUDA detect + CPU fallback

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>

2026-06-01 10:06:50 +10:00

3 Commits