Floating Dross Chat — Design
Date: 2026-06-09
Status: Approved (pending final spec sign-off)
Goal: Replace the docked, per-Space "Cradle Chat" with a global, movable floating-bubble Dross companion that is mobile-first and accepts voice-clip input transcribed locally into instructions.
Background / problem
Today the companion lives in the right rail (public/components/rightrail.js). It is per-Space: it binds to the active Space's companion conversation (/api/spaces/:space_id/companion) and shows "Open a Space to chat with its companion." everywhere else — Sacred Valley, the apps, etc. Because the user mostly lives on non-Space views, the chat is empty/collapsed most of the time, which is why it "feels closed and not very Dross." The right rail is also cramped on mobile.
The chat mechanics are already factored into a reusable engine (public/components/agent_chat.js, wireAgentChat({logEl, inputEl, historyUrl, turnUrl, …})). Turns stream over SSE via lib/ai/agent/run_turn.js. Dross is the agent with slug companion.
Locked decisions (from brainstorming)
- Global Dross — one always-available companion, summonable on every view; not tied to a Space. He is told what the user is currently looking at (view context) but isn't locked to it.
- Floating bubble — a draggable violet orb that opens a draggable chat panel anchored to the orb. Replaces the right-rail companion. Position + open/closed state persist. Mobile = near-full-width panel.
- Collapse / close — keep the close (✕) top-right, and add a thumb-friendly "⌄ collapse" bar at the bottom of the panel. Both minimise back to the orb.
- Avatar — default Soft Eye; selectable in Settings between Soft Eye, Wisp Core, Orbiting Motes (all violet).
- Colour — Dross is violet by default, but his accent is tunable in Settings (his own vars, independent of the UI theme).
- Persona — give him the real Cradle-Dross voice (dry, sardonic, impatient, brilliant, secretly loyal) via an editable system prompt in Settings (tunable).
- Voice — record a clip → transcribe with local faster-whisper on the Ollama box (CT 102, GPU, CPU-fallback) → transcript lands in the input for review-and-send first (mode 1). A voice-mode setting allows graduating to hands-free auto-send (mode 2), then interpret-into-confirmable-action (mode 3) later.
- Audio retention (Phase 2, added 2026-06-09): by default the clip is transcribed, then destroyed (transient). Add a Dross setting "Keep voice clips" that, when on, saves each audio clip paired with its transcript, stored securely (encrypted at rest and access-controlled; on a homelab dataset, owner-only). The exact store is TBD in P2, e.g. a voice_clips table + blob on a ZFS dataset, or an object store. Off by default. This is a P2 deliverable that the design accounts for now.
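The "position + open/closed state persist" decision can be sketched as two small pieces: clamping a dragged orb back into the viewport, and round-tripping its state through a storage object. This is only an illustration; the storage key, the function names, the 8 px margin, and the `{ x, y, open }` state shape are all assumptions, not part of the spec.

```javascript
// Hypothetical storage key; the spec does not name one.
const ORB_STATE_KEY = "dross.bubble";

// Keep the orb fully on screen after a drag or a viewport resize.
function clampOrbPosition(pos, viewport, orbSize, margin = 8) {
  const maxX = Math.max(margin, viewport.width - orbSize - margin);
  const maxY = Math.max(margin, viewport.height - orbSize - margin);
  return {
    x: Math.min(Math.max(pos.x, margin), maxX),
    y: Math.min(Math.max(pos.y, margin), maxY),
  };
}

// `storage` is anything with getItem/setItem, e.g. window.localStorage.
function saveOrbState(storage, state) {
  storage.setItem(ORB_STATE_KEY, JSON.stringify(state));
}

// Returns `fallback` when nothing is stored or the stored value is unusable.
function loadOrbState(storage, fallback) {
  try {
    const raw = storage.getItem(ORB_STATE_KEY);
    if (!raw) return fallback;
    const parsed = JSON.parse(raw);
    if (typeof parsed.x !== "number" || typeof parsed.y !== "number") return fallback;
    return { x: parsed.x, y: parsed.y, open: parsed.open === true };
  } catch {
    return fallback;
  }
}
```

On mount, the bubble could call `loadOrbState`, pass the result through `clampOrbPosition` with the current viewport, and save again on every drag end, so a position stored on a desktop window still lands on screen on a phone.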
Non-goals (this iteration)
- Voice modes 2 and 3 are designed-for but not built now (mode setting ships; only mode 1 wired).
- Multi-conversation history browser, per-Space companions in the bubble, and wake-word/always-listening are out of scope.
Architecture
Components
| Unit | Responsibility |
|---|---|
| public/components/dross_bubble.js (new) | The floating orb + panel: render, drag (orb & panel header), anchored open, collapse/close, avatar switch, voice record UI. Drives chat via wireAgentChat. Replaces the renderRightrail mount in app.js. |
| public/components/dross_avatar.js (new) | Pure render of the chosen avatar (soft-eye / wisp / motes) at a given size; reused by orb + panel header + settings preview. |
| lib/api/routes/dross.js (new) | Global (space-less) Dross: GET /api/dross (history + conversation id) and POST /api/dross/turn (SSE). Mirrors companion.js but resolves a global conversation for the companion agent and injects the persona + view context. |
| lib/api/routes/voice.js (new) | POST /api/voice/transcribe accepts an audio blob, proxies to the faster-whisper service, returns { text }. Owner-only. |
| public/views/settings.js (extend) | New Dross section: avatar picker, accent colour, persona textarea, voice-mode select. Persists to app_settings key dross. |
| faster-whisper service on CT 102 (infra) | OpenAI-compatible /v1/audio/transcriptions (e.g. faster-whisper-server/speaches), GPU with CPU fallback, small/base model. Shares the Ollama LXC. |
Settings shape (app_settings key dross)
{
  "avatar": "soft-eye",            // soft-eye | wisp | motes
  "accent": "#a86adf",             // Dross's violet (independent of UI theme)
  "persona": "<system prompt text>",
  "voiceMode": "review"            // review | handsfree | action (later)
}
Reuses the generic app_settings store (added in 2.9.0) and the /api/theme-style read-on-boot pattern. The bubble fetches dross settings on mount; the Settings panel writes them.
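Because the bubble reads these settings on mount, it helps to tolerate a missing or partially written `dross` key. A minimal sketch of that normalisation follows; the function and constant names are hypothetical, the empty-string persona default is an assumption, and only the allowed values and the violet default come from the settings shape above.

```javascript
// Defaults taken from the settings shape; the empty persona default is an assumption.
const DROSS_DEFAULTS = Object.freeze({
  avatar: "soft-eye",
  accent: "#a86adf",
  persona: "",
  voiceMode: "review",
});
const AVATARS = ["soft-eye", "wisp", "motes"];
const VOICE_MODES = ["review", "handsfree", "action"];

// Returns a complete settings object, replacing unknown or malformed
// fields with their defaults instead of throwing.
function normalizeDrossSettings(raw) {
  const src = raw && typeof raw === "object" ? raw : {};
  return {
    avatar: AVATARS.includes(src.avatar) ? src.avatar : DROSS_DEFAULTS.avatar,
    accent: /^#[0-9a-f]{6}$/i.test(src.accent) ? src.accent : DROSS_DEFAULTS.accent,
    persona: typeof src.persona === "string" ? src.persona : DROSS_DEFAULTS.persona,
    voiceMode: VOICE_MODES.includes(src.voiceMode) ? src.voiceMode : DROSS_DEFAULTS.voiceMode,
  };
}
```

Running the same function on the write path in the Settings panel would keep an invalid accent or an unknown voice mode from ever being stored.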
Data flow
Text turn: input → wireAgentChat → POST /api/dross/turn (body { text, view }) → SSE stream of Dross's reply (+ tool labels) into the panel log. History via GET /api/dross.
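For reference, the SSE leg of a text turn can be sketched as an incremental parser for standard `text/event-stream` framing (events separated by a blank line, `event:` and `data:` fields). The existing wireAgentChat engine already handles this; the parser below is a standalone illustration, and the event names and JSON payloads that run_turn.js actually emits are not specified here, so the ones in the usage note are assumptions.

```javascript
// Feed decoded text chunks in; onEvent fires once per complete SSE event.
function createSseParser(onEvent) {
  let buffer = "";
  return function push(chunk) {
    // Normalise CRLF on the whole buffer so a split "\r" + "\n" is still handled.
    buffer = (buffer + chunk).replace(/\r\n/g, "\n");
    let end;
    while ((end = buffer.indexOf("\n\n")) !== -1) {
      const frame = buffer.slice(0, end);
      buffer = buffer.slice(end + 2);
      let event = "message"; // default event type per the SSE format
      const data = [];
      for (const line of frame.split("\n")) {
        if (line.startsWith(":")) continue; // comment / keep-alive line
        const colon = line.indexOf(":");
        const field = colon === -1 ? line : line.slice(0, colon);
        let value = colon === -1 ? "" : line.slice(colon + 1);
        if (value.startsWith(" ")) value = value.slice(1);
        if (field === "event") event = value;
        else if (field === "data") data.push(value);
      }
      // Frames without any data field (e.g. pure comments) produce no event.
      if (data.length > 0) onEvent({ event, data: data.join("\n") });
    }
  };
}
```

A caller would read the POST /api/dross/turn response body with a `TextDecoder` in streaming mode, push each decoded chunk into the parser, and append reply text or tool labels to the panel log depending on the event type.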
Voice turn (mode 1): tap mic → MediaRecorder captures a clip → on stop, POST /api/voice/transcribe (audio blob) → void-app proxies to CT 102 faster-whisper → { text } → text dropped into the input for the user to review/edit → user sends as a normal turn. (Mode 2 would auto-send; mode 3 would route the transcript through an interpret step.)
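The proxy hop to faster-whisper might look roughly like the sketch below, assuming the service exposes the OpenAI-style multipart contract (`file` and `model` form fields, JSON reply with a `text` field). The function names, the `clip.webm` filename, and the `baseUrl` and `model` parameters are illustrative; the real host on CT 102 and the model identifier are deployment details this spec leaves open.

```javascript
// Build the multipart request for an OpenAI-compatible transcription endpoint.
// baseUrl and model are caller-supplied; nothing here is specific to CT 102.
function buildTranscribeRequest(audioBlob, { baseUrl, model }) {
  const form = new FormData();
  form.append("file", audioBlob, "clip.webm"); // filename is an assumption
  form.append("model", model);
  const url = `${baseUrl.replace(/\/+$/, "")}/v1/audio/transcriptions`;
  return { url, init: { method: "POST", body: form } };
}

// Reduce the upstream JSON to the { text } shape the bubble expects.
function extractTranscript(body) {
  const text = body && typeof body.text === "string" ? body.text.trim() : "";
  return { text };
}

// Glue: forward the clip and return { text }, or throw so the route can
// answer with a clear error and the bubble can fall back to typing.
async function transcribeClip(audioBlob, options, fetchImpl = fetch) {
  const { url, init } = buildTranscribeRequest(audioBlob, options);
  const res = await fetchImpl(url, init);
  if (!res.ok) throw new Error(`transcription failed: HTTP ${res.status}`);
  return extractTranscript(await res.json());
}
```

Passing `fetchImpl` explicitly keeps the unit test simple: the whisper service can be mocked with a function that returns `{ ok: true, json: async () => ({ text: "..." }) }`.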
Persona: the dross.persona setting is injected as/with the agent's system prompt in run_turn for the global conversation, so his voice is consistent and user-tunable.
Context: view (current route/entity) is passed in the turn body so Dross can answer "what am I looking at" questions.
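One way to combine the persona and the view context before the turn runs is sketched below. The `{ route, entity }` shape for `view`, the function name, and the wording of the context sentence are assumptions; how run_turn.js actually accepts a system prompt is not shown in this spec.

```javascript
// Compose the system prompt for the global conversation.
// persona: the dross.persona setting; view: hypothetical { route, entity } object.
function buildSystemPrompt(persona, view) {
  const parts = [];
  if (typeof persona === "string" && persona.trim() !== "") {
    parts.push(persona.trim());
  }
  if (view && typeof view.route === "string") {
    const entity = view.entity ? ` (entity: ${view.entity})` : "";
    // Context only: Dross is told what is on screen but is not locked to it.
    parts.push(
      `The user is currently viewing: ${view.route}${entity}. ` +
        "Use this as context, not as a constraint."
    );
  }
  return parts.join("\n\n");
}
```

Keeping the view sentence separate from the persona text means the user-edited prompt in Settings never has to mention routes, and a turn sent without a `view` still gets the configured voice.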
Error handling
- STT unavailable / GPU absent: transcribe endpoint returns a clear error; the bubble shows "couldn't transcribe — type instead" and never blocks text input. faster-whisper falls back to CPU on a GPU-less node (per the GPU/CPU-fallback HA rule) — slower but functional.
- Mic permission denied: show a one-line hint; hide the recording UI, keep typing.
- Turn/stream failure: existing agent_chat error path (surfaces an error bubble); retain the typed/transcribed text so it isn't lost.
- No token / 401: bubble stays collapsed; opening prompts the normal owner-token flow.
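The four failure cases above share two invariants: text input is never disabled, and the draft is never discarded. A small pure function makes that easy to test. The failure kinds, field names, and hint keys here are hypothetical; only the behaviours are taken from the list.

```javascript
// Map a failure kind to the bubble's UI state. Hint values are keys that the
// UI would translate into the one-line messages described above.
function bubbleStateOnFailure(kind, draftText) {
  const base = {
    inputEnabled: true, // typing is never blocked
    micVisible: true,
    panelOpen: true,
    draft: draftText, // typed or transcribed text is retained
    hint: null,
  };
  switch (kind) {
    case "stt-unavailable":
      return { ...base, hint: "transcribe-failed" };
    case "mic-denied":
      return { ...base, micVisible: false, hint: "mic-denied" };
    case "turn-failed":
      return { ...base, hint: "turn-failed" };
    case "unauthorized":
      return { ...base, panelOpen: false, hint: "needs-owner-token" };
    default:
      return base;
  }
}
```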
Testing
- Headless UI: bubble renders; orb → open (anchored) → drag → collapse (bottom bar) → close (✕); each avatar variant renders; mobile width = near-full panel.
- Settings: changing avatar/accent/persona/voiceMode persists (app_settings) and re-applies on reload.
- API: GET /api/dross returns a global conversation; POST /api/dross/turn streams; POST /api/voice/transcribe returns {text} for a sample WAV (mock the whisper service in the unit test; one live smoke test against CT 102).
- Persona: a turn reflects the configured system prompt.
Build phases
- P1: Floating bubble + global Dross + settings. New dross_bubble.js + dross_avatar.js, dross.js route (global conversation), Settings → Dross section (avatar/accent/persona/voice-mode). Retire the right-rail companion. No voice yet. Shippable on its own.
- P2: Voice (review-and-send). faster-whisper on CT 102, voice.js transcribe proxy, record UI + waveform, transcript → input → review → send.
- P3: Later. Voice mode 2 (hands-free auto-send), then mode 3 (interpret transcript into a confirmable action via the existing Little Blue action framework).
Documentation
Per the standing rule, ship docs to the Void wiki + Gitea (Hynes/Void-Homelab) with each phase; spec + plan under docs/superpowers/. Mockup at docs/mockups/dross-chat.html.