Floating Dross Chat — Design

Date: 2026-06-09
Status: Approved (pending final spec sign-off)
Goal: Replace the docked, per-Space "Cradle Chat" with a global, movable floating-bubble Dross companion that is mobile-first and takes voice-clip input transcribed locally into instructions.

Background / problem

Today the companion lives in the right rail (public/components/rightrail.js). It is per-Space: it binds to the active Space's companion conversation (/api/spaces/:space_id/companion) and shows "Open a Space to chat with its companion." everywhere else — Sacred Valley, the apps, etc. Because the user mostly lives on non-Space views, the chat is empty/collapsed most of the time, which is why it "feels closed and not very Dross." The right rail is also cramped on mobile.

The chat mechanics are already factored into a reusable engine (public/components/agent_chat.js, wireAgentChat({logEl, inputEl, historyUrl, turnUrl, …})). Turns stream over SSE via lib/ai/agent/run_turn.js. Dross is the agent with slug companion.

Locked decisions (from brainstorming)

Global Dross — one always-available companion, summonable on every view; not tied to a Space. He is told what the user is currently looking at (view context) but isn't locked to it.
Floating bubble — a draggable violet orb that opens a draggable chat panel anchored to the orb. Replaces the right-rail companion. Position + open/closed state persist. Mobile = near-full-width panel.
Collapse / close — keep the close (✕) top-right, and add a thumb-friendly "⌄ collapse" bar at the bottom of the panel. Both minimise back to the orb.
Avatar — default Soft Eye; selectable in Settings between Soft Eye, Wisp Core, Orbiting Motes (all violet).
Colour — Dross is violet by default, but his accent is tunable in Settings (his own vars, independent of the UI theme).
Persona — give him the real Cradle-Dross voice (dry, sardonic, impatient, brilliant, secretly loyal) via an editable system prompt in Settings (tunable).
Voice — record a clip → transcribe with local faster-whisper on the Ollama box (CT 102, GPU, CPU-fallback) → transcript lands in the input for review-and-send first (mode 1). A voice-mode setting allows graduating to hands-free auto-send (mode 2), then interpret-into-confirmable-action (mode 3) later.
Audio retention (Phase 2, added 2026-06-09) — by default the clip is transcribed and then destroyed (transient). Add a Dross setting, "Keep voice clips", that when on saves each audio clip paired with its transcript. Saved clips must be encrypted at rest and access-controlled (owner-only, on a homelab dataset). The exact store is TBD in P2, e.g. a voice_clips table plus a blob on a ZFS dataset, or an object store. Off by default. Built in P2; the design allows for it now.
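
The "position + open/closed state persist" decision above could be backed by something like the following. This is a minimal sketch, not the planned implementation: the function names, the orb size, the edge margin, and the persisted shape `{ x, y, open }` are all assumptions made for illustration.

```javascript
// Hypothetical constants: the spec does not fix the orb size or edge margin.
const ORB_SIZE = 56; // px, assumed orb diameter
const MARGIN = 8;    // px, assumed gap kept from the viewport edge

// Keep the orb fully inside the viewport after a drag or a window resize.
function clampOrbPosition(pos, viewport) {
  const maxX = Math.max(MARGIN, viewport.width - ORB_SIZE - MARGIN);
  const maxY = Math.max(MARGIN, viewport.height - ORB_SIZE - MARGIN);
  return {
    x: Math.min(Math.max(pos.x, MARGIN), maxX),
    y: Math.min(Math.max(pos.y, MARGIN), maxY),
  };
}

// Serialise only the fields that need to survive a reload.
function serializeBubbleState(state) {
  return JSON.stringify({ x: state.x, y: state.y, open: Boolean(state.open) });
}

// Restore persisted state; unreadable input falls back to bottom-right, closed.
function restoreBubbleState(raw, viewport) {
  const corner = clampOrbPosition({ x: Infinity, y: Infinity }, viewport);
  const fallback = { x: corner.x, y: corner.y, open: false };
  let parsed;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return fallback;
  }
  if (!parsed || !Number.isFinite(parsed.x) || !Number.isFinite(parsed.y)) {
    return fallback;
  }
  const { x, y } = clampOrbPosition(parsed, viewport);
  return { x, y, open: parsed.open === true };
}
```

Clamping on restore as well as on drag means a position saved on a wide desktop window still lands on-screen when the same state is loaded on a phone.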

Non-goals (this iteration)

Voice modes 2 and 3 are designed for but not built in this iteration: the voice-mode setting ships, but only mode 1 is wired.
Multi-conversation history browser, per-Space companions in the bubble, and wake-word/always-listening are out of scope.

Architecture

Components

Unit	Responsibility
`public/components/dross_bubble.js` (new)	The floating orb + panel: render, drag (orb & panel header), anchored open, collapse/close, avatar switch, voice record UI. Drives chat via `wireAgentChat`. Replaces the `renderRightrail` mount in `app.js`.
`public/components/dross_avatar.js` (new)	Pure render of the chosen avatar (soft-eye / wisp / motes) at a given size — reused by orb + panel header + settings preview.
`lib/api/routes/dross.js` (new)	Global (space-less) Dross: `GET /api/dross` (history + conversation id) and `POST /api/dross/turn` (SSE). Mirrors `companion.js` but resolves a global conversation for the `companion` agent and injects the persona + view context.
`lib/api/routes/voice.js` (new)	`POST /api/voice/transcribe` — accepts an audio blob, proxies to the faster-whisper service, returns `{ text }`. Owner-only.
`public/views/settings.js` (extend)	New Dross section: avatar picker, accent colour, persona textarea, voice-mode select. Persists to `app_settings` key `dross`.
faster-whisper service on CT 102 (infra)	OpenAI-compatible `/v1/audio/transcriptions` (e.g. `faster-whisper-server`/`speaches`), GPU with CPU fallback, small/base model. Shares the Ollama LXC.
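
As an illustration of the `voice.js` proxy in the table above, the upstream request to an OpenAI-compatible `/v1/audio/transcriptions` endpoint might be assembled roughly as below. This is a sketch under assumptions: `buildTranscribeRequest`, the `clip.webm` filename, and the `baseUrl` and `model` values are hypothetical; only the endpoint path and the `{ text }` response come from this spec. The `file`, `model`, and `response_format` multipart fields follow the OpenAI transcription API. It relies on the global `FormData` and `Blob` available in Node 18+.

```javascript
// Build the upstream request without sending it, so the route handler can
// pass it to fetch() and a unit test can inspect it with the service mocked.
function buildTranscribeRequest(audioBlob, { baseUrl, model }) {
  const form = new FormData();
  form.append("file", audioBlob, "clip.webm"); // filename is an assumption
  form.append("model", model);                 // model name is deployment-specific
  form.append("response_format", "json");      // body comes back as { "text": "..." }
  return {
    url: new URL("/v1/audio/transcriptions", baseUrl).toString(),
    init: { method: "POST", body: form },
  };
}
```

A handler could then call `fetch(url, init)` and forward only the `text` field to the client. Leaving the `Content-Type` header unset lets `fetch` add the multipart boundary itself.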

Settings shape (`app_settings` key `dross`)

{
  "avatar": "soft-eye",            // soft-eye | wisp | motes
  "accent": "#a86adf",             // Dross's violet (independent of UI theme)
  "persona": "<system prompt text>",
  "voiceMode": "review"            // review | handsfree | action(later)
}

The Dross settings reuse the generic app_settings store (added in 2.9.0) and follow the /api/theme-style read-on-boot pattern. The bubble fetches the dross settings on mount; the Settings panel writes them.
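
Because the bubble reads these settings on mount, it helps to tolerate a missing or partly invalid `dross` value. The sketch below shows one way to do that. The function name `normalizeDrossSettings` and the empty-string persona default are assumptions; the allowed values and the default avatar, accent, and voice mode are taken from the shape above.

```javascript
const DROSS_DEFAULTS = Object.freeze({
  avatar: "soft-eye",
  accent: "#a86adf",
  persona: "",          // assumed default: no persona override
  voiceMode: "review",
});
const AVATARS = ["soft-eye", "wisp", "motes"];
const VOICE_MODES = ["review", "handsfree", "action"];
const HEX_COLOUR = /^#[0-9a-f]{6}$/i;

// Return a complete settings object, replacing any missing or invalid field
// with its default so the bubble can always render.
function normalizeDrossSettings(raw) {
  const src = raw !== null && typeof raw === "object" ? raw : {};
  const accentOk = typeof src.accent === "string" && HEX_COLOUR.test(src.accent);
  return {
    avatar: AVATARS.includes(src.avatar) ? src.avatar : DROSS_DEFAULTS.avatar,
    accent: accentOk ? src.accent.toLowerCase() : DROSS_DEFAULTS.accent,
    persona: typeof src.persona === "string" ? src.persona : DROSS_DEFAULTS.persona,
    voiceMode: VOICE_MODES.includes(src.voiceMode) ? src.voiceMode : DROSS_DEFAULTS.voiceMode,
  };
}
```

Running the same function on both the read path (bubble mount) and the write path (Settings panel) would keep the stored value and the rendered value in agreement.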

Data flow

Text turn: input → wireAgentChat → POST /api/dross/turn (body { text, view }) → SSE stream of Dross's reply (+ tool labels) into the panel log. History via GET /api/dross.
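
For reference, the SSE stream in the text turn can be consumed with a small incremental parser like the one sketched here. It illustrates the generic `text/event-stream` framing (blank-line-separated events, `event:` and `data:` fields, `:` comment lines) and is not a description of what `agent_chat.js` does today. The name `createSseParser` and the `token` event name used in the usage note are hypothetical. Lone `\r` line endings and the `id:` and `retry:` fields are deliberately ignored.

```javascript
// Returns a feed(chunk) function; onEvent is called once per complete event.
function createSseParser(onEvent) {
  let buffer = "";
  return function feed(chunk) {
    // Normalise CRLF after concatenating, so a pair split across chunks is handled.
    buffer = (buffer + chunk).replace(/\r\n/g, "\n");
    let boundary;
    while ((boundary = buffer.indexOf("\n\n")) !== -1) {
      const block = buffer.slice(0, boundary);
      buffer = buffer.slice(boundary + 2);
      let event = "message"; // default event type when no event: field is sent
      const data = [];
      for (const line of block.split("\n")) {
        if (line.startsWith(":")) continue; // comment or keep-alive line
        const colon = line.indexOf(":");
        const field = colon === -1 ? line : line.slice(0, colon);
        let value = colon === -1 ? "" : line.slice(colon + 1);
        if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
        if (field === "event") event = value;
        else if (field === "data") data.push(value);
      }
      if (data.length > 0) onEvent({ event, data: data.join("\n") });
    }
  };
}
```

In the browser, `feed` would be called with each decoded chunk read from the `POST /api/dross/turn` response body; an event such as `token` would append text to the panel log.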

Voice turn (mode 1): tap mic → MediaRecorder captures a clip → on stop, POST /api/voice/transcribe (audio blob) → void-app proxies to CT 102 faster-whisper → { text } → text dropped into the input for the user to review/edit → user sends as a normal turn. (Mode 2 would auto-send; mode 3 would route the transcript through an interpret step.)
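
The three voice modes differ only in what happens to the transcript once it comes back, so the branch can be kept in one place. The sketch below is an assumption about how that could look: `routeTranscript`, the action names, and the `wired` parameter are hypothetical, while the mode values match the `voiceMode` setting. Modes that are not yet wired fall back to review-and-send, matching the plan to ship the setting with only mode 1 built.

```javascript
// Decide what to do with a finished transcript for the configured voice mode.
// `wired` lists the modes that are actually implemented in this build.
function routeTranscript(voiceMode, transcript, wired = ["review"]) {
  const text = transcript.trim();
  if (text === "") return { action: "ignore" }; // nothing was recognised
  const mode = wired.includes(voiceMode) ? voiceMode : "review";
  switch (mode) {
    case "handsfree":
      return { action: "send", text };       // mode 2: auto-send as a turn
    case "action":
      return { action: "interpret", text };  // mode 3: route through an interpret step
    default:
      return { action: "fill-input", text }; // mode 1: drop into the input for review
  }
}
```

With the default `wired` list, `routeTranscript("handsfree", "open settings")` still returns a `fill-input` action, so selecting an unbuilt mode never sends anything without the user's review.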

Persona: the dross.persona setting is injected as/with the agent's system prompt in run_turn for the global conversation, so his voice is consistent and user-tunable.

Context: view (current route/entity) is passed in the turn body so Dross can answer "what am I looking at" questions.
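
The persona and view context described above both end up as text in front of the model, so they can be combined in one step. This is a sketch only: `composeSystemPrompt`, the `{ route, entity }` shape of `view`, and the wording of the context sentence are assumptions, and how `run_turn` really accepts a system prompt is not specified here.

```javascript
// Join the agent's base prompt, the user-tunable persona, and a one-line
// description of the current view into a single system prompt.
function composeSystemPrompt(basePrompt, persona, view) {
  const parts = [basePrompt.trim()];
  if (typeof persona === "string" && persona.trim() !== "") {
    parts.push(persona.trim());
  }
  if (view && view.route) {
    const entity = view.entity ? ` (${view.entity})` : "";
    parts.push(`The user is currently viewing: ${view.route}${entity}.`);
  }
  // Drop empty sections so a blank base prompt does not leave stray separators.
  return parts.filter((part) => part !== "").join("\n\n");
}
```

Putting the view line last and keeping it to one sentence tells Dross where the user is without locking him to that view.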

Error handling

STT unavailable / GPU absent: transcribe endpoint returns a clear error; the bubble shows "couldn't transcribe — type instead" and never blocks text input. faster-whisper falls back to CPU on a GPU-less node (per the GPU/CPU-fallback HA rule) — slower but functional.
Mic permission denied: show a one-line hint and hide the recording UI; typing stays available.
Turn/stream failure: existing agent_chat error path (surfaces an error bubble); retain the typed/transcribed text so it isn't lost.
No token / 401: bubble stays collapsed; opening prompts the normal owner-token flow.
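
The four cases above can be expressed as a small lookup that the bubble applies to its UI. This is an illustrative sketch: the error kinds, the `hint` keys, and the state fields are hypothetical names, and the user-facing copy would live elsewhere. Text input stays enabled in every state, reflecting the rule that a voice failure never blocks typing.

```javascript
// Defaults shared by every error state: typing is never blocked and the
// typed or transcribed draft is kept.
const BASE_STATE = Object.freeze({
  hint: null,
  micVisible: true,
  textInputEnabled: true,
  keepDraft: true,
  collapsed: false,
});

// Per-error overrides; anything not listed inherits BASE_STATE.
const OVERRIDES = {
  "stt-unavailable": { hint: "transcribe-failed" },
  "mic-denied": { hint: "mic-denied", micVisible: false },
  "turn-failed": { hint: "turn-failed" },
  "unauthorized": { hint: "needs-token", collapsed: true },
};

function uiStateForError(kind) {
  const known = Object.prototype.hasOwnProperty.call(OVERRIDES, kind);
  const override = known ? OVERRIDES[kind] : { hint: "unknown-error" };
  return { ...BASE_STATE, ...override };
}
```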

Testing

Headless UI: bubble renders; orb → open (anchored) → drag → collapse (bottom bar) → close (✕); each avatar variant renders; mobile width = near-full panel.
Settings: changing avatar/accent/persona/voiceMode persists (app_settings) and re-applies on reload.
API: GET /api/dross returns a global conversation; POST /api/dross/turn streams; POST /api/voice/transcribe returns {text} for a sample WAV (mock the whisper service in the unit test; one live smoke test against CT 102).
Persona: a turn reflects the configured system prompt.

Build phases

P1 — Floating bubble + global Dross + settings. New dross_bubble.js + dross_avatar.js, dross.js route (global conversation), Settings → Dross section (avatar/accent/persona/voice-mode). Retire the right-rail companion. No voice yet. Ship-able on its own.
P2 — Voice (review-and-send). faster-whisper on CT 102, voice.js transcribe proxy, record UI + waveform, transcript → input → review → send.
P3 — Later. Voice mode 2 (hands-free auto-send), then mode 3 (interpret transcript into a confirmable action via the existing Little Blue action framework).

Documentation

Per the standing rule, ship docs to the Void wiki + Gitea (Hynes/Void-Homelab) with each phase; spec + plan under docs/superpowers/. Mockup at docs/mockups/dross-chat.html.
