# Design: LAN Device Discovery (MAC inventory + review/name)
**Date:** 2026-06-08
**Status:** Approved (brainstorm), pending implementation plan
**Repo:** void-v2 (CT 311 / `void-app`)
## Summary
Replace the static, hand-maintained `public/devices.json` with a **persistent,
MAC-keyed device store** fed by a recurring ARP scan. Each scan **logs MACs to
the DB and diffs against what's known** — new devices land in a review queue;
known devices just get their IP / `last_seen` / presence updated. The owner can
**add a discovered device, edit it, and give it a name** for reference (mirrors
the Void's existing services "discovered → promote" pattern).
## Background
- **Static today:** `public/views/devices_band.js` does `fetch('/devices.json')`
— a curated, manually-edited list (IP/MAC/vendor/group/flag). It re-reads a
static file; nothing is persisted or diffed.
- **Existing precedent to mirror:** `monitored_services` uses
`source='discovered' AND NOT enabled` as a review queue; `PATCH /services/:id`
promotes + edits. We reproduce this shape for devices.
- **Separate from `network_hosts`:** that table is the homelab-guest inventory
(Proxmox `BC:24:11:*` LXCs, infra_audit). The devices band is IoT / personal /
unknown LAN gear — kept separate.
- **Scan engine:** the Void host (CT 311) has `ip`/`arp` but **not**
`nmap`/`arp-scan`. We add `arp-scan` (chosen for reliable L2 ARP sweeps that
ICMP-blocking devices can't dodge, plus a built-in OUI vendor DB).
### Lessons borrowed from Scanopy (self-hosted discovery tool)
- **Decouple scanner from storage/UI** — the scanner just scans and reports; the
server owns dedup + persistence. → isolated `lib/infra/scan.js`.
- **MAC is the identity, IP is a mutable attribute** — key on MAC, update IP each
scan (handles DHCP churn). → `mac` primary key.
- **Scheduled rescans + timestamp inventory** — periodic batch with
`first_seen`/`last_seen`/`present`, diff by "MAC seen before?". → hourly cron.
- **Vendor via OUI** — `arp-scan` ships an OUI database; vendor is free.
- **Randomized MACs are an open problem** even for Scanopy — so we at least
**flag** locally-administered MACs so the user knows OUI can't ID them.
## Decisions
| Decision | Choice | Rationale |
|---|---|---|
| Scan engine | **`arp-scan --localnet` on CT 311**, hourly cron | Reliable L2 sweep + built-in OUI; self-contained (no external scanner dep). |
| Cadence | **Hourly** (staggered, e.g. `7 * * * *`) | "No rush"; device drift is slow. |
| DB growth | **Upsert by MAC: one row per device, no per-scan history** | Table is bounded by distinct devices ever seen (dozens to hundreds), not scan count → no bloat. |
| Identity | **MAC primary key**; IP a mutable column | Survives DHCP IP changes. |
| Review flow | Mirror services `discovered → promote` | New MAC → `status='new'`; owner names/edits → `status='known'`. |
| Source of truth | **DB** (`lan_devices`); `devices.json` becomes the one-time migration seed, then removed | Single source of truth. |
| Randomized-MAC bloat | **Auto-prune unreviewed + absent rows** (randomized >24h, others >14d); keep `known`/`ignored` forever | Rotated randomized MACs never accumulate; the table stays bounded. |
## Architecture
scan (arp-scan) → `parseArpScan` (+randomized flag) → `upsertScan` by MAC →
`markAbsent` for unseen → review queue (`status='new'`) → owner names/groups/promotes
→ known devices render in the band.
## Components
### Migration `024_lan_devices.sql`
Table `lan_devices`:
- `mac text PRIMARY KEY`
- `ip text`, `vendor text`
- `name text` (owner-given reference name, null until named)
- `grp text` (Smart Home | Entertainment | Personal | Network | Flagged)
- `note text`
- `status text NOT NULL DEFAULT 'new'` (`new` | `known` | `ignored`)
- `randomized boolean NOT NULL DEFAULT false` (locally-administered MAC)
- `flagged boolean NOT NULL DEFAULT false`
- `first_seen timestamptz NOT NULL DEFAULT now()`
- `last_seen timestamptz NOT NULL DEFAULT now()`
- `present boolean NOT NULL DEFAULT true`
**Seed (embedded SQL, from the current curated `devices.json`):**
- Devices **with a MAC**: non-flagged → `status='known'` with their name/group;
flagged (e.g. `.15` ASUS) → `status='new'`, `flagged=true`.
- The `.13` Orbi satellite and `.171` Galaxy Tab S4 fixes carry over as `known`.
- MAC-less curated entries (`.21/.22/.34/.35/.51`, currently offline) are **not
seeded** — they reappear as `new` (with a real MAC) the first time they're seen
online. (Documented so it's expected, not a gap.)
### `lib/infra/scan.js` (decoupled scanner)
- `parseArpScan(text) -> [{ ip, mac, vendor, randomized }]` is a **pure** parser of
`arp-scan` tab-separated output (skips banner/footer); `randomized` is true when
the first octet has the locally-administered bit set (`& 0x02`).
- `isRandomizedMac(mac) -> boolean` — pure helper.
- `runScan({ exec }) -> rows` shells out to `arp-scan --localnet -x` (interface
auto-detected, or pinned with `-I eth0`) and returns `parseArpScan(stdout)`.
`exec` is injected for tests.
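The parser and the randomized-MAC check described above can be sketched as follows. This is a minimal illustration rather than the repo's actual code: it assumes each result row is `ip<TAB>mac<TAB>vendor`, treats every other line as banner/footer noise, and additionally lower-cases and de-duplicates MACs, which the spec does not state. The sample output in the usage lines is made up.

```javascript
const IPV4_RE = /^\d{1,3}(?:\.\d{1,3}){3}$/;
const MAC_RE = /^[0-9a-f]{2}(?::[0-9a-f]{2}){5}$/i;

// True when the locally-administered bit (0x02) of the first octet is set.
function isRandomizedMac(mac) {
  const firstOctet = parseInt(String(mac).slice(0, 2), 16);
  return (firstOctet & 0x02) !== 0; // NaN & 0x02 is 0, so bad input yields false
}

// Pure parser: text in, rows out. Lines that are not "ip\tmac[\tvendor]"
// (banner, footer, blanks, malformed rows) are skipped.
function parseArpScan(text) {
  const byMac = new Map(); // assumption: one row per MAC, last occurrence wins
  for (const line of String(text ?? '').split(/\r?\n/)) {
    const [ip, mac, vendor = ''] = line.trim().split('\t');
    if (!IPV4_RE.test(ip) || !MAC_RE.test(mac ?? '')) continue;
    const key = mac.toLowerCase();
    byMac.set(key, {
      ip,
      mac: key,
      vendor: vendor.trim(),
      randomized: isRandomizedMac(key),
    });
  }
  return [...byMac.values()];
}

// Usage with made-up scan output:
const sample = [
  'Interface: eth0, type: EN10MB',
  '192.168.1.1\t3c:37:86:11:22:33\tNETGEAR',
  '192.168.1.77\tDA:A1:19:00:00:01\t(Unknown)',
  '2 packets received by filter',
].join('\n');
console.log(parseArpScan(sample));
```

Keeping both functions free of I/O is what lets the unit tests feed them canned text instead of running a real scan.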
### `lib/db/repos/lan_devices.js`
- `upsertScan(rows)` — insert unseen MACs as `status='new'`; for existing, update
`ip`, `vendor`, `last_seen=now()`, `present=true` (never overwrite owner
`name`/`grp`/`status`).
- `markAbsent(seenMacs)` sets `present=false` for MACs not in the latest scan.
- `listKnown()` (`status='known'`, grouped by `grp`), `listDiscovered()`
(`status='new'`), `get(mac)`, `update(mac, {name, grp, status, note, flagged})`,
`remove(mac)`. (`ignored` devices show in neither.)
- `prune()` deletes unreviewed, absent rows past their TTL: `status='new' AND
present=false AND ((randomized AND last_seen < now() - interval '24 hours') OR
(NOT randomized AND last_seen < now() - interval '14 days'))`. Never touches
`known`/`ignored`.
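An in-memory model can make the upsert, absence, and prune rules above concrete. The sketch below stands in a `Map` for the `lan_devices` table and millisecond timestamps for `timestamptz`; the `store` and `now` parameters and the camelCase field names are illustrative assumptions, and the real repo would express the same rules in SQL.

```javascript
const HOUR_MS = 60 * 60 * 1000;
const TTL_MS = { randomized: 24 * HOUR_MS, normal: 14 * 24 * HOUR_MS };

// Insert unseen MACs as 'new'; for known MACs refresh only scan-owned fields.
function upsertScan(store, rows, now) {
  for (const { mac, ip, vendor, randomized } of rows) {
    const existing = store.get(mac);
    if (existing) {
      // Owner fields (name, grp, status, note, flagged) are left untouched.
      Object.assign(existing, { ip, vendor, lastSeen: now, present: true });
    } else {
      store.set(mac, {
        mac, ip, vendor, randomized,
        name: null, grp: null, note: null, flagged: false,
        status: 'new', firstSeen: now, lastSeen: now, present: true,
      });
    }
  }
}

// Flip presence for every MAC missing from the latest scan.
function markAbsent(store, seenMacs) {
  if (seenMacs.length === 0) return; // guard: an empty scan marks nothing absent
  const seen = new Set(seenMacs);
  for (const row of store.values()) {
    if (!seen.has(row.mac)) row.present = false;
  }
}

// Delete unreviewed, absent rows past their TTL; 'known'/'ignored' are kept.
function prune(store, now) {
  for (const [mac, row] of store) {
    if (row.status !== 'new' || row.present) continue;
    const ttl = row.randomized ? TTL_MS.randomized : TTL_MS.normal;
    if (now - row.lastSeen > ttl) store.delete(mac);
  }
}
```

Because only `status='new'` rows are ever eligible for deletion, naming or ignoring a device is what makes it permanent.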
### Cron (`lib/cron/index.js`)
Add hourly (`7 * * * *`): `runScan()` → `upsertScan` → `markAbsent` → `prune()`.
Wrapped in try/catch — a scan failure logs and never crashes the cron, and
`prune()` only runs after a *successful* scan (so a failed scan can't reap rows).
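The ordering guarantee above (no `prune()` after a failed scan) falls out of running the steps sequentially inside one try/catch. A hedged sketch, where `runScanCycle` and its injected `runScan`, `repo`, and `log` arguments are hypothetical names chosen for illustration:

```javascript
// One scan cycle. Any throw (missing arp-scan, DB error) skips the remaining
// steps, so prune() can only run after scan, upsert and markAbsent succeeded.
async function runScanCycle({ runScan, repo, log = console }) {
  try {
    const rows = await runScan();
    await repo.upsertScan(rows);
    await repo.markAbsent(rows.map((row) => row.mac));
    await repo.prune();
    return { ok: true, seen: rows.length };
  } catch (err) {
    // Log and swallow: the cron keeps running and existing rows are untouched.
    log.error('lan device scan failed:', err.message);
    return { ok: false, seen: 0 };
  }
}
```

Injecting the scanner and repo keeps the cycle testable with stubs, mirroring how `exec` is injected into `runScan`.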
### API `lib/api/routes/devices.js` (mount `/api/devices`, owner-gated)
- `GET /` — known devices grouped for the band.
- `GET /discovered` — `status='new'` review queue.
- `PATCH /:mac` — set `name`/`grp`/`status`/`note`/`flagged` (this is "add from
discovered" + "edit" + "name"); promoting = `status:'known'`.
- `DELETE /:mac` — remove.
- `POST /scan` — run a scan immediately (owner).
- `:mac` param validated against a MAC regex.
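The `:mac` check and the `PATCH` field whitelist could look roughly like this. The sketch uses a plain regex so it stays dependency-free; the spec's error handling mentions zod, and a zod schema could wrap the same pattern. `normalizeMacParam` and `pickPatch` are hypothetical helper names, and lower-casing the MAC is an assumption about how keys are stored.

```javascript
const MAC_RE = /^[0-9a-f]{2}(?::[0-9a-f]{2}){5}$/i;
const STATUSES = new Set(['new', 'known', 'ignored']);
const PATCHABLE = ['name', 'grp', 'status', 'note', 'flagged'];

// Returns the MAC in lower-case colon form, or null if it is not a MAC.
function normalizeMacParam(raw) {
  if (typeof raw !== 'string') return null;
  const mac = raw.trim().toLowerCase();
  return MAC_RE.test(mac) ? mac : null;
}

// Keeps only owner-editable fields; returns null for an unknown status.
function pickPatch(body) {
  const patch = {};
  for (const key of PATCHABLE) {
    if (body != null && Object.prototype.hasOwnProperty.call(body, key)) {
      patch[key] = body[key];
    }
  }
  if ('status' in patch && !STATUSES.has(patch.status)) return null;
  return patch;
}

// Promoting a discovered device is just a PATCH with status 'known':
console.log(normalizeMacParam('DA:A1:19:00:00:01'));
console.log(pickPatch({ name: 'Hall sensor', status: 'known', mac: 'ignored' }));
```

A route handler would answer 400 when either helper returns `null`, and scan-owned columns such as `ip` or `last_seen` never reach the repo because they are not in the whitelist.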
### Frontend
- `public/views/devices_band.js` — fetch `/api/devices` (grouped) instead of the
static file; render the MAC (existing `.dv-mac` style from today's change).
- **Discovered review** — a section/panel listing `/api/devices/discovered`, each
with an **Add / Edit** form (name + group select + notes) that `PATCH`es to
promote; plus inline edit for known devices and an Ignore/Delete action.
- **Randomized devices** get a small "randomized MAC" badge (with a tooltip:
naming pins it only until the MAC rotates; disable MAC randomization for the
SSID on the device for stable tracking). A `known` device that's been
`present=false` for ≥30d shows an "absent Nd" marker for easy manual cleanup
(never auto-deleted).
- Remove `public/devices.json` (superseded by the DB).
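The "absent Nd" marker for long-gone known devices reduces to a small pure function. In this sketch `absentLabel` is a hypothetical name, and it assumes the API returns `last_seen` as an ISO 8601 string alongside `status` and `present`.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns e.g. "absent 45d" for a known device unseen for 30+ days, else null.
function absentLabel(device, nowMs = Date.now()) {
  if (device.status !== 'known' || device.present) return null;
  const days = Math.floor((nowMs - Date.parse(device.last_seen)) / DAY_MS);
  return days >= 30 ? `absent ${days}d` : null; // NaN (bad date) yields null
}
```

Passing `nowMs` explicitly keeps the function deterministic under jsdom tests.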
## Infra setup (one-time, on CT 311)
`apt install arp-scan` + grant the binary raw-socket capability so the non-root
`void` service user can run it:
`setcap cap_net_raw,cap_net_admin+eip /usr/sbin/arp-scan`. Captured in
`deploy/README.md`. If the capability/tool is missing, the scan logs a clear
error and the feature degrades to "no new discoveries" (existing data still shows).
## Error handling
- `arp-scan` missing / unprivileged / non-zero exit → `runScan` throws; cron
catches, logs, leaves the DB untouched (known devices still render).
- Empty/garbled scan output → `parseArpScan` returns `[]`; `markAbsent([])` is a
guarded no-op, so an empty result never blanket-marks every device absent.
- Bad MAC in PATCH → 400 via zod.
## Testing
- **`parseArpScan` / `isRandomizedMac`** — pure unit tests (sample arp-scan
output incl. a randomized MAC, banner/footer lines, a malformed line).
- **`lan_devices` repo** (vitest + test DB) — `upsertScan` inserts new vs updates
existing without clobbering owner fields; `markAbsent` flips presence; promote
via `update`.
- **API** (supertest) — `/discovered` lists only `new`; `PATCH` promotes/edits;
owner-gated.
- **Frontend** (jsdom) — band renders groups + MAC from `/api/devices`;
discovered panel renders the add/edit form.
- **Manual** — `POST /api/devices/scan`, confirm new devices appear, name one,
see it move to the band.
## Out of scope (YAGNI)
- Service/port fingerprinting, SNMP/LLDP topology (that's Scanopy's job).
- Multi-subnet/VLAN scanning (single `/24`).
- Push notifications on new-device discovery.
- Stable identity for randomized-MAC devices across rotations (not solvable from
L2 alone; the user-side fix is disabling MAC randomization for the SSID).
## References
- Scanopy — github.com/scanopy/scanopy ; scanopy.net (self-hosted discovery/topology, AGPL-3.0).