

Design: LAN Device Discovery (MAC inventory + review/name)

Date: 2026-06-08
Status: Approved (brainstorm), pending implementation plan
Repo: void-v2 (CT 311 / void-app)

Summary

Replace the static, hand-maintained public/devices.json with a persistent, MAC-keyed device store fed by a recurring ARP scan. Each scan logs MACs to the DB and diffs against what's known — new devices land in a review queue; known devices just get their IP / last_seen / presence updated. The owner can add a discovered device, edit it, and give it a name for reference (mirrors the Void's existing services "discovered → promote" pattern).

Background

  • Static today: public/views/devices_band.js does fetch('/devices.json') — a curated, manually-edited list (IP/MAC/vendor/group/flag). It re-reads a static file; nothing is persisted or diffed.
  • Existing precedent to mirror: monitored_services uses source='discovered' AND NOT enabled as a review queue; PATCH /services/:id promotes + edits. We reproduce this shape for devices.
  • Separate from network_hosts: that table is the homelab-guest inventory (Proxmox BC:24:11:* LXCs, infra_audit). The devices band is IoT / personal / unknown LAN gear — kept separate.
  • Scan engine: the Void host (CT 311) has ip/arp but not nmap/arp-scan. We add arp-scan (chosen for reliable L2 ARP sweeps that ICMP-blocking devices can't dodge, plus a built-in OUI vendor DB).

Lessons borrowed from Scanopy (self-hosted discovery tool)

  • Decouple scanner from storage/UI — the scanner just scans and reports; the server owns dedup + persistence. → isolated lib/infra/scan.js.
  • MAC is the identity, IP is a mutable attribute — key on MAC, update IP each scan (handles DHCP churn). → mac primary key.
  • Scheduled rescans + timestamp inventory — periodic batch with first_seen/last_seen/present, diff by "MAC seen before?". → hourly cron.
  • Vendor via OUI: arp-scan ships an OUI database, so the vendor lookup comes for free.
  • Randomized MACs are an open problem even for Scanopy — so we at least flag locally-administered MACs so the user knows OUI can't ID them.

Decisions

| Decision | Choice | Rationale |
| --- | --- | --- |
| Scan engine | arp-scan --localnet on CT 311, hourly cron | Reliable L2 sweep + built-in OUI; self-contained (no external scanner dep). |
| Cadence | Hourly (staggered, e.g. 7 * * * *) | "No rush"; device drift is slow. |
| DB growth | Upsert by MAC: one row per device, no per-scan history | Table is bounded by distinct devices ever seen (dozens to hundreds), not scan count → no bloat. |
| Identity | MAC primary key; IP a mutable column | Survives DHCP IP changes. |
| Review flow | Mirror services discovered → promote | New MAC → status='new'; owner names/edits → status='known'. |
| Source of truth | DB (lan_devices); devices.json becomes the one-time migration seed, then removed | Single source of truth. |
| Randomized-MAC bloat | Auto-prune unreviewed + absent rows (randomized >24h, others >14d); keep known/ignored forever | Rotated randomized MACs never accumulate; the table stays bounded. |

Architecture

scan (arp-scan) → parseArpScan (+randomized flag) → upsertScan by MAC → markAbsent for unseen → review queue (status='new') → owner names/groups/promotes → known devices render in the band.

Components

Migration 024_lan_devices.sql

Table lan_devices:

  • mac text PRIMARY KEY
  • ip text, vendor text
  • name text (owner-given reference name, null until named)
  • grp text (Smart Home | Entertainment | Personal | Network | Flagged)
  • note text
  • status text NOT NULL DEFAULT 'new' (new | known | ignored)
  • randomized boolean NOT NULL DEFAULT false (locally-administered MAC)
  • flagged boolean NOT NULL DEFAULT false
  • first_seen timestamptz NOT NULL DEFAULT now()
  • last_seen timestamptz NOT NULL DEFAULT now()
  • present boolean NOT NULL DEFAULT true

Seed (embedded SQL, from the current curated devices.json):

  • Devices with a MAC: non-flagged → status='known' with their name/group; flagged (e.g. .15 ASUS) → status='new', flagged=true.
  • The .13 Orbi satellite and .171 Galaxy Tab S4 fixes carry over as known.
  • MAC-less curated entries (.21/.22/.34/.35/.51, currently offline) are not seeded — they reappear as new (with a real MAC) the first time they're seen online. (Documented so it's expected, not a gap.)
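The seed rules above could be expressed as a small mapping function, sketched here under assumptions: the field names of a curated entry (ip, mac, vendor, name, group, flag) are guesses based on the description of devices.json, and seedRowFromCurated is a hypothetical helper, not part of the planned migration (which embeds SQL). Whether flagged entries keep their curated name/group is not stated in the spec; this sketch keeps them.

```javascript
// Hypothetical helper: maps one curated devices.json entry to a lan_devices seed row.
// Field names on `entry` are assumed, not taken from the real file.
function seedRowFromCurated(entry) {
  // MAC-less entries are not seeded; they reappear as 'new' once seen online.
  if (!entry.mac) return null;

  const flagged = Boolean(entry.flag);
  return {
    mac: entry.mac.toLowerCase(),
    ip: entry.ip ?? null,
    vendor: entry.vendor ?? null,
    name: entry.name ?? null,
    grp: entry.group ?? null,
    // Flagged devices go back through review; everything else is already curated.
    status: flagged ? 'new' : 'known',
    flagged,
  };
}

// Builds the full seed list, dropping entries that cannot be keyed by MAC.
function buildSeedRows(curated) {
  return curated.map(seedRowFromCurated).filter((row) => row !== null);
}
```

The output rows would still need to be turned into INSERT statements for the migration.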

lib/infra/scan.js (decoupled scanner)

  • parseArpScan(text) -> [{ ip, mac, vendor, randomized }]: pure parser of arp-scan tab-separated output (skips banner/footer); randomized is true when the first octet has the locally-administered bit set (& 0x02).
  • isRandomizedMac(mac) -> boolean — pure helper.
  • runScan({ exec }) -> rows — shells arp-scan --localnet -x (interface auto/-I eth0), returns parseArpScan(stdout). exec injected for tests.
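A minimal sketch of the two pure helpers described above. It assumes each result line has the form ip, TAB, mac, TAB, vendor. The regexes, the lowercase normalisation, and the null vendor fallback are illustrative choices, not decisions taken in this spec.

```javascript
// Six colon-separated hex octets, e.g. "bc:24:11:aa:bb:cc".
const MAC_RE = /^[0-9a-f]{2}(:[0-9a-f]{2}){5}$/i;
// Loose dotted-quad check; enough to tell result lines from banner/footer text.
const IPV4_RE = /^\d{1,3}(\.\d{1,3}){3}$/;

// True when the locally-administered bit (0x02) of the first octet is set.
function isRandomizedMac(mac) {
  const firstOctet = parseInt(mac.slice(0, 2), 16);
  return (firstOctet & 0x02) !== 0;
}

// Pure parser: keeps only "ip<TAB>mac<TAB>vendor" lines and drops everything else
// (banner, footer, blank or malformed lines).
function parseArpScan(text) {
  const rows = [];
  for (const line of String(text).split(/\r?\n/)) {
    const [ip, mac, ...rest] = line.split('\t').map((s) => s.trim());
    if (!ip || !mac || !IPV4_RE.test(ip) || !MAC_RE.test(mac)) continue;
    const normalized = mac.toLowerCase();
    rows.push({
      ip,
      mac: normalized,
      vendor: rest.join(' ') || null,
      randomized: isRandomizedMac(normalized),
    });
  }
  return rows;
}
```

Because the parser ignores anything that is not a well-formed result line, it behaves the same whether or not the banner and footer are suppressed on the command line.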

lib/db/repos/lan_devices.js

  • upsertScan(rows) — insert unseen MACs as status='new'; for existing, update ip, vendor, last_seen=now(), present=true (never overwrite owner name/grp/status).
  • markAbsent(seenMacs): sets present=false for MACs not in the latest scan.
  • listKnown() (status='known', grouped by grp), listDiscovered() (status='new'), get(mac), update(mac, {name, grp, status, note, flagged}), remove(mac). (ignored devices show in neither.)
  • prune(): delete unreviewed + absent rows past their TTL: status='new' AND present=false AND ((randomized AND last_seen < now() - interval '24 hours') OR (NOT randomized AND last_seen < now() - interval '14 days')). Never touches known/ignored.
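The repo semantics above can be modelled in memory to make the invariants concrete. This is a sketch only: createLanDeviceStore is a hypothetical stand-in that uses a Map instead of the Postgres table, and the explicit `now` parameters exist purely so the behaviour can be exercised deterministically.

```javascript
// In-memory model of lan_devices, keyed by MAC (a Map instead of Postgres).
function createLanDeviceStore() {
  const rows = new Map();
  const HOUR_MS = 60 * 60 * 1000;

  // Insert unseen MACs as 'new'; on existing rows refresh only scan-owned fields,
  // so owner-set name/grp/status/note are never overwritten.
  function upsertScan(scanRows, now = new Date()) {
    for (const { mac, ip, vendor, randomized } of scanRows) {
      const existing = rows.get(mac);
      if (existing) {
        Object.assign(existing, { ip, vendor, last_seen: now, present: true });
      } else {
        rows.set(mac, {
          mac, ip, vendor, randomized,
          name: null, grp: null, note: null,
          status: 'new', flagged: false,
          first_seen: now, last_seen: now, present: true,
        });
      }
    }
  }

  // Guard: an empty scan never blanket-marks the whole table absent.
  function markAbsent(seenMacs) {
    if (seenMacs.length === 0) return;
    const seen = new Set(seenMacs);
    for (const row of rows.values()) {
      if (!seen.has(row.mac)) row.present = false;
    }
  }

  // Owner edits: only the editable fields are applied.
  function update(mac, patch) {
    const row = rows.get(mac);
    if (!row) return null;
    for (const key of ['name', 'grp', 'status', 'note', 'flagged']) {
      if (patch[key] !== undefined) row[key] = patch[key];
    }
    return row;
  }

  // Delete unreviewed + absent rows past their TTL (24h randomized, 14d otherwise).
  function prune(now = new Date()) {
    let removed = 0;
    for (const row of rows.values()) {
      if (row.status !== 'new' || row.present) continue;
      const ttlMs = (row.randomized ? 24 : 14 * 24) * HOUR_MS;
      if (now - row.last_seen > ttlMs) {
        rows.delete(row.mac); // deleting the current entry during Map iteration is safe
        removed += 1;
      }
    }
    return removed;
  }

  return { upsertScan, markAbsent, update, prune, get: (mac) => rows.get(mac) };
}
```

In the real repo the same rules would presumably map onto an INSERT with a conflict clause on mac plus two UPDATE/DELETE statements, but that SQL is left to the implementation plan.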

Cron (lib/cron/index.js)

Add hourly (7 * * * *): runScan() → upsertScan → markAbsent → prune(). Wrapped in try/catch: a scan failure is logged and never crashes the cron, and prune() only runs after a successful scan (so a failed scan can't reap rows).
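The ordering and failure behaviour of the hourly job might look like the sketch below. runDiscoveryJob and its return shape are hypothetical; scan, repo, and log are injected so the sketch runs without arp-scan or a database, and how the function is registered with the scheduler in lib/cron/index.js is not shown.

```javascript
// One discovery pass: scan, upsert, mark absent, prune.
// `scan` resolves to parsed rows; `repo` exposes upsertScan/markAbsent/prune.
async function runDiscoveryJob({ scan, repo, log = console }) {
  try {
    const rows = await scan(); // throws if arp-scan is missing or exits non-zero
    await repo.upsertScan(rows);
    await repo.markAbsent(rows.map((r) => r.mac));
    const pruned = await repo.prune(); // reached only when the scan succeeded
    return { ok: true, seen: rows.length, pruned };
  } catch (err) {
    // A failed scan is logged and swallowed; the DB is left as it was.
    log.error('lan device scan failed:', err.message);
    return { ok: false, seen: 0, pruned: 0 };
  }
}
```

Keeping prune() inside the same try block, after the scan, is what guarantees that a failed scan cannot reap rows.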

API lib/api/routes/devices.js (mount /api/devices, owner-gated)

  • GET / — known devices grouped for the band.
  • GET /discovered: status='new' review queue.
  • PATCH /:mac — set name/grp/status/note/flagged (this is "add from discovered" + "edit" + "name"); promoting = status:'known'.
  • DELETE /:mac — remove.
  • POST /scan — run a scan immediately (owner).
  • :mac param validated against a MAC regex.
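A sketch of the validation the PATCH route needs, written with plain checks rather than zod so it stays dependency-free. The exact regex, the acceptance of hyphen separators, and the helper names normalizeMacParam and validateDevicePatch are assumptions; the allowed status and grp values are taken from the lan_devices table definition.

```javascript
// Six hex octets separated by ':' or '-' (hyphen support is an assumption).
const MAC_PARAM_RE = /^[0-9a-f]{2}([:-][0-9a-f]{2}){5}$/i;
const STATUSES = ['new', 'known', 'ignored'];
const GROUPS = ['Smart Home', 'Entertainment', 'Personal', 'Network', 'Flagged'];

// Returns the canonical lowercase, colon-separated MAC, or null when invalid.
// A route handler would answer 400 on null.
function normalizeMacParam(param) {
  if (typeof param !== 'string' || !MAC_PARAM_RE.test(param)) return null;
  return param.toLowerCase().replace(/-/g, ':');
}

// Returns { ok: true, patch } holding only editable fields, or { ok: false, error }.
function validateDevicePatch(body) {
  const patch = {};
  for (const key of ['name', 'grp', 'status', 'note', 'flagged']) {
    if (body[key] !== undefined) patch[key] = body[key];
  }
  if (patch.status !== undefined && !STATUSES.includes(patch.status)) {
    return { ok: false, error: 'invalid status' };
  }
  if (patch.grp != null && !GROUPS.includes(patch.grp)) {
    return { ok: false, error: 'invalid grp' };
  }
  if (patch.flagged !== undefined && typeof patch.flagged !== 'boolean') {
    return { ok: false, error: 'flagged must be a boolean' };
  }
  return { ok: true, patch };
}
```

Promoting a discovered device is then just a PATCH whose body validates to a patch containing status: 'known'.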

Frontend

  • public/views/devices_band.js — fetch /api/devices (grouped) instead of the static file; render the MAC (existing .dv-mac style from today's change).
  • Discovered review — a section/panel listing /api/devices/discovered, each with an Add / Edit form (name + group select + notes) that PATCHes to promote; plus inline edit for known devices and an Ignore/Delete action.
  • Randomized devices get a small "randomized MAC" badge (with a tooltip: naming pins the device only until its MAC rotates; disable MAC randomization for the SSID for stable tracking). A known device that's been present=false for ≥30d shows an "absent Nd" marker for easy manual cleanup (never auto-deleted).
  • Remove public/devices.json (superseded by the DB).
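The "absent Nd" marker from the list above could be derived on the client from fields the API already returns. This sketch assumes absence is measured from last_seen, since the table has no separate absent-since column; absentMarker is a hypothetical helper name.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Returns "absent Nd" for a known device away for at least thresholdDays, else null.
// Assumption: last_seen is used as the start of the absence.
function absentMarker(device, now = new Date(), thresholdDays = 30) {
  if (device.status !== 'known' || device.present) return null;
  const days = Math.floor((now - new Date(device.last_seen)) / DAY_MS);
  return days >= thresholdDays ? `absent ${days}d` : null;
}
```

The marker is purely cosmetic; nothing in the flow deletes a known device automatically.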

Infra setup (one-time, on CT 311)

Install arp-scan (apt install arp-scan), then grant the binary raw-socket capability so the non-root void service user can run it: setcap cap_net_raw,cap_net_admin+eip /usr/sbin/arp-scan. Both steps are captured in deploy/README.md. If the tool or the capability is missing, the scan logs a clear error and the feature degrades to "no new discoveries" (existing data still shows).

Error handling

  • arp-scan missing / unprivileged / non-zero exit → runScan throws; cron catches, logs, leaves the DB untouched (known devices still render).
  • Empty/garbled scan output → parseArpScan returns []; markAbsent([]) is a no-op guard (never blanket-marks everything absent on a failed scan).
  • Bad MAC in PATCH → 400 via zod.

Testing

  • parseArpScan / isRandomizedMac — pure unit tests (sample arp-scan output incl. a randomized MAC, banner/footer lines, a malformed line).
  • lan_devices repo (vitest + test DB) — upsertScan inserts new vs updates existing without clobbering owner fields; markAbsent flips presence; promote via update.
  • API (supertest) — /discovered lists only new; PATCH promotes/edits; owner-gated.
  • Frontend (jsdom) — band renders groups + MAC from /api/devices; discovered panel renders the add/edit form.
  • Manual: POST /api/devices/scan, confirm new devices appear, name one, see it move to the band.

Out of scope (YAGNI)

  • Service/port fingerprinting, SNMP/LLDP topology (that's Scanopy's job).
  • Multi-subnet/VLAN scanning (single /24).
  • Push notifications on new-device discovery.
  • Stable identity for randomized-MAC devices across rotations (not solvable from L2 alone; the user-side fix is disabling MAC randomization for the SSID).

References

  • Scanopy — github.com/scanopy/scanopy ; scanopy.net (self-hosted discovery/topology, AGPL-3.0).