Commit graph

65 commits

Author SHA1 Message Date
4d16704ede ci(security): bump Silleellie/pylint-github-action to v3
Closes Dependabot PR #2.

v3 changelog: adds an optional commit-message parameter (we do not
use it, default is fine), removes the Endbug dependency that caused
issues on github-enterprise, and bumps its own internal checkout /
setup-python actions. None of the inputs we pass (lint-path,
python-version, requirements-path, pylintrc-path, readme-path,
badge-text, color-*) changed.

Re-pinned by full commit SHA, same hardening pattern as v2.1.
2026-05-17 18:44:31 +00:00
d7fab0fd89 chore(frontend-deps): bump recharts, globals, eslint-plugin-react-refresh
Closes Dependabot PRs #7 (recharts), #6 (globals) and #4
(eslint-plugin-react-refresh).

  - recharts ^3.2.0 -> ^3.8.1 (runtime chart lib used by App.jsx;
    bundle grew from 489 kB to 550 kB gzipped — within the 1 MB
    soft budget),
  - globals ^16.3.0 -> ^17.6.0 (eslint flat-config peer; no API
    surface used directly),
  - eslint-plugin-react-refresh ^0.4.20 -> ^0.5.2.

Verified locally with `npm install && npm run lint && npm run build`:
zero lint errors, build completes in 814 ms, npm audit reports no
vulnerabilities.

Two Dependabot PRs intentionally not included here:
  - #12 @eslint/js -> ^10 needs eslint -> ^10 first (peer dep),
  - #9 @vitejs/plugin-react -> ^6 needs vite -> ^8 first (peer dep).
Both will be revisited once Dependabot opens the matching core
bumps.
2026-05-17 18:30:43 +00:00
ed6dea4307 chore(deps): bump websockets to 16.0 and setup-python to v6
Closes Dependabot PRs #13 (websockets) and #1 (setup-python).

websockets: 15.0.1 -> 16.0
  Major bump but the API we use (serve(handler), handler arg
  exposing request.path, ws.send, close(code, reason)) is
  unchanged. Verified by running the existing 89-test suite
  against websockets==16.0 locally — _ws_token_ok still reads
  the query string as before.

actions/setup-python: v5 -> v6
  First-party action, low risk. The previous version was already
  tag-pinned (acceptable for first-party). cache: 'pip' input is
  preserved.

pip-audit remains clean (0 known vulnerabilities).
2026-05-17 18:20:43 +00:00
006e62f744
Merge pull request #3 from richardnixondev/dependabot/github_actions/actions/checkout-6
Bump actions/checkout from 4 to 6
2026-05-17 17:29:58 +01:00
d308e9f5b4
Merge pull request #10 from richardnixondev/dependabot/pip/idna-3.15
chore(deps): bump idna from 3.10 to 3.15
2026-05-17 17:29:44 +01:00
8d58c35d8b
Merge pull request #8 from richardnixondev/dependabot/pip/yarl-1.23.0
chore(deps): bump yarl from 1.20.1 to 1.23.0
2026-05-17 17:29:23 +01:00
c4a17cfce8
Merge pull request #5 from richardnixondev/dependabot/pip/attrs-26.1.0
chore(deps): bump attrs from 25.3.0 to 26.1.0
2026-05-17 17:29:20 +01:00
dependabot[bot]
85f191f4d9
chore(deps): bump yarl from 1.20.1 to 1.23.0
---
updated-dependencies:
- dependency-name: yarl
  dependency-version: 1.23.0
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-05-17 16:22:05 +00:00
dependabot[bot]
bdba271262
chore(deps): bump attrs from 25.3.0 to 26.1.0
Bumps [attrs](https://github.com/python-attrs/attrs) from 25.3.0 to 26.1.0.
- [Release notes](https://github.com/python-attrs/attrs/releases)
- [Changelog](https://github.com/python-attrs/attrs/blob/main/CHANGELOG.md)
- [Commits](https://github.com/python-attrs/attrs/compare/25.3.0...26.1.0)

---
updated-dependencies:
- dependency-name: attrs
  dependency-version: 26.1.0
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-05-17 16:21:54 +00:00
dependabot[bot]
8dd022601f
chore(deps): bump idna from 3.10 to 3.15
Bumps [idna](https://github.com/kjd/idna) from 3.10 to 3.15.
- [Release notes](https://github.com/kjd/idna/releases)
- [Changelog](https://github.com/kjd/idna/blob/master/HISTORY.md)
- [Commits](https://github.com/kjd/idna/compare/v3.10...v3.15)

---
updated-dependencies:
- dependency-name: idna
  dependency-version: '3.15'
  dependency-type: direct:production
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-05-17 16:21:50 +00:00
github-actions
cabdce8b74 Updated pylint badge 2026-05-17 16:20:31 +00:00
9413797a51 test: cover Settings validation and Metrics schema
34 new tests (89 total, still ~0.1s).

test_settings.py — exercises BackendSettings directly with _env_file=None
so the developer's local .env does not leak in:
  - default port ranges and invariants,
  - non-integer / out-of-range port rejection,
  - cpu_alert_th out-of-range rejection,
  - env override roundtrip,
  - extra="ignore" tolerates typos (regression: an unknown env var
    should not crash startup).

test_metrics_schema.py — black-box tests of parse_metrics() with each
case named after the attack it guards against:
  - happy path with full and partial (optional fields) payloads,
  - every required field individually missing,
  - every percentage field individually out of [0, 100],
  - extra="forbid" rejects smuggled keys (e.g. {"injected": "<!channel>"}),
  - unsafe device_id patterns (slashes, newlines, path traversal,
    65-char overflow, empty string),
  - invalid raw JSON,
  - NaN / Infinity / -Infinity which json.loads accepts but the
    schema (Field + the finite-value validator) rejects.
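
The last case is easy to overlook, so a minimal stdlib-only sketch may help. It shows that json.loads accepts the non-standard NaN and Infinity literals, and that a finite-value check catches them where a plain range comparison does not. is_finite_number is a hypothetical helper for illustration, not the project's validator.

```python
import json
import math


def is_finite_number(value):
    """Return True only for finite int/float values (bools excluded)."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return False
    return math.isfinite(value)


# json.loads accepts these non-standard literals by default.
frame = json.loads(
    '{"cpu_percent": NaN, "mem_percent": Infinity, "disk_percent": 42.5}'
)

# NaN compares False against every bound, so both range checks fail quietly.
cpu = frame["cpu_percent"]
nan_fails_both_bounds = not (cpu >= 0) and not (cpu <= 100)

rejected = sorted(k for k, v in frame.items() if not is_finite_number(v))
print(rejected)  # ['cpu_percent', 'mem_percent']
```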
2026-05-17 17:19:50 +01:00
0e816fb966 feat(security): validate full MQTT payload schema with Pydantic
Previously only device_id and cpu_percent had explicit checks. The
rest of the payload — mem_percent, disk_percent, gpu_percent,
timestamp, agent_cpu_percent, agent_mem_mb — was trusted as long
as json.loads accepted it, so a malicious or buggy publisher could
push:

  - mem_percent: "<script>alert(1)</script>" (rendered later in
    the WS dashboard / Slack summary as if numeric),
  - disk_percent: NaN (which compares False everywhere and breaks
    downstream chart aggregation),
  - extra keys ("evil": "<!channel>"), persisted in device_state
    and forwarded verbatim to WS clients.

Pydantic Metrics model now enforces the whole frame:
  - device_id pattern (same regex as validate_device_id),
  - percentages bounded to [0, 100],
  - explicit NaN rejection (a finite-value @field_validator on top
    of Field(ge/le), which already excludes inf),
  - timestamp >= 0,
  - extra="forbid" so frames with unknown keys are rejected outright.

on_message now goes through parse_metrics() which logs a WARNING
with the structured pydantic error list on rejection.
2026-05-17 17:19:50 +01:00
adf6a7a1ce feat(config): typed settings via pydantic-settings
Replace the scattered os.getenv() + int()/float() pattern with a
BaseSettings class on both modules. Wins:

  - bad config now fails at import with a readable pydantic error
    (WS_PORT=abc no longer produces a ValueError stack from inside
    main()); ports are bounded to [1, 65535], cpu_alert_th to [0,100],
    backoff_min/interval to >= 1,
  - .env loading moves into pydantic-settings (env_file in
    SettingsConfigDict), so the manual load_dotenv() call is gone,
  - every callback now reads from a single ``settings`` instance, so
    runtime overrides are possible (tests use monkeypatch on
    backend.settings instead of patching module-level constants).
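
The fail-early idea in the list above can be sketched without any third-party dependency. This is deliberately not pydantic-settings: it is a plain dataclass version that restates the same bounds so the checks are visible. BackendConfig, load_config and _bounded are hypothetical names, and the defaults (6789, 90) are assumptions for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BackendConfig:
    ws_port: int
    cpu_alert_th: float


def _bounded(env, key, default, cast, low, high):
    """Parse env[key] with `cast` and require low <= value <= high."""
    raw = env.get(key, default)
    try:
        value = cast(raw)
    except (TypeError, ValueError):
        raise ValueError(f"{key}={raw!r} is not a valid {cast.__name__}") from None
    # NaN fails this comparison too, so it is rejected like any other bad value.
    if not low <= value <= high:
        raise ValueError(f"{key}={value} is outside [{low}, {high}]")
    return value


def load_config(env):
    """Build the config once, so bad values fail before any loop starts."""
    return BackendConfig(
        ws_port=_bounded(env, "WS_PORT", "6789", int, 1, 65535),
        cpu_alert_th=_bounded(env, "CPU_ALERT_TH", "90", float, 0, 100),
    )


settings = load_config({"WS_PORT": "8080"})
print(settings.ws_port, settings.cpu_alert_th)  # 8080 90.0
```

Passing the environment in as a mapping, rather than reading os.environ inside the function, is what lets a test supply its own values without touching the process environment.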

Test for ws_token is updated to patch backend.settings.ws_auth_token
rather than the old WS_AUTH_TOKEN module constant; the contract is
unchanged so all 55 tests still pass.

Pydantic stack pinned: pydantic==2.13.4, pydantic-core==2.46.4,
pydantic-settings==2.14.1 (plus annotated-types and typing-inspection
as transitives). pip-audit remains clean.
2026-05-17 17:19:50 +01:00
dependabot[bot]
0b97d9a793
Bump actions/checkout from 4 to 6
Bumps [actions/checkout](https://github.com/actions/checkout) from 4 to 6.
- [Release notes](https://github.com/actions/checkout/releases)
- [Changelog](https://github.com/actions/checkout/blob/main/CHANGELOG.md)
- [Commits](https://github.com/actions/checkout/compare/v4...v6)

---
updated-dependencies:
- dependency-name: actions/checkout
  dependency-version: '6'
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
2026-05-17 15:24:52 +00:00
github-actions
06c665843d Updated pylint badge 2026-05-17 15:24:51 +00:00
031e18e062 chore(ci): enable Dependabot for pip, npm and github-actions
pip-audit now fails the CI on known CVEs (added earlier in the lint
workflow), but that only protects against regressions in *new* PRs.
Dependabot closes the loop on the rest: it raises weekly PRs to bump
pinned versions, often before a CVE is disclosed publicly, including
SHA-pinned bumps for the Silleellie third-party action so the SHA
pin does not become a maintenance trap.

Three ecosystems are configured:
  - pip (repo root, runtime + requirements-dev),
  - npm (frontend/),
  - github-actions (workflows/ and reusable actions).
2026-05-17 16:24:10 +01:00
aeba3cd1e5 feat(backend): reconnect MQTT with exponential backoff
Before this change a single connect() failure (bad cert, broker
down, transient network glitch) made mqtt_loop return after a one-
line error, while the surrounding asyncio.gather kept the WS server
and the Slack summary loop running blind — the bridge looked alive
but no metrics flowed.

The loop now:
  - keeps the SSL context outside the retry body so reconnects do
    not re-read cert files on every attempt,
  - awaits an asyncio.Event flipped by on_disconnect, so it only
    enters the backoff sleep when the broker actually dropped us,
  - retries with exponential backoff capped at MQTT_BACKOFF_MAX
    (default 60s), resetting after each successful connect,
  - lets asyncio.CancelledError propagate so shutdown still works.

next_backoff() is a tiny pure helper so the doubling/ceiling logic
is unit tested in isolation (tests/test_backoff.py, 7 cases).
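
A minimal sketch of what such a helper could look like. The signature, the doubling factor and the 1-second starting delay are assumptions for illustration; only the 60-second ceiling comes from the message above.

```python
def next_backoff(current, maximum=60.0, factor=2.0):
    """Return the next delay: grow by `factor`, never exceeding `maximum`."""
    return min(current * factor, maximum)


# Walk the schedule a reconnect loop would follow after repeated failures.
delay = 1.0
schedule = []
for _ in range(8):
    schedule.append(delay)
    delay = next_backoff(delay)

print(schedule)  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 60.0, 60.0]
```

Because the helper has no side effects, the retry loop only has to reset its delay variable to the starting value after a successful connect.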

Also hoisted `import ssl` to module top-level — the previous in-
function import was a leftover from earlier copy/paste and tripped
pylint's import-outside-toplevel check.
2026-05-17 16:24:10 +01:00
469201efcb refactor(logging): replace print() with stdlib logging
print() across both modules made production observability painful:
no levels, no timestamps under the developer's control, and the
common `except Exception as e: print(e)` pattern dropped the
traceback. A single grep for `[MQTT ERROR]` could not tell whether
the failure was a parse error, a TLS handshake, or an OOM.

Now both backend.py and collect_metrics.py use a module logger:

  - basicConfig with `%(asctime)s %(levelname)s %(name)s: %(message)s`,
  - level driven by the LOG_LEVEL env var (defaults to INFO),
  - log.exception() in catch blocks so the stack trace is preserved,
  - debug-level for noisy frames (raw payload, agent self stats,
    per-message broadcast log) so prod runs stay readable.
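
The setup described above can be sketched as follows. configure_logging and parse_cpu are hypothetical names, the fallback to INFO for unknown level names is an assumption, and force=True is used here only so the sketch can be re-run in the same interpreter.

```python
import logging
import os

LOG_FORMAT = "%(asctime)s %(levelname)s %(name)s: %(message)s"


def configure_logging(environ=os.environ):
    """Configure the root logger from LOG_LEVEL, defaulting to INFO."""
    name = environ.get("LOG_LEVEL", "INFO").upper()
    level = logging.getLevelName(name)
    # getLevelName returns a string for unknown names; fall back to INFO then.
    if not isinstance(level, int):
        level = logging.INFO
    logging.basicConfig(level=level, format=LOG_FORMAT, force=True)
    return level


log = logging.getLogger("backend")


def parse_cpu(raw):
    """Parse a CPU reading; keep the traceback in the log on failure."""
    try:
        return float(raw)
    except ValueError:
        # log.exception logs at ERROR level and appends the stack trace.
        log.exception("could not parse cpu value %r", raw)
        return None
```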

Also rename the agent's local CPU threshold to AGENT_CPU_WARN (env
overridable) so the magic 90 in collect_metrics no longer drifts
silently from the backend's CPU_ALERT_TH.
2026-05-17 16:24:10 +01:00
github-actions
89d6da7ca4 Updated pylint badge 2026-05-17 14:02:29 +00:00
4fea59a3a5 ci: run pytest in the lint workflow
Install requirements-dev.txt (which transitively pulls runtime deps)
and run pytest before pylint. The suite is fast (~0.1s) so it does
not affect total CI time, but it now gates merges on the validation
and alert-predicate contracts.
2026-05-17 15:20:00 +00:00
c4329a9b9b test: add pytest suite for validation, alerting and WS auth
48 tests, ~0.1s total. Each case targets a specific bug class so the
file reads as a contract:

  - test_validation.py
      accepts well-formed device ids, rejects whitespace, slashes,
      colons, newlines, zero-width spaces, oversize values, HTML.
  - test_alert_predicate.py
      threshold boundary, bool-vs-int trap, NaN / inf / out-of-range,
      non-numeric payloads, per-device cooldown window.
  - test_active_snapshot.py
      recent vs stale, the "<= prune_seconds" boundary (inclusive),
      missing last_seen treated as ancient, empty state.
  - test_ws_token.py
      open mode, missing/wrong/empty/extra-param query strings, plus
      the happy path with the correct token.

conftest.py stubs MQTT_BROKER and prepends the repo root to sys.path
so `import backend` works without a .env file. Dev deps split into
requirements-dev.txt to keep the runtime image lean.
2026-05-17 14:45:00 +00:00
725d1a543e refactor(backend): extract pure helpers for parsing and alerting
Pull the validation, snapshot pruning and alert predicate out of the
MQTT callback so they can be unit tested without mocking gmqtt:

  - filter_active(state, seen, now, prune_seconds) — pure pruning
    rule, now also the body of active_snapshot();
  - validate_device_id(raw) — single source of truth for the
    [A-Za-z0-9._-]{1,64} contract;
  - should_alert(cpu, device_id, now, last_alert, cooldown, threshold)
    — boolean predicate that captures the bool-vs-int trap, the
    [0,100] range check, and the per-device cooldown.

Behavior is unchanged; on_message now calls these helpers
instead of inlining the same logic.
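
A rough sketch of two of these helpers, using the signatures listed above. The bodies are assumptions rather than the project's code: in particular, returning None for a bad id (instead of raising) and treating a device with no last-seen entry as stale are illustrative choices.

```python
import re

_DEVICE_ID_RE = re.compile(r"[A-Za-z0-9._-]{1,64}")


def validate_device_id(raw):
    """Return the id if it matches the contract, otherwise None."""
    # fullmatch (not match) so a trailing newline cannot slip through.
    if isinstance(raw, str) and _DEVICE_ID_RE.fullmatch(raw):
        return raw
    return None


def filter_active(state, seen, now, prune_seconds):
    """Keep devices last seen at most prune_seconds ago (inclusive)."""
    return {
        device_id: metrics
        for device_id, metrics in state.items()
        if device_id in seen and now - seen[device_id] <= prune_seconds
    }


state = {"edge-01": {"cpu_percent": 12.0}, "edge-02": {"cpu_percent": 97.0}}
seen = {"edge-01": 100.0, "edge-02": 39.0}
active = filter_active(state, seen, now=100.0, prune_seconds=60)
print(sorted(active))  # ['edge-01']
```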
2026-05-17 14:10:00 +00:00
3ca228cc15 feat(security): authenticate WebSocket clients with shared token
Without auth, the WS server at 0.0.0.0:6789 exposed every device's
metrics to anyone on the network — useful reconnaissance for an
attacker (saturated nodes are easier DoS targets) and trivial pivot
from a compromised host.

Server side:
  - WS_AUTH_TOKEN env defaults to empty (open mode for local dev),
  - when set, ws_handler reads ?token=... from the handshake target
    and rejects with WS close 1008 unless secrets.compare_digest
    matches; the comparison is constant-time to avoid timing oracles.
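
The server-side check can be sketched like this. ws_token_ok is a hypothetical name and signature, and rejecting a repeated token parameter is an assumption of this sketch. A caller would close the socket with code 1008 when it returns False.

```python
import secrets
from urllib.parse import parse_qs, urlsplit


def ws_token_ok(target, expected_token):
    """Check the ?token=... value of a handshake target like '/?token=abc'."""
    if not expected_token:
        # Open mode: no token configured, so every client is accepted.
        return True
    # parse_qs drops blank values, so '?token=' counts as missing.
    values = parse_qs(urlsplit(target).query).get("token", [])
    if len(values) != 1:
        return False
    # Compare as bytes: compare_digest raises TypeError on non-ASCII str.
    return secrets.compare_digest(values[0].encode(), expected_token.encode())


print(ws_token_ok("/?token=s3cret", "s3cret"))  # True
print(ws_token_ok("/?token=wrong", "s3cret"))   # False
```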

Client side:
  - frontend reads VITE_WS_URL and VITE_WS_TOKEN, so the same build
    works in dev (localhost, no token) and prod (proxied wss, token).
  - frontend/.env.sample documents the variables; .gitignore extended
    to keep .env / .env.* out of the repo while allowing .env.sample.

env_sample also documents ALERT_COOLDOWN, MAX_PAYLOAD_BYTES and
MAX_DEVICES that the previous commits introduced.
2026-05-17 13:35:00 +00:00
github-actions
4bfa8e6d81 Updated pylint badge 2026-05-17 13:27:10 +00:00
2dddf163fe ci(security): harden Pylint workflow
- Declare permissions: contents: write explicitly. The default
  repository-wide GITHUB_TOKEN scope is broader than badge updates
  require, which violates least privilege.
- Pin Silleellie/pylint-github-action by full commit SHA instead of
  the mutable v2.1 tag, removing the supply-chain risk where a tag
  re-point would run arbitrary code with our GITHUB_TOKEN.
- Add a pip-audit step so new CVEs in pinned deps fail the build.
- Enable pip cache to cut ~30s off cold runs.
2026-05-15 08:28:38 +00:00
db6ace094f refactor(ws): broadcast with gather and prune disconnected clients
The previous per-client asyncio.create_task(ws.send) had two problems:
  - tasks were created without a reference, so they could be garbage
    collected before finishing, and exceptions vanished silently,
  - the surrounding try/except could only catch *synchronous* failures
    of create_task itself, never an actual send failure, so dead
    sockets stayed in `clients` forever and the discard branch was
    effectively dead code.

Use a single _broadcast coroutine that fans out with asyncio.gather
(return_exceptions=True) and prunes the clients set based on real
send results. Schedule fire-and-forget work through _schedule, which
keeps a strong reference to the task until it completes.
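
A self-contained sketch of the pattern described above. The _broadcast and _schedule names come from the message, but their bodies here are assumptions, and FakeSocket is a stand-in used only for the demo.

```python
import asyncio

clients = set()
_background_tasks = set()


def _schedule(coro):
    """Start coro as a task and hold a strong reference until it is done."""
    task = asyncio.create_task(coro)
    _background_tasks.add(task)
    task.add_done_callback(_background_tasks.discard)
    return task


async def _broadcast(message):
    """Send to all clients concurrently; drop those whose send failed."""
    targets = list(clients)
    results = await asyncio.gather(
        *(ws.send(message) for ws in targets), return_exceptions=True
    )
    for ws, result in zip(targets, results):
        if isinstance(result, Exception):
            clients.discard(ws)


class FakeSocket:
    """Stand-in for a WebSocket connection (demo only)."""

    def __init__(self, alive=True):
        self.alive = alive
        self.sent = []

    async def send(self, message):
        if not self.alive:
            raise ConnectionError("peer went away")
        self.sent.append(message)


async def main():
    good, dead = FakeSocket(), FakeSocket(alive=False)
    clients.update({good, dead})
    await _schedule(_broadcast('{"cpu_percent": 12.0}'))
    return good, dead


good, dead = asyncio.run(main())
print(good in clients, dead in clients)  # True False
```

With return_exceptions=True, one dead socket cannot cancel the sends to the healthy ones, and the result list lines up index by index with the targets.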
2026-05-13 14:58:10 +00:00
4f23b20565 fix(security): throttle Slack alerts and tighten numeric validation
Each MQTT message with cpu_percent >= CPU_ALERT_TH used to schedule a
post_slack task immediately. A bursty (or hostile) publisher could
spam the channel, burn the webhook's 1 req/s quota, and bury real
alerts under alert fatigue.

Also harden the predicate:
  - reject bool values (True passes isinstance check for int otherwise),
  - bound cpu to [0, 100] so NaN / inf / 1e308 cannot trigger,
  - re-alert per device only after ALERT_COOLDOWN seconds.
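
The hardened predicate can be sketched as a pure function. The name and parameter order follow the should_alert helper extracted in a later refactor. The body is an assumption, including the choice to treat the cooldown boundary as inclusive and to leave recording the alert time to the caller.

```python
def should_alert(cpu, device_id, now, last_alert, cooldown, threshold):
    """True when cpu crosses threshold and the device is out of cooldown."""
    # bool is a subclass of int, so True would otherwise count as 1.
    if isinstance(cpu, bool) or not isinstance(cpu, (int, float)):
        return False
    # NaN fails this comparison; inf and 1e308 fall outside the range.
    if not 0 <= cpu <= 100:
        return False
    if cpu < threshold:
        return False
    previous = last_alert.get(device_id)
    return previous is None or now - previous >= cooldown


last_alert = {"edge-01": 50.0}
print(should_alert(95.0, "edge-01", 100.0, last_alert, 300, 90))  # False
print(should_alert(95.0, "edge-02", 100.0, last_alert, 300, 90))  # True
```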
2026-05-13 05:24:00 +00:00
bc1e725437 fix(security): validate device_id and cap untrusted MQTT input
A compromised device (or a stolen cert) could previously:
  - publish a giant payload to exhaust backend memory in json.loads,
  - spoof messages claiming arbitrary device_id values (rendered into
    Slack alerts as mrkdwn, enabling content injection / channel
    impersonation with link unfurls and broadcast keywords),
  - flood device_state with random ids to drive unbounded memory growth
    since the prune is only applied on read in active_snapshot().

Add three guards in on_message:
  - MAX_PAYLOAD_BYTES (default 16 KiB) — rejects oversize frames before
    json parsing,
  - _DEVICE_ID_RE — accepts only [A-Za-z0-9._-]{1,64}, rejecting
    newlines, slack mrkdwn metacharacters and absurd lengths,
  - MAX_DEVICES cap — refuses new device_ids once the active set is
    full, so a misbehaving publisher cannot grow the dict without bound.
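
The three guards can be sketched in order as below. admit is a hypothetical function, not the project's on_message, and MAX_DEVICES is set to a tiny value purely for the demo, since the commit does not state its default.

```python
import json
import re

MAX_PAYLOAD_BYTES = 16 * 1024  # 16 KiB, the default named above
MAX_DEVICES = 3  # illustrative cap only; the real default is not stated
_DEVICE_ID_RE = re.compile(r"[A-Za-z0-9._-]{1,64}")


def admit(payload, device_state):
    """Return the decoded frame, or None if any guard rejects it."""
    # Guard 1: size check before any JSON parsing.
    if len(payload) > MAX_PAYLOAD_BYTES:
        return None
    try:
        frame = json.loads(payload)
    except ValueError:  # covers JSONDecodeError and bad UTF-8
        return None
    if not isinstance(frame, dict):
        return None
    # Guard 2: strict device_id pattern.
    device_id = frame.get("device_id")
    if not isinstance(device_id, str) or not _DEVICE_ID_RE.fullmatch(device_id):
        return None
    # Guard 3: known devices may update; new ones only while below the cap.
    if device_id not in device_state and len(device_state) >= MAX_DEVICES:
        return None
    return frame


full = {"a": {}, "b": {}, "c": {}}
print(admit(b'{"device_id": "d"}', full))  # None
print(admit(b'{"device_id": "a"}', full))  # {'device_id': 'a'}
```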
2026-05-08 06:28:36 +00:00
906702a1cc chore(deps): drop unused libs and bump aiohttp / python-dotenv
- Remove paho-mqtt, pynvml, nvidia-ml-py and the redundant dotenv shim
  (the code only imports gmqtt, psutil and python-dotenv; GPU stats come
  from a nvidia-smi subprocess, not pynvml).
- Bump aiohttp 3.12.15 -> 3.13.4 (closes 18 CVEs; mostly server-side
  but defense-in-depth for the Slack client too).
- Bump python-dotenv 1.1.1 -> 1.2.2 (CVE-2026-28684; not exploitable
  here since only load_dotenv is used, but keeps scanners clean).
2026-05-07 03:49:58 +00:00
e3f962f183 Update README.md 2025-10-16 10:00:00 +01:00
46c569c697 Delete docs/MVP.md 2025-10-15 10:00:00 +01:00
81d770a1e9 Update DESIGN.md 2025-10-14 10:00:00 +01:00
90a4f4065f Update README.md 2025-10-13 10:00:00 +01:00
github-actions
c2b9642f53 Updated pylint badge 2025-10-12 10:00:00 +01:00
6eedc05c09 fix backend connection with mqtt amazon 2025-10-11 10:00:00 +01:00
3ed008c9b3 Update README.md
fix npm run command
2025-10-10 10:00:00 +01:00
cbb04c09b9 Update README.md 2025-10-09 10:00:00 +01:00
github-actions
0cfcd43c16 Updated pylint badge 2025-10-08 10:00:00 +01:00
31ca3228f4 Update README.md 2025-10-07 10:00:00 +01:00
94088080ec Update .pylintrc
active score pylint
2025-10-06 10:00:00 +01:00
e9a783408b Update README.md 2025-10-05 10:00:00 +01:00
cdaf042976 Update pylint.yml 2025-10-04 10:00:00 +01:00
18e1f16df9 Update pylint.yml 2025-10-03 10:00:00 +01:00
5147f1c639 Update README.md 2025-10-02 10:00:00 +01:00
3e0af91799 Update README.md 2025-10-01 10:00:00 +01:00
4dc9e32971 fix pylint agent 2025-09-30 10:00:00 +01:00
555a74d8a4 fix pylint 2025-09-29 10:00:00 +01:00
3d5035358b fix ci flow 2025-09-28 10:00:00 +01:00
9b1276aa53 Update pylint.yml 2025-09-27 10:00:00 +01:00