docs: refresh README covering v1.1.0 → v1.3.1

The documentation was stuck at v1.0; five releases later it needed a
general refresh.

Features
- Reorganizes into Collection / Web dashboard / Operations.
- Adds tag taxonomy, NSFW gate, Discreet mode, PIN lock, keyboard
  shortcuts, last_error banner, /health endpoint, and the modular
  HTMX+Alpine frontend.

Web Features
- Covers the new header toggles (NSFW, Discreet, Lock), NSFW filters
  in the gallery, keyboard shortcuts in the modal, and tag chips.

Security & Limits
- New PIN lock subsection covering HMAC cookie semantics, idle timeout,
  and the /health and /static bypass.
- Rate limits table gains a row for config + tags mutations
  (60/min per IP+path).

Env vars (Docker)
- Documents RMC_PIN and RMC_PIN_TIMEOUT.

Synology DSM
- Step 4 mentions the PIN as a second factor when exposed publicly.
- New step 6: curl POST /api/tags/backfill to retroactively tag a
  pre-existing library.

API Reference
- New Tags section (5 endpoints), noting that /api/media already
  includes per-post tags.
- New Health & Auth section (/health, /unlock, /lock).
- Collector gains /api/collector/clear-error.
- Gallery filters gain nsfw=all|hide|only.

Database Schema
- Adds the posts.nsfw column and the tags, post_tags and
  scheduler_history tables, with category/source comments.
- Note on PRAGMA journal_mode=WAL.

Project Structure
- Updates the tree: routers/health.py, routers/tags.py, session.py,
  static/css/app.css, static/js/app.js, templates/partials/,
  unlock.html, docker-compose.synology.yml, .github/workflows/release.yml.
- Short per-file comments.

Docs only: no code changes, no version bump.
authentik Default Admin 2026-05-17 21:54:32 +01:00
parent 9f21aa0751
commit a6ee86c4bb

211
README.md

@@ -9,18 +9,34 @@ A powerful, self-hosted media collector for Reddit that automatically downloads
## Features
### Collection
- **Multi-source** — subreddits and user profiles
- **Smart deduplication** — MD5 hash-based; never downloads the same file twice
- **Gallery support** — handles Reddit galleries with multiple images
- **Multiple extractors** — Reddit, Imgur, Gfycat, Redgifs
- **No API keys** — uses Reddit's public JSON endpoints
### Web dashboard
- **Modular HTMX + Alpine.js** frontend (vanilla, zero build step)
- **Gallery** with infinite scroll, multi-select, bulk delete, sorting/filtering
- **Tag taxonomy (Stash-inspired)** — auto-tags every post by `subreddit / performer / genre / nsfw`; manual tags preserved across reruns; colored chips on every card
- **NSFW gate** — blur thumbnails by default; toggle 👁/🙈 in the header (persisted in localStorage)
- **Discreet mode** 🤫 — compact thumbnails for privacy against shoulder-surfing; auto-activates after 60 s of inactivity
- **PIN lock** (optional) — HMAC-signed session cookie on top of Basic Auth; idle timeout configurable
- **Favourites** + per-author view + sync favourites to user targets
- **Blacklist** — authors, subreddits, title keywords, domains
- **Scheduler** — interval or specific times, run history, "run now" button
- **`last_error` banner** — failed scheduled runs surface at the top until dismissed
- **Keyboard shortcuts** in the modal: `j`/`k` navigate, `f` favourite, `b` blacklist author, `Esc` close
- **`/health` endpoint** — public JSON with DB/ffmpeg/scheduler/writable status, ready for Container Manager monitors
### Operations
- **Docker** — single-image deployment, published to `ghcr.io/richardnixondev/reddit-media-collector`
- **Synology DSM Container Manager** ready (one-click deploy via compose)
- **SQLite WAL** — crash-safe on power loss, fast concurrent reads
- **Rotating log file** (10 MB × 5 backups) — caps disk usage on NAS
- **HTTP Basic Auth** (optional) + per-IP rate limiting on every mutation
- **Immich integration** — JSON sidecar with metadata for seamless import
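The 10 MB × 5 rotation policy corresponds to the standard library's `logging.handlers.RotatingFileHandler`. Below is a minimal sketch, assuming the collector configures its logger this way. The `build_logger` helper, the `rmc` logger name and the log file name are illustrative and are not taken from `config.py`.

```python
import logging
from logging.handlers import RotatingFileHandler
from pathlib import Path


def build_logger(log_path: Path, name: str = "rmc") -> logging.Logger:
    """Hypothetical helper: file rotates at 10 MB, 5 backups kept (about 60 MB worst case)."""
    handler = RotatingFileHandler(
        log_path,
        maxBytes=10 * 1024 * 1024,  # roll over before the active file exceeds 10 MB
        backupCount=5,              # keep <file>.1 ... <file>.5, delete anything older
        encoding="utf-8",
    )
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    return logger
```

In the worst case, disk usage is the active file plus five backups, roughly 60 MB.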
## Quick Start
@@ -144,24 +160,39 @@ Access the dashboard at `http://localhost:8000`
### Web Features
- **Dashboard** — collection statistics, trends chart, top authors, recent downloads
- **Gallery** — infinite scroll, filtering (subreddit / author / type / favourites / NSFW), sorting, bulk select, multi-delete, tag chips per card, NSFW blur with hover/global toggle
- **Authors** — grid grouped by author with per-author modal
- **Sources** — add/remove subreddits and users (HTMX, no page reload)
- **Settings** — download options, blacklist, scheduler config, individual collection
- **Scheduler** — interval or specific times, run history, "run now"
- **Collector control** — manual trigger or per-target collection
- **Header toggles** — NSFW 👁/🙈, Discreet 🤫, 🔒 Lock (when PIN enabled)
- **Keyboard shortcuts** — in the media modal: `j`/`k` next/prev, `f` favourite, `b` blacklist author, `Esc` close
## Security & Limits
### HTTP Basic Auth (optional)
Set both `RMC_AUTH_USER` and `RMC_AUTH_PASS` to require Basic credentials on every route except `/health`. With either variable unset, the API stays public — appropriate for trusted local/intranet deployments.
```bash
RMC_AUTH_USER=alice RMC_AUTH_PASS=s3cret uvicorn src.web.app:app
```
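Here is a minimal sketch of the rule above: credentials are required only when both variables are set, and `/health` is always exempt. `basic_auth_ok` is a hypothetical plain function. It is not the dependency that `auth.py` actually exposes.

```python
import base64
import hmac
from typing import Optional


def basic_auth_ok(header: Optional[str], path: str,
                  user: Optional[str], password: Optional[str]) -> bool:
    """Hypothetical check mirroring the README rules for RMC_AUTH_USER / RMC_AUTH_PASS."""
    if not user or not password:
        return True  # either variable unset: the API stays public
    if path == "/health":
        return True  # exempt so Container Manager monitors work without credentials
    if not header or not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[len("Basic "):], validate=True).decode("utf-8")
    except ValueError:  # binascii.Error and UnicodeDecodeError both subclass ValueError
        return False
    given_user, sep, given_pass = decoded.partition(":")
    if not sep:
        return False
    # compare_digest resists timing attacks, unlike a short-circuiting ==
    user_ok = hmac.compare_digest(given_user.encode(), user.encode())
    pass_ok = hmac.compare_digest(given_pass.encode(), password.encode())
    return user_ok and pass_ok
```

The sketch uses `hmac.compare_digest` instead of `==` so that response timing does not reveal how much of a guessed password matched.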
### PIN lock (optional, UI-only)
Set `RMC_PIN` (4-6 digits) to gate the **web UI** behind a numeric PIN screen on top of Basic Auth. Useful when others might use the device but you don't want to leak the API password.
- Session cookie is signed with HMAC-SHA256 using a key generated at boot — restart invalidates every session.
- Idle window: 10 min default; override with `RMC_PIN_TIMEOUT` (seconds).
- `/health`, `/static/*`, `/favicon` bypass the lock so monitors and CSS keep working.
- Clicking "🔒 Lock" in the header locks immediately.
```bash
RMC_PIN=1234 RMC_PIN_TIMEOUT=900 uvicorn src.web.app:app
```
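One way to build an HMAC-signed idle session is sketched below. The `<last_active>.<signature>` cookie format and the `sign_session` / `session_valid` names are assumptions, not the real `session.py`. The key exists only in process memory, so every signature issued before a restart stops verifying.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

# Generated once per process, so restarting the app invalidates every cookie.
SESSION_KEY = secrets.token_bytes(32)


def sign_session(last_active: float, key: bytes = SESSION_KEY) -> str:
    """Build a hypothetical cookie value '<unix seconds>.<hex HMAC-SHA256>'."""
    payload = str(int(last_active))
    sig = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}.{sig}"


def session_valid(cookie: Optional[str], timeout_s: int = 600,
                  now: Optional[float] = None, key: bytes = SESSION_KEY) -> bool:
    """True only if the signature matches and the cookie is inside the idle window."""
    if not cookie:
        return False
    payload, sep, sig = cookie.partition(".")
    if not sep:
        return False
    expected = hmac.new(key, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False  # forged, tampered with, or signed before the last restart
    try:
        last_active = int(payload)
    except ValueError:
        return False
    now = time.time() if now is None else now
    return 0 <= now - last_active <= timeout_s
```

To get an idle timeout rather than an absolute one, middleware would re-issue `sign_session(time.time())` on each unlocked request. That restarts the window whenever the user is active.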
### Rate limiting (per IP, per endpoint)
Heavy and mutation endpoints are throttled in-process to prevent runaway clients:
| Endpoint | Limit |
|---------------------------------------|---------------|
@@ -169,6 +200,7 @@ Heavy endpoints are throttled in-process to prevent runaway clients:
| `POST /api/collector/run` | 3 / minute |
| `POST /api/media/cleanup-blacklist` | 5 / minute |
| `POST /api/media/cleanup-by-type` | 5 / minute |
| `POST/PUT/DELETE /api/{subreddits,users,blacklist/*,settings/*,posts/*/tags}` | 60 / minute |
Limits are per (client IP, route path) and are enforced over a sliding window. The in-memory buckets are per process, so multi-worker deployments should swap them for a shared backend.
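The per-(client IP, route path) sliding window can be sketched with one deque of timestamps per key. This is an illustrative in-memory version built on that assumption, not the code in `rate_limit.py`. `SlidingWindowLimiter` and its injectable clock are hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple


class SlidingWindowLimiter:
    """Hypothetical in-memory limiter keyed by (client IP, route path); single process only."""

    def __init__(self, limit: int, window_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self._hits: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, path: str) -> bool:
        now = self.clock()
        hits = self._hits[(ip, path)]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # the caller would answer 429 Too Many Requests
        hits.append(now)
        return True
```

Each process holds its own `_hits` dictionary, which is why the counters do not add up across workers.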
@@ -229,6 +261,8 @@ Relevant environment variables (all optional, with sensible defaults inside the
| `RMC_CONFIG_PATH` | `/app/config.yaml` | YAML with subreddits/users/blacklist |
| `RMC_TIMEZONE` | `UTC` | Timezone used by the scheduler |
| `RMC_AUTH_USER` / `RMC_AUTH_PASS` | unset | Enable HTTP Basic Auth on every route except `/health` when both are set |
| `RMC_PIN` | unset | 4-6 digit PIN gating the web UI (HMAC cookie); leave unset to disable |
| `RMC_PIN_TIMEOUT` | `600` | Idle seconds before the PIN cookie expires |
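The sketch below reads and validates the two PIN variables at startup. It assumes invalid values should fail fast. How the app really treats a malformed `RMC_PIN` is not documented here, and `load_pin_settings` / `PinSettings` are hypothetical names.

```python
import os
import re
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class PinSettings:
    pin: Optional[str]  # None means the PIN lock is disabled
    timeout_s: int      # idle seconds before the session cookie expires


def load_pin_settings(env: Mapping[str, str] = os.environ) -> PinSettings:
    """Hypothetical startup parser for RMC_PIN / RMC_PIN_TIMEOUT."""
    raw_pin = env.get("RMC_PIN", "").strip()
    # [0-9] rather than \d, which would also accept non-ASCII digits
    if raw_pin and not re.fullmatch(r"[0-9]{4,6}", raw_pin):
        raise ValueError("RMC_PIN must be 4-6 digits")
    timeout_s = int(env.get("RMC_PIN_TIMEOUT", "600"))  # README default: 600 s (10 min)
    if timeout_s <= 0:
        raise ValueError("RMC_PIN_TIMEOUT must be a positive number of seconds")
    return PinSettings(pin=raw_pin or None, timeout_s=timeout_s)
```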
## Synology DSM Deployment
@@ -257,10 +291,17 @@ Tested on DSM 7.2 with **Container Manager** on x86_64 Plus models. The publishe
*Control Panel → Login Portal → Advanced → Reverse Proxy → Create*. Source:
`reddit.yourdomain.com` (HTTPS:443). Destination: `localhost:8000` (HTTP). Attach a
Let's Encrypt cert. When exposing publicly, uncomment `RMC_AUTH_USER`/`RMC_AUTH_PASS`
in the compose file — and consider adding `RMC_PIN=NNNN` for a second factor on the UI.
5. **Updating:** *Container Manager → Project → reddit-media-collector → Action → Build*
re-pulls `latest`. The `data/` volume keeps the database, scheduler history, tags and config.
6. **First boot (one-time, if you already had a library):**
run the following to retroactively tag every existing post (subreddit / performer / genre / nsfw):
```bash
curl -X POST http://<nas-ip>:8000/api/tags/backfill
```
Idempotent — safe to re-run.
**Permissions note:** if the container can't write to the bind mounts, check the owner
of `/volume1/docker/reddit-media-collector/` with `ls -ln`. Either `chown` it to a user
@@ -401,7 +442,8 @@ The web interface exposes a REST API. FastAPI also serves interactive docs at `/
| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | `/api/collector/run` | Trigger collection run |
| GET | `/api/collector/status` | Collector status (includes `last_error`) |
| POST | `/api/collector/clear-error` | Dismiss the `last_error` banner |
| POST | `/api/collect/individual` | Collect from a single subreddit/user |
| GET | `/api/collect/targets` | List available targets |
| GET | `/api/scheduler/status` | Scheduler state + next run |
@@ -409,6 +451,29 @@ The web interface exposes a REST API. FastAPI also serves interactive docs at `/
| GET | `/api/scheduler/history` | Past scheduler runs |
| POST | `/api/scheduler/run-now` | Execute schedule immediately |
### Tags
Stash-style taxonomy. Auto-tags are recreated on every collect (idempotent); user-added tags survive reruns.
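The rerun semantics follow from the `source` column on `post_tags`: drop the post's `auto` rows, re-insert them, and never touch `user` rows. The sketch below works against the tag tables from the schema section. It is trimmed: the foreign key to `posts` is omitted so the snippet runs standalone. `retag_post` is a hypothetical helper, not the auto-tagger in `database.py`.

```python
import sqlite3
from typing import Iterable, Tuple

SCHEMA = """
CREATE TABLE IF NOT EXISTS tags (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    category TEXT,
    is_nsfw INTEGER DEFAULT 0,
    description TEXT,
    UNIQUE(name, category)
);
CREATE TABLE IF NOT EXISTS post_tags (
    post_id TEXT NOT NULL,
    tag_id INTEGER NOT NULL,
    source TEXT DEFAULT 'auto',
    PRIMARY KEY (post_id, tag_id),
    FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
);
"""


def retag_post(conn: sqlite3.Connection, post_id: str,
               auto_tags: Iterable[Tuple[str, str]]) -> None:
    """Hypothetical: rebuild a post's auto tags; rows with source='user' are left alone."""
    with conn:  # one transaction per post
        conn.execute("DELETE FROM post_tags WHERE post_id = ? AND source = 'auto'", (post_id,))
        for name, category in auto_tags:
            # UNIQUE(name, category) turns this into a no-op when the tag already exists.
            conn.execute("INSERT OR IGNORE INTO tags (name, category) VALUES (?, ?)",
                         (name, category))
            (tag_id,) = conn.execute(
                "SELECT id FROM tags WHERE name = ? AND category = ?", (name, category)
            ).fetchone()
            # If the user already attached this tag, the primary key keeps their row.
            conn.execute(
                "INSERT OR IGNORE INTO post_tags (post_id, tag_id, source) VALUES (?, ?, 'auto')",
                (post_id, tag_id),
            )
```

SQLite treats NULLs as distinct in a `UNIQUE` constraint, so this de-duplication relies on `category` always being set.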
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/api/tags?category=` | List all tags (filter by category: `performer`/`source`/`genre`/`meta`) |
| GET | `/api/posts/{id}/tags` | Tags attached to a single post |
| POST | `/api/posts/{id}/tags` | Attach a user-curated tag |
| DELETE | `/api/posts/{id}/tags/{tag_id}` | Detach a tag (auto or user) |
| POST | `/api/tags/backfill` | One-time pass to retroactively auto-tag the whole library |
`GET /api/media` already returns each post's `tags: [{name, category}]` array (batched join — O(1) extra queries per page).
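Returning `tags` for a whole page with a single extra query can be done with an `IN (...)` join over the page's post IDs. The sketch below is built on that assumption. `tags_for_page` is hypothetical, and the real query in the media router may differ.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, List, Sequence


def tags_for_page(conn: sqlite3.Connection, post_ids: Sequence[str]) -> Dict[str, List[dict]]:
    """Hypothetical: one query for the whole page instead of one per post (avoids N+1)."""
    if not post_ids:
        return {}
    placeholders = ",".join("?" * len(post_ids))  # "?,?,?" for three posts
    rows = conn.execute(
        f"""
        SELECT pt.post_id, t.name, t.category
        FROM post_tags AS pt
        JOIN tags AS t ON t.id = pt.tag_id
        WHERE pt.post_id IN ({placeholders})
        ORDER BY t.category, t.name
        """,
        list(post_ids),
    ).fetchall()
    by_post: Dict[str, List[dict]] = defaultdict(list)
    for post_id, name, category in rows:
        by_post[post_id].append({"name": name, "category": category})
    # Posts without tags still get an empty list, matching a `tags: []` field.
    return {pid: by_post.get(pid, []) for pid in post_ids}
```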
The gallery filters now also accept `nsfw=all|hide|only`.
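Mapping `nsfw=all|hide|only` onto `posts.nsfw` is a whitelist lookup. The sketch below assumes the filter ends up as a SQL fragment. `nsfw_clause` is a hypothetical helper.

```python
# Hypothetical mapping: each allowed value selects a fixed SQL fragment on posts.nsfw (0/1).
NSFW_CLAUSES = {
    "all": "1 = 1",            # no filtering
    "hide": "posts.nsfw = 0",  # SFW only
    "only": "posts.nsfw = 1",  # NSFW only
}


def nsfw_clause(value: str = "all") -> str:
    """Translate the gallery's nsfw=all|hide|only parameter; reject anything else."""
    try:
        return NSFW_CLAUSES[value]
    except KeyError:
        raise ValueError(f"nsfw must be one of all|hide|only, got {value!r}") from None
```

Only the three fixed fragments can ever be returned, so the request value never reaches the SQL text.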
### Health & Auth
| Method | Endpoint | Description |
|--------|----------|-------------|
| GET | `/health` | Public JSON: `db`, `ffmpeg`, `scheduler`, `downloads_writable`, `version`, `auth_enabled` |
| GET | `/unlock` | PIN entry screen (only when `RMC_PIN` is set) |
| POST | `/unlock` | Submit PIN; sets the signed session cookie |
| POST | `/lock` | Invalidate the session cookie immediately |
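The sketch below shows how the `/health` fields could be computed. The keys come from the table above. The value types (booleans plus a version string) and the `build_health` helper are assumptions.

```python
import os
import shutil
import sqlite3
from contextlib import closing
from typing import Any, Dict


def build_health(db_path: str, downloads_dir: str, scheduler_running: bool,
                 version: str) -> Dict[str, Any]:
    """Hypothetical /health payload; keys from the README, value types assumed."""
    try:
        # mode=ro: fail instead of silently creating an empty database file
        with closing(sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)) as conn:
            conn.execute("SELECT 1")
        db_ok = True
    except sqlite3.Error:
        db_ok = False
    return {
        "db": db_ok,
        "ffmpeg": shutil.which("ffmpeg") is not None,
        "scheduler": scheduler_running,
        "downloads_writable": os.path.isdir(downloads_dir) and os.access(downloads_dir, os.W_OK),
        "version": version,
        "auth_enabled": bool(os.environ.get("RMC_AUTH_USER") and os.environ.get("RMC_AUTH_PASS")),
    }
```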
## Database Schema
The SQLite database (`media.db`) stores all metadata:
@@ -428,8 +493,9 @@ CREATE TABLE posts (
local_path TEXT,
file_hash TEXT,
permalink TEXT,
source_type TEXT, -- 'subreddit' or 'user'
flair TEXT,
nsfw INTEGER DEFAULT 0 -- Reddit's over_18 mirrored locally
);
CREATE TABLE favorites (
@@ -437,8 +503,40 @@ CREATE TABLE favorites (
favorited_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FOREIGN KEY (post_id) REFERENCES posts(id)
);
-- Tag taxonomy (Stash-inspired)
CREATE TABLE tags (
id INTEGER PRIMARY KEY AUTOINCREMENT,
name TEXT NOT NULL,
category TEXT, -- 'performer' | 'source' | 'genre' | 'meta'
is_nsfw INTEGER DEFAULT 0,
description TEXT,
UNIQUE(name, category)
);
CREATE TABLE post_tags (
post_id TEXT NOT NULL,
tag_id INTEGER NOT NULL,
source TEXT DEFAULT 'auto', -- 'auto' | 'user'
PRIMARY KEY (post_id, tag_id),
FOREIGN KEY (post_id) REFERENCES posts(id) ON DELETE CASCADE,
FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
);
-- Scheduler history (in-app cron alternative)
CREATE TABLE scheduler_history (
id INTEGER PRIMARY KEY AUTOINCREMENT,
started_at TIMESTAMP,
finished_at TIMESTAMP,
status TEXT, -- 'success' | 'error' | 'timeout' | 'running'
posts_processed INTEGER DEFAULT 0,
posts_downloaded INTEGER DEFAULT 0,
error_message TEXT
);
```
The database uses **SQLite WAL** (`PRAGMA journal_mode=WAL`) for crash safety on power loss and concurrent reads while a collect is running.
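The sketch below opens the database as this section describes. `PRAGMA journal_mode=WAL` returns the mode actually in effect, so the result can be checked; an in-memory database reports `memory`. The sketch also sets `PRAGMA foreign_keys=ON`. SQLite leaves foreign-key enforcement off by default for each connection, and the `ON DELETE CASCADE` clauses on `post_tags` only fire when it is on. Whether `database.py` sets both pragmas is an assumption, and `open_db` is hypothetical.

```python
import sqlite3


def open_db(path: str) -> sqlite3.Connection:
    """Hypothetical opener: WAL journal plus foreign-key enforcement."""
    conn = sqlite3.connect(path)
    # WAL is persistent: once set, the database file stays in WAL mode for later connections.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"could not enable WAL (got {mode!r})")
    # Per-connection setting; without it ON DELETE CASCADE is silently ignored.
    conn.execute("PRAGMA foreign_keys=ON")
    return conn
```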
## Troubleshooting
### Videos saving as .html
@@ -477,41 +575,48 @@ pip install -e ".[dev]"
```
reddit-media-collector/
├── src/
│   ├── main.py                 # Collector entry point
│   ├── config.py               # Configuration dataclasses + rotating logger
│   ├── database.py             # SQLite (WAL) + tags taxonomy + auto-tagger
│   ├── downloader.py           # Downloader with retry + dedupe
│   ├── reddit_client.py        # Reddit JSON-API client
│   ├── sidecar.py              # Immich-compatible JSON sidecars
│   ├── extractors/             # URL extractors per host (reddit, imgur, gfycat)
│   └── web/                    # FastAPI app
│       ├── app.py              # App, lifespan, PIN-lock middleware
│       ├── auth.py             # Optional HTTP Basic auth
│       ├── session.py          # HMAC-signed PIN session cookie
│       ├── config_manager.py
│       ├── deps.py
│       ├── rate_limit.py       # Per-IP throttle dependency
│       ├── routers/
│       │   ├── config.py       # Config CRUD (HTMX-aware fragments)
│       │   ├── favorites.py
│       │   ├── health.py       # Public /health JSON
│       │   ├── media.py        # Gallery, file serving, cleanups
│       │   ├── scheduler.py    # In-app scheduler + collector control
│       │   ├── stats.py
│       │   └── tags.py         # Tag taxonomy CRUD + backfill
│       ├── static/
│       │   ├── css/app.css     # Extracted from inline (themed via CSS vars)
│       │   └── js/
│       │       ├── api.js      # Shared fetch helpers
│       │       └── app.js      # All app logic
│       └── templates/
│           ├── index.html      # Composition: extends layout + includes
│           ├── unlock.html     # PIN entry screen
│           └── partials/       # tab_*.html, _modal_*.html, _item_*.html, _tag.html
├── tests/                      # pytest (unit + API contract; 151 tests)
├── downloads/                  # Downloaded media (gitignored)
├── config.yaml                 # Configuration
├── pyproject.toml              # Project metadata + tooling config
├── .pre-commit-config.yaml     # ruff + mypy hooks
├── .github/workflows/
│   ├── ci.yml                  # lint, types, test, docker (per PR/push)
│   └── release.yml             # GHCR publish on every v* tag
├── Dockerfile
├── docker-compose.yml          # Local dev
├── docker-compose.synology.yml # NAS-flavored (image from GHCR)
└── README.md
```