docs: refresh README covering v1.1.0 → v1.3.1
The documentation was stuck at v1.0; five releases later it needs a general refresh.

Features
- Reorganized into Collection / Web dashboard / Operations.
- Adds Tag taxonomy, NSFW gate, Discreet mode, PIN lock, keyboard shortcuts, last_error banner, /health endpoint, modular HTMX+Alpine frontend.

Web Features
- Covers the new header toggles (NSFW, Discreet, Lock), NSFW filters in the gallery, keyboard shortcuts in the modal, tag chips.

Security & Limits
- New PIN lock subsection covering the HMAC cookie semantics, the idle timeout, and the /health and /static bypass.
- Rate limits table gains a row for config + tag mutations (60/min per IP+path).

Env vars (Docker)
- RMC_PIN and RMC_PIN_TIMEOUT documented.

Synology DSM
- Step 4 mentions the PIN as a second factor when the instance is exposed publicly.
- New step 6: curl POST /api/tags/backfill to retroactively tag a pre-existing library.

API Reference
- New Tags section (5 endpoints) with a note that /api/media already includes tags per post.
- New Health & Auth section (/health, /unlock, /lock).
- Collector gains /api/collector/clear-error.
- Gallery filters gain nsfw=all|hide|only.

Database Schema
- Adds the posts.nsfw column and the tags, post_tags and scheduler_history tables, with category/source comments.
- Note on PRAGMA journal_mode=WAL.

Project Structure
- Updates the tree: routers/health.py, routers/tags.py, session.py, static/css/app.css, static/js/app.js, templates/partials/, unlock.html, docker-compose.synology.yml, .github/workflows/release.yml.
- Short per-file comments.

Docs only: no code changes, no version bump.
parent 9f21aa0751
commit a6ee86c4bb
1 changed file with 158 additions and 53 deletions
README.md (211 lines changed)
@@ -9,18 +9,34 @@ A powerful, self-hosted media collector for Reddit that automatically downloads
 
 ## Features
 
-- **Multi-source Collection** - Collect from subreddits and user profiles
-- **Smart Deduplication** - MD5 hash-based detection prevents duplicate downloads
-- **Gallery Support** - Automatically handles Reddit galleries with multiple images
-- **Multiple Extractors** - Built-in support for Reddit, Imgur, Gfycat, and Redgifs
-- **Immich Integration** - Generates JSON sidecar files with metadata for seamless import
-- **Web Dashboard** - Modern web interface for configuration and monitoring
-- **Blacklist System** - Filter out unwanted authors, subreddits, keywords, and domains
-- **Favourites System** - Mark and filter your favourite posts
-- **Video Thumbnails** - Auto-generated thumbnails for video preview in gallery
-- **No API Keys Required** - Uses Reddit's public JSON endpoints
-- **Docker Support** - Easy deployment with Docker Compose
-- **Scheduled Collection** - Cron-ready for automated periodic collection
+### Collection
+- **Multi-source** — subreddits and user profiles
+- **Smart deduplication** — MD5 hash-based; never downloads the same file twice
+- **Gallery support** — handles Reddit galleries with multiple images
+- **Multiple extractors** — Reddit, Imgur, Gfycat, Redgifs
+- **No API keys** — uses Reddit's public JSON endpoints
+
+### Web dashboard
+- **Modular HTMX + Alpine.js** frontend (vanilla, zero build step)
+- **Gallery** with infinite scroll, multi-select, bulk delete, sorting/filtering
+- **Tag taxonomy (Stash-inspired)** — auto-tags every post by `subreddit / performer / genre / nsfw`; manual tags preserved across reruns; colored chips on every card
+- **NSFW gate** — blur thumbnails by default; toggle 👁/🙈 in the header (persisted in localStorage)
+- **Discreet mode** 🤫 — compact thumbnails for screen-shoulder privacy; auto-activates after 60 s of idle
+- **PIN lock** (optional) — HMAC-signed session cookie on top of Basic Auth; idle timeout configurable
+- **Favourites** + per-author view + sync favourites to user targets
+- **Blacklist** — authors, subreddits, title keywords, domains
+- **Scheduler** — interval or specific times, run history, "run now" button
+- **`last_error` banner** — failed scheduled runs surface at the top until dismissed
+- **Keyboard shortcuts** in the modal: `j`/`k` navigate, `f` favourite, `b` blacklist author, `Esc` close
+- **`/health` endpoint** — public JSON with DB/ffmpeg/scheduler/writable status, ready for Container Manager monitors
+
+### Operations
+- **Docker** — single-image deployment, published to `ghcr.io/richardnixondev/reddit-media-collector`
+- **Synology DSM Container Manager** ready (one-click deploy via compose)
+- **SQLite WAL** — crash-safe on power loss, fast concurrent reads
+- **Rotating log file** (10 MB × 5 backups) — caps disk usage on NAS
+- **HTTP Basic Auth** (optional) + per-IP rate limiting on every mutation
+- **Immich integration** — JSON sidecar with metadata for seamless import
 
 ## Quick Start
 
@@ -144,24 +160,39 @@ Access the dashboard at `http://localhost:8000`
 
 ### Web Features
 
-- **Dashboard** - View collection statistics and manage targets
-- **Gallery** - Browse downloaded media with filtering, infinite scroll, and favourites
-- **Authors** - Browse content grouped by author, with per-author modal
-- **Settings** - Configure download options, blacklist, and scheduler
-- **Scheduler** - Configure recurring collection runs (replaces external cron when running as a service)
-- **Collector Control** - Trigger collection runs manually or per target
+- **Dashboard** — collection statistics, trends chart, top authors, recent downloads
+- **Gallery** — infinite scroll, filtering (subreddit / author / type / favourites / NSFW), sorting, bulk select, multi-delete, tag chips per card, NSFW blur with hover/global toggle
+- **Authors** — grid grouped by author with per-author modal
+- **Sources** — add/remove subreddits and users (HTMX, no page reload)
+- **Settings** — download options, blacklist, scheduler config, individual collection
+- **Scheduler** — interval or specific times, run history, "run now"
+- **Collector control** — manual trigger or per-target collection
+- **Header toggles** — NSFW 👁/🙈, Discreet 🤫, 🔒 Lock (when PIN enabled)
+- **Keyboard shortcuts** — in the media modal: `j`/`k` next/prev, `f` favourite, `b` blacklist author, `Esc` close
 
 ## Security & Limits
 
 ### HTTP Basic Auth (optional)
-Set both `RMC_AUTH_USER` and `RMC_AUTH_PASS` to require Basic credentials on every endpoint (including `/`). With either variable unset, the API stays public — appropriate for trusted local/intranet deployments.
+Set both `RMC_AUTH_USER` and `RMC_AUTH_PASS` to require Basic credentials on every route except `/health`. With either variable unset, the API stays public — appropriate for trusted local/intranet deployments.
 
 ```bash
 RMC_AUTH_USER=alice RMC_AUTH_PASS=s3cret uvicorn src.web.app:app
 ```
 
+### PIN lock (optional, UI-only)
+Set `RMC_PIN` (4-6 digits) to gate the **web UI** behind a numeric PIN screen on top of Basic Auth. Useful when others might use the device but you don't want to leak the API password.
+
+- Session cookie is signed with HMAC-SHA256 using a key generated at boot — restart invalidates every session.
+- Idle window: 10 min default; override with `RMC_PIN_TIMEOUT` (seconds).
+- `/health`, `/static/*`, `/favicon` bypass the lock so monitors and CSS keep working.
+- Clicking "🔒 Lock" in the header locks immediately.
+
+```bash
+RMC_PIN=1234 RMC_PIN_TIMEOUT=900 uvicorn src.web.app:app
+```
+
 ### Rate limiting (per IP, per endpoint)
-Heavy endpoints are throttled in-process to prevent runaway clients:
+Heavy and mutation endpoints are throttled in-process to prevent runaway clients:
 
 | Endpoint | Limit |
 |---------------------------------------|---------------|
@@ -169,6 +200,7 @@ Heavy endpoints are throttled in-process to prevent runaway clients:
 | `POST /api/collector/run` | 3 / minute |
 | `POST /api/media/cleanup-blacklist` | 5 / minute |
 | `POST /api/media/cleanup-by-type` | 5 / minute |
+| `POST/PUT/DELETE /api/{subreddits,users,blacklist/*,settings/*,posts/*/tags}` | 60 / minute |
 
 Limits are per (client IP, route path) and reset on a sliding window. For multi-worker deployments, swap the in-memory buckets for a shared backend.
 
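The rate-limit note in the hunk above ("per (client IP, route path) ... sliding window") can be pictured with a small in-memory limiter. This is only a sketch under stated assumptions, not the project's `rate_limit.py`: the class name `SlidingWindowLimiter`, its `allow` method, and the injectable `clock` parameter are all hypothetical.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, Tuple


class SlidingWindowLimiter:
    """Allow at most `limit` hits per `window` seconds for each (ip, path) key."""

    def __init__(
        self,
        limit: int,
        window: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock  # injectable so the behaviour can be tested without sleeping
        self._hits: Dict[Tuple[str, str], Deque[float]] = defaultdict(deque)

    def allow(self, ip: str, path: str) -> bool:
        now = self.clock()
        hits = self._hits[(ip, path)]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False  # caller would typically answer HTTP 429
        hits.append(now)
        return True
```

Because the buckets live in one process's memory, each worker would keep its own counts, which is why the README text suggests a shared backend for multi-worker deployments.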
@@ -229,6 +261,8 @@ Relevant environment variables (all optional, with sensible defaults inside the
 | `RMC_CONFIG_PATH` | `/app/config.yaml` | YAML with subreddits/users/blacklist |
 | `RMC_TIMEZONE` | `UTC` | Timezone used by the scheduler |
 | `RMC_AUTH_USER` / `RMC_AUTH_PASS` | unset | Enable HTTP Basic Auth on every route when both are set |
+| `RMC_PIN` | unset | 4-6 digit PIN gating the web UI (HMAC cookie); leave unset to disable |
+| `RMC_PIN_TIMEOUT` | `600` | Idle seconds before the PIN cookie expires |
 
 ## Synology DSM Deployment
 
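The PIN-lock semantics documented in this diff (HMAC-SHA256 cookie, key generated at boot, idle timeout from `RMC_PIN_TIMEOUT`) could look roughly like the following. It is a minimal sketch, not the contents of `session.py`: the cookie layout `"<timestamp>.<signature>"` and the names `sign_session` and `verify_session` are assumptions made for illustration.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

# Assumption: a fresh random key per process start, so a restart invalidates every cookie.
SECRET_KEY = secrets.token_bytes(32)
IDLE_TIMEOUT = 600  # seconds; mirrors the documented RMC_PIN_TIMEOUT default


def sign_session(now: Optional[float] = None, key: bytes = SECRET_KEY) -> str:
    """Return a cookie value of the form '<unix-ts>.<hex hmac-sha256 of ts>'."""
    ts = str(int(time.time() if now is None else now))
    sig = hmac.new(key, ts.encode(), hashlib.sha256).hexdigest()
    return f"{ts}.{sig}"


def verify_session(
    cookie: str,
    now: Optional[float] = None,
    key: bytes = SECRET_KEY,
    timeout: int = IDLE_TIMEOUT,
) -> bool:
    """Accept the cookie only if the signature matches and it is not older than `timeout`."""
    ts, sep, sig = cookie.partition(".")
    if not sep or not (ts.isascii() and ts.isdigit()):
        return False
    expected = hmac.new(key, ts.encode(), hashlib.sha256).hexdigest()
    # Constant-time comparison; bytes avoid a TypeError on non-ASCII input.
    if not hmac.compare_digest(sig.encode(), expected.encode()):
        return False
    current = time.time() if now is None else now
    return 0 <= current - int(ts) <= timeout
```

To behave as an idle timeout rather than a fixed lifetime, a middleware would presumably re-issue the cookie with a fresh timestamp on each authenticated request.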
@@ -257,10 +291,17 @@ Tested on DSM 7.2 with **Container Manager** on x86_64 Plus models. The publishe
    *Control Panel → Login Portal → Advanced → Reverse Proxy → Create*. Source:
    `reddit.yourdomain.com` (HTTPS:443). Destination: `localhost:8000` (HTTP). Attach a
    Let's Encrypt cert. When exposing publicly, uncomment `RMC_AUTH_USER`/`RMC_AUTH_PASS`
-   in the compose file.
+   in the compose file — and consider adding `RMC_PIN=NNNN` for a second factor on the UI.
 
 5. **Updating:** *Container Manager → Project → reddit-media-collector → Action → Build*
-   re-pulls `latest`. The `data/` volume keeps the database, scheduler history and config.
+   re-pulls `latest`. The `data/` volume keeps the database, scheduler history, tags and config.
+
+6. **First boot (one-time, if you already had a library):**
+   to retroactively tag every existing post (subreddit / performer / genre / nsfw):
+   ```bash
+   curl -X POST http://<nas-ip>:8000/api/tags/backfill
+   ```
+   Idempotent — safe to re-run.
 
 **Permissions note:** if the container can't write to the bind mounts, check the owner
 of `/volume1/docker/reddit-media-collector/` with `ls -ln`. Either `chown` it to a user
@@ -401,7 +442,8 @@ The web interface exposes a REST API. FastAPI also serves interactive docs at `/
 | Method | Endpoint | Description |
 |--------|----------|-------------|
 | POST | `/api/collector/run` | Trigger collection run |
-| GET | `/api/collector/status` | Collector status |
+| GET | `/api/collector/status` | Collector status (includes `last_error`) |
+| POST | `/api/collector/clear-error` | Dismiss the `last_error` banner |
 | POST | `/api/collect/individual` | Collect from a single subreddit/user |
 | GET | `/api/collect/targets` | List available targets |
 | GET | `/api/scheduler/status` | Scheduler state + next run |
@@ -409,6 +451,29 @@ The web interface exposes a REST API. FastAPI also serves interactive docs at `/
 | GET | `/api/scheduler/history` | Past scheduler runs |
 | POST | `/api/scheduler/run-now` | Execute schedule immediately |
 
+### Tags
+Stash-style taxonomy. Auto-tags are recreated on every collect (idempotent); user-added tags survive reruns.
+
+| Method | Endpoint | Description |
+|--------|----------|-------------|
+| GET | `/api/tags?category=` | List all tags (filter by category: `performer`/`source`/`genre`/`meta`) |
+| GET | `/api/posts/{id}/tags` | Tags attached to a single post |
+| POST | `/api/posts/{id}/tags` | Attach a user-curated tag |
+| DELETE | `/api/posts/{id}/tags/{tag_id}` | Detach a tag (auto or user) |
+| POST | `/api/tags/backfill` | One-time pass to retroactively auto-tag the whole library |
+
+`GET /api/media` already returns each post's `tags: [{name, category}]` array (batched join — O(1) extra queries per page).
+
+The gallery filters now also accept `nsfw=all|hide|only`.
+
+### Health & Auth
+| Method | Endpoint | Description |
+|--------|----------|-------------|
+| GET | `/health` | Public JSON: `db`, `ffmpeg`, `scheduler`, `downloads_writable`, `version`, `auth_enabled` |
+| GET | `/unlock` | PIN entry screen (only when `RMC_PIN` is set) |
+| POST | `/unlock` | Submit PIN; sets the signed session cookie |
+| POST | `/lock` | Invalidate the session cookie immediately |
+
 ## Database Schema
 
 The SQLite database (`media.db`) stores all metadata:
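The `nsfw=all|hide|only` gallery filter added in the hunk above presumably maps onto the `posts.nsfw` column that this commit also documents. The diff does not show how the server implements it, so the helper below is purely a hypothetical sketch; the function name `nsfw_where` and the idea of returning a SQL fragment are assumptions.

```python
def nsfw_where(mode: str = "all") -> str:
    """Translate the documented nsfw query value into a WHERE fragment."""
    clauses = {
        "all": "1 = 1",            # no filtering
        "hide": "posts.nsfw = 0",  # safe-for-work posts only
        "only": "posts.nsfw = 1",  # NSFW posts only
    }
    if mode not in clauses:
        raise ValueError(f"nsfw must be one of all|hide|only, got {mode!r}")
    return clauses[mode]
```

Only fixed strings are ever returned, so interpolating the fragment into a query does not open an injection path; any user-supplied value outside the three documented modes is rejected.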
@@ -428,8 +493,9 @@ CREATE TABLE posts (
     local_path TEXT,
     file_hash TEXT,
     permalink TEXT,
-    source_type TEXT,
-    flair TEXT
+    source_type TEXT, -- 'subreddit' or 'user'
+    flair TEXT,
+    nsfw INTEGER DEFAULT 0 -- Reddit's over_18 mirrored locally
 );
 
 CREATE TABLE favorites (
@@ -437,8 +503,40 @@ CREATE TABLE favorites (
     favorited_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
     FOREIGN KEY (post_id) REFERENCES posts(id)
 );
+
+-- Tag taxonomy (Stash-inspired)
+CREATE TABLE tags (
+    id INTEGER PRIMARY KEY AUTOINCREMENT,
+    name TEXT NOT NULL,
+    category TEXT, -- 'performer' | 'source' | 'genre' | 'meta'
+    is_nsfw INTEGER DEFAULT 0,
+    description TEXT,
+    UNIQUE(name, category)
+);
+
+CREATE TABLE post_tags (
+    post_id TEXT NOT NULL,
+    tag_id INTEGER NOT NULL,
+    source TEXT DEFAULT 'auto', -- 'auto' | 'user'
+    PRIMARY KEY (post_id, tag_id),
+    FOREIGN KEY (post_id) REFERENCES posts(id) ON DELETE CASCADE,
+    FOREIGN KEY (tag_id) REFERENCES tags(id) ON DELETE CASCADE
+);
+
+-- Scheduler history (in-app cron alternative)
+CREATE TABLE scheduler_history (
+    id INTEGER PRIMARY KEY AUTOINCREMENT,
+    started_at TIMESTAMP,
+    finished_at TIMESTAMP,
+    status TEXT, -- 'success' | 'error' | 'timeout' | 'running'
+    posts_processed INTEGER DEFAULT 0,
+    posts_downloaded INTEGER DEFAULT 0,
+    error_message TEXT
+);
 ```
 
+The database uses **SQLite WAL** (`PRAGMA journal_mode=WAL`) for crash safety on power loss and concurrent reads while a collect is running.
+
 ## Troubleshooting
 
 ### Videos saving as .html
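The `post_tags.source` column in the schema hunk above is what makes the documented behaviour possible: auto-tags are recreated on every collect while user-added tags survive. One way that could work is sketched below with the standard-library `sqlite3` module. The `retag` helper and the trimmed `SCHEMA` (foreign keys and optional columns omitted) are hypothetical and not taken from `database.py`.

```python
import sqlite3
from typing import Iterable, Tuple

# Trimmed copy of the documented tables, enough to demonstrate the idea.
SCHEMA = """
CREATE TABLE tags (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    name TEXT NOT NULL,
    category TEXT,
    UNIQUE(name, category)
);
CREATE TABLE post_tags (
    post_id TEXT NOT NULL,
    tag_id INTEGER NOT NULL,
    source TEXT DEFAULT 'auto',
    PRIMARY KEY (post_id, tag_id)
);
"""


def retag(conn: sqlite3.Connection, post_id: str,
          auto_tags: Iterable[Tuple[str, str]]) -> None:
    """Replace a post's auto tags with `auto_tags`; rows with source='user' are untouched."""
    with conn:  # single transaction: commit on success, roll back on error
        conn.execute(
            "DELETE FROM post_tags WHERE post_id = ? AND source = 'auto'",
            (post_id,),
        )
        for name, category in auto_tags:
            conn.execute(
                "INSERT OR IGNORE INTO tags (name, category) VALUES (?, ?)",
                (name, category),
            )
            (tag_id,) = conn.execute(
                "SELECT id FROM tags WHERE name = ? AND category = ?",
                (name, category),
            ).fetchone()
            # OR IGNORE keeps an existing user-attached row for the same tag.
            conn.execute(
                "INSERT OR IGNORE INTO post_tags (post_id, tag_id, source) "
                "VALUES (?, ?, 'auto')",
                (post_id, tag_id),
            )
```

Running `retag` twice with the same input leaves the tables unchanged, which matches the "idempotent, safe to re-run" wording used for the backfill endpoint. Note that SQLite only enforces the `ON DELETE CASCADE` clauses from the full schema when `PRAGMA foreign_keys = ON` is set on the connection.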
@@ -477,41 +575,48 @@ pip install -e ".[dev]"
 ```
 reddit-media-collector/
 ├── src/
-│   ├── main.py               # Collector entry point
-│   ├── config.py             # Configuration dataclasses
-│   ├── database.py           # SQLite wrapper
-│   ├── downloader.py         # Downloader with retry + dedupe
-│   ├── reddit_client.py      # Reddit JSON-API client
-│   ├── sidecar.py            # Immich-compatible JSON sidecars
-│   ├── extractors/           # URL extractors per host
-│   │   ├── reddit.py
-│   │   ├── imgur.py
-│   │   └── gfycat.py
-│   └── web/                  # FastAPI app
-│       ├── app.py
-│       ├── auth.py           # Optional HTTP Basic auth
+│   ├── main.py                     # Collector entry point
+│   ├── config.py                   # Configuration dataclasses + rotating logger
+│   ├── database.py                 # SQLite (WAL) + tags taxonomy + auto-tagger
+│   ├── downloader.py               # Downloader with retry + dedupe
+│   ├── reddit_client.py            # Reddit JSON-API client
+│   ├── sidecar.py                  # Immich-compatible JSON sidecars
+│   ├── extractors/                 # URL extractors per host (reddit, imgur, gfycat)
+│   └── web/                        # FastAPI app
+│       ├── app.py                  # App, lifespan, PIN-lock middleware
+│       ├── auth.py                 # Optional HTTP Basic auth
+│       ├── session.py              # HMAC-signed PIN session cookie
 │       ├── config_manager.py
 │       ├── deps.py
-│       ├── rate_limit.py     # Per-IP throttle dependency
+│       ├── rate_limit.py           # Per-IP throttle dependency
 │       ├── routers/
-│       │   ├── config.py
+│       │   ├── config.py           # Config CRUD (HTMX-aware fragments)
 │       │   ├── favorites.py
-│       │   ├── media.py
-│       │   ├── scheduler.py
-│       │   └── stats.py
+│       │   ├── health.py           # Public /health JSON
+│       │   ├── media.py            # Gallery, file serving, cleanups
+│       │   ├── scheduler.py        # In-app scheduler + collector control
+│       │   ├── stats.py
+│       │   └── tags.py             # Tag taxonomy CRUD + backfill
 │       ├── static/
-│       │   └── js/api.js     # Shared frontend helpers
+│       │   ├── css/app.css         # Extracted from inline (themed via CSS vars)
+│       │   └── js/
+│       │       ├── api.js          # Shared fetch helpers
+│       │       └── app.js          # All app logic
 │       └── templates/
-│           └── index.html    # SPA shell
-├── tests/                    # pytest (unit + API contract)
-├── downloads/                # Downloaded media (gitignored)
-├── config.yaml               # Configuration
-├── media.db                  # SQLite database (gitignored)
-├── pyproject.toml            # Project metadata + tooling config
-├── .pre-commit-config.yaml   # ruff + mypy hooks
-├── .github/workflows/ci.yml  # lint, types, test, docker
+│           ├── index.html          # Composition: extends layout + includes
+│           ├── unlock.html         # PIN entry screen
+│           └── partials/           # tab_*.html, _modal_*.html, _item_*.html, _tag.html
+├── tests/                          # pytest (unit + API contract; 151 tests)
+├── downloads/                      # Downloaded media (gitignored)
+├── config.yaml                     # Configuration
+├── pyproject.toml                  # Project metadata + tooling config
+├── .pre-commit-config.yaml         # ruff + mypy hooks
+├── .github/workflows/
+│   ├── ci.yml                      # lint, types, test, docker (per PR/push)
+│   └── release.yml                 # GHCR publish on every v* tag
 ├── Dockerfile
-├── docker-compose.yml
+├── docker-compose.yml              # Local dev
+├── docker-compose.synology.yml     # NAS-flavored (image from GHCR)
 └── README.md
 ```
 