reddit-media-collector/config.yaml.example
Richard Nixon f825a638cd fix(config): resilient YAML loader (cp1252 fallback) + ASCII-only example
Cause
- v1.4.1's config.yaml.example had an em-dash (U+2014, 3-byte UTF-8 sequence)
  inside the skip_nsfw comment. When DSM File Station or classic Notepad
  re-saves the file, it converts the em-dash to a single 0x97 byte
  (Windows-1252 em-dash), which is invalid UTF-8 and crashes yaml.safe_load
  with UnicodeDecodeError, returning 500 on every GET / after PIN unlock.
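
The byte-level mismatch described above can be reproduced with the standard library alone. This is a minimal sketch; the names `EM_DASH` and `is_valid_utf8` are illustrative and do not come from the project.

```python
# Illustrative reproduction of the encoding mismatch (names are hypothetical).
EM_DASH = "\u2014"

# The same character has different byte representations in each encoding.
utf8_bytes = EM_DASH.encode("utf-8")      # three bytes: E2 80 94
cp1252_bytes = EM_DASH.encode("cp1252")   # one byte: 97


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(utf8_bytes.hex(), is_valid_utf8(utf8_bytes))
print(cp1252_bytes.hex(), is_valid_utf8(cp1252_bytes))
```

A lone 0x97 is a UTF-8 continuation byte with no lead byte, so a strict UTF-8 decode rejects it.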

Fix
- src/web/config_manager.load_config(): try utf-8 first; on
  UnicodeDecodeError, retry with cp1252 (which maps 0x97 → U+2014
  correctly so PyYAML accepts the text). Log a warning so the next
  save_config() normalizes the file back to UTF-8.
- config.yaml.example: replace em-dash with comma in the skip_nsfw comment
  so the canonical example is pure ASCII and cannot trigger this again.
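
A rough sketch of the fallback described in the first bullet, not the actual code in src/web/config_manager. The helper name `decode_config_bytes`, the logger setup, and the `or {}` default are assumptions made for illustration; only `load_config`, the utf-8-then-cp1252 order, and the warning come from the description above.

```python
import logging
from pathlib import Path

log = logging.getLogger(__name__)


def decode_config_bytes(raw: bytes, source: str = "config.yaml") -> str:
    """Decode config bytes as UTF-8, falling back to cp1252 (hypothetical helper)."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # cp1252 maps byte 0x97 to U+2014, so a file re-saved by a
        # Windows-1252 editor becomes ordinary text again.
        log.warning("%s is not valid UTF-8; retrying as cp1252", source)
        return raw.decode("cp1252")


def load_config(path: str) -> dict:
    """Read and parse the YAML config, tolerating cp1252-encoded files."""
    # PyYAML is imported here so the decoding helper above stays stdlib-only.
    import yaml

    text = decode_config_bytes(Path(path).read_bytes(), source=path)
    # Assumption: an empty file is treated as an empty config.
    return yaml.safe_load(text) or {}
```

Note that Python's cp1252 codec leaves five bytes undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so this fallback can still raise UnicodeDecodeError on a file containing one of them.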

Bump 1.4.1 -> 1.4.2 (patch — bugfix to a regression introduced by
1.4.1's own example file).

Verified
- 151 tests still green.
- Manual smoke: a YAML file containing a raw 0x97 byte loads correctly
  via the cp1252 fallback, with a warning logged. UTF-8 files unchanged.

For existing users hit by this on the NAS, the immediate workaround
is to remove the em-dash from their config.yaml's skip_nsfw comment
line (or just delete the comment entirely). After 1.4.2 the loader
handles it transparently.
2026-05-17 22:44:38 +01:00


# Reddit Media Collector - Configuration
# Copy this file to config.yaml and customize as needed
# No Reddit API credentials required - uses public JSON endpoints

targets:
  # Subreddits to collect from
  subreddits:
    - name: "pics"
      limit: 25
      sort: "hot"  # hot, new, top, rising
    - name: "earthporn"
      limit: 50
      sort: "top"
      time_filter: "week"  # hour, day, week, month, year, all (only for "top" sort)

  # Users to collect from (their submitted posts)
  users: []
    # - name: "example_user"
    #   limit: 30

download:
  output_dir: "./downloads"
  media_types:
    - "image"
    - "video"
    - "gif"
  min_score: 10  # Minimum upvotes to download
  skip_nsfw: false  # Set to true to skip NSFW posts (default: false, collects everything)
  max_file_size_mb: 100  # Skip files larger than this

rate_limit:
  # Be gentle! Public API has stricter limits than authenticated
  requests_per_minute: 10  # Keep low to avoid 429 errors
  download_delay_seconds: 2  # Delay between file downloads

logging:
  level: "INFO"  # DEBUG, INFO, WARNING, ERROR
  file: "collector.log"