fix(config): resilient YAML loader (cp1252 fallback) + ASCII-only example

Cause
- v1.4.1's config.yaml.example had an em-dash (U+2014, 3-byte UTF-8 sequence)
  inside the skip_nsfw comment. When DSM File Station or classic Notepad
  re-saves the file, it converts the em-dash to a single 0x97 byte (the
  Windows-1252 em-dash). That byte is invalid UTF-8, so yaml.safe_load
  fails with UnicodeDecodeError and every GET / after PIN unlock
  returns 500.
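
The byte-level mismatch described above can be reproduced with Python's built-in codecs alone. This is a minimal sketch; the variable names are illustrative and do not come from the project.

```python
# The em-dash as it appeared in the v1.4.1 example comment.
em_dash = "\u2014"

utf8_bytes = em_dash.encode("utf-8")      # three bytes: E2 80 94
cp1252_bytes = em_dash.encode("cp1252")   # one byte: 97

print(utf8_bytes.hex(), cp1252_bytes.hex())

# A lone 0x97 is a UTF-8 continuation byte with no lead byte,
# so strict UTF-8 decoding rejects it.
try:
    cp1252_bytes.decode("utf-8")
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False

# Decoding the same byte as cp1252 recovers the original character.
recovered = cp1252_bytes.decode("cp1252")
```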

Fix
- src/web/config_manager.load_config(): try utf-8 first; on
  UnicodeDecodeError, retry with cp1252 (which maps 0x97 → U+2014
  correctly so PyYAML accepts the text). Log a warning; the next
  save_config() then normalizes the file back to UTF-8.
- config.yaml.example: replace em-dash with comma in the skip_nsfw comment
  so the canonical example is pure ASCII and cannot trigger this again.

Bump 1.4.1 -> 1.4.2 (patch — bugfix to a regression introduced by
1.4.1's own example file).

Verified
- 151 tests still green.
- Manual smoke test: a YAML file containing a raw 0x97 byte loads
  correctly via the cp1252 fallback, with a warning logged. UTF-8 files
  load unchanged.
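
The manual smoke check could be automated along these lines. This is a sketch under stated assumptions: `read_text_with_fallback` and `smoke_check` are hypothetical names, and the helper only mirrors the decoding step of load_config(), leaving out PyYAML so the example stays standard-library only.

```python
import tempfile
from pathlib import Path


def read_text_with_fallback(path: Path) -> str:
    """Hypothetical stand-in for the decoding step of load_config()."""
    try:
        return path.read_text(encoding="utf-8")
    except UnicodeDecodeError:
        return path.read_text(encoding="cp1252")


def smoke_check() -> tuple[str, str]:
    """Write one cp1252-damaged file and one clean UTF-8 file, then read both."""
    with tempfile.TemporaryDirectory() as tmp:
        damaged = Path(tmp) / "damaged.yaml"
        clean = Path(tmp) / "clean.yaml"
        # 0x97 is the cp1252 em-dash that a legacy editor leaves behind.
        damaged.write_bytes(b"skip_nsfw: false  # keep \x97 everything\n")
        clean.write_bytes("skip_nsfw: false  # keep \u2014 everything\n".encode("utf-8"))
        return read_text_with_fallback(damaged), read_text_with_fallback(clean)


damaged_text, clean_text = smoke_check()
```

Both reads should yield the same text, which is the property the manual check confirmed. Note that Python's cp1252 codec leaves a few bytes undefined (0x81, for example), so a file containing those would still fail in the fallback.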

For existing users hit by this on the NAS, the immediate workaround
is to remove the em-dash from their config.yaml's skip_nsfw comment
line (or just delete the comment entirely). After 1.4.2 the loader
handles it transparently.
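
As an alternative to editing the comment by hand, an affected file could be re-encoded once. The following is a hedged sketch, not part of this commit: `normalize_to_utf8` is a hypothetical helper, and it assumes the file was saved as Windows-1252.

```python
from pathlib import Path


def normalize_to_utf8(path: Path) -> bool:
    """Rewrite a cp1252-encoded file as UTF-8; return True if it was changed."""
    raw = path.read_bytes()
    try:
        raw.decode("utf-8")
        return False  # already valid UTF-8, leave the file untouched
    except UnicodeDecodeError:
        pass
    # Assumption: the file was saved as Windows-1252. Bytes that cp1252
    # leaves undefined (for example 0x81) would still raise here.
    text = raw.decode("cp1252")
    path.write_bytes(text.encode("utf-8"))
    return True
```

A call such as `normalize_to_utf8(Path("config.yaml"))` would be run once against the affected file; back up the file first, since the rewrite happens in place.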
authentik Default Admin 2026-05-17 22:44:38 +01:00
parent 52c82c376c
commit f825a638cd
3 changed files with 25 additions and 5 deletions


@@ -26,7 +26,7 @@ download:
 - "video"
 - "gif"
 min_score: 10 # Minimum upvotes to download
-skip_nsfw: false # Set to true to skip NSFW posts (default: false — collect everything)
+skip_nsfw: false # Set to true to skip NSFW posts (default: false, collects everything)
 max_file_size_mb: 100 # Skip files larger than this
 rate_limit:


@@ -7,7 +7,7 @@ packages = ["src"]
 [project]
 name = "reddit-media-collector"
-version = "1.4.1"
+version = "1.4.2"
 description = "Self-hosted media collector for Reddit with Immich integration"
 readme = "README.md"
 requires-python = ">=3.11"


@@ -1,5 +1,6 @@
 """Configuration file manager for CRUD operations."""

+import logging
 import os
 from pathlib import Path
 from typing import Any
@@ -8,14 +9,33 @@ import yaml

 CONFIG_PATH = Path(os.environ.get("RMC_CONFIG_PATH", str(Path(__file__).parent.parent.parent / "config.yaml")))

+_log = logging.getLogger(__name__)
+


 def load_config() -> dict[str, Any]:
-    """Load configuration from YAML file."""
+    """Load configuration from YAML file.
+
+    Tries UTF-8 first (canonical); falls back to cp1252 (Windows-1252) if
+    DSM File Station, classic Notepad or a similar editor re-saved the file
+    with a legacy single-byte encoding. cp1252 maps bytes like 0x97 to the
+    proper Unicode em-dash so PyYAML accepts the text. The next
+    save_config() will normalize the file back to UTF-8.
+    """
     if not CONFIG_PATH.exists():
         return {"targets": {"subreddits": [], "users": []}}

-    with open(CONFIG_PATH, encoding="utf-8") as f:
-        return yaml.safe_load(f) or {}
+    try:
+        with open(CONFIG_PATH, encoding="utf-8") as f:
+            return yaml.safe_load(f) or {}
+    except UnicodeDecodeError as e:
+        _log.warning(
+            "%s contains non-UTF-8 bytes (%s); falling back to cp1252. "
+            "Next save_config() will normalize the file to UTF-8.",
+            CONFIG_PATH,
+            e,
+        )
+        with open(CONFIG_PATH, encoding="cp1252") as f:
+            return yaml.safe_load(f) or {}


 def save_config(config: dict[str, Any]) -> None: