fix(config): resilient YAML loader (cp1252 fallback) + ASCII-only example
Cause
- v1.4.1's config.yaml.example had an em-dash (U+2014, a 3-byte UTF-8 sequence) inside the skip_nsfw comment. When DSM File Station or classic Notepad re-saves the file, the em-dash is written as a single 0x97 byte (the Windows-1252 em-dash), which is invalid UTF-8. Reading the file then raises UnicodeDecodeError inside yaml.safe_load, returning 500 on every GET / after PIN unlock.

Fix
- src/web/config_manager.load_config(): try utf-8 first; on UnicodeDecodeError, retry with cp1252 (which maps 0x97 to U+2014, so PyYAML accepts the text). Log a warning; the next save_config() normalizes the file back to UTF-8.
- config.yaml.example: replace the em-dash with a comma in the skip_nsfw comment so the canonical example is pure ASCII and cannot trigger this again.

Bump 1.4.1 -> 1.4.2 (patch: bugfix for a regression introduced by 1.4.1's own example file).

Verified
- 151 tests still green.
- Manual smoke: a YAML file containing a raw 0x97 byte loads correctly via the cp1252 fallback, with the warning logged. UTF-8 files are unchanged.

For existing users hit by this on the NAS, the immediate workaround is to remove the em-dash from the skip_nsfw comment line in their config.yaml (or delete the comment entirely). From 1.4.2 on, the loader handles it transparently.
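The byte-level cause described above can be reproduced with the standard library alone. This is a minimal sketch, not code from the commit: the helper name `decodes_as_utf8` and the sample comment line are made up for illustration.

```python
# Illustrative only: names and the sample line below are assumptions,
# not taken from the repository.
EM_DASH = "\u2014"

utf8_bytes = EM_DASH.encode("utf-8")     # three bytes in UTF-8
cp1252_bytes = EM_DASH.encode("cp1252")  # one byte in Windows-1252


def decodes_as_utf8(raw: bytes) -> bool:
    """Return True if raw is valid UTF-8, False otherwise."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A comment line similar to the one in the example config.
line = "skip_nsfw: false  # default: false \u2014 collect everything\n"

# What an editor that saves as cp1252 would write to disk.
resaved = line.encode("cp1252")

print(utf8_bytes, cp1252_bytes)
print(decodes_as_utf8(line.encode("utf-8")), decodes_as_utf8(resaved))
```

A lone 0x97 is a UTF-8 continuation byte with no lead byte before it, which is why the strict UTF-8 decoder rejects the re-saved file.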
parent 52c82c376c
commit f825a638cd
3 changed files with 25 additions and 5 deletions
@@ -26,7 +26,7 @@ download:
     - "video"
     - "gif"
   min_score: 10 # Minimum upvotes to download
-  skip_nsfw: false # Set to true to skip NSFW posts (default: false — collect everything)
+  skip_nsfw: false # Set to true to skip NSFW posts (default: false, collects everything)
   max_file_size_mb: 100 # Skip files larger than this
 
 rate_limit:
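One way to keep the example file pure ASCII going forward would be a small guard that reports any non-ASCII character with its position. The sketch below is an assumption about how such a check could look; `find_non_ascii` is a hypothetical helper and is not part of this commit.

```python
def find_non_ascii(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, character) for every non-ASCII character.

    Line and column numbers are 1-based. Hypothetical helper, shown
    only to illustrate an ASCII-only check on an example config.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col, ch))
    return hits


old_line = "skip_nsfw: false # (default: false \u2014 collect everything)"
new_line = "skip_nsfw: false # (default: false, collects everything)"

print(find_non_ascii(old_line))
print(find_non_ascii(new_line))
```

A test could read the example file as text and assert that the returned list is empty.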
@@ -7,7 +7,7 @@ packages = ["src"]
 
 [project]
 name = "reddit-media-collector"
-version = "1.4.1"
+version = "1.4.2"
 description = "Self-hosted media collector for Reddit with Immich integration"
 readme = "README.md"
 requires-python = ">=3.11"
@@ -1,5 +1,6 @@
 """Configuration file manager for CRUD operations."""
 
+import logging
 import os
 from pathlib import Path
 from typing import Any
@@ -8,14 +9,33 @@ import yaml
 
 CONFIG_PATH = Path(os.environ.get("RMC_CONFIG_PATH", str(Path(__file__).parent.parent.parent / "config.yaml")))
 
+_log = logging.getLogger(__name__)
+
 
 def load_config() -> dict[str, Any]:
-    """Load configuration from YAML file."""
+    """Load configuration from YAML file.
+
+    Tries UTF-8 first (canonical); falls back to cp1252 (Windows-1252) if
+    DSM File Station, classic Notepad or a similar editor re-saved the file
+    with a legacy single-byte encoding. cp1252 maps bytes like 0x97 to the
+    proper Unicode em-dash so PyYAML accepts the text. The next
+    save_config() will normalize the file back to UTF-8.
+    """
     if not CONFIG_PATH.exists():
         return {"targets": {"subreddits": [], "users": []}}
 
-    with open(CONFIG_PATH, encoding="utf-8") as f:
-        return yaml.safe_load(f) or {}
+    try:
+        with open(CONFIG_PATH, encoding="utf-8") as f:
+            return yaml.safe_load(f) or {}
+    except UnicodeDecodeError as e:
+        _log.warning(
+            "%s contains non-UTF-8 bytes (%s); falling back to cp1252. "
+            "Next save_config() will normalize the file to UTF-8.",
+            CONFIG_PATH,
+            e,
+        )
+        with open(CONFIG_PATH, encoding="cp1252") as f:
+            return yaml.safe_load(f) or {}
 
 
 def save_config(config: dict[str, Any]) -> None:
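The fallback-then-normalize behaviour in load_config() can be tried in isolation, without PyYAML. The following is a sketch under stated assumptions: `read_text_with_fallback` is a hypothetical helper, the temporary file stands in for config.yaml, and the explicit rewrite stands in for save_config(), whose body is not shown in this diff.

```python
import tempfile
from pathlib import Path


def read_text_with_fallback(
    path: Path, encodings: tuple[str, ...] = ("utf-8", "cp1252")
) -> tuple[str, str]:
    """Return (text, encoding_used), trying each encoding in order."""
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for encoding in encodings:
        try:
            return path.read_text(encoding=encoding), encoding
        except UnicodeDecodeError as exc:
            last_error = exc
    # Python's cp1252 codec leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D
    # undefined, so the fallback itself can still fail on such bytes.
    raise last_error


with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "config.yaml"
    # Simulate a file re-saved by a cp1252 editor: raw 0x97 for the em-dash.
    cfg.write_bytes(b"skip_nsfw: false  # default: false \x97 collect everything\n")

    text, used = read_text_with_fallback(cfg)

    # Stand-in for the normalizing save: write the text back as UTF-8.
    cfg.write_text(text, encoding="utf-8")
    _, used_after = read_text_with_fallback(cfg)

print(used, used_after)
```

Note that cp1252 accepts almost any byte sequence, so a file in some other legacy encoding would decode without error but with the wrong characters; the fallback is a pragmatic fix for the editors named in the commit message rather than general encoding detection.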