Replace the Streamlit dashboard with a modern Next.js 15 frontend using
TypeScript, Tailwind CSS, shadcn/ui components, Recharts, and TanStack
Query. All four pages (Overview, Price Battle, Product History, Basket
Compare) are fully reimplemented with responsive layouts, collapsible
sidebar navigation, and data fetching with caching. Add a Docker
Compose setup for the db, api, and frontend services, and remove the
streamlit and plotly dependencies.
Scrapers:
- Rewrite Tesco scraper to handle Akamai WAF and obfuscated CSS
- Fix Dunnes category discovery to return top-level categories only (29 instead of 1603)
- Rewrite Lidl parser to extract from data-grid-data JSON attributes
- Improve Aldi and SuperValu scrapers with better error handling
API:
- Add /api/search-prices endpoint for cross-store product comparison
- Fix timezone mismatch in price history endpoint (naive vs aware datetime)
- Fix scrape status filter (success/partial instead of done)
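For context on the timezone fix above: Python refuses to compare naive and aware datetimes, so one common remedy is to normalize values before comparing. The sketch below is illustrative only; the helper name `ensure_aware` and the choice to treat naive values as UTC are assumptions, not necessarily what the endpoint does.

```python
from datetime import datetime, timezone


def ensure_aware(value: datetime) -> datetime:
    """Return an aware datetime; naive values are assumed to be UTC (assumption)."""
    if value.tzinfo is None or value.tzinfo.utcoffset(value) is None:
        return value.replace(tzinfo=timezone.utc)
    return value


naive = datetime(2024, 1, 1, 12, 0)                         # e.g. read from a column without tz
aware = datetime(2024, 1, 1, 12, 30, tzinfo=timezone.utc)   # e.g. parsed from a request

# `naive < aware` raises TypeError; normalizing both sides first avoids it.
is_earlier = ensure_aware(naive) < aware
```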
Dashboard:
- Rewrite all 4 pages to match actual API response schemas
- Fix Price Battle button state management with st.rerun()
- Add popular search buttons for real product comparison
- Add product catalogue with pagination and image support
- Fix store colour matching to use partial name matching
- Remove last_scrape from overview, add battle pie chart
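As an illustration of the partial name matching for store colours mentioned above, a minimal sketch could look like this. The palette values, the `DEFAULT_COLOUR` fallback, and the function name are hypothetical; only the idea of matching on a substring of the store name comes from the change itself.

```python
# Hypothetical palette keyed by a lowercase fragment of the store name.
STORE_COLOURS = {
    "tesco": "#1f77b4",
    "dunnes": "#2ca02c",
    "lidl": "#ff7f0e",
    "aldi": "#9467bd",
    "supervalu": "#d62728",
}
DEFAULT_COLOUR = "#888888"  # assumed fallback for unknown stores


def colour_for_store(store_name: str) -> str:
    """Pick a colour by substring match, so 'Tesco Ireland' still maps to 'tesco'."""
    lowered = store_name.lower()
    for fragment, colour in STORE_COLOURS.items():
        if fragment in lowered:
            return colour
    return DEFAULT_COLOUR
```

With exact matching, a display name such as "Dunnes Stores" would miss the "dunnes" key; the substring check tolerates such variations.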
Add tests for RawProduct/ScrapeResult dataclasses, product name
normalizer, cross-store matcher (EAN, fuzzy, unit validation),
and FastAPI endpoints with mocked database sessions.
Product History: search products, view price trends over time
with per-store line charts and promo indicators.
Basket Compare: build shopping list, compare total cost across
all stores with item-level price breakdown.
Overview page with KPI cards, cheapest store indicator, and
recent price changes. Price Battle page with store vs store
comparison table, win percentage pie chart, and category filter.
Multi-page Streamlit app with sidebar navigation. Reusable Plotly
chart library with consistent store colour scheme. Filter components
for store, category, date range, and search.
AsyncIOScheduler configured for Europe/Dublin timezone.
Runs all store scrapers sequentially at 22:00, then triggers
product matching. Includes misfire grace and standalone runner.
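The "scrapers sequentially, then matching" flow can be sketched with plain asyncio. This is not the scheduler code itself: the scheduling (22:00 Europe/Dublin, misfire grace) is left out, and `run_nightly`, the fake scrapers, and the choice to continue after a failed store are all assumptions made for illustration.

```python
from __future__ import annotations

import asyncio
from typing import Awaitable, Callable, Optional


async def run_nightly(
    scrapers: dict[str, Callable[[], Awaitable[int]]],
    match_products: Callable[[], Awaitable[None]],
) -> dict[str, Optional[int]]:
    """Run each store scraper one after another, then trigger matching once."""
    results: dict[str, Optional[int]] = {}
    for name, scrape in scrapers.items():
        try:
            results[name] = await scrape()  # number of products scraped
        except Exception:
            results[name] = None  # assumption: one failing store does not block the rest
    await match_products()
    return results


call_log: list[str] = []


def make_fake_scraper(name: str, count: int, fail: bool = False):
    """Build a stand-in scraper coroutine for the demo (hypothetical)."""
    async def scrape() -> int:
        call_log.append(name)
        if fail:
            raise RuntimeError(f"{name} scrape failed")
        return count
    return scrape


async def fake_match() -> None:
    call_log.append("match")


results = asyncio.run(
    run_nightly(
        {
            "tesco": make_fake_scraper("tesco", 120),
            "dunnes": make_fake_scraper("dunnes", 0, fail=True),
            "lidl": make_fake_scraper("lidl", 80),
        },
        fake_match,
    )
)
```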
GET /api/products/{id}/compare for cross-store price comparison.
GET /api/battle for store ranking by cheapest wins.
POST /api/baskets/compare for shopping list cost comparison
across all stores with item-level breakdown.
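To make the basket comparison concrete, here is a rough sketch of the kind of computation such an endpoint might perform. The function name, the response shape, the use of integer cents, and the sample prices are all made up for illustration; the actual request/response models are not shown in these notes.

```python
def compare_basket(basket, prices_cents):
    """Total a basket per store, with an item-level breakdown.

    basket:       product name -> quantity
    prices_cents: store name -> {product name -> unit price in cents}
    """
    results = []
    for store, store_prices in prices_cents.items():
        items, missing, total = [], [], 0
        for product, quantity in basket.items():
            unit_price = store_prices.get(product)
            if unit_price is None:
                missing.append(product)  # store does not stock this item
                continue
            line_total = unit_price * quantity
            items.append(
                {"product": product, "quantity": quantity,
                 "unit_price_cents": unit_price, "line_total_cents": line_total}
            )
            total += line_total
        results.append(
            {"store": store, "total_cents": total, "items": items, "missing": missing}
        )
    # Complete baskets first, then cheapest total.
    results.sort(key=lambda r: (len(r["missing"]), r["total_cents"]))
    return results


# Made-up prices, purely for the demo.
comparison = compare_basket(
    {"milk 2l": 2, "bread": 1},
    {
        "tesco": {"milk 2l": 219, "bread": 159},
        "aldi": {"milk 2l": 209, "bread": 149},
        "lidl": {"milk 2l": 205},
    },
)
```

Sorting incomplete baskets last is a design choice of this sketch: a store that lacks an item would otherwise look cheapest simply because it prices fewer products.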
GET /api/products with search, category, store filters and pagination.
GET /api/products/{id} with eager-loaded relations.
GET /api/products/{id}/prices for time-series history.
GET /api/stores, /api/categories, /api/stats for KPIs.
FastAPI application with CORS, health check, docs redirect.
Pydantic v2 schemas for all request/response models: products,
prices, comparisons, baskets, and stats.
Three-level matching strategy: exact EAN barcode match, fuzzy
name matching via rapidfuzz with unit-size cross-validation,
and batch matching for merging singleton store products.
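A simplified sketch of the first two matching levels follows. It uses the standard library's `difflib.SequenceMatcher` as a stand-in for rapidfuzz so it runs without extra dependencies; the `Candidate` fields, the 0.85 threshold, and the placeholder EANs are assumptions, and batch matching of singletons is not covered.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class Candidate:
    name: str                   # normalized product name
    ean: Optional[str] = None   # barcode, when the store exposes one
    unit: Optional[str] = None  # normalized size, e.g. "2l"


def match(a: Candidate, b: Candidate, threshold: float = 0.85) -> Optional[str]:
    """Return 'ean', 'fuzzy', or None for a pair of store products."""
    # Level 1: identical barcodes are treated as a definite match.
    if a.ean and a.ean == b.ean:
        return "ean"
    # Unit cross-validation: reject pairs whose known sizes disagree.
    if a.unit and b.unit and a.unit != b.unit:
        return None
    # Level 2: fuzzy name similarity (difflib here; the project uses rapidfuzz).
    ratio = SequenceMatcher(None, a.name.lower(), b.name.lower()).ratio()
    return "fuzzy" if ratio >= threshold else None


by_ean = match(Candidate("milk", ean="0000000000001"),
               Candidate("whole milk", ean="0000000000001"))
by_name = match(Candidate("avonmore fresh milk", unit="2l"),
                Candidate("avonmore fresh milk 2l", unit="2l"))
size_clash = match(Candidate("avonmore fresh milk", unit="1l"),
                   Candidate("avonmore fresh milk", unit="2l"))
```

The unit check runs before the fuzzy comparison because otherwise near-identical names for different pack sizes would score as matches.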
Text normalization for cross-store matching: unit standardization
(Litre->l, Kilogram->kg), noise word removal, brand extraction
from ~70 known Irish grocery brands, and unit info parsing.
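The normalization steps might be sketched roughly as below. The alias table, noise words, and the two-entry brand list are small hypothetical stand-ins (the real list has about 70 brands), and the returned dict shape is an assumption.

```python
import re

UNIT_ALIASES = {"litre": "l", "litres": "l", "kilogram": "kg", "kilograms": "kg"}
NOISE_WORDS = {"fresh", "new", "irish"}   # hypothetical noise list
KNOWN_BRANDS = ("avonmore", "brennans")   # hypothetical subset of the brand list

# A number followed by a short unit, e.g. "2 l", "800g", "1.5kg".
_UNIT_RE = re.compile(r"(\d+(?:\.\d+)?)\s*(ml|l|kg|g)\b")


def normalize(name: str) -> dict:
    """Lowercase, standardize units, pull out brand and size, drop noise words."""
    text = name.lower()
    for long_form, short_form in UNIT_ALIASES.items():
        # Lookarounds instead of \b so that "2litre" is also rewritten.
        text = re.sub(rf"(?<![a-z]){long_form}(?![a-z])", short_form, text)

    unit = None
    found = _UNIT_RE.search(text)
    if found:
        unit = f"{found.group(1)}{found.group(2)}"
        text = text[:found.start()] + " " + text[found.end():]

    tokens = re.findall(r"[a-z0-9]+", text)
    brand = next((t for t in tokens if t in KNOWN_BRANDS), None)
    words = [t for t in tokens if t not in NOISE_WORDS and t != brand]
    return {"brand": brand, "unit": unit, "name": " ".join(words)}


milk = normalize("Avonmore Fresh Milk 2 Litre")
```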
Full Playwright scraper for dunnesstores.com. Handles JS-heavy
rendering and anti-bot measures with user-agent rotation,
random delays, and cookie acceptance automation.
Playwright-based scraper for tesco.ie. Intercepts XHR API responses
for structured product data. Falls back to DOM extraction when API
interception fails. Covers all grocery categories.
Abstract BaseScraper with RawProduct/ScrapeResult dataclasses.
Handles full scrape lifecycle: category discovery, product extraction,
persistence to database, and scrape run logging. Includes
user-agent rotation and random delay utilities.
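A pared-down sketch of how such a base class might be shaped is given below. All field names, the `status` property, and the `"failed"` value are assumptions; persistence, scrape run logging, user-agent rotation, and delays are deliberately left out.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class RawProduct:
    name: str
    price: float
    url: str
    category: str | None = None


@dataclass
class ScrapeResult:
    store: str
    products: list[RawProduct] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)

    @property
    def status(self) -> str:
        # "success" and "partial" appear elsewhere in these notes; "failed" is assumed.
        if not self.errors:
            return "success"
        return "partial" if self.products else "failed"


class BaseScraper(ABC):
    store_name: str

    @abstractmethod
    def discover_categories(self) -> list[str]: ...

    @abstractmethod
    def extract_products(self, category: str) -> list[RawProduct]: ...

    def run(self) -> ScrapeResult:
        """Drive the lifecycle: discover categories, extract, collect errors."""
        result = ScrapeResult(store=self.store_name)
        for category in self.discover_categories():
            try:
                result.products.extend(self.extract_products(category))
            except Exception as exc:
                result.errors.append(f"{category}: {exc}")
        return result


class DemoScraper(BaseScraper):
    """Hypothetical subclass used only to exercise the base class."""
    store_name = "demo"

    def discover_categories(self) -> list[str]:
        return ["dairy", "bakery"]

    def extract_products(self, category: str) -> list[RawProduct]:
        if category == "bakery":
            raise RuntimeError("page timed out")
        return [RawProduct("milk 2l", 2.19, "https://example.com/milk", category)]


demo_result = DemoScraper().run()
```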
Define all tables: stores, categories, products, store_products,
price_records, scrape_runs. Include relationships and composite
index on (store_product_id, scraped_at) for time-series queries.
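To show why the composite index helps time-series queries, here is a small sketch using the standard library's sqlite3 in place of the project's SQLAlchemy models. Column names other than `store_product_id` and `scraped_at`, the index name, and the sample rows are assumptions.

```python
import sqlite3

DDL = """
CREATE TABLE price_records (
    id INTEGER PRIMARY KEY,
    store_product_id INTEGER NOT NULL,
    price_cents INTEGER NOT NULL,
    scraped_at TEXT NOT NULL
);
-- Composite index: equality on the product, then range/order on time.
CREATE INDEX ix_price_records_product_time
    ON price_records (store_product_id, scraped_at);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
conn.executemany(
    "INSERT INTO price_records (store_product_id, price_cents, scraped_at) VALUES (?, ?, ?)",
    [
        (1, 219, "2024-01-02T22:00:00"),
        (1, 209, "2024-01-01T22:00:00"),
        (2, 159, "2024-01-01T22:00:00"),
    ],
)

HISTORY_SQL = """
SELECT price_cents, scraped_at FROM price_records
WHERE store_product_id = ? AND scraped_at >= ?
ORDER BY scraped_at
"""
history = conn.execute(HISTORY_SQL, (1, "2024-01-01")).fetchall()

# Ask SQLite how it would run the query; the last column is the plan detail text.
plan = conn.execute("EXPLAIN QUERY PLAN " + HISTORY_SQL, (1, "2024-01-01")).fetchall()
uses_index = any("ix_price_records_product_time" in row[-1] for row in plan)
```

Putting `store_product_id` first lets the database jump to one product's rows and read them already ordered by `scraped_at`, which is the access pattern of a price history lookup.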
Set up Python project with hatch build system and all core
dependencies: FastAPI, SQLAlchemy, Playwright, httpx, Streamlit,
Plotly, rapidfuzz, APScheduler.