
27 commits

SHA1 Message Date
d891a88f0c Update README with project overview and status 2026-02-11 21:48:22 +00:00
ca5a2712b6 Add Product Admin page with merge, edit, and unlink
- Backend: 5 new admin endpoints (unmatched listing, store-products,
  product update, merge, unlink) with Pydantic schemas
- Frontend: Admin page with searchable table, checkbox multi-select
  for merging, inline product editing dialog, expandable store
  products panel with unlink support, and floating merge bar
- Install shadcn dialog, checkbox, label, and alert-dialog components
- Add Product Admin link to sidebar navigation
2026-02-11 20:34:56 +00:00
20f7c76cdf Migrate frontend from Streamlit to Next.js with shadcn/ui
Replace the Streamlit dashboard with a modern Next.js 15 frontend using
TypeScript, Tailwind CSS, shadcn/ui components, Recharts, and TanStack
Query. All four pages (Overview, Price Battle, Product History, Basket
Compare) are fully reimplemented with responsive layouts, collapsible
sidebar navigation, and proper data fetching with caching. Adds Docker
Compose setup for db + api + frontend and removes streamlit/plotly deps.
2026-02-11 18:12:19 +00:00
82430864f7 Fix scrapers, dashboard pages, and API for production use
Scrapers:
- Rewrite Tesco scraper to handle Akamai WAF and obfuscated CSS
- Fix Dunnes category discovery to top-level only (29 categories instead of 1603)
- Rewrite Lidl parser to extract from data-grid-data JSON attributes
- Improve Aldi and SuperValu scrapers with better error handling

API:
- Add /api/search-prices endpoint for cross-store product comparison
- Fix timezone mismatch in price history endpoint (naive vs aware datetime)
- Fix scrape status filter (success/partial instead of done)

Dashboard:
- Rewrite all 4 pages to match actual API response schemas
- Fix Price Battle button state management with st.rerun()
- Add popular search buttons for real product comparison
- Add product catalogue with pagination and image support
- Fix store colour matching to use partial name matching
- Remove last_scrape from overview, add battle pie chart
2026-02-11 09:52:14 +00:00
f9c4389f5a Add test suite for scrapers, matcher, and API
Tests for RawProduct/ScrapeResult dataclasses, product name
normalizer, cross-store matcher (EAN, fuzzy, unit validation),
and FastAPI endpoints with mocked database sessions.
2026-02-11 08:44:19 +00:00
8feea63abe Add Product History and Basket Compare pages
Product History: search products, view price trends over time
with per-store line charts and promo indicators.
Basket Compare: build shopping list, compare total cost across
all stores with item-level price breakdown.
2026-02-10 10:18:35 +00:00
c41bf56c68 Add Overview and Price Battle dashboard pages
Overview page with KPI cards, cheapest store indicator, and
recent price changes. Price Battle page with store vs store
comparison table, win percentage pie chart, and category filter.
2026-02-09 14:52:08 +00:00
eb4164c289 Add Streamlit dashboard framework and chart components
Multi-page Streamlit app with sidebar navigation. Reusable Plotly
chart library with consistent store color scheme. Filter components
for store, category, date range, and search.
2026-02-08 11:37:22 +00:00
e81c43a5b8 Add APScheduler daily scrape job
AsyncIOScheduler configured for Europe/Dublin timezone.
Runs all store scrapers sequentially at 22:00, then triggers
product matching. Includes a misfire grace time and a standalone runner.
2026-02-06 09:15:47 +00:00
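The daily job's control flow runs every store scraper in turn and then starts matching. It can be sketched with plain `asyncio`. The scraper and matcher callables are hypothetical stand-ins. Two further points are assumptions:

- **Failure handling:** a failing store is recorded and the run continues with the next store.
- **Registration:** the `AsyncIOScheduler` call appears only in a comment, using the APScheduler 3.x cron-trigger form, and the grace value shown there is an example.

```python
import asyncio
from collections.abc import Awaitable, Callable

# Registration with APScheduler 3.x, for context only (not executed here):
#   scheduler = AsyncIOScheduler(timezone="Europe/Dublin")
#   scheduler.add_job(run_daily, "cron", hour=22, minute=0,
#                     misfire_grace_time=3600, args=[scrapers, match_products])

Scraper = Callable[[], Awaitable[int]]


async def run_daily(scrapers: dict[str, Scraper],
                    match_products: Callable[[], Awaitable[int]]) -> dict[str, object]:
    """Run each store scraper one after another, then trigger product matching."""
    results: dict[str, object] = {}
    for store, scrape in scrapers.items():
        try:
            results[store] = await scrape()  # sequential: one store at a time
        except Exception as exc:  # assumed: one failing store does not stop the rest
            results[store] = f"failed: {exc}"
    results["matched"] = await match_products()
    return results


if __name__ == "__main__":
    async def fake_scrape() -> int:
        return 120

    async def fake_match() -> int:
        return 42

    print(asyncio.run(run_daily({"Tesco": fake_scrape}, fake_match)))
```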
dcd22e023c Add comparison and basket API endpoints
GET /api/products/{id}/compare for cross-store price comparison.
GET /api/battle for store ranking by cheapest wins.
POST /api/baskets/compare for shopping list cost comparison
across all stores with item-level breakdown.
2026-02-05 16:28:14 +00:00
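The battle ranking and the basket comparison can be sketched over an in-memory price table. The following are assumptions for illustration, not the endpoints' confirmed behaviour:

- **Data shape:** product name maps to store, which maps to price.
- **Tie rule:** every store at the lowest price gets a win.
- **Contested products:** only products stocked by at least two stores count.
- **Missing items:** reported per store rather than priced.

The real endpoints query the database.

```python
from collections import Counter

# Hypothetical shape: product name -> {store name: current price in EUR}
Prices = dict[str, dict[str, float]]


def battle(prices: Prices) -> list[tuple[str, int]]:
    """Rank stores by how many contested products they sell at the lowest price."""
    wins: Counter[str] = Counter()
    for by_store in prices.values():
        if len(by_store) < 2:  # a product in one store is not a contest
            continue
        cheapest = min(by_store.values())
        for store, price in by_store.items():
            if price == cheapest:  # ties: every cheapest store scores a win
                wins[store] += 1
    return wins.most_common()


def compare_basket(prices: Prices, basket: dict[str, int]) -> dict[str, dict]:
    """Total a shopping list (product -> quantity) per store, with line items."""
    stores = {store for by_store in prices.values() for store in by_store}
    result: dict[str, dict] = {}
    for store in sorted(stores):
        items, missing, total = [], [], 0.0
        for product, qty in basket.items():
            price = prices.get(product, {}).get(store)
            if price is None:
                missing.append(product)
                continue
            line = round(price * qty, 2)
            items.append((product, qty, line))
            total += line
        result[store] = {"total": round(total, 2), "items": items, "missing": missing}
    return result
```

Reporting missing items separately keeps a store that lacks half the list from looking artificially cheap.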
0d7855dff3 Add product and price API endpoints
GET /api/products with search, category, store filters and pagination.
GET /api/products/{id} with eager-loaded relations.
GET /api/products/{id}/prices for time-series history.
GET /api/stores, /api/categories, /api/stats for KPIs.
2026-02-03 13:42:56 +00:00
b04e20b2b2 Add FastAPI app setup and Pydantic schemas
FastAPI application with CORS, health check, docs redirect.
Pydantic v2 schemas for all request/response models: products,
prices, comparisons, baskets, and stats.
2026-02-01 10:05:18 +00:00
c4ed3c5263 Add cross-store product matcher
Three-level matching strategy: exact EAN barcode match, fuzzy
name matching via rapidfuzz with unit-size cross-validation,
and batch matching for merging singleton store products.
2026-01-29 18:10:33 +00:00
94680b3575 Add product name normalizer
Text normalization for cross-store matching: unit standardization
(Litre->l, Kilogram->kg), noise word removal, brand extraction
from ~70 known Irish grocery brands, and unit info parsing.
2026-01-26 14:55:41 +00:00
327ba145f5 Add SuperValu scraper with auto-login
Playwright scraper for shop.supervalu.ie. Handles automatic
authentication flow before browsing categories. Requires
SUPERVALU_EMAIL and SUPERVALU_PASSWORD env vars.
2026-01-23 11:22:05 +00:00
89846eca08 Add Dunnes Stores scraper
Full Playwright scraper for dunnesstores.com. Handles JS-heavy
rendering and anti-bot measures with user-agent rotation,
random delays, and cookie acceptance automation.
2026-01-19 17:33:28 +00:00
086c93748d Add Lidl Ireland scraper
HTTP + Playwright scraper for lidl.ie. Handles main catalog
and weekly changing offers. Parses product grid pages for
pricing and availability data.
2026-01-15 20:05:52 +00:00
ff88f095cc Add Aldi Ireland scraper
HTTP-based scraper for aldi.ie with Playwright fallback for
special offers. Covers main product categories and weekly specials.
2026-01-12 15:40:09 +00:00
f57d4512bb Add Tesco Ireland scraper
Playwright-based scraper for tesco.ie. Intercepts XHR API responses
for structured product data. Falls back to DOM extraction when API
interception fails. Covers all grocery categories.
2026-01-08 19:12:44 +00:00
05ac7ab112 Add base scraper framework
Abstract BaseScraper with RawProduct/ScrapeResult dataclasses.
Handles full scrape lifecycle: category discovery, product extraction,
persistence to database, and scrape run logging. Includes
user-agent rotation and random delay utilities.
2026-01-05 13:28:17 +00:00
afbb275104 Add seed script for stores and categories
Seed 5 Irish supermarkets (Tesco, Dunnes, SuperValu, Aldi, Lidl)
and 15 product categories. Idempotent: checks for existing records
before inserting.
2026-01-02 09:50:33 +00:00
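The check-before-insert pattern that makes the seed idempotent can be sketched with the standard library's `sqlite3`. The project itself targets PostgreSQL through SQLAlchemy. The table layout and the two sample category names are assumptions.

```python
import sqlite3

STORES = ["Tesco", "Dunnes", "SuperValu", "Aldi", "Lidl"]
CATEGORIES = ["Dairy", "Bakery"]  # placeholder subset; the seed script defines 15


def seed(conn: sqlite3.Connection) -> int:
    """Insert any missing stores/categories; return how many rows were added."""
    conn.execute("CREATE TABLE IF NOT EXISTS stores (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)")
    conn.execute("CREATE TABLE IF NOT EXISTS categories (id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL)")
    added = 0
    # Table names come from this fixed tuple, never from user input.
    for table, names in (("stores", STORES), ("categories", CATEGORIES)):
        for name in names:
            # Check for an existing record first, so re-running is a no-op.
            exists = conn.execute(f"SELECT 1 FROM {table} WHERE name = ?", (name,)).fetchone()
            if exists is None:
                conn.execute(f"INSERT INTO {table} (name) VALUES (?)", (name,))
                added += 1
    conn.commit()
    return added
```

Running `seed` twice on the same connection adds all rows the first time and none the second.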
9612dcb351 Add Alembic migration setup
Configure Alembic with env.py targeting all ORM models.
Ready for autogenerate migrations.
2025-12-31 11:15:20 +00:00
b2cc932900 Add SQLAlchemy ORM models
Define all tables: stores, categories, products, store_products,
price_records, scrape_runs. Include relationships and composite
index on (store_product_id, scraped_at) for time-series queries.
2025-12-30 16:33:45 +00:00
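SQLite's query planner can show why a composite index on `(store_product_id, scraped_at)` suits time-series lookups. A query that filters by one store product and a date range is answered by an index search rather than a full table scan. `sqlite3` is used here only for illustration, since the project uses PostgreSQL. The index name is hypothetical.

```python
import sqlite3


def history_query_plan() -> str:
    """Return SQLite's plan for a per-product, date-bounded price lookup."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE price_records ("
        " id INTEGER PRIMARY KEY,"
        " store_product_id INTEGER NOT NULL,"
        " price REAL NOT NULL,"
        " scraped_at TEXT NOT NULL)"
    )
    # Equality column first, range column second.
    conn.execute(
        "CREATE INDEX ix_price_records_sp_scraped "
        "ON price_records (store_product_id, scraped_at)"
    )
    rows = conn.execute(
        "EXPLAIN QUERY PLAN "
        "SELECT price, scraped_at FROM price_records "
        "WHERE store_product_id = ? AND scraped_at >= ?",
        (42, "2026-01-01"),
    ).fetchall()
    conn.close()
    # The last column of each plan row is the human-readable description.
    return "\n".join(row[-1] for row in rows)
```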
ad30bbe60b Add core config and database module
Pydantic-settings based configuration with env file support.
Async SQLAlchemy engine setup with asyncpg driver.
2025-12-28 10:45:12 +00:00
058d01cb78 Add project setup: pyproject.toml, .gitignore, .env.example
Set up the Python project with the Hatch build system and all core
dependencies: FastAPI, SQLAlchemy, Playwright, httpx, Streamlit,
Plotly, rapidfuzz, APScheduler.
2025-12-26 14:22:31 +00:00
cac57ee299 Add docker-rootless.sh for Docker rootless setup 2025-12-24 09:36:21 +00:00
066e3144c2 Initial commit 2025-12-24 09:13:21 +00:00