Job-Tracker/README.md
Richard Nixon 5890f2bbf7 docs: rename to Job Tracker and add architecture + lifecycle diagrams
- Renamed the H1 from 'Job Agent' to 'Job Tracker' to match the published
  repo name.
- Added two Mermaid diagrams up front under 'How it works':
  - Architecture diagram: Next.js (RSC + API) ↔ Postgres+pgvector ↔
    Claude CLI ↔ local embedder, with HNSW cosine retrieval and the
    storage/ folder for uploads + generated PDFs.
  - Lifecycle diagram: end-to-end flow from profile setup → application
    creation → auto-indexing → generation (CV / cover letter / interview
    prep / emails) → review with ATS scoring → approve → pipeline tracking
    with unified feed, contacts, and per-company research → outcome →
    feedback loop that reindexes positive outcomes for future RAG
    retrieval.

Both Gitea and GitHub render Mermaid natively in markdown.
2026-05-24 20:59:58 +01:00


Job Tracker

EN — Personal job-application command center: full-pipeline tracking, AI-generated CVs/cover letters/emails, per-company research log, recruiter networking, and an integrated event feed for every application. Postgres + pgvector RAG, Next.js 16.

PT — Centro de comando pessoal para busca de emprego: pipeline completo de candidaturas, geração de CV/cover letter/email com IA, log de pesquisa por empresa, networking de recrutadores e feed unificado de eventos para cada vaga. RAG com Postgres + pgvector, Next.js 16.


How it works

Architecture

flowchart LR
    User([👤 User])

    subgraph App["Next.js 16 (App Router + RSC)"]
        UI[Pages /applications /contacts /profile<br/>dashboard, raio-x, settings]
        API[API routes<br/>async route handlers]
    end

    subgraph Storage["Persistence"]
        PG[(Postgres 16<br/>+ pgvector)]
        FS[storage/uploads<br/>storage/generated]
    end

    subgraph AI["AI subsystems"]
        Claude[Claude CLI<br/>subprocess]
        Embedder[Local embedder<br/>all-MiniLM-L6-v2<br/>384 dims]
    end

    User -->|HTTPS| UI
    UI -->|fetch| API
    API <-->|Drizzle ORM<br/>postgres-js| PG
    API -->|stdin/stdout| Claude
    API -->|index on write| Embedder
    Embedder -->|vector embeddings| PG
    API -->|PDFs, uploaded docs| FS
    PG -.->|HNSW cosine `<=>`<br/>top-K chunks| API

Lifecycle of one job application

flowchart TD
    P([Set up profile<br/>experience, skills, documents])
    A[Add application<br/>+ paste job description]
    I[Auto-index JD chunks<br/>into pgvector]
    G{Generate document?}
    CV[CV<br/>profile + RAG + ATS keywords → Claude]
    CL[Cover letter<br/>profile + RAG → Claude]
    PR[Interview prep<br/>STAR answers]
    EM[Email<br/>follow-up / thank-you / withdraw]
    R[Review + ATS score]
    F{Refine?}
    AP[Approve]
    K[Index approved CV/CL<br/>for future RAG retrieval]
    T[Track pipeline status]
    FD[Unified feed<br/>notes · stages · contact interactions]
    CO[Link contacts<br/>recruiter, interviewer, hiring manager]
    RES[Per-company research<br/>news · culture · glassdoor · tech stack]
    OUT([Outcome:<br/>interview · offer · rejected])
    LOOP[Feedback loop<br/>positive outcomes reindex<br/>improving future generations]

    P --> A
    A --> I
    I --> G
    G --> CV
    G --> CL
    G --> PR
    G --> EM
    CV --> R
    CL --> R
    R --> F
    F -->|yes, with feedback| G
    F -->|no| AP
    AP --> K
    AP --> T
    T --> FD
    T --> CO
    T --> RES
    T --> OUT
    OUT --> LOOP
    LOOP --> I


Features

Application pipeline

  • Status machine: draft → applied → screening → interview → offer → accepted (plus rejected, withdrawn, ghosted)
  • Per-application "raio-x" (Portuguese for "x-ray") view that bundles everything on one page: the job, the timeline, linked contacts, company research, and other applications at the same employer
  • Auto-reminders fired on key transitions (follow-up 7 days after applied, prep on interview, decide on offer)
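
The status machine and the 7-day follow-up reminder can be pictured with a small transition table. This is a minimal sketch: the status names come from the list above, but the specific allowed transitions and the names `ALLOWED_TRANSITIONS`, `canTransition`, and `followUpDate` are assumptions for illustration. The real rules live in src/lib/status-machine.ts.

```typescript
// Hypothetical sketch: statuses are from the README, the transition table is assumed.
type AppStatus =
  | "draft" | "applied" | "screening" | "interview" | "offer"
  | "accepted" | "rejected" | "withdrawn" | "ghosted";

const ALLOWED_TRANSITIONS: Record<AppStatus, readonly AppStatus[]> = {
  draft: ["applied", "withdrawn"],
  applied: ["screening", "interview", "rejected", "withdrawn", "ghosted"],
  screening: ["interview", "rejected", "withdrawn", "ghosted"],
  interview: ["offer", "rejected", "withdrawn", "ghosted"],
  offer: ["accepted", "rejected", "withdrawn"],
  accepted: [],
  rejected: [],
  withdrawn: [],
  ghosted: ["screening", "interview", "rejected"], // assumed: a ghosted role can revive
};

// True when moving from one status to another is in the table.
function canTransition(from: AppStatus, to: AppStatus): boolean {
  return ALLOWED_TRANSITIONS[from].indexOf(to) !== -1;
}

// Due date for the follow-up reminder: 7 days after the move to "applied".
function followUpDate(appliedAt: Date): Date {
  const due = new Date(appliedAt.getTime());
  due.setUTCDate(due.getUTCDate() + 7);
  return due;
}
```

Keeping the rules in one table means the UI buttons and the PATCH handler can share a single check instead of duplicating conditionals.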

Unified event feed

Every application has a single timeline mixing four event types — filter or scroll, no separate logs:

  • Status changes — automatic when you move a card through the pipeline
  • Notes — free-form observations ("recruiter said decision next week")
  • Interview stages — structured (title, scheduled date, outcome: pending/passed/rejected)
  • Contact interactions — log a touchpoint with a specific recruiter/interviewer
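
One way to model a single timeline that mixes four event types is a discriminated union. The sketch below is illustrative only: the four kinds match the list above, while the field names (`at`, `text`, `scheduledFor`, `summary`) and the `filterFeed` helper are assumptions, not the actual application_event columns.

```typescript
// Hypothetical shapes: event kinds are from the README, field names are assumed.
type FeedEvent =
  | { kind: "status_change"; at: string; from: string; to: string }
  | { kind: "note"; at: string; text: string }
  | {
      kind: "stage";
      at: string;
      title: string;
      scheduledFor: string | null;
      outcome: "pending" | "passed" | "rejected";
    }
  | { kind: "contact_interaction"; at: string; contactId: number; summary: string };

// Returns the events of the requested kinds (all kinds when the list is empty),
// newest first. ISO-8601 UTC timestamps sort correctly as plain strings.
function filterFeed(events: FeedEvent[], kinds: FeedEvent["kind"][]): FeedEvent[] {
  return events
    .filter((e) => kinds.length === 0 || kinds.indexOf(e.kind) !== -1)
    .sort((a, b) => (a.at < b.at ? 1 : a.at > b.at ? -1 : 0));
}
```

Because `filter` returns a new array, sorting it leaves the caller's list untouched.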

AI generation (via Claude CLI)

  • CV tailored to the job description, ATS-aware (keyword extraction + scoring)
  • Cover letter grounded in your real experience (no hallucinated tenure)
  • Refinement loop — feed back specific changes ("emphasize Python", "add quantified achievements") to regenerate
  • Interview prep — STAR-format answers, likely questions, "questions to ask the interviewer"
  • Emails — follow-up, thank-you, withdrawal templates
  • Configurable system prompts (per generation type) editable at /settings
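
Calling the Claude CLI as a subprocess could look roughly like the sketch below. It assumes the CLI's print mode (`claude -p`) with the prompt piped on stdin; the helper names `runCli` and `generateWithClaude` are hypothetical, and the real wrapper in src/lib/claude/client.ts may pass different arguments.

```typescript
import { spawnSync } from "node:child_process";

// Runs a command, feeds stdinText to it, and returns its stdout.
// Throws if the process cannot start or exits with a non-zero code.
function runCli(command: string, args: string[], stdinText: string): string {
  const result = spawnSync(command, args, { input: stdinText, encoding: "utf8" });
  if (result.error) throw result.error;
  if (result.status !== 0) {
    throw new Error(command + " exited with " + result.status + ": " + result.stderr);
  }
  return result.stdout;
}

// Assumed invocation: print mode, prompt on stdin, generated markdown on stdout.
function generateWithClaude(prompt: string): string {
  return runCli("claude", ["-p"], prompt);
}
```

`spawnSync` keeps the sketch short but blocks the event loop while the model responds; a route handler would more likely use the asynchronous `spawn` and stream stdout.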

Contacts & networking

  • Contacts tracked separately from applications, linked via a many-to-many junction (contact_application) — the same recruiter can appear across every role they put you in front of
  • Per-junction role (recruiter, interviewer, hiring_manager, referrer) so a person can have different functions in different processes
  • Free-text or autocompleted company field on the contact form (auto-creates the company if it's new)
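
The per-junction role is what lets one person act as a recruiter on one application and a referrer on another. The following in-memory sketch illustrates that idea; the interface and the `rolesForContact` helper are hypothetical stand-ins for the contact_application table and its queries.

```typescript
type ContactRole = "recruiter" | "interviewer" | "hiring_manager" | "referrer";

// One row of the many-to-many junction (field names assumed).
interface ContactApplicationLink {
  contactId: number;
  applicationId: number;
  role: ContactRole;
}

// Returns applicationId -> role for a single contact.
function rolesForContact(
  links: ContactApplicationLink[],
  contactId: number
): Record<number, ContactRole> {
  const roles: Record<number, ContactRole> = {};
  for (const link of links) {
    if (link.contactId === contactId) roles[link.applicationId] = link.role;
  }
  return roles;
}
```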

Per-company research

  • Typed entries (news, culture, tech_stack, glassdoor, interview_experience, compensation, general) with source URL and markdown content
  • Visible on every application page for that company → context every time you reopen the role
  • Indexable by the RAG system (sourceType='company_research') so future generations can use the research

RAG (Retrieval-Augmented Generation)

  • Local embeddings (Xenova/all-MiniLM-L6-v2, 384 dims, ~23 MB model, no API key)
  • Stored in Postgres as vector(384) with an HNSW cosine index
  • Sources indexed: uploaded documents, experience entries, application JDs, generated CVs/cover letters with positive feedback, company research
  • Native pgvector <=> similarity search — milliseconds even with tens of thousands of chunks

Dashboard

  • Stat cards: total / active / interview rate / offer rate / avg response days
  • Application funnel (horizontal bars with conversion percentages, semaphore-colored)
  • Action items: stale applications, draft list, interviews to prep, pending feedback, due reminders
  • Weekly velocity with a configurable target (saved to localStorage)
  • Response rate by job-board domain (LinkedIn vs Indeed vs direct, etc.)
  • Time-in-stage for active applications
  • CV performance: which generated CVs led to interviews/offers
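
The funnel's conversion percentages can be derived from per-stage counts. The sketch below assumes conversion is measured against the previous stage and rounded to a whole percent; how the dashboard actually defines it is not stated here, and `buildFunnel` is a hypothetical name.

```typescript
interface FunnelRow {
  stage: string;
  count: number;
  conversionFromPrevious: number | null; // percent, null for the first stage
}

// Assumed definition: each stage's count divided by the previous stage's count.
function buildFunnel(counts: { stage: string; count: number }[]): FunnelRow[] {
  return counts.map((row, i) => {
    const prev = i === 0 ? null : counts[i - 1].count;
    const conversion =
      prev === null ? null : prev === 0 ? 0 : Math.round((row.count / prev) * 100);
    return { stage: row.stage, count: row.count, conversionFromPrevious: conversion };
  });
}
```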

Knowledge base

  • All feedback entries (CV outcomes) aggregated; quality ratings + notes used to inform future generations via RAG
  • Positive outcomes (interview / offer) automatically reindexed into the RAG store

Tech stack

  • Framework: Next.js 16 (App Router, Turbopack), React 19
  • Database: Postgres 16 + pgvector (Docker compose ships a local instance)
  • ORM: Drizzle (postgres-js driver, async)
  • AI: Claude CLI subprocess (no SDK, no API key in the app itself)
  • Embeddings: @huggingface/transformers running all-MiniLM-L6-v2 locally
  • PDF parsing: pdf-parse
  • PDF rendering: @react-pdf/renderer
  • UI: Tailwind v4, Lucide icons, react-markdown

Quick start

Prerequisites

  • Node.js 20+
  • Docker (for Postgres) or your own Postgres 16+ with pgvector
  • Claude CLI installed and authenticated (claude on PATH)

Setup

# 1. Install deps
npm install

# 2. Start Postgres 16 + pgvector (port 5433, defaults from docker-compose.yml)
docker compose up -d

# 3. Apply schema
npm run db:migrate

# 4. Configure env (.env.local is git-ignored)
cat > .env.local <<EOF
DATABASE_URL=postgres://jobagent:jobagent@127.0.0.1:5433/job_agent
STORAGE_DIR=./storage
EOF

# 5. Run the dev server
npm run dev

Open http://localhost:3000.

First-time setup: go to /profile and fill in your basics (name, summary, experience, education, skills). Anything you skip here weakens every AI-generated document, because the prompts inject the profile as ground truth.


Usage walkthrough

1. Create your profile (one-time)

/profile — fill in personal info, summary, work experience (one entry per role), education, skills, certifications, languages. Upload any reference documents (existing CVs, project descriptions, performance reviews). Documents are automatically chunked and indexed into the RAG store on upload.

2. Add an application

/applications/new — paste the job description and basic company info. The app:

  • Creates the company (or links to an existing one)
  • Adds the application in draft status
  • Indexes the JD into the RAG store
  • Extracts ATS-relevant keywords from the JD on demand

3. Generate documents

On any application, click Generate CV (/applications/[id]/generate):

  • Pick CV or Cover Letter
  • The system builds a context from your profile + RAG-retrieved chunks (most relevant experiences, document snippets, past successful CVs) + the target job description + top ATS keywords
  • Claude returns a markdown document
  • Review the ATS score and missing keywords — click Regenerate with missing keywords to retry, or write specific refinement feedback ("more quantified achievements")
  • Approve when satisfied — that marks the CV as the canonical version for the application and auto-creates a feedback row
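
The ATS score and missing-keyword list in the review step can be thought of as a keyword coverage check. This is a deliberately naive sketch, assuming the score is the percentage of JD keywords found in the CV text; the actual extraction and scoring in src/lib/ats may weigh keywords differently, and `scoreCv` is a hypothetical name.

```typescript
interface AtsResult {
  score: number; // 0-100, share of keywords found in the CV
  matched: string[];
  missing: string[];
}

// Case-insensitive substring match of each keyword against the CV text.
function scoreCv(cvText: string, keywords: string[]): AtsResult {
  const haystack = cvText.toLowerCase();
  const matched: string[] = [];
  const missing: string[] = [];
  for (const kw of keywords) {
    const found = haystack.indexOf(kw.toLowerCase()) !== -1;
    (found ? matched : missing).push(kw);
  }
  const score =
    keywords.length === 0 ? 0 : Math.round((matched.length / keywords.length) * 100);
  return { score, matched, missing };
}
```

Substring matching is crude (for instance, "java" also matches "javascript"), so a production scorer would tokenize first. The `missing` list is the part that feeds a regenerate-with-missing-keywords retry.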

4. Manage the pipeline on the application page

/applications/[id] — the raio-x view:

  • Move the status forward with the buttons under the status pill
  • Add events to the timeline: notes, interview stages (with scheduled date + outcome), contact interactions (linked to a specific contact)
  • Link contacts to the application (Contacts panel → pick from existing contacts)
  • Add per-company research (Research panel → news / culture / tech_stack / etc.)
  • See other applications at the same company (Other applications panel)
  • Generate follow-up / thank-you / withdrawal emails from the Email Templates section

5. Track contacts

/contacts — list view grouped by company. Edit any contact inline (pencil icon on hover). The company field autocompletes from existing companies; type a new name and it's auto-created.

6. Configure AI prompts (optional)

/settings — edit the system prompts for each generation type (CV, cover letter, interview prep, follow-up email, etc.). Each has a "Reset to default" button. Useful when you want to bias toward a specific style or industry.

7. Check the dashboard

/ — overview of pipeline health. The Action Items panel is the highest-signal thing: stale follow-ups, drafts you forgot to send, due reminders.


Project structure

src/
├── app/
│   ├── api/                                 # Route handlers
│   │   ├── applications/
│   │   │   ├── route.ts                     # GET (list) / POST (create)
│   │   │   └── [id]/
│   │   │       ├── route.ts                 # GET/PUT/PATCH/DELETE
│   │   │       ├── feedback/route.ts        # CV/CL outcome tracking
│   │   │       ├── generate-cv/route.ts     # AI: tailored CV
│   │   │       ├── generate-cl/route.ts     # AI: cover letter
│   │   │       ├── generate-email/route.ts  # AI: follow-up / thank-you / withdraw
│   │   │       ├── interview-prep/route.ts  # AI: STAR prep
│   │   │       ├── keywords/route.ts        # ATS keyword extraction + scoring
│   │   │       ├── events/                  # Unified event feed
│   │   │       └── contacts/route.ts        # Link / unlink contacts
│   │   ├── companies/[id]/research/         # Per-company research CRUD
│   │   ├── companies/route.ts               # List companies (for autocomplete)
│   │   ├── contacts/route.ts                # Contact CRUD
│   │   ├── dashboard/stats/route.ts         # All dashboard aggregations
│   │   ├── documents/route.ts               # Upload + delete + autoindex
│   │   ├── education/, experience/, skills/ # Profile sub-resources
│   │   ├── generated/[id]/route.ts          # Approve / edit a generated doc
│   │   ├── knowledge/route.ts               # Aggregated feedback view
│   │   ├── profile/route.ts                 # Profile upsert
│   │   ├── rag/reindex/route.ts             # Trigger / inspect RAG index
│   │   ├── reminders/route.ts               # Reminder CRUD
│   │   └── settings/prompts/route.ts        # Configurable AI system prompts
│   ├── applications/                        # Pipeline UI
│   ├── contacts/                            # Networking UI
│   ├── knowledge/                           # Feedback / knowledge browser
│   ├── profile/                             # Profile editor
│   ├── settings/                            # Prompt editor
│   └── page.tsx                             # Dashboard
├── components/
│   ├── applications/application-form.tsx    # Shared form for new + edit flows
│   └── layout/                              # PageShell, Sidebar
├── db/
│   ├── schema.ts                            # Drizzle pg-core schema
│   ├── index.ts                             # postgres-js client (lazy proxy)
│   └── queries/                             # Async query modules per domain
└── lib/
    ├── ats/                                 # Keyword extraction, matching, ATS scoring
    ├── claude/                              # Prompt builders + CLI wrapper
    │   ├── client.ts                        # Claude CLI subprocess
    │   ├── context-builder.ts               # Profile + RAG → GenerationContext
    │   ├── prompts.ts                       # CV / refinement / cover letter prompts
    │   ├── email-prompts.ts                 # Email prompt builders
    │   └── interview-prompts.ts             # Interview prep prompt
    ├── rag/
    │   ├── embedder.ts                      # all-MiniLM-L6-v2 singleton
    │   ├── chunker.ts                       # Text chunking strategies
    │   ├── indexer.ts                       # Per-source-type indexing
    │   └── retriever.ts                     # pgvector cosine search
    ├── validators/                          # Zod schemas
    └── status-machine.ts                    # Allowed status transitions
scripts/
├── dump-sqlite.mjs                          # Legacy: dump old SQLite into JSON
└── restore-postgres.mjs                     # Restore JSON dump into Postgres
drizzle/migrations/                          # Generated SQL migrations
docker-compose.yml                           # Postgres 16 + pgvector

Database

Schema lives in src/db/schema.ts. Drizzle generates migrations from it.

npm run db:generate    # Diff schema → new SQL migration
npm run db:migrate     # Apply pending migrations
npm run db:studio      # Open Drizzle Studio (web DB browser)

Key tables

| Table | Purpose |
| --- | --- |
| profile | Single-user personal info (always id = 1) |
| experience, education, skill, certification, language, document | Profile sub-resources |
| company | Companies (auto-created when adding apps or contacts) |
| company_research | Per-company research entries (news, culture, glassdoor, …) |
| application | Job applications with status |
| application_event | Unified event feed (event_type: status_change / stage / note / contact_interaction) |
| contact | Recruiters / interviewers / referrers |
| contact_application | M:N junction; per-junction role |
| generated_cv, generated_cover_letter | AI-generated documents (markdown + metadata) |
| generation_feedback | Outcome tracking (interview / offer / rejected / no_response) + quality rating |
| embedding_chunk | RAG index: vector(384) with HNSW cosine index |
| reminder | Auto + manual reminders |
| prompt_config | Editable system prompts per generation type |

Migrating from the legacy SQLite version

If you're upgrading from the pre-Postgres version of this repo:

# 1. Dump SQLite → JSON (reads ./storage/db/job-agent.db)
node scripts/dump-sqlite.mjs

# 2. Start Postgres + apply fresh schema
docker compose up -d
npm run db:migrate

# 3. Restore the dump (preserves IDs, converts bool/json/vector columns)
node --env-file=.env.local scripts/restore-postgres.mjs

RAG system

The retriever uses pgvector's <=> (cosine distance) operator against the HNSW index on embedding_chunk.embedding for native, millisecond-latency similarity search — no JS-side scoring loop.

How it works

  1. Indexing: text from documents / experiences / JDs / approved generations / research is split into chunks and embedded via all-MiniLM-L6-v2 (runs locally, 384 dims)
  2. Storage: vectors are stored directly in Postgres as vector(384); metadata as jsonb
  3. Retrieval: the query text is embedded, then ORDER BY embedding <=> :query::vector LIMIT k returns the top-K chunks, served by the HNSW index
  4. Score filtering: <=> returns cosine distance, so cosine similarity is 1 - distance; results below minScore (default 0.25 cosine similarity) are dropped
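
To make the retrieval and filtering steps concrete, here is an in-memory sketch of the same arithmetic. In the app this work is done inside Postgres by pgvector, so the code below is only an illustration; `cosineSimilarity` and `topK` are hypothetical names, and the tiny 2-dimensional vectors in the usage note stand in for the 384-dimensional embeddings.

```typescript
interface ScoredChunk {
  id: number;
  score: number; // cosine similarity, i.e. 1 - cosine distance
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // zero vectors have no direction
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Scores every chunk, drops those below minScore, and keeps the k best.
function topK(
  query: number[],
  chunks: { id: number; embedding: number[] }[],
  k: number,
  minScore = 0.25
): ScoredChunk[] {
  return chunks
    .map((c) => ({ id: c.id, score: cosineSimilarity(query, c.embedding) }))
    .filter((c) => c.score >= minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

For example, with query `[1, 0]` and chunks `[1, 0]`, `[0, 1]`, and `[1, 1]`, the orthogonal chunk scores 0 and is filtered out by `minScore`, while the other two are returned in descending order of similarity. This brute-force loop is exactly what the HNSW index avoids at scale.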

What gets indexed and when

| Source type | Chunking | Indexed when |
| --- | --- | --- |
| document | Paragraphs ~500 chars, light overlap | On upload (fire-and-forget) |
| experience | One chunk per entry | On create / update |
| application | Paragraphs ~500 chars | On application create / JD update |
| generated_cv | By markdown ## section | On feedback with interview or offer outcome |
| generated_cl | By markdown ## section | Same as above |
| company_research | Single chunk (per entry) | On research entry create |
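
The two chunking strategies named above can be sketched as follows. This is an approximation under stated assumptions: the paragraph packer is greedy and omits the "light overlap" used for documents, and both function names are hypothetical rather than the actual exports of src/lib/rag/chunker.ts.

```typescript
// Greedy paragraph packer: keeps paragraphs whole and starts a new chunk
// when adding the next paragraph would exceed maxChars.
function chunkByParagraphs(text: string, maxChars = 500): string[] {
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  const chunks: string[] = [];
  let current = "";
  for (const p of paragraphs) {
    if (current.length > 0 && current.length + p.length + 2 > maxChars) {
      chunks.push(current);
      current = p;
    } else {
      current = current.length > 0 ? current + "\n\n" + p : p;
    }
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Splits markdown at "## " headings. Text before the first heading becomes
// its own chunk, and "###" subsections stay inside their parent section.
function chunkByH2(markdown: string): string[] {
  return markdown
    .split(/^(?=## )/m)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}
```

A single paragraph longer than `maxChars` is kept whole in this sketch, which is why the target is described as approximate.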

Management

  • Auto: every relevant write fires an indexing job (non-blocking)
  • Manual rebuild: POST /api/rag/reindex or the Rebuild Index button on /knowledge
  • Stats: GET /api/rag/reindex returns counts by source type

API reference

| Method | Endpoint | Notes |
| --- | --- | --- |
| GET / POST | /api/applications | List with optional ?status= filter; create |
| GET / PUT / PATCH / DELETE | /api/applications/[id] | Full app + bundled contacts, research, related apps. PATCH for status change |
| POST | /api/applications/[id]/generate-cv | Body { previousContent?, refinementFeedback? } for refinement loop |
| POST | /api/applications/[id]/generate-cl | Cover letter |
| POST | /api/applications/[id]/generate-email | Body { emailType, notes? }; emailType is follow_up / thank_you / withdraw |
| POST | /api/applications/[id]/interview-prep | Saves the result to application.interview_prep |
| GET / POST | /api/applications/[id]/keywords | GET = extract from JD; POST { cvText } = score CV against JD |
| GET / POST / PATCH | /api/applications/[id]/feedback | Outcome + quality tracking |
| POST / DELETE | /api/applications/[id]/events | Add event (note / stage / contact_interaction); DELETE via events/[eventId] |
| POST / DELETE | /api/applications/[id]/contacts | POST links a contact (body { contactId, role? }); DELETE ?contactId= unlinks |
| GET | /api/companies | List for autocomplete |
| GET / POST | /api/companies/[id]/research | Research entries (POST body: { type, title, content, sourceUrl? }) |
| PATCH / DELETE | /api/companies/[id]/research/[researchId] | Edit / delete |
| GET / POST / PUT / DELETE | /api/contacts | CRUD; companyName field is resolved to an existing or new company via findOrCreate |
| GET / POST / PUT / DELETE | /api/profile /api/experience /api/education /api/skills /api/documents | Profile sub-resources |
| GET / PATCH | /api/generated/[id] | View / approve / edit content of a generated doc |
| GET / POST / PATCH | /api/reminders | List due + upcoming; create; complete/delete via PATCH |
| GET / PUT | /api/settings/prompts | Configurable AI system prompts |
| GET / PATCH | /api/knowledge | Aggregated feedback view |
| GET / POST | /api/rag/reindex | Stats / full rebuild |
| GET | /api/dashboard/stats | All dashboard aggregations in one call |
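
As an example of calling one of these endpoints, the sketch below builds the request for the CV refinement loop. The endpoint path and body fields come from the table above; the helper name `buildGenerateCvRequest` and the JSON content type are assumptions, and the response shape is not documented here, so it is left out.

```typescript
interface GenerateCvBody {
  previousContent?: string;
  refinementFeedback?: string;
}

// Builds the URL and fetch options without performing the request.
function buildGenerateCvRequest(
  baseUrl: string,
  applicationId: number,
  body: GenerateCvBody = {}
) {
  return {
    url: baseUrl + "/api/applications/" + applicationId + "/generate-cv",
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}
```

A first generation would send an empty body; a refinement pass would pass the earlier markdown as `previousContent` plus feedback such as "emphasize Python". The result can be handed to `fetch(url, init)` against http://localhost:3000 during development.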

Configurable AI prompts

Every generation type has its system prompt stored in prompt_config and editable at /settings. Defaults live in src/db/queries/prompt-configs.ts and are seeded on first read.

| Key | What it generates |
| --- | --- |
| cv_generation | Tailored CV |
| cv_refinement | CV refinement (when you give feedback to regenerate) |
| cover_letter | Cover letter |
| interview_prep | STAR-format interview prep |
| email_follow_up | Follow-up after applying |
| email_thank_you | Thank-you after interview |
| email_withdraw | Withdrawal email |

Each row stores both system_prompt (current) and default_prompt (factory default), so "Reset to default" is always available.
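
The seed-on-first-read and reset behaviour can be illustrated with an in-memory stand-in for the prompt_config table. Everything in this sketch is hypothetical (the `PromptStore` type, `getPromptConfig`, `resetToDefault`, and the camelCase field names); the real defaults and queries live in src/db/queries/prompt-configs.ts.

```typescript
interface PromptConfig {
  key: string;
  systemPrompt: string; // current, user-editable
  defaultPrompt: string; // factory default, kept for reset
}

type PromptStore = { [key: string]: PromptConfig | undefined };

// Returns the stored row, seeding it from the factory defaults on first read.
function getPromptConfig(
  store: PromptStore,
  key: string,
  factoryDefaults: { [key: string]: string | undefined }
): PromptConfig {
  const existing = store[key];
  if (existing) return existing;
  const factoryDefault = factoryDefaults[key];
  if (factoryDefault === undefined) throw new Error("unknown prompt key: " + key);
  const seeded: PromptConfig = {
    key,
    systemPrompt: factoryDefault,
    defaultPrompt: factoryDefault,
  };
  store[key] = seeded;
  return seeded;
}

// "Reset to default": copy the factory default back over the current prompt.
function resetToDefault(row: PromptConfig): PromptConfig {
  return { key: row.key, systemPrompt: row.defaultPrompt, defaultPrompt: row.defaultPrompt };
}
```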


Data export / migration

# Dump current Postgres to a SQL file (useful for backups / moving to another machine)
docker exec job-agent-postgres pg_dump -U jobagent job_agent > storage/db/backup-$(date +%Y%m%d).sql

# Restore on a fresh container
docker exec -i job-agent-postgres psql -U jobagent -d job_agent < storage/db/backup-YYYYMMDD.sql

The two scripts/*.mjs helpers handle the SQLite → Postgres migration path; once you're on Postgres, prefer pg_dump / psql for snapshots.


Development notes

  • Every DB query is async — there's no sync API. If you add a new query, declare the function async and await all Drizzle calls.
  • The db export in src/db/index.ts is a lazy proxy; the postgres-js client is only created on first use, so importing db is free at build time.
  • The Claude CLI is invoked as a subprocess. Make sure claude --version works in the same shell as npm run dev.
  • The embedding model downloads on first use (~23 MB) into ~/.cache/huggingface. Subsequent runs load it from that cache without downloading again.
  • All form inputs use light-only color schemes (color-scheme: light set on :root in globals.css) because the UI wasn't designed for dark mode. Per-input bg-white is set explicitly to prevent UA dark form styling from leaking through.
  • Type-check before committing: npx tsc --noEmit.
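
The lazy-proxy idea behind the db export can be sketched with a generic helper. This is not the code in src/db/index.ts; `lazy` is a hypothetical name, and the sketch only shows the general technique of deferring construction until the first property access.

```typescript
// Wraps a factory so the real object is created on first property access.
function lazy<T extends object>(factory: () => T): T {
  let instance: T | undefined;
  return new Proxy({} as T, {
    get(_target, prop) {
      if (instance === undefined) instance = factory();
      const value = Reflect.get(instance, prop);
      // Bind methods so `this` points at the real instance, not the proxy.
      return typeof value === "function" ? value.bind(instance) : value;
    },
  });
}
```

Importing a module that exports `lazy(() => createClient())` runs no connection code; the factory fires once, on the first call such as `db.select(...)`, and the same instance is reused afterwards. Here `createClient` is a placeholder for whatever builds the postgres-js client.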

License

Personal project — use freely, no warranty.