Richard Nixon 5890f2bbf7 docs: rename to Job Tracker and add architecture + lifecycle diagrams

- Renamed the H1 from 'Job Agent' to 'Job Tracker' to match the published
  repo name.
- Added two Mermaid diagrams up front under 'How it works':
  - Architecture diagram: Next.js (RSC + API) ↔ Postgres+pgvector ↔
    Claude CLI ↔ local embedder, with HNSW cosine retrieval and the
    storage/ folder for uploads + generated PDFs.
  - Lifecycle diagram: end-to-end flow from profile setup → application
    creation → auto-indexing → generation (CV / cover letter / interview
    prep / emails) → review with ATS scoring → approve → pipeline tracking
    with unified feed, contacts, and per-company research → outcome →
    feedback loop that reindexes positive outcomes for future RAG
    retrieval.

Both Gitea and GitHub render Mermaid natively in markdown.

2026-05-24 20:59:58 +01:00

22 KiB


Job Tracker

Personal job-application command center: full-pipeline tracking, AI-generated CVs/cover letters/emails, a per-company research log, recruiter networking, and a unified event feed for every application. Postgres + pgvector RAG, Next.js 16.

How it works

Architecture

```mermaid
flowchart LR
    User([👤 User])

    subgraph App["Next.js 16 (App Router + RSC)"]
        UI[Pages /applications /contacts /profile<br/>dashboard, raio-x, settings]
        API[API routes<br/>async route handlers]
    end

    subgraph Storage["Persistence"]
        PG[(Postgres 16<br/>+ pgvector)]
        FS[storage/uploads<br/>storage/generated]
    end

    subgraph AI["AI subsystems"]
        Claude[Claude CLI<br/>subprocess]
        Embedder[Local embedder<br/>all-MiniLM-L6-v2<br/>384 dims]
    end

    User -->|HTTPS| UI
    UI -->|fetch| API
    API <-->|Drizzle ORM<br/>postgres-js| PG
    API -->|stdin/stdout| Claude
    API -->|index on write| Embedder
    Embedder -->|vector embeddings| PG
    API -->|PDFs, uploaded docs| FS
    PG -.->|HNSW cosine `<=>`<br/>top-K chunks| API
```

Lifecycle of one job application

```mermaid
flowchart TD
    P([Set up profile<br/>experience, skills, documents])
    A[Add application<br/>+ paste job description]
    I[Auto-index JD chunks<br/>into pgvector]
    G{Generate document?}
    CV[CV<br/>profile + RAG + ATS keywords → Claude]
    CL[Cover letter<br/>profile + RAG → Claude]
    PR[Interview prep<br/>STAR answers]
    EM[Email<br/>follow-up / thank-you / withdraw]
    R[Review + ATS score]
    F{Refine?}
    AP[Approve]
    K[Index approved CV/CL<br/>for future RAG retrieval]
    T[Track pipeline status]
    FD[Unified feed<br/>notes · stages · contact interactions]
    CO[Link contacts<br/>recruiter, interviewer, hiring manager]
    RES[Per-company research<br/>news · culture · glassdoor · tech stack]
    OUT([Outcome:<br/>interview · offer · rejected])
    LOOP[Feedback loop<br/>positive outcomes reindex<br/>improving future generations]

    P --> A
    A --> I
    I --> G
    G --> CV
    G --> CL
    G --> PR
    G --> EM
    CV --> R
    CL --> R
    R --> F
    F -->|yes, with feedback| G
    F -->|no| AP
    AP --> K
    AP --> T
    T --> FD
    T --> CO
    T --> RES
    T --> OUT
    OUT --> LOOP
    LOOP --> I
```

Table of contents

Features
Tech stack
Quick start
Usage walkthrough
Project structure
Database
RAG system
API reference
Configurable AI prompts
Data export / migration

Features

Application pipeline

Status machine: draft → applied → screening → interview → offer → accepted (plus rejected, withdrawn, ghosted)
Per-application "raio-x" (X-ray) view that bundles everything on one page: the job, the timeline, linked contacts, company research, and other applications at the same employer
Auto-reminders created on key transitions: a follow-up 7 days after moving to applied, interview prep on moving to interview, and a decision reminder when an offer arrives
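The status machine and reminder rules above can be sketched as a transition table plus a function that emits reminders when an application enters a status. This is an illustration, not the contents of `src/lib/status-machine.ts`: the forward path comes from this README, while which terminal states are reachable from where, and the due dates of the prep/decide reminders, are assumptions.

```typescript
// Hypothetical transition table. Forward path from the README; the
// rejected/withdrawn/ghosted edges are assumptions.
type Status =
  | "draft" | "applied" | "screening" | "interview" | "offer" | "accepted"
  | "rejected" | "withdrawn" | "ghosted";

const TRANSITIONS: Record<Status, readonly Status[]> = {
  draft: ["applied", "withdrawn"],
  applied: ["screening", "interview", "rejected", "withdrawn", "ghosted"],
  screening: ["interview", "rejected", "withdrawn", "ghosted"],
  interview: ["offer", "rejected", "withdrawn", "ghosted"],
  offer: ["accepted", "rejected", "withdrawn"],
  accepted: [],
  rejected: [],
  withdrawn: [],
  ghosted: ["screening", "interview", "rejected"], // a late reply revives it (assumption)
};

export function canTransition(from: Status, to: Status): boolean {
  return TRANSITIONS[from].includes(to);
}

interface Reminder { kind: "follow_up" | "prep" | "decide"; dueAt: Date }

const DAY_MS = 24 * 60 * 60 * 1000;

// Reminders created when an application enters `to`.
// "Due immediately" for prep/decide is an assumption.
export function remindersOnEnter(to: Status, now: Date): Reminder[] {
  switch (to) {
    case "applied":
      return [{ kind: "follow_up", dueAt: new Date(now.getTime() + 7 * DAY_MS) }];
    case "interview":
      return [{ kind: "prep", dueAt: now }];
    case "offer":
      return [{ kind: "decide", dueAt: now }];
    default:
      return [];
  }
}
```

A PATCH handler would call `canTransition` before writing the new status, then insert whatever `remindersOnEnter` returns.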

Unified event feed

Every application has a single timeline mixing four event types — filter or scroll, no separate logs:

Status changes — automatic when you move a card through the pipeline
Notes — free-form observations ("recruiter said decision next week")
Interview stages — structured (title, scheduled date, outcome: pending/passed/rejected)
Contact interactions — log a touchpoint with a specific recruiter/interviewer
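One way to model the four feed event types is a discriminated union keyed on `type`: a single timeline can hold all of them, and the compiler checks that every type is rendered. The field names below are illustrative assumptions; the real columns live in `application_event`.

```typescript
// Hypothetical shapes for the four event types in the unified feed.
type FeedEvent =
  | { type: "status_change"; at: Date; from: string; to: string }
  | { type: "note"; at: Date; text: string }
  | { type: "stage"; at: Date; title: string; scheduledAt?: Date; outcome: "pending" | "passed" | "rejected" }
  | { type: "contact_interaction"; at: Date; contactId: number; summary: string };

// Newest-first timeline, optionally filtered to a subset of event types.
function timeline(events: FeedEvent[], only?: ReadonlyArray<FeedEvent["type"]>): FeedEvent[] {
  return events
    .filter((e) => !only || only.includes(e.type))
    .sort((a, b) => b.at.getTime() - a.at.getTime());
}

// Exhaustive switch: adding a fifth event type without handling it here
// becomes a compile-time error.
function describe(e: FeedEvent): string {
  switch (e.type) {
    case "status_change": return `Status: ${e.from} → ${e.to}`;
    case "note": return `Note: ${e.text}`;
    case "stage": return `Stage: ${e.title} (${e.outcome})`;
    case "contact_interaction": return `Contact #${e.contactId}: ${e.summary}`;
  }
}
```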

AI generation (via Claude CLI)

CV tailored to the job description, ATS-aware (keyword extraction + scoring)
Cover letter grounded in your real experience (no hallucinated tenure)
Refinement loop — feed back specific changes ("emphasize Python", "add quantified achievements") to regenerate
Interview prep — STAR-format answers, likely questions, "questions to ask the interviewer"
Emails — follow-up, thank-you, withdrawal templates
Configurable system prompts (per generation type) editable at /settings
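Every generation goes through the Claude CLI as a subprocess, exchanging text over stdin/stdout. A minimal sketch of such a wrapper using only Node's `child_process.spawn`; `runCli` is a hypothetical name, and the real `src/lib/claude/client.ts` may handle arguments, streaming, or errors differently.

```typescript
import { spawn } from "node:child_process";

// Write `input` to the child's stdin, collect stdout, and reject on a
// non-zero exit code, a spawn error, or a timeout.
export function runCli(command: string, args: string[], input: string, timeoutMs = 120_000): Promise<string> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args); // stdin/stdout/stderr are pipes by default
    let stdout = "";
    let stderr = "";
    const timer = setTimeout(() => {
      child.kill("SIGTERM");
      reject(new Error(`${command} timed out after ${timeoutMs} ms`));
    }, timeoutMs);

    child.stdout.setEncoding("utf8");
    child.stderr.setEncoding("utf8");
    child.stdout.on("data", (chunk: string) => { stdout += chunk; });
    child.stderr.on("data", (chunk: string) => { stderr += chunk; });
    // Ignore EPIPE if the process exits before reading stdin; the exit code reports the failure.
    child.stdin.on("error", () => {});

    child.on("error", (err) => { // e.g. ENOENT when the binary is not on PATH
      clearTimeout(timer);
      reject(err);
    });
    child.on("close", (code) => {
      clearTimeout(timer);
      if (code === 0) resolve(stdout);
      else reject(new Error(`${command} exited with code ${code}: ${stderr.trim()}`));
    });

    child.stdin.end(input); // send the whole prompt, then close stdin
  });
}
```

Calling it as `runCli("claude", ["-p"], prompt)` assumes the CLI's print mode reads the prompt from stdin; check `claude --help` for the flags your installed version supports.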

Contacts & networking

Contacts tracked separately from applications, linked via a many-to-many junction (contact_application) — the same recruiter can appear across every role they put you in front of
Per-junction role (recruiter, interviewer, hiring_manager, referrer) so a person can have different functions in different processes
Free-text or autocompleted company field on the contact form (auto-creates the company if it's new)
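The contact model above (a many-to-many junction with a role on each link, plus company findOrCreate) can be sketched in memory like this. Case-insensitive company matching and the one-link-per-pair rule are assumptions; the real data lives in the `contact`, `contact_application`, and `company` tables.

```typescript
// In-memory stand-in for contact ↔ application links and company lookup.
type Role = "recruiter" | "interviewer" | "hiring_manager" | "referrer";

interface Company { id: number; name: string }
interface ContactLink { contactId: number; applicationId: number; role: Role }

class Directory {
  private companies: Company[] = [];
  private links: ContactLink[] = [];

  // Reuse an existing company whose name matches case-insensitively, else create one.
  findOrCreateCompany(name: string): Company {
    const key = name.trim().toLowerCase();
    const existing = this.companies.find((c) => c.name.toLowerCase() === key);
    if (existing) return existing;
    const created: Company = { id: this.companies.length + 1, name: name.trim() };
    this.companies.push(created);
    return created;
  }

  // One link per (contact, application) pair; relinking updates the role.
  link(contactId: number, applicationId: number, role: Role): void {
    const row = this.links.find((l) => l.contactId === contactId && l.applicationId === applicationId);
    if (row) row.role = role;
    else this.links.push({ contactId, applicationId, role });
  }

  // The same person can hold a different role in each application.
  rolesFor(contactId: number): ContactLink[] {
    return this.links.filter((l) => l.contactId === contactId);
  }
}
```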

Per-company research

Typed entries (news, culture, tech_stack, glassdoor, interview_experience, compensation, general) with source URL and markdown content
Shown on every application page for that company, so the context is there whenever you reopen the role
Indexable by the RAG system (sourceType='company_research') so future generations can use the research

RAG (Retrieval-Augmented Generation)

Local embeddings (Xenova/all-MiniLM-L6-v2, 384 dims, ~23 MB model, no API key)
Stored in Postgres as vector(384) with an HNSW cosine index
Sources indexed: uploaded documents, experience entries, application JDs, generated CVs/cover letters with positive feedback, company research
Native pgvector <=> similarity search — milliseconds even with tens of thousands of chunks

Dashboard

Stat cards: total / active / interview rate / offer rate / avg response days
Application funnel (horizontal bars with conversion percentages, traffic-light colored)
Action items: stale applications, draft list, interviews to prep, pending feedback, due reminders
Weekly velocity with a configurable target (saved to localStorage)
Response rate by job-board domain (LinkedIn vs Indeed vs direct, etc.)
Time-in-stage for active applications
CV performance: which generated CVs led to interviews/offers
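The funnel's conversion percentages are step-to-step ratios. A sketch, assuming each count means "applications that reached at least this stage" and assuming traffic-light thresholds of 50% and 20%; the dashboard's actual thresholds are not documented here.

```typescript
// Hypothetical funnel computation with traffic-light coloring.
type Light = "green" | "amber" | "red";

interface FunnelStep { stage: string; count: number; conversionPct: number | null; light: Light | null }

function funnel(stages: ReadonlyArray<[string, number]>): FunnelStep[] {
  return stages.map(([stage, count], i): FunnelStep => {
    if (i === 0) return { stage, count, conversionPct: null, light: null }; // first bar has no predecessor
    const prev = stages[i - 1][1];
    const pct = prev === 0 ? 0 : Math.round((count / prev) * 100);
    const light: Light = pct >= 50 ? "green" : pct >= 20 ? "amber" : "red"; // assumed thresholds
    return { stage, count, conversionPct: pct, light };
  });
}
```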

Knowledge base

All feedback entries (CV outcomes) aggregated; quality ratings + notes used to inform future generations via RAG
Positive outcomes (interview / offer) automatically reindexed into the RAG store

Tech stack

Framework — Next.js 16 (App Router, Turbopack), React 19
Database — Postgres 16 + pgvector (Docker compose ships a local instance)
ORM — Drizzle (postgres-js driver, async)
AI — Claude CLI subprocess (no SDK, no API key in the app itself)
Embeddings — @huggingface/transformers running all-MiniLM-L6-v2 locally
PDF parsing — pdf-parse
PDF rendering — @react-pdf/renderer
UI — Tailwind v4, Lucide icons, react-markdown

Quick start

Prerequisites

Node.js 20+
Docker (for Postgres) or your own Postgres 16+ with pgvector
Claude CLI installed and authenticated (claude on PATH)

Setup

```bash
# 1. Install deps
npm install

# 2. Start Postgres 16 + pgvector (port 5433, defaults from docker-compose.yml)
docker compose up -d

# 3. Configure env (.env.local is git-ignored)
cat > .env.local <<EOF
DATABASE_URL=postgres://jobagent:jobagent@127.0.0.1:5433/job_agent
STORAGE_DIR=./storage
EOF

# 4. Apply schema
npm run db:migrate

# 5. Run the dev server
npm run dev
```
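If you want startup to fail fast on a bad `.env.local`, a small validator for the two variables above could look like this. `loadConfig` is a hypothetical helper, and falling back to `./storage` when `STORAGE_DIR` is unset is an assumption.

```typescript
// Hypothetical startup check for the variables in .env.local.
interface AppConfig { databaseUrl: URL; storageDir: string }

function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const raw = env.DATABASE_URL;
  if (!raw) throw new Error("DATABASE_URL is not set (see .env.local)");
  const databaseUrl = new URL(raw); // throws on a malformed URL
  if (!["postgres:", "postgresql:"].includes(databaseUrl.protocol)) {
    throw new Error(`DATABASE_URL must use postgres://, got ${databaseUrl.protocol}`);
  }
  return { databaseUrl, storageDir: env.STORAGE_DIR ?? "./storage" }; // default is an assumption
}
```

Call it once as `loadConfig(process.env)`.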

Open http://localhost:3000.

First-time setup: go to /profile and fill in your basics (name, summary, experience, education, skills). Anything you skip here weakens every AI-generated document, because the prompts inject the profile as ground truth.

Usage walkthrough

1. Create your profile (one-time)

/profile — fill in personal info, summary, work experience (one entry per role), education, skills, certifications, languages. Upload any reference documents (existing CVs, project descriptions, performance reviews). Documents are automatically chunked and indexed into the RAG store on upload.

2. Add an application

/applications/new — paste the job description and basic company info. The app:

Creates the company (or links to an existing one)
Adds the application in draft status
Indexes the JD into the RAG store
Extracts ATS-relevant keywords from the JD on demand
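As a rough illustration of keyword extraction and ATS scoring, the sketch below counts frequent non-stopword terms in the JD and scores a CV by the share of those terms it contains. The tokenizer, stopword list, and top-N cutoff are assumptions; the repo's own logic in `src/lib/ats/` may differ, for example in how it handles multi-word phrases.

```typescript
// Naive ATS sketch; every constant here is an assumption.
const STOPWORDS = new Set([
  "a", "an", "and", "are", "as", "be", "for", "in", "is", "of",
  "on", "or", "our", "the", "to", "we", "will", "with", "you",
]);

// Lowercase words; keeps "+", "#" and "." inside tokens (c++, c#, node.js)
// but strips trailing periods left by sentence punctuation.
function tokenize(text: string): string[] {
  const raw = text.toLowerCase().match(/[a-z][a-z0-9+#.]*/g) ?? [];
  return raw.map((t) => t.replace(/\.+$/, ""));
}

// Most frequent non-stopword terms in the JD (ties broken alphabetically).
function extractKeywords(jobDescription: string, topN = 15): string[] {
  const counts = new Map<string, number>();
  for (const token of tokenize(jobDescription)) {
    if (token.length < 2 || STOPWORDS.has(token)) continue;
    counts.set(token, (counts.get(token) ?? 0) + 1);
  }
  return Array.from(counts.entries())
    .sort((a, b) => b[1] - a[1] || a[0].localeCompare(b[0]))
    .slice(0, topN)
    .map(([term]) => term);
}

// Score = percentage of keywords present in the CV, plus the missing ones.
function atsScore(cvText: string, keywords: string[]): { score: number; missing: string[] } {
  const present = new Set(tokenize(cvText));
  const missing = keywords.filter((k) => !present.has(k));
  const score = keywords.length === 0
    ? 100
    : Math.round(((keywords.length - missing.length) / keywords.length) * 100);
  return { score, missing };
}
```

The `missing` list is the kind of input a "Regenerate with missing keywords" action would feed back into the prompt.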

3. Generate documents

On any application, click Generate CV (/applications/[id]/generate):

Pick CV or Cover Letter
The system builds a context from your profile + RAG-retrieved chunks (most relevant experiences, document snippets, past successful CVs) + the target job description + top ATS keywords
Claude returns a markdown document
Review the ATS score and missing keywords — click Regenerate with missing keywords to retry, or write specific refinement feedback ("more quantified achievements")
Approve when satisfied — that marks the CV as the canonical version for the application and auto-creates a feedback row
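The context assembly in step 3 can be pictured as a prompt built from four parts plus optional refinement feedback. `buildPrompt`, the section headings, and the top-5 cutoff are hypothetical; the real builder is `src/lib/claude/context-builder.ts`, combined with the system prompts editable at `/settings`.

```typescript
// Hypothetical prompt assembly: profile + retrieved chunks + JD + keywords.
interface RetrievedChunk { sourceType: string; text: string; similarity: number }

interface GenerationInput {
  profileMarkdown: string;
  jobDescription: string;
  chunks: RetrievedChunk[];
  atsKeywords: string[];
  refinementFeedback?: string;
}

function buildPrompt(input: GenerationInput, maxChunks = 5): string {
  // Most similar chunks first, numbered so the model can refer to them.
  const evidence = [...input.chunks]
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, maxChunks)
    .map((c, i) => `[${i + 1}] (${c.sourceType}) ${c.text}`)
    .join("\n");
  const sections = [
    "## Candidate profile (ground truth; do not invent facts beyond this)",
    input.profileMarkdown,
    "## Relevant evidence",
    evidence || "(none retrieved)",
    "## Job description",
    input.jobDescription,
    "## ATS keywords to cover where truthful",
    input.atsKeywords.join(", "),
  ];
  if (input.refinementFeedback) {
    sections.push("## Requested changes", input.refinementFeedback);
  }
  return sections.join("\n\n");
}
```

Putting the profile first and labeling it as ground truth mirrors the README's point that a thin profile weakens every generated document.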

4. Manage the pipeline on the application page

/applications/[id] is the raio-x (X-ray) view:

Move the status forward with the buttons under the status pill
Add events to the timeline: notes, interview stages (with scheduled date + outcome), contact interactions (linked to a specific contact)
Link contacts to the application (Contacts panel → pick from existing contacts)
Add per-company research (Research panel → news / culture / tech_stack / etc.)
See other applications at the same company (Other applications panel)
Generate follow-up / thank-you / withdrawal emails from the Email Templates section

5. Track contacts

/contacts — list view grouped by company. Edit any contact inline (pencil icon on hover). The company field autocompletes from existing companies; type a new name and it's auto-created.

6. Configure AI prompts (optional)

/settings — edit the system prompts for each generation type (CV, cover letter, interview prep, follow-up email, etc.). Each has a "Reset to default" button. Useful when you want to bias toward a specific style or industry.

7. Check the dashboard

/ — overview of pipeline health. The Action Items panel is the highest-signal thing: stale follow-ups, drafts you forgot to send, due reminders.

Project structure

```
src/
├── app/
│   ├── api/                                 # Route handlers
│   │   ├── applications/
│   │   │   ├── route.ts                     # GET (list) / POST (create)
│   │   │   └── [id]/
│   │   │       ├── route.ts                 # GET/PUT/PATCH/DELETE
│   │   │       ├── feedback/route.ts        # CV/CL outcome tracking
│   │   │       ├── generate-cv/route.ts     # AI: tailored CV
│   │   │       ├── generate-cl/route.ts     # AI: cover letter
│   │   │       ├── generate-email/route.ts  # AI: follow-up / thank-you / withdraw
│   │   │       ├── interview-prep/route.ts  # AI: STAR prep
│   │   │       ├── keywords/route.ts        # ATS keyword extraction + scoring
│   │   │       ├── events/                  # Unified event feed
│   │   │       └── contacts/route.ts        # Link / unlink contacts
│   │   ├── companies/[id]/research/         # Per-company research CRUD
│   │   ├── companies/route.ts               # List companies (for autocomplete)
│   │   ├── contacts/route.ts                # Contact CRUD
│   │   ├── dashboard/stats/route.ts         # All dashboard aggregations
│   │   ├── documents/route.ts               # Upload + delete + autoindex
│   │   ├── education/, experience/, skills/ # Profile sub-resources
│   │   ├── generated/[id]/route.ts          # Approve / edit a generated doc
│   │   ├── knowledge/route.ts               # Aggregated feedback view
│   │   ├── profile/route.ts                 # Profile upsert
│   │   ├── rag/reindex/route.ts             # Trigger / inspect RAG index
│   │   ├── reminders/route.ts               # Reminder CRUD
│   │   └── settings/prompts/route.ts        # Configurable AI system prompts
│   ├── applications/                        # Pipeline UI
│   ├── contacts/                            # Networking UI
│   ├── knowledge/                           # Feedback / knowledge browser
│   ├── profile/                             # Profile editor
│   ├── settings/                            # Prompt editor
│   └── page.tsx                             # Dashboard
├── components/
│   ├── applications/application-form.tsx    # Shared form for new + edit flows
│   └── layout/                              # PageShell, Sidebar
├── db/
│   ├── schema.ts                            # Drizzle pg-core schema
│   ├── index.ts                             # postgres-js client (lazy proxy)
│   └── queries/                             # Async query modules per domain
└── lib/
    ├── ats/                                 # Keyword extraction, matching, ATS scoring
    ├── claude/                              # Prompt builders + CLI wrapper
    │   ├── client.ts                        # Claude CLI subprocess
    │   ├── context-builder.ts               # Profile + RAG → GenerationContext
    │   ├── prompts.ts                       # CV / refinement / cover letter prompts
    │   ├── email-prompts.ts                 # Email prompt builders
    │   └── interview-prompts.ts             # Interview prep prompt
    ├── rag/
    │   ├── embedder.ts                      # all-MiniLM-L6-v2 singleton
    │   ├── chunker.ts                       # Text chunking strategies
    │   ├── indexer.ts                       # Per-source-type indexing
    │   └── retriever.ts                     # pgvector cosine search
    ├── validators/                          # Zod schemas
    └── status-machine.ts                    # Allowed status transitions
scripts/
├── dump-sqlite.mjs                          # Legacy: dump old SQLite into JSON
└── restore-postgres.mjs                     # Restore JSON dump into Postgres
drizzle/migrations/                          # Generated SQL migrations
docker-compose.yml                           # Postgres 16 + pgvector
```

Database

Schema lives in src/db/schema.ts. Drizzle generates migrations from it.

```bash
npm run db:generate    # Diff schema → new SQL migration
npm run db:migrate     # Apply pending migrations
npm run db:studio      # Open Drizzle Studio (web DB browser)
```

Key tables

| Table | Purpose |
|---|---|
| `profile` | Single-user personal info (always `id = 1`) |
| `experience`, `education`, `skill`, `certification`, `language`, `document` | Profile sub-resources |
| `company` | Companies (auto-created when adding apps or contacts) |
| `company_research` | Per-company research entries (news, culture, glassdoor, …) |
| `application` | Job applications with status |
| `application_event` | Unified event feed (`event_type` ∈ `status_change` / `stage` / `note` / `contact_interaction`) |
| `contact` | Recruiters / interviewers / referrers |
| `contact_application` | M:N junction; per-junction role |
| `generated_cv`, `generated_cover_letter` | AI-generated documents (markdown + metadata) |
| `generation_feedback` | Outcome tracking (interview / offer / rejected / no_response) + quality rating |
| `embedding_chunk` | RAG index: `vector(384)` with HNSW cosine index |
| `reminder` | Auto + manual reminders |
| `prompt_config` | Editable system prompts per generation type |

Migrating from the legacy SQLite version

If you're upgrading from the pre-Postgres version of this repo:

```bash
# 1. Dump SQLite → JSON (reads ./storage/db/job-agent.db)
node scripts/dump-sqlite.mjs

# 2. Start Postgres + apply fresh schema
docker compose up -d
npm run db:migrate

# 3. Restore the dump (preserves IDs, converts bool/json/vector columns)
#    --env-file requires Node.js 20.6 or later
node --env-file=.env.local scripts/restore-postgres.mjs
```
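The restore script's column conversions exist because SQLite has no boolean or JSON column types. A sketch of that kind of per-column conversion, assuming booleans were stored as 0/1, JSON as TEXT, and embeddings as a JSON array string; `convertRow` and the column names in the example are hypothetical. pgvector accepts a vector as the text literal `[x1,x2,...]`.

```typescript
// Hypothetical per-column conversion for a SQLite → Postgres restore.
type ColumnKind = "bool" | "json" | "vector" | "plain";

function convertValue(kind: ColumnKind, value: unknown): unknown {
  if (value === null || value === undefined) return null;
  switch (kind) {
    case "bool": // SQLite stores booleans as integers
      return value === 1 || value === "1" || value === true;
    case "json": // SQLite stores JSON as TEXT
      return typeof value === "string" ? JSON.parse(value) : value;
    case "vector": { // assumed JSON array string → pgvector text literal
      const arr = typeof value === "string" ? (JSON.parse(value) as number[]) : (value as number[]);
      return `[${arr.join(",")}]`;
    }
    default:
      return value;
  }
}

function convertRow(row: Record<string, unknown>, kinds: Partial<Record<string, ColumnKind>>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const [col, v] of Object.entries(row)) out[col] = convertValue(kinds[col] ?? "plain", v);
  return out;
}
```

Keeping `id` as a plain column is what lets a restore preserve IDs; after inserting explicit IDs, Postgres sequences generally need to be advanced past the highest restored value.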

RAG system

The retriever uses pgvector's <=> (cosine distance) operator against the HNSW index on embedding_chunk.embedding for native, millisecond-latency similarity search — no JS-side scoring loop.

How it works

Indexing: text from documents / experiences / JDs / approved generations / research is split into chunks and embedded with all-MiniLM-L6-v2 (runs locally, 384 dims)
Storage: vectors are stored directly in Postgres as `vector(384)`, with metadata as `jsonb`
Retrieval: the query text is embedded, then `ORDER BY embedding <=> :query::vector LIMIT k` returns the top-K chunks, served by the HNSW index
Score filtering: results below `minScore` (default 0.25 cosine similarity) are dropped
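For intuition about what the query returns: pgvector's `<=>` is cosine distance, i.e. 1 minus cosine similarity, so a `minScore` of 0.25 keeps chunks whose distance is at most 0.75. The brute-force version below is only a reference for those semantics (the app does this in Postgres, not in JS), and unlike an HNSW index it is exact rather than approximate. Applying the score filter after the `LIMIT` follows the order described above but is an assumption about the implementation.

```typescript
// Exact reference for what `ORDER BY embedding <=> q LIMIT k` + minScore computes.
function cosineDistance(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb)); // same definition as pgvector's <=>
}

interface Chunk { id: number; embedding: number[] }

function topK(query: number[], chunks: Chunk[], k: number, minScore = 0.25) {
  return chunks
    .map((c) => ({ id: c.id, similarity: 1 - cosineDistance(query, c.embedding) }))
    .sort((x, y) => y.similarity - x.similarity) // ORDER BY embedding <=> query
    .slice(0, k)                                  // LIMIT k
    .filter((r) => r.similarity >= minScore);     // drop weak matches
}
```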

What gets indexed and when

| Source type | Chunking | Indexed when |
|---|---|---|
| `document` | Paragraphs ~500 chars, light overlap | On upload (fire-and-forget) |
| `experience` | One chunk per entry | On create / update |
| `application` | Paragraphs ~500 chars | On application create / JD update |
| `generated_cv` | By markdown `##` section | On feedback with `interview` or `offer` outcome |
| `generated_cl` | By markdown `##` section | Same as above |
| `company_research` | Single chunk (per entry) | On research entry create |
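The two main chunking strategies in the table can be sketched as follows. The ~500-character budget comes from the table; the 50-character overlap and the exact splitting rules are assumptions, so `src/lib/rag/chunker.ts` may differ.

```typescript
// Pack paragraphs into chunks of up to ~maxChars, carrying a small tail of
// the previous chunk forward as overlap. A single paragraph longer than
// maxChars still becomes one (oversized) chunk.
function chunkParagraphs(text: string, maxChars = 500, overlap = 50): string[] {
  const paragraphs = text.split(/\n\s*\n/).map((p) => p.trim()).filter(Boolean);
  const chunks: string[] = [];
  let current = "";
  for (const p of paragraphs) {
    if (current && current.length + p.length + 2 > maxChars) {
      chunks.push(current);
      current = overlap > 0 ? current.slice(-overlap) + "\n\n" + p : p;
    } else {
      current = current ? current + "\n\n" + p : p;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Split before every line starting with "## ", keeping each heading with its body.
function chunkMarkdownSections(md: string): string[] {
  return md.split(/^(?=## )/m).map((s) => s.trim()).filter(Boolean);
}
```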

Management

Auto: every relevant write fires an indexing job (non-blocking)
Manual rebuild: POST /api/rag/reindex or the Rebuild Index button on /knowledge
Stats: GET /api/rag/reindex returns counts by source type
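Non-blocking indexing usually means the write handler starts the indexing promise without awaiting it and logs failures instead of surfacing them to the user. A sketch of that pattern; `fireAndForget` and the log prefix are hypothetical names.

```typescript
// Start `job` without blocking the caller; report failures via `onError`.
function fireAndForget(
  label: string,
  job: () => Promise<unknown>,
  onError: (label: string, err: unknown) => void = (l, err) => console.error(`[rag] ${l} failed:`, err),
): void {
  // Promise.resolve().then(job) defers the job to a microtask, so the caller
  // returns first, and it also turns a synchronous throw inside job() into a rejection.
  Promise.resolve()
    .then(job)
    .catch((err: unknown) => onError(label, err));
}
```

A route handler would call something like ``fireAndForget(`document:${id}`, () => indexDocument(id))`` (with `indexDocument` standing in for the indexer's export) and return its response immediately. On serverless hosts, work still running after the response may be cut off; under a long-lived `npm run dev` or `next start` process it completes normally.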

API reference

| Method | Endpoint | Notes |
|---|---|---|
| GET / POST | `/api/applications` | List with optional `?status=` filter; create |
| GET / PUT / PATCH / DELETE | `/api/applications/[id]` | Full app + bundled contacts, research, related apps. PATCH for status change |
| POST | `/api/applications/[id]/generate-cv` | Body `{ previousContent?, refinementFeedback? }` for the refinement loop |
| POST | `/api/applications/[id]/generate-cl` | Cover letter |
| POST | `/api/applications/[id]/generate-email` | Body `{ emailType, notes? }`; `emailType` is `follow_up`, `thank_you`, or `withdraw` |
| POST | `/api/applications/[id]/interview-prep` | Saves the result to `application.interview_prep` |
| GET / POST | `/api/applications/[id]/keywords` | GET = extract from JD; POST `{ cvText }` = score CV against JD |
| GET / POST / PATCH | `/api/applications/[id]/feedback` | Outcome + quality tracking |
| POST / DELETE | `/api/applications/[id]/events` | Add event (note / stage / contact_interaction); DELETE via `events/[eventId]` |
| POST / DELETE | `/api/applications/[id]/contacts` | Link a contact (body `{ contactId, role? }`); DELETE with `?contactId=` to unlink |
| GET | `/api/companies` | List for autocomplete |
| GET / POST | `/api/companies/[id]/research` | Research entries (POST body: `{ type, title, content, sourceUrl? }`) |
| PATCH / DELETE | `/api/companies/[id]/research/[researchId]` | Edit / delete |
| GET / POST / PUT / DELETE | `/api/contacts` | CRUD; the `companyName` field is resolved to an existing or new company via findOrCreate |
| GET / POST / PUT / DELETE | `/api/profile`, `/api/experience`, `/api/education`, `/api/skills`, `/api/documents` | Profile sub-resources |
| GET / PATCH | `/api/generated/[id]` | View / approve / edit content of a generated doc |
| GET / POST / PATCH | `/api/reminders` | List due + upcoming; create; complete/delete via PATCH |
| GET / PUT | `/api/settings/prompts` | Configurable AI system prompts |
| GET / PATCH | `/api/knowledge` | Aggregated feedback view |
| GET / POST | `/api/rag/reindex` | Stats / full rebuild |
| GET | `/api/dashboard/stats` | All dashboard aggregations in one call |

Configurable AI prompts

Every generation type has its system prompt stored in prompt_config and editable at /settings. Defaults live in src/db/queries/prompt-configs.ts and are seeded on first read.

| Key | What it generates |
|---|---|
| `cv_generation` | Tailored CV |
| `cv_refinement` | CV refinement (when you give feedback to regenerate) |
| `cover_letter` | Cover letter |
| `interview_prep` | STAR-format interview prep |
| `email_follow_up` | Follow-up after applying |
| `email_thank_you` | Thank-you after interview |
| `email_withdraw` | Withdrawal email |

Each row stores both system_prompt (current) and default_prompt (factory default), so "Reset to default" is always available.
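Storing both prompts makes "Reset to default" a simple copy. An in-memory sketch of that behavior, including seeding on first read; `PromptConfigStore` is hypothetical and only the keys come from the table above.

```typescript
// In-memory stand-in for the prompt_config table.
type PromptKey =
  | "cv_generation" | "cv_refinement" | "cover_letter" | "interview_prep"
  | "email_follow_up" | "email_thank_you" | "email_withdraw";

interface PromptRow { key: PromptKey; systemPrompt: string; defaultPrompt: string }

class PromptConfigStore {
  private readonly rows = new Map<PromptKey, PromptRow>();
  private readonly defaults: Record<PromptKey, string>;

  constructor(defaults: Record<PromptKey, string>) {
    this.defaults = defaults;
  }

  // Seed the row from the factory default the first time it is read.
  get(key: PromptKey): PromptRow {
    let row = this.rows.get(key);
    if (!row) {
      row = { key, systemPrompt: this.defaults[key], defaultPrompt: this.defaults[key] };
      this.rows.set(key, row);
    }
    return row;
  }

  // Edits only touch systemPrompt; defaultPrompt is never overwritten.
  update(key: PromptKey, systemPrompt: string): PromptRow {
    const row = this.get(key);
    row.systemPrompt = systemPrompt;
    return row;
  }

  reset(key: PromptKey): PromptRow {
    const row = this.get(key);
    row.systemPrompt = row.defaultPrompt;
    return row;
  }
}
```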

Data export / migration

```bash
# Dump the current Postgres database to a plain SQL file (backups / moving to another machine)
docker exec job-agent-postgres pg_dump -U jobagent job_agent > storage/db/backup-$(date +%Y%m%d).sql

# Restore into a fresh container with an empty database (the dump recreates the schema)
docker exec -i job-agent-postgres psql -U jobagent -d job_agent < storage/db/backup-YYYYMMDD.sql
```

The two scripts/*.mjs helpers handle the SQLite → Postgres migration path; once you're on Postgres, prefer pg_dump / psql for snapshots.

Development notes

Every DB query is async — there's no sync API. If you add a new query, declare the function async and await all Drizzle calls.
The db export in src/db/index.ts is a lazy proxy; the postgres-js client is only created on first use, so importing db is free at build time.
The Claude CLI is invoked as a subprocess. Make sure claude --version works in the same shell as npm run dev.
The embedding model downloads on first use (~23 MB) into ~/.cache/huggingface. Later runs load it from the cache instead of downloading it again.
All form inputs use light-only color schemes (color-scheme: light set on :root in globals.css) because the UI wasn't designed for dark mode. Per-input bg-white is set explicitly to prevent UA dark form styling from leaking through.
Type-check before committing: npx tsc --noEmit.
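The lazy `db` export can be pictured as a `Proxy` that runs its factory on first property access. A generic sketch of the idea; `lazy` is a hypothetical helper, not the code in `src/db/index.ts`.

```typescript
// Defer creating the real object until a property is first accessed.
function lazy<T extends object>(factory: () => T): T {
  let instance: T | undefined;
  return new Proxy({} as T, {
    get(_target, prop) {
      const real = instance ?? (instance = factory()); // create on first access only
      const value: unknown = Reflect.get(real, prop, real);
      // Bind methods so `this` is the real instance, not the proxy.
      return typeof value === "function" ? value.bind(real) : value;
    },
  });
}
```

Something like `export const db = lazy(() => drizzle(postgres(process.env.DATABASE_URL!)))` would then defer opening a connection until the first query, which is why importing `db` during `next build` costs nothing.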

License

Personal project — use freely, no warranty.
