Trawl

nitish-hegde-vst-au4



A local-first job-search dashboard. Polls five ATSes daily, discovers new sponsoring companies weekly, surfaces everything in one Svelte UI. No accounts, no API keys, no telemetry.

Why I built this

I got tired of running the same Boolean searches across five career portals every morning, by hand. Google Alerts kept missing things. LinkedIn's idea of a relevant role isn't mine. So I built the dashboard I wanted: every company I care about polled once a day, discovery surfacing new ones weekly, matching logic in a single JSON file I can read and edit. It runs on my laptop, I check it daily, and I tighten the matching whenever I find a new pattern worth catching. Built it for myself first — if it's useful to anyone else, even better.

Highlights

Five ATS adapters — Greenhouse, Lever, Ashby, SmartRecruiters, Workable. Per-host concurrency caps so no single ATS gets hammered. Auto-archives companies after three consecutive 404s so dead slugs stop being polled.
Four discovery sources — Stetsenko's relocation list, awesome-remote-job, SearXNG Boolean queries, and a GitHub aggregator-repo parser with a two-stage review queue (approve repo → its contents flow into the company queue).
MCP server included — pnpm mcp exposes the SQLite DB as Claude-callable tools. Ask "what should I follow up on this week?" in any Claude session.
Self-hosted SearXNG for Boolean search. No third-party API, no rate limits, no keys.
Single SQLite file (data/trawl.db) with WAL + foreign keys + a numbered-migrations system so schema changes ship as discrete reviewable files instead of accreted ALTERs.
Exact-pinned deps + supply-chain notes in the README. pnpm blocks postinstall scripts by default.
Svelte 5 runes + Fastify 5 sharing a single port via @fastify/middie mounting Vite middleware in dev, @fastify/static in prod.

See it run

https://github.com/user-attachments/assets/12087329-4383-45c2-b431-bee7c834ec16

Quick start

# 1. Install dependencies (pnpm required)
pnpm install

# 2. Build better-sqlite3's native binding. pnpm blocks postinstall scripts
#    by default; this approves the one legitimate native compile we need.
cd node_modules/.pnpm/better-sqlite3@*/node_modules/better-sqlite3 && npm run install && cd -

# 3. Set up the local SearXNG container (one-time)
cp docker/searxng/settings.yml.example docker/searxng/settings.yml
# BSD/macOS sed syntax; on GNU/Linux drop the '' after -i.
sed -i '' "s/REPLACE_WITH_RANDOM_HEX/$(openssl rand -hex 32)/" docker/searxng/settings.yml
docker compose up -d searxng

# 4. Start the app on a single port
pnpm dev
# → http://localhost:3030

The server seeds five sample companies, polls them on startup, kicks off discovery, and surfaces matched jobs in the dashboard. Add or remove companies from the Companies tab. Tune the title/stack filters in config/stack_keywords.json.

How it works

Polling

Every tracked company is polled daily at 09:00 local. For each job posting, the matcher runs two checks: the title must contain one of your titleKeywords, and the body must mention at least one stack token from stackPatterns. Matches land in the dashboard tagged with their reasons (e.g. senior frontend, React, TypeScript).
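A minimal sketch of that two-check rule, assuming config/stack_keywords.json has the shape described under Configuration files (`titleKeywords` strings, `stackPatterns` entries with pattern/flags/label). The function names mirror src/matching.ts, but the signatures and the case-insensitive substring test for titles are assumptions, not the actual implementation.

```typescript
// Shape of config/stack_keywords.json as described in this README.
interface StackPattern {
  pattern: string; // RegExp source, e.g. "\\bReact\\b"
  flags?: string; // optional RegExp flags, e.g. "i"
  label: string; // reason shown in the dashboard
}
interface StackKeywords {
  titleKeywords: string[];
  stackPatterns: StackPattern[];
}

const compile = (p: StackPattern): RegExp => new RegExp(p.pattern, p.flags ?? "");

// Check 1: the title contains at least one keyword (case-insensitive substring).
export function titleMatches(title: string, cfg: StackKeywords): boolean {
  const lower = title.toLowerCase();
  return cfg.titleKeywords.some((k) => lower.includes(k.toLowerCase()));
}

// Check 2: the JD body matches at least one stack pattern.
export function bodyMatches(body: string, cfg: StackKeywords): boolean {
  return cfg.stackPatterns.some((p) => compile(p).test(body));
}

// Reasons attached to a match: every title keyword hit, then every stack label hit.
export function matchReasons(title: string, body: string, cfg: StackKeywords): string[] {
  const lower = title.toLowerCase();
  const titleHits = cfg.titleKeywords.filter((k) => lower.includes(k.toLowerCase()));
  const stackHits = cfg.stackPatterns.filter((p) => compile(p).test(body)).map((p) => p.label);
  return [...titleHits, ...stackHits];
}
```

Compiling the patterns from JSON is what keeps matching editable without touching source; `flags` lets an entry opt into `i` for tokens whose casing varies.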

pollAll() uses per-host request pools (5 concurrent per ATS, 3 for SmartRecruiters per their documented 5 req/sec/IP limit). The dashboard polls /api/poll/status every second while a poll is running and shows live progress (Polling 12 / 30...). A consecutive_failures counter on each company increments on 404s and resets on success; after three in a row the company is auto-archived.
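The per-host pools can be pictured as one small counting semaphore per ATS. This is a hypothetical sketch: `RequestPool` is the name the file layout lists for src/ats.ts, but the `run()` method, the `poolFor()` helper, and the lowercase ATS keys are assumptions.

```typescript
// Counting semaphore: at most `limit` tasks in flight; extra callers wait in FIFO order.
export class RequestPool {
  readonly limit: number;
  private active = 0;
  private readonly waiting: Array<() => void> = [];

  constructor(limit: number) {
    this.limit = limit;
  }

  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active < this.limit) {
      this.active += 1;
    } else {
      // Park until a finishing task hands its slot to us.
      await new Promise<void>((resolve) => this.waiting.push(resolve));
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) next(); // pass the slot on; `active` stays the same
      else this.active -= 1;
    }
  }
}

// Hypothetical per-ATS caps matching the numbers above.
const CONCURRENCY: Record<string, number> = {
  greenhouse: 5,
  lever: 5,
  ashby: 5,
  workable: 5,
  smartrecruiters: 3,
};
const pools = new Map<string, RequestPool>();

// One shared pool per ATS host, created lazily.
export function poolFor(ats: string): RequestPool {
  let pool = pools.get(ats);
  if (!pool) {
    pool = new RequestPool(CONCURRENCY[ats] ?? 5);
    pools.set(ats, pool);
  }
  return pool;
}
```

Because every company on the same ATS goes through the same pool, a `Promise.all` over all companies still never has more than the cap in flight against any one host.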

Discovery

Weekly, Mondays at 09:30. Four sources run in sequence:

Stetsenko's GitHub list — parses AndrewStetsenko/tech-jobs-with-relocation/README.md for companies hiring internationally.
awesome-remote-job — parses lukasz-madon/awesome-remote-job/README.md for remote-DNA companies.
SearXNG Boolean search — runs each query in config/discovery_queries.json against your local SearXNG instance, extracts result URLs, detects ATS slugs.
GitHub aggregator-repo search — queries the GitHub Search API for high-signal job-list repos (stars ≥ 50, recently pushed). New repos go into a Repo sources review queue. Approved repos get their READMEs re-parsed weekly, with any new companies surfaced into the standard discovery queue.
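For the SearXNG source, a query is a GET against the instance's /search endpoint with `format=json`, and the useful part of the reply is the `url` of each entry in `results`. A sketch, assuming SearXNG is served at the root of `SEARXNG_URL` and that `json` is enabled under `search.formats` in settings.yml (SearXNG refuses JSON requests when it is not); the helper names are hypothetical.

```typescript
// The subset of SearXNG's JSON reply this step reads.
interface SearxngResponse {
  results?: Array<{ url?: string }>;
}

// Build e.g. http://localhost:8888/search?q=...&format=json
export function searxngSearchUrl(base: string, query: string): string {
  const url = new URL("/search", base);
  url.searchParams.set("q", query);
  url.searchParams.set("format", "json");
  return url.toString();
}

// Result URLs in order of first appearance, de-duplicated, entries without a url skipped.
export function resultUrls(payload: SearxngResponse): string[] {
  const seen = new Set<string>();
  for (const r of payload.results ?? []) if (r.url) seen.add(r.url);
  return [...seen];
}

// Network wrapper (uses the global fetch available in Node 18+).
export async function runQuery(base: string, query: string): Promise<string[]> {
  const res = await fetch(searxngSearchUrl(base, query));
  if (!res.ok) throw new Error(`SearXNG responded ${res.status}`);
  return resultUrls((await res.json()) as SearxngResponse);
}
```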

A two-stage review keeps the noise low: trust a repo once, and from then on its contents flow automatically into the companies queue. Both queues have one-click Approve / Dismiss.

MCP server

pnpm mcp starts an MCP server (stdio transport) that exposes the DB as tools for job review, discovery review, repo-source review, application status, outreach status, and discovery triggers. Wire it into Claude Desktop or the CLI and you can query your job tracker conversationally.

{
  "mcpServers": {
    "trawl": {
      "command": "pnpm",
      "args": ["mcp"],
      "cwd": "/absolute/path/to/trawl"
    }
  }
}

If your MCP client does not support cwd, run the command through a shell from the repo directory or use your client's equivalent working-directory setting.

Architecture

Layer	Tech	Why
HTTP server	Fastify 5 (TypeScript)	Async-first, JSON-schema validation, structured pino logger built in
UI	Svelte 5 (runes API) + Vite 6	Smallest runtime, compile-time reactivity, fits the simple CRUD shape
Single-port serving	`@fastify/middie` mounting Vite middleware in dev; `@fastify/static` in prod	No dual-port DX friction, no SPA framework overkill
DB	SQLite via `better-sqlite3`, WAL + foreign keys, numbered migrations	File-based, zero infra, synchronous writes are fine here
Scheduler	`node-cron` inside the Fastify process	One process, no Redis, simple
Concurrency	Per-host request pools (5 concurrent per ATS, 3 for SmartRecruiters)	Cap bursts to any single ATS host, keep wall-clock fast
Search backend (discovery)	Self-hosted SearXNG via Docker	Free, no signup, no CAPTCHA, JSON output
Repo-source discovery	GitHub Search API (unauthenticated)	Free, no token needed; the unauthenticated search limit (10 req/min) is plenty for a weekly run

File layout

trawl/
├── src/
│   ├── index.ts                # Thin entry — calls server.ts
│   ├── server.ts               # Fastify bootstrap, route registration, cron
│   ├── config.ts               # Env vars + module-level constants (PORT, ROOT, IS_DEV, …)
│   ├── db.ts                   # SQLite open + migrations + seed companies + insertCompany helper
│   ├── migrate.ts              # Numbered-migrations runner
│   ├── migrations/             # 0001-initial-schema.sql — the single schema file (add 0002+ as needed)
│   ├── matching.ts             # titleMatches / bodyMatches / matchReasons from stack_keywords.json
│   ├── polling.ts              # pollAll(), per-company fetch, auto-archive on consecutive 404s
│   ├── discovery.ts            # Four discovery sources + repo-parse post-step
│   ├── ats.ts                  # ATS adapters, RequestPool primitive, per-host pools, detectAts()
│   ├── country.ts              # FLAG_TO_COUNTRY, COUNTRY_HINTS, inferCountry(), extractCountry()
│   ├── domain.ts               # Shared TypeScript types (Ats, Job, Company, …)
│   ├── state.ts                # persistentState() proxy — survives restarts via app_state K/V table
│   ├── routes/                 # Fastify route modules grouped by domain
│   │   ├── jobs.ts
│   │   ├── companies.ts
│   │   ├── discoveries.ts
│   │   ├── repo-sources.ts
│   │   └── system.ts
│   ├── util/                   # Pure helpers (country normalization, fetch retry, slug, text)
│   └── mcp.ts                  # Independent MCP server — opens the same SQLite file
├── client/                     # Svelte SPA
│   ├── index.html
│   ├── main.ts
│   ├── app.css
│   └── lib/
│       ├── App.svelte
│       ├── Dashboard.svelte          # Top-level page — composes the smaller components below
│       ├── DigestStrip.svelte        # "N new jobs this week" header strip
│       ├── FunnelStrip.svelte        # Application-status pipe chips
│       ├── JobFilters.svelte         # Search, country chips, filters bar
│       ├── JobTable.svelte           # The matched-jobs table
│       ├── JobRow.svelte             # A single row (extracted for clarity)
│       ├── NotesEditor.svelte        # Expanded notes/recruiter editor under a row
│       ├── Paginator.svelte
│       ├── Companies.svelte          # Tracked companies + add-company form
│       ├── Discoveries.svelte        # Review queue from auto-discovery (companies)
│       ├── RepoSources.svelte        # Review queue for GitHub aggregator repos
│       ├── SystemStatus.svelte       # Cron schedule + last-run timestamps
│       ├── util/                     # similar.ts (Jaccard JD similarity), listControls.ts
│       └── api.ts                    # Fetch wrappers + shared error helpers
├── config/                     # Editable JSON config (matching, discovery, seed companies)
├── docker/searxng/             # SearXNG config (real settings.yml is gitignored)
├── docker-compose.yml
├── data/                       # SQLite runtime data (gitignored)
├── package.json
├── tsconfig.json / tsconfig.app.json
├── eslint.config.js
├── .prettierrc.json
├── vite.config.ts
├── svelte.config.js
├── .env.example
├── AGENTS.md                   # Repo-level guidelines for human + AI contributors
├── CLAUDE.md                   # Specifics for Claude Code working in this repo
└── README.md
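The tree lists client/lib/util/similar.ts as Jaccard JD similarity. As an illustration of the technique only (the tokenizer and function names are assumptions), Jaccard similarity on word sets is the size of the intersection divided by the size of the union:

```typescript
// Lower-cased word tokens; any run of non-letter, non-digit characters separates words.
export function tokens(text: string): Set<string> {
  return new Set(
    text
      .toLowerCase()
      .split(/[^\p{L}\p{N}]+/u)
      .filter((t) => t.length > 0),
  );
}

// |A ∩ B| / |A ∪ B|, defined as 0 when both sets are empty.
export function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  for (const t of a) if (b.has(t)) shared += 1;
  return shared / (a.size + b.size - shared);
}
```

For example, "React TypeScript remote" and "remote, react; Go" share two of four distinct words, so their similarity is 0.5.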

Configuration files

File	Purpose	Edit when
`config/companies.json`	Initial company seeds. Inserted on first run via `INSERT OR IGNORE`.	You want to bake new companies into the default seed (rare — use the UI).
`config/stack_keywords.json`	`titleKeywords` array + `stackPatterns` array (pattern/flags/label). Controls what titles and JD bodies are considered a match.	Tune matching — add/remove keywords without touching source code.
`config/discovery_queries.json`	Array of `{ query: "..." }` Boolean queries sent to SearXNG.	You want to add/remove search patterns (e.g., new ATS hosts, different roles).
`config/repo_discovery_queries.json`	Array of plain strings — keyword queries for GitHub Search API.	You want to broaden / narrow the kinds of aggregator repos surfaced.
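`docker/searxng/settings.yml`	GITIGNORED. Real SearXNG config with generated secret_key.	Created by the cp + sed pair in Quick start step 3. To regenerate the secret, re-run both lines: sed only replaces the placeholder, which a fresh copy of the example restores.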

Environment variables

Var	Default	Purpose
`PORT`	`3030`	Fastify port.
`NODE_ENV`	`development`	When `production`, serves built Svelte from `dist/client/` via `@fastify/static`. In dev mode, mounts Vite in-process for HMR.
`SEARXNG_URL`	`http://localhost:8888`	Where SearXNG is reachable.

No API keys, no tokens, no sign-ups. Everything is local-only or hits free public APIs unauthenticated, within their published rate limits.

Secrets and what's in git

One generated secret in this project: the SearXNG secret_key. SearXNG uses it locally to sign session cookies. Generated via openssl rand -hex 32 during setup. It lives only in docker/searxng/settings.yml, which is gitignored.

File	Has secret?	Tracked?
`docker/searxng/settings.yml.example`	placeholder only	yes
`docker/searxng/settings.yml`	real generated value	NO (gitignored)
`data/trawl.db`	local DB only, no creds	NO (gitignored)

Verify the gitignore before any git add:

git check-ignore -v docker/searxng/settings.yml data/trawl.db

Every external service (Greenhouse, Lever, Ashby, SmartRecruiters, Workable, the GitHub Search API, GitHub raw content, SearXNG) is hit unauthenticated. No keys to leak.

Scripts

Command	What it does
`pnpm dev`	Fastify + Vite middleware on `:3030`. HMR for Svelte, hot-reload for backend via `tsx watch`.
`pnpm start`	Production mode: `NODE_ENV=production`, serves built assets via `@fastify/static`. Run `pnpm build` first.
`pnpm mcp`	Start the MCP server (stdio transport). Wire into Claude Desktop / CLI via the JSON snippet above.
`pnpm build`	Vite builds Svelte to `dist/client/`.
`pnpm check`	`svelte-check` against `tsconfig.app.json` — client-side type check.
`pnpm lint`	ESLint (flat config, v9) on TS + Svelte.
`pnpm format`	Prettier `--write .`.
`pnpm format:check`	Prettier `--check .` (CI-style).
`docker compose up -d searxng`	Start the SearXNG container (port 8888).
`docker compose stop searxng`	Stop it.

Polling internals

pollAll() is fire-and-forget:

POST /api/poll returns 202 immediately. The function runs in the background and updates the pollState persisted via state.ts.
pollAll() runs Promise.all across all companies, gated by per-host request pools (5 concurrent per ATS, 3 for SmartRecruiters).
Each company fetches its postings, runs titleMatches() && bodyMatches(), upserts into jobs with ON CONFLICT(id) DO UPDATE — so re-polling re-evaluates with current filter logic.
A consecutive_failures counter on each company increments on 404 fetches and resets on the first success. After three in a row the company is auto-archived (so dead slugs stop wasting cycles).
The dashboard polls GET /api/poll/status every second while a poll is running and shows Polling X/Y... live.
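The fire-and-forget flow above can be sketched without Fastify: the handler starts the work, returns at once, and the status endpoint reads a shared state object. `PollRunner`, its callbacks, and the 409 answer for an overlapping request are hypothetical; this README does not say what a second POST /api/poll does while a poll is running.

```typescript
interface PollState {
  running: boolean;
  done: number;
  total: number;
  startedAt: string | null;
  finishedAt: string | null;
  lastError: string | null;
}

export class PollRunner {
  private state: PollState = { running: false, done: 0, total: 0, startedAt: null, finishedAt: null, lastError: null };
  private background: Promise<void> = Promise.resolve();
  private readonly listSlugs: () => string[];
  private readonly pollOne: (slug: string) => Promise<void>;

  constructor(listSlugs: () => string[], pollOne: (slug: string) => Promise<void>) {
    this.listSlugs = listSlugs;
    this.pollOne = pollOne;
  }

  // Models POST /api/poll: kick off the work without awaiting it and answer right away.
  start(): 202 | 409 {
    if (this.state.running) return 409; // assumption: overlapping polls are rejected
    const slugs = this.listSlugs();
    this.state = { running: true, done: 0, total: slugs.length, startedAt: new Date().toISOString(), finishedAt: null, lastError: null };
    this.background = Promise.all(
      slugs.map(async (slug) => {
        try {
          await this.pollOne(slug);
        } catch (err) {
          // One company failing must not abort the rest of the poll.
          this.state.lastError = `${slug}: ${err instanceof Error ? err.message : String(err)}`;
        } finally {
          this.state.done += 1;
        }
      }),
    ).then(() => {
      this.state.running = false;
      this.state.finishedAt = new Date().toISOString();
    });
    return 202;
  }

  // Models GET /api/poll/status: a snapshot the dashboard renders as "Polling done/total...".
  status(): PollState {
    return { ...this.state };
  }

  // Resolves once the current poll has settled (useful for tests and shutdown).
  idle(): Promise<void> {
    return this.background;
  }
}
```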

Daily cron also calls pollAll() at 09:00 local. State (startedAt, finishedAt, last-error, last-auto-archived list) is persisted to the app_state table so the dashboard renders correct values after a restart.
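One way a `persistentState()` proxy could work: hydrate from the K/V row on startup, then write the whole object back on every top-level assignment. The `KvStore` interface stands in for the app_state table and, like the signature, is an assumption; a real version would back `get`/`set` with SQLite statements.

```typescript
// Stand-in for the app_state K/V table (hypothetical interface).
interface KvStore {
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

// Hydrates `defaults` from the store, then writes the whole object back as JSON on every
// top-level property assignment. Nested mutations (obj.list.push(...)) are NOT persisted.
export function persistentState<T extends object>(store: KvStore, key: string, defaults: T): T {
  const raw = store.get(key);
  const initial = { ...defaults, ...(raw === undefined ? {} : (JSON.parse(raw) as Partial<T>)) } as T;
  return new Proxy(initial, {
    set(target, prop, value) {
      const ok = Reflect.set(target, prop, value);
      store.set(key, JSON.stringify(target)); // write-through on each assignment
      return ok;
    },
  });
}
```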

Discovery internals

runDiscovery() runs four sources in sequence, plus a post-step for parsing approved repos:

stetsenko-github — fetches AndrewStetsenko/tech-jobs-with-relocation/README.md. Parses the "Companies hiring internationally" table. Extracts ATS slugs from careers URLs via detectAts().
awesome-remote-job — fetches lukasz-madon/awesome-remote-job/README.md. Parses the "Companies with 'remote DNA'" section.
boolean-search — only runs if SearXNG is reachable at ${SEARXNG_URL}/healthz. Runs each query in config/discovery_queries.json against SearXNG (?format=json), extracts result URLs, runs detectAts().
repo-sources — queries GitHub Search API (/search/repositories?q=...&sort=stars) for each query in config/repo_discovery_queries.json. Filters: stars ≥ 50, pushed within the last 24 months. Inserts candidates into repo_sources with status pending. Reviewable via the Repo sources tab.
repo-parse (post-step) — for every repo_sources row with status='approved' whose parsed_at is null or older than 7 days, fetches its README, runs a generic Markdown-link extractor [name](url), runs detectAts() on each link, and inserts new companies into discoveries with source=github:OWNER/REPO.
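The repo-parse post-step pairs a Markdown link extractor with detectAts(). A sketch of both, assuming the common public board URL shapes (boards.greenhouse.io/SLUG, jobs.lever.co/SLUG, jobs.ashbyhq.com/SLUG, jobs.smartrecruiters.com/SLUG, apply.workable.com/SLUG, plus a couple of host variants). The host list and return shape are assumptions, not a copy of src/ats.ts.

```typescript
type Ats = "greenhouse" | "lever" | "ashby" | "smartrecruiters" | "workable";

// Assumed board hosts; the first path segment is treated as the company slug.
// Embed-style links (e.g. ...?for=SLUG) would need extra handling.
const BOARD_HOSTS: Array<[RegExp, Ats]> = [
  [/^(?:boards|job-boards)\.greenhouse\.io$/, "greenhouse"],
  [/^jobs\.lever\.co$/, "lever"],
  [/^jobs\.ashbyhq\.com$/, "ashby"],
  [/^(?:jobs|careers)\.smartrecruiters\.com$/, "smartrecruiters"],
  [/^apply\.workable\.com$/, "workable"],
];

function parseUrl(link: string): URL | null {
  try {
    return new URL(link);
  } catch {
    return null; // not an absolute URL
  }
}

export function detectAts(link: string): { ats: Ats; slug: string } | null {
  const url = parseUrl(link);
  if (!url) return null;
  const hit = BOARD_HOSTS.find(([re]) => re.test(url.hostname));
  const slug = url.pathname.split("/").filter(Boolean)[0];
  return hit && slug ? { ats: hit[1], slug } : null;
}

// Generic [name](url) extractor for README Markdown; the lookbehind skips ![image](url).
export function markdownLinks(md: string): Array<{ name: string; url: string }> {
  const links: Array<{ name: string; url: string }> = [];
  for (const m of md.matchAll(/(?<!!)\[([^\]]+)\]\((https?:\/\/[^)\s]+)\)/g)) {
    links.push({ name: m[1], url: m[2] });
  }
  return links;
}
```

Links that detectAts() does not recognise (company blogs, homepages) simply drop out, which is what keeps a generic extractor usable on arbitrary aggregator READMEs.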

Two-stage review: repo → companies. A repo is approved once (you trust the list) and from then on its contents flow automatically into the company discoveries queue.
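The two-stage review reduces to a small status machine plus two date checks. The transitions below are read off the MCP tool descriptions (approve and dismiss act on pending repos, unapprove moves an approved repo back to pending); the function names are hypothetical, and the field names follow GitHub's repository objects (`stargazers_count`, `pushed_at`).

```typescript
type RepoStatus = "pending" | "approved" | "dismissed";
type RepoAction = "approve" | "unapprove" | "dismiss";

// Allowed moves, as described by approve_/unapprove_/dismiss_repo_source.
const TRANSITIONS: Record<RepoAction, { from: RepoStatus; to: RepoStatus }> = {
  approve: { from: "pending", to: "approved" },
  unapprove: { from: "approved", to: "pending" },
  dismiss: { from: "pending", to: "dismissed" },
};

export function nextRepoStatus(current: RepoStatus, action: RepoAction): RepoStatus {
  const t = TRANSITIONS[action];
  if (current !== t.from) throw new Error(`cannot ${action} a ${current} repo`);
  return t.to;
}

// Stage 1 candidate filter: at least 50 stars and pushed within the last 24 months.
export function isCandidateRepo(repo: { stargazers_count: number; pushed_at: string }, now: Date = new Date()): boolean {
  const cutoff = new Date(now);
  cutoff.setMonth(cutoff.getMonth() - 24);
  return repo.stargazers_count >= 50 && new Date(repo.pushed_at).getTime() >= cutoff.getTime();
}

const WEEK_MS = 7 * 24 * 60 * 60 * 1000;

// Stage 2 selection: approved repos never parsed, or last parsed more than 7 days ago.
export function needsReparse(row: { status: RepoStatus; parsed_at: string | null }, now: Date = new Date()): boolean {
  if (row.status !== "approved") return false;
  return row.parsed_at === null || now.getTime() - new Date(row.parsed_at).getTime() > WEEK_MS;
}
```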

Weekly cron also calls runDiscovery() on Mondays at 09:30 local.

MCP tools

Tool	What it does
`list_jobs`	List active jobs by default. Filter by match, country, pipeline status, company, or archive.
`list_discoveries`	List auto-discovered companies. Filter by source or review status.
`list_repo_sources`	List GitHub aggregator repos. Defaults to non-dismissed repos.
`list_companies`	List tracked companies with archive and polling-health fields.
`get_company`	Look up one company by slug.
`approve_discovery` / `dismiss_discovery`	Review one pending company discovery.
`approve_all_discoveries`	Approve all pending discoveries; returns newly inserted company count.
`dismiss_all_discoveries`	Dismiss all pending discoveries.
`approve_repo_source`	Approve and parse one pending repo source.
`unapprove_repo_source`	Move one approved repo source back to pending.
`dismiss_repo_source`	Dismiss one pending repo source.
`set_application_status`	Set a manual application status; submitted jobs get the same follow-up behavior as the UI.
`mark_applied`	Legacy alias for `set_application_status`.
`set_outreach_status`	Set outreach status for a job.
`run_discovery`	Run the full discovery pipeline unless one is already running.
`discover_repo_sources`	Run only GitHub repo-source discovery.

The MCP server imports the same DB bootstrap path as Fastify, so migrations, seed data, and startup hygiene run consistently. Read tools query SQLite directly. Write tools call the shared helpers the Fastify routes use for discovery approval, application status, follow-up dates, outreach, and repo-source review, so MCP and the UI do not drift. Both processes open the same SQLite file: WAL mode lets reads proceed while a write is in progress, and SQLite's write lock serializes writes. There is no authentication, so only expose the server over local stdio.

DB schema and migrations

Schema lives in src/migrations/ as numbered files: .sql for pure DDL, .ts for backfills that need JS logic. src/migrate.ts lists the files in order, applies any whose id isn't already in the schema_migrations table, and records each successful apply. Every migration runs inside its own transaction.

Adding a column / index / data backfill means adding the next numbered file. Don't edit a frozen migration.
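A minimal sketch of the runner loop, written against a hypothetical `MigrationDb` interface so it runs without a database. With better-sqlite3, `transaction` would map to `db.transaction(fn)()`, which wraps `fn` in BEGIN/COMMIT and rolls back if it throws.

```typescript
// Hypothetical surface the runner needs from the database layer.
interface MigrationDb {
  appliedIds(): Set<string>;
  transaction(fn: () => void): void; // must roll back if fn throws
  exec(sql: string): void;
  recordApplied(id: string): void;
}

interface Migration {
  id: string; // file name, e.g. "0002-add-index.sql"
  up: (db: MigrationDb) => void; // runs the .sql text or the .ts backfill
}

// "0002-add-index.sql" -> 2, so ordering is numeric rather than lexical.
function sequence(id: string): number {
  const n = Number.parseInt(id, 10);
  if (Number.isNaN(n)) throw new Error(`migration id must start with a number: ${id}`);
  return n;
}

// Applies pending migrations in order and returns the ids it ran.
export function migrate(db: MigrationDb, migrations: Migration[]): string[] {
  const applied = db.appliedIds();
  const pending = migrations
    .filter((m) => !applied.has(m.id))
    .sort((a, b) => sequence(a.id) - sequence(b.id));
  for (const m of pending) {
    // The schema change and its bookkeeping row land together or not at all.
    db.transaction(() => {
      m.up(db);
      db.recordApplied(m.id);
    });
  }
  return pending.map((m) => m.id);
}
```

Because applied ids are recorded rather than re-derived from the schema, editing an already-applied file has no effect on existing databases, which is why the rule above is to add a new file instead.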

Core tables:

companies          (slug PK, name, ats, country, archived, name_resolved,
                    consecutive_failures, added_at)
jobs               (id PK, company_slug FK, title, location, country, url,
                    matched, match_reasons, notes, dedup_key, recruiter,
                    archived, seen_at)
discoveries        (id, name, slug, ats, country, source, sample_url, status,
                    found_at, UNIQUE(slug, ats))
repo_sources       (id, full_name UNIQUE, description, stars, pushed_at,
                    html_url, status, parsed_at, found_at)
job_descriptions   (job_id PK FK, body, captured_at)
app_state          (key PK, value, updated_at)
schema_migrations  (id PK, applied_at)

Constraint highlights: ats is checked against the five supported adapters. country is checked to be either empty or a two-letter uppercase code. status columns on discoveries and repo_sources are CHECK-constrained to their allowed enum values. Foreign keys are enabled at connection time (PRAGMA foreign_keys = ON).
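Flag emoji are pairs of regional-indicator symbols (U+1F1E6 to U+1F1FF, one per letter A to Z), so a flag decodes straight into the two-letter code the CHECK constraint expects. A sketch of that decoding plus a normalizer that only ever emits values the constraint accepts; the names are hypothetical, and the name-based hints in country.ts (COUNTRY_HINTS) are left out.

```typescript
const RI_BASE = 0x1f1e6; // REGIONAL INDICATOR SYMBOL LETTER A

// "🇩🇪" -> "DE"; anything that isn't exactly two regional indicators -> null.
export function flagToCountry(flag: string): string | null {
  const cps = Array.from(flag, (ch) => ch.codePointAt(0) ?? 0); // iterates by code point
  if (cps.length !== 2 || cps.some((cp) => cp < RI_BASE || cp > RI_BASE + 25)) return null;
  return String.fromCharCode(...cps.map((cp) => cp - RI_BASE + 65)); // 65 = "A"
}

// Returns either "" or two uppercase ASCII letters, matching the country CHECK constraint.
export function normalizeCountry(input: string): string {
  const trimmed = input.trim();
  const fromFlag = flagToCountry(trimmed);
  if (fromFlag) return fromFlag;
  const upper = trimmed.toUpperCase();
  return /^[A-Z]{2}$/.test(upper) ? upper : "";
}
```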

Supply chain hygiene

All direct deps come from known maintainers / official orgs:

Fastify family (fastify, @fastify/static, @fastify/middie): NearForm + Fastify core team.
Svelte family (svelte, @sveltejs/vite-plugin-svelte, svelte-check): Svelte core.
Vite: VoidZero.
better-sqlite3: WiseLibs.
node-cron: Lucas Merencia.
tsx: Hiroki Osame.
TypeScript, @tsconfig/svelte: Microsoft.
ESLint: ESLint team (OpenJS Foundation). globals: Sindre Sorhus.
Prettier, prettier-plugin-svelte: Prettier + Svelte orgs.
@modelcontextprotocol/sdk: Anthropic.

All direct deps are pinned to exact versions in package.json (no ^, no ~), so a malicious patch release can't be pulled in automatically. Run pnpm audit periodically. pnpm blocks postinstall scripts by default; the only one approved by hand is better-sqlite3's native compile.

Rate limits

Service	Limit	Approach
Greenhouse / Lever / Ashby / Workable	No documented public limit	5-concurrent per host via request pool. Polling cadence is daily. No throttle needed.
SmartRecruiters	5 req/sec/IP (documented)	3-concurrent (pool reduced from 5) to keep bursts small. A concurrency cap bounds requests in flight, not requests per second, so very fast responses could still briefly exceed 5/sec.
GitHub Search API (`/search/repositories`)	10 req/min unauthenticated (strict)	2-second throttle between queries. 7 queries × 2s = 14s total per discovery run, and 7 requests stay under the 10/min cap. Past 10 queries, raise the throttle to at least 6s (60s ÷ 10) so no minute sees more than 10 requests.
GitHub raw content (`raw.githubusercontent.com/...`)	5000 req/hr	~2–10 fetches per discovery cycle. Nothing.
SearXNG (local)	None — it's your container	`limiter: false` in config.
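A sequential throttle for the Search API queries, as a sketch with hypothetical names. The arithmetic behind the GitHub row: at a 2s gap, a run fits under the 10/min cap only while it has 10 or fewer queries; beyond that the gap needs to be at least 60s ÷ 10 = 6s.

```typescript
const sleep = (ms: number): Promise<void> => new Promise((resolve) => setTimeout(resolve, ms));

// Run queries strictly one after another, pausing `gapMs` between calls (not before the first).
export async function runThrottled<T>(
  queries: string[],
  gapMs: number,
  search: (query: string) => Promise<T>,
): Promise<T[]> {
  const results: T[] = [];
  for (let i = 0; i < queries.length; i++) {
    if (i > 0) await sleep(gapMs);
    results.push(await search(queries[i]));
  }
  return results;
}

// Gap that spreads requests out to at most `perMinute` per minute on average.
export function minGapMs(perMinute: number): number {
  return Math.ceil(60_000 / perMinute);
}

export const GITHUB_SEARCH_GAP_MS = 2_000; // the current throttle from the table above
```

Running the queries sequentially (rather than through a pool) keeps the request count per minute predictable, which matters more than wall-clock time for a weekly job.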

If a 429 hits any ATS, the pollAll() loop catches per-company errors and continues with the rest. The failed company gets retried on the next poll cycle; if it 404s three polls in a row it auto-archives.
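The failure counter as a pure function, sketched with hypothetical names. Only the 404 and success behavior comes from this README; leaving the counter unchanged on 429s and network errors is an assumption consistent with simply retrying those companies on the next cycle.

```typescript
interface CompanyHealth {
  slug: string;
  consecutive_failures: number;
  archived: boolean;
}

// What one poll of one company produced (hypothetical shape).
type PollOutcome =
  | { kind: "ok" }
  | { kind: "http"; status: number }
  | { kind: "network"; message: string };

const ARCHIVE_AFTER = 3;

export function applyOutcome(c: CompanyHealth, outcome: PollOutcome): CompanyHealth {
  // Success clears the streak.
  if (outcome.kind === "ok") return { ...c, consecutive_failures: 0 };
  // A 404 extends the streak; the third in a row archives the company.
  if (outcome.kind === "http" && outcome.status === 404) {
    const failures = c.consecutive_failures + 1;
    return { ...c, consecutive_failures: failures, archived: c.archived || failures >= ARCHIVE_AFTER };
  }
  // Assumption: 429s, 5xx and network errors neither reset nor extend the streak.
  return c;
}
```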

Cron schedule

Job	When (local time)	What it does
Poll	Daily at 09:00	Fetches new postings from every tracked company
Discovery	Mondays at 09:30	Runs all four discovery sources + re-parses approved GitHub repos

Both also run once on server startup so a fresh launch gets immediate data. The Dashboard shows last-run timestamps and next-fire countdowns at the top, refreshed every 30 seconds, via GET /api/system/status.
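The two schedules written as cron expressions, plus a next-fire helper of the kind a countdown needs. The expressions use standard five-field cron syntax (minute, hour, day of month, month, day of week), which node-cron accepts; the constant and function names are hypothetical, and a countdown is just `nextFire(...).getTime() - Date.now()`.

```typescript
export const POLL_CRON = "0 9 * * *"; // daily at 09:00 local
export const DISCOVERY_CRON = "30 9 * * 1"; // Mondays at 09:30 local

// Next local-time occurrence strictly after `from` of hour:minute,
// optionally restricted to one weekday (0 = Sunday, 1 = Monday, ...).
export function nextFire(from: Date, hour: number, minute: number, weekday?: number): Date {
  const next = new Date(from);
  next.setHours(hour, minute, 0, 0);
  if (next.getTime() <= from.getTime()) next.setDate(next.getDate() + 1);
  while (weekday !== undefined && next.getDay() !== weekday) next.setDate(next.getDate() + 1);
  return next;
}
```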

Troubleshooting

Symptom	Likely cause	Fix
`Discovery (boolean-search): SearXNG not reachable`	Container not running	`docker compose up -d searxng`
`EADDRINUSE: address already in use 127.0.0.1:3030`	Previous instance still running	`lsof -ti:3030 | xargs kill -9`
`Could not locate the bindings file` (better-sqlite3)	Native build skipped by pnpm	`cd node_modules/.pnpm/better-sqlite3@*/node_modules/better-sqlite3 && npm run install`

License

MIT. See LICENSE.
