A comprehensive baseball statistics platform built with Go, Svelte, and Flutter, serving data from the Lahman Baseball Database and Retrosheet.
task build
task server:start
The API will be available at http://localhost:8080, with interactive documentation at http://localhost:8080/docs/.
task build
task build:etl
./tmp/baseball --help
./tmp/baseball-etl --help
For the full non-optional loading contract, see data-loading.md.
Dataset root resolution for etl and db commands:
--data-root (command-line flag)
BASEBALL_DATA_ROOT (environment variable)
data (default)
Data is local-first: keep source files under the resolved data root (data by default).
When Retrosheet files are missing for a target window, fetch them with ETL commands before running a load.
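The data root resolution described above can be mirrored in a wrapper script. The sketch below is only an illustration: the function name resolve_data_root is hypothetical, and the precedence it encodes (flag value first, then BASEBALL_DATA_ROOT, then data) is an assumption based on the order in which the three sources are listed, not confirmed behavior of the CLI.

```shell
# Hypothetical helper: returns the data root a wrapper script should use.
# Assumed precedence: explicit value > BASEBALL_DATA_ROOT > "data".
resolve_data_root() {
  flag_value="${1:-}"
  if [ -n "$flag_value" ]; then
    printf '%s\n' "$flag_value"
  elif [ -n "${BASEBALL_DATA_ROOT:-}" ]; then
    printf '%s\n' "$BASEBALL_DATA_ROOT"
  else
    printf '%s\n' "data"
  fi
}

# With no argument, falls back to the environment variable or the default.
resolve_data_root
# An explicit value (as you would pass to --data-root) takes priority.
resolve_data_root /mnt/baseball-data
```

A script could then pass the result back to the CLI, for example with --data-root="$(resolve_data_root)".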
Quick local example for a complete representative slice:
cp conf/conf.example.toml conf.toml
./tmp/baseball db recreate --config conf.toml
./tmp/baseball db migrate --config conf.toml
./tmp/baseball-etl worker
./tmp/baseball-etl run --profile=dev
./tmp/baseball-etl maintenance --profile=dev --mv-refresh-mode=auto
./tmp/baseball-etl validate --profile=dev
./tmp/baseball-etl status
Fetch examples (when Retrosheet files for your window are not already present):
./tmp/baseball-etl fetch retrosheet --years=2023-2025
./tmp/baseball-etl fetch negroleagues
./tmp/baseball-etl fetch chadwick --force
./tmp/baseball-etl cleanup retrosheet --dry-run
For large Retrosheet slices, keep migration and recomputation separate, and process in bounded year windows:
./tmp/baseball db migrate --config conf.toml
./tmp/baseball-etl run --profile=dev --years=2023-2025
./tmp/baseball db refresh-views player_game_batting_stats player_game_pitching_stats player_game_fielding_stats team_game_stats
./tmp/baseball db refresh-views no_hitters cycles multi_hr_games triple_plays extra_inning_games
./tmp/baseball db refresh-views season_batting_leaders season_pitching_leaders career_batting_leaders career_pitching_leaders
./tmp/baseball db refresh-views player_id_map team_franchise_map park_map
db migrate is structural/idempotent; treat materialized view refresh as an explicit incremental operation.
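If you refresh these view groups often, the four commands above could be wrapped in a small script. This is a sketch, not part of the project: BASEBALL_BIN, DRY_RUN, and refresh_groups are hypothetical names, and the group order simply copies the order of the commands above. It defaults to a dry run that only prints the commands.

```shell
# Hypothetical wrapper around "db refresh-views"; prints commands unless DRY_RUN=0.
BASEBALL_BIN="${BASEBALL_BIN:-./tmp/baseball}"
DRY_RUN="${DRY_RUN:-1}"

refresh_groups() {
  # One group of views per line, refreshed one group at a time.
  while read -r group; do
    [ -n "$group" ] || continue
    if [ "$DRY_RUN" = "1" ]; then
      echo "$BASEBALL_BIN db refresh-views $group"
    else
      # $group is intentionally unquoted so each view name becomes its own argument.
      # stdin is redirected so the command cannot consume the remaining groups.
      "$BASEBALL_BIN" db refresh-views $group </dev/null || return 1
    fi
  done <<'EOF'
player_game_batting_stats player_game_pitching_stats player_game_fielding_stats team_game_stats
no_hitters cycles multi_hr_games triple_plays extra_inning_games
season_batting_leaders season_pitching_leaders career_batting_leaders career_pitching_leaders
player_id_map team_franchise_map park_map
EOF
}

refresh_groups
```

Set DRY_RUN=0 to execute the commands; the loop stops at the first group that fails.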
./tmp/baseball-etl worker is the long-running queue consumer.
./tmp/baseball-etl run is the canonical enqueue entrypoint for extract/load.
./tmp/baseball-etl maintenance is the canonical enqueue entrypoint for MV recompute + serving sync.
Treat ETL as a batched worker flow on shared VMs: prefer scoped --years runs over unbounded full-history jobs unless you are operating a larger host.
run is enqueue-first by default; use --enqueue-only=false only when you explicitly want one command to enqueue + drain locally.
maintenance enqueues and drains by default; use --enqueue-only=true to enqueue without draining.
Queue operator commands:
./tmp/baseball-etl jobs ls --status queued,running,retry_wait --limit 100
./tmp/baseball-etl jobs clear --reason "recover stale running jobs"
For exhaustive production-style ingestion:
./tmp/baseball-etl run --profile=prod --mode=full
./tmp/baseball-etl maintenance --profile=prod --mv-refresh-mode=auto
./tmp/baseball-etl validate --profile=prod
Retrosheet --era values: fed, nlg, boomer, pitcher, turf, steroid, moneyball, statcast, modern.
The full ETL command also accepts --years and --era to customize the Retrosheet window.
It also accepts --data-root when data is mounted outside the default location.
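A wrapper script can check an era name against the list above before building a command. The sketch below assumes nothing about how the CLI itself validates --era; is_known_era is a hypothetical helper, and the command is only printed, not executed.

```shell
# Hypothetical helper: succeeds only for the --era values listed in this README.
is_known_era() {
  case "$1" in
    fed|nlg|boomer|pitcher|turf|steroid|moneyball|statcast|modern) return 0 ;;
    *) return 1 ;;
  esac
}

era="statcast"
if is_known_era "$era"; then
  # Print the command instead of running it, so the sketch is safe to try.
  echo "./tmp/baseball-etl run --profile=prod --mode=full --era=$era"
else
  echo "unknown era: $era" >&2
fi
```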
Batched worker pattern (recommended for VM safety):
./tmp/baseball-etl fetch retrosheet --years=2022-2023
./tmp/baseball-etl run --profile=prod --years=2022-2023
./tmp/baseball-etl maintenance --profile=prod --years=2022-2023 --mv-refresh-mode=auto
./tmp/baseball-etl validate --profile=prod --years=2022-2023
./tmp/baseball-etl fetch retrosheet --years=2024-2025
./tmp/baseball-etl run --profile=prod --years=2024-2025
./tmp/baseball-etl maintenance --profile=prod --years=2024-2025 --mv-refresh-mode=auto
./tmp/baseball-etl validate --profile=prod --years=2024-2025
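The batched pattern above repeats the same four steps per year window, so it lends itself to a loop. The following is a minimal sketch under stated assumptions: ETL_BIN, DRY_RUN, step, process_window, and process_all are hypothetical names, and the script assumes a worker is already running to drain whatever run enqueues. It defaults to a dry run that prints each command.

```shell
# Hypothetical batch driver; prints commands unless DRY_RUN=0.
ETL_BIN="${ETL_BIN:-./tmp/baseball-etl}"
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
step() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Fetch, load, recompute, and validate one bounded year window (e.g. 2022-2023).
process_window() {
  step "$ETL_BIN" fetch retrosheet --years="$1" &&
  step "$ETL_BIN" run --profile=prod --years="$1" &&
  step "$ETL_BIN" maintenance --profile=prod --years="$1" --mv-refresh-mode=auto &&
  step "$ETL_BIN" validate --profile=prod --years="$1"
}

# Process windows in order and stop at the first failing window.
process_all() {
  for w in "$@"; do
    process_window "$w" || return 1
  done
}

process_all 2022-2023 2024-2025
```

Keeping each window small bounds the work queued at any one time, which is the point of the VM-safety recommendation above.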
# Start the HTTP API (pass --debug to disable rate limiting locally)
./tmp/baseball server start --config conf.toml
# Smoke-test endpoints with formatted output
./tmp/baseball server fetch 'search/games?q=dodgers%202024'
# Check readiness
./tmp/baseball server health
Every command accepts --config to point at a custom conf.toml, inherits rate limits from your server configuration, and prints structured output.
Think of baseball server fetch as a built-in curl for API paths. It:
accepts relative paths (players?name=ruth) and automatically targets /v1
supports --raw when you need plain JSON for jq
The REST API lives at /v1 (or the host/port defined in conf.toml), covering players, teams, stats, games, events, pitches, search, and metadata.
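To make the /v1 targeting concrete, here is a rough shell equivalent of the path handling for use with plain curl. It is a sketch, not the CLI's actual logic: api_url and BASE_URL are hypothetical names, the base URL is the local default mentioned earlier, and leaving an existing v1/ prefix untouched is an assumption.

```shell
# Hypothetical helper: turn a relative API path into a full /v1 URL.
BASE_URL="${BASE_URL:-http://localhost:8080}"

api_url() {
  path="${1#/}"              # drop one leading slash, if present
  case "$path" in
    v1/*) ;;                 # already versioned: leave as-is (assumed behavior)
    *) path="v1/$path" ;;    # otherwise target /v1
  esac
  printf '%s/%s\n' "$BASE_URL" "$path"
}

api_url 'players?name=ruth'
# With a running server, the result can be piped through jq:
#   curl -fsS "$(api_url 'players?name=ruth')" | jq .
```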
Query individual pitches derived from Retrosheet sequences, including count state and game context. See Pitch-Level API docs and Pitch sequencing internals.
Derived endpoints provide streak detection, player splits, run differential windows, game win probability curves, and win expectancy lookups. See Derived & Advanced docs.
Search supports natural language parsing for games (teams, season, postseason context, aliases). See Search docs.
/docs (or /docs/{slug})
/explorer
Recommended docs entry points:
task swagger:generate
task --list
This project uses data from: