Self-hosted AI model server. Single binary, no cloud, no API keys. Run HuggingFace models locally with a web chat UI and an OpenAI-compatible API.
turbolab manages two inference backends depending on the model format: llama-server for GGUF models, and turboquant for everything else (both installed by `turbolab setup`).

When picking a GGUF file, turbolab reads available memory from /proc/meminfo, filters out files that won't fit (with 1GB headroom), then ranks the remainder by quant type: Q4_0 first for raw CPU throughput on AVX2, Q4_K_M second for quality. If everything is filtered out, it falls back to the smallest file. The thread count propagates to OMP, MKL, and OpenBLAS simultaneously so you're not leaving cores on the table.
If the inference process crashes, turbolab restarts it automatically. Three consecutive fast crashes (under 5s uptime) and it gives up and logs the failure rather than looping forever.
Other features: an OpenAI-compatible `/v1/` API that works with any client supporting it, `/api/status` and `/api/events` endpoints, self-update via `turbolab update`, and one-command install via `turbolab setup`.

Download the latest binary for your platform from Releases:
| Platform | Binary |
|---|---|
| Linux x86_64 | turbolab_linux_amd64 |
| Linux ARM64 | turbolab_linux_arm64 |
```sh
chmod +x turbolab_linux_amd64
sudo mv turbolab_linux_amd64 /usr/local/bin/turbolab
```
```sh
# First-time setup — installs turboquant into a venv, downloads llama-server
# Requires python3 on PATH
turbolab setup

# Start the server (default port 7860)
turbolab serve
# Open http://localhost:7860
```
```sh
turbolab models search <query>   # Search HuggingFace
turbolab update                  # Self-update to latest release
turbolab serve --port 8080       # Custom port
turbolab serve --bits 8          # Higher precision (default 4)
turbolab serve --no-cpu-only     # Enable GPU layers
```
Only python3 is required on PATH (`turbolab setup` handles the rest).

Not network-safe: there is no authentication or security middleware. Designed for trusted networks (homelab/localhost) only.
```sh
git clone https://github.com/usr-wwelsh/turbolab
cd turbolab
make build
```
Requires Go 1.23+ and Node 20+.
MIT — usr-wwelsh