A fully offline, voice-activated AI assistant with a British personality. You talk, it listens, it responds. No cloud, no API keys, no internet required.
It started as a Whisper experiment. It got out of hand.
SPACE to talk, release to process.

Everything runs on your machine. No subscriptions. No data leaving your laptop.
I wanted to learn Whisper. That was it. That was the whole plan.
Then I got curious about TTS voices. Then local LLMs. Then I thought it'd be funny to make it sound like Jarvis from Iron Man. Then I figured that since I'd already spent this long on it, I may as well give it a proper UI. (The UI was all Claude.)
It's not trying to be anything groundbreaking. It's a personal project I built to use in places without reliable Wi-Fi — libraries, cafés, airports — when I just want to ask something quickly and get a sensible answer back.
Three TTS libraries. Days of environment conflicts. Zero working voices I actually liked.
Eventually switched to Edge TTS. Took ten minutes. Sounded great. Lesson learned.
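For context on why it took ten minutes: the edge-tts package ships a CLI, so generating speech is one subprocess call. A minimal sketch of driving it from Python — the voice name and output path here are illustrative, not necessarily what this project uses (`edge-tts --list-voices` prints the full catalogue):

```python
import subprocess

def build_tts_command(text: str, voice: str = "en-GB-RyanNeural",
                      out_path: str = "reply.mp3") -> list[str]:
    """Build the edge-tts CLI invocation for one line of speech.

    Voice and output path are example values, not the project's
    actual configuration.
    """
    return [
        "edge-tts",
        "--voice", voice,
        "--text", text,
        "--write-media", out_path,
    ]

# Actually speaking is then a single call:
# subprocess.run(build_tts_command("Good evening, sir."), check=True)
```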
System dependencies

Python

```bash
pip install -r requirements.txt
```
```bash
# 1. Pull the model
ollama pull mistral:7b

# 2. Install Python deps
pip install -r requirements.txt

# 3. Install frontend deps
cd jarvis-ui && npm install
```
Two terminals. Both need to be open.
```bash
# Terminal 1
python jarvis_server.py

# Terminal 2
cd jarvis-ui && npm run tauri dev
```
Hold SPACE to speak. Release to process. ESC to quit.
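The hold-to-record interaction boils down to a three-state machine: idle, recording while SPACE is held, processing after release. A sketch of that pattern — the class and state names are mine for illustration, not the server's actual code:

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()        # waiting for SPACE
    RECORDING = auto()   # SPACE held, mic open
    PROCESSING = auto()  # SPACE released, pipeline running

class PushToTalk:
    """Tiny state machine for the hold-to-record interaction."""

    def __init__(self) -> None:
        self.state = State.IDLE

    def key_down(self) -> None:
        # Only start recording from idle; repeats while held are ignored.
        if self.state is State.IDLE:
            self.state = State.RECORDING

    def key_up(self) -> None:
        # Releasing SPACE hands the captured audio to Whisper.
        if self.state is State.RECORDING:
            self.state = State.PROCESSING

    def done(self) -> None:
        # Response has been spoken; ready for the next question.
        self.state = State.IDLE
```

Keeping the transitions explicit like this means a SPACE press during processing does nothing, rather than clobbering the in-flight request.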
First run of `npm run tauri dev` takes 5–10 minutes while Rust compiles. Every run after that is fast.
SPACE held → mic records via sounddevice
SPACE released → Whisper transcribes (low-confidence audio gets discarded)
→ Mistral generates a short response via Ollama
→ Edge TTS speaks it back
→ WebSocket pushes state + transcript to the UI in real time
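The loop above can be sketched in a few functions. Everything here is an assumption for illustration — the confidence thresholds, helper names, and filtering heuristic are not the project's actual code; only the Ollama endpoint (`POST /api/generate` on port 11434) is the library's documented default:

```python
import json
import platform
import subprocess
import urllib.request

def is_confident(avg_logprob: float, no_speech_prob: float,
                 logprob_floor: float = -1.0,
                 speech_ceiling: float = 0.6) -> bool:
    """Heuristic for discarding low-confidence Whisper segments,
    using the per-segment stats Whisper reports. Thresholds are
    illustrative, not the project's actual values."""
    return avg_logprob > logprob_floor and no_speech_prob < speech_ceiling

def ask_ollama(prompt: str, model: str = "mistral:7b") -> str:
    """Send a prompt to the local Ollama HTTP API and return the reply."""
    body = json.dumps({"model": model, "prompt": prompt,
                       "stream": False}).encode()
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=body, headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

def pick_player(system: str) -> str:
    """afplay ships with macOS; mpg123 is the usual Linux stand-in."""
    return "afplay" if system == "Darwin" else "mpg123"

def speak(mp3_path: str) -> None:
    """Play the Edge TTS output with the platform's CLI player."""
    subprocess.run([pick_player(platform.system()), mp3_path], check=True)
```

The WebSocket push to the UI sits alongside this, broadcasting state changes and transcripts as each stage completes.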
Uses `afplay` for audio playback on macOS; swap it for `mpg123` on Linux.

MIT. Take it, break it, improve it.