A lightweight macOS and Windows menu bar app for AI-powered dictation. Press a hotkey anywhere on your system, speak, and polished text appears at your cursor — powered by cloud APIs at zero cost via free tiers.
Built as an open-source alternative to tools like Wispr Flow.
Pre-built installers are available on the Releases page:
| Platform | File |
|---|---|
| macOS (Apple Silicon) | VoiceFlow-x.x.x-arm64.dmg |
| Windows 10/11 (x64) | VoiceFlowSetup-x.x.x-x64.exe |
macOS (first launch): VoiceFlow is not notarized by Apple, so macOS will block it on first open. Right-click VoiceFlow.app → Open → Open to bypass the Gatekeeper warning. You only need to do this once.
| Platform | Status |
|---|---|
| macOS 13+ (Apple Silicon) | Supported |
| Windows 10/11 (x64) | Supported |
VoiceFlow uses external APIs for transcription and text cleanup. All have free tiers sufficient for personal use.

Transcription:

| Provider | Model | Free Tier |
|---|---|---|
| Groq (default) | Whisper Large v3 Turbo | 2 hours/day |
| Deepgram | Nova-3 | $200 credit |
| AssemblyAI | Universal-2 | $50 credit |

Text cleanup:

| Provider | Model | Free Tier |
|---|---|---|
| Gemini (default) | 2.5 Flash-Lite | 1,500 req/day |
| Groq | Llama 3.3 70B | 14,400 req/day |
| OpenAI | GPT-4o-mini | Paid only |
| Ollama | Any local model | Free (local) |
macOS only: Accessibility and Input Monitoring permissions (prompted on first launch)
Windows only: Microsoft Edge WebView2 Runtime (usually pre-installed on Windows 11)
git clone https://github.com/neonunez/voiceflow.git
cd voiceflow
macOS:
python3.11 -m venv .venv
.venv/bin/pip install -r requirements.txt -r requirements-macos.txt
Windows:
py -3.11 -m venv .venv
.venv\Scripts\pip install -r requirements.txt -r requirements-windows.txt
cd ui
npm install
npm run build
cd ..
Copy the example config and fill in your keys:
cp config.example.json config.json
Open config.json and add your API keys for the providers you want to use. At minimum you need one transcription key (Groq recommended) and one cleanup key (Gemini recommended):
{
"groq_api_key": "your-groq-key-here",
"gemini_api_key": "your-gemini-key-here"
}
All other settings can be left at their defaults and changed later from the settings window.
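As a quick sanity check before launching, the minimum-key rule above can be sketched in Python. This is an illustration only, not VoiceFlow's own validation: the `KEY_FOR_PROVIDER` mapping and the `missing_keys` helper are hypothetical names, and the assumption that Ollama needs no key comes from it being a local provider.

```python
# Assumed mapping from provider name to the config key that holds its API key.
# Ollama runs locally, so it is assumed to need no key (None).
KEY_FOR_PROVIDER = {
    "groq": "groq_api_key",
    "deepgram": "deepgram_api_key",
    "assemblyai": "assemblyai_api_key",
    "gemini": "gemini_api_key",
    "openai": "openai_api_key",
    "ollama": None,
}


def missing_keys(config: dict) -> list[str]:
    """Return the API-key settings that the selected providers need but that are empty."""
    missing: list[str] = []
    for setting, default in (("transcription_provider", "groq"),
                             ("cleanup_provider", "gemini")):
        provider = config.get(setting, default)
        # Unknown providers map to None and are skipped rather than guessed at.
        key_name = KEY_FOR_PROVIDER.get(provider)
        if key_name is None or key_name in missing:
            continue
        if not str(config.get(key_name, "")).strip():
            missing.append(key_name)
    return missing
```

Calling `missing_keys(json.load(open("config.json")))` would then list any key that still needs filling in; an empty list means the selected providers are covered.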
macOS:
./scripts/dev-start.sh
Windows:
scripts\dev-start.bat
The app launches as a menu bar icon. Click it and select Open VoiceFlow to open the settings window.
The default hotkeys on each platform:
| Mode | macOS | Windows |
|---|---|---|
| Hold to record | Cmd+Shift+Space | Ctrl+Win |
| Toggle record | Cmd+Shift+D | Alt+Win |
Both hotkeys are fully remappable from the Settings tab.
macOS: Grant Accessibility and Input Monitoring permissions when prompted — these are required for global hotkey detection.
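The difference between the two recording modes can be pictured as a small state machine. The sketch below is a guess at the behaviour the mode names imply, not the app's actual hotkey code; `RecordingState`, `on_press`, and `on_release` are hypothetical names.

```python
class RecordingState:
    """Tracks whether recording is active for a given mode ("hold" or "toggle")."""

    def __init__(self, mode: str = "hold") -> None:
        if mode not in ("hold", "toggle"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.mode = mode
        self.recording = False
        self._down = False  # True while the hotkey is physically held

    def on_press(self) -> bool:
        if self._down:
            # Ignore OS key-repeat events while the key stays down.
            return self.recording
        self._down = True
        if self.mode == "hold":
            self.recording = True                 # record for as long as the key is held
        else:
            self.recording = not self.recording   # each new press flips the state
        return self.recording

    def on_release(self) -> bool:
        self._down = False
        if self.mode == "hold":
            self.recording = False                # releasing the key ends the recording
        return self.recording
```

In hold mode a press/release pair brackets one recording; in toggle mode the first press starts recording and the next press stops it, with releases ignored.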
Building the macOS app requires PyInstaller:
.venv/bin/pip install pyinstaller
Build the .app:
./scripts/bundle.sh
# output: dist/VoiceFlow.app
Package as a .dmg (requires dmgbuild and librsvg):
pip install dmgbuild
brew install librsvg
./scripts/package-dmg.sh
# output: dist/VoiceFlow-x.x.x-arm64.dmg
Building the Windows installer requires PyInstaller and Inno Setup:
.venv\Scripts\pip install pyinstaller
scripts\bundle-windows.bat
# output: dist\VoiceFlow\VoiceFlow.exe
scripts\package-installer.bat
# output: dist\VoiceFlow-Setup.exe
voiceflow/
├── main.py # Entry point, app lifecycle, pipeline orchestration
├── recorder.py # Audio capture (sounddevice, 16kHz mono)
├── transcriber.py # Transcription API abstraction
├── cleaner.py # Cleanup provider abstraction
├── injector.py # Text injection (clipboard paste)
├── dictionary.py # Custom vocabulary
├── history.py # SQLite transcription history
├── config.py # Config file read/write
├── tray.py # System tray icon
├── overlay.py # Floating waveform overlay
├── media.py # Audio mute/restore + sound cues
├── bridge.py # Python ↔ frontend JS bridge
├── login_item.py # Launch at login
├── platform/ # Platform-specific implementations (macOS, Windows)
└── ui/
    ├── src/ # Svelte frontend source
    └── dist/ # Built frontend (generated — not committed)
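The layout above suggests a record → transcribe → clean → inject flow with swappable providers. A minimal sketch of that shape follows; the `Transcriber` and `Cleaner` protocols, `run_pipeline`, and the stand-in classes are all hypothetical and are not taken from `main.py`, `transcriber.py`, or `cleaner.py`.

```python
from typing import Callable, Protocol


class Transcriber(Protocol):
    def transcribe(self, audio: bytes) -> str: ...


class Cleaner(Protocol):
    def clean(self, raw_text: str) -> str: ...


def run_pipeline(audio: bytes, transcriber: Transcriber, cleaner: Cleaner,
                 inject: Callable[[str], None]) -> str:
    """Transcribe audio, clean the text, and hand the result to the injector."""
    raw = transcriber.transcribe(audio)
    if not raw.strip():
        return ""  # nothing was said; skip cleanup and injection
    text = cleaner.clean(raw)
    inject(text)
    return text


# Stand-ins for real providers, used only to show the flow end to end.
class EchoTranscriber:
    def transcribe(self, audio: bytes) -> str:
        return audio.decode("utf-8")


class TidyCleaner:
    def clean(self, raw_text: str) -> str:
        text = " ".join(raw_text.split())    # collapse runs of whitespace
        return text[:1].upper() + text[1:]   # capitalise the first letter


pasted: list[str] = []
result = run_pipeline(b"  hello   world ", EchoTranscriber(), TidyCleaner(), pasted.append)
```

Because the pipeline depends only on the two protocols, swapping Groq for Deepgram (or Gemini for Ollama) would mean passing a different object rather than changing the orchestration.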
All settings live in config.json (auto-created from defaults on first run if not present):
| Key | Default | Description |
|---|---|---|
| transcription_provider | "groq" | groq, deepgram, or assemblyai |
| cleanup_provider | "gemini" | gemini, groq, openai, or ollama |
| groq_api_key | "" | Groq API key |
| deepgram_api_key | "" | Deepgram API key |
| assemblyai_api_key | "" | AssemblyAI API key |
| gemini_api_key | "" | Gemini API key |
| openai_api_key | "" | OpenAI API key |
| ollama_model | "qwen2.5:3b" | Ollama model name (when cleanup provider is ollama) |
| hotkey_mode | "hold" | hold or toggle |
| auto_mute_audio | true | Mute system audio during recording |
| sound_feedback | true | Play start/stop sound cues |
| launch_at_login | false | Start VoiceFlow at system login |
| history_retention_days | 7 | Days to keep transcription history (0 = forever) |
| developer_mode | false | Coding-aware cleanup mode |
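
The "auto-created from defaults" behaviour could look roughly like the sketch below. The `DEFAULTS` values are copied from the table; the `load_config` helper and the choice to merge a partial file over the defaults are assumptions, not a description of `config.py`.

```python
import json
import tempfile
from pathlib import Path

# Default values as listed in the table above.
DEFAULTS = {
    "transcription_provider": "groq",
    "cleanup_provider": "gemini",
    "groq_api_key": "",
    "deepgram_api_key": "",
    "assemblyai_api_key": "",
    "gemini_api_key": "",
    "openai_api_key": "",
    "ollama_model": "qwen2.5:3b",
    "hotkey_mode": "hold",
    "auto_mute_audio": True,
    "sound_feedback": True,
    "launch_at_login": False,
    "history_retention_days": 7,
    "developer_mode": False,
}


def load_config(path: Path) -> dict:
    """Load a config file, creating it from DEFAULTS if it does not exist."""
    if not path.exists():
        path.write_text(json.dumps(DEFAULTS, indent=2), encoding="utf-8")
        return dict(DEFAULTS)
    stored = json.loads(path.read_text(encoding="utf-8"))
    # Assumption: keys missing from the file fall back to their defaults.
    return {**DEFAULTS, **stored}


# Usage against a throwaway directory so no real config.json is touched.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "config.json"
    first = load_config(cfg_path)    # file is created from DEFAULTS
    cfg_path.write_text('{"hotkey_mode": "toggle"}', encoding="utf-8")
    second = load_config(cfg_path)   # partial file merged over DEFAULTS
```

Merging over the defaults means a hand-edited file only needs the keys that differ, at the cost of silently accepting misspelled keys.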
MIT — see LICENSE