VoiceFlow

A lightweight macOS and Windows menu bar app for AI-powered dictation. Press a hotkey anywhere on your system, speak, and polished text appears at your cursor — powered by cloud APIs at zero cost via free tiers.

Built as an open-source alternative to tools like Wispr Flow.


Download

Pre-built installers are available on the Releases page:

| Platform | File |
|---|---|
| macOS (Apple Silicon) | VoiceFlow-x.x.x-arm64.dmg |
| Windows 10/11 (x64) | VoiceFlowSetup-x.x.x-x64.exe |

macOS, first launch: VoiceFlow is not notarized by Apple, so macOS blocks it the first time you open it. Right-click VoiceFlow.app, choose Open, then click Open in the dialog to bypass the Gatekeeper warning. You only need to do this once.


Features

  • Hold-to-record or toggle mode — two hotkey styles, fully configurable
  • Works in any app — system-level text injection via clipboard paste
  • LLM text cleanup — fixes transcription errors, adds punctuation, never rephrases
  • Switchable providers — choose your transcription and cleanup API independently
  • Custom dictionary — teach VoiceFlow your technical vocabulary
  • Transcription history — searchable log with copy and delete
  • Floating waveform overlay — visual feedback during recording
  • Developer mode — coding-aware cleanup that preserves casing, symbols, and @file references
  • Minimal footprint — ~35–50 MB idle, ~0% CPU

Platform Support

| Platform | Status |
|---|---|
| macOS 13+ (Apple Silicon) | Supported |
| Windows 10/11 (x64) | Supported |

API Providers

VoiceFlow uses external APIs for transcription and text cleanup. Every provider except OpenAI (paid only) has a free tier or free credit sufficient for personal use, and Ollama runs locally at no cost.

Transcription (pick one)

| Provider | Model | Free Tier |
|---|---|---|
| Groq (default) | Whisper Large v3 Turbo | 2 hours/day |
| Deepgram | Nova-3 | $200 credit |
| AssemblyAI | Universal-2 | $50 credit |

Cleanup (pick one)

| Provider | Model | Free Tier |
|---|---|---|
| Gemini (default) | 2.5 Flash-Lite | 1,500 req/day |
| Groq | Llama 3.3 70B | 14,400 req/day |
| OpenAI | GPT-4o-mini | Paid only |
| Ollama | Any local model | Free (local) |

Requirements

  • Python 3.11+
  • Node.js 18+ (build-time only — for the frontend)
  • API keys for your chosen providers (see above)

macOS only: Accessibility and Input Monitoring permissions (prompted on first launch)

Windows only: Microsoft Edge WebView2 Runtime (usually pre-installed on Windows 11)


Setup

1. Clone the repo

git clone https://github.com/neonunez/voiceflow.git
cd voiceflow

2. Create the Python virtual environment

macOS:

python3.11 -m venv .venv
.venv/bin/pip install -r requirements.txt -r requirements-macos.txt

Windows:

py -3.11 -m venv .venv
.venv\Scripts\pip install -r requirements.txt -r requirements-windows.txt

3. Build the frontend

cd ui
npm install
npm run build
cd ..

4. Configure your API keys

Copy the example config and fill in your keys:

cp config.example.json config.json

Open config.json and add your API keys for the providers you want to use. At minimum you need one transcription key (Groq recommended) and one cleanup key (Gemini recommended):

{
  "groq_api_key": "your-groq-key-here",
  "gemini_api_key": "your-gemini-key-here"
}

All other settings can be left at their defaults and changed later from the settings window.
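Before launching, it can help to confirm that the keys needed by the selected providers are actually filled in. The following is a minimal sketch, not VoiceFlow's own code: `missing_keys` is a hypothetical helper, and the provider-to-key mapping is inferred from the key names in the Configuration Reference. It assumes Ollama needs no key because it runs locally.

```python
import json
from pathlib import Path

# Assumed mapping from provider name to the config key that holds its API key,
# inferred from the key names listed in the Configuration Reference.
KEY_FOR_PROVIDER = {
    "groq": "groq_api_key",
    "deepgram": "deepgram_api_key",
    "assemblyai": "assemblyai_api_key",
    "gemini": "gemini_api_key",
    "openai": "openai_api_key",
    "ollama": None,  # local model, assumed to need no API key
}


def missing_keys(config: dict) -> list[str]:
    """Return the config keys the selected providers need but that are empty."""
    providers = [
        config.get("transcription_provider", "groq"),  # documented default
        config.get("cleanup_provider", "gemini"),      # documented default
    ]
    missing = []
    for provider in providers:
        key = KEY_FOR_PROVIDER.get(provider)
        # Skip providers without a key, and avoid listing the same key twice
        # when one provider (e.g. groq) is used for both stages.
        if key and not config.get(key, "").strip() and key not in missing:
            missing.append(key)
    return missing


if __name__ == "__main__":
    path = Path("config.json")
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    print(missing_keys(config) or "all required keys present")
```

Run from the repository root, this prints the names of any empty keys for the currently selected providers, or a confirmation message when none are missing.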

5. Run

macOS:

./scripts/dev-start.sh

Windows:

scripts\dev-start.bat

The app launches as a menu bar icon (a system tray icon on Windows). Click it and select Open VoiceFlow to open the settings window.


Hotkeys

The default hotkeys on each platform:

| Mode | macOS | Windows |
|---|---|---|
| Hold to record | Cmd+Shift+Space | Ctrl+Win |
| Toggle record | Cmd+Shift+D | Alt+Win |

Both hotkeys are fully remappable from the Settings tab.
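The difference between the two modes can be illustrated with a small state machine. This is a sketch under assumptions, not the app's actual hotkey handler: `HotkeyState` and its methods are hypothetical names, and the `mode` values mirror the `hotkey_mode` setting. It also assumes the operating system may deliver repeated key-down events while a key is held, which the `key_down` flag filters out.

```python
from dataclasses import dataclass


@dataclass
class HotkeyState:
    """Tracks recording state for one hotkey under 'hold' or 'toggle' mode."""

    mode: str = "hold"       # "hold" or "toggle"
    recording: bool = False
    key_down: bool = False   # used to ignore auto-repeated key-down events

    def on_press(self) -> bool:
        if self.key_down:
            return self.recording  # auto-repeat: nothing changes
        self.key_down = True
        if self.mode == "hold":
            self.recording = True                 # record while the key is down
        else:
            self.recording = not self.recording   # each fresh press flips state
        return self.recording

    def on_release(self) -> bool:
        self.key_down = False
        if self.mode == "hold":
            self.recording = False                # releasing stops recording
        # In toggle mode a release leaves the recording state untouched.
        return self.recording
```

In hold mode a press followed by a release yields one recording; in toggle mode the first press starts recording and the next press stops it, regardless of releases in between.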

macOS: Grant Accessibility and Input Monitoring permissions when prompted — these are required for global hotkey detection.


Building the App Bundle

macOS (.app + .dmg)

Requires PyInstaller:

.venv/bin/pip install pyinstaller

Build the .app:

./scripts/bundle.sh
# output: dist/VoiceFlow.app

Package as a .dmg (requires dmgbuild and librsvg):

pip install dmgbuild
brew install librsvg
./scripts/package-dmg.sh
# output: dist/VoiceFlow-x.x.x-arm64.dmg

Windows (.exe installer)

Requires PyInstaller and Inno Setup:

.venv\Scripts\pip install pyinstaller
scripts\bundle-windows.bat
rem output: dist\VoiceFlow\VoiceFlow.exe

scripts\package-installer.bat
rem output: dist\VoiceFlow-Setup.exe

Project Structure

voiceflow/
├── main.py              # Entry point, app lifecycle, pipeline orchestration
├── recorder.py          # Audio capture (sounddevice, 16kHz mono)
├── transcriber.py       # Transcription API abstraction
├── cleaner.py           # Cleanup provider abstraction
├── injector.py          # Text injection (clipboard paste)
├── dictionary.py        # Custom vocabulary
├── history.py           # SQLite transcription history
├── config.py            # Config file read/write
├── tray.py              # System tray icon
├── overlay.py           # Floating waveform overlay
├── media.py             # Audio mute/restore + sound cues
├── bridge.py            # Python ↔ frontend JS bridge
├── login_item.py        # Launch at login
├── platform/            # Platform-specific implementations (macOS, Windows)
└── ui/
    ├── src/             # Svelte frontend source
    └── dist/            # Built frontend (generated — not committed)
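The layout above suggests a record, transcribe, clean, inject pipeline with swappable providers at the two middle stages. A minimal sketch of that flow follows. The `Transcriber` and `Cleaner` protocols and `run_pipeline` are hypothetical names, not the interfaces in `transcriber.py` or `cleaner.py`, and falling back to the raw transcript when cleanup fails is an assumption rather than confirmed behavior.

```python
from typing import Callable, Protocol


class Transcriber(Protocol):
    """Anything that turns recorded audio bytes into raw text."""

    def transcribe(self, audio: bytes) -> str: ...


class Cleaner(Protocol):
    """Anything that turns a raw transcript into cleaned-up text."""

    def clean(self, text: str) -> str: ...


def run_pipeline(
    audio: bytes,
    transcriber: Transcriber,
    cleaner: Cleaner,
    inject: Callable[[str], None],
) -> str:
    """Transcribe audio, clean the text, inject it, and return what was injected."""
    raw = transcriber.transcribe(audio)
    if not raw.strip():
        return ""  # nothing was said, so nothing is injected
    try:
        text = cleaner.clean(raw)
    except Exception:
        # Assumed fallback: if the cleanup provider fails, use the raw transcript.
        text = raw
    inject(text)
    return text
```

Because each stage is passed in, any transcription provider can be paired with any cleanup provider, and the injection step can be replaced with a list's `append` method when testing.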

Configuration Reference

All settings live in config.json (auto-created from defaults on first run if not present):

| Key | Default | Description |
|---|---|---|
| transcription_provider | "groq" | groq, deepgram, or assemblyai |
| cleanup_provider | "gemini" | gemini, groq, openai, or ollama |
| groq_api_key | "" | Groq API key |
| deepgram_api_key | "" | Deepgram API key |
| assemblyai_api_key | "" | AssemblyAI API key |
| gemini_api_key | "" | Gemini API key |
| openai_api_key | "" | OpenAI API key |
| ollama_model | "qwen2.5:3b" | Ollama model name (when cleanup provider is ollama) |
| hotkey_mode | "hold" | hold or toggle |
| auto_mute_audio | true | Mute system audio during recording |
| sound_feedback | true | Play start/stop sound cues |
| launch_at_login | false | Start VoiceFlow at system login |
| history_retention_days | 7 | Days to keep transcription history (0 = forever) |
| developer_mode | false | Coding-aware cleanup mode |

License

MIT — see LICENSE
