Voiceflow

neonunez
A free/cheap speech-to-text alternative

#dictation #macos #python #speech-to-text #svelte #windows

VoiceFlow

A lightweight menu bar (macOS) and system tray (Windows) app for AI-powered dictation. Press a hotkey anywhere on your system, speak, and polished text appears at your cursor. Transcription and cleanup run on cloud APIs whose free tiers keep personal use at zero cost.

Built as an open-source alternative to tools like Wispr Flow.

Download

Pre-built installers are available on the Releases page:

Platform	File
macOS (Apple Silicon)	`VoiceFlow-x.x.x-arm64.dmg`
Windows 10/11 (x64)	`VoiceFlowSetup-x.x.x-x64.exe`

macOS — first launch: VoiceFlow is not notarized by Apple, so macOS will block it on first open. Right-click VoiceFlow.app → Open → Open to bypass the Gatekeeper warning. You only need to do this once.

Features

Hold-to-record or toggle mode — two hotkey styles, fully configurable
Works in any app — system-level text injection via clipboard paste
LLM text cleanup — fixes transcription errors, adds punctuation, never rephrases
Switchable providers — choose your transcription and cleanup API independently
Custom dictionary — teach VoiceFlow your technical vocabulary
Transcription history — searchable log with copy and delete
Floating waveform overlay — visual feedback during recording
Developer mode — coding-aware cleanup that preserves casing, symbols, and @file references
Minimal footprint — ~35–50 MB idle, ~0% CPU
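The clipboard-paste injection listed above can be sketched roughly as follows. This is only an illustration, not VoiceFlow's actual injector.py: the three callables are hypothetical stand-ins for platform clipboard and keystroke code, and restoring the previous clipboard contents is an assumption about desirable behavior rather than something the README states.

```python
from typing import Callable


def inject_text(
    text: str,
    read_clipboard: Callable[[], str],
    write_clipboard: Callable[[str], None],
    send_paste: Callable[[], None],
) -> None:
    """Paste `text` at the cursor by way of the clipboard.

    All three callables are hypothetical stand-ins for platform code
    (for example Cmd+V on macOS, Ctrl+V on Windows).
    """
    previous = read_clipboard()        # remember what the user had copied
    write_clipboard(text)              # stage the transcription
    try:
        send_paste()                   # simulate the paste shortcut in the focused app
    finally:
        write_clipboard(previous)      # assumption: put the user's clipboard back


# Usage with an in-memory fake clipboard, so the sketch runs anywhere.
clipboard = {"value": "old contents"}
pasted: list[str] = []

inject_text(
    "hello world",
    read_clipboard=lambda: clipboard["value"],
    write_clipboard=lambda s: clipboard.update(value=s),
    send_paste=lambda: pasted.append(clipboard["value"]),
)
```

A real implementation would likely need a short delay before restoring the clipboard, since the target app reads it asynchronously after the paste shortcut is sent.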

Platform Support

Platform	Status
macOS 13+ (Apple Silicon)	Supported
Windows 10/11 (x64)	Supported

API Providers

VoiceFlow uses external APIs for transcription and text cleanup. Every provider except OpenAI offers a free tier or free credit sufficient for personal use, and Ollama runs locally at no cost.

Transcription (pick one)

Provider	Model	Free Tier
Groq (default)	Whisper Large v3 Turbo	2 hours/day
Deepgram	Nova-3	$200 credit
AssemblyAI	Universal-2	$50 credit

Cleanup (pick one)

Provider	Model	Free Tier
Gemini (default)	2.5 Flash-Lite	1,500 req/day
Groq	Llama 3.3 70B	14,400 req/day
OpenAI	GPT-4o-mini	Paid only
Ollama	Any local model	Free (local)
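Because the transcription and cleanup providers are chosen independently, one plausible way to wire them together is a pair of registries keyed by provider name. The sketch below is an assumption about the shape of such a design, not the code in transcriber.py or cleaner.py; the stub functions return canned strings instead of calling any real API.

```python
from typing import Callable


def _fake_transcriber(name: str) -> Callable[[bytes], str]:
    # Hypothetical stub: a real client would upload the audio to the provider.
    return lambda audio: f"{name} heard {len(audio)} bytes"


def _fake_cleaner(name: str) -> Callable[[str], str]:
    # Hypothetical stub: a real client would send the text to an LLM.
    return lambda text: f"{text} (cleaned by {name})"


# Provider names match the values accepted by config.json.
TRANSCRIBERS = {n: _fake_transcriber(n) for n in ("groq", "deepgram", "assemblyai")}
CLEANERS = {n: _fake_cleaner(n) for n in ("gemini", "groq", "openai", "ollama")}


def _lookup(registry: dict, name: str, role: str):
    try:
        return registry[name]
    except KeyError:
        raise ValueError(f"unknown {role} provider: {name!r}") from None


def run_pipeline(audio: bytes, config: dict) -> str:
    """Transcribe, then clean up, using whichever providers the config names."""
    transcribe = _lookup(
        TRANSCRIBERS, config.get("transcription_provider", "groq"), "transcription"
    )
    clean = _lookup(CLEANERS, config.get("cleanup_provider", "gemini"), "cleanup")
    return clean(transcribe(audio))


result = run_pipeline(
    b"\x00" * 4,
    {"transcription_provider": "deepgram", "cleanup_provider": "ollama"},
)
print(result)  # deepgram heard 4 bytes (cleaned by ollama)
```

Keeping the two lookups separate is what lets any transcription provider pair with any cleanup provider.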

Requirements

Python 3.11+
Node.js 18+ (build-time only — for the frontend)
API keys for your chosen providers (see above)

macOS only: Accessibility and Input Monitoring permissions (prompted on first launch)

Windows only: Microsoft Edge WebView2 Runtime (usually pre-installed on Windows 11)

Setup

1. Clone the repo

git clone https://github.com/neonunez/voiceflow.git
cd voiceflow

2. Create the Python virtual environment

macOS:

python3.11 -m venv .venv
.venv/bin/pip install -r requirements.txt -r requirements-macos.txt

Windows:

py -3.11 -m venv .venv
.venv\Scripts\pip install -r requirements.txt -r requirements-windows.txt

3. Build the frontend

cd ui
npm install
npm run build
cd ..

4. Configure your API keys

Copy the example config and fill in your keys (in the Windows Command Prompt, use copy instead of cp):

cp config.example.json config.json

Open config.json and add your API keys for the providers you want to use. At minimum you need one transcription key (Groq recommended) and one cleanup key (Gemini recommended):

{
  "groq_api_key": "your-groq-key-here",
  "gemini_api_key": "your-gemini-key-here"
}

All other settings can be left at their defaults and changed later from the settings window.
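To illustrate how a partial config.json can coexist with defaults, here is a minimal sketch that merges the file over a defaults dictionary and reports which API keys the selected providers still need. The function names and the subset of defaults shown are illustrative assumptions; the real config.py may work differently.

```python
import json
import tempfile
from pathlib import Path

# A subset of the documented defaults; the full list is in the
# Configuration Reference section.
DEFAULTS = {
    "transcription_provider": "groq",
    "cleanup_provider": "gemini",
    "groq_api_key": "",
    "gemini_api_key": "",
    "hotkey_mode": "hold",
}


def load_config(path: Path) -> dict:
    """Return the defaults overlaid with whatever the file provides."""
    config = dict(DEFAULTS)
    if path.exists():
        config.update(json.loads(path.read_text(encoding="utf-8")))
    return config


def missing_keys(config: dict) -> list[str]:
    """List the API key settings the selected providers need but lack."""
    needed: list[str] = []
    for role in ("transcription_provider", "cleanup_provider"):
        provider = config[role]
        if provider == "ollama":           # local model, no API key required
            continue
        key = f"{provider}_api_key"        # e.g. "groq_api_key"
        if not config.get(key) and key not in needed:
            needed.append(key)
    return needed


# Usage: a config file that only sets the Groq key.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = Path(tmp) / "config.json"
    cfg_path.write_text(json.dumps({"groq_api_key": "your-groq-key-here"}), encoding="utf-8")
    loaded = load_config(cfg_path)

print(missing_keys(loaded))  # ['gemini_api_key']
```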

5. Run

macOS:

./scripts/dev-start.sh

Windows:

scripts\dev-start.bat

The app launches as a menu bar icon on macOS or a system tray icon on Windows. Click it and select Open VoiceFlow to open the settings window.

Hotkeys

The default hotkeys on each platform:

Mode	macOS	Windows
Hold to record	`Cmd+Shift+Space`	`Ctrl+Win`
Toggle record	`Cmd+Shift+D`	`Alt+Win`

Both hotkeys are fully remappable from the Settings tab.

macOS: Grant Accessibility and Input Monitoring permissions when prompted — these are required for global hotkey detection.
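The difference between the two hotkey modes comes down to how press and release events change the recording state. The following is a minimal sketch under assumed names (HotkeyController, on_press, on_release are hypothetical); it also ignores key-repeat events, which many operating systems send while a key is held down.

```python
class HotkeyController:
    """Tracks whether recording is active for "hold" or "toggle" mode."""

    def __init__(self, mode: str = "hold") -> None:
        if mode not in ("hold", "toggle"):
            raise ValueError(f"unsupported hotkey mode: {mode!r}")
        self.mode = mode
        self.recording = False
        self._down = False  # True while the hotkey is physically held

    def on_press(self) -> None:
        if self._down:
            return  # auto-repeat while held: not a new press
        self._down = True
        if self.mode == "hold":
            self.recording = True                   # record while held
        else:
            self.recording = not self.recording     # each press flips the state

    def on_release(self) -> None:
        self._down = False
        if self.mode == "hold":
            self.recording = False                  # releasing ends the recording


# Usage: hold mode records only between press and release.
hold = HotkeyController("hold")
hold.on_press()
was_recording_while_held = hold.recording
hold.on_release()

# Toggle mode keeps recording after release until the next press.
toggle = HotkeyController("toggle")
toggle.on_press()
toggle.on_release()
still_recording_after_release = toggle.recording
```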

Building the App Bundle

macOS (.app + .dmg)

Requires PyInstaller:

.venv/bin/pip install pyinstaller

Build the .app:

./scripts/bundle.sh
# output: dist/VoiceFlow.app

Package as a .dmg (requires dmgbuild and librsvg):

pip install dmgbuild
brew install librsvg
./scripts/package-dmg.sh
# output: dist/VoiceFlow-x.x.x-arm64.dmg

Windows (.exe installer)

Requires PyInstaller and Inno Setup:

.venv\Scripts\pip install pyinstaller
scripts\bundle-windows.bat
REM output: dist\VoiceFlow\VoiceFlow.exe

scripts\package-installer.bat
REM output: dist\VoiceFlow-Setup.exe

Project Structure

voiceflow/
├── main.py              # Entry point, app lifecycle, pipeline orchestration
├── recorder.py          # Audio capture (sounddevice, 16 kHz mono)
├── transcriber.py       # Transcription API abstraction
├── cleaner.py           # Cleanup provider abstraction
├── injector.py          # Text injection (clipboard paste)
├── dictionary.py        # Custom vocabulary
├── history.py           # SQLite transcription history
├── config.py            # Config file read/write
├── tray.py              # System tray icon
├── overlay.py           # Floating waveform overlay
├── media.py             # Audio mute/restore + sound cues
├── bridge.py            # Python ↔ frontend JS bridge
├── login_item.py        # Launch at login
├── platform/            # Platform-specific implementations (macOS, Windows)
└── ui/
    ├── src/             # Svelte frontend source
    └── dist/            # Built frontend (generated — not committed)
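Since recorder.py captures 16 kHz mono audio and the transcription providers accept uploaded audio files, the captured samples have to be wrapped in a container at some point. The sketch below shows one way to do that with the standard library; the 16-bit sample width and the function name pcm_to_wav are assumptions, as the README specifies only the sample rate and channel count.

```python
import io
import struct
import wave

SAMPLE_RATE = 16_000   # Hz, as noted for recorder.py
CHANNELS = 1           # mono
SAMPLE_WIDTH = 2       # bytes per sample; 16-bit PCM is an assumption


def pcm_to_wav(samples: list[int]) -> bytes:
    """Wrap signed 16-bit mono samples in an in-memory WAV file."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH)
        wav.setframerate(SAMPLE_RATE)
        # Little-endian signed shorts, one per sample.
        wav.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    return buffer.getvalue()


# Usage: one second of silence.
wav_bytes = pcm_to_wav([0] * SAMPLE_RATE)

# Read it back to confirm the header describes the audio correctly.
with wave.open(io.BytesIO(wav_bytes), "rb") as check:
    duration_seconds = check.getnframes() / check.getframerate()
```

At this rate and width, audio costs 32,000 bytes per second, so a one-minute dictation is a little under 2 MB before any compression.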

Configuration Reference

All settings live in config.json (auto-created from defaults on first run if not present):

Key	Default	Description
`transcription_provider`	`"groq"`	`groq`, `deepgram`, or `assemblyai`
`cleanup_provider`	`"gemini"`	`gemini`, `groq`, `openai`, or `ollama`
`groq_api_key`	`""`	Groq API key
`deepgram_api_key`	`""`	Deepgram API key
`assemblyai_api_key`	`""`	AssemblyAI API key
`gemini_api_key`	`""`	Gemini API key
`openai_api_key`	`""`	OpenAI API key
`ollama_model`	`"qwen2.5:3b"`	Ollama model name (when cleanup provider is `ollama`)
`hotkey_mode`	`"hold"`	`hold` or `toggle`
`auto_mute_audio`	`true`	Mute system audio during recording
`sound_feedback`	`true`	Play start/stop sound cues
`launch_at_login`	`false`	Start VoiceFlow at system login
`history_retention_days`	`7`	Days to keep transcription history (0 = forever)
`developer_mode`	`false`	Coding-aware cleanup mode

License

MIT — see LICENSE
