Local push-to-talk speech-to-text desktop app.
Hold a key, speak, release — the transcription is typed at your cursor.
Runs 100% offline using whisper.cpp.
Grab the latest build from the Releases page.
Debian / Ubuntu — .deb:
sudo apt install ./BVoice_0.1.2_amd64.deb
Fedora / RHEL / openSUSE — .rpm:
sudo dnf install ./BVoice-0.1.2-1.x86_64.rpm
After install you'll find BVoice in your application menu. On first launch the selected whisper model (75–466 MB) is downloaded to `~/.local/share/bvoice/models/`.
The packages declare a runtime dependency on xdotool — used to type the transcription at the cursor.
To uninstall, the registered package name is b-voice (the bundler kebab-cases BVoice):
sudo apt remove b-voice # Debian / Ubuntu
sudo dnf remove b-voice # Fedora / RHEL / openSUSE
The terminal command is still bvoice — only the package name carries the hyphen.
Models: tiny.en / base.en / small.en, full or quantized (q5_1 / q8_0). Beam search is used when beam_size ≥ 2.

Settings persist at ~/.config/bvoice/config.toml:
| Key | Type | Default | Description |
|---|---|---|---|
| `model` | `string` | `base.en` | Whisper model; append `-q5_1` or `-q8_0` for quantized |
| `input_device` | `string\|null` | `null` | Input device name; `null` = system default |
| `beam_size` | `u32` | `1` | Beam search size; `1` = greedy |
| `use_vad` | `bool` | `false` | Trim silence with Silero VAD before transcription |
| `vad_threshold` | `f32` | `0.5` | VAD speech probability threshold (0–1); only used when `use_vad` is on |
| `overlay_position` | `[i32, i32]` | bottom-right | Desktop overlay position; written automatically when you drag it |
The trigger is hardcoded to Ctrl + Win and is not user-configurable.
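A chord like Ctrl + Win is usually tracked by remembering which of the two keys are currently down and reacting only when that combined state changes. The following is a minimal sketch of that idea, not BVoice's actual code: `Key`, `ChordEvent`, and `ChordTracker` are hypothetical names, and the real app receives its key events from rdev rather than from this simplified enum.

```rust
/// Hypothetical key identifiers (assumption: the real app maps rdev keys to something similar).
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum Key {
    Ctrl,
    Win,
    Other,
}

/// What the caller should do after feeding one key event to the tracker.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum ChordEvent {
    /// Both chord keys are now held: start recording.
    Armed,
    /// A chord key was released while armed: stop recording.
    Released,
    /// The chord state did not change.
    None,
}

/// Tracks the held state of both chord keys (hypothetical helper).
#[derive(Default)]
pub struct ChordTracker {
    ctrl_down: bool,
    win_down: bool,
    armed: bool,
}

impl ChordTracker {
    /// Feed one press (`pressed = true`) or release (`pressed = false`) event.
    pub fn on_key(&mut self, key: Key, pressed: bool) -> ChordEvent {
        match key {
            Key::Ctrl => self.ctrl_down = pressed,
            Key::Win => self.win_down = pressed,
            // Keys outside the chord never change the state.
            Key::Other => return ChordEvent::None,
        }
        let both_down = self.ctrl_down && self.win_down;
        if both_down && !self.armed {
            self.armed = true;
            ChordEvent::Armed
        } else if !both_down && self.armed {
            self.armed = false;
            ChordEvent::Released
        } else {
            // Covers key repeat while held and releases after disarming.
            ChordEvent::None
        }
    }
}
```

Reporting only transitions means auto-repeated key-down events while the chord is held do not restart the recording.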
The overlay icon and overlay_position update on drag — the rest is editable from the Settings window and persists on Save.
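For reference, the documented keys and defaults can be mirrored in a plain Rust struct. This is a sketch under assumptions: the struct name `Config`, the two helper methods, and the use of `None` to stand for the default bottom-right overlay placement are all hypothetical, and clamping the threshold to 0–1 is an illustrative choice rather than confirmed app behavior. Only the field names, types, and default values come from the settings table.

```rust
/// Hypothetical mirror of the settings in config.toml.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    /// Whisper model name, e.g. "base.en" or "base.en-q5_1".
    pub model: String,
    /// Input device name; None means the system default.
    pub input_device: Option<String>,
    /// Beam search size; 1 means greedy decoding.
    pub beam_size: u32,
    /// Whether to trim silence with Silero VAD before transcription.
    pub use_vad: bool,
    /// VAD speech probability threshold in the range 0 to 1.
    pub vad_threshold: f32,
    /// Overlay position; None is assumed here to mean "bottom-right".
    pub overlay_position: Option<[i32; 2]>,
}

impl Default for Config {
    fn default() -> Self {
        Config {
            model: "base.en".to_string(),
            input_device: None,
            beam_size: 1,
            use_vad: false,
            vad_threshold: 0.5,
            overlay_position: None,
        }
    }
}

impl Config {
    /// Beam search is used when beam_size is 2 or more; otherwise decoding is greedy.
    pub fn uses_beam_search(&self) -> bool {
        self.beam_size >= 2
    }

    /// The threshold only matters when use_vad is on.
    /// Clamping to 0..=1 is an assumption made for this sketch.
    pub fn effective_vad_threshold(&self) -> Option<f32> {
        if self.use_vad {
            Some(self.vad_threshold.clamp(0.0, 1.0))
        } else {
            None
        }
    }
}
```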
cargo install tauri-cli --version '^2.0' --locked
sudo apt install \
libwebkit2gtk-4.1-dev libsoup-3.0-dev libayatana-appindicator3-dev \
libasound2-dev libpulse-dev libclang-dev libssl-dev libstdc++-12-dev \
pkg-config build-essential
npm install
npm run tauri dev # development
npm run tauri build # release bundles (.deb + .rpm)
setup (background thread): model::ensure_model ─▶ transcribe::init (whisper-rs context)

hotkey (rdev, X11 XRecord) ─▶ state machine (Ctrl+Win chord)
  │
  ▼ armed
audio::start (cpal capture on dedicated thread, PulseAudio source via libpulse-binding)
  │
  ▼ released
audio::stop (downmix to mono → rubato 48→16 kHz)
  │
  ▼ (if use_vad)
vad::trim_silence_with (Silero VAD, configurable threshold)
  │
  ▼
transcribe::transcribe (whisper-rs; beam_size ≥ 2 → beam search, else greedy; nonverbal segments filtered)
  │
  ▼
inject::paste (xdotool type --delay 0; types at cursor, no clipboard)

watchdog thread: forces reset if Recording > 60s or Transcribing > 45s
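The watchdog rule above boils down to comparing how long the pipeline has been in its current state against a per-state limit. Below is a minimal sketch of that check using only the standard library. The `State` enum, the constant names, and both functions are hypothetical; only the 60 s and 45 s limits and the state names come from the description above. How the real watchdog thread polls and resets the pipeline is not shown here.

```rust
use std::time::{Duration, Instant};

/// Hypothetical pipeline states, each remembering when it was entered.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub enum State {
    Idle,
    Recording { since: Instant },
    Transcribing { since: Instant },
}

/// Limits taken from the watchdog description.
pub const MAX_RECORDING: Duration = Duration::from_secs(60);
pub const MAX_TRANSCRIBING: Duration = Duration::from_secs(45);

/// Returns true if the state has lasted strictly longer than its limit.
pub fn is_stuck(state: State, now: Instant) -> bool {
    match state {
        State::Idle => false,
        State::Recording { since } => {
            now.saturating_duration_since(since) > MAX_RECORDING
        }
        State::Transcribing { since } => {
            now.saturating_duration_since(since) > MAX_TRANSCRIBING
        }
    }
}

/// One watchdog tick: force a reset to Idle if stuck, otherwise keep the state.
pub fn watchdog_tick(state: State, now: Instant) -> State {
    if is_stuck(state, now) {
        State::Idle
    } else {
        state
    }
}
```

Passing `now` in as a parameter, instead of calling `Instant::now()` inside the check, keeps the logic testable without waiting for real time to pass.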
MIT — see LICENSE.