BVoice

ErfanFathi
Push-to-talk dictation for Linux. Hold a key, speak, release, and the transcription is typed at your cursor. 100% offline, powered by whisper.cpp. Built with Rust, Tauri, and Svelte.

#dictation #linux #offline #privacy #push-to-talk #rust #speech-to-text #stt #svelte #tauri

BVoice

Local push-to-talk speech-to-text desktop app.
Hold a key, speak, release — the transcription is typed at your cursor.
Runs 100% offline using whisper.cpp.

Install

Grab the latest build from the Releases page.

Debian / Ubuntu — .deb:

sudo apt install ./BVoice_0.1.2_amd64.deb

Fedora / RHEL / openSUSE — .rpm:

sudo dnf install ./BVoice-0.1.2-1.x86_64.rpm

After install you'll find BVoice in your application menu. On first launch the selected whisper model (~75–466 MB) downloads to `~/.local/share/bvoice/models/`.

The packages declare a runtime dependency on xdotool — used to type the transcription at the cursor.
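
As a rough illustration of why xdotool is needed, the sketch below builds the `xdotool type --delay 0` invocation named in the Architecture section using only the Rust standard library. The function names are hypothetical and not taken from the BVoice source, and the `--` end-of-options separator is an added assumption so that text beginning with `-` is not parsed as a flag.

```rust
use std::io;
use std::process::{Command, ExitStatus};

/// Builds the argument list for `xdotool type`.
/// `--delay 0` mirrors the invocation named in this README; the `--`
/// separator is an assumption added for text that starts with `-`.
pub fn xdotool_type_args(text: &str) -> Vec<String> {
    vec![
        "type".to_string(),
        "--delay".to_string(),
        "0".to_string(),
        "--".to_string(),
        text.to_string(),
    ]
}

/// Hypothetical helper: runs xdotool and returns its exit status.
/// Returns an `io::Error` (typically `NotFound`) if xdotool is missing.
pub fn type_at_cursor(text: &str) -> io::Result<ExitStatus> {
    Command::new("xdotool")
        .args(xdotool_type_args(text))
        .status()
}
```

Keeping argument construction separate from process spawning makes the command line checkable on machines where xdotool is not installed.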

To uninstall, use the registered package name, b-voice (the bundler kebab-cases BVoice):

sudo apt remove b-voice          # Debian / Ubuntu
sudo dnf remove b-voice          # Fedora / RHEL / openSUSE

The terminal command is still bvoice — only the package name carries the hyphen.

Features

Push-to-talk trigger: hold Ctrl + Win (instant arm — no hold delay)
Local transcription with whisper.cpp (tiny.en / base.en / small.en, full or quantized q5_1 / q8_0)
Optional Silero VAD silence trim with tunable threshold
FFT-based resampling (rubato) for high-quality 48 kHz → 16 kHz conversion
Greedy decoding by default; configurable beam search (set beam_size ≥ 2)
Live-applied settings for threshold, input device, and model swap — no restart
Always-on-top desktop overlay reflects state (idle / recording / transcribing) with red and orange pulse animations; draggable, position persists
Single-instance enforcement; optional autostart on login
Types output directly at the cursor — never touches your clipboard
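
The silence-trim feature can be pictured with a minimal sketch. It assumes a VAD has already produced one speech probability per fixed-size frame; the function names, the frame layout, and the choice to keep pauses between the first and last speech frame are all illustrative assumptions, not the Silero VAD API or BVoice's actual logic.

```rust
/// Returns the half-open frame range [start, end) from the first to the
/// last frame whose speech probability is at least `threshold`, or `None`
/// if no frame reaches the threshold.
pub fn speech_bounds(probs: &[f32], threshold: f32) -> Option<(usize, usize)> {
    let first = probs.iter().position(|&p| p >= threshold)?;
    let last = probs.iter().rposition(|&p| p >= threshold)?;
    Some((first, last + 1))
}

/// Trims leading and trailing silence from `samples`.
/// Assumes `probs[i]` covers `samples[i * frame_len .. (i + 1) * frame_len]`.
/// Pauses between the first and last speech frame are kept.
pub fn trim_silence<'a>(
    samples: &'a [f32],
    probs: &[f32],
    frame_len: usize,
    threshold: f32,
) -> &'a [f32] {
    match speech_bounds(probs, threshold) {
        Some((start, end)) => {
            // Clamp to the buffer length in case the last frame is partial.
            let lo = (start * frame_len).min(samples.len());
            let hi = (end * frame_len).min(samples.len());
            &samples[lo..hi]
        }
        None => &samples[0..0],
    }
}
```

In this sketch, raising the threshold makes trimming more aggressive, and lowering it keeps more quiet audio around the speech.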

Usage

Launch BVoice — a tray icon appears (no window by default).
Click the tray icon → Settings to configure model, input device, beam size, VAD, and autostart.
Focus any text field (editor, terminal, browser, …).
Hold Ctrl + Win, speak, release.
The transcription is typed at the cursor.
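
The hold, speak, release flow above can be sketched as a small state machine. This is an illustration under stated assumptions: the type and event names are hypothetical, the states follow the idle / recording / transcribing labels used for the overlay, and the real hotkey handling in BVoice may differ.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Idle,
    Recording,
    Transcribing,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    CtrlDown,
    CtrlUp,
    WinDown,
    WinUp,
    TranscriptionDone,
}

#[derive(Debug)]
pub struct PushToTalk {
    ctrl: bool,
    win: bool,
    state: State,
}

impl PushToTalk {
    pub fn new() -> Self {
        PushToTalk { ctrl: false, win: false, state: State::Idle }
    }

    /// Applies one event and returns the resulting state.
    pub fn handle(&mut self, event: Event) -> State {
        match event {
            Event::CtrlDown => self.ctrl = true,
            Event::CtrlUp => self.ctrl = false,
            Event::WinDown => self.win = true,
            Event::WinUp => self.win = false,
            Event::TranscriptionDone => {
                if self.state == State::Transcribing {
                    self.state = State::Idle;
                }
                return self.state;
            }
        }
        let chord_held = self.ctrl && self.win;
        self.state = match (self.state, chord_held) {
            // Both keys down: arm immediately, with no hold delay.
            (State::Idle, true) => State::Recording,
            // Either key released: stop capture and start transcribing.
            (State::Recording, false) => State::Transcribing,
            // Every other combination leaves the state unchanged.
            (state, _) => state,
        };
        self.state
    }
}
```

Feeding it `CtrlDown`, `WinDown`, `CtrlUp`, and then `TranscriptionDone` walks through Idle, Recording, Transcribing, and back to Idle.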

Platform support

Linux / X11 — primary target, tested on Ubuntu GNOME
Wayland — not supported (global hotkeys and synthesized typing require compositor-specific portals)
macOS / Windows — not yet ported

Configuration

Settings persist at ~/.config/bvoice/config.toml:

Key	Type	Default	Description
`model`	string	`base.en`	Whisper model; append `-q5_1` or `-q8_0` for quantized
`input_device`	string|null	`null`	Input device name; null = system default
`beam_size`	u32	`1`	Beam search size; `1` = greedy
`use_vad`	bool	`false`	Trim silence with Silero VAD before transcription
`vad_threshold`	f32	`0.5`	VAD speech probability threshold (0–1); used only when `use_vad` is `true`
`overlay_position`	[i32, i32]	bottom-right	Desktop overlay position; written automatically when you drag it

The trigger is hardcoded to Ctrl + Win and is not user-configurable.

Dragging the overlay moves its icon and updates overlay_position immediately; all other settings are edited in the Settings window and persist on Save.
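
The documented keys and defaults can be mirrored in a plain Rust struct. This is a sketch only: the struct, its methods, and the use of `None` to mean "bottom-right" for `overlay_position` are assumptions made for illustration, and the actual BVoice types and TOML handling are not shown in this README.

```rust
/// Hypothetical in-memory form of ~/.config/bvoice/config.toml.
#[derive(Debug, Clone, PartialEq)]
pub struct Config {
    pub model: String,
    pub input_device: Option<String>,
    pub beam_size: u32,
    pub use_vad: bool,
    pub vad_threshold: f32,
    /// `None` stands in for the default bottom-right placement (assumption).
    pub overlay_position: Option<(i32, i32)>,
}

impl Default for Config {
    /// Defaults copied from the table above.
    fn default() -> Self {
        Config {
            model: "base.en".to_string(),
            input_device: None, // null = system default
            beam_size: 1,       // 1 = greedy
            use_vad: false,
            vad_threshold: 0.5,
            overlay_position: None,
        }
    }
}

impl Config {
    /// Beam search applies for beam_size >= 2; anything else is greedy.
    pub fn uses_beam_search(&self) -> bool {
        self.beam_size >= 2
    }

    /// Rejects a VAD threshold outside the documented 0–1 range (NaN included).
    pub fn validate(&self) -> Result<(), String> {
        if !(0.0..=1.0).contains(&self.vad_threshold) {
            return Err(format!(
                "vad_threshold must be within 0..=1, got {}",
                self.vad_threshold
            ));
        }
        Ok(())
    }
}
```

A quantized model would be selected by setting `model` to a value such as `base.en-q5_1`, following the suffix rule in the table.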

Build from source

Prerequisites

Rust (stable) via rustup
Node.js 20+ and npm
Tauri CLI: cargo install tauri-cli --version '^2.0' --locked

Linux system packages (Ubuntu/Debian):

sudo apt install \
  libwebkit2gtk-4.1-dev libsoup-3.0-dev libayatana-appindicator3-dev \
  libasound2-dev libpulse-dev libclang-dev libssl-dev libstdc++-12-dev \
  pkg-config build-essential

Run / build

npm install
npm run tauri dev          # development
npm run tauri build        # release bundles (.deb + .rpm)

Architecture

setup (background thread):  model::ensure_model  ─▶  transcribe::init  (whisper-rs context)

hotkey (rdev, X11 XRecord)  ─▶ state machine (Ctrl+Win chord)
                                   │
                             armed ▼
                             audio::start          (cpal capture on dedicated thread,
                                                    PulseAudio source via libpulse-binding)
                                   │
                          released ▼
                             audio::stop           (downmix to mono → rubato 48→16 kHz)
                                   │
                       (if use_vad) ▼
                             vad::trim_silence_with  (Silero VAD, configurable threshold)
                                   │
                                   ▼
                             transcribe::transcribe  (whisper-rs, beam_size≥2 → beam search,
                                                      else greedy; nonverbal segments filtered)
                                   │
                                   ▼
                             inject::paste         (xdotool type --delay 0 —
                                                    types at cursor, no clipboard)

watchdog thread: forces reset if Recording > 60s or Transcribing > 45s
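
The watchdog rule can be expressed as a pure check on the current state and the time spent in it. The sketch below uses the 60 s and 45 s limits stated above; the names are hypothetical, and how BVoice tracks elapsed time or performs the reset is not specified here.

```rust
use std::time::Duration;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Idle,
    Recording,
    Transcribing,
}

/// Limits taken from the watchdog description in this README.
pub const MAX_RECORDING: Duration = Duration::from_secs(60);
pub const MAX_TRANSCRIBING: Duration = Duration::from_secs(45);

/// Returns true when the watchdog should force a reset, given how long
/// the app has been in `state`. Idle never times out.
pub fn should_reset(state: State, elapsed: Duration) -> bool {
    match state {
        State::Idle => false,
        State::Recording => elapsed > MAX_RECORDING,
        State::Transcribing => elapsed > MAX_TRANSCRIBING,
    }
}
```

A watchdog thread could call `should_reset` periodically with the time elapsed since the last state change and return to idle when it reports true.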

License

MIT — see LICENSE.
