Adaptive dictation for developers. Local Whisper transcription with a 3-layer correction pipeline that improves as you use it.
Built by PureTensor Inc.
Tala -- Icelandic for "to speak."
TalaX is functional but early-stage. Here is what works today and what is still in progress.
Working:
Working but cold-start dependent:
save_corrections IPC -> word-level diff -> pattern extraction -> auto-apply flag rebuild -> pipeline reload (including n-gram retrain). This works within and across app sessions. However, effectiveness scales with usage -- a fresh install has no training data.
Planned / not yet implemented:
Tauri v2 (Svelte 5 frontend, Rust backend)
|
+-- whisper.cpp (whisper-rs) -- local STT, CPU-only
| tiny.en bundled, small.en-q5_1 recommended
|
+-- 3-Layer Correction Pipeline
| L1: Dictionary -- regex substitution from learned patterns
| L2: N-gram -- interpolated trigram model (0.6/0.3/0.1)
| L3: Heuristic -- Levenshtein, Double Metaphone, compounds
|
+-- SQLite (WAL) -- corrections, sessions, learning loop
|
+-- System Integration
Global hotkey (rdev) -- push-to-talk
Text injection (arboard + enigo) -- clipboard or keystroke
Audio capture (cpal) -- 16kHz mono with energy VAD
System tray -- recording indicator
Ctrl+Shift+Space)
When you review and correct a transcription, the diff is extracted at the word level and stored as correction patterns. Patterns that recur 3+ times with high confidence are promoted to auto-apply. The n-gram model retrains on your reviewed corpus each time the pipeline reloads, improving context-aware corrections over time.
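The learning loop described above can be sketched roughly as follows. This is an illustrative sketch, not TalaX's actual code: `PatternStore`, `record`, and `auto_apply` are hypothetical names, the diff is simplified to a positional comparison that assumes the raw and corrected text have the same number of words, and the "high confidence" check is left out because the README does not define it.

```rust
use std::collections::HashMap;

/// Hypothetical in-memory store of learned substitutions.
/// Key: (misheard word in lowercase, corrected word). Value: times seen.
#[derive(Debug, Default)]
pub struct PatternStore {
    counts: HashMap<(String, String), u32>,
}

impl PatternStore {
    /// Compare the raw and corrected transcripts word by word and count
    /// every substitution. Assumption: both texts have the same word count;
    /// a full word-level diff would also handle insertions and deletions.
    pub fn record(&mut self, raw: &str, corrected: &str) {
        let raw_words: Vec<&str> = raw.split_whitespace().collect();
        let fixed_words: Vec<&str> = corrected.split_whitespace().collect();
        if raw_words.len() != fixed_words.len() {
            return; // out of scope for this sketch
        }
        for (r, f) in raw_words.iter().zip(fixed_words.iter()) {
            if r != f {
                let key = (r.to_lowercase(), f.to_string());
                *self.counts.entry(key).or_insert(0) += 1;
            }
        }
    }

    /// Return the patterns seen at least `min_count` times, sorted for
    /// stable output. The README states a threshold of 3 recurrences.
    pub fn auto_apply(&self, min_count: u32) -> Vec<(String, String)> {
        let mut promoted: Vec<(String, String)> = self
            .counts
            .iter()
            .filter(|(_, count)| **count >= min_count)
            .map(|(pattern, _)| pattern.clone())
            .collect();
        promoted.sort();
        promoted
    }
}
```

With this sketch, a substitution such as "cubectl" -> "kubectl" only appears in `auto_apply(3)` after it has been recorded in three separate corrections.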
Layer 1 -- Dictionary: Exact regex substitutions from learned patterns. Longest-match-first ordering, automatic case preservation.
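To illustrate longest-match-first ordering and case preservation, here is a minimal sketch. It deliberately swaps the regex substitution for plain whitespace-token matching so that it needs only the standard library; `apply_dictionary` and `preserve_case` are hypothetical names, and punctuation attached to words is not handled.

```rust
/// Copy the casing style of `original` onto `replacement`:
/// ALL CAPS stays all caps, a leading capital stays capitalized.
fn preserve_case(original: &str, replacement: &str) -> String {
    let mut letters = original.chars().filter(|c| c.is_alphabetic()).peekable();
    if letters.peek().is_some() && letters.all(|c| c.is_uppercase()) {
        return replacement.to_uppercase();
    }
    if original.chars().next().map_or(false, |c| c.is_uppercase()) {
        let mut chars = replacement.chars();
        return match chars.next() {
            Some(first) => first.to_uppercase().chain(chars).collect(),
            None => String::new(),
        };
    }
    replacement.to_string()
}

/// Replace learned phrases in `text`. Patterns are (from, to) pairs and are
/// matched case-insensitively on whole words, longest phrase first.
pub fn apply_dictionary(text: &str, patterns: &[(&str, &str)]) -> String {
    let mut sorted: Vec<(Vec<String>, &str)> = patterns
        .iter()
        .map(|(from, to)| {
            let words = from.split_whitespace().map(|w| w.to_lowercase()).collect();
            (words, *to)
        })
        .collect();
    // Longest pattern (by word count) first, so "cube control" wins over "cube".
    sorted.sort_by(|a, b| b.0.len().cmp(&a.0.len()));

    let words: Vec<&str> = text.split_whitespace().collect();
    let mut out: Vec<String> = Vec::new();
    let mut i = 0;
    while i < words.len() {
        let mut matched = false;
        for (from, to) in &sorted {
            let n = from.len();
            if n == 0 || i + n > words.len() {
                continue;
            }
            let hit = words[i..i + n]
                .iter()
                .zip(from)
                .all(|(w, f)| w.to_lowercase() == *f);
            if hit {
                // Casing is taken from the first matched word.
                out.push(preserve_case(words[i], to));
                i += n;
                matched = true;
                break;
            }
        }
        if !matched {
            out.push(words[i].to_string());
            i += 1;
        }
    }
    out.join(" ")
}
```

Given the patterns `("cube", "queue")` and `("cube control", "kubectl")`, the input "Cube control get pods" becomes "Kubectl get pods" because the two-word pattern is tried before the one-word pattern.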
Layer 2 -- N-gram: Interpolated trigram model (0.6 tri + 0.3 bi + 0.1 uni) trained on your reviewed correction history. Flags low-probability words given their context. Starts inert on a fresh profile and activates as you build up reviewed transcriptions.
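The interpolation can be written as P(w | u, v) = 0.6 * P_tri + 0.3 * P_bi + 0.1 * P_uni. The sketch below shows one way to compute it from raw counts. It is an assumption-laden illustration: `TrigramModel`, `train`, and `prob` are hypothetical names, the estimates are plain maximum-likelihood ratios with no smoothing, and only the 0.6/0.3/0.1 weights come from this README.

```rust
use std::collections::HashMap;

/// Hypothetical count-based trigram model.
#[derive(Default)]
pub struct TrigramModel {
    uni: HashMap<String, u32>,
    bi: HashMap<(String, String), u32>,
    tri: HashMap<(String, String, String), u32>,
    total: u32,
}

/// Missing entries count as zero.
fn count(c: Option<&u32>) -> f64 {
    c.copied().unwrap_or(0) as f64
}

/// Guard against division by zero on unseen contexts.
fn ratio(num: f64, den: f64) -> f64 {
    if den > 0.0 { num / den } else { 0.0 }
}

impl TrigramModel {
    /// Add one reviewed sentence to the unigram, bigram, and trigram counts.
    pub fn train(&mut self, sentence: &str) {
        let w: Vec<String> = sentence.split_whitespace().map(str::to_lowercase).collect();
        for i in 0..w.len() {
            *self.uni.entry(w[i].clone()).or_insert(0) += 1;
            self.total += 1;
            if i >= 1 {
                *self.bi.entry((w[i - 1].clone(), w[i].clone())).or_insert(0) += 1;
            }
            if i >= 2 {
                let key = (w[i - 2].clone(), w[i - 1].clone(), w[i].clone());
                *self.tri.entry(key).or_insert(0) += 1;
            }
        }
    }

    /// Interpolated probability of `w` following the context `u v`.
    /// An untrained model returns 0.0 for everything, i.e. it is inert.
    pub fn prob(&self, u: &str, v: &str, w: &str) -> f64 {
        let (u, v, w) = (u.to_lowercase(), v.to_lowercase(), w.to_lowercase());
        let tri = count(self.tri.get(&(u.clone(), v.clone(), w.clone())));
        let bi_uv = count(self.bi.get(&(u.clone(), v.clone())));
        let bi_vw = count(self.bi.get(&(v.clone(), w.clone())));
        let uni_v = count(self.uni.get(&v));
        let uni_w = count(self.uni.get(&w));
        0.6 * ratio(tri, bi_uv)
            + 0.3 * ratio(bi_vw, uni_v)
            + 0.1 * ratio(uni_w, self.total as f64)
    }
}
```

A word whose interpolated probability falls below some cutoff in its context would be a candidate for flagging; the cutoff value is not specified here and would need tuning.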
Layer 3 -- Heuristic Expander: Catches what the first two layers miss, using Levenshtein distance, Double Metaphone phonetic matching, and compound-word handling.
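The architecture overview lists Levenshtein distance among this layer's techniques. As a rough sketch of how edit distance can pick a replacement from a known vocabulary (the `closest` helper and its `max_dist` parameter are hypothetical, and Double Metaphone and compound handling are not shown):

```rust
/// Classic Levenshtein edit distance over Unicode scalar values,
/// using two rolling rows of the dynamic-programming table.
pub fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    // prev[j] = distance between the first i chars of `a` and first j of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut cur = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            let substitute = prev[j] + cost;
            let delete = prev[j + 1] + 1;
            let insert = cur[j] + 1;
            cur.push(substitute.min(delete).min(insert));
        }
        prev = cur;
    }
    prev[b.len()]
}

/// Return the vocabulary entry nearest to `word`, if any lies within
/// `max_dist` edits. Ties resolve to the earliest entry in `vocab`.
pub fn closest<'a>(word: &str, vocab: &[&'a str], max_dist: usize) -> Option<&'a str> {
    vocab
        .iter()
        .map(|v| (levenshtein(word, v), *v))
        .filter(|(d, _)| *d <= max_dist)
        .min_by_key(|(d, _)| *d)
        .map(|(_, v)| v)
}
```

For example, with a vocabulary of "kubectl" and "docker" and a maximum distance of 2, the misrecognized "kubctl" resolves to "kubectl", while an unrelated word returns `None` and is left alone.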
Separate correction databases per context. Switch freely between profiles. Config, profiles, and models now live under the platform-managed Tauri config/data directories rather than a hard-coded path.
<app-config-dir>/
config.toml
<app-data-dir>/
profiles/
default/
corrections.db
ngram.bin
domain_context.json
profile.toml
work-devops/
...
models/
ggml-small.en-q5_1.bin
...
Downloaded from HuggingFace on first use with progress tracking and integrity verification.
| Model | Size | Speed | Accuracy |
|---|---|---|---|
| tiny.en | 75 MB | Fastest | Basic |
| base.en | 142 MB | Fast | Moderate |
| small.en-q5_1 | 181 MB | Balanced | Good (recommended) |
| medium.en-q5_0 | 515 MB | Slower | High |
| large-v3-turbo-q5_0 | 574 MB | Slowest | Highest |
libasound2-dev, libx11-dev, libxtst-dev, cmake, pkg-config
git clone https://github.com/puretensor/talax-dictation.git
cd talax-dictation
cd ui && npm install && cd ..
# Development mode
cargo tauri dev
# Engine tests
cargo test -p talax-engine
# Type-check frontend
cd ui && npx svelte-check
The engine test suite currently includes 137 unit tests, 34 integration tests, and doctests. Coverage focuses on:
| Area | Covers |
|---|---|
| Dictionary corrector | Word boundaries, case preservation, longest-match behavior |
| N-gram corrector | Training, save/load, scoring, vocabulary behavior |
| Heuristic expander | Levenshtein, Metaphone, compounds, acronyms, numbers |
| Database | Corrections, patterns, auto-apply, domain context |
| Profiles | CRUD, clone independence, reset behavior |
| Audio | Energy detection, VAD state transitions, ring buffer behavior |
| Whisper | Params, conversion, model metadata, serialization |
| Hotkey | Parse, key mapping, validation, serialization |
| Inject | Config, serialization, mode handling |
| Integration | Full pipeline, multi-layer correction, reload behavior |
crates/talax-app/gen/ is treated as disposable generated output and is not committed.
cargo test -p talax-engine, cd ui && npm run check, and a manual desktop smoke test on the target platform.
| Component | Role |
|---|---|
| Tauri v2 | Desktop app shell |
| Svelte 5 | Frontend UI (7 views: Dictate, Editor, Profiles, Patterns, Stats, Settings, Onboarding) |
| whisper-rs (whisper.cpp) | Local speech-to-text |
| rusqlite | Correction database |
| rdev | Global hotkey detection |
| cpal | Audio capture with VAD |
| arboard + enigo | Text injection (clipboard paste / keystroke simulation) |
talax/
Cargo.toml # Workspace: talax-engine + talax-app
crates/
talax-engine/ # Core library (no UI dependency)
src/
audio/ # cpal capture, VAD, ring buffer
db/ # SQLite schema, corrections, sessions
hotkey/ # Global hotkey detection
inject/ # Text injection
pipeline/ # dict_corrector, ngram, heuristic expander
profile/ # Voice profile management
whisper/ # Transcriber + model manager
talax-app/ # Tauri v2 application
src/
commands.rs # 21 IPC command handlers
recording.rs # Recording state machine
tray.rs # System tray
ui/ # Svelte 5 frontend
Business Source License 1.1. Copyright 2026 PureTensor Inc. Converts to Apache 2.0 on 2030-03-28. See LICENSE.