TalaX

Adaptive dictation for developers. Local Whisper transcription with a 3-layer correction pipeline that improves as you use it.

Built by PureTensor Inc.

Tala -- Icelandic for "to speak."

  • Learns your vocabulary -- corrections are persisted to SQLite; after 3 consistent fixes for the same word, corrections auto-apply
  • 3-layer correction pipeline -- dictionary substitution (<1ms), n-gram context scoring (<50ms), heuristic expansion (<10ms)
  • Fully local -- whisper.cpp via whisper-rs, no audio leaves your machine, no subscription
  • Voice profiles -- separate correction databases per context (work, personal, project-specific)

Development Status

TalaX is functional but early-stage. Here is what works today and what is still in progress.

Working:

  • Local Whisper transcription (whisper.cpp via whisper-rs, CPU-only)
  • Push-to-talk recording with energy-based VAD and pre-roll buffer
  • L1 Dictionary correction -- regex substitution from learned patterns, auto-apply after 3 consistent fixes
  • L3 Heuristic correction -- Levenshtein fuzzy matching, Double Metaphone phonetic matching, compound word detection, acronym restoration, number normalization
  • SQLite persistence -- sessions, segments, and correction patterns survive app restarts
  • Voice profiles with independent correction databases
  • Text injection via clipboard paste or keystroke simulation
  • Model download with progress tracking and integrity verification
  • Tauri v2 desktop app with Svelte 5 frontend (7 views)

Working but cold-start dependent:

  • L2 N-gram correction -- the interpolated trigram model is fully implemented (training, scoring, save/load, apply), but it only activates after you have reviewed and corrected enough transcriptions to build a meaningful language model. On a fresh profile, L2 is inert.
  • The correction feedback loop is wired end-to-end: frontend edit -> save_corrections IPC -> word-level diff -> pattern extraction -> auto-apply flag rebuild -> pipeline reload (including n-gram retrain). This works within and across app sessions. However, effectiveness scales with usage -- a fresh install has no training data.

Planned / not yet implemented:

  • Cross-profile pattern sharing
  • Audio excerpt extraction for fine-tuning
  • Batch correction review workflows
  • Platform-specific packaging and distribution

Architecture

Tauri v2 (Svelte 5 frontend, Rust backend)
  |
  +-- whisper.cpp (whisper-rs) -- local STT, CPU-only
  |     tiny.en bundled, small.en-q5_1 recommended
  |
  +-- 3-Layer Correction Pipeline
  |     L1: Dictionary    -- regex substitution from learned patterns
  |     L2: N-gram        -- interpolated trigram model (0.6/0.3/0.1)
  |     L3: Heuristic     -- Levenshtein, Double Metaphone, compounds
  |
  +-- SQLite (WAL) -- corrections, sessions, learning loop
  |
  +-- System Integration
        Global hotkey (rdev) -- push-to-talk
        Text injection (arboard + enigo) -- clipboard or keystroke
        Audio capture (cpal) -- 16kHz mono with energy VAD
        System tray -- recording indicator
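The audio path above gates capture with energy-based voice activity detection. A minimal std-only sketch of frame-level RMS energy gating (the threshold value and function names are illustrative assumptions, not TalaX's actual implementation):

```rust
/// Compute RMS energy of a mono PCM frame and compare it against a
/// threshold to decide whether the frame likely contains speech.
/// The 0.01 threshold used below is illustrative only.
fn frame_energy(samples: &[f32]) -> f32 {
    if samples.is_empty() {
        return 0.0;
    }
    let sum_sq: f32 = samples.iter().map(|s| s * s).sum();
    (sum_sq / samples.len() as f32).sqrt()
}

fn is_speech(samples: &[f32], threshold: f32) -> bool {
    frame_energy(samples) > threshold
}

fn main() {
    let silence = vec![0.001f32; 320]; // one 20 ms frame at 16 kHz
    let speech = vec![0.2f32; 320];
    assert!(!is_speech(&silence, 0.01));
    assert!(is_speech(&speech, 0.01));
}
```

A real VAD would add hangover frames and the pre-roll ring buffer mentioned above, so speech onsets are not clipped.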

How It Works

  1. Hold the hotkey (default: Ctrl+Shift+Space)
  2. Speak naturally
  3. Release
  4. TalaX transcribes locally, runs corrections, and injects text into your active app

When you review and correct a transcription, the diff is extracted at the word level and stored as correction patterns. Patterns that recur 3+ times with high confidence are promoted to auto-apply. The n-gram model retrains on your reviewed corpus each time the pipeline reloads, improving context-aware corrections over time.
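The promotion rule above can be sketched as a counter keyed on (misheard, corrected) word pairs; the type and field names here are illustrative, not TalaX's actual schema:

```rust
use std::collections::HashMap;

/// Track how often each (misheard, corrected) pair recurs and report
/// when it crosses the 3-fix auto-apply threshold described above.
struct PatternStore {
    counts: HashMap<(String, String), u32>,
}

impl PatternStore {
    fn new() -> Self {
        Self { counts: HashMap::new() }
    }

    /// Record one observed correction; returns true once the pattern
    /// has been seen 3 or more times and is eligible for auto-apply.
    fn record(&mut self, heard: &str, fixed: &str) -> bool {
        let n = self
            .counts
            .entry((heard.to_string(), fixed.to_string()))
            .or_insert(0);
        *n += 1;
        *n >= 3
    }
}

fn main() {
    let mut store = PatternStore::new();
    assert!(!store.record("cooper netties", "kubernetes"));
    assert!(!store.record("cooper netties", "kubernetes"));
    assert!(store.record("cooper netties", "kubernetes"));
}
```

In TalaX the counts live in SQLite rather than memory, so promotion survives restarts.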

Correction Pipeline

Layer 1 -- Dictionary: Exact regex substitutions from learned patterns. Longest-match-first ordering, automatic case preservation.
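Case preservation can be illustrated with a std-only, single-word simplification of this layer (TalaX's real L1 is regex-based; everything below is an assumption-laden sketch):

```rust
/// Uppercase the first character of a word, keeping the rest intact.
fn capitalize(s: &str) -> String {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) => c.to_uppercase().collect::<String>() + chars.as_str(),
        None => String::new(),
    }
}

/// Substitute a learned word while preserving leading capitalization:
/// a simplified stand-in for the regex-based L1 dictionary layer.
fn apply_correction(word: &str, wrong: &str, right: &str) -> String {
    if !word.eq_ignore_ascii_case(wrong) {
        return word.to_string();
    }
    let was_capitalized = word.chars().next().map_or(false, |c| c.is_uppercase());
    if was_capitalized {
        capitalize(right)
    } else {
        right.to_string()
    }
}

fn main() {
    assert_eq!(apply_correction("postgress", "postgress", "postgres"), "postgres");
    assert_eq!(apply_correction("Postgress", "postgress", "postgres"), "Postgres");
    assert_eq!(apply_correction("other", "postgress", "postgres"), "other");
}
```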

Layer 2 -- N-gram: Interpolated trigram model (0.6 tri + 0.3 bi + 0.1 uni) trained on your reviewed correction history. Flags low-probability words given their context. Starts inert on a fresh profile and activates as you build up reviewed transcriptions.
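The interpolation weights can be sketched directly over n-gram counts. The unsmoothed maximum-likelihood estimates and the tiny corpus below are simplifications for illustration, not the real model:

```rust
use std::collections::HashMap;

/// Count n-grams from a whitespace-tokenized corpus and score a word
/// in context with the 0.6 tri + 0.3 bi + 0.1 uni interpolation.
fn score(corpus: &str, w1: &str, w2: &str, w3: &str) -> f64 {
    let toks: Vec<&str> = corpus.split_whitespace().collect();
    let mut uni: HashMap<&str, f64> = HashMap::new();
    let mut bi: HashMap<(&str, &str), f64> = HashMap::new();
    let mut tri: HashMap<(&str, &str, &str), f64> = HashMap::new();
    for w in &toks {
        *uni.entry(*w).or_insert(0.0) += 1.0;
    }
    for p in toks.windows(2) {
        *bi.entry((p[0], p[1])).or_insert(0.0) += 1.0;
    }
    for p in toks.windows(3) {
        *tri.entry((p[0], p[1], p[2])).or_insert(0.0) += 1.0;
    }
    // Unsmoothed conditional probabilities; a production model would
    // smooth these and handle unseen histories more carefully.
    let p_uni = uni.get(w3).copied().unwrap_or(0.0) / toks.len() as f64;
    let p_bi = bi.get(&(w2, w3)).copied().unwrap_or(0.0)
        / uni.get(w2).copied().unwrap_or(1.0);
    let p_tri = tri.get(&(w1, w2, w3)).copied().unwrap_or(0.0)
        / bi.get(&(w1, w2)).copied().unwrap_or(1.0);
    0.6 * p_tri + 0.3 * p_bi + 0.1 * p_uni
}

fn main() {
    let corpus = "deploy to the cluster deploy to the cluster deploy to staging";
    // "cluster" is far likelier after "to the" than "staging" is.
    assert!(score(corpus, "to", "the", "cluster") > score(corpus, "to", "the", "staging"));
}
```

The unigram term is why a rare-in-context but common-overall word still gets a small nonzero score instead of being flagged outright.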

Layer 3 -- Heuristic Expander: Catches what the first two layers miss:

  • Fuzzy matching (Levenshtein distance <= 2)
  • Phonetic matching (Double Metaphone)
  • Compound word detection (split/join)
  • Acronym restoration (lowercase to uppercase)
  • Number normalization ("v two" to "v2", "node zero" to "node0")
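The fuzzy-matching bound in the first bullet is the standard Levenshtein edit distance; a minimal std-only dynamic-programming version (the distance <= 2 cutoff is the README's stated bound, the code itself is a generic sketch):

```rust
/// Classic dynamic-programming Levenshtein edit distance, the metric
/// behind the L3 fuzzy matcher's distance <= 2 bound. Uses a rolling
/// row so memory is O(len(b)) instead of O(len(a) * len(b)).
fn levenshtein(a: &str, b: &str) -> usize {
    let a: Vec<char> = a.chars().collect();
    let b: Vec<char> = b.chars().collect();
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, ca) in a.iter().enumerate() {
        let mut curr = vec![i + 1];
        for (j, cb) in b.iter().enumerate() {
            let cost = if ca == cb { 0 } else { 1 };
            let best = (prev[j] + cost)      // substitution or match
                .min(prev[j + 1] + 1)        // deletion
                .min(curr[j] + 1);           // insertion
            curr.push(best);
        }
        prev = curr;
    }
    prev[b.len()]
}

fn main() {
    assert_eq!(levenshtein("kitten", "sitting"), 3);
    // Within the <= 2 bound the fuzzy matcher would accept:
    assert_eq!(levenshtein("tauri", "taury"), 1);
    assert!(levenshtein("svelte", "sveltte") <= 2);
}
```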

Voice Profiles

Separate correction databases per context. Switch freely between profiles. Config, profiles, and models live under the platform-managed Tauri config/data directories rather than a hard-coded path.

<app-config-dir>/
    config.toml

<app-data-dir>/
    profiles/
        default/
            corrections.db
            ngram.bin
            domain_context.json
            profile.toml
        work-devops/
            ...
    models/
        ggml-small.en-q5_1.bin
        ...

Whisper Models

Downloaded from HuggingFace on first use with progress tracking and integrity verification.

Model                  Size     Speed      Accuracy
tiny.en                75 MB    Fastest    Basic
base.en                142 MB   Fast       Moderate
small.en-q5_1          181 MB   Balanced   Good (recommended)
medium.en-q5_0         515 MB   Slower     High
large-v3-turbo-q5_0    574 MB   Slowest    Highest

Development

Prerequisites

  • Rust (stable, edition 2024)
  • Node.js 24+ recommended
  • Linux system libraries: libasound2-dev, libx11-dev, libxtst-dev, cmake, pkg-config

Build and Run

git clone https://github.com/puretensor/talax-dictation.git
cd talax-dictation
cd ui && npm install && cd ..

# Development mode
cargo tauri dev

# Engine tests
cargo test -p talax-engine

# Type-check frontend
cd ui && npx svelte-check

Test Coverage

The engine test suite currently includes 137 unit tests, 34 integration tests, and doctests. Coverage focuses on:

Area                   Covers
Dictionary corrector   Word boundaries, case preservation, longest-match behavior
N-gram corrector       Training, save/load, scoring, vocabulary behavior
Heuristic expander     Levenshtein, Metaphone, compounds, acronyms, numbers
Database               Corrections, patterns, auto-apply, domain context
Profiles               CRUD, clone independence, reset behavior
Audio                  Energy detection, VAD state transitions, ring buffer behavior
Whisper                Params, conversion, model metadata, serialization
Hotkey                 Parse, key mapping, validation, serialization
Inject                 Config, serialization, mode handling
Integration            Full pipeline, multi-layer correction, reload behavior

Release Validation

  • Tauri schema output under crates/talax-app/gen/ is treated as disposable generated output and is not committed.
  • Before packaging a release, run cargo test -p talax-engine, cd ui && npm run check, and a manual desktop smoke test on the target platform.

Tech Stack

Component                  Role
Tauri v2                   Desktop app shell
Svelte 5                   Frontend UI (7 views: Dictate, Editor, Profiles, Patterns, Stats, Settings, Onboarding)
whisper-rs (whisper.cpp)   Local speech-to-text
rusqlite                   Correction database
rdev                       Global hotkey detection
cpal                       Audio capture with VAD
arboard + enigo            Text injection (clipboard paste / keystroke simulation)

Project Structure

talax/
    Cargo.toml                   # Workspace: talax-engine + talax-app
    crates/
        talax-engine/         # Core library (no UI dependency)
            src/
                audio/           # cpal capture, VAD, ring buffer
                db/              # SQLite schema, corrections, sessions
                hotkey/          # Global hotkey detection
                inject/          # Text injection
                pipeline/        # dict_corrector, ngram, heuristic expander
                profile/         # Voice profile management
                whisper/         # Transcriber + model manager
        talax-app/            # Tauri v2 application
            src/
                commands.rs      # 21 IPC command handlers
                recording.rs     # Recording state machine
                tray.rs          # System tray
    ui/                          # Svelte 5 frontend

License

Business Source License 1.1. Copyright 2026 PureTensor Inc. Converts to Apache 2.0 on 2030-03-28. See LICENSE.
