Modular AI-agent & voice pipeline in Rust. Features async-first orchestration (VAD, STT, LLM, TTS), Home Assistant integration, and a real-time Tauri/Svelte desktop UI.
Herzen is a high-performance, local-first voice assistant and AI orchestration engine designed with privacy and modularity at its core. Built entirely in Rust, it provides a complete pipeline for ambient intelligence, from voice activity detection and speech recognition to semantic intent matching and LLM-driven responses.
Herzen uses a workspace-based architecture in which every stage of the voice-to-action pipeline is decoupled into its own specialized, reusable crate.
Powered by Tokio, the core daemon manages concurrent workflows without blocking. It uses tokio::sync::broadcast channels to emit real-time telemetry and pipeline events, which are streamed via WebSockets to the desktop interface for a low-latency, "alive" user experience.
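The fan-out idea behind this event stream can be sketched with the standard library alone. This is a minimal, synchronous stand-in for a broadcast channel, not Herzen's actual code: `PipelineEvent`, `EventBus`, and the event variants are hypothetical names chosen for illustration, and `std::sync::mpsc` is swapped in for `tokio::sync::broadcast` so the example has no dependencies.

```rust
use std::sync::mpsc::{channel, Receiver, Sender};

/// Hypothetical pipeline events; the real event types are not shown in this README.
#[derive(Clone, Debug, PartialEq)]
pub enum PipelineEvent {
    SpeechStarted,
    Transcript(String),
    ResponseReady(String),
}

/// Minimal fan-out bus: every subscriber receives its own copy of each event.
#[derive(Default)]
pub struct EventBus {
    subscribers: Vec<Sender<PipelineEvent>>,
}

impl EventBus {
    /// Register a new listener (for example, one WebSocket connection).
    pub fn subscribe(&mut self) -> Receiver<PipelineEvent> {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        rx
    }

    /// Send a clone of the event to every live subscriber and forget the ones
    /// whose receiver has been dropped. Returns the number of deliveries.
    pub fn publish(&mut self, event: PipelineEvent) -> usize {
        self.subscribers.retain(|tx| tx.send(event.clone()).is_ok());
        self.subscribers.len()
    }
}
```

A caller would create one `EventBus`, hand each UI connection the `Receiver` returned by `subscribe`, and call `publish` from the pipeline stages. Unlike this sketch, a Tokio broadcast channel is bounded and asynchronous, so a slow receiver can fall behind and miss older events rather than growing an unbounded queue.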
Herzen features a multi-layered intent matching system designed for precision and flexibility.
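The individual layers are not spelled out here, so the following is only a hypothetical illustration of what layered matching can look like: an exact phrase lookup first, then a similarity score, then a fallback that a caller could route to the LLM. The names `Skill`, `Match`, and `match_intent` are assumptions, and simple token overlap (Jaccard similarity) stands in for real semantic matching.

```rust
use std::collections::HashSet;

/// Outcome of matching, tagged with the layer that produced it.
#[derive(Debug, PartialEq)]
pub enum Match {
    Exact(&'static str),
    Fuzzy(&'static str),
    Fallback,
}

/// Hypothetical skill definition; `phrase` is assumed to be lowercase.
pub struct Skill {
    pub name: &'static str,
    pub phrase: &'static str,
}

/// Lowercase the text and split it into a set of punctuation-free words.
fn tokens(text: &str) -> HashSet<String> {
    text.to_lowercase()
        .split_whitespace()
        .map(|w| w.trim_matches(|c: char| !c.is_alphanumeric()).to_string())
        .filter(|w| !w.is_empty())
        .collect()
}

/// Jaccard similarity: shared words divided by all distinct words.
fn jaccard(a: &HashSet<String>, b: &HashSet<String>) -> f32 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0;
    }
    a.intersection(b).count() as f32 / union as f32
}

pub fn match_intent(utterance: &str, skills: &[Skill], threshold: f32) -> Match {
    // Layer 1: exact match on the normalized utterance.
    let normalized = utterance.trim().to_lowercase();
    if let Some(skill) = skills.iter().find(|s| s.phrase == normalized) {
        return Match::Exact(skill.name);
    }
    // Layer 2: best token-overlap score, accepted only above the threshold.
    let words = tokens(utterance);
    let best = skills
        .iter()
        .map(|s| (s.name, jaccard(&words, &tokens(s.phrase))))
        .max_by(|a, b| a.1.total_cmp(&b.1));
    match best {
        Some((name, score)) if score >= threshold => Match::Fuzzy(name),
        // Layer 3: nothing matched well enough; the caller decides what to do.
        _ => Match::Fallback,
    }
}
```

Ordering the layers from cheapest and strictest to most permissive means common commands resolve without any model call, while unusual phrasing still gets a chance further down the chain.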
The system includes a modern Tauri + Svelte desktop application that acts as a control panel, providing real-time visibility into the agent's internal state, pipeline progress, and model performance.
crates/
├── herzen-audio # Low-level audio capture and playback
├── herzen-config # Centralized typed configuration (TOML + JSON Schema)
├── herzen-context # Global state and conversation context management
├── herzen-core # Pipeline orchestration logic and shared traits
├── herzen-daemon # Main service entry point and lifecycle management
├── herzen-ha # Home Assistant WebSocket integration
├── herzen-llm # LLM provider abstraction and model pool
├── herzen-router # Intent routing and skill dispatch
├── herzen-server # Axum-based API and WebSocket broadcast server
├── herzen-skills # Semantic skill engine and intent matching
├── herzen-stt # Speech-to-Text (STT) module
├── herzen-tts # Text-to-Speech (TTS) module
└── herzen-vad # Voice Activity Detection (VAD)
Herzen is built on the principle of Local-First AI. By keeping the entire pipeline (from speech recognition to large language model inference) within the local network, Herzen keeps voice data private and avoids round trips to cloud services. The architecture is deliberately modular, so components such as STT engines or LLM providers can be swapped without modifying the core orchestration logic.
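One common way to get this kind of swappability in Rust is to have the orchestrator depend on traits rather than concrete engines. The sketch below assumes that approach. The trait names, method signatures, and stub types are all hypothetical and are not taken from the herzen-core crate, and the code is synchronous for brevity even though the real pipeline is async.

```rust
/// Hypothetical speech-to-text interface.
pub trait SttEngine {
    fn transcribe(&self, audio: &[i16]) -> String;
}

/// Hypothetical language-model interface.
pub trait LlmProvider {
    fn complete(&self, prompt: &str) -> String;
}

/// Orchestrator that knows only the traits, never the concrete engines.
pub struct Pipeline {
    stt: Box<dyn SttEngine>,
    llm: Box<dyn LlmProvider>,
}

impl Pipeline {
    pub fn new(stt: Box<dyn SttEngine>, llm: Box<dyn LlmProvider>) -> Self {
        Self { stt, llm }
    }

    /// Run audio through transcription, then hand the text to the model.
    pub fn handle(&self, audio: &[i16]) -> String {
        let text = self.stt.transcribe(audio);
        self.llm.complete(&text)
    }
}

/// Stub STT engine that ignores the audio and returns a fixed transcript.
pub struct FixedStt(pub &'static str);

impl SttEngine for FixedStt {
    fn transcribe(&self, _audio: &[i16]) -> String {
        self.0.to_string()
    }
}

/// Stub provider that echoes the prompt back.
pub struct EchoLlm;

impl LlmProvider for EchoLlm {
    fn complete(&self, prompt: &str) -> String {
        format!("You said: {}", prompt)
    }
}

/// A second stub provider, used to show that swapping needs no Pipeline changes.
pub struct ShoutLlm;

impl LlmProvider for ShoutLlm {
    fn complete(&self, prompt: &str) -> String {
        prompt.to_uppercase()
    }
}
```

Constructing `Pipeline::new(Box::new(FixedStt("hello")), Box::new(EchoLlm))` and then replacing `EchoLlm` with `ShoutLlm` changes the response while `Pipeline::handle` stays untouched, which is the property the paragraph above describes.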