T2t

Voice-to-text with MCP support. System-wide dictation (hold fn) and AI agent mode (hold fn+ctrl) that connects to any MCP server. Cross-platform desktop app with local Whisper transcription.

#accessibility #clipboard #desktop-app #dictation #local-first #macos #offline #productivity #push-to-talk #rust

Demo Download

t2t

Voice-to-text with intelligence. Hold fn to talk, hold fn+ctrl to command.

Download

Download for macOS →

View all releases on GitHub →

Note: The app is not code-signed yet. On first launch, macOS may show a security warning. To open it:

Right-click the app → Open, then click Open in the dialog

Or run: xattr -cr /Applications/t2t.app in Terminal

Heads up: This is an unsigned build while we polish things up. Each time you update to a new version, you'll need to remove t2t from System Settings → Privacy & Security → Accessibility (and Microphone if needed), then re-add it. We'll get it properly signed soon!

How It Works

Hold Fn key → records microphone audio
Release Fn key → transcribes using local Whisper model
Typing mode (red bar): Hold Fn alone → pastes transcription into focused text field, preserves clipboard
Agent mode (cyan bar): Hold Fn+Ctrl → speaks commands to AI agent
- MCP mode (if configured): Connects to MCP servers, uses their tools via OpenRouter AI
- AppleScript mode (fallback): Generates and executes AppleScript for macOS automation
Visual feedback: red/cyan bar while recording (based on mode), amber while processing

Requirements

macOS (currently macOS only; tested on Apple Silicon)
Accessibility permission - Required for Fn key detection and focusing the correct field before paste
Microphone permission - Required for audio recording
OpenRouter API key (for agent mode) - Get one at openrouter.ai

The app will prompt you if permissions are missing.

Getting Started

Download and install the app from t2t.now
Grant permissions when prompted (Accessibility and Microphone)
Get an OpenRouter API key at openrouter.ai (required for agent mode)
Open settings: Click the menu bar icon → View Settings
Configure agent mode (optional):
- Add your OpenRouter API key in settings
- Optionally configure MCP servers for extended automation

Settings & Analytics

The settings window (Menu bar icon → View Settings) includes three tabs:

Analytics Tab

View your transcription usage statistics:

Total Words: Lifetime count of all transcribed words
Lifetime Average: Average words per minute across all sessions
Session Average: Average words per minute for current session
Sessions: Total number of transcription sessions
Hours Active: Total time spent transcribing
Recent Activity: 48-hour hourly activity chart

Settings Tab

Configure your t2t installation:

Theme: Toggle between light and dark mode
OpenRouter API Key: Set your API key for agent mode
AI Model Selection: Choose which model to use for agent mode
- Supports all OpenRouter models
- Auto-refresh available to fetch latest models
MCP Servers: Add, configure, and manage MCP servers
- Test connections and view available tools
- Enable/disable servers individually
- Supports stdio, HTTP, and SSE transports

History Tab

See History & Logging section below.

MCP (Model Context Protocol) Support

When MCP servers are configured in settings, agent mode uses MCP instead of AppleScript. This enables:

Extensible automation: Connect to any MCP-compatible service (databases, APIs, file systems, etc.)
Tool-based execution: AI agent uses tools provided by your MCP servers
Multiple servers: Connect to multiple MCP servers simultaneously
Transport options: Supports stdio, HTTP, and SSE transports

To configure: Menu bar icon → View Settings → Settings tab → MCP Servers section. Requires an OpenRouter API key.

Vision Support & Automatic Screenshots

t2t automatically captures and includes a screenshot with every agent call, enabling vision-capable models to "see" your screen context. This works seamlessly with any model - vision-capable models process the image, while text-only models simply ignore it.

How It Works

Automatic capture: When you use agent mode (Fn+Ctrl), a screenshot is captured before sending your prompt
Universal support: Screenshots are included with all agent calls, regardless of model selection
Smart routing: OpenRouter automatically routes to vision-capable models when available, or ignores the image for text-only models
Seamless integration: Screenshots are included in the API request without any additional UI or user action
Privacy: Screenshots are only sent to the API (not stored locally), and thumbnails are visible in History

Privacy & Permissions

Screen Recording permission: macOS may prompt for screen recording permission the first time you use agent mode
No local storage: Full screenshots are not saved to disk - they're only sent to the API
Thumbnails: Small thumbnails (150x150px) are stored locally in History for reference
Error handling: If screenshot capture fails (e.g., permission denied), the agent falls back to text-only mode

Technical Details

Screenshots are captured using macOS screencapture command
Images are encoded as base64 PNG and included in the OpenAI-compatible message format
The screenshot is included in both initial requests and follow-up requests after tool execution
Vision-capable models (GPT-4 Vision, Claude 3.5 Sonnet, etc.) can process the image to understand your screen context

History & Logging

t2t automatically logs all transcriptions and agent calls for review and debugging.

Features

Transcription history: All voice transcriptions are saved with timestamps
Agent call logging: Complete request/response logs for all OpenRouter API calls
Screenshot thumbnails: Tiny thumbnails (150x150px) of screenshots captured with all agent calls
Search: Fast local search across all history entries
Expandable details: Click any entry to view full request/response JSON and tool calls

Accessing History

Menu bar icon → View Settings → History tab

Configuration

History limit: Set T2T_HISTORY_LIMIT environment variable (default: 1000 entries)
Storage: History is stored locally in history.json via Tauri's store plugin
Privacy: All data stays on your machine - nothing is sent to external services

What's Logged

Transcriptions:

Timestamp
Transcribed text

Agent Calls:

Timestamp
Transcript (your voice input)
Model used
Full request JSON (messages, parameters)
Full response JSON (AI output, tool calls)
Tool calls executed (if any)
Screenshot thumbnail (captured automatically with each agent call)
Success/error status

First Run

On first launch, the app automatically downloads the Whisper model (~~150MB) to `~~/.cache/whisper/ggml-base.en.bin`. This happens in the background.

For Developers

Setup

# Install dependencies (in desktop/)
cd desktop && bun install

# Development
bun dev              # From root, or:
cd desktop && bun tauri dev

# Build
bun build            # From root, or:
cd desktop && bun tauri build

Requirements

Rust (install via rustup)
Bun (recommended) or Node.js 18+

Tech Stack

Frontend: Svelte 5 + SvelteKit
Backend: Rust + Tauri
STT: whisper-rs (local Whisper.cpp model)
AI: OpenRouter API (direct calls, no infrastructure needed)
MCP: Model Context Protocol client (local stdio/HTTP/SSE)
Hotkey: macOS event monitoring (Fn key) + fallbacks
Audio capture: native (Rust via cpal)

Architecture: Fully local. Only OpenRouter API calls go out. No servers, workers, or infrastructure required.

Debugging

Logs: ~/Library/Logs/t2t.log
Model location: ~/.cache/whisper/ggml-base.en.bin
History storage: history.json (via Tauri store, location depends on Tauri config)

License

MIT

Top categories

tailwind daisyui admin template popup mdsvex portfolio blog form ecommerce ui carousel auth dark seo image routing