t2t Svelte Themes

T2t

Voice-to-text with MCP support. System-wide dictation (hold fn) and AI agent mode (hold fn+ctrl) that connects to any MCP server. Cross-platform desktop app with local Whisper transcription.

t2t

Voice-to-text with intelligence. Hold fn to talk, hold fn+ctrl to command.

Download

Download for macOS →

View all releases on GitHub →

Note: The app is not code-signed yet. On first launch, macOS may show a security warning. To open it:

  • Right-click the app → Open, then click Open in the dialog
  • Or run: xattr -cr /Applications/t2t.app in Terminal

Heads up: This is an unsigned build while we polish things up. Each time you update to a new version, you'll need to remove t2t from System Settings → Privacy & Security → Accessibility (and Microphone if needed), then re-add it. We'll get it properly signed soon!

How It Works

  • Hold Fn key → records microphone audio
  • Release Fn key → transcribes using local Whisper model
  • Typing mode (red bar): Hold Fn alone → pastes transcription into focused text field, preserves clipboard
  • Agent mode (cyan bar): Hold Fn+Ctrl → speaks commands to AI agent
    • MCP mode (if configured): Connects to MCP servers, uses their tools via OpenRouter AI
    • AppleScript mode (fallback): Generates and executes AppleScript for macOS automation
  • Visual feedback: red/cyan bar while recording (based on mode), amber while processing

Requirements

  • macOS (currently macOS only; tested on Apple Silicon)
  • Accessibility permission - Required for Fn key detection and focusing the correct field before paste
  • Microphone permission - Required for audio recording
  • OpenRouter API key (for agent mode) - Get one at openrouter.ai

The app will prompt you if permissions are missing.

Getting Started

  1. Download and install the app from t2t.now
  2. Grant permissions when prompted (Accessibility and Microphone)
  3. Get an OpenRouter API key at openrouter.ai (required for agent mode)
  4. Open settings: Click the menu bar icon → View Settings
  5. Configure agent mode (optional):
    • Add your OpenRouter API key in settings
    • Optionally configure MCP servers for extended automation

Settings & Analytics

The settings window (Menu bar icon → View Settings) includes three tabs:

Analytics Tab

View your transcription usage statistics:

  • Total Words: Lifetime count of all transcribed words
  • Lifetime Average: Average words per minute across all sessions
  • Session Average: Average words per minute for current session
  • Sessions: Total number of transcription sessions
  • Hours Active: Total time spent transcribing
  • Recent Activity: 48-hour hourly activity chart

Settings Tab

Configure your t2t installation:

  • Theme: Toggle between light and dark mode
  • OpenRouter API Key: Set your API key for agent mode
  • AI Model Selection: Choose which model to use for agent mode
    • Supports all OpenRouter models
    • Auto-refresh available to fetch latest models
  • MCP Servers: Add, configure, and manage MCP servers
    • Test connections and view available tools
    • Enable/disable servers individually
    • Supports stdio, HTTP, and SSE transports

History Tab

See History & Logging section below.

MCP (Model Context Protocol) Support

When MCP servers are configured in settings, agent mode uses MCP instead of AppleScript. This enables:

  • Extensible automation: Connect to any MCP-compatible service (databases, APIs, file systems, etc.)
  • Tool-based execution: AI agent uses tools provided by your MCP servers
  • Multiple servers: Connect to multiple MCP servers simultaneously
  • Transport options: Supports stdio, HTTP, and SSE transports

To configure: Menu bar icon → View Settings → Settings tab → MCP Servers section. Requires an OpenRouter API key.

Vision Support & Automatic Screenshots

t2t automatically captures and includes a screenshot with every agent call, enabling vision-capable models to "see" your screen context. This works seamlessly with any model - vision-capable models process the image, while text-only models simply ignore it.

How It Works

  • Automatic capture: When you use agent mode (Fn+Ctrl), a screenshot is captured before sending your prompt
  • Universal support: Screenshots are included with all agent calls, regardless of model selection
  • Smart routing: OpenRouter automatically routes to vision-capable models when available, or ignores the image for text-only models
  • Seamless integration: Screenshots are included in the API request without any additional UI or user action
  • Privacy: Screenshots are only sent to the API (not stored locally), and thumbnails are visible in History

Privacy & Permissions

  • Screen Recording permission: macOS may prompt for screen recording permission the first time you use agent mode
  • No local storage: Full screenshots are not saved to disk - they're only sent to the API
  • Thumbnails: Small thumbnails (150x150px) are stored locally in History for reference
  • Error handling: If screenshot capture fails (e.g., permission denied), the agent falls back to text-only mode

Technical Details

  • Screenshots are captured using macOS screencapture command
  • Images are encoded as base64 PNG and included in the OpenAI-compatible message format
  • The screenshot is included in both initial requests and follow-up requests after tool execution
  • Vision-capable models (GPT-4 Vision, Claude 3.5 Sonnet, etc.) can process the image to understand your screen context

History & Logging

t2t automatically logs all transcriptions and agent calls for review and debugging.

Features

  • Transcription history: All voice transcriptions are saved with timestamps
  • Agent call logging: Complete request/response logs for all OpenRouter API calls
  • Screenshot thumbnails: Tiny thumbnails (150x150px) of screenshots captured with all agent calls
  • Search: Fast local search across all history entries
  • Expandable details: Click any entry to view full request/response JSON and tool calls

Accessing History

Menu bar icon → View SettingsHistory tab

Configuration

  • History limit: Set T2T_HISTORY_LIMIT environment variable (default: 1000 entries)
  • Storage: History is stored locally in history.json via Tauri's store plugin
  • Privacy: All data stays on your machine - nothing is sent to external services

What's Logged

Transcriptions:

  • Timestamp
  • Transcribed text

Agent Calls:

  • Timestamp
  • Transcript (your voice input)
  • Model used
  • Full request JSON (messages, parameters)
  • Full response JSON (AI output, tool calls)
  • Tool calls executed (if any)
  • Screenshot thumbnail (captured automatically with each agent call)
  • Success/error status

First Run

On first launch, the app automatically downloads the Whisper model (150MB) to `/.cache/whisper/ggml-base.en.bin`. This happens in the background.

For Developers

Setup

# Install dependencies (in desktop/)
cd desktop && bun install

# Development
bun dev              # From root, or:
cd desktop && bun tauri dev

# Build
bun build            # From root, or:
cd desktop && bun tauri build

Requirements

  • Rust (install via rustup)
  • Bun (recommended) or Node.js 18+

Tech Stack

  • Frontend: Svelte 5 + SvelteKit
  • Backend: Rust + Tauri
  • STT: whisper-rs (local Whisper.cpp model)
  • AI: OpenRouter API (direct calls, no infrastructure needed)
  • MCP: Model Context Protocol client (local stdio/HTTP/SSE)
  • Hotkey: macOS event monitoring (Fn key) + fallbacks
  • Audio capture: native (Rust via cpal)

Architecture: Fully local. Only OpenRouter API calls go out. No servers, workers, or infrastructure required.

Debugging

  • Logs: ~/Library/Logs/t2t.log
  • Model location: ~/.cache/whisper/ggml-base.en.bin
  • History storage: history.json (via Tauri store, location depends on Tauri config)

License

MIT

Top categories

Loading Svelte Themes