
Svelte OpenAI Whisper Speech Recognition API

An app exploring two different methods of speech recognition/transcription.

Svelte Voice Notes Transcription

A SvelteKit web application that demonstrates two approaches to speech-to-text transcription: the browser-native Web Speech API (SpeechRecognition) and OpenAI's Whisper API. Users can record, transcribe, and manage voice notes with either method.

Demo

Watch the demo video

Features

  • Dual transcription methods:
    • The browser's native Speech Recognition API for real-time transcription
    • OpenAI's Whisper API for high-accuracy transcription
  • Voice recording controls (start, stop, pause, resume)
  • Note management system (create, save, load, delete)
  • Real-time transcription display
  • Persistent storage of notes using localStorage

Architecture

Core Components

  1. Speech Handlers:

    • SpeechHandler: Manages browser-native speech recognition
    • SpeechHandlerOpenAi: Handles Whisper API integration
  2. State:

    • VoiceNotesHandler: Manages note storage and retrieval
    • Svelte context API for state sharing
  3. UI Components:

    • Recorder: Controls for voice recording
    • CreateDialog: Note creation interface
    • LoadNoteDialog: Note loading interface

Implementation Details

Browser Speech Recognition

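```typescript
// Uses the Web Speech API; some browsers (e.g. Chrome) expose it only under the webkit prefix
const SpeechRecognition = (window as any).SpeechRecognition || (window as any).webkitSpeechRecognition;
if (!SpeechRecognition) throw new Error('Speech recognition is not supported in this browser');
this.recognition = new SpeechRecognition();
this.recognition.continuous = true; // keep listening across pauses instead of stopping after one phrase
this.recognition.interimResults = true; // report provisional results for the real-time display
```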
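With continuous and interimResults enabled, each result event carries a list of results, some marked final and some still provisional. The sketch below shows one way a handler might split them for the real-time display. The RecognitionResultLike types and the splitResults helper are hypothetical; they mirror the shape of the Web Speech API's SpeechRecognitionEvent (resultIndex, results[i].isFinal, results[i][0].transcript) so the logic can run outside a browser.

```typescript
// Hypothetical structural types mirroring SpeechRecognitionAlternative / SpeechRecognitionResult,
// defined here so the sketch does not depend on DOM typings.
interface AlternativeLike {
    transcript: string;
    confidence: number;
}

interface RecognitionResultLike {
    isFinal: boolean;
    length: number;
    [index: number]: AlternativeLike;
}

interface TranscriptParts {
    finalText: string;   // text the engine will no longer revise
    interimText: string; // provisional text that may still change
}

// Splits the results of one `result` event into final and interim text.
// Only results from `resultIndex` onward changed in this event; earlier ones
// were reported by previous events.
function splitResults(
    results: ArrayLike<RecognitionResultLike>,
    resultIndex: number
): TranscriptParts {
    let finalText = '';
    let interimText = '';
    for (let i = resultIndex; i < results.length; i++) {
        const result = results[i];
        if (result.length === 0) continue;
        // Index 0 is the top alternative (the only one with the default maxAlternatives of 1)
        const best = result[0];
        if (result.isFinal) {
            finalText += best.transcript;
        } else {
            interimText += best.transcript;
        }
    }
    return { finalText, interimText };
}
```

Inside an onresult handler, the final part would be appended to the stored transcript, while the interim part is shown in place and replaced on the next event.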

OpenAI Whisper Integration

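```typescript
// Sends a recorded audio blob to the app's /api/transcribe route, which calls the Whisper API.
// MIME_TYPE is a module-level constant holding the recorder's MIME type (presumably 'audio/webm').
async transcribeAudio(audioBlob: Blob): Promise<string> {
    const file = new File([audioBlob], 'recording.webm', { type: MIME_TYPE });
    const formData = new FormData();
    formData.append('file', file);

    const response = await fetch('/api/transcribe', {
        method: 'POST',
        body: formData
    });
    if (!response.ok) {
        throw new Error(`Transcription failed: ${response.status} ${response.statusText}`);
    }
    // Assumption: the route responds with JSON shaped like { text: string }
    const { text } = await response.json();
    return text;
}
```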
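The client posts to the app's own /api/transcribe route rather than calling OpenAI directly, which presumably keeps OPENAI_API_KEY on the server. The route's code is not shown in this README. The following is a minimal sketch of what it might do, assuming it forwards the upload to OpenAI's transcription endpoint (POST https://api.openai.com/v1/audio/transcriptions with a file field and model set to whisper-1). The names buildWhisperRequest and transcribeWithWhisper are hypothetical.

```typescript
// Assumption: /api/transcribe forwards the uploaded audio to this OpenAI endpoint.
const OPENAI_TRANSCRIPTIONS_URL = 'https://api.openai.com/v1/audio/transcriptions';

// Builds the multipart request for the Whisper transcription endpoint.
function buildWhisperRequest(audio: Blob, filename: string, apiKey: string): Request {
    const formData = new FormData();
    // Include a supported audio extension (such as .webm) in the file name
    formData.append('file', audio, filename);
    formData.append('model', 'whisper-1');
    return new Request(OPENAI_TRANSCRIPTIONS_URL, {
        method: 'POST',
        // Content-Type is not set manually: a FormData body gets multipart/form-data plus its boundary
        headers: { Authorization: `Bearer ${apiKey}` },
        body: formData
    });
}

// Sends the request and returns the transcript from the JSON response.
async function transcribeWithWhisper(audio: Blob, filename: string, apiKey: string): Promise<string> {
    const response = await fetch(buildWhisperRequest(audio, filename, apiKey));
    if (!response.ok) {
        throw new Error(`Whisper API error: ${response.status} ${await response.text()}`);
    }
    const data = (await response.json()) as { text: string };
    return data.text;
}
```

In a SvelteKit +server.ts POST handler, the file could be read with (await request.formData()).get('file') and passed to transcribeWithWhisper together with the key imported from $env/static/private.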

Key Features Implementation

Recording Controls

The recorder provides the following controls and feedback:

  • Start/Stop recording
  • Pause/Resume recording
  • Real-time transcription display
  • Error handling and user feedback
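Start, stop, pause, and resume only make sense in certain orders, so the UI needs to know which buttons to enable. A minimal sketch of that guard as a transition table follows. The state names match the values of MediaRecorder.state; the table itself and the nextRecorderState and canPerform helpers are a hypothetical app-level rule, not code taken from the Recorder component.

```typescript
// State names mirror MediaRecorder.state
type RecorderState = 'inactive' | 'recording' | 'paused';
type RecorderAction = 'start' | 'stop' | 'pause' | 'resume';

// Allowed transitions; any action not listed for a state is rejected.
const TRANSITIONS: Record<RecorderState, Partial<Record<RecorderAction, RecorderState>>> = {
    inactive: { start: 'recording' },
    recording: { pause: 'paused', stop: 'inactive' },
    paused: { resume: 'recording', stop: 'inactive' }
};

// Returns the next state, or throws if the action is not valid in the current state.
function nextRecorderState(state: RecorderState, action: RecorderAction): RecorderState {
    const next = TRANSITIONS[state][action];
    if (next === undefined) {
        throw new Error(`Cannot ${action} while ${state}`);
    }
    return next;
}

// Useful for enabling or disabling the corresponding buttons.
function canPerform(state: RecorderState, action: RecorderAction): boolean {
    return TRANSITIONS[state][action] !== undefined;
}
```

Keeping the rules in one table means the button states and the error messages cannot drift apart.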

Note Management

Notes are managed through the VoiceNotesHandler class:

  • Create new notes with titles and transcriptions
  • Update existing notes
  • Delete notes
  • Load and display saved notes
  • Persistent storage using localStorage
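The README does not show VoiceNotesHandler's code. Below is a minimal sketch of the same operations, assuming all notes are kept as one JSON array under a single localStorage key (localStorage stores only strings, hence the JSON round-trip). NotesStore, VoiceNote, KeyValueStorage, and the 'voice-notes' key are hypothetical. Storage is passed in rather than read from window.localStorage so the logic can also run against an in-memory stand-in.

```typescript
// Hypothetical note shape; the real VoiceNotesHandler may store different fields.
interface VoiceNote {
    id: string;
    title: string;
    transcription: string;
    updatedAt: number; // epoch milliseconds
}

// The subset of the Web Storage API used here; window.localStorage satisfies it.
interface KeyValueStorage {
    getItem(key: string): string | null;
    setItem(key: string, value: string): void;
}

class NotesStore {
    private readonly storage: KeyValueStorage;
    private readonly key: string;

    constructor(storage: KeyValueStorage, key = 'voice-notes') {
        this.storage = storage;
        this.key = key; // hypothetical storage key
    }

    list(): VoiceNote[] {
        const raw = this.storage.getItem(this.key);
        if (raw === null) return [];
        try {
            return JSON.parse(raw) as VoiceNote[];
        } catch {
            return []; // corrupted entry: start fresh rather than crash
        }
    }

    create(title: string, transcription: string): VoiceNote {
        const note: VoiceNote = {
            id: `${Date.now().toString(36)}-${Math.random().toString(36).slice(2, 10)}`,
            title,
            transcription,
            updatedAt: Date.now()
        };
        this.save([...this.list(), note]);
        return note;
    }

    update(id: string, changes: Partial<Pick<VoiceNote, 'title' | 'transcription'>>): VoiceNote | undefined {
        const notes = this.list();
        const index = notes.findIndex((n) => n.id === id);
        if (index === -1) return undefined;
        notes[index] = { ...notes[index], ...changes, updatedAt: Date.now() };
        this.save(notes);
        return notes[index];
    }

    delete(id: string): boolean {
        const notes = this.list();
        const remaining = notes.filter((n) => n.id !== id);
        if (remaining.length === notes.length) return false; // nothing matched
        this.save(remaining);
        return true;
    }

    private save(notes: VoiceNote[]): void {
        this.storage.setItem(this.key, JSON.stringify(notes));
    }
}
```

In the browser the store would be created as new NotesStore(window.localStorage).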

Usage

  1. Starting a Recording:

    • Click the "Start Recording" button
    • Grant microphone permissions when prompted
    • Speak into your microphone
  2. Managing Recordings:

    • Use the pause/resume button to temporarily stop recording
    • Click "Stop Recording" to finish
    • Save the transcription as a note
  3. Managing Notes:

    • Create new notes with the "+" button
    • Load existing notes using the load dialog
    • Edit transcriptions directly in the textarea
    • Copy transcriptions to clipboard
    • Delete unwanted notes

Setup

  1. Clone the repository
  2. Install dependencies:
    ```bash
    pnpm install
    ```
  3. Set up environment variables (for example, in a .env file in the project root):
    ```
    OPENAI_API_KEY=your_api_key_here
    ```
  4. Start the development server:
    ```bash
    pnpm dev
    ```

Configuration

The app exposes these configurable parameters:

  • MIN_CHUNK_SIZE: Minimum size for audio chunks
  • DEFAULT_INTERVAL: Default recording interval
  • DEFAULT_CONFIDENCE: Default confidence threshold for transcription
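The README names these constants without giving their values or units. The sketch below shows one plausible use of MIN_CHUNK_SIZE and DEFAULT_CONFIDENCE: buffering MediaRecorder chunks until enough audio has accumulated to be worth sending to Whisper, and discarding low-confidence recognition results. The values, the byte unit for MIN_CHUNK_SIZE, and the ChunkBuffer and meetsConfidence names are all assumptions. DEFAULT_INTERVAL may correspond to the timeslice argument of MediaRecorder.start(timeslice), which controls how often dataavailable fires; that mapping is likewise a guess.

```typescript
// Hypothetical values: the README does not state them.
const MIN_CHUNK_SIZE = 16 * 1024; // assumed to be in bytes
const DEFAULT_CONFIDENCE = 0.5;   // SpeechRecognitionAlternative.confidence ranges from 0 to 1

// Accumulates MediaRecorder chunks and releases them as one Blob
// once at least `minSize` bytes have been collected.
class ChunkBuffer {
    private chunks: Blob[] = [];
    private size = 0;
    private readonly minSize: number;
    private readonly mimeType: string;

    constructor(minSize: number, mimeType: string) {
        this.minSize = minSize;
        this.mimeType = mimeType;
    }

    // Adds a chunk; returns the combined Blob when the threshold is reached, otherwise null.
    push(chunk: Blob): Blob | null {
        this.chunks.push(chunk);
        this.size += chunk.size;
        return this.size >= this.minSize ? this.flush() : null;
    }

    // Returns whatever is buffered (e.g. when recording stops) and resets the buffer.
    flush(): Blob | null {
        if (this.chunks.length === 0) return null;
        const blob = new Blob(this.chunks, { type: this.mimeType });
        this.chunks = [];
        this.size = 0;
        return blob;
    }
}

// True if a recognition alternative is confident enough to keep.
function meetsConfidence(confidence: number, threshold = DEFAULT_CONFIDENCE): boolean {
    return confidence >= threshold;
}
```

One caveat when sending partial recordings: in WebM output from MediaRecorder, typically only the first chunk carries the container header, so a batch flushed mid-recording may not decode on its own. An implementation may need to keep the header chunk or restart the recorder for each segment.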
