Svelte OpenAI Whisper Speech Recognition API

ichbtrv
An app exploring two different methods of speech recognition/transcription.

Svelte Voice Notes Transcription

A web application built with SvelteKit that demonstrates two approaches to speech-to-text transcription: the browser's native Speech Recognition API (part of the Web Speech API) and OpenAI's Whisper API. Users can record, transcribe, and manage voice notes with either method.

Demo

Watch the demo video

Features

Dual transcription methods:
- Browser's native Speech Recognition API for real-time transcription
- OpenAI's Whisper API for high-accuracy transcription
Voice recording controls (start, stop, pause, resume)
Note management system (create, save, load, delete)
Real-time transcription display
Persistent storage of notes using localStorage

Architecture

Core Components

Speech Handlers:
- SpeechHandler: Manages browser-native speech recognition
- SpeechHandlerOpenAi: Handles Whisper API integration
State:
- VoiceNotesHandler: Manages note storage and retrieval
- Svelte context API for state sharing
UI Components:
- Recorder: Controls for voice recording
- CreateDialog: Note creation interface
- LoadNoteDialog: Note loading interface

Implementation Details

Browser Speech Recognition

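// Uses the Web Speech API; some browsers expose the constructor only under the webkit prefix
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
this.recognition = new SpeechRecognition();
this.recognition.continuous = true;     // keep listening after the speaker pauses
this.recognition.interimResults = true; // emit partial results while speech is still in progress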
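
With continuous and interimResults both enabled, each result event carries a list in which some entries are final and others are still provisional. A minimal sketch of separating the two is shown here. The function splitTranscript and the "Like" interfaces are hypothetical names, not taken from this project; they mirror only the fields of the Web Speech API result objects that the helper reads.

```typescript
// Structural stand-ins for the Web Speech API result objects (hypothetical names),
// so the helper can also run outside a browser.
interface AlternativeLike {
  transcript: string;
  confidence: number;
}

interface ResultLike {
  isFinal: boolean;
  [index: number]: AlternativeLike;
}

interface ResultListLike {
  length: number;
  [index: number]: ResultLike;
}

// Concatenates the top alternative of every result, keeping finalized
// text apart from text the recognizer may still revise.
function splitTranscript(results: ResultListLike): { finalText: string; interimText: string } {
  let finalText = '';
  let interimText = '';
  for (let i = 0; i < results.length; i++) {
    const result = results[i];
    const text = result[0].transcript;
    if (result.isFinal) {
      finalText += text;
    } else {
      interimText += text;
    }
  }
  return { finalText, interimText };
}
```

In a browser, an onresult handler could pass event.results to this helper and render the interim text differently until it becomes final.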

OpenAI Whisper Integration

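// Wraps the recorded audio in a File and posts it to the app's /api/transcribe endpoint
async transcribeAudio(audioBlob: Blob): Promise<string> {
    const file = new File([audioBlob], 'recording.webm', { type: MIME_TYPE });
    const formData = new FormData();
    formData.append('file', file);

    const response = await fetch('/api/transcribe', {
        method: 'POST',
        body: formData
    });

    if (!response.ok) {
        throw new Error(`Transcription request failed with status ${response.status}`);
    }

    // Assumption: the endpoint responds with JSON shaped like { text: string }
    const { text } = await response.json();
    return text;
}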
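
The client code above posts to /api/transcribe, but the server side of that route is not shown. One plausible shape, assuming the route simply forwards the audio to OpenAI's transcription endpoint with the whisper-1 model, is sketched here. The names buildWhisperRequest and transcribeWithWhisper are hypothetical, and the project's actual route may differ.

```typescript
// Hypothetical server-side helpers; not taken from this project.
const WHISPER_URL = 'https://api.openai.com/v1/audio/transcriptions';

interface WhisperRequest {
  url: string;
  init: { method: 'POST'; headers: Record<string, string>; body: FormData };
}

// Builds the multipart request that OpenAI's transcription endpoint expects.
function buildWhisperRequest(audio: Blob, apiKey: string): WhisperRequest {
  const form = new FormData();
  form.append('file', audio, 'recording.webm');
  form.append('model', 'whisper-1');
  return {
    url: WHISPER_URL,
    init: {
      method: 'POST',
      // No Content-Type header here: fetch derives the multipart boundary from the FormData body.
      headers: { Authorization: `Bearer ${apiKey}` },
      body: form
    }
  };
}

// Sends the request and returns the transcribed text.
// fetchImpl is injectable so the function can be exercised without network access.
async function transcribeWithWhisper(
  audio: Blob,
  apiKey: string,
  fetchImpl: typeof fetch = fetch
): Promise<string> {
  const { url, init } = buildWhisperRequest(audio, apiKey);
  const response = await fetchImpl(url, init);
  if (!response.ok) {
    throw new Error(`Whisper request failed with status ${response.status}`);
  }
  const data = (await response.json()) as { text: string };
  return data.text;
}
```

Keeping the API key on the server, read from OPENAI_API_KEY, means the browser never sees it; the client only talks to the app's own endpoint.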

Key Features Implementation

Recording Controls

The recorder supports the following:

Start/Stop recording
Pause/Resume recording
Real-time transcription display
Error handling and user feedback
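
The start, stop, pause, and resume controls imply a small state machine. A minimal sketch of one way to model it follows; the type names, the transition table, and nextRecorderState are hypothetical and are not taken from the Recorder component. The browser's MediaRecorder exposes a similar state model ("inactive", "recording", "paused").

```typescript
type RecorderState = 'idle' | 'recording' | 'paused';
type RecorderAction = 'start' | 'pause' | 'resume' | 'stop';

// Allowed transitions; any action missing from a state's entry is invalid in that state.
const TRANSITIONS: Record<RecorderState, Partial<Record<RecorderAction, RecorderState>>> = {
  idle: { start: 'recording' },
  recording: { pause: 'paused', stop: 'idle' },
  paused: { resume: 'recording', stop: 'idle' }
};

// Returns the state that follows `action`, or throws if the action is not allowed.
function nextRecorderState(state: RecorderState, action: RecorderAction): RecorderState {
  const next = TRANSITIONS[state][action];
  if (next === undefined) {
    throw new Error(`Cannot ${action} while ${state}`);
  }
  return next;
}
```

A table like this also gives the UI a single place to decide which buttons to enable, and the thrown error can feed the user-facing feedback.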

Note Management

Notes are managed through the VoiceNotesHandler class:

Create new notes with titles and transcriptions
Update existing notes
Delete notes
Load and display saved notes
Persistent storage using localStorage
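
The operations listed above can be sketched as a small class that keeps notes in memory and writes them back to storage after every change. This is an illustration only: NotesStore, VoiceNote, StorageLike, and the 'voice-notes' key are hypothetical and do not describe the actual VoiceNotesHandler.

```typescript
interface VoiceNote {
  id: number;
  title: string;
  transcription: string;
}

// Subset of the Web Storage API used here; window.localStorage satisfies it.
interface StorageLike {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

class NotesStore {
  private readonly storage: StorageLike;
  private readonly key: string;
  private notes: VoiceNote[];

  constructor(storage: StorageLike, key: string = 'voice-notes') {
    this.storage = storage;
    this.key = key;
    // Load previously saved notes, if any.
    const raw = storage.getItem(key);
    this.notes = raw ? (JSON.parse(raw) as VoiceNote[]) : [];
  }

  list(): VoiceNote[] {
    return [...this.notes];
  }

  create(title: string, transcription: string): VoiceNote {
    // Next id is one above the highest id currently stored.
    const id = this.notes.reduce((max, note) => Math.max(max, note.id), 0) + 1;
    const note: VoiceNote = { id, title, transcription };
    this.notes.push(note);
    this.persist();
    return note;
  }

  update(id: number, changes: Partial<Omit<VoiceNote, 'id'>>): void {
    this.notes = this.notes.map((note) => (note.id === id ? { ...note, ...changes } : note));
    this.persist();
  }

  remove(id: number): void {
    this.notes = this.notes.filter((note) => note.id !== id);
    this.persist();
  }

  private persist(): void {
    this.storage.setItem(this.key, JSON.stringify(this.notes));
  }
}
```

In the browser the store could be constructed with window.localStorage; passing the storage in rather than reading the global makes the class usable during server-side rendering and in tests.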

Usage

Starting a Recording:
- Click the "Start Recording" button
- Grant microphone permissions when prompted
- Speak into your microphone
Managing Recordings:
- Use the pause/resume button to temporarily stop recording
- Click "Stop Recording" to finish
- Save the transcription as a note
Managing Notes:
- Create new notes with the "+" button
- Load existing notes using the load dialog
- Edit transcriptions directly in the textarea
- Copy transcriptions to clipboard
- Delete unwanted notes

Setup

Clone the repository
Install dependencies:
```
pnpm install
```
Set up environment variables (for example, in a .env file at the project root):
```
OPENAI_API_KEY=your_api_key_here
```
Start the development server:
```
pnpm dev
```

Configuration

The app includes configurable parameters:

MIN_CHUNK_SIZE: Minimum size for audio chunks
DEFAULT_INTERVAL: Default recording interval
DEFAULT_CONFIDENCE: Default confidence threshold for transcription
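
As an illustration of how parameters like these might be consumed, the sketch below merges caller overrides with defaults and applies the size and confidence checks. The numeric values and their units (bytes, milliseconds, a 0 to 1 confidence score) are placeholders chosen for the example, not the project's actual settings; TranscriptionConfig and the three functions are hypothetical names.

```typescript
interface TranscriptionConfig {
  minChunkSize: number; // assumed unit: bytes
  interval: number;     // assumed unit: milliseconds
  confidence: number;   // assumed range: 0 to 1
}

// Placeholder defaults, NOT the values used by the project.
const MIN_CHUNK_SIZE = 1024;
const DEFAULT_INTERVAL = 3000;
const DEFAULT_CONFIDENCE = 0.7;

// Fills in any setting the caller did not override.
function resolveConfig(overrides: Partial<TranscriptionConfig> = {}): TranscriptionConfig {
  return {
    minChunkSize: overrides.minChunkSize ?? MIN_CHUNK_SIZE,
    interval: overrides.interval ?? DEFAULT_INTERVAL,
    confidence: overrides.confidence ?? DEFAULT_CONFIDENCE
  };
}

// Skips chunks too small to be worth sending for transcription.
function shouldTranscribeChunk(sizeInBytes: number, config: TranscriptionConfig): boolean {
  return sizeInBytes >= config.minChunkSize;
}

// Accepts a recognition result only if it meets the confidence threshold.
function isConfidentEnough(resultConfidence: number, config: TranscriptionConfig): boolean {
  return resultConfidence >= config.confidence;
}
```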
