A web application built with SvelteKit that demonstrates two approaches to speech-to-text transcription: the browser-native Speech Recognition API and OpenAI's Whisper API. Users can record, transcribe, and manage voice notes with either method.
Speech Handlers:
- SpeechHandler: Manages browser-native speech recognition
- SpeechHandlerOpenAi: Handles Whisper API integration

State:
- VoiceNotesHandler: Manages note storage and retrieval

UI Components:
- Recorder: Controls for voice recording
- CreateDialog: Note creation interface
- LoadNoteDialog: Note loading interface

// Uses the Web Speech API
const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
this.recognition = new SpeechRecognition();
this.recognition.continuous = true;
this.recognition.interimResults = true;
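With `continuous` and `interimResults` both enabled, the recognizer delivers `result` events whose entries are either final or still-changing interim hypotheses. A minimal sketch of how a handler might separate the two follows. `splitTranscript`, `AlternativeLike`, and `ResultLike` are hypothetical names, not part of the app; the types only mirror the `isFinal` flag and indexed alternatives of the Web Speech API's `SpeechRecognitionResult`.

```typescript
// Structural stand-ins for the Web Speech API result objects
// (hypothetical types, for illustration only).
interface AlternativeLike {
  transcript: string;
  confidence: number;
}

interface ResultLike extends ArrayLike<AlternativeLike> {
  isFinal: boolean;
}

// Splits a result list into committed (final) text and in-progress (interim) text,
// using the first alternative of each result.
function splitTranscript(results: ArrayLike<ResultLike>): { final: string; interim: string } {
  let final = '';
  let interim = '';
  for (let i = 0; i < results.length; i++) {
    const result = results[i];
    const text = result[0].transcript;
    if (result.isFinal) {
      final += text;
    } else {
      interim += text;
    }
  }
  return { final, interim };
}

// In the browser this could be wired up roughly as:
//   recognition.onresult = (event) => {
//     const { final, interim } = splitTranscript(event.results);
//   };
```

Keeping the split in a plain function makes it possible to exercise the logic without a browser or microphone.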
// Handles audio chunks and sends to Whisper API
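async transcribeAudio(audioBlob: Blob): Promise<string> {
  const file = new File([audioBlob], 'recording.webm', { type: MIME_TYPE });
  const formData = new FormData();
  formData.append('file', file);
  const response = await fetch('/api/transcribe', {
    method: 'POST',
    body: formData
  });
  if (!response.ok) {
    throw new Error(`Transcription request failed: ${response.status}`);
  }
  // Assumption: the endpoint responds with JSON of the form { text: string }
  const { text } = await response.json();
  return text;
}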
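The server side of `/api/transcribe` is not part of the excerpt above. As a rough sketch of what such a route might do, the snippet below builds the multipart request that OpenAI's audio transcription endpoint accepts (`POST https://api.openai.com/v1/audio/transcriptions` with a `file` field and a `model` field, here `whisper-1`). The function names `buildWhisperRequest` and `forwardToWhisper` are hypothetical, and reading the key from `OPENAI_API_KEY` is left to the caller.

```typescript
const WHISPER_URL = 'https://api.openai.com/v1/audio/transcriptions';

// Builds the request for OpenAI's transcription endpoint (hypothetical helper).
function buildWhisperRequest(audio: Blob, apiKey: string): { url: string; init: RequestInit } {
  const body = new FormData();
  body.append('file', audio, 'recording.webm');
  body.append('model', 'whisper-1');
  return {
    url: WHISPER_URL,
    init: {
      method: 'POST',
      // Content-Type is left unset so fetch can add the multipart boundary itself.
      headers: { Authorization: `Bearer ${apiKey}` },
      body
    }
  };
}

// Sends the audio and returns the transcribed text; the default JSON response
// from this endpoint carries the transcription in a `text` field.
async function forwardToWhisper(audio: Blob, apiKey: string): Promise<string> {
  const { url, init } = buildWhisperRequest(audio, apiKey);
  const response = await fetch(url, init);
  if (!response.ok) {
    throw new Error(`Whisper request failed with status ${response.status}`);
  }
  const data = (await response.json()) as { text: string };
  return data.text;
}
```

Routing the call through the server keeps the API key out of the browser, which is presumably why the client posts to a local endpoint rather than to OpenAI directly.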
The app provides a comprehensive set of recording controls:
Notes are managed through the VoiceNotesHandler class:
Starting a Recording:
Managing Recordings:
Managing Notes:
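Install dependencies:
pnpm install
Set your OpenAI API key as an environment variable:
OPENAI_API_KEY=your_api_key_here
Start the development server:
pnpm dev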
The app includes configurable parameters:
- MIN_CHUNK_SIZE: Minimum size for audio chunks
- DEFAULT_INTERVAL: Default recording interval
- DEFAULT_CONFIDENCE: Default confidence threshold for transcription
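
As an illustration of how two of these parameters might be applied, the sketch below drops audio chunks smaller than `MIN_CHUNK_SIZE` and transcription results below `DEFAULT_CONFIDENCE`. The numeric values, the units (bytes, a 0 to 1 score), and the helper names are placeholders chosen for the example, not the app's actual settings.

```typescript
// Placeholder values for illustration only; the app's real values may differ.
const MIN_CHUNK_SIZE = 1024;    // assumed unit: bytes
const DEFAULT_CONFIDENCE = 0.8; // assumed scale: 0 to 1

interface Scored {
  transcript: string;
  confidence: number;
}

// Returns true when a chunk is large enough to be worth sending for transcription.
function isChunkLargeEnough(sizeInBytes: number, minSize: number = MIN_CHUNK_SIZE): boolean {
  return sizeInBytes >= minSize;
}

// Keeps only results whose confidence meets the threshold.
function filterByConfidence<T extends Scored>(
  results: T[],
  threshold: number = DEFAULT_CONFIDENCE
): T[] {
  return results.filter((r) => r.confidence >= threshold);
}
```

`DEFAULT_INTERVAL` could plausibly serve as the timeslice passed to `MediaRecorder.start()`, which controls how often the recorder emits chunks, though that wiring is an assumption here.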