speech input and output (speech recognition and speech synthesis) using Google Cloud Speech-to-Text and OpenAI Whisper
multiple languages (for speech input and output)
listening mode: automatically start and stop voice recording when speaking
streamed response
server-side or browser-side API calls
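The listening mode above can be pictured as a simple energy-based voice activity detector. The following is only a minimal sketch of that idea, not this project's actual implementation: the names `rms`, `detectSpeech`, `VadOptions`, and `VadEvent`, as well as the threshold and silence values, are hypothetical. It assumes audio arrives as frames of samples in the range [-1, 1].

```typescript
// Hypothetical sketch: none of these names or values come from the project.
type VadEvent = { type: "start" | "stop"; frame: number };

interface VadOptions {
  threshold: number;     // level at or above which a frame counts as speech (assumed)
  silenceFrames: number; // consecutive quiet frames that end a recording (assumed)
}

// Root-mean-square level of one frame of audio samples in [-1, 1].
function rms(samples: number[]): number {
  if (samples.length === 0) return 0;
  const sumSquares = samples.reduce((acc, s) => acc + s * s, 0);
  return Math.sqrt(sumSquares / samples.length);
}

// Walk over per-frame levels and emit a "start" event when speech begins
// and a "stop" event once enough consecutive quiet frames have passed.
function detectSpeech(levels: number[], opts: VadOptions): VadEvent[] {
  const events: VadEvent[] = [];
  let recording = false;
  let quiet = 0;
  levels.forEach((level, frame) => {
    if (level >= opts.threshold) {
      quiet = 0;
      if (!recording) {
        recording = true;
        events.push({ type: "start", frame });
      }
    } else if (recording) {
      quiet += 1;
      if (quiet >= opts.silenceFrames) {
        recording = false;
        quiet = 0;
        events.push({ type: "stop", frame });
      }
    }
  });
  return events;
}
```

In a browser, the per-frame levels could come from microphone samples, and the start and stop events could drive the recorder. Requiring several quiet frames before stopping keeps short pauses between words from cutting a recording off.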
Create a .env file with the following content:
# .env
GOOGLE_API_KEY=your-google-api-key
OPENAI_API_KEY=your-openai-api-key
Your Google API key must have the Speech-to-Text API enabled.
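A missing or empty key usually only shows up as a failed API call later on, so it can help to check both variables at startup. The sketch below is one possible way to do that, not something the project is known to ship: `requireKeys` and the `Env` type are hypothetical names, and the environment is passed in as a plain object so the function does not depend on any particular runtime.

```typescript
// Hypothetical helper: not part of the project.
type Env = Record<string, string | undefined>;

// Return the requested variables, or throw one error naming every
// variable that is missing or blank.
function requireKeys(env: Env, names: string[]): Record<string, string> {
  const found: Record<string, string> = {};
  const missing: string[] = [];
  for (const name of names) {
    const value = env[name]?.trim();
    if (value) {
      found[name] = value;
    } else {
      missing.push(name);
    }
  }
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return found;
}
```

On the server this could be called as `requireKeys(process.env, ["GOOGLE_API_KEY", "OPENAI_API_KEY"])`, assuming the .env file has already been loaded into the process environment.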
Install dependencies with npm install and start a development server:
# install dependencies
npm install
# start dev server
npm run dev
Open the localhost URL in your browser and start chatting.