# Multishot.ai
multishot.ai is a free AI/LLM chatbot.
Go to multishot.ai in Chrome or another WebGPU-enabled browser to try it out.
## Features
This project demonstrates how to create a real-time conversational AI using models hosted in your browser or commercially available APIs.
It uses FastAPI to create a web server that accepts user inputs and streams generated responses back to the user in a Svelte UI app.
The app also supports running LLMs in the browser via webllm, which keeps your conversations completely private.
Have a look at the live version here, multishot.ai, although it needs a browser with WebGPU support, such as Chrome or Edge.
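As a rough sketch, the server side of this pattern can be as small as a single streaming route. The route name, request shape, and model choice below are illustrative assumptions, not the project's actual code:

```python
# Minimal sketch of a token-streaming chat endpoint (illustrative, not the app's real code).
from fastapi import FastAPI
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from langchain_openai import ChatOpenAI

app = FastAPI()

class ChatRequest(BaseModel):
    prompt: str  # hypothetical request shape

@app.post("/chat/stream")  # hypothetical route; the real app's routes may differ
async def chat_stream(req: ChatRequest):
    llm = ChatOpenAI(model="gpt-4o-mini")  # any LangChain chat model works here

    async def token_generator():
        # LangChain's astream yields message chunks as the model generates them.
        async for chunk in llm.astream(req.prompt):
            yield chunk.content

    return StreamingResponse(token_generator(), media_type="text/plain")
```

The Svelte UI then reads this response incrementally and renders tokens as they arrive.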
## Goals
- App is stateless ✅
  - Unlike most AI apps built with a Streamlit, Gradio, Flask, or Django UI, to name a few, this app does not need a long-running server. Those frameworks are fine on the desktop but cost-prohibitive to run 24/7 in the cloud. Because the app is stateless, it can be deployed on a serverless platform like Vercel or Netlify, with AWS S3 for static hosting, allowing it to run for free or at very low cost.
- webllm ✅ - it's like Ollama, but it runs completely in the browser.
  - webllm is a high-performance, in-browser language model inference engine that leverages WebGPU for hardware acceleration, enabling powerful LLM operations directly within web browsers without server-side processing. Thanks to open-source efforts like LLaMA, Alpaca, Vicuna, and Dolly, we are starting to see an exciting future of building our own open-source language models and personal AI assistants.
  - Move the toggle from `remote` to `local` to find the local models; the app will download one for you and you're good to start chatting.
    - Llama 8B or any webllm model
- Completely private and secure ✅ - when running a local model, prompts never leave your browser, so no one can listen in on your conversations.
- Composability ✅
  - Svelte ✅
    - Version 4 ✅
  - Embeddable and composable design ✅
  - Serverless ready ✅
  - CDN support ✅
- Responsive design for mobile phones ✅
  - Mobile-first approach ✅
- Skeleton ✅
- TailwindCSS ✅
  - Theme selector and persistence
- Contains UI animations ✅
- Python backend powered by FastAPI ✅
  - LangChain ✅
- Frontend and backend support multiple models (agentic in nature). The app supports the following APIs (see the provider-selection sketch after this list):
  - OpenAI ✅
  - Anthropic ✅
  - Ollama (local/remote) ✅
  - Groq ✅
- Code highlighting ✅
- Copy code button ✅
- What's next
  - Add web scraping
  - Document upload
  - Deletion of chats ✅
  - Workflows
  - RAG with RAPTOR
  - Stable Diffusion?
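A rough sketch of how a backend could route requests to one of the providers above via LangChain; the helper name and model IDs are illustrative assumptions rather than the app's actual code:

```python
# Illustrative provider switch built on LangChain's chat model classes.
from langchain_openai import ChatOpenAI
from langchain_anthropic import ChatAnthropic
from langchain_groq import ChatGroq
from langchain_community.chat_models import ChatOllama

def get_chat_model(provider: str, model: str):
    """Return a LangChain chat model for the requested provider (hypothetical helper)."""
    providers = {
        "openai": lambda: ChatOpenAI(model=model),
        "anthropic": lambda: ChatAnthropic(model=model),
        "groq": lambda: ChatGroq(model=model),
        "ollama": lambda: ChatOllama(model=model),  # local or remote Ollama server
    }
    return providers[provider]()

# Example: stream a reply from any provider through the same interface.
# llm = get_chat_model("anthropic", "claude-3-5-sonnet-latest")
# async for chunk in llm.astream("Hello!"):
#     print(chunk.content, end="")
```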
## Installation and Usage
With the addition of running models locally with webllm in your browser, you can run this app without the Python backend: skip that part of the installation and start at running the UI in the `./ui/` directory with `npm run dev -- --port=3333`. Select the `local` toggle button in the UI to download and select a model that runs prompts locally without sending data over the web. It's a great way to keep your LLM chatter private, and the author pays nothing for your use of the live demo at multishot.ai.
- Clone the repository.
- If you want to run the app without a Python backend, skip the following steps and start at the bottom with `pnpm` or `npm`.
- Install Python (Python 3.7+ is recommended).
- Create a virtual environment: `python -m venv .venv`
- Activate your virtual environment: `source .venv/bin/activate`
- Install the necessary libraries. This project uses FastAPI, uvicorn, and LangChain, among others.
  - In case you haven't done so, activate your virtual environment: `source .venv/bin/activate`
  - In the `server` directory, run: `pip install -r requirements.txt`
- Add your OpenAI API key to `./server/.env`, using `example.env` in the `server` directory as a template (a sketch of how the server might load this key is shown after these steps).
- Start the FastAPI server by running `uvicorn server.main:app --reload`
- Start the UI with `pnpm`, but you can use `npm` if you prefer and have time.
  - `cd ./ui/`
  - `pnpm install --save-dev vite`
  - `pnpm build` - if you want to build for production
  - `pnpm exec vite --port=3333` or `npm run dev -- --port=3333`
- Your UI will run on http://localhost:3333/ and your backend on http://127.0.0.1:8000/static/index.html.
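For reference, here is a minimal sketch of how the server side might read that key from `./server/.env`; the use of python-dotenv and the `OPENAI_API_KEY` variable name are assumptions based on common practice, not confirmed project details:

```python
# Illustrative .env loading (assumes python-dotenv is installed).
import os
from dotenv import load_dotenv

load_dotenv("server/.env")  # hypothetical path relative to the repo root
openai_api_key = os.getenv("OPENAI_API_KEY")  # variable name assumed; check example.env
```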