This project demonstrates an architecture for deploying lightweight, optimized large language models (LLMs) in web applications. By leveraging the WebGPU browser API, the solution runs inference entirely on client-side devices, promoting energy efficiency and sustainability in conversational AI. This demo accompanies our ICCRET 2025 paper submission.
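Because all inference runs in the browser, the only hard client-side requirement is a WebGPU-capable browser. The snippet below is a minimal sketch of the standard feature-detection pattern a page like this can use before loading a model; `hasWebGPU` is an illustrative helper name, not part of this repository.

```typescript
// Minimal WebGPU feature detection (illustrative; not part of this repo).
// In a real project, install @webgpu/types so `navigator.gpu` type-checks.
async function hasWebGPU(): Promise<boolean> {
  const gpu = (navigator as any).gpu;
  if (!gpu) return false; // Browser has no WebGPU implementation at all.
  // requestAdapter() resolves to null when no suitable GPU is available.
  const adapter = await gpu.requestAdapter();
  return adapter !== null;
}

hasWebGPU().then((supported) => {
  if (!supported) {
    console.warn("WebGPU unavailable - the chatbot cannot run in this browser.");
  }
});
```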
To get started with this project, clone the repository and install the necessary dependencies:
```bash
git clone https://github.com/SouZe-San/webllm-bot.git
cd webllm-bot
bun install
```
Run the application locally:
```bash
bun run dev
```
Open your browser and navigate to http://localhost:5173 to access the demo customer support site and the chatbot.
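Under the hood, the chatbot uses WebLLM's OpenAI-style chat API to run the model on the client's GPU. The sketch below shows the general shape of that integration; the model ID (Llama-3.1-8B-Instruct-q4f32_1-MLC, one of WebLLM's prebuilt models) and the prompts are illustrative assumptions and may differ from what the demo actually ships.

```typescript
import { CreateMLCEngine } from "@mlc-ai/web-llm";

// Downloads the pre-compiled model weights (cached after the first visit)
// and compiles the WebGPU kernels in the browser. ES module with
// top-level await, as in a Vite project.
const engine = await CreateMLCEngine("Llama-3.1-8B-Instruct-q4f32_1-MLC", {
  initProgressCallback: (report) => console.log(report.text),
});

// OpenAI-style chat completion, executed entirely on the client's GPU.
const reply = await engine.chat.completions.create({
  messages: [
    { role: "system", content: "You are a helpful customer support assistant." },
    { role: "user", content: "How do I reset my password?" },
  ],
});

console.log(reply.choices[0].message.content);
```

WebLLM also supports streaming (`stream: true` in the same call) for token-by-token updates in the chat UI.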
We would like to thank the team behind the WebLLM JavaScript library, which made it straightforward to integrate pre-compiled models and load them onto the GPU via WebGPU.