This project is an automatic summarization tool that transcribes and summarizes video/audio recordings using `whisper.cpp`. Summary generation can be performed by any OpenAI-compatible instance (e.g. self-hosted). Media files are processed with `ffmpeg`, transcribed with `whisper.cpp`, and summarized with the configured OpenAI instance. The frontend is built with SvelteKit and communicates with the backend via gRPC.
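The summarization step presumably boils down to a chat-completion request against the configured OpenAI-compatible API. The sketch below is illustrative only: the endpoint, token, and model name are placeholders (taken from the configuration described later in this README), and the exact prompt used by the project is not shown here.

```sh
# Illustrative only: a chat-completion request against an OpenAI-compatible API.
# OPENAI_ENDPOINT, OPENAI_TOKEN and the model name are placeholders.
curl "$OPENAI_ENDPOINT/v1/chat/completions" \
  -H "Authorization: Bearer $OPENAI_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "model": "deepseek-r1:1.5b",
        "messages": [
          {"role": "system", "content": "Summarize the following transcript."},
          {"role": "user", "content": "<transcript text goes here>"}
        ]
      }'
```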
Here is a preview of the current working state:
Setting up an OpenAI instance can be skipped: an Ollama instance with the `deepseek-r1:1.5b` and `deepseek-r1:14b` models is already running on my Oracle VM at https://engelbert.ip-ddns.com, and it is preconfigured in the `compose.yml`. One piece of configuration is still necessary:

- Create a `.env` file in the root of the repository.
- Set `OPENAI_TOKEN` to a valid token for the configured instance.

As an alternative to the preconfigured instance, set the `OPENAI_ENDPOINT` variable to the host that runs your own instance. The only requirement is that the endpoint hosts an API conforming to the OpenAI API reference.
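As a concrete sketch, a minimal `.env` could look like the following. The variable names are the ones mentioned above; the values are placeholders, and whether `OPENAI_ENDPOINT` needs a path suffix depends on how the worker and `compose.yml` consume it:

```sh
# Token for the configured OpenAI-compatible instance (placeholder value)
OPENAI_TOKEN=sk-xxxxxxxxxxxxxxxx
# Optional: use your own OpenAI-compatible host instead of the preconfigured one
OPENAI_ENDPOINT=https://my-openai-host.example.com
```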
Once that is configured, you just need to execute the following command in the root directory of the repository:
```sh
docker compose up -d
```
This automatically downloads all required dependencies and starts the worker and the frontend. The frontend is then reachable at http://localhost:8080.
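To check that everything came up, the usual Docker Compose commands can be used; the service names shown depend on what `compose.yml` defines, so this is just a generic sketch:

```sh
# List the services started from compose.yml and follow their logs
docker compose ps
docker compose logs -f
```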
For a manual setup without Docker Compose, the worker and frontend can be built and run directly. The worker extracts the audio from the input media with `ffmpeg`, and `whisper.cpp` transcribes it. Ensure the required dependencies are installed before building and running the project; the steps below assume `ffmpeg`, CMake, a C++ toolchain, and Node.js/npm are available.
Download a Whisper model before running the project. Models are available in the `ggerganov/whisper.cpp` repository on Hugging Face. Example:
```sh
cd services/worker/models
wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.bin
```
Note that this step is not necessary when the project is started via Docker Compose.
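Other model sizes from the same repository can be substituted, trading speed for accuracy; the file name below comes from the upstream whisper.cpp model repository, not from this project:

```sh
cd services/worker/models
wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-base.bin
```

Whichever model file you download is passed to the worker on startup, as shown in the manual build steps below.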
Clone the repository and navigate to the project directory:
```sh
git clone https://github.com/multimedia-workforce/summary-gen.git
cd summary-gen
```
Build and run the worker:

```sh
cd services/worker
cmake --preset=<os>-64-release
cmake --build --preset=<os>-64-release
./build/<os>-64-release/worker models/ggml-tiny.bin 50051
```
Here, `<os>` is one of `mac`, `win`, or `lin`. The worker will now listen for gRPC messages at `localhost:50051`.
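To sanity-check that the worker is reachable, a generic gRPC client such as `grpcurl` can be pointed at the port. This is only a sketch: listing services this way assumes the worker enables gRPC server reflection, which this README does not state.

```sh
# Assumes the worker exposes gRPC server reflection; otherwise the service
# definitions have to be taken from the project's .proto files.
grpcurl -plaintext localhost:50051 list
```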
Then install and start the frontend:

```sh
cd frontend
npm install
npm run dev
```
The frontend should now be available at http://localhost:5173.