Smart Session Summary

This project is an automated summarization tool that transcribes and summarizes video/audio recordings using whisper.cpp. Summary generation can be performed by any OpenAI-compatible instance (e.g. self-hosted). The tool processes media files with ffmpeg, transcribes them with whisper.cpp, and summarizes the transcripts via the configured OpenAI instance. The frontend is built with SvelteKit and communicates with the backend via gRPC.

Features

  • Transcribe audio and video files into text
  • Generate meeting summaries from transcripts
  • gRPC-based communication between the backend and frontend
  • SvelteKit-based web interface for user interaction
  • Persistence of transcripts and summaries across sessions
  • Analytics backend and frontend

Here is a preview of the current working state (screenshots of the Transcription, History, and Analytics views):

Installation

Setting up your own OpenAI instance can be skipped: I have configured an Ollama instance with the deepseek-r1:1.5b and deepseek-r1:14b models on my Oracle VM at https://engelbert.ip-ddns.com, and it is already set up in the compose.yml. One piece of configuration is still necessary:

  1. Create a .env file in the root of the repository
  2. Add a variable OPENAI_TOKEN containing a valid token for the configured instance
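
For illustration, the resulting .env file could look like this (the token value is a placeholder you must replace):

```
OPENAI_TOKEN=<your-token-for-the-configured-instance>
```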

Alternatives to the preconfigured instance are:

  • Using the official OpenAI or DeepSeek API. Note that this will likely incur costs, as these official APIs charge per request and per token.
  • Using a self-hosted OpenAI-compatible instance that runs elsewhere. In this case you just need to set the OPENAI_ENDPOINT variable to the host that runs your instance.

The only requirement is that the configured endpoint hosts an API conforming to the OpenAI API reference.
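
As an example, pointing the tool at a self-hosted instance might look like this in the .env file. The host name is a placeholder, and the exact shape of the endpoint value (with or without port and path) is an assumption; 11434 is Ollama's default port:

```
OPENAI_ENDPOINT=http://my-ollama-host:11434
OPENAI_TOKEN=<token-if-your-instance-requires-one>
```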

Once that is configured, you just need to execute the following command in the root directory of the repository:

docker compose up -d

This automatically downloads all required dependencies and starts the worker and the frontend. The frontend is then reachable at http://localhost:8080.

Usage

  1. Upload an audio or video file via the web interface.
  2. The worker processes the media file using ffmpeg.
  3. whisper.cpp transcribes the audio.
  4. The transcription is streamed to the web interface.
  5. Start a summary for the current transcription.
  6. View your previous transcripts and summaries as smart sessions.
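
The ffmpeg processing in step 2 can be sketched as follows. The worker's actual flags are not shown here, but whisper.cpp expects 16 kHz mono 16-bit PCM WAV input, so a conversion along these lines is a reasonable assumption; a generated sine tone stands in for an uploaded recording:

```shell
# Generate a 1-second test tone as a stand-in for an uploaded media file
ffmpeg -y -f lavfi -i "sine=frequency=440:duration=1" input-placeholder.wav

# Convert to the format whisper.cpp expects: 16 kHz, mono, 16-bit PCM WAV
ffmpeg -y -i input-placeholder.wav -ar 16000 -ac 1 -c:a pcm_s16le session-audio.wav
```

The resulting session-audio.wav is what the transcription stage would consume.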

Development Prerequisites

Ensure you have the following dependencies installed before building and running the project:

Dependencies

Additional Requirements

Download a Whisper model before running the project. Models are available in the whisper.cpp models repository on Hugging Face. Example:

cd services/worker/models
wget https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-tiny.bin

Note that this step is not necessary when the project is started via docker compose.

Installation

Clone the repository and navigate to the project directory:

git clone https://github.com/multimedia-workforce/summary-gen.git
cd summary-gen

Running the Worker

cd services/worker
cmake --preset=<os>-64-release
cmake --build --preset=<os>-64-release
./build/<os>-64-release/worker models/ggml-tiny.bin 50051

Where <os> is one of mac, win, or lin. The worker will now listen for gRPC messages at localhost:50051.
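
The actual service definition lives in the repository's proto files. Purely as an illustration, a streaming transcription RPC for this kind of worker might be defined like this; all names below are hypothetical, not the project's real API:

```proto
syntax = "proto3";

// Hypothetical sketch: client streams audio chunks,
// the worker streams transcript segments back.
service Transcriber {
  rpc Transcribe(stream AudioChunk) returns (stream TranscriptSegment);
}

message AudioChunk {
  bytes data = 1;
}

message TranscriptSegment {
  string text = 1;
}
```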

Setting Up the Frontend

cd frontend
npm install
npm run dev

The frontend should now be available at http://localhost:5173.
