A local web application for transcribing and analyzing audio files using local AI models (FasterWhisper, Pyannote, Ollama). Focuses on privacy and performance, with optimizations for Apple Silicon.
OS: macOS (Apple Silicon recommended) or Linux. Windows might require adjustments.
Python: Python 3.11 or 3.12 strongly recommended. (Version 3.13 currently has known compatibility issues with dependencies like pydub.)
Node.js: Required for the frontend (includes npm). Version 18+ recommended.
Ollama: Installed and running (ollama.com). Ensure required models are pulled (see Installation step 7).
Hugging Face Account & Token: Needed to download the Pyannote speaker-diarization model; you will add the token to .env during installation.
System Dependencies:
macOS (Homebrew): brew install ffmpeg cmake pkg-config protobuf
Linux (Debian/Ubuntu): sudo apt update && sudo apt install -y ffmpeg cmake pkg-config libprotobuf-dev protobuf-compiler
Python Packages: Listed in requirements.txt.
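You can confirm that the interpreter you plan to use matches this recommendation with a quick check (a minimal sketch; run it with the python3.x command you intend to point the venv at):

```python
# Check that the interpreter is in the recommended 3.11-3.12 range.
import sys

version = sys.version_info[:2]
if (3, 11) <= version <= (3, 12):
    print(f"Python {version[0]}.{version[1]}: OK for this project")
else:
    # 3.13 removed audioop, which breaks dependencies such as pydub.
    print(f"Python {version[0]}.{version[1]}: not recommended, use 3.11 or 3.12")
```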
git clone https://github.com/GPTSam/TranscriberApp.git
cd TranscriberApp
# Replace python3.11 with your specific command if needed
python3.11 -m venv venv
source venv/bin/activate
# (Windows: venv\Scripts\activate)
(Your terminal prompt should now start with (venv).)
Install the system dependencies if you haven't already (macOS example; see Prerequisites for Linux):
brew install ffmpeg cmake pkg-config protobuf
Install the Python dependencies:
# Use the correct Python interpreter for make if needed
make PYTHON_INTERPRETER=python3.11 install
# Or directly with pip:
# pip install --upgrade pip
# pip install -r requirements.txt
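After installing, a quick import sanity check can catch build problems early (a minimal sketch; the module list is taken from the libraries named in this README, so adjust it to whatever requirements.txt actually pins):

```python
# Run inside the activated venv.
import importlib

# Modules named elsewhere in this README; adjust to match requirements.txt.
for module in ("faster_whisper", "pyannote.audio", "pydub", "flask", "yaml"):
    try:
        importlib.import_module(module)
        print(f"{module}: OK")
    except ImportError as exc:
        print(f"{module}: MISSING ({exc})")
```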
Accept the user conditions for the pyannote/speaker-diarization-3.1 model on Hugging Face.
Create the .env file: cp .env.example .env
Edit .env and add your token: HUGGING_FACE_TOKEN=hf_YOUR_ACTUAL_TOKEN_HERE
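To confirm the token is in place, a small check that parses .env directly (a sketch assuming the plain KEY=VALUE format shown above; no extra dependencies needed):

```python
# Reads .env from the project root and checks the Hugging Face token entry.
from pathlib import Path

token = None
for line in Path(".env").read_text().splitlines():
    if line.strip().startswith("HUGGING_FACE_TOKEN="):
        token = line.split("=", 1)[1].strip()

if token and token.startswith("hf_"):
    print("HUGGING_FACE_TOKEN looks set")
else:
    print("HUGGING_FACE_TOKEN missing or malformed in .env")
```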
Pull the Ollama models listed in config.yaml (or the defaults if you haven't edited it yet). At minimum for testing:
ollama pull llama3:8b
ollama pull mistral:7b # Often used for name detection/summary fallback
# Pull others listed in config.yaml -> llm_models as needed
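To verify the pulls succeeded, you can either run ollama list or query Ollama's local REST API (a sketch assuming the default endpoint at http://localhost:11434):

```python
# Lists locally available Ollama models via the default local API.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    data = json.load(resp)

for model in data.get("models", []):
    print(model.get("name"))
```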
Make sure config_schema.yaml exists and has content. Generate config.yaml if it doesn't exist:
# Run from project root, with venv active
python -m src.utils.generate_config_from_schema
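To confirm the generated config.yaml parses, a minimal check (assuming PyYAML is available in the venv; adjust if the project loads its config differently):

```python
# Prints the top-level keys of config.yaml so you can see what to edit next.
import yaml

with open("config.yaml") as fh:
    config = yaml.safe_load(fh)

print("Top-level keys:", sorted(config.keys()))
```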
Configure config.yaml: Open config.yaml and set input_audio to a valid relative path for an audio file in the audio/ directory (e.g., audio/sample.mp3). Review other settings like whisper_model and llm_models for your local setup.

Important: Both the Backend and Frontend servers need to be running simultaneously for the Web UI.
1. Run the Backend (Flask API Server):
In a terminal at the project root (TranscriberApp), activate the venv:
source venv/bin/activate
# Replace python3.11 if you used a different version for venv
make PYTHON_INTERPRETER=python3.11 run-web
# Or directly: python app.py
The backend API server runs on http://localhost:5001.
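To confirm the backend is actually listening before moving on, any HTTP response from port 5001 (even a 404) is enough; this sketch deliberately assumes no particular API route:

```python
# Probe the backend port; any HTTP status (even 404) means Flask is listening.
import urllib.error
import urllib.request

try:
    with urllib.request.urlopen("http://localhost:5001/", timeout=3) as resp:
        print("Backend reachable, status", resp.status)
except urllib.error.HTTPError as exc:
    print("Backend reachable, status", exc.code)
except OSError as exc:
    print("Backend NOT reachable:", exc)
```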
2. Run the Frontend (Svelte Dev Server):
In a second terminal, navigate to the frontend directory: cd frontend
npm install
npm run dev
The frontend dev server runs on http://localhost:5173.
3. Use the Web Interface: Open your browser to the frontend URL (http://localhost:5173). Start a transcription job from the UI; when the job status reaches WAITING_FOR_REVIEW, the Review Dialog will appear. After you complete the review, the job continues until its status is COMPLETED.

4. Command Line Interface (Testing/Scripting):
# Use config.yaml settings (ensure input_audio is set)
make PYTHON_INTERPRETER=python3.11 run-cli
# Override input and run in advanced mode
make PYTHON_INTERPRETER=python3.11 run-cli ARGS="--input-audio audio/my_audio.m4a --mode advanced"
Output files:
- logs/: Daily log files (YYYY-MM-DD.log).
- transcripts/intermediate_transcript.json: Raw diarized transcript.
- transcripts/intermediate_proposed_map.json: (Optional) LLM-proposed speaker map.
- transcripts/intermediate_context.json: (Optional) Context for name detection.
- transcripts/final_transcript.json: Transcript with final names.
- results/transcript.html: Formatted HTML transcript.
- results/summary.txt: 'Fast' mode output.
- results/advanced_analysis.json: 'Advanced' mode output.
- llm_training_data.db: SQLite DB with job results.

Speakers keep generic diarization labels (e.g., SPEAKER_00) until final names are assigned. Job results are logged to llm_training_data.db.
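To peek at these outputs without assuming their internal schema, a small inspection sketch (paths as listed above):

```python
# Inspect the SQLite job database and the final transcript without assuming a schema.
import json
import sqlite3
from pathlib import Path

if Path("llm_training_data.db").exists():
    with sqlite3.connect("llm_training_data.db") as conn:
        tables = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table'")]
    print("DB tables:", tables)

transcript_path = Path("transcripts/final_transcript.json")
if transcript_path.exists():
    data = json.loads(transcript_path.read_text())
    # Print only the top-level shape, since the exact structure may vary.
    print("final_transcript.json top level:",
          list(data.keys()) if isinstance(data, dict) else f"list of {len(data)} items")
```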
Troubleshooting (expanded based on our findings):
- Pyannote / Hugging Face authentication errors: Check the HUGGING_FACE_TOKEN in your .env file.
- Port 5001 already in use: Find the conflicting process (lsof -i :5001) or change FLASK_PORT.
- ModuleNotFoundError: Ensure the venv is active, then run make ... install or pip install -r requirements.txt.
- Build errors (e.g., sentencepiece): Ensure the system dependencies (cmake, pkg-config, protobuf) are installed via your package manager (e.g., Homebrew). Use Python 3.11/3.12, as 3.13 has issues (audioop missing).
- ERR_CONNECTION_REFUSED (Port 5173): Ensure the frontend dev server (npm run dev) is running in the frontend directory.
- API errors in the UI: Ensure the backend (make ... run-web) is running. Check the backend terminal logs for tracebacks. Verify that the API proxy in vite.config.js targets the correct backend port (5001).
- npm EACCES cache errors: Run the sudo chown -R ... command suggested by npm to fix cache permissions.
- Whisper compute type errors: try int8.
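When debugging any of the above, the daily log file is often the fastest place to look; a small helper sketch that prints the tail of today's log (path format as described under Output files):

```python
# Print the last lines of today's log file (logs/YYYY-MM-DD.log) when debugging.
from datetime import date
from pathlib import Path

log_file = Path("logs") / f"{date.today():%Y-%m-%d}.log"
if log_file.exists():
    for line in log_file.read_text().splitlines()[-20:]:
        print(line)
else:
    print("No log file for today:", log_file)
```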
.MIT License - see LICENSE file.
Samuel Willems / Legendaddy / willems.samuel@gmail.com