AI-Powered Transcription and Media Analysis Platform
OpenTranscribe is a powerful, containerized web application for transcribing and analyzing audio/video files using state-of-the-art AI models. Built with modern technologies and designed for scalability, it provides an end-to-end solution for speech-to-text conversion, speaker identification, and content analysis.
Note: This application was 99.9% created by AI using Windsurf and various commercial LLMs, demonstrating the power of AI-assisted development.
# Required
- Docker and Docker Compose
- 8GB+ RAM (16GB+ recommended)
# Recommended for optimal performance
- NVIDIA GPU with CUDA support
Run this one-liner to download and set up OpenTranscribe using our pre-built Docker Hub images:
curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenTranscribe/master/setup-opentranscribe.sh | bash
Then follow the on-screen instructions. The setup script creates an opentranscribe installation directory containing a management script, opentranscribe.sh.

Note: The script will prompt you for your HuggingFace token during setup. If you provide it, AI models will be downloaded and cached before Docker starts, so the app is ready to use immediately. If you skip this step, models will download on first use (a 10-30 minute delay).
Once setup is complete, start OpenTranscribe with:
cd opentranscribe
./opentranscribe.sh start
The Docker images are available on Docker Hub as separate repositories:

- davidamacey/opentranscribe-backend: Backend service (also used for celery-worker and flower)
- davidamacey/opentranscribe-frontend: Frontend service

Access the web interface at http://localhost:5173
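If you prefer to fetch the images ahead of time (for example, on a slow connection), you can pull them directly; a minimal sketch, assuming the latest tag is what you want (check Docker Hub for other tags):

# Pull the pre-built images manually before starting the stack
docker pull davidamacey/opentranscribe-backend:latest
docker pull davidamacey/opentranscribe-frontend:latest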
Clone the Repository
git clone https://github.com/davidamacey/OpenTranscribe.git
cd OpenTranscribe
# Make utility script executable
chmod +x opentr.sh
Environment Configuration
# Copy environment template
cp .env.example .env
# Edit .env file with your settings (optional for development)
# Key variables:
# - HUGGINGFACE_TOKEN (required for speaker diarization)
# - GPU settings for optimal performance
Start OpenTranscribe
# Start in development mode (with hot reload)
./opentr.sh start dev
# Or start in production mode
./opentr.sh start prod
Access the Application

Once the containers are up, the web interface is available at http://localhost:5173.

The opentr.sh script provides comprehensive management for all application operations:
# Start the application
./opentr.sh start [dev|prod] # Start in development or production mode
./opentr.sh stop # Stop all services
./opentr.sh status # Show container status
./opentr.sh logs [service] # View logs (all or specific service)
# Service management
./opentr.sh restart-backend # Restart API and workers without database reset
./opentr.sh restart-frontend # Restart frontend only
./opentr.sh restart-all # Restart all services without data loss
# Container rebuilding (after code changes)
./opentr.sh rebuild-backend # Rebuild backend with new code
./opentr.sh rebuild-frontend # Rebuild frontend with new code
./opentr.sh build # Rebuild all containers
# Data operations (⚠️ DESTRUCTIVE)
./opentr.sh reset [dev|prod] # Complete reset - deletes ALL data!
./opentr.sh init-db # Initialize database without container reset
# Backup and restore
./opentr.sh backup # Create timestamped database backup
./opentr.sh restore [file] # Restore from backup file
# Maintenance
./opentr.sh clean # Remove unused containers and images
./opentr.sh health # Check service health status
./opentr.sh shell [service] # Open shell in container
# Available services: backend, frontend, postgres, redis, minio, opensearch, celery-worker
# View specific service logs
./opentr.sh logs backend # API server logs
./opentr.sh logs celery-worker # AI processing logs
./opentr.sh logs frontend # Frontend development logs
./opentr.sh logs postgres # Database logs
# Follow logs in real-time
./opentr.sh logs backend -f
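Because the backup command writes timestamped files, it pairs well with cron for unattended backups; a sketch, with the installation path as a placeholder you would need to adjust:

# Hypothetical crontab entry: nightly database backup at 02:00
# (replace /path/to/opentranscribe with your actual installation directory)
0 2 * * * cd /path/to/opentranscribe && ./opentr.sh backup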
1. User Registration
2. Upload or Record Content
3. Monitor Processing
4. Explore Your Content
5. Configure AI Features (Optional)
Device Selection → Level Monitoring → Session Control → Background Upload
LLM Configuration → Custom Prompts → Content Analysis → BLUF Summaries
Automatic Detection → AI Recognition → Profile Management → Cross-Media Tracking
Concurrent Uploads → Progress Tracking → Retry Logic → Queue Management
Keyword Search → Semantic Search → Smart Filtering → Waveform Navigation
Create Collections → Organize Files → Bulk Operations → Inline Editing
Progress Updates → Status Tracking → WebSocket Integration → Completion Alerts
Multiple Formats → Subtitle Files → API Access → Media Downloads
OpenTranscribe/
├── backend/                  # Python FastAPI backend
│   ├── app/                  # Application modules
│   │   ├── api/              # REST API endpoints
│   │   ├── models/           # Database models
│   │   ├── services/         # Business logic
│   │   ├── tasks/            # Background AI processing
│   │   ├── utils/            # Common utilities
│   │   └── db/               # Database configuration
│   ├── scripts/              # Admin and maintenance scripts
│   ├── tests/                # Comprehensive test suite
│   └── README.md             # Backend documentation
├── frontend/                 # Svelte frontend application
│   ├── src/                  # Source code
│   │   ├── components/       # Reusable UI components
│   │   ├── routes/           # Page components
│   │   ├── stores/           # State management
│   │   └── styles/           # CSS and themes
│   └── README.md             # Frontend documentation
├── database/                 # Database initialization
├── models_ai/                # AI model storage (runtime)
├── scripts/                  # Utility scripts
├── docker-compose.yml        # Container orchestration
├── opentr.sh                 # Main utility script
└── README.md                 # This file
# Database
DATABASE_URL=postgresql://postgres:password@postgres:5432/opentranscribe
# Security
SECRET_KEY=your-super-secret-key-here
JWT_SECRET_KEY=your-jwt-secret-key
# Object Storage
MINIO_ROOT_USER=minioadmin
MINIO_ROOT_PASSWORD=minioadmin
MINIO_BUCKET_NAME=transcribe-app
# Required for speaker diarization - see setup instructions below
HUGGINGFACE_TOKEN=your_huggingface_token_here
# Model configuration
WHISPER_MODEL=large-v2 # large-v2, medium, small, base
COMPUTE_TYPE=float16 # float16, int8
BATCH_SIZE=16 # Reduce if GPU memory limited
# Speaker detection
MIN_SPEAKERS=1 # Minimum speakers to detect
MAX_SPEAKERS=10 # Maximum speakers to detect
# Model caching (recommended)
MODEL_CACHE_DIR=./models # Directory to store downloaded AI models
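To confirm that Docker Compose actually picks up these .env values, you can render the effective configuration; a quick sanity check, assuming the default compose file name:

# Render the resolved configuration with .env values interpolated
docker compose config
# Or spot-check a single variable
docker compose config | grep WHISPER_MODEL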
OpenTranscribe offers flexible AI deployment options. Choose the approach that best fits your infrastructure:
Quick Setup Options:
Cloud-Only (Recommended for Most Users)
# Configure for OpenAI in .env
LLM_PROVIDER=openai
OPENAI_API_KEY=your_openai_key
OPENAI_MODEL_NAME=gpt-4o-mini
# Start without local LLM
./opentr.sh start dev
Local vLLM (High-Performance GPUs)
# Configure for vLLM in .env
LLM_PROVIDER=vllm
VLLM_MODEL_NAME=gpt-oss-20b
# Start with vLLM service (requires 16GB+ VRAM)
docker compose -f docker-compose.yml -f docker-compose.vllm.yml up
Local Ollama (Consumer GPUs)
# Configure for Ollama in .env
LLM_PROVIDER=ollama
OLLAMA_MODEL_NAME=llama3.2:3b-instruct-q4_K_M
# Edit docker-compose.vllm.yml and uncomment ollama service
# Then start with both compose files
docker compose -f docker-compose.yml -f docker-compose.vllm.yml up
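Once a local provider is running, it is worth confirming the endpoint responds before pointing OpenTranscribe at it; a sketch assuming the stock ports (8000 for vLLM, 11434 for Ollama), which your compose files may map differently:

# vLLM exposes an OpenAI-compatible API; this lists the loaded models
curl -s http://localhost:8000/v1/models
# Ollama lists its installed models on its own API
curl -s http://localhost:11434/api/tags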
Complete Provider Configuration:
# Cloud Providers (configure in .env)
LLM_PROVIDER=openai # openai, anthropic, custom (openrouter)
OPENAI_API_KEY=your_openai_key # OpenAI GPT models
ANTHROPIC_API_KEY=your_claude_key # Anthropic Claude models
OPENROUTER_API_KEY=your_or_key # OpenRouter (multi-provider)
# Local Providers (requires additional Docker services)
LLM_PROVIDER=vllm # Local vLLM server
LLM_PROVIDER=ollama # Local Ollama server
Deployment Scenarios: leave LLM_PROVIDER empty for transcription-only mode. See LLM_DEPLOYMENT_OPTIONS.md for detailed setup instructions.
OpenTranscribe automatically downloads and caches AI models for optimal performance. Models are saved locally and reused across container restarts.
Default Setup: models are stored in the ./models/ directory in your project folder.

Directory Structure:

./models/
├── huggingface/          # PyAnnote + WhisperX models
│   ├── hub/              # WhisperX transcription models (~1.5GB)
│   └── transformers/     # PyAnnote transformer models
├── torch/                # PyTorch cache
│   └── hub/checkpoints/  # Wav2Vec2 alignment model (~360MB)
└── pyannote/             # PyAnnote diarization models (~500MB)
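To see how much disk space the cache is actually using:

# Total and per-directory disk usage of the cached models
du -sh ./models ./models/*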
Custom Cache Location:
# Set custom directory in your .env file
MODEL_CACHE_DIR=/path/to/your/models
# Examples:
MODEL_CACHE_DIR=~/ai-models # Home directory
MODEL_CACHE_DIR=/mnt/storage/models # Network storage
MODEL_CACHE_DIR=./cache # Project subdirectory
Storage Requirements: expect roughly 2.5GB for the default models (transcription ~1.5GB, alignment ~360MB, diarization ~500MB).
OpenTranscribe requires a HuggingFace token for speaker diarization and voice fingerprinting features. Follow these steps:
You must accept the user agreements for these models:
Add your token to the environment configuration:
For Production Installation:
# The setup script will prompt you for your token
curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenTranscribe/master/setup-opentranscribe.sh | bash
For Manual Installation:
# Add to .env file
echo "HUGGINGFACE_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" >> .env
Note: Without a valid HuggingFace token, speaker diarization will be disabled and speakers will not be automatically detected or identified across different media files.
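One way to confirm the token is valid before starting the stack is HuggingFace's identity endpoint (assuming your token is stored in .env in plain KEY=VALUE form):

# Should return your HuggingFace account info, not a 401 error
source .env
curl -s -H "Authorization: Bearer $HUGGINGFACE_TOKEN" https://huggingface.co/api/whoami-v2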
# GPU settings
USE_GPU=true # Enable GPU acceleration
CUDA_VISIBLE_DEVICES=0 # GPU device selection
# Resource limits
MAX_UPLOAD_SIZE=4GB # Maximum file size (supports GoPro videos)
CELERY_WORKER_CONCURRENCY=2 # Concurrent tasks
For production use, ensure you:
Security Configuration
# Generate strong secrets
openssl rand -hex 32 # For SECRET_KEY
openssl rand -hex 32 # For JWT_SECRET_KEY
# Set strong database passwords
# Configure proper firewall rules
# Set up SSL/TLS certificates
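A sketch that turns the two openssl commands above into persisted settings (assumes the keys are not already present in .env; otherwise edit the file directly):

# Generate both secrets and append them to .env in one step
echo "SECRET_KEY=$(openssl rand -hex 32)" >> .env
echo "JWT_SECRET_KEY=$(openssl rand -hex 32)" >> .env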
Performance Optimization
# Use production environment
NODE_ENV=production
# Configure resource limits
# Set up monitoring and logging
# Configure backup strategies
Reverse Proxy Setup
# Example NGINX configuration
server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://localhost:5173;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }

    location /api {
        proxy_pass http://localhost:5174;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
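For the SSL/TLS step, one common option is Let's Encrypt via certbot's NGINX plugin; a sketch, assuming certbot is installed and the server block above is live:

# Obtain a certificate and let certbot rewrite the NGINX config for HTTPS
sudo certbot --nginx -d your-domain.com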
# Start development with hot reload
./opentr.sh start dev
# Backend development
cd backend/
pip install -r requirements.txt
pytest tests/ # Run tests
black app/ # Format code
flake8 app/ # Lint code
# Frontend development
cd frontend/
npm install
npm run dev # Development server
npm run test # Run tests
npm run lint # Lint code
# Backend tests
./opentr.sh shell backend
pytest tests/ # All tests
pytest tests/api/ # API tests only
pytest --cov=app tests/ # With coverage
# Frontend tests
cd frontend/
npm run test # Unit tests
npm run test:e2e # End-to-end tests
npm run test:components # Component tests
We welcome contributions! Please see CONTRIBUTING.md for detailed guidelines.
# Check GPU availability
nvidia-smi
# Verify Docker GPU support
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
# Set CPU-only mode if needed
echo "USE_GPU=false" >> .env
Symptoms:
Permission denied: '/home/appuser/.cache/huggingface/hub'
Permission denied: '/home/appuser/.cache/yt-dlp'
Cause: Docker creates model cache directories with root ownership, but containers run as non-root user (UID 1000) for security.
Solution:
# Option 1: Run the automated permission fix script (recommended)
cd opentranscribe # Or your installation directory
./scripts/fix-model-permissions.sh
# Option 2: Manual fix using Docker
docker run --rm -v ./models:/models busybox chown -R 1000:1000 /models
# Option 3: Manual fix using sudo (if available)
sudo chown -R 1000:1000 ./models
sudo chmod -R 755 ./models
Prevention for New Installations:
curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenTranscribe/master/setup-opentranscribe.sh | bash
Verification:
# Check directory ownership (should show UID 1000 or your user)
ls -la models/
# Test write permissions
touch models/huggingface/test.txt && rm models/huggingface/test.txt
# Reduce model size
echo "WHISPER_MODEL=medium" >> .env
echo "BATCH_SIZE=8" >> .env
echo "COMPUTE_TYPE=int8" >> .env
# Monitor memory usage
docker stats
Slow transcription? Verify GPU acceleration is enabled (USE_GPU=true) or try a smaller model (WHISPER_MODEL=medium).

# Reset database
./opentr.sh reset dev
# Check database logs
./opentr.sh logs postgres
# Verify database is running
./opentr.sh shell postgres psql -U postgres -l
# Check service status
./opentr.sh status
# Clean up resources
./opentr.sh clean
# Full reset (⚠️ deletes all data)
./opentr.sh reset dev
# GPU optimization
COMPUTE_TYPE=float16 # Use half precision
BATCH_SIZE=32 # Increase for more GPU memory
WHISPER_MODEL=large-v2 # Best accuracy
# CPU optimization (if no GPU)
COMPUTE_TYPE=int8 # Use quantization
BATCH_SIZE=1 # Reduce memory usage
WHISPER_MODEL=base # Faster processing
This project is licensed under the MIT License - see the LICENSE file for details.
Built with β€οΈ using AI assistance and modern open-source technologies.
OpenTranscribe demonstrates the power of AI-assisted development while maintaining full local control over your data and processing.