AI-Powered Transcription and Media Analysis Platform
OpenTranscribe is a powerful, containerized web application for transcribing and analyzing audio/video files using state-of-the-art AI models. Built with modern technologies and designed for scalability, it provides an end-to-end solution for speech-to-text conversion, speaker identification, and content analysis.
Note: This application is 99.9% created by AI using Windsurf and various commercial LLMs, demonstrating the power of AI-assisted development.
# Required
- Docker and Docker Compose
- 8GB+ RAM (16GB+ recommended)
# Recommended for optimal performance
- NVIDIA GPU with CUDA support
Run this one-liner to download and set up OpenTranscribe using our pre-built Docker Hub images:
curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenTranscribe/master/setup-opentranscribe.sh | bash
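If you prefer to review the installer before running it, you can download the same script first and run it as a separate step:
# Download, inspect, then run the setup script
curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenTranscribe/master/setup-opentranscribe.sh -o setup-opentranscribe.sh
less setup-opentranscribe.sh
bash setup-opentranscribe.sh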
Then follow the on-screen instructions. The setup script downloads the configuration and creates an opentranscribe directory containing the management script, opentranscribe.sh.
Once setup is complete, start OpenTranscribe with:
cd opentranscribe
./opentranscribe.sh start
The Docker images are available on Docker Hub as separate repositories:
- davidamacey/opentranscribe-backend: Backend service (also used for celery-worker and flower)
- davidamacey/opentranscribe-frontend: Frontend service

Access the web interface at http://localhost:5173
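To fetch or update the images manually (for example, on a machine where you don't use the setup script), a plain docker pull works; the commands below assume the default latest tag:
# Pull the pre-built images from Docker Hub
docker pull davidamacey/opentranscribe-backend
docker pull davidamacey/opentranscribe-frontend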
Clone the Repository
git clone https://github.com/davidamacey/OpenTranscribe.git
cd OpenTranscribe
# Make utility script executable
chmod +x opentr.sh
Environment Configuration
# Copy environment template
cp .env.example .env
# Edit .env file with your settings (optional for development)
# Key variables:
# - HUGGINGFACE_TOKEN (required for speaker diarization)
# - GPU settings for optimal performance
Start OpenTranscribe
# Start in development mode (with hot reload)
./opentr.sh start dev
# Or start in production mode
./opentr.sh start prod
Access the Application
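Once the containers are up, the web interface should respond on the port noted above (5173 by default); a quick reachability check from the shell, assuming curl is installed:
# Confirm the frontend is reachable
curl -I http://localhost:5173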
The opentr.sh script provides comprehensive management for all application operations:
# Start the application
./opentr.sh start [dev|prod] # Start in development or production mode
./opentr.sh stop # Stop all services
./opentr.sh status # Show container status
./opentr.sh logs [service] # View logs (all or specific service)
# Service management
./opentr.sh restart-backend # Restart API and workers without database reset
./opentr.sh restart-frontend # Restart frontend only
./opentr.sh restart-all # Restart all services without data loss
# Container rebuilding (after code changes)
./opentr.sh rebuild-backend # Rebuild backend with new code
./opentr.sh rebuild-frontend # Rebuild frontend with new code
./opentr.sh build # Rebuild all containers
# Data operations (⚠️ DESTRUCTIVE)
./opentr.sh reset [dev|prod] # Complete reset - deletes ALL data!
./opentr.sh init-db # Initialize database without container reset
# Backup and restore
./opentr.sh backup # Create timestamped database backup
./opentr.sh restore [file] # Restore from backup file
# Maintenance
./opentr.sh clean # Remove unused containers and images
./opentr.sh health # Check service health status
./opentr.sh shell [service] # Open shell in container
# Available services: backend, frontend, postgres, redis, minio, opensearch, celery-worker
# View specific service logs
./opentr.sh logs backend # API server logs
./opentr.sh logs celery-worker # AI processing logs
./opentr.sh logs frontend # Frontend development logs
./opentr.sh logs postgres # Database logs
# Follow logs in real-time
./opentr.sh logs backend -f
1. User Registration
2. Upload Your First File
3. Monitor Processing
4. Explore Your Transcript
Automatic Detection → AI Recognition → Profile Management → Cross-Media Tracking
Keyword Search → Semantic Search → Smart Filtering
Create Collections → Organize Files → Bulk Operations
Multiple Formats → Subtitle Files → API Access
OpenTranscribe/
├── backend/                    # Python FastAPI backend
│   ├── app/                    # Application modules
│   │   ├── api/                # REST API endpoints
│   │   ├── models/             # Database models
│   │   ├── services/           # Business logic
│   │   ├── tasks/              # Background AI processing
│   │   ├── utils/              # Common utilities
│   │   └── db/                 # Database configuration
│   ├── scripts/                # Admin and maintenance scripts
│   ├── tests/                  # Comprehensive test suite
│   └── README.md               # Backend documentation
├── frontend/                   # Svelte frontend application
│   ├── src/                    # Source code
│   │   ├── components/         # Reusable UI components
│   │   ├── routes/             # Page components
│   │   ├── stores/             # State management
│   │   └── styles/             # CSS and themes
│   └── README.md               # Frontend documentation
├── database/                   # Database initialization
├── models_ai/                  # AI model storage (runtime)
├── scripts/                    # Utility scripts
├── docker-compose.yml          # Container orchestration
├── opentr.sh                   # Main utility script
└── README.md                   # This file
# Database
DATABASE_URL=postgresql://postgres:password@postgres:5432/opentranscribe
# Security
SECRET_KEY=your-super-secret-key-here
JWT_SECRET_KEY=your-jwt-secret-key
# Object Storage
MINIO_ROOT_USER=minioadmin
MINIO_ROOT_PASSWORD=minioadmin
MINIO_BUCKET_NAME=transcribe-app
# Required for speaker diarization - see setup instructions below
HUGGINGFACE_TOKEN=your_huggingface_token_here
# Model configuration
WHISPER_MODEL=large-v2 # large-v2, medium, small, base
COMPUTE_TYPE=float16 # float16, int8
BATCH_SIZE=16 # Reduce if GPU memory limited
# Speaker detection
MIN_SPEAKERS=1 # Minimum speakers to detect
MAX_SPEAKERS=10 # Maximum speakers to detect
OpenTranscribe requires a HuggingFace token for speaker diarization and voice fingerprinting. Follow these steps:
1. Accept the user agreements for the required models on HuggingFace.
2. Add your token to the environment configuration:
For Production Installation:
# The setup script will prompt you for your token
curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenTranscribe/master/setup-opentranscribe.sh | bash
For Manual Installation:
# Add to .env file
echo "HUGGINGFACE_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" >> .env
Note: Without a valid HuggingFace token, speaker diarization will be disabled and speakers will not be automatically detected or identified across different media files.
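To confirm a token is valid before queuing a transcription job, you can query HuggingFace's whoami endpoint directly; this is a minimal sketch assuming curl is available and the token has been exported into your shell:
# Should return your HuggingFace account details if the token is valid
curl -s -H "Authorization: Bearer $HUGGINGFACE_TOKEN" https://huggingface.co/api/whoami-v2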
# GPU settings
USE_GPU=true # Enable GPU acceleration
CUDA_VISIBLE_DEVICES=0 # GPU device selection
# Resource limits
MAX_UPLOAD_SIZE=4GB # Maximum file size (supports GoPro videos)
CELERY_WORKER_CONCURRENCY=2 # Concurrent tasks
For production use, address the following areas:
Security Configuration
# Generate strong secrets
openssl rand -hex 32 # For SECRET_KEY
openssl rand -hex 32 # For JWT_SECRET_KEY
# Set strong database passwords
# Configure proper firewall rules
# Set up SSL/TLS certificates
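One way to generate both secrets and write them straight into .env in a single step (a minimal sketch; adjust if your .env already defines these keys):
# Append freshly generated secrets to .env
echo "SECRET_KEY=$(openssl rand -hex 32)" >> .env
echo "JWT_SECRET_KEY=$(openssl rand -hex 32)" >> .env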
Performance Optimization
# Use production environment
NODE_ENV=production
# Configure resource limits
# Set up monitoring and logging
# Configure backup strategies
Reverse Proxy Setup
# Example NGINX configuration
server {
    listen 80;
    server_name your-domain.com;

    location / {
        proxy_pass http://localhost:5173;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }

    location /api {
        proxy_pass http://localhost:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
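For the SSL/TLS step mentioned above, Certbot's NGINX plugin can issue and install a Let's Encrypt certificate for the same server block; this assumes Certbot is installed and your domain already resolves to the host:
# Obtain and install a certificate for the reverse proxy
sudo certbot --nginx -d your-domain.com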
# Start development with hot reload
./opentr.sh start dev
# Backend development
cd backend/
pip install -r requirements.txt
pytest tests/ # Run tests
black app/ # Format code
flake8 app/ # Lint code
# Frontend development
cd frontend/
npm install
npm run dev # Development server
npm run test # Run tests
npm run lint # Lint code
# Backend tests
./opentr.sh shell backend
pytest tests/ # All tests
pytest tests/api/ # API tests only
pytest --cov=app tests/ # With coverage
# Frontend tests
cd frontend/
npm run test # Unit tests
npm run test:e2e # End-to-end tests
npm run test:components # Component tests
We welcome contributions! Please see CONTRIBUTING.md for detailed guidelines.
# Check GPU availability
nvidia-smi
# Verify Docker GPU support
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
# Set CPU-only mode if needed
echo "USE_GPU=false" >> .env
# Reduce model size
echo "WHISPER_MODEL=medium" >> .env
echo "BATCH_SIZE=8" >> .env
echo "COMPUTE_TYPE=int8" >> .env
# Monitor memory usage
docker stats
USE_GPU=true
WHISPER_MODEL=medium

# Reset database
./opentr.sh reset dev
# Check database logs
./opentr.sh logs postgres
# Verify database is running
./opentr.sh shell postgres psql -U postgres -l
# Check service status
./opentr.sh status
# Clean up resources
./opentr.sh clean
# Full reset (⚠️ deletes all data)
./opentr.sh reset dev
# GPU optimization
COMPUTE_TYPE=float16 # Use half precision
BATCH_SIZE=32 # Increase for more GPU memory
WHISPER_MODEL=large-v2 # Best accuracy
# CPU optimization (if no GPU)
COMPUTE_TYPE=int8 # Use quantization
BATCH_SIZE=1 # Reduce memory usage
WHISPER_MODEL=base # Faster processing
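After changing any of these values in .env, restart the backend services so the workers pick up the new settings; depending on how the utility script recreates containers, a rebuild may be needed instead:
# Apply updated model settings without a full reset
./opentr.sh restart-backend
# If the new settings don't take effect, rebuild the backend containers
./opentr.sh rebuild-backend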
This project is licensed under the MIT License - see the LICENSE file for details.
Built with ❤️ using AI assistance and modern open-source technologies.
OpenTranscribe demonstrates the power of AI-assisted development while maintaining full local control over your data and processing.