
Self-hosted AI-powered transcription platform with speaker diarization, search, and collaboration features. Built with Svelte, FastAPI, and Docker for easy deployment.

OpenTranscribe Logo

AI-Powered Transcription and Media Analysis Platform

OpenTranscribe is a powerful, containerized web application for transcribing and analyzing audio/video files using state-of-the-art AI models. Built with modern technologies and designed for scalability, it provides an end-to-end solution for speech-to-text conversion, speaker identification, and content analysis.

Note: This application is 99.9% created by AI using Windsurf and various commercial LLMs, demonstrating the power of AI-assisted development.

✨ Key Features

🎧 Advanced Transcription

  • High-Accuracy Speech Recognition: Powered by WhisperX with faster-whisper backend
  • Word-Level Timestamps: Precise timing for every word using WAV2VEC2 alignment
  • Multi-Language Support: Transcribe in multiple languages with automatic English translation
  • Batch Processing: 70x realtime speed with large-v2 model on GPU

👥 Smart Speaker Management

  • Automatic Speaker Diarization: Identify different speakers using PyAnnote.audio
  • Cross-Video Speaker Recognition: AI-powered voice fingerprinting to identify speakers across different media files
  • Speaker Profile System: Create and manage global speaker profiles that persist across all transcriptions
  • AI-Powered Speaker Suggestions: Automatic speaker identification with confidence scores and verification workflow
  • Custom Speaker Labels: Edit and manage speaker names and information with intelligent suggestions
  • Speaker Analytics: View speaking time distribution, cross-media appearances, and interaction patterns

🎬 Rich Media Support

  • Universal Format Support: Audio (MP3, WAV, FLAC, M4A) and Video (MP4, MOV, AVI, MKV)
  • Large File Support: Upload files up to 4GB for GoPro and high-quality video content
  • Interactive Media Player: Click transcript to navigate playback
  • Metadata Extraction: Comprehensive file information using ExifTool
  • Subtitle Export: Generate SRT/VTT files for accessibility
  • File Reprocessing: Re-run AI analysis while preserving user comments and annotations

๐Ÿ” Powerful Search & Discovery

  • Hybrid Search: Combine keyword and semantic search capabilities
  • Full-Text Indexing: Lightning-fast content search with OpenSearch
  • Advanced Filtering: Filter by speaker, date, tags, duration, and more
  • Smart Tagging: Organize content with custom tags and categories
  • Collections System: Group related media files into organized collections for better project management

📊 Analytics & Insights

  • Content Analysis: Word count, speaking time, and conversation flow
  • Speaker Statistics: Individual speaker metrics and participation
  • Sentiment Analysis: Understand tone and emotional content
  • Automated Summaries: Generate concise summaries using local LLMs

💬 Collaboration Features

  • Time-Stamped Comments: Add annotations at specific moments
  • User Management: Role-based access control (admin/user)
  • Export Options: Download transcripts in multiple formats
  • Real-Time Updates: Live progress tracking with detailed WebSocket notifications
  • Enhanced Progress Tracking: 13 granular processing stages with descriptive messages
  • Collection Management: Create, organize, and share collections of related media files

🛠️ Technology Stack

Frontend

  • Svelte - Reactive UI framework with excellent performance
  • TypeScript - Type-safe development with modern JavaScript
  • Progressive Web App - Offline capabilities and native-like experience
  • Responsive Design - Seamless experience across all devices

Backend

  • FastAPI - High-performance async Python web framework
  • SQLAlchemy 2.0 - Modern ORM with type safety
  • Celery + Redis - Distributed task processing for AI workloads
  • WebSocket - Real-time communication for live updates

AI/ML Stack

  • WhisperX - Advanced speech recognition with alignment
  • PyAnnote.audio - Speaker diarization and voice analysis
  • Faster-Whisper - Optimized inference engine
  • Local LLMs - Privacy-focused text processing

Infrastructure

  • PostgreSQL - Reliable relational database
  • MinIO - S3-compatible object storage
  • OpenSearch - Full-text and vector search engine
  • Docker - Containerized deployment
  • NGINX - Production web server

🚀 Quick Start

Prerequisites

# Required
- Docker and Docker Compose
- 8GB+ RAM (16GB+ recommended)

# Recommended for optimal performance
- NVIDIA GPU with CUDA support

Quick Installation (Using Docker Hub Images)

Run this one-liner to download and set up OpenTranscribe using our pre-built Docker Hub images:

curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenTranscribe/master/setup-opentranscribe.sh | bash

Then follow the on-screen instructions. The setup script will:

  • Download the production Docker Compose file
  • Configure environment variables including GPU support (default GPU device ID: 2)
  • Help you set up your Hugging Face token (required for speaker diarization)
  • Set up the management script (opentranscribe.sh)

Once setup is complete, start OpenTranscribe with:

cd opentranscribe
./opentranscribe.sh start

The Docker images are available on Docker Hub as separate repositories:

  • davidamacey/opentranscribe-backend: Backend service (also used for celery-worker and flower)
  • davidamacey/opentranscribe-frontend: Frontend service

Access the web interface at http://localhost:5173

Manual Installation (From Source)

  1. Clone the Repository

    git clone https://github.com/davidamacey/OpenTranscribe.git
    cd OpenTranscribe
       
    # Make utility script executable
    chmod +x opentr.sh
    
  2. Environment Configuration

    # Copy environment template
    cp .env.example .env
       
    # Edit .env file with your settings (optional for development)
    # Key variables:
    # - HUGGINGFACE_TOKEN (required for speaker diarization)
    # - GPU settings for optimal performance
    
  3. Start OpenTranscribe

    # Start in development mode (with hot reload)
    ./opentr.sh start dev
       
    # Or start in production mode
    ./opentr.sh start prod
    
  4. Access the Application

    • Open the web interface at http://localhost:5173

📋 OpenTranscribe Utility Commands

The opentr.sh script provides comprehensive management for all application operations:

Basic Operations

# Start the application
./opentr.sh start [dev|prod]     # Start in development or production mode
./opentr.sh stop                 # Stop all services
./opentr.sh status               # Show container status
./opentr.sh logs [service]       # View logs (all or specific service)

Development Workflow

# Service management
./opentr.sh restart-backend      # Restart API and workers without database reset
./opentr.sh restart-frontend     # Restart frontend only
./opentr.sh restart-all          # Restart all services without data loss

# Container rebuilding (after code changes)
./opentr.sh rebuild-backend      # Rebuild backend with new code
./opentr.sh rebuild-frontend     # Rebuild frontend with new code
./opentr.sh build                # Rebuild all containers

Database Management

# Data operations (⚠️ DESTRUCTIVE)
./opentr.sh reset [dev|prod]     # Complete reset - deletes ALL data!
./opentr.sh init-db              # Initialize database without container reset

# Backup and restore
./opentr.sh backup               # Create timestamped database backup
./opentr.sh restore [file]       # Restore from backup file

System Administration

# Maintenance
./opentr.sh clean                # Remove unused containers and images
./opentr.sh health               # Check service health status
./opentr.sh shell [service]      # Open shell in container

# Available services: backend, frontend, postgres, redis, minio, opensearch, celery-worker

Monitoring and Debugging

# View specific service logs
./opentr.sh logs backend         # API server logs
./opentr.sh logs celery-worker   # AI processing logs
./opentr.sh logs frontend        # Frontend development logs
./opentr.sh logs postgres        # Database logs

# Follow logs in real-time
./opentr.sh logs backend -f

🎯 Usage Guide

Getting Started

  1. User Registration

    • Navigate to http://localhost:5173
    • Create an account or use default admin credentials
    • Set up your profile and preferences
  2. Upload Your First File

    • Click "Upload Files" or drag-and-drop media files (up to 4GB)
    • Supported formats: MP3, WAV, MP4, MOV, and more
    • Files are automatically queued for processing
  3. Monitor Processing

    • Watch detailed real-time progress with 13 processing stages
    • View task status in Flower monitor
    • Receive live WebSocket notifications for all status changes
  4. Explore Your Transcript

    • Click on transcript text to navigate media playback
    • Edit speaker names and add custom labels
    • Add time-stamped comments and annotations
    • Reprocess files to improve accuracy while preserving your edits

Advanced Features

Speaker Management

👥 Automatic Detection → 🤖 AI Recognition → 🏷️ Profile Management → 🔍 Cross-Media Tracking
  • Speakers are automatically detected and assigned labels using advanced AI diarization
  • AI suggests speaker identities based on voice fingerprinting across your media library
  • Create global speaker profiles that persist across all your transcriptions
  • Accept or reject AI suggestions with confidence scores to improve accuracy over time
  • Track speaker appearances across multiple media files with detailed analytics
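
Cross-video recognition boils down to comparing a new speaker's voice embedding against stored profile embeddings. A minimal sketch of the matching step (the real pipeline derives embeddings with PyAnnote and adds a verification workflow; the `suggest_speaker` helper, the 0.75 threshold, and the plain-dict profile store here are illustrative assumptions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two voice embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def suggest_speaker(embedding: list[float], profiles: dict[str, list[float]],
                    threshold: float = 0.75):
    """Return (profile_name, score) for the best match above threshold, else None."""
    best_name = max(profiles, key=lambda name: cosine_similarity(embedding, profiles[name]))
    score = cosine_similarity(embedding, profiles[best_name])
    return (best_name, score) if score >= threshold else None
```

A score below the threshold yields no suggestion, which is what drives the accept/reject verification step described above.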

Search and Discovery

๐Ÿ” Keyword Search โ†’ ๐Ÿง  Semantic Search โ†’ ๐Ÿท๏ธ Smart Filtering
  • Search transcript content with advanced filters
  • Use semantic search to find related concepts
  • Organize content with custom tags and categories

Collections Management

๐Ÿ“ Create Collections โ†’ ๐Ÿ“‚ Organize Files โ†’ ๐Ÿท๏ธ Bulk Operations
  • Group related media files into named collections
  • Filter library view by specific collections
  • Bulk add/remove files from collections
  • Manage collection metadata and descriptions

Export and Integration

📄 Multiple Formats → 📺 Subtitle Files → 🔗 API Access
  • Export transcripts as TXT, JSON, or CSV
  • Generate SRT/VTT subtitle files
  • Access data programmatically via REST API
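
As an illustration of the subtitle export, here is how transcript segments map onto SRT text (the segment dict shape is an assumption modeled on WhisperX-style output; OpenTranscribe's own exporter may differ):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments: list[dict]) -> str:
    """Convert [{'start', 'end', 'text'}, ...] segments into SRT subtitle text."""
    blocks = []
    for i, seg in enumerate(segments, start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(seg['start'])} --> {srt_timestamp(seg['end'])}\n{seg['text']}\n"
        )
    return "\n".join(blocks)
```

Note the SRT convention of a comma (not a period) before the milliseconds, and blank lines separating numbered cues.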

๐Ÿ“ Project Structure

OpenTranscribe/
├── 📁 backend/                 # Python FastAPI backend
│   ├── 📁 app/                 # Application modules
│   │   ├── 📁 api/             # REST API endpoints
│   │   ├── 📁 models/          # Database models
│   │   ├── 📁 services/        # Business logic
│   │   ├── 📁 tasks/           # Background AI processing
│   │   ├── 📁 utils/           # Common utilities
│   │   └── 📁 db/              # Database configuration
│   ├── 📁 scripts/             # Admin and maintenance scripts
│   ├── 📁 tests/               # Comprehensive test suite
│   └── 📄 README.md            # Backend documentation
├── 📁 frontend/                # Svelte frontend application
│   ├── 📁 src/                 # Source code
│   │   ├── 📁 components/      # Reusable UI components
│   │   ├── 📁 routes/          # Page components
│   │   ├── 📁 stores/          # State management
│   │   └── 📁 styles/          # CSS and themes
│   └── 📄 README.md            # Frontend documentation
├── 📁 database/                # Database initialization
├── 📁 models_ai/               # AI model storage (runtime)
├── 📁 scripts/                 # Utility scripts
├── 📄 docker-compose.yml       # Container orchestration
├── 📄 opentr.sh                # Main utility script
└── 📄 README.md                # This file

🔧 Configuration

Environment Variables

Core Application

# Database
DATABASE_URL=postgresql://postgres:password@postgres:5432/opentranscribe

# Security
SECRET_KEY=your-super-secret-key-here
JWT_SECRET_KEY=your-jwt-secret-key

# Object Storage
MINIO_ROOT_USER=minioadmin
MINIO_ROOT_PASSWORD=minioadmin
MINIO_BUCKET_NAME=transcribe-app

AI Processing

# Required for speaker diarization - see setup instructions below
HUGGINGFACE_TOKEN=your_huggingface_token_here

# Model configuration
WHISPER_MODEL=large-v2              # large-v2, medium, small, base
COMPUTE_TYPE=float16                # float16, int8
BATCH_SIZE=16                       # Reduce if GPU memory limited

# Speaker detection
MIN_SPEAKERS=1                      # Minimum speakers to detect
MAX_SPEAKERS=10                     # Maximum speakers to detect
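
A sketch of how these variables might be read on the backend, using the documented defaults (the actual configuration code may differ; this only illustrates the variable names and fallbacks listed above):

```python
import os

def load_model_config() -> dict:
    """Read model-related environment variables with the documented defaults."""
    return {
        "model": os.environ.get("WHISPER_MODEL", "large-v2"),
        "compute_type": os.environ.get("COMPUTE_TYPE", "float16"),
        "batch_size": int(os.environ.get("BATCH_SIZE", "16")),
        "min_speakers": int(os.environ.get("MIN_SPEAKERS", "1")),
        "max_speakers": int(os.environ.get("MAX_SPEAKERS", "10")),
    }
```

Numeric values arrive as strings from the environment, so they are cast explicitly; an unset variable falls back to the default shown in the block above.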

🔑 HuggingFace Token Setup

OpenTranscribe requires a HuggingFace token for speaker diarization and voice fingerprinting features. Follow these steps:

1. Generate HuggingFace Token

  1. Visit HuggingFace Settings > Access Tokens
  2. Click "New token" and select "Read" access
  3. Copy the generated token

2. Accept Model User Agreements

You must accept the user agreements for the PyAnnote models used for diarization and voice fingerprinting on their Hugging Face model pages.

3. Configure Token

Add your token to the environment configuration:

For Production Installation:

# The setup script will prompt you for your token
curl -fsSL https://raw.githubusercontent.com/davidamacey/OpenTranscribe/master/setup-opentranscribe.sh | bash

For Manual Installation:

# Add to .env file
echo "HUGGINGFACE_TOKEN=hf_xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx" >> .env

Note: Without a valid HuggingFace token, speaker diarization will be disabled and speakers will not be automatically detected or identified across different media files.

Performance Tuning

# GPU settings
USE_GPU=true                        # Enable GPU acceleration
CUDA_VISIBLE_DEVICES=0              # GPU device selection

# Resource limits
MAX_UPLOAD_SIZE=4GB                 # Maximum file size (supports GoPro videos)
CELERY_WORKER_CONCURRENCY=2         # Concurrent tasks

Production Deployment

For production use, ensure you:

  1. Security Configuration

    # Generate strong secrets
    openssl rand -hex 32  # For SECRET_KEY
    openssl rand -hex 32  # For JWT_SECRET_KEY
       
    # Set strong database passwords
    # Configure proper firewall rules
    # Set up SSL/TLS certificates
    
  2. Performance Optimization

    # Use production environment
    NODE_ENV=production
       
    # Configure resource limits
    # Set up monitoring and logging
    # Configure backup strategies
    
  3. Reverse Proxy Setup

    # Example NGINX configuration
    server {
        listen 80;
        server_name your-domain.com;

        location / {
            proxy_pass http://localhost:5173;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
        }

        location /api {
            proxy_pass http://localhost:8080;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;

            # Allow WebSocket upgrades for live progress notifications
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
        }
    }
    

🧪 Development

Development Environment

# Start development with hot reload
./opentr.sh start dev

# Backend development
cd backend/
pip install -r requirements.txt
pytest tests/                    # Run tests
black app/                       # Format code
flake8 app/                      # Lint code

# Frontend development  
cd frontend/
npm install
npm run dev                      # Development server
npm run test                     # Run tests
npm run lint                     # Lint code

Testing

# Backend tests
./opentr.sh shell backend
pytest tests/                    # All tests
pytest tests/api/                # API tests only
pytest --cov=app tests/          # With coverage

# Frontend tests
cd frontend/
npm run test                     # Unit tests
npm run test:e2e                 # End-to-end tests
npm run test:components          # Component tests

Contributing

We welcome contributions! Please see CONTRIBUTING.md for detailed guidelines.

๐Ÿ” Troubleshooting

Common Issues

GPU Not Detected

# Check GPU availability
nvidia-smi

# Verify Docker GPU support
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

# Set CPU-only mode if needed
echo "USE_GPU=false" >> .env

Memory Issues

# Reduce model size
echo "WHISPER_MODEL=medium" >> .env
echo "BATCH_SIZE=8" >> .env
echo "COMPUTE_TYPE=int8" >> .env

# Monitor memory usage
docker stats

Slow Transcription

  • Use GPU acceleration (USE_GPU=true)
  • Reduce model size (WHISPER_MODEL=medium)
  • Increase batch size if you have GPU memory
  • Split large files into smaller segments

Database Connection Issues

# Reset database
./opentr.sh reset dev

# Check database logs
./opentr.sh logs postgres

# Verify database is running
./opentr.sh shell postgres psql -U postgres -l

Container Issues

# Check service status
./opentr.sh status

# Clean up resources
./opentr.sh clean

# Full reset (⚠️ deletes all data)
./opentr.sh reset dev

Getting Help

  • 📚 Documentation: Check README files in each component directory
  • 🐛 Issues: Report bugs on GitHub Issues
  • 💬 Discussions: Ask questions in GitHub Discussions
  • 📊 Monitoring: Use Flower dashboard for task debugging

📈 Performance & Scalability

Hardware Recommendations

Minimum Requirements

  • 8GB RAM
  • 4 CPU cores
  • 50GB disk space
  • Any modern GPU (optional but recommended)

Recommended Setup

  • 16GB+ RAM
  • 8+ CPU cores
  • 100GB+ SSD storage
  • NVIDIA GPU with 8GB+ VRAM (RTX 3070 or better)
  • High-speed internet for model downloads

Production Scale

  • 32GB+ RAM
  • 16+ CPU cores
  • Multiple GPUs for parallel processing
  • Fast NVMe storage
  • Load balancer for multiple instances

Performance Tuning

# GPU optimization
COMPUTE_TYPE=float16              # Use half precision
BATCH_SIZE=32                     # Increase for more GPU memory
WHISPER_MODEL=large-v2            # Best accuracy

# CPU optimization (if no GPU)
COMPUTE_TYPE=int8                 # Use quantization
BATCH_SIZE=1                      # Reduce memory usage
WHISPER_MODEL=base                # Faster processing

๐Ÿ” Security Considerations

Data Privacy

  • All processing happens locally - no data sent to external services
  • Optional: Disable external model downloads for air-gapped environments
  • User data is encrypted at rest and in transit
  • Configurable data retention policies

Access Control

  • Role-based permissions (admin/user)
  • File ownership validation
  • API rate limiting
  • Secure session management

Network Security

  • All services run in isolated Docker network
  • Configurable firewall rules
  • Optional SSL/TLS termination
  • Secure default configurations

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • OpenAI Whisper - Foundation speech recognition model
  • WhisperX - Enhanced alignment and diarization
  • PyAnnote.audio - Speaker diarization capabilities
  • FastAPI - Modern Python web framework
  • Svelte - Reactive frontend framework
  • Docker - Containerization platform

Built with ❤️ using AI assistance and modern open-source technologies.

OpenTranscribe demonstrates the power of AI-assisted development while maintaining full local control over your data and processing.
