A self-hosted, privacy-focused NVR (Network Video Recorder) with real-time Speech-to-Text and high-accuracy Face Recognition.
> ⚠️ **Warning:** This project is under active, experimental development. Features change often, and the `main` branch is frequently broken. We prioritize moving fast and testing new AI integrations over stability at this stage. Use at your own risk.
TheWallflower has evolved into a "Split-Pipeline" design: heavy video tasks are offloaded to go2rtc, while the Python backend focuses on AI processing (Whisper + InsightFace).
| Component | Technology |
|---|---|
| Frontend | Svelte 5 (Runes) + TailwindCSS v4 |
| Backend | FastAPI + SQLModel + Alembic |
| Video Engine | go2rtc (Embedded) |
| Speech AI | WhisperLive + Faster-Whisper + Silero VAD |
| Vision AI | InsightFace (buffalo_l) + ONNX Runtime |
| Database | SQLite (WAL Mode) |
| Container | Docker (Multi-stage build) |
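The SQLite row above refers to write-ahead-log (WAL) journaling, which lets readers keep querying while a worker appends events. A minimal sketch of enabling it (the temp-file path and connection setup here are illustrative, not the project's actual code):

```python
import os
import sqlite3
import tempfile

# Illustrative path; the real database lives under the container's /data.
fd, db_path = tempfile.mkstemp(suffix=".db")
os.close(fd)

conn = sqlite3.connect(db_path)
# Switch the journal to WAL so concurrent readers don't block the writer.
mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
print(mode)  # → wal
conn.close()
```

WAL requires an on-disk database; the pragma returns the mode actually in effect, so checking its return value is a cheap sanity test.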
```bash
# Clone the repository
git clone https://github.com/Jellman86/TheWallFlower.git
cd TheWallflower

# Copy and customize the environment file (essential for WebRTC)
cp .env.example .env
# Edit .env and set WEBRTC_ADVERTISED_IP to your server's local IP

# Start the services
docker compose up -d
```

The web UI will be available at http://localhost:8953.
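For example, if your server's LAN address were `192.168.1.50` (an illustrative value), the WebRTC line in `.env` would read:

```ini
# IP that remote browsers should use to reach the WebRTC ports
WEBRTC_ADVERTISED_IP=192.168.1.50
```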
TheWallflower supports hardware acceleration for AI tasks:
- **OpenVINO (Intel):** set `WHISPER_IMAGE` to the `openvino` variant in `.env`.
- **CUDA (NVIDIA):** set `WHISPER_IMAGE` to the `gpu` variant and uncomment the `deploy` section in `docker-compose.yml`.

At runtime the pieces fit together like this:

```
┌─────────────────────────────────────────────────────────────────────┐
│                       TheWallflower Container                       │
│                                                                     │
│  ┌────────────────┐     ┌─────────────────┐     ┌────────────────┐  │
│  │    FastAPI     │◄───►│     go2rtc      │◄───►│  RTSP Camera   │  │
│  │    Backend     │     │ (Video Engine)  │     │                │  │
│  │     :8953      │     │ :8954/8955/8956 │     │                │  │
│  └───────┬────────┘     └─────────────────┘     └────────────────┘  │
│          │                                                          │
│          │  Audio Worker Pipeline:                                  │
│          │  FFmpeg ──► Bandpass ──► Energy Gate ──► Silero VAD      │
│          │                                                          │
│          ▼                                                          │
│  ┌────────────────┐                                                 │
│  │  WhisperLive   │ ◄── Only verified speech chunks reach here      │
│  │   (External)   │                                                 │
│  │     :9090      │                                                 │
│  └────────────────┘                                                 │
│                                                                     │
│  Face Worker Pipeline:                                              │
│  Fetch Frame ──► InsightFace ──► Identify ──► DB Event              │
│                                                                     │
│  Recording Worker:                                                  │
│  FFmpeg (Copy) ──► Segmented MP4s ──► /data/recordings              │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘
```
To skip the "Unknown" phase, you can pretrain the system with existing photos: create a folder for each person, e.g. `/data/faces/known/John_Smith/`, and drop `.jpg` or `.png` photos of John into that folder.

```
TheWallflower/
├── backend/
│   ├── app/
│   │   ├── main.py             # FastAPI application & SSE
│   │   ├── stream_manager.py   # Worker lifecycle
│   │   ├── worker.py           # Audio extraction & VAD
│   │   ├── workers/            # Background tasks (Face, Recording)
│   │   └── services/           # Business logic (Detection, Recording)
│   └── migrations/             # Alembic DB migrations
├── frontend/
│   ├── src/
│   │   ├── lib/
│   │   │   ├── components/     # WebRTCPlayer, FaceCard, RecordingsPanel
│   │   │   └── services/       # API client (api.js)
│   └── public/
├── docker-compose.yml
├── Dockerfile
└── docker-entrypoint.sh
```
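The "Identify" step in the face worker pipeline is, at its core, a nearest-neighbor search over embeddings like those InsightFace produces. A self-contained sketch of cosine-similarity matching (the toy embeddings, names, and 0.4 threshold are all illustrative; real `buffalo_l` embeddings are 512-dimensional):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def identify(embedding, known, threshold=0.4):
    """Return the best-matching enrolled name, or 'Unknown' if no
    known face is similar enough to the probe embedding."""
    best_name, best_sim = "Unknown", threshold
    for name, ref in known.items():
        sim = cosine(embedding, ref)
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name

# Toy 3-D embeddings standing in for 512-D InsightFace vectors.
known = {"John_Smith": [0.9, 0.1, 0.0], "Jane_Doe": [0.0, 0.9, 0.1]}
print(identify([0.88, 0.15, 0.02], known))  # → John_Smith
print(identify([0.0, 0.0, 1.0], known))     # → Unknown
```

Seeding `/data/faces/known/{name}/` with photos simply pre-populates the `known` side of this comparison, which is why pretrained identities never pass through the "Unknown" phase.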
MIT License - see LICENSE for details.