Focus Lock Rs

wheevu

App for generating tracked vertical fancams from landscape videos.

#face-recognition #ffmpeg #kalman-filter #onnx-runtime #tauri #yolov8 #computer-vision #object-tracking #rust #svelte

focus-lock-rs

Automated fancam generator. It takes a standard landscape video and a reference photo of a person (say, your bias), tracks them, and generates a stabilized, vertical (9:16) cropped video locked onto them.

It has three parts: a modular Rust core for high-speed video processing, a CLI for batch operations, and a Tauri v2 desktop application for interactive use.

Features

Person detection + identity lock — YOLOv8 for detection, ArcFace for target matching
Smooth tracking — Kalman-filtered motion for less jitter and more stable framing
Identity discovery (GUI) — candidate scan and manual validation before render
Performance-first pipeline — threaded video pipeline with optimized preprocessing and rendering
Smart output — automatic 1080x1920 crop with fallback handling for occlusion or target loss
Desktop + CLI — Tauri app for interactive use, CLI for direct batch processing

Architecture

This project is organized as a Cargo workspace:

fancam-core/ — Rust engine for detection, identity matching, tracking, and rendering
cli/ — command-line interface for batch and scripted workflows
src-tauri/ + ui/ — Tauri desktop app with a Svelte frontend

Logic Flow

Decode video frames with FFmpeg
Detect people with YOLOv8
Match the target identity with ArcFace
Track motion across frames with Kalman smoothing
Render a stabilized vertical crop to H.264

Prerequisites

Rust stable toolchain
Node.js for the desktop UI
FFmpeg native libraries installed on your system
ONNX models for YOLOv8 Nano and ArcFace / MobileFaceNet

Setup

git clone https://github.com/wheevu/focus-lock-rs.git
cd focus-lock-rs
cargo build --release -p cli

Create a models/ directory in the project root and add:

yolov8n.onnx
w600k_mbf.onnx
osnet_x0_25_msmt17.onnx (optional but recommended for harder multi-person tracking / occlusion recovery)

On macOS, make sure a CoreML-enabled libonnxruntime.dylib is available at models/onnxruntime/lib/, or set ORT_DYLIB_PATH to point at it.

Desktop Application (GUI)

cd ui
npm install
npm run tauri:dev

For a production build:

npm run tauri:build

CLI

Generate a fancam from a landscape video and reference image:

cargo run --release -p cli -- fancam \
  --video "/path/to/concert.mp4" \
  --bias "/path/to/face_photo.jpg" \
  --output "output_fancam.mp4" \
  --yolo-model "models/yolov8n.onnx" \
  --face-model "models/w600k_mbf.onnx" \
  --threshold 0.6

License

MIT

Tracking, performance, and GUI details

Tracking and identity locking

The pipeline combines person detection and face recognition to keep the crop locked onto a specific subject rather than just the most visible person in frame.

YOLOv8-Nano detects person bounding boxes efficiently through ONNX Runtime
ArcFace compares cropped face embeddings against the provided reference image using cosine similarity
the configured --threshold value is used consistently across both CLI and GUI workflows
tracking includes relock bias from the last known position to improve recovery after brief occlusion
identity checks are throttled adaptively once lock-on is established to reduce unnecessary compute

This makes the tracker more stable in crowded performance footage where multiple people may appear and disappear across frames.
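
As a rough illustration of the matching step described above, the sketch below compares a reference embedding against per-frame candidate embeddings with cosine similarity and keeps the best candidate at or above the threshold. The function names (cosine_similarity, best_match) and the plain Vec<f32> embedding type are assumptions made for this example, not the project's actual API.

```rust
/// Cosine similarity between two embedding vectors.
/// Returns None when the lengths differ, the input is empty,
/// or either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> Option<f32> {
    if a.len() != b.len() || a.is_empty() {
        return None;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return None;
    }
    Some(dot / (norm_a * norm_b))
}

/// Index of the candidate most similar to the reference, provided its
/// similarity is at or above `threshold` (hypothetical helper).
fn best_match(reference: &[f32], candidates: &[Vec<f32>], threshold: f32) -> Option<usize> {
    candidates
        .iter()
        .enumerate()
        .filter_map(|(i, c)| cosine_similarity(reference, c).map(|s| (i, s)))
        .filter(|&(_, s)| s >= threshold)
        .max_by(|a, b| a.1.total_cmp(&b.1))
        .map(|(i, _)| i)
}
```

With a threshold of 0.6, as in the CLI example, a frame in which no face reaches that similarity yields None, which is the point where a tracker would fall back to its motion prediction.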

Identity discovery pass (GUI)

The desktop app includes a pre-tracking discovery flow designed to make target selection more reliable before rendering begins.

sampled frames are scanned first to propose likely identity candidates
the UI can accept an expected member count and trigger a smarter rescan if duplicates or count mismatches are detected
users can manually review candidates before rendering:
- exclude false positives
- resolve duplicates
- confirm low-confidence matches
once validated, the selected member card is used as an additional tracking prior alongside the reference image

The Tauri backend persists scan sessions and validates review state server-side before allowing a render to begin.

Scan session lifecycle

To make the review and render flow more robust, scan sessions track explicit lifecycle states:

proposed
validated
tracking
completed
failed

Audit events are recorded throughout the session lifecycle, and run_fancam verifies on the backend, not just in the UI, that the session has been validated and that a selected identity match exists.
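
One way to picture this kind of backend-side enforcement is a small state machine. The sketch below is illustrative only: the five state names come from the list above, but the allowed transitions, the ScanSession struct, its fields, and the start_render method are assumptions for this example rather than the project's real types.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SessionState {
    Proposed,
    Validated,
    Tracking,
    Completed,
    Failed,
}

impl SessionState {
    /// Assumed transition table: a linear happy path, plus failure
    /// from any state that is not already terminal.
    fn can_transition_to(self, next: SessionState) -> bool {
        use SessionState::*;
        matches!(
            (self, next),
            (Proposed, Validated)
                | (Validated, Tracking)
                | (Tracking, Completed)
                | (Proposed, Failed)
                | (Validated, Failed)
                | (Tracking, Failed)
        )
    }
}

/// Hypothetical session record with a simple in-memory audit trail.
struct ScanSession {
    state: SessionState,
    selected_identity: Option<u32>,
    audit_log: Vec<(SessionState, SessionState)>,
}

impl ScanSession {
    fn new() -> Self {
        Self { state: SessionState::Proposed, selected_identity: None, audit_log: Vec::new() }
    }

    /// Applies a transition and records it, or rejects it with a message.
    fn transition(&mut self, next: SessionState) -> Result<(), String> {
        if !self.state.can_transition_to(next) {
            return Err(format!("illegal transition {:?} -> {:?}", self.state, next));
        }
        self.audit_log.push((self.state, next));
        self.state = next;
        Ok(())
    }

    /// Refuses to start a render unless the session is validated and the
    /// requested identity is the one selected during review.
    fn start_render(&mut self, identity: u32) -> Result<(), String> {
        if self.state != SessionState::Validated {
            return Err(format!("session is {:?}, expected Validated", self.state));
        }
        if self.selected_identity != Some(identity) {
            return Err("requested identity does not match the validated selection".to_string());
        }
        self.transition(SessionState::Tracking)
    }
}
```

Keeping the check inside the session type means a render request that bypasses the UI still fails unless review has actually been completed.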

The GUI also supports manual split requests per identity, with a split-rescan path that refreshes candidate clustering when the initial grouping is not clean enough.

Smoothing and motion stability

To avoid shaky or jumpy crops, the render path uses a 2D Kalman filter to smooth subject motion across frames.

This helps with:

reducing abrupt camera jumps
keeping the framing more natural
preserving momentum when the target is briefly lost
simulating the feel of a human camera operator rather than a raw detector box snap

If the subject becomes occluded, the filter predicts the next likely position based on previous motion until visual confirmation is regained.
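
The predict-then-correct behavior described here can be sketched with a constant-velocity Kalman filter run independently on the x and y coordinates of the crop center. This is a minimal sketch under stated assumptions: the constant-velocity model, the noise values, and the names AxisKalman and CenterTracker are all chosen for illustration and may differ from what the project actually uses.

```rust
/// Constant-velocity Kalman filter for one axis. State is [position, velocity].
#[derive(Debug, Clone, Copy)]
struct AxisKalman {
    pos: f64,
    vel: f64,
    p: [[f64; 2]; 2], // state covariance
    q_pos: f64,       // process noise on position (assumed value)
    q_vel: f64,       // process noise on velocity (assumed value)
    r: f64,           // measurement noise variance (assumed value)
}

impl AxisKalman {
    fn new(initial_pos: f64) -> Self {
        Self {
            pos: initial_pos,
            vel: 0.0,
            p: [[100.0, 0.0], [0.0, 100.0]],
            q_pos: 0.01,
            q_vel: 0.01,
            r: 4.0,
        }
    }

    /// Time update: x = F x, P = F P F^T + Q, with F = [[1, dt], [0, 1]].
    fn predict(&mut self, dt: f64) {
        self.pos += dt * self.vel;
        let [[p00, p01], [p10, p11]] = self.p;
        self.p = [
            [p00 + dt * (p01 + p10) + dt * dt * p11 + self.q_pos, p01 + dt * p11],
            [p10 + dt * p11, p11 + self.q_vel],
        ];
    }

    /// Measurement update with a position-only observation (H = [1, 0]).
    fn update(&mut self, z: f64) {
        let [[p00, p01], [p10, p11]] = self.p;
        let innovation = z - self.pos;
        let s = p00 + self.r;
        let (k0, k1) = (p00 / s, p10 / s);
        self.pos += k0 * innovation;
        self.vel += k1 * innovation;
        self.p = [
            [(1.0 - k0) * p00, (1.0 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ];
    }
}

/// Smooths the 2D crop center with one filter per axis.
struct CenterTracker {
    x: AxisKalman,
    y: AxisKalman,
}

impl CenterTracker {
    fn new(cx: f64, cy: f64) -> Self {
        Self { x: AxisKalman::new(cx), y: AxisKalman::new(cy) }
    }

    /// Advances one frame. Pass None when the subject is occluded:
    /// the filter then coasts on its velocity estimate.
    fn step(&mut self, dt: f64, measurement: Option<(f64, f64)>) -> (f64, f64) {
        self.x.predict(dt);
        self.y.predict(dt);
        if let Some((mx, my)) = measurement {
            self.x.update(mx);
            self.y.update(my);
        }
        (self.x.pos, self.y.pos)
    }
}
```

Calling step with None during an occlusion keeps the crop moving along the last estimated velocity, which is the "preserving momentum" behavior listed above.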

Performance pipeline

The core processing path is built around a 3-thread decode / inference / encode pipeline with bounded channels.

Performance-oriented behavior includes:

recognition throttling before and after lock-on to reduce CPU stalls
capping ArcFace checks to top-confidence person candidates per frame
detection downscale for faster large-video processing
parallel tensor preparation where possible
fast SIMD face preprocessing
render buffer and resizer reuse to reduce per-frame allocations
periodic per-stage timing logs for detect, identify, and render

These optimizations are aimed at keeping the pipeline responsive and practical for longer videos without turning the whole thing into a heater-core cosplay.
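
The decode / inference / encode split with bounded channels can be sketched using only the standard library. In the sketch below the Frame and Tracked structs, the run_pipeline function, and the placeholder "work" in each stage are assumptions for illustration. The real pipeline would decode with FFmpeg, run ONNX inference, and encode H.264 in the corresponding stages.

```rust
use std::sync::mpsc::sync_channel;
use std::thread;

/// Placeholder for a decoded frame (hypothetical type).
struct Frame {
    index: usize,
    pixels: Vec<u8>,
}

/// Placeholder for a frame plus its tracking result (hypothetical type).
struct Tracked {
    index: usize,
    center_x: f32,
}

/// Runs three stages on separate threads connected by bounded channels.
/// Returns the frame indices in the order the last stage received them.
fn run_pipeline(frame_count: usize, capacity: usize) -> Vec<usize> {
    // sync_channel blocks the sender once `capacity` items are queued,
    // so a fast decoder cannot run arbitrarily far ahead of inference.
    let (frame_tx, frame_rx) = sync_channel::<Frame>(capacity);
    let (tracked_tx, tracked_rx) = sync_channel::<Tracked>(capacity);

    let decoder = thread::spawn(move || {
        for index in 0..frame_count {
            let frame = Frame { index, pixels: vec![0u8; 16] };
            if frame_tx.send(frame).is_err() {
                break; // downstream stage has shut down
            }
        }
        // frame_tx is dropped here, which ends the inference loop.
    });

    let inference = thread::spawn(move || {
        for frame in frame_rx {
            // Stand-in for detection and identity matching.
            let center_x = frame.pixels.len() as f32 / 2.0;
            let tracked = Tracked { index: frame.index, center_x };
            if tracked_tx.send(tracked).is_err() {
                break;
            }
        }
    });

    let encoder = thread::spawn(move || {
        let mut written = Vec::new();
        for tracked in tracked_rx {
            // Stand-in for cropping around center_x and encoding.
            let _crop_center = tracked.center_x;
            written.push(tracked.index);
        }
        written
    });

    decoder.join().expect("decoder thread panicked");
    inference.join().expect("inference thread panicked");
    encoder.join().expect("encoder thread panicked")
}
```

The channel capacity is the main tuning knob: a small bound caps memory use from queued frames, while a slightly larger one absorbs short stalls in any single stage.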

Rendering behavior

Rendering is optimized for vertical fancam output while remaining resilient when tracking quality changes.

automatic 1080x1920 framing for vertical output
SIMD-accelerated resize path with fast_image_resize
Lanczos3 upscaling when the subject is distant in frame
fallback letterboxing when the target is lost or visibility drops too far

This keeps output usable even when the tracker cannot confidently maintain a tight crop for every frame.
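
To make the crop-or-letterbox decision concrete, the sketch below picks the tallest 9:16 window that fits the source frame, centers it on the smoothed subject position, clamps it to the frame bounds, and falls back to letterboxing when no position is available. The CropRect and Framing types and the plan_framing function are hypothetical names for this example. The resulting window would still need to be resized to 1080x1920 afterwards.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct CropRect {
    x: u32,
    y: u32,
    width: u32,
    height: u32,
}

#[derive(Debug, PartialEq)]
enum Framing {
    /// Crop this window from the source frame, then resize to the output size.
    Crop(CropRect),
    /// Target lost: scale the whole frame into the vertical canvas instead.
    Letterbox,
}

/// Top-left coordinate of a window of `size` centered on `center`,
/// clamped so the window stays inside `[0, limit]`.
fn clamp_origin(center: f32, size: u32, limit: u32) -> u32 {
    let max_origin = (limit - size) as f32;
    (center - size as f32 / 2.0).round().clamp(0.0, max_origin) as u32
}

/// Chooses the framing for one frame given an optional subject center.
fn plan_framing(frame_w: u32, frame_h: u32, center: Option<(f32, f32)>) -> Framing {
    let Some((cx, cy)) = center else {
        return Framing::Letterbox;
    };
    // Tallest 9:16 window that fits inside the source frame.
    let crop_h = frame_h.min(frame_w.saturating_mul(16) / 9);
    let crop_w = crop_h * 9 / 16;
    if crop_w == 0 || crop_h == 0 {
        return Framing::Letterbox;
    }
    Framing::Crop(CropRect {
        x: clamp_origin(cx, crop_w, frame_w),
        y: clamp_origin(cy, crop_h, frame_h),
        width: crop_w,
        height: crop_h,
    })
}
```

For a 1920x1080 source this gives a 607x1080 window, so a subject near the left or right edge pins the window to that edge rather than sampling outside the frame.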

Interfaces

The project supports two main usage paths:

Desktop app

The Tauri desktop application is intended for interactive use:

scan identities visually
validate candidates
select a target
run render jobs without touching the command line

CLI

The CLI is better suited for:

direct runs
repeated experiments
scripting and batch workflows
debugging model thresholds and pipeline behavior

Cross-platform scope

The project is designed to run across Windows, macOS, and Linux, with a shared Rust processing core and a Tauri-based desktop frontend.
