SA3 medium audio inpainter — MLX SAME-L decoder + FastAPI + vanilla Svelte UI

sa3-inpainter-ui

Browser UI for Stable Audio 3 medium — inpainting / vary / text-to-audio. MLX-backed SAME-L decoder so it runs on Apple Silicon without flash-attn.

Upstream: Stability-AI/stable-audio-3 · stabilityai/stable-audio-3-medium on HF

Has: paint-on-spectrogram inpainting · text-to-audio · audio-to-audio (vary) · scroll/pinch zoom anchored at cursor · shift-scroll pan · click-to-scrub playhead · lowpass + duck on playback over masked regions · per-latent frequency-colored waveform · ghost overlay for past inpaints · LoRA stacking with strength sliders · live system stats

Doesn't have: variant history / undo · per-region prompts · streaming per-step diffusion previews · multi-track · MIDI · frequency-bounded selections · mobile/touch layout · auth · cloud

Install

# python deps
uv sync

# frontend deps
cd webui && npm install && cd ..

You'll also need the SA3 medium weights from HuggingFace at ~/Projects/stable-audio-3/models/stable-audio-3-medium/ (or edit the LOCAL_MEDIUM path in backend/server.py).
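A quick pre-flight check for the weights directory can save a 30-second wait on a failed load. This is a minimal sketch, not code from the repo: `weights_present` is a hypothetical helper, and it assumes the checkpoint ships as one or more `.safetensors` files somewhere under the directory.

```python
from pathlib import Path

# Default location named above; LOCAL_MEDIUM in backend/server.py points here.
DEFAULT_WEIGHTS_DIR = Path("~/Projects/stable-audio-3/models/stable-audio-3-medium")


def weights_present(weights_dir=DEFAULT_WEIGHTS_DIR):
    """Return True if the directory exists and contains at least one
    .safetensors file (assumption: that is the checkpoint format)."""
    root = Path(weights_dir).expanduser()
    return root.is_dir() and any(root.rglob("*.safetensors"))
```

Calling `weights_present()` with no arguments checks the default path; pass a different directory if you edited `LOCAL_MEDIUM`.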

Run

# backend on :5174 — ~30s to load the model
uv run python backend/server.py

# frontend on :5173 — Vite proxies /api → :5174
cd webui && npm run dev

Open http://localhost:5173.

LoRA library is read from $SA3_LORA_DIR (default ~/loras).
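The lookup described above can be sketched as follows. This is an illustration under assumptions, not the backend's actual code: `lora_dir` and `list_loras` are hypothetical names, an empty `SA3_LORA_DIR` is assumed to fall back to the default, and LoRA files are assumed to use the `.safetensors` extension.

```python
import os
from pathlib import Path


def lora_dir(env=None):
    """Resolve the LoRA directory: $SA3_LORA_DIR if set, else ~/loras."""
    env = os.environ if env is None else env
    # Assumption: an unset or empty variable falls back to the default.
    return Path(env.get("SA3_LORA_DIR") or "~/loras").expanduser()


def list_loras(directory):
    """Return sorted file names of *.safetensors files in the directory,
    or an empty list if the directory does not exist."""
    d = Path(directory)
    if not d.is_dir():
        return []
    return sorted(p.name for p in d.glob("*.safetensors"))
```

`list_loras(lora_dir())` then gives the names you would expect to see in the UI's LoRA list, if the assumptions above hold.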


For an agent

Concrete things you'll trip on:

  • Apple Silicon or CUDA Linux/WSL. On Mac the backend uses torch+MPS for the DiT and MLX for the decoder. On Linux/WSL with flash-attn installed, swap MPS for CUDA and it runs there too. Bare Windows and CUDA-on-Mac aren't supported.
  • Model weights are gated. Accept the license at https://huggingface.co/stabilityai/stable-audio-3-medium then hf download stabilityai/stable-audio-3-medium --local-dir ~/Projects/stable-audio-3/models/stable-audio-3-medium. The path is hard-coded as LOCAL_MEDIUM in backend/server.py — change it or symlink, your call.
  • uv sync covers everything. Python 3.11. Let uv resolve the torch + mlx + mlx-metal + safetensors + fastapi + psutil stack.
  • Backend before frontend. Vite proxies /api → :5174; without the backend you get 502s and a red dot in the model status. The backend prints [backend] ready when the model finishes loading (~30s on first run, less on subsequent runs thanks to the filesystem cache).
  • Port conflicts: backend binds 127.0.0.1:5174, frontend dev server :5173. Kill any other process on those ports first.
  • HuggingFace login isn't required at runtime once weights are local, but the first hf download needs hf auth login.
  • Where to look when something breaks: backend stderr is verbose ([generate], [truncate], [inpaint] log lines). Frontend logs [play], [vis-toggle] etc. to the browser console.
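The port-conflict check from the list above can be sketched with the standard library alone. `port_in_use` is a hypothetical helper, not part of the repo; it simply tries a TCP connection and reports whether something answered.

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if a TCP listener accepts connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return s.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Ports taken from the notes above: backend 5174, frontend dev server 5173.
    for name, port in (("backend", 5174), ("frontend", 5173)):
        state = "in use" if port_in_use(port) else "free"
        print(f"{name} port {port}: {state}")
```

The same function doubles as a crude readiness probe: once the backend is up, `port_in_use(5174)` turns true. Note that it only checks IPv4 loopback, so a dev server bound solely to `::1` would be reported as free.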

The architecture is small enough to read end-to-end in an hour:

backend/server.py            FastAPI app, model lifecycle, viz rendering, /api routes
mlx_sa3/ae.py                top-level decoder chain
mlx_sa3/nn_blocks.py         transformer + differential attention + band-mask SWA
mlx_sa3/weights.py           safetensors → mlx weight remap
webui/src/lib/session.svelte.js   shared reactive state + api client
webui/src/lib/MainCanvas.svelte   spectrogram + paint + zoom interaction
webui/src/App.svelte         layout + audio graph + playback wiring
design.md                    the design spec
