Browser UI for Stable Audio 3 medium — inpainting / vary / text-to-audio. MLX-backed SAME-L decoder so it runs on Apple Silicon without flash-attn.
Upstream: Stability-AI/stable-audio-3 · stabilityai/stable-audio-3-medium on HF
Has: paint-on-spectrogram inpainting · text-to-audio · audio-to-audio (vary) · scroll/pinch zoom anchored at cursor · shift-scroll pan · click-to-scrub playhead · lowpass + duck on playback over masked regions · per-latent frequency-colored waveform · ghost overlay for past inpaints · LoRA stacking with strength sliders · live system stats
Doesn't have: variant history / undo · per-region prompts · streaming per-step diffusion previews · multi-track · MIDI · frequency-bounded selections · mobile/touch layout · auth · cloud
# python deps
uv sync
# frontend deps
cd webui && npm install && cd ..
You'll also need the SA3 medium weights from HuggingFace at ~/Projects/stable-audio-3/models/stable-audio-3-medium/ (or edit the LOCAL_MEDIUM path in backend/server.py).
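A quick way to confirm the weights are where the backend expects them is a small standalone check. This is a minimal sketch, not code from the repo: `DEFAULT_WEIGHTS_DIR` and `weights_present` are hypothetical names, and it assumes the checkpoint contains at least one `.safetensors` file somewhere under that directory.

```python
from pathlib import Path

# Default location quoted in this README. The real constant is LOCAL_MEDIUM in
# backend/server.py; this copy is only for illustration.
DEFAULT_WEIGHTS_DIR = Path("~/Projects/stable-audio-3/models/stable-audio-3-medium")


def weights_present(weights_dir: Path = DEFAULT_WEIGHTS_DIR) -> bool:
    """Return True if the directory exists and holds at least one .safetensors file.

    Assumption: the checkpoint ships as .safetensors files under this directory.
    """
    root = weights_dir.expanduser()
    return root.is_dir() and any(root.rglob("*.safetensors"))


if __name__ == "__main__":
    status = "found" if weights_present() else "missing"
    print(f"weights {status} at {DEFAULT_WEIGHTS_DIR.expanduser()}")
```

If you keep the weights elsewhere, pass that path to `weights_present` and point LOCAL_MEDIUM (or a symlink) at the same place.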
# backend on :5174 — ~30s to load the model
uv run python backend/server.py
# frontend on :5173 — Vite proxies /api → :5174
cd webui && npm run dev
Open http://localhost:5173.
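If the page loads but nothing works, it can help to check which of the two servers is actually accepting connections. The sketch below is an illustration only: `port_is_open` is a hypothetical helper, and it merely tests whether something is listening on the port, not that it is this project's backend or frontend.

```python
import socket


def port_is_open(port: int, host: str = "localhost", timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port.

    socket.create_connection tries every address the host resolves to,
    so this covers both IPv4 and IPv6 listeners on localhost.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Ports taken from this README: backend :5174, frontend dev server :5173.
    for name, port in (("backend", 5174), ("frontend", 5173)):
        state = "listening" if port_is_open(port) else "not listening"
        print(f"{name} :{port} {state}")
```

Run before starting anything, the same check tells you whether another process already holds one of the ports.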
LoRA library is read from $SA3_LORA_DIR (default ~/loras).
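The lookup described above can be sketched as follows. This is an assumption about how such a lookup might be written, not the code in backend/server.py: `lora_dir` and `list_loras` are hypothetical names, and treating LoRA files as `*.safetensors` is a guess.

```python
import os
from pathlib import Path


def lora_dir() -> Path:
    """Resolve the LoRA library directory: $SA3_LORA_DIR if set, else ~/loras."""
    raw = os.environ.get("SA3_LORA_DIR") or "~/loras"
    return Path(raw).expanduser()


def list_loras(directory: Path) -> list[str]:
    """Sorted names of .safetensors files in the directory (assumed LoRA format)."""
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob("*.safetensors"))


if __name__ == "__main__":
    print(lora_dir(), list_loras(lora_dir()))
```

To point the backend at a different library, export the variable before launching, for example `SA3_LORA_DIR=/path/to/loras uv run python backend/server.py`.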
Concrete things you'll trip on:
hf download stabilityai/stable-audio-3-medium --local-dir ~/Projects/stable-audio-3/models/stable-audio-3-medium fetches the weights. The path is hard-coded as LOCAL_MEDIUM in backend/server.py; change it or symlink, your call.
uv sync covers everything. Python 3.11. The stack is torch + mlx + mlx-metal + safetensors + fastapi + psutil; let uv resolve it.
Vite proxies /api → :5174; without the backend you get 502s and a red dot in the model status. The backend prints [backend] ready when the model finishes loading (~30s on first run, less on subsequent runs because of the fs cache).
The backend listens on 127.0.0.1:5174, the frontend dev server on :5173. Kill any other process on those ports first.
hf download needs hf auth login.
The backend emits [generate], [truncate], [inpaint] log lines. The frontend logs [play], [vis-toggle] etc. to the browser console.

The architecture is small enough to read end-to-end in an hour:
backend/server.py                 FastAPI app, model lifecycle, viz rendering, /api routes
mlx_sa3/ae.py                     top-level decoder chain
mlx_sa3/nn_blocks.py              transformer + differential attention + band-mask SWA
mlx_sa3/weights.py                safetensors → mlx weight remap
webui/src/lib/session.svelte.js   shared reactive state + api client
webui/src/lib/MainCanvas.svelte   spectrogram + paint + zoom interaction
webui/src/App.svelte              layout + audio graph + playback wiring
design.md                         the design spec