Iteratively refine AI-generated game UIs into pixel-faithful Svelte components, using a vision critic + code generator loop.
A pure-browser tool: give it a screenshot of a game UI you want to replicate and it runs an iterative critic/generator loop until the render matches the target.
No Webpack, no Vite, no React, no Svelte (the irony). One HTML file + a tiny Python server + a handful of compiled TS modules. Open a browser tab and you're refining.
You ask Claude/GPT/Gemini for a Svelte game UI component from a reference image and you get this:
```
target:                       what the LLM gives you:

[ ⏸ ▶ ▶▶ ✕ ]                  [pause][play][>>][x]

✓ bevels                      ✗ flat
✓ gradients                   ✗ no gradients
✓ pixel detail                ✗ generic Tailwind
```
The model knows what a button bar looks like, but it can't see its own output to verify that the bevels match, the gradient is right, or the spacing is exact. So you iterate manually for an hour.
This tool closes the loop: the model sees the target, generates code, the browser renders it, a different VLM compares the render to the target and emits structured feedback, and the generator iterates. After 4-5 epochs you get something that actually looks like the target.
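The loop above can be sketched in a few lines. This is an illustrative skeleton, not the tool's actual API: the function names (`generate`, `render`, `critique`) and the plateau rule are assumptions for the sake of the example.

```typescript
// Minimal sketch of the critic/generator loop (illustrative names, not the real code).
type Critique = { overall: number; issues: string[] };

async function refine(
  target: string,                                    // target image as a data URL
  epochs: number,
  generate: (prompt: object) => Promise<string>,     // generator VLM call
  critique: (t: string, r: string, code: string) => Promise<Critique>, // critic VLM call
  render: (code: string) => Promise<string>,         // iframe render -> JPEG data URL
): Promise<{ code: string; history: Critique[] }> {
  const history: Critique[] = [];
  let code = await generate({ target });
  for (let epoch = 0; epoch < epochs; epoch++) {
    const shot = await render(code);
    const c = await critique(target, shot, code);
    history.push(c);
    // Stop early when the overall score plateaus instead of burning more epochs.
    const n = history.length;
    if (n >= 2 && c.overall <= history[n - 2].overall) break;
    code = await generate({ target, shot, critique: c, previous: code });
  }
  return { code, history };
}
```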
```mermaid
flowchart TB
T[Target<br/><b>image</b>]:::target --> G1[Generator VLM<br/><b>writes Svelte + HTML</b>]:::gen
G1 --> R1[Render<br/><b>in iframe</b>]:::render
R1 --> S[Screenshot<br/><b>html2canvas JPEG</b>]:::shot
S --> C[Critic VLM<br/><b>compares target vs render</b>]:::critic
T --> C
C -->|JSON scores + issues| G2[Generator refines<br/><b>using critique</b>]:::gen
G2 --> R1
C -->|pause / plateau| OUT[Final .svelte<br/><b>auto run done</b>]:::out
OUT --> H{Final touch?}:::ask
H -->|yes| HF[Human feedback<br/><b>plain text</b>]:::human
HF --> FG[Forced Gemini 3.1 Pro<br/><b>top model, ignores preset</b>]:::premium
FG --> OUT2[Polished .svelte<br/><b>deliverable</b>]:::out
classDef target fill:#1e293b,stroke:#fbbf24,stroke-width:2px,color:#fbbf24
classDef gen fill:#581c87,stroke:#c084fc,stroke-width:2px,color:#f5f3ff
classDef render fill:#0c4a6e,stroke:#38bdf8,stroke-width:2px,color:#e0f2fe
classDef shot fill:#134e4a,stroke:#2dd4bf,stroke-width:2px,color:#ccfbf1
classDef critic fill:#164e63,stroke:#22d3ee,stroke-width:2px,color:#cffafe
classDef out fill:#14532d,stroke:#4ade80,stroke-width:2px,color:#dcfce7
classDef ask fill:#422006,stroke:#fbbf24,stroke-width:2px,color:#fde68a
classDef human fill:#4c1d95,stroke:#a78bfa,stroke-width:2px,color:#ede9fe
classDef premium fill:#7c2d12,stroke:#fb923c,stroke-width:2px,color:#fff7ed
```
The architecture is heavily inspired by GameUIAgent (the 6-stage pipeline + 5 quality dimensions for the critic), UI2Code^N (the iterative drafting/polishing paradigm), VisRefiner (the diff-aligned learning approach), and AutoGameUI (the separation of UI artist concerns from UX functional concerns). See Related work below for citations.
```mermaid
sequenceDiagram
autonumber
participant U as You
participant L as Loop
participant G as Generator<br/><b>Gemini 3.1 Pro</b>
participant I as iframe
participant C as Critic<br/><b>Gemini Flash-Lite</b>
U->>+L: Click Start
L->>+G: target image + system prompt
G-->>-L: svelte block<br/>html block
L->>+I: render via srcdoc
I-->>-L: screenshot<br/><b>JPEG html2canvas</b>
L->>+C: target + render + code
C-->>-L: critique JSON<br/><b>5 scores + issues</b>
L->>L: scoreHistory.push<br/>drawChart
Note over L: epoch N+1<br/>while scoreHistory.length < epochs
L->>+G: target + render + critique + previous code
G-->>-L: refined svelte + html
L->>I: re-render
Note over U: Pause when satisfied
U->>L: Type manual feedback
L->>+G: forced Gemini 3.1 Pro<br/>target + render + human text
G-->>-L: final polished code
L-->>-U: Done
```
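The critique JSON's five score dimensions are named later in this README; the helpers below are an illustrative sketch of how the loop could aggregate them and detect a plateau (the field name `issues`, the averaging, and the window size are assumptions, not the tool's exact logic).

```typescript
// Presumed shape of the critic's JSON. The five dimension names come from the
// README; the aggregation and plateau rule are illustrative.
interface Critique {
  structural_fidelity: number;
  color_consistency: number;
  typography: number;
  spacing_alignment: number;
  visual_completeness: number;
  issues: string[];
}

function overallScore(c: Critique): number {
  const dims = [
    c.structural_fidelity, c.color_consistency, c.typography,
    c.spacing_alignment, c.visual_completeness,
  ];
  return dims.reduce((a, b) => a + b, 0) / dims.length;
}

// Plateau detection: the last `window` overall scores stopped improving.
function plateaued(history: number[], window = 2): boolean {
  if (history.length < window + 1) return false;
  const recent = history.slice(-window);
  const before = history[history.length - window - 1];
  return recent.every((s) => s <= before);
}
```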
```mermaid
flowchart TB
App[runRefinement loop<br/><b>main.ts</b>]:::app --> CM{callModel<br/><b>dispatcher</b>}:::dispatch
CM -->|google| CG[callGoogle<br/><b>v1beta/models/:generateContent</b>]:::google
CM -->|openrouter| CO[callOpenRouter<br/><b>api/v1/chat/completions</b>]:::or
CG --> Gemini[Google AI Studio<br/><b>Gemini 3.x / 2.5 / 2.0 / 1.5</b>]:::google
CO --> OR[OpenRouter<br/><b>unified gateway</b>]:::or
OR --> Claude[Claude 4.x<br/><b>Anthropic</b>]:::anth
OR --> GPT[GPT-5.x<br/><b>OpenAI</b>]:::oai
OR --> Grok[Grok 4.x<br/><b>xAI · 2M ctx</b>]:::xai
OR --> Other[Qwen / Kimi / Nemotron / Llama<br/><b>open weights</b>]:::other
classDef app fill:#1e293b,stroke:#fbbf24,stroke-width:2px,color:#fde68a
classDef dispatch fill:#422006,stroke:#f59e0b,stroke-width:2px,color:#fde68a
classDef google fill:#0c4a6e,stroke:#38bdf8,stroke-width:2px,color:#e0f2fe
classDef or fill:#7c2d12,stroke:#fb923c,stroke-width:2px,color:#fff7ed
classDef anth fill:#451a03,stroke:#d97706,stroke-width:1.5px,color:#fef3c7
classDef oai fill:#14532d,stroke:#22c55e,stroke-width:1.5px,color:#dcfce7
classDef xai fill:#0f172a,stroke:#64748b,stroke-width:1.5px,color:#e2e8f0
classDef other fill:#581c87,stroke:#a855f7,stroke-width:1.5px,color:#f5f3ff
```
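The two provider branches differ mainly in how the endpoint and auth are built: Google puts the key in the URL and the model in the path, while OpenRouter is an OpenAI-compatible chat-completions endpoint with a bearer token. A sketch (simplified; request-body construction and the real `callModel` signature are not shown):

```typescript
// Sketch of a provider dispatcher in the spirit of callModel. Endpoint paths
// match the diagram; everything else is a simplified assumption.
type Provider = "google" | "openrouter";

interface ModelSpec { provider: Provider; id: string }

function endpointFor(
  spec: ModelSpec,
  keys: { google: string; openrouter: string },
): { url: string; headers: Record<string, string> } {
  if (spec.provider === "google") {
    // Google AI Studio: API key in the query string, model id in the path.
    return {
      url: `https://generativelanguage.googleapis.com/v1beta/models/${spec.id}:generateContent?key=${keys.google}`,
      headers: { "Content-Type": "application/json" },
    };
  }
  // OpenRouter: one OpenAI-compatible endpoint for every model, bearer auth.
  return {
    url: "https://openrouter.ai/api/v1/chat/completions",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${keys.openrouter}`,
    },
  };
}
```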
```mermaid
flowchart TB
R[runs/]:::root --> S[20260407_152330_a3b8/<br/><b>session timestamp + random</b>]:::session
S --> AUTO[Auto epochs<br/><b>generated by the loop</b>]:::auto
S --> MAN[Manual epochs<br/><b>human feedback touches</b>]:::manual
AUTO --> A1[epoch1_render.jpg<br/><b>screenshot for the critic</b>]:::img
AUTO --> A2[epoch1_component.svelte<br/><b>deliverable</b>]:::code
AUTO --> A3[epoch1_preview.html<br/><b>standalone preview</b>]:::code
AUTO --> A4[epoch1_critique.json<br/><b>scores + issues</b>]:::critique
AUTO --> A5[epoch1_gen_response.txt<br/><b>raw generator output</b>]:::raw
AUTO --> A6[epoch1_critic_response.txt<br/><b>raw critic output</b>]:::raw
AUTO --> AN[epochN_*<br/><b>same set per epoch</b>]:::more
MAN --> M1[human1_feedback.txt<br/><b>your prompt</b>]:::human
MAN --> M2[human1_component.svelte<br/><b>polished deliverable</b>]:::code
MAN --> M3[human1_preview.html<br/><b>polished preview</b>]:::code
MAN --> M4[human1_render.jpg<br/><b>final screenshot</b>]:::img
MAN --> MN[humanN_*<br/><b>one set per feedback</b>]:::more
classDef root fill:#1e1b4b,stroke:#a78bfa,stroke-width:2px,color:#ede9fe
classDef session fill:#312e81,stroke:#818cf8,stroke-width:2px,color:#e0e7ff
classDef auto fill:#0c4a6e,stroke:#38bdf8,stroke-width:2px,color:#e0f2fe
classDef manual fill:#581c87,stroke:#c084fc,stroke-width:2px,color:#f5f3ff
classDef img fill:#134e4a,stroke:#2dd4bf,stroke-width:1.5px,color:#ccfbf1
classDef code fill:#14532d,stroke:#4ade80,stroke-width:1.5px,color:#dcfce7
classDef critique fill:#422006,stroke:#fbbf24,stroke-width:1.5px,color:#fde68a
classDef raw fill:#1f2937,stroke:#9ca3af,stroke-width:1.5px,color:#e5e7eb
classDef human fill:#4c1d95,stroke:#a78bfa,stroke-width:1.5px,color:#ede9fe
classDef more fill:#374151,stroke:#6b7280,stroke-width:1px,color:#d1d5db,stroke-dasharray:4 2
```
```bash
git clone https://github.com/GeraCollante/game-ui-refiner
cd game-ui-refiner

# Get a free Gemini API key at https://aistudio.google.com/app/apikey
cp .env.example .env
$EDITOR .env   # paste your key into GEMINI_API_KEY=AIza...

python3 serve.py
# → http://localhost:8000
```
That's it. No `npm install` for end users: the compiled JS is committed in `js/`.
In the browser: choose a preset, provide your target image, and click Start.
| Family | Provider | Models | Vision |
|---|---|---|---|
| Gemini 3.x | Google direct + OpenRouter | 3.1 Pro, 3.1 Flash-Lite, 3 Pro, 3 Flash | ✅ |
| Gemini 2.5 | Google direct + OpenRouter | 2.5 Pro, 2.5 Flash, 2.5 Flash-Lite | ✅ |
| Gemini 1.5/2.0 | Google direct | 2.0 Flash, 1.5 Pro/Flash | ✅ |
| GPT-5.x | OpenRouter | 5.4, 5.4 mini, 5 | ✅ |
| Claude 4.x | OpenRouter | Sonnet 4.6, Opus 4.6, Haiku 4.5 | ✅ |
| Grok 4.x | OpenRouter | 4.20 (2M ctx), 4.20 multi-agent, 4 Fast, 4 | ✅ |
| Qwen 3.5 | OpenRouter | 397B A17B | ✅ |
| Kimi K2.5 | OpenRouter | K2.5 | ✅ |
| Nemotron | OpenRouter | Nano 12B VL (+ :free variant) | ✅ |
| Llama 4 | OpenRouter | Maverick | ✅ |
For up-to-date pricing and benchmarks, see Artificial Analysis.
| Preset | Critic | Generator | Cost / 4-epoch run | Notes |
|---|---|---|---|---|
| Smart (default) | Gemini 3.1 Flash-Lite | Gemini 3.1 Pro | ~$0.04 | Best balance |
| Smart-3 | Gemini 3 Flash | Gemini 3.1 Pro | ~$0.05 | If 3.1 Flash-Lite hallucinates |
| Speed | Gemini 3.1 Flash-Lite | Gemini 3 Flash | ~$0.02 | Iterate prompts fast |
| Premium | Gemini 3.1 Pro | Gemini 3.1 Pro | ~$0.10 | Maximum quality |
| Stable | Gemini 2.5 Flash | Gemini 2.5 Pro | ~$0.04 | Free tier friendly |
| Anthropic (OR only) | Gemini 3.1 Flash-Lite | Claude Sonnet 4.6 | ~$0.10 | A/B test Claude |
| Grok (OR only) | Grok 4 Fast | Grok 4.20 | ~$0.04 | Top non-hallucination + IFBench |
| Grok Full (OR only) | Grok 4.20 | Grok 4 | ~$0.12 | Premium Grok stack |
| Free | Gemini 2.5 Flash | Gemini 2.5 Flash | $0 | Free tier (15 RPM) |
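The preset table could be encoded roughly like this in `config.ts`. This is an illustrative shape only: the real config and the model-id slugs below are assumptions, not copied from the repo.

```typescript
// Illustrative encoding of the preset table (shape and model slugs are assumed).
interface Preset { critic: string; generator: string; approxCostUSD: number }

const PRESETS: Record<string, Preset> = {
  smart:   { critic: "gemini-3.1-flash-lite", generator: "gemini-3.1-pro",   approxCostUSD: 0.04 },
  speed:   { critic: "gemini-3.1-flash-lite", generator: "gemini-3-flash",   approxCostUSD: 0.02 },
  premium: { critic: "gemini-3.1-pro",        generator: "gemini-3.1-pro",   approxCostUSD: 0.10 },
  free:    { critic: "gemini-2.5-flash",      generator: "gemini-2.5-flash", approxCostUSD: 0 },
};
```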
The Manual Feedback button always uses Gemini 3.1 Pro regardless of the active preset, because for final touch-ups you want the strongest model and cost is irrelevant.
A typical run with the Smart preset is ~$0.04 for 4 epochs (Google direct pricing). The Premium preset is ~$0.10. The Free preset is ~$0 if you stay within Gemini's 1000 requests/day free tier.
The header shows live cost and wall-clock time. Both reset on each run.
Browsers can't render `.svelte` files natively without a compiler, so the generator emits two blocks per response:

- a `.svelte` component (the deliverable)
- a standalone HTML preview (rendered in the iframe via `srcdoc`)

The two stay in sync because the same model emits both in one go.
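A sketch of how that dual output can be parsed. This is a simplified stand-in for `parseDualOutput` in `parser.ts`: the regex approach is an assumption, and the real function has more fallback strategies for messy LLM output.

```typescript
// Simplified sketch of parsing the generator's dual output: one fenced
// svelte block and one fenced html block per response.
function parseDualOutput(text: string): { svelte: string; html: string } | null {
  const grab = (lang: string): string | null => {
    // Match ```<lang> ... ``` and capture the body lazily.
    const m = text.match(new RegExp("```" + lang + "\\s*\\n([\\s\\S]*?)```"));
    return m ? m[1].trim() : null;
  };
  const svelte = grab("svelte");
  const html = grab("html");
  return svelte && html ? { svelte, html } : null;
}
```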
JPEG at quality 0.85 is 5–10× smaller than PNG for typical UI screenshots and visually indistinguishable for fidelity comparison. The savings compound across epochs, since each epoch attaches one or two screenshots to API requests.
The Prompts panel renders inline base64 image thumbnails, which Firefox decodes and keeps in RAM. Refreshing it on every API call (instead of only when you click the tab) was a Firefox memory hog. Now it's marked dirty and only rebuilt when you actually view it.
Because we kept getting bitten by these specific bugs:

- `</script>` inside JS strings kills the inline `<script>` tag (the HTML parser doesn't understand JS)
- stale `id="..."` attributes after refactors
- indirect call cycles that only blow up at runtime

`tools/check.mjs` parses every compiled `js/*.js` with acorn, runs Tarjan SCC over the call graph to catch indirect cycles, and cross-checks DOM IDs declared in HTML against `$('foo')` references in JS. Run it with `npm run check` or as part of `npm run lint`.
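The cycle check boils down to Tarjan's strongly-connected-components algorithm over the call graph: any SCC with more than one function (or a self-edge) is a call cycle, even an indirect one. A self-contained sketch of the core, not the analyzer's actual code:

```typescript
// Tarjan's SCC over a call graph: node -> list of callees.
// Any returned component with length > 1 is an indirect recursion cycle.
function tarjanSCC(graph: Map<string, string[]>): string[][] {
  let index = 0;
  const stack: string[] = [];
  const onStack = new Set<string>();
  const idx = new Map<string, number>();
  const low = new Map<string, number>();
  const sccs: string[][] = [];

  function strongconnect(v: string): void {
    idx.set(v, index); low.set(v, index); index++;
    stack.push(v); onStack.add(v);
    for (const w of graph.get(v) ?? []) {
      if (!idx.has(w)) {
        strongconnect(w);
        low.set(v, Math.min(low.get(v)!, low.get(w)!));
      } else if (onStack.has(w)) {
        low.set(v, Math.min(low.get(v)!, idx.get(w)!));
      }
    }
    if (low.get(v) === idx.get(v)) {
      // v is the root of an SCC: pop the stack down to v.
      const scc: string[] = [];
      let w: string;
      do { w = stack.pop()!; onStack.delete(w); scc.push(w); } while (w !== v);
      sccs.push(scc);
    }
  }

  for (const v of graph.keys()) if (!idx.has(v)) strongconnect(v);
  return sccs;
}
```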
```
game-ui-refiner/
├── index.html                   # ~220 lines: just markup, loads ./js/main.js as type=module
├── serve.py                     # ~150 lines: tiny Python server, .env injection, /save endpoint
├── src/                         # TypeScript source (edit these)
│   ├── types.ts                 # interfaces shared across modules
│   ├── state.ts                 # global mutable state
│   ├── config.ts                # model catalogs, presets, dim colors
│   ├── parser.ts                # pure functions: parseDualOutput, extractJson, etc
│   ├── api.ts                   # provider clients + message builders + screenshot
│   ├── ui.ts                    # tabs, chart, history, ticker, save, log, dropdowns
│   └── main.ts                  # entry: runRefinement, runFeedbackEpoch, init
├── js/                          # compiled output (committed, no npm install needed)
├── tests/run.mjs                # 38 plain-Node parser tests (no jest/vitest)
├── tools/
│   ├── check.mjs                # static analyzer (acorn + Tarjan SCC + DOM crosscheck)
│   ├── lint.sh                  # full pipeline: tsc + check + tests + ruff
│   └── README.md
├── .github/workflows/check.yml  # CI: lint + tests + serve.py smoke test
├── .env.example
├── tsconfig.json
├── package.json
├── README.md
├── CONTRIBUTING.md
├── CHANGELOG.md
└── LICENSE
```
```bash
# Install dev deps (TypeScript + acorn for the analyzer)
npm install
cd tools && npm install && cd ..

# Edit src/*.ts, then build
npm run build   # one-shot
npm run watch   # auto-recompile

# Run all checks
npm run lint
```
The full pipeline that CI runs is in tools/lint.sh:
1. `npx tsc --noEmit` → type-check
2. `node tools/check.mjs` → static analyzer
3. `node tests/run.mjs` → parser unit tests
4. `ruff check serve.py` → Python lint

This project would not exist without the following papers. None of their code is reused; the inspiration is conceptual, not literal.

- GameUIAgent: the 6-stage pipeline and the critic's five quality dimensions (`structural_fidelity`, `color_consistency`, `typography`, `spacing_alignment`, `visual_completeness`). Also the inspiration for the Reflection Controller pattern.
- UI2Code^N: the iterative drafting/polishing paradigm.
- VisRefiner: the diff-aligned learning approach.
- AutoGameUI: the separation of UI artist concerns from UX functional concerns.

MIT; see LICENSE.
Built in a single conversation with Claude Code (Opus 4.6 with 1M context). The TypeScript split, the analyzer, the test suite, the README, and most of the system prompt engineering were iterated end-to-end with the model in the loop. The architectural decisions (split critic/generator, lazy DOM, JPEG screenshots, Tarjan-based recursion detection, parser strategies for messy LLM output) emerged from real failures during development.
Citations for all the related papers above, with a special mention to GameUIAgent as the conceptual seed.