AI-assisted browser automation you can supervise in real time.
This project wires Google's Gemini 2.5 Computer Use preview model to a real Chromium browser that you can watch and interrupt as it works. The agent gets a headful window, you get a live chat console, and either of you can take the wheel at any time.
The goal here is simple: give humans an honest look at what the preview model can do while keeping supervision front and center.
The backend targets gemini-2.5-computer-use-preview-10-2025 by default and reads its Gemini credentials strictly from environment variables. Set your key with:
export GEMINI_API_KEY=your_key_here
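As a rough sketch of what reading the key looks like on the Python side, a startup check can fail fast with a readable message instead of surfacing an opaque API error later. The helper name `require_api_key` is hypothetical and is not taken from this codebase; only the `GEMINI_API_KEY` variable name comes from the text above.

```python
import os


def require_api_key(env=os.environ):
    """Return GEMINI_API_KEY from the environment or raise a clear error.

    Hypothetical helper for illustration; the project's real startup
    code may perform this check differently.
    """
    key = env.get("GEMINI_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "GEMINI_API_KEY is not set. Run: export GEMINI_API_KEY=your_key_here"
        )
    return key
```

Passing the environment as a parameter keeps the check easy to exercise without touching the real process environment.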
The helper script manage.py hides the usual wall of setup commands:
python manage.py launch
That one line will:
1. Create .venv and install requirements.txt.
2. Run pnpm install --frozen-lockfile inside ui/.
3. Start the server at http://localhost:8000.

Open that URL in a browser and you'll see the dashboard. Seconds later a Chromium window appears next to it: the agent's playground, which you can also interact with.
Every command below is run as python manage.py <command>:
- bootstrap: install backend and UI dependencies without starting anything.
- build: ensure the UI bundle is up to date (skips work when sources are unchanged).
- launch: bootstrap, build if needed, then start Uvicorn (what we used above).
- dev: like launch, but leaves Uvicorn in --reload mode for backend development.
- rebuild-ui: force a fresh pnpm run build.
- clean: remove the virtualenv, cached install stamps, and the UI build directory.
- clean-profile: wipe the persisted Chromium profile if you want a fresh browser state.

Under the hood, the script drops timestamp files (.pip-installed, .pnpm-installed, .build-stamp) so it can tell when a reinstall or rebuild is truly necessary.
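The timestamp-file idea can be illustrated with a minimal sketch: a step is redone when its stamp is missing or older than any input it depends on. The function names `is_stale` and `mark_done` are hypothetical, and comparing modification times is an assumption about how manage.py decides; the real script may use a different rule.

```python
from pathlib import Path


def is_stale(stamp: Path, sources: list[Path]) -> bool:
    """Return True when the step guarded by `stamp` should run again.

    Assumption: a step is stale if its stamp file is missing or if any
    existing source file was modified after the stamp was written.
    """
    if not stamp.exists():
        return True
    stamp_mtime = stamp.stat().st_mtime
    return any(
        src.exists() and src.stat().st_mtime > stamp_mtime for src in sources
    )


def mark_done(stamp: Path) -> None:
    """Create the stamp file, or refresh its modification time."""
    stamp.touch()
```

For example, `is_stale(Path(".pip-installed"), [Path("requirements.txt")])` would report whether dependencies need reinstalling under this rule, and `mark_done` would be called after a successful install.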
Prefer running things by hand? Create and activate .venv, install requirements.txt, run pnpm install once inside ui/, build the UI with pnpm run build, and then launch with:
uvicorn app.main:app --reload
The UI code lives under ui/.
cd ui
pnpm install # once
pnpm run dev # Svelte dev server at http://localhost:5173
Leave the FastAPI process running so the dev UI can reach the /ws WebSocket. When you're ready for a production bundle, run pnpm run build (the output lands in ui/build/ and is what the Python app serves).
Handy scripts:
- pnpm run guard:derived: catches invalid $derived usage in Svelte reactivity.
- pnpm run check: guard plus svelte-check.
- pnpm run lint / pnpm run format: ESLint and Prettier with the project presets.

You can still click or type directly in the Chromium window at any time. Playwright and Gemini will adapt.
Everything is driven by environment variables; sensible defaults are baked in if you leave them unset.
GEMINI_BROWSER_WIDTH=1440
GEMINI_BROWSER_HEIGHT=900
GEMINI_BROWSER_PROFILE=~/.gemini-browser/profile
GEMINI_BROWSER_DOWNLOADS=~/Downloads/gemini-browser
GEMINI_ACTION_DELAY=2.0 # default settle delay after actions
GEMINI_NAVIGATION_DELAY=2.0 # settle delay after page navigation
GEMINI_SCROLL_PAUSE=0.5 # wait after wheel events
GEMINI_DRAG_PREHOLD_DELAY=0.1 # pause before dragging
GEMINI_DRAG_POSTHOLD_DELAY=0.1 # pause after dropping
GEMINI_WAIT_ACTION_DURATION=5.0 # how long the wait action sleeps
GEMINI_MODEL_NAME=gemini-2.5-computer-use-preview-10-2025
GEMINI_TURN_LIMIT=1000
# Safety filters:
# GEMINI_DISABLE_SAFETY=true (ask for filters off)
# GEMINI_ENABLE_SAFETY=true (force filters on)
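A minimal sketch of how variables like these can be read with baked-in defaults follows. The `Settings` class and `load_settings` function are hypothetical names, only a subset of the variables is shown, and treating an empty value as unset is an assumption rather than documented behavior.

```python
import os
from dataclasses import dataclass
from pathlib import Path

# Defaults copied from the variable listing above.
DEFAULTS = {
    "GEMINI_BROWSER_WIDTH": "1440",
    "GEMINI_BROWSER_HEIGHT": "900",
    "GEMINI_BROWSER_PROFILE": "~/.gemini-browser/profile",
    "GEMINI_BROWSER_DOWNLOADS": "~/Downloads/gemini-browser",
    "GEMINI_ACTION_DELAY": "2.0",
    "GEMINI_NAVIGATION_DELAY": "2.0",
    "GEMINI_MODEL_NAME": "gemini-2.5-computer-use-preview-10-2025",
    "GEMINI_TURN_LIMIT": "1000",
}


@dataclass(frozen=True)
class Settings:
    width: int
    height: int
    profile_dir: Path
    downloads_dir: Path
    action_delay: float
    navigation_delay: float
    model_name: str
    turn_limit: int


def load_settings(env=os.environ) -> Settings:
    """Build a Settings object, falling back to DEFAULTS for unset values."""

    def get(name: str) -> str:
        # Assumption: an empty or whitespace-only value counts as unset.
        value = env.get(name, "").strip()
        return value or DEFAULTS[name]

    return Settings(
        width=int(get("GEMINI_BROWSER_WIDTH")),
        height=int(get("GEMINI_BROWSER_HEIGHT")),
        profile_dir=Path(get("GEMINI_BROWSER_PROFILE")).expanduser(),
        downloads_dir=Path(get("GEMINI_BROWSER_DOWNLOADS")).expanduser(),
        action_delay=float(get("GEMINI_ACTION_DELAY")),
        navigation_delay=float(get("GEMINI_NAVIGATION_DELAY")),
        model_name=get("GEMINI_MODEL_NAME"),
        turn_limit=int(get("GEMINI_TURN_LIMIT")),
    )
```

Parsing everything once into a frozen dataclass means a malformed value such as a non-numeric width fails at startup rather than in the middle of a session.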
If you request the filters to be disabled and the API refuses, the code automatically retries with defaults and logs the downgrade.
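The retry-with-defaults behavior can be pictured with a small, library-agnostic sketch. Everything named here is hypothetical: `SafetyOverrideRejected` stands in for whatever error the API actually returns, and `send` is a placeholder for the function that issues the Gemini request. No real SDK calls are shown.

```python
import logging

logger = logging.getLogger("safety_fallback")


class SafetyOverrideRejected(Exception):
    """Hypothetical error meaning 'the API refused the safety override'."""


def send_with_safety_fallback(send, disable_safety: bool):
    """Call send(disable_safety=...), retrying once with default filters.

    `send` is any callable accepting a `disable_safety` keyword argument.
    The retry only happens when filters-off was requested and refused.
    """
    if not disable_safety:
        return send(disable_safety=False)
    try:
        return send(disable_safety=True)
    except SafetyOverrideRejected:
        # Log the downgrade so the operator knows filters are active.
        logger.warning(
            "Safety override refused by the API; retrying with default filters."
        )
        return send(disable_safety=False)
```

Any other exception propagates unchanged, so genuine failures are not mistaken for a refused override.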
- app/main.py: FastAPI application serving the UI and WebSocket endpoint.
- app/browser.py: launches and manages the shared Playwright browser context.
- app/actions.py plus app/builtin_actions/*: the catalog of Playwright actions Gemini can call.
- app/agent.py and app/session_worker.py: keep per-session state, confirmations, and Gemini turns.
- app/utils.py, app/downloads.py, app/history.py: helpers for screenshots, downloads, and logging.
- ui/: Svelte workspace (source, scripts, static config, and build output).

Gemini Computer Use is still a preview feature. Keep someone watching the screen and never hand it credentials you wouldn't paste into a chat window yourself.
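- The agent's browser profile lives in ~/.gemini-browser/profile. Use that isolated space or point GEMINI_BROWSER_PROFILE at another throwaway directory.
- Downloads land in ~/Downloads/gemini-browser unless you override GEMINI_BROWSER_DOWNLOADS.
- To reset browser state, wipe the profile (python manage.py clean-profile) and start fresh.

- Chromium does not launch: run playwright doctor, or reinstall with playwright install chromium (add --with-deps on Linux if system libraries are missing).
- Gemini requests fail: check that GEMINI_API_KEY is set and that the key has access to the preview model.
- Dashboard does not load or connect: check /healthz, then look at your browser console for errors.
- UI looks stale: run python manage.py rebuild-ui.
- Browser state is broken: run python manage.py clean-profile.
- Logs lack detail: run uvicorn app.main:app --log-level debug.

If things really fall over, python manage.py clean followed by python manage.py launch gives you a clean slate.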
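The /healthz check mentioned above can also be scripted. The sketch below uses only the standard library and assumes the endpoint answers a plain GET with HTTP 200 when the backend is up. That response contract and the function name `is_healthy` are assumptions, not something this document specifies.

```python
import urllib.error
import urllib.request


def is_healthy(base_url: str = "http://localhost:8000", timeout: float = 3.0) -> bool:
    """Return True if GET <base_url>/healthz responds with HTTP 200.

    Connection errors, timeouts, and non-2xx responses all count as unhealthy.
    """
    url = base_url.rstrip("/") + "/healthz"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        return False
```

Calling `is_healthy()` before opening the dashboard distinguishes "the server is not running" from "the server is up but the UI is misbehaving", which is when the browser console becomes the next place to look.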
This codebase is meant for tinkering, demos, and learning how Gemini's computer-use preview behaves with a real browser. Contributions and bug reports are welcome. Just keep the human-in-the-loop mindset: supervise what the model is doing, review downloads, and never run it unattended on sensitive accounts.