A minimal browser automation agent using Google's Gemini 2.5 Computer Use Preview model and Playwright for web browser control.
For browser automation without a sandbox, use this project: https://github.com/pmbstyle/gemini-browser-agent
- Visual Browser Control: Uses screenshots to "see" and interact with web pages
- Automated Actions: Supports mouse clicks, keyboard input, scrolling, navigation, and more
- Safety Controls: Built-in confirmation prompts for risky actions
- Human-in-the-Loop: Optional user confirmation for sensitive operations
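
The confirmation step described in the last two bullets could look roughly like the sketch below. It is not taken from `agent.py`; the function name `confirm_action` and the `ask` parameter are hypothetical, chosen so the prompt can be swapped out in tests.

```python
from typing import Callable


def confirm_action(description: str, ask: Callable[[str], str] = input) -> bool:
    """Return True only if the user explicitly approves the action.

    `ask` defaults to the built-in input(); anything other than
    "y" or "yes" (case-insensitive) is treated as a refusal.
    """
    answer = ask(f"The agent wants to: {description}. Proceed? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}


# Hypothetical usage inside an agent loop:
#     if not confirm_action("submit the checkout form"):
#         skip the action and report the refusal back to the model
```

Defaulting to "no" on empty or unrecognized input keeps a stray Enter key from approving a risky action.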
Supported actions: open_web_browser, navigate, search, click_at, hover_at, type_text_at, key_combination, scroll_document, scroll_at, drag_and_drop, go_back, go_forward, wait_5_seconds
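
As a rough illustration of how a few of these actions might be mapped onto Playwright calls, here is a minimal sketch. It is an assumption-laden example, not the project's actual dispatcher: the argument names (`url`, `x`, `y`, `text`), the 1000-unit coordinate grid, and the helper names `to_pixels` and `execute_action` are all hypothetical. The `page` object is expected to behave like a Playwright sync `Page`.

```python
from typing import Any, Dict

# Assumption: the model reports coordinates on a 1000-unit grid that must be
# scaled to the real viewport size before being passed to the browser.
GRID = 1000


def to_pixels(value: int, size: int) -> int:
    """Scale a grid coordinate to a pixel coordinate using integer math."""
    return value * size // GRID


def execute_action(page: Any, name: str, args: Dict[str, Any],
                   width: int, height: int) -> None:
    """Run one named action against a Playwright-like page object."""
    if name == "navigate":
        page.goto(args["url"])
    elif name == "click_at":
        page.mouse.click(to_pixels(args["x"], width),
                         to_pixels(args["y"], height))
    elif name == "type_text_at":
        # Click to focus the target first, then type the text.
        page.mouse.click(to_pixels(args["x"], width),
                         to_pixels(args["y"], height))
        page.keyboard.type(args["text"])
    elif name == "go_back":
        page.go_back()
    elif name == "go_forward":
        page.go_forward()
    elif name == "wait_5_seconds":
        page.wait_for_timeout(5000)
    else:
        # The remaining actions are left out of this sketch.
        raise NotImplementedError(name)
```

Only six of the thirteen actions are covered here; the others would follow the same pattern.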
conda create -n gemcu python=3.11 -y
conda activate gemcu
python -m pip install --upgrade pip
python -m pip install google-genai playwright termcolor
playwright install chromium
# Windows PowerShell
$env:GEMINI_API_KEY="PASTE_YOUR_KEY_HERE"
# Linux/Mac
export GEMINI_API_KEY="PASTE_YOUR_KEY_HERE"
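
A script that depends on this variable can check for it up front and fail with a clear message. The sketch below shows one way to do that; the helper name `load_api_key` is hypothetical, and whether `agent.py` performs a similar check is not stated here.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Gemini API key from the environment or raise a clear error."""
    key = env.get("GEMINI_API_KEY", "").strip()
    # Treat the placeholder from the setup instructions as "not set".
    if not key or key == "PASTE_YOUR_KEY_HERE":
        raise RuntimeError(
            "GEMINI_API_KEY is not set; export it before running the agent."
        )
    return key
```

Passing the environment in as a parameter keeps the check easy to test without touching the real process environment.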
python agent.py "Find Wikipedia article about Niagara Falls and open History section"

- Python 3.11+
- Gemini API key (Get API key)
- Chrome/Chromium browser
This agent runs in a controlled browser environment. For production use, consider running in a sandboxed virtual machine or container for additional security.
Based on Google's Gemini Computer Use API.