pmbstyle/gemini-computer-use

Gemini Computer Use Agent

A minimal browser automation agent using Google's Gemini 2.5 Computer Use Preview model and Playwright for web browser control.

For browser automation without a sandbox, see the related project: https://github.com/pmbstyle/gemini-browser-agent

Features

Visual Browser Control: Uses screenshots to "see" and interact with web pages
Automated Actions: Supports mouse clicks, keyboard input, scrolling, navigation, and more
Safety Controls: Built-in confirmation prompts for risky actions
Human-in-the-Loop: Optional user confirmation for sensitive operations
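
The human-in-the-loop idea can be sketched as a small gate placed in front of each action. This is a minimal illustration and not the code in agent.py: the function name `confirm_action`, its parameters, and the prompt wording are all hypothetical, and the sketch assumes the caller already knows whether an action was flagged as needing confirmation.

```python
from typing import Callable


def confirm_action(
    description: str,
    requires_confirmation: bool,
    ask: Callable[[str], str] = input,
) -> bool:
    """Return True if the action may run.

    `requires_confirmation` is assumed to come from whatever safety signal
    the agent uses; this sketch does not decide which actions are risky.
    `ask` is injectable so the gate can be exercised without a terminal.
    """
    if not requires_confirmation:
        return True
    answer = ask(f"Allow this action? {description} [y/N] ")
    # Anything other than an explicit yes is treated as a refusal.
    return answer.strip().lower() in {"y", "yes"}
```

Defaulting to "no" on empty or unrecognized input keeps the gate conservative: an accidental Enter key press does not approve a sensitive operation.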

Supported Actions

open_web_browser, navigate, search
click_at, hover_at, type_text_at
key_combination, scroll_document, scroll_at
drag_and_drop, go_back, go_forward
wait_5_seconds
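
As a rough illustration of how a few of these actions might be routed to browser calls, the sketch below dispatches on the action name. It is not taken from agent.py. It assumes the model reports coordinates on a 0-999 grid that must be scaled to the viewport, and that the argument names are `x`, `y`, and `url`; `RecordingPage`, `dispatch`, and the default viewport size are hypothetical names and values chosen for the example. A stand-in page object is used so the snippet runs without a browser.

```python
from typing import Any, Dict, List, Tuple

GRID = 1000  # assumption: model coordinates lie on a 0-999 grid


def denormalize(value: int, size_px: int) -> int:
    """Scale a grid coordinate to a pixel coordinate for the given dimension."""
    return int(value / GRID * size_px)


class RecordingPage:
    """Stand-in for a browser page: records calls instead of driving a browser.

    With Playwright's sync API the equivalents would be
    page.mouse.click(x, y), page.goto(url), and page.go_back().
    """

    def __init__(self) -> None:
        self.calls: List[Tuple[Any, ...]] = []

    def click(self, x: int, y: int) -> None:
        self.calls.append(("click", x, y))

    def goto(self, url: str) -> None:
        self.calls.append(("goto", url))

    def go_back(self) -> None:
        self.calls.append(("go_back",))


def dispatch(page: RecordingPage, name: str, args: Dict[str, Any],
             width: int = 1440, height: int = 900) -> None:
    """Route one named action to the page; only three actions are covered here."""
    if name == "click_at":
        page.click(denormalize(args["x"], width), denormalize(args["y"], height))
    elif name == "navigate":
        page.goto(args["url"])
    elif name == "go_back":
        page.go_back()
    else:
        raise ValueError(f"unsupported action: {name}")
```

Raising on unknown names, rather than silently ignoring them, makes it obvious when the model emits an action the dispatcher does not yet handle.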

Setup

1. Create and activate environment

conda create -n gemcu python=3.11 -y
conda activate gemcu

2. Install packages

python -m pip install --upgrade pip
python -m pip install google-genai playwright termcolor

3. Install Playwright browser

playwright install chromium

4. Set API key

# Windows PowerShell
$env:GEMINI_API_KEY="PASTE_YOUR_KEY_HERE"
# Linux/Mac
export GEMINI_API_KEY="PASTE_YOUR_KEY_HERE"
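
A script can check early that the key was actually exported, instead of failing later on the first API call. The helper below is a sketch under assumptions: `load_api_key` is a hypothetical name, and rejecting the literal placeholder string is a convenience added for this example.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Gemini API key from the environment or raise a clear error."""
    key = env.get("GEMINI_API_KEY", "").strip()
    # Treat the README's placeholder as "not set" so a copy-paste slip is caught.
    if not key or key == "PASTE_YOUR_KEY_HERE":
        raise RuntimeError("GEMINI_API_KEY is not set; export it before running the agent.")
    return key
```

The returned value could then be handed to the google-genai client, for example `genai.Client(api_key=load_api_key())`.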

Usage

python agent.py "Find Wikipedia article about Niagara Falls and open History section"
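
Conceptually, a run like the one above amounts to a loop: capture a screenshot, ask the model what to do, execute the returned actions, and repeat until the model has nothing more to request. The sketch below shows only that control flow. It is not the code in agent.py; `run_agent` and its callback parameters are hypothetical, and the model and browser are passed in as plain functions so the loop can be run with stubs.

```python
from typing import Any, Callable, Dict, List

Action = Dict[str, Any]


def run_agent(
    task: str,
    take_screenshot: Callable[[], bytes],
    ask_model: Callable[[str, bytes, List[Action]], List[Action]],
    execute: Callable[[Action], None],
    max_turns: int = 10,
) -> List[Action]:
    """Run the screenshot -> model -> action loop and return the executed actions."""
    history: List[Action] = []
    for _ in range(max_turns):
        screenshot = take_screenshot()
        actions = ask_model(task, screenshot, history)
        if not actions:
            # An empty reply is taken to mean the model considers the task done.
            break
        for action in actions:
            execute(action)
            history.append(action)
    return history
```

The `max_turns` cap is a simple guard against a run that never converges; a real agent would likely also want timeouts and error handling around each step.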

Requirements

Python 3.11+
Gemini API key (obtainable from Google AI Studio)
Chrome/Chromium browser

Safety

This agent runs in a controlled browser environment. For production use, consider running in a sandboxed virtual machine or container for additional security.

Based on Google's Gemini Computer Use API.
