
pmbstyle/gemini-computer-use


Gemini Computer Use Agent

A minimal browser automation agent using Google's Gemini 2.5 Computer Use Preview model and Playwright for web browser control.

For browser automation without a sandbox, see the related project: https://github.com/pmbstyle/gemini-browser-agent


Features

  • Visual Browser Control: Uses screenshots to "see" and interact with web pages
  • Automated Actions: Supports mouse clicks, keyboard input, scrolling, navigation, and more
  • Safety Controls: Built-in confirmation prompts for risky actions
  • Human-in-the-Loop: Optional user confirmation for sensitive operations
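The confirmation step described above can be sketched as a small gate that runs before an action is executed. This is a minimal illustration, not the project's actual code: the function name confirm_action, the "safety_decision" key, and its "require_confirmation" value are assumptions about how a flagged action might be represented.

```python
from typing import Any, Callable, Dict


def confirm_action(
    name: str,
    args: Dict[str, Any],
    ask: Callable[[str], str] = input,
) -> bool:
    """Return True if the action may proceed.

    Assumption: a risky action carries a "safety_decision" entry in its
    arguments, shaped like {"decision": "require_confirmation", ...}.
    Actions without that entry are allowed without prompting.
    """
    decision = args.get("safety_decision") or {}
    if decision.get("decision") != "require_confirmation":
        return True

    # Show the user why the action was flagged, if a reason was supplied.
    reason = decision.get("explanation", "no explanation provided")
    answer = ask(f"Action '{name}' needs confirmation ({reason}). Proceed? [y/N] ")
    return answer.strip().lower() in {"y", "yes"}
```

Passing the prompt function as a parameter keeps the gate testable and lets a non-interactive run substitute a policy that always declines.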

Supported Actions

  • open_web_browser, navigate, search
  • click_at, hover_at, type_text_at
  • key_combination, scroll_document, scroll_at
  • drag_and_drop, go_back, go_forward
  • wait_5_seconds
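One way to map a subset of these action names onto Playwright calls is a dispatch table. The sketch below is illustrative only: build_handlers, execute, and denormalize are hypothetical names, the argument names (url, x, y) are assumed, and it assumes the model reports coordinates on a 0-999 grid that must be scaled to the real viewport. The page object is expected to behave like a Playwright sync Page.

```python
from typing import Any, Callable, Dict

GRID = 1000  # assumption: model coordinates are normalized to a 0-999 grid


def denormalize(value: int, size: int) -> int:
    """Scale a normalized coordinate to a pixel offset within `size` pixels."""
    return int(value / GRID * size)


def build_handlers(page: Any, width: int, height: int) -> Dict[str, Callable[..., Any]]:
    """Map a few action names to calls on a Playwright-like sync page."""
    return {
        "navigate": lambda url: page.goto(url),
        "click_at": lambda x, y: page.mouse.click(
            denormalize(x, width), denormalize(y, height)
        ),
        "hover_at": lambda x, y: page.mouse.move(
            denormalize(x, width), denormalize(y, height)
        ),
        "go_back": lambda: page.go_back(),
        "go_forward": lambda: page.go_forward(),
        "wait_5_seconds": lambda: page.wait_for_timeout(5000),
    }


def execute(handlers: Dict[str, Callable[..., Any]], name: str, args: Dict[str, Any]) -> Any:
    """Run one action by name, rejecting names that have no handler."""
    handler = handlers.get(name)
    if handler is None:
        raise ValueError(f"Unsupported action: {name}")
    return handler(**args)
```

A table like this keeps the list of supported actions in one place, so an unknown action name fails loudly instead of being silently ignored.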

Setup

1. Create and activate environment

conda create -n gemcu python=3.11 -y
conda activate gemcu

2. Install packages

python -m pip install --upgrade pip
python -m pip install google-genai playwright termcolor

3. Install Playwright browser

playwright install chromium

4. Set API key

# Windows PowerShell
$env:GEMINI_API_KEY="PASTE_YOUR_KEY_HERE"
# Linux/Mac
export GEMINI_API_KEY="PASTE_YOUR_KEY_HERE"
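
Because the key is supplied through the environment, a script can check for it up front and fail with a clear message. The following is a minimal sketch; load_api_key is a hypothetical helper, not necessarily how agent.py reads the key.

```python
import os
from typing import Mapping


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the Gemini API key from the environment, or raise if unusable."""
    key = env.get("GEMINI_API_KEY", "").strip()
    # Treat the README placeholder as "not set" to catch copy-paste mistakes.
    if not key or key == "PASTE_YOUR_KEY_HERE":
        raise RuntimeError(
            "GEMINI_API_KEY is not set. Export it before running the agent."
        )
    return key
```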

Usage

python agent.py "Find Wikipedia article about Niagara Falls and open History section"
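
The command above passes the task as a single quoted argument. A script could read it as sketched below; parse_task and the help text are illustrative assumptions, and the real agent.py may parse its arguments differently.

```python
import argparse
from typing import List, Optional


def parse_task(argv: Optional[List[str]] = None) -> str:
    """Parse the natural-language task from the command line."""
    parser = argparse.ArgumentParser(
        prog="agent.py",
        description="Run a browser task described in plain language.",
    )
    parser.add_argument("task", help="Task for the agent, quoted as one argument")
    # argv=None makes argparse fall back to sys.argv[1:].
    return parser.parse_args(argv).task
```

Quoting the task matters: without quotes the shell splits it into several arguments and a parser like this one rejects the extras.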

Requirements

  • Python 3.11+
  • Gemini API key (available from Google AI Studio)
  • Chromium browser (installed by Playwright in Setup step 3)

Safety

This agent runs in a controlled browser environment. For production use, consider running in a sandboxed virtual machine or container for additional security.

Based on Google's Gemini Computer Use API.
