A Node.js/TypeScript port of Anthropic's official Python computer-use demo. This implementation provides a complete TypeScript version of Claude's computer control capabilities, allowing Claude to interact with your computer through mouse movements, keyboard input, and screen captures.
This project converts Anthropic's Python implementation to TypeScript, preserving all core functionality and adding some TypeScript-specific enhancements. It enables Claude to:
- Control your computer's mouse and keyboard
- Capture and analyze screenshots
- Manage windows and applications
- Execute system commands
It is aimed at developers who prefer Node.js/TypeScript or want to integrate Claude's computer control capabilities into TypeScript projects.
- 🖱️ Mouse Control
  - Movement and clicks
  - Dragging and scrolling
  - Position tracking
  - Multiple button support
- ⌨️ Keyboard Actions
  - Key press and release
  - Text typing
  - Modifier key combinations
  - Multiple key sequences
- 🪟 Window Management
  - Focus control
  - Move and resize
  - Minimize/maximize
  - Cross-platform support
- 📸 Screen Capture
  - High-quality screenshots
  - Automatic compression
  - Organized storage
  - Metadata tracking
Create a .env file in the root directory:
    # Required
    ANTHROPIC_API_KEY=sk-ant-xxxx  # Your Anthropic API key
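Because the key is required, a startup check can fail fast when it is missing. The following is only a minimal sketch: `requireApiKey` and the `Env` type are hypothetical names introduced for illustration and are not part of this project.

```typescript
// Hypothetical helper (not part of this project): fail fast when the key is missing.
type Env = Record<string, string | undefined>;

function requireApiKey(env: Env): string {
  // Treat an empty or whitespace-only value the same as a missing one.
  const key = env["ANTHROPIC_API_KEY"]?.trim();
  if (!key) {
    throw new Error("ANTHROPIC_API_KEY is not set; add it to your .env file");
  }
  return key;
}
```

In a Node.js entry point this could be called as `requireApiKey(process.env)` once the `.env` file has been loaded, however the project chooses to load it.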
    # Install dependencies
    pnpm install

    # Build the project
    pnpm run build

    # Run example
    pnpm run test:basic
    import { ComputerTool } from './src/tools/computer';

    const tool = new ComputerTool();

    // Move mouse
    await tool.execute({ action: 'mouse_move', coordinate: [100, 100] });

    // Type text
    await tool.execute({ action: 'type', text: 'Hello, World!' });

    // Take screenshot
    await tool.execute({ action: 'screenshot' });
    // Mouse scroll with direction
    await tool.execute({ action: 'mouse_scroll', scrollAmount: 5, direction: 'down' });

    // Key combination
    await tool.execute({ action: 'key', text: 'Control+C' });

    // Window management
    await tool.execute({ action: 'focus_window', windowTitle: 'Chrome' });
- mouse_move: Move cursor to coordinates
- left_click, right_click, middle_click: Mouse clicks
- left_click_drag: Click and drag
- mouse_scroll: Scroll in any direction
- mouse_toggle: Press/release mouse buttons

- key: Single key or combination press
- type: Type text string
- key_toggle: Press/release keys
- key_tap_multiple: Repeat key taps

- focus_window: Activate window
- move_window: Change window position
- resize_window: Adjust window size
- minimize_window, maximize_window: Window state

- screenshot: Capture screen
- cursor_position: Get current cursor location
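Since each action takes different fields, a discriminated union is one way to let the compiler catch malformed inputs before they reach the tool. The sketch below covers only a few of the actions listed above. The `ComputerAction` type and `validateAction` function are hypothetical and may not match the project's real types; the field names come from the usage examples, and the set of scroll directions other than 'down' is an assumption.

```typescript
// Hypothetical input types; field names follow the usage examples in this README.
type Coordinate = [number, number];

type ComputerAction =
  | { action: "mouse_move"; coordinate: Coordinate }
  | { action: "type"; text: string }
  | { action: "key"; text: string }
  // Assumption: only 'down' appears in the README; the other directions are guesses.
  | { action: "mouse_scroll"; scrollAmount: number; direction: "up" | "down" | "left" | "right" }
  | { action: "focus_window"; windowTitle: string }
  | { action: "screenshot" }
  | { action: "cursor_position" };

// Returns a human-readable problem, or null when the input looks valid.
function validateAction(input: ComputerAction): string | null {
  switch (input.action) {
    case "mouse_move": {
      const [x, y] = input.coordinate;
      const isWholeNumber = (n: number) => isFinite(n) && Math.floor(n) === n;
      return isWholeNumber(x) && isWholeNumber(y) && x >= 0 && y >= 0
        ? null
        : "coordinate must be two non-negative integers";
    }
    case "type":
    case "key":
      return input.text.length > 0 ? null : "text must not be empty";
    case "mouse_scroll":
      return input.scrollAmount > 0 ? null : "scrollAmount must be positive";
    case "focus_window":
      return input.windowTitle.length > 0 ? null : "windowTitle must not be empty";
    case "screenshot":
    case "cursor_position":
      return null;
  }
}
```

With a union like this, a call such as `validateAction({ action: 'type' })` fails at compile time because `text` is missing, while runtime checks cover values the type system cannot express, such as negative coordinates.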
Screenshots are automatically organized:
    screenshots/
    ├── metadata.json
    └── YYYY/MM/DD/
        ├── screenshot-{timestamp}-original.png
        └── screenshot-{timestamp}-compressed.[png|jpg]
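To make the layout concrete, the sketch below derives both file paths for a given capture time. It is illustrative only: `screenshotPaths` and `pad2` are hypothetical helpers, and it assumes that dates are taken in UTC and that `{timestamp}` is milliseconds since the Unix epoch, neither of which this README specifies.

```typescript
// Hypothetical helpers mirroring the layout above; the project's real naming code may differ.
function pad2(n: number): string {
  return n < 10 ? `0${n}` : String(n);
}

function screenshotPaths(taken: Date, compressedExt: "png" | "jpg") {
  // Assumption: the YYYY/MM/DD folders use UTC.
  const dir = [
    "screenshots",
    String(taken.getUTCFullYear()),
    pad2(taken.getUTCMonth() + 1), // getUTCMonth() is zero-based
    pad2(taken.getUTCDate()),
  ].join("/");
  // Assumption: {timestamp} is milliseconds since the Unix epoch.
  const stamp = taken.getTime();
  return {
    original: `${dir}/screenshot-${stamp}-original.png`,
    compressed: `${dir}/screenshot-${stamp}-compressed.${compressedExt}`,
  };
}
```

Under these assumptions, a capture taken on 5 January 2024 would land in `screenshots/2024/01/05/`.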
Key settings can be modified in constants:
    const TIMING = {
      TYPING_DELAY_MS: 12,
      SCREENSHOT_DELAY_MS: 2000,
      RETRY_DELAY_MS: 500
    };

    const MAX_IMAGE_SIZE = 5 * 1024 * 1024; // 5MB
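To give a feel for what these values mean in practice, the sketch below repeats the constants and applies two of them. The interpretation is an assumption, not something this README confirms: it treats `TYPING_DELAY_MS` as a per-character delay and `MAX_IMAGE_SIZE` as the threshold above which a screenshot needs compression. `typingDurationMs` and `needsCompression` are hypothetical helpers.

```typescript
// Values copied from the constants above.
const TIMING = {
  TYPING_DELAY_MS: 12,
  SCREENSHOT_DELAY_MS: 2000,
  RETRY_DELAY_MS: 500,
};

const MAX_IMAGE_SIZE = 5 * 1024 * 1024; // 5MB

// Assumption: the typing delay is applied once per character.
function typingDurationMs(text: string): number {
  return text.length * TIMING.TYPING_DELAY_MS;
}

// Assumption: images strictly larger than the limit must be compressed.
function needsCompression(byteLength: number): boolean {
  return byteLength > MAX_IMAGE_SIZE;
}
```

Under that reading, typing the 13-character string 'Hello, World!' would take about 156 ms, so raising `TYPING_DELAY_MS` slows input for applications that drop fast keystrokes.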
- Node.js (v16+)
- TypeScript
- Dependencies:
  - robotjs
  - screenshot-desktop
  - sharp
- Relevant system libraries
    sudo apt-get install -y \
      libxtst-dev \
      libpng-dev \
      libxss-dev \
      xvfb
    brew install opencv@4
    brew install cairo pango
- Requires windows-build-tools:
npm install --global windows-build-tools
MIT
- Fork the repo
- Create feature branch
- Commit changes
- Push to branch
- Create Pull Request