The automation layer between AI agents and iOS
Mobile browser and native app automation via iOS Simulator. Built for AI agents — start with coordinates, escalate to vision only when needed.
ATL marks on Target.com Numbered marks give you coordinates — vision API only when stuck
markElements → getMarkInfo → tap x,y → screenshot if stuck
openApp → snapshot → find/tapRef → screenshot if stuck
Tiered automation:
- Coordinates first — marks (browser) or refs (native) give you x,y without vision calls (90% of actions)
- Vision fallback — screenshot when stuck to see modals/blockers
- JS injection — direct DOM manipulation as last resort (browser only)
git clone https://github.com/JordanCoin/Atl.git cd Atl/core ./bin/atl-sim start # API ready at http://localhost:9222
Swift CLI for agent-friendly automation:
cd core/cli && swift build -c release .build/release/atl ping # Check server .build/release/atl goto https://target.com # Navigate .build/release/atl wait --for-selector ".product-grid" # Wait for element (SPA-safe) .build/release/atl snapshot # Get marks + text .build/release/atl click "Add to cart" # Click by text .build/release/atl pdf -o cart.pdf # Capture PDF
Wait strategies (for SPAs like Target, Amazon):
--for-selector— Wait for CSS selector (most reliable)--for-text— Wait for text to appear--network 500— Wait for network idle (500ms quiet)- Default: DOM stability (500ms no mutations)
# Navigate curl -X POST localhost:9222/command -d '{"method":"goto","params":{"url":"https://target.com"}}' # Mark all interactive elements curl -X POST localhost:9222/command -d '{"method":"markAll"}' # Get coordinates for element #26 curl -X POST localhost:9222/command -d '{"method":"getMarkInfo","params":{"label":26}}' # → {"x":43, "y":539, "text":"Add to cart"} # Tap at exact coordinates curl -X POST localhost:9222/command -d '{"method":"tap","params":{"x":214,"y":561}}'
# Open Settings app curl -X POST localhost:9222/command -d '{"method":"openApp","params":{"bundleId":"com.apple.Preferences"}}' # Snapshot accessibility tree (get refs for all elements) curl -X POST localhost:9222/command -d '{"method":"snapshot","params":{"interactiveOnly":true}}' # Find and tap Wi-Fi curl -X POST localhost:9222/command -d '{"method":"find","params":{"text":"Wi-Fi","action":"tap"}}' # Switch back to browser curl -X POST localhost:9222/command -d '{"method":"openBrowser"}'
openclaw skills install ./core/skill
Full skill documentation → includes:
- Vision-free automation workflow
- Escalation ladder (coordinates → vision → JS)
- Touch gestures (tap, swipe, pinch)
- Helper bash functions
- Best practices & troubleshooting
See BROWSER-AUTOMATION.md for quick-start guide.
- Full README - Complete API reference
- OpenClaw Skill - AI agent integration
- OpenAPI Spec - Machine-readable API
- macOS with Xcode (for iOS Simulator)
- That's it!
MIT - see LICENSE