"My AI Assistant Could Code, But It Couldn't Operate My Desktop"

DEV Community

Instead of guessing at pixel coordinates from a screenshot, the assistant can ask the operating system for the application's UI tree:

list windows -> focus app -> find input -> set value -> send Enter -> read text

That means it can find a textbox by control type, set its value through the accessibility API, invoke a button, read visible text, and only fall back to screenshots when the app does not expose useful accessibility metadata.
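To make that flow concrete, here is a minimal Python sketch. It does not call a real accessibility API: `UiNode`, `find_first`, `read_text`, `type_and_send`, and `build_demo_window` are hypothetical names, and the tree is an in-memory stand-in for what the OS would return. It only illustrates the order of operations, including the point where a screenshot fallback would take over.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, Optional


@dataclass
class UiNode:
    """Hypothetical stand-in for one element in an OS accessibility tree."""
    control_type: str                      # e.g. "Window", "Edit", "Button", "Text"
    name: str = ""
    value: str = ""
    children: list["UiNode"] = field(default_factory=list)
    on_invoke: Optional[Callable[[], None]] = None

    def walk(self) -> Iterator["UiNode"]:
        """Yield this node and all descendants, depth first."""
        yield self
        for child in self.children:
            yield from child.walk()


def find_first(root: UiNode, control_type: str,
               name: Optional[str] = None) -> Optional[UiNode]:
    """Find an element by control type (and optionally name), not by pixels."""
    for node in root.walk():
        if node.control_type == control_type and (name is None or node.name == name):
            return node
    return None


def read_text(root: UiNode) -> list[str]:
    """Collect the visible text exposed by Text elements under root."""
    return [n.name for n in root.walk() if n.control_type == "Text" and n.name]


def type_and_send(window: UiNode, text: str) -> str:
    """find input -> set value -> invoke button, or signal a fallback."""
    box = find_first(window, "Edit")
    button = find_first(window, "Button", "Send")
    if box is None or button is None or button.on_invoke is None:
        # The app exposes no useful metadata; a real tool would now
        # fall back to screenshots.
        return "needs_screenshot_fallback"
    box.value = text       # set the value directly instead of simulating keys
    button.on_invoke()     # invoke the button instead of clicking coordinates
    return "ok"


def build_demo_window() -> UiNode:
    """Build a fake chat window: an input, a Send button, and a transcript."""
    box = UiNode("Edit", name="Message")
    transcript = UiNode("Pane", name="Transcript")

    def send() -> None:
        transcript.children.append(UiNode("Text", name=box.value))
        box.value = ""

    button = UiNode("Button", name="Send", on_invoke=send)
    return UiNode("Window", name="Chat", children=[box, button, transcript])
```

Against the demo window, `type_and_send(window, "hello")` returns `"ok"` and `read_text(window)` then returns `["hello"]`. A window with no exposed children, such as `UiNode("Window", name="Canvas app")`, returns `"needs_screenshot_fallback"`, which is the case where screenshots would still be needed.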

This is the bridge I wanted: a coding assistant that can work in repos and also operate the desktop applications that surround them.

Where This Is Going

The current shape is:

CliGate routes AI coding tools through one local server.
Runtime sessions keep Codex and Claude Code work alive.
The assistant watches, coordinates, and summarizes.
Skills give it reusable procedures.
Desktop control gives it a path into native apps and GUI workflows.

That combination changes the product from "proxy for AI tools" into "local operator for developer workflows."

I think the desktop-control layer deserves its own post, because "AI can operate any app through the OS accessibility tree" is a deeper topic than I can fit here.

The project is open source here: CliGate on GitHub

How are you handling the boundary between coding agents and the desktop apps they still need to interact with?