feat: add coordinate space abstraction for open weights LLM support#282

Open

philipph-askui wants to merge 10 commits into

main from

feat/llm_coordinate_system

Open

feat: add coordinate space abstraction for open weights LLM support #282
philipph-askui wants to merge 10 commits into
main from
feat/llm_coordinate_system

Conversation

@philipph-askui

@philipph-askui philipph-askui commented Jun 9, 2026

Copy link

Copy Markdown

Contributor

Introduces VlmCoordinateSpace strategy (pixel, scaled, normalized) so agentOS facades can map model-emitted coordinates to screen pixels.
Thereby adds auto-detection for Qwen, Holo (0-1000 grid) and Kimi (0.0-1.0 floats) in OllamaVlmProvider.
Appends coordinate info to system prompts for OpenAI-compatible providers.

philipph-askui added 7 commits

June 9, 2026 16:37

@philipph-askui


 feat: add coordinate space abstraction for open weights LLM support

8f496af

@philipph-askui


 fix: map non-pixel coordinate spaces directly to device resolution

31865a8

@philipph-askui


 feat: provider-owned image scaling with configurable max_image_edge

395d00d

Replace fixed SCREENSHOT_RESOLUTION constant with per-provider image
scalers. Each VlmProvider now owns an ImageScaler callable and exposes
max_image_edge (also via ASKUI_VLM_MAX_IMAGE_EDGE env var). Facades
derive target_resolution dynamically from scaler output.

@philipph-askui


 refactor: clean up PR (composition, deduplication, exports)

3665cc4

@philipph-askui


 chore: fine-tune settings for MAX_IMAGE_EDGE

cca155f

@philipph-askui


 Merge remote-tracking branch 'origin/main' into feat/llm_coordinate_s...

184ac9c

...ystem

@philipph-askui


 fix: outdated cos test for kimi

d6415eb

@philipph-askui philipph-askui marked this pull request as ready for review

June 12, 2026 13:39

@philipph-askui philipph-askui requested review from mlikasam-askui and programminx-askui

June 12, 2026 13:39

programminx-askui

programminx-askui reviewed

Jun 18, 2026

View reviewed changes

@programminx-askui programminx-askui left a comment

Copy link

Copy Markdown

Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello,

I've check the PR. Overall nice. We should introduce a ClickableCapability/ClickableTarget.

src/askui/model_providers/anthropic_vlm_provider.py Outdated

Comment on lines +81 to +82

image_scaler: ImageScaler | None = None,

max_image_edge: int | None = None,

@programminx-askui programminx-askui Jun 17, 2026

Copy link

Copy Markdown

Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything what is images related should start with image_. Similare like model_. Otherwise it's hard to understand the zusammenhänge

Suggested change

image_scaler: ImageScaler | None = None,

max_image_edge: int | None = None,

image_scaler: ImageScaler | None = None,

image_edge_max: int | None = None,

@programminx-askui programminx-askui Jun 17, 2026

Copy link

Copy Markdown

Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but is this not a property of the image scaler?

@philipph-askui philipph-askui Jun 18, 2026

Copy link

Copy Markdown

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes. This value is only used if users do not specify an ImageScaler explicitly but the default one is used.

src/askui/model_providers/anthropic_vlm_provider.py Outdated

if self._image_scaler_override is not None:

return self._image_scaler_override

max_edge = self._max_edge

return lambda image: compute_patch_optimized_image(image, max_edge=max_edge)

@programminx-askui programminx-askui Jun 17, 2026

Copy link

Copy Markdown

Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we sure, that we can use the lambda? I had bad experience with lambda in python, that they caach the computation. But can't remember

@philipph-askui philipph-askui Jun 18, 2026

Copy link

Copy Markdown

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored this to that we now have a proper ImageScaler base class with subclasses that are used here instead.

src/askui/model_providers/askui_vlm_provider.py Outdated

Comment on lines +74 to +92

self._image_scaler_override = image_scaler

self._max_edge = (

max_image_edge

or int(os.environ.get("ASKUI_VLM_MAX_IMAGE_EDGE", "0"))

or _DEFAULT_MAX_IMAGE_EDGE

)

@property

@override

def model_id(self) -> str:

return self._model_id_value

@property

@override

def image_scaler(self) -> ImageScaler:

if self._image_scaler_override is not None:

return self._image_scaler_override

max_edge = self._max_edge

return lambda image: compute_patch_optimized_image(image, max_edge=max_edge)

@programminx-askui programminx-askui Jun 17, 2026

Copy link

Copy Markdown

Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why so complicated?

Suggested change

self._image_scaler_override = image_scaler

self._max_edge = (

max_image_edge

or int(os.environ.get("ASKUI_VLM_MAX_IMAGE_EDGE", "0"))

or _DEFAULT_MAX_IMAGE_EDGE

)

@property

@override

def model_id(self) -> str:

return self._model_id_value

@property

@override

def image_scaler(self) -> ImageScaler:

if self._image_scaler_override is not None:

return self._image_scaler_override

max_edge = self._max_edge

return lambda image: compute_patch_optimized_image(image, max_edge=max_edge)

self._image_scaler = image_scaler if image_scaler else lambda image: compute_patch_optimized_image(image, max_edge=max_edge)

self._max_edge = (

max_image_edge

or int(os.environ.get("ASKUI_VLM_MAX_IMAGE_EDGE", "0"))

or _DEFAULT_MAX_IMAGE_EDGE

)

@property

@override

def model_id(self) -> str:

return self._model_id_value

@philipph-askui philipph-askui Jun 18, 2026

Copy link

Copy Markdown

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I refactored it to use a proper imagescaler class i/o lambdas

src/askui/model_providers/askui_vlm_provider.py Outdated

Comment on lines +75 to +79

self._max_edge = (

max_image_edge

or int(os.environ.get("ASKUI_VLM_MAX_IMAGE_EDGE", "0"))

or _DEFAULT_MAX_IMAGE_EDGE

)

@programminx-askui programminx-askui Jun 17, 2026

Copy link

Copy Markdown

Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would remove it if possible

@philipph-askui philipph-askui Jun 18, 2026

Copy link

Copy Markdown

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean the env variable?

src/askui/tools/playwright/agent_os_facade.py

Comment on lines +44 to 46

self._scaler.real_screen_resolution = self._agent_os.screenshot(

report=False

).size

@programminx-askui programminx-askui Jun 17, 2026

Copy link

Copy Markdown

Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The screen resolution can change over time

@philipph-askui philipph-askui Jun 18, 2026

Copy link

Copy Markdown

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this value will be update with every new screenshot

src/askui/tools/playwright/agent_os_facade.py Outdated

self._scaler = CoordinateScaler(

coordinate_space=coordinate_space,

image_scaler=image_scaler,

fetch_real_resolution=lambda: self._agent_os.screenshot(report=False).size,

@programminx-askui programminx-askui Jun 17, 2026

Copy link

Copy Markdown

Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of lambda in python. What is the reason to use them?

@philipph-askui philipph-askui Jun 18, 2026

Copy link

Copy Markdown

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed the lambdas

src/askui/tools/android/agent_os_facade.py

self._agent_os: AndroidAgentOs = agent_os

self._target_resolution: Tuple[int, int] = (1024, 768)

self._real_screen_resolution: Optional[Tuple[int, int]] = None

self._scaler = CoordinateScaler(

@programminx-askui programminx-askui Jun 18, 2026

Copy link

Copy Markdown

Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a note.

The "CoordinateScaler" should be on the DisplayCapability/DisplayTarget/WindowTarget/WindowCapability.

But here we assume, that we have only one ClickableTarget.

@philipph-askui philipph-askui Jun 18, 2026

Copy link

Copy Markdown

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at the moment it does not matter after all what target we are scaling for. The llm sees a screenshot and predicts a position on it. All of what we are doing is scaling the predicted coordinates to the correct absolute image coordinates of the screenshot we got

src/askui/tools/coordinate_scaler.py

self.target_resolution = scaled.size

return scaled

def scale_coordinates(

@programminx-askui programminx-askui Jun 18, 2026

Copy link

Copy Markdown

Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need this function? Should this not replaced from the ScaleCoordinate Layer?

@philipph-askui philipph-askui Jun 18, 2026

Copy link

Copy Markdown

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the scaleCoordinate Layer :-D or what do you mean?

src/askui/utils/llm_image_utils.py

Comment on lines +71 to +80

# Binary search for largest scale that fits within token budget

lo, hi = 0.0, scale

for _ in range(50):

mid = (lo + hi) / 2

w = max(1, int(width * mid))

h = max(1, int(height * mid))

if count_image_tokens(w, h, patch_size) <= max_tokens:

lo = mid

else:

hi = mid

@programminx-askui programminx-askui Jun 18, 2026

Copy link

Copy Markdown

Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need this? can we move this to a own function?

@philipph-askui philipph-askui Jun 18, 2026

Copy link

Copy Markdown

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code was copied directly from an example from anthropic, so I would keep it here as is

tests/unit/tools/test_agent_os_facade_coordinates.py

PixelCoordinateSpace,

ScaledCoordinateSpace,

)

from askui.tools.android.agent_os_facade import AndroidAgentOsFacade

@programminx-askui programminx-askui Jun 18, 2026

Copy link

Copy Markdown

Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you checked the tests? I would add here some dynamic tests. with different resolutions and negative tests

@philipph-askui philipph-askui Jun 18, 2026

Copy link

Copy Markdown

Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

philipph-askui added 3 commits

June 18, 2026 14:22

@philipph-askui


 feat: integrate features from PR #283 (tool result image scaling)

fabd05d

@philipph-askui


 address review remarks

d3bde3f

@philipph-askui


 chore: rename coordinateScaler file

26cf7c7

@philipph-askui philipph-askui mentioned this pull request

Jun 18, 2026

Feat/tool result image scaling #283

Closed

Labels

None yet

2 participants

@philipph-askui @programminx-askui

Conversation

@philipph-askui philipph-askui commented Jun 9, 2026

Uh oh!

@programminx-askui programminx-askui left a comment

Choose a reason for hiding this comment

Uh oh!

@programminx-askui programminx-askui Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

@programminx-askui programminx-askui Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

@philipph-askui philipph-askui Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

@programminx-askui programminx-askui Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

@philipph-askui philipph-askui Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

@programminx-askui programminx-askui Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

@philipph-askui philipph-askui Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

@programminx-askui programminx-askui Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

@philipph-askui philipph-askui Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

@programminx-askui programminx-askui Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

@philipph-askui philipph-askui Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

@programminx-askui programminx-askui Jun 17, 2026

Choose a reason for hiding this comment

Uh oh!

@philipph-askui philipph-askui Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

@programminx-askui programminx-askui Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

@philipph-askui philipph-askui Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

@programminx-askui programminx-askui Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

@philipph-askui philipph-askui Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

@programminx-askui programminx-askui Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

@philipph-askui philipph-askui Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

@programminx-askui programminx-askui Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

@philipph-askui philipph-askui Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants