[WIP] Browser backends #314

Draft

ollmer wants to merge 61 commits into main from browser_backends

Conversation

ollmer (Contributor) commented Oct 31, 2025 • edited by korbit-ai Bot

Separates the MiniWob task from the underlying browser backend.
Adds MCP-Playwright as a browser backend.

Description by Korbit AI

What change is being made?

Introduce a new Browser/MCP-based backend for MiniWob benchmarks, along with environment, task, and benchmark scaffolding, plus a basic MCP Playwright integration and a small test script to exercise the backend.

Why are these changes being made?

Add browser-based task execution support to run MiniWob tasks via MCP Playwright, enabling end-to-end interaction with web tasks through a modular backend and benchmark framework. This lays the groundwork for browser-backed experiments and testing of browser interactions in a structured, extensible way.

Is this description stale? Ask me to generate a new description by commenting /korbit-generate-pr-description
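The separation of task and browser backend described above can be illustrated with a minimal sketch. This is not the code from this PR: `BrowserBackend`, `InMemoryBackend`, `Task`, their methods, the tool names, and the URL are all hypothetical names chosen for illustration. It assumes only what the description and commit titles suggest: the task receives a backend class, instantiates it, and restricts actions to a whitelist.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Tuple, Type


class BrowserBackend(ABC):
    """Hypothetical minimal interface a task needs from any browser backend."""

    @abstractmethod
    def goto(self, url: str) -> None: ...

    @abstractmethod
    def call_tool(self, name: str, **kwargs) -> str: ...

    @abstractmethod
    def snapshot(self) -> str: ...

    @abstractmethod
    def close(self) -> None: ...


class InMemoryBackend(BrowserBackend):
    """Stand-in backend that records calls instead of driving a real browser.

    A real implementation (MCP Playwright, Python Playwright) would go here.
    """

    def __init__(self) -> None:
        self.url = ""
        self.calls: List[Tuple[str, dict]] = []
        self.closed = False

    def goto(self, url: str) -> None:
        self.url = url

    def call_tool(self, name: str, **kwargs) -> str:
        self.calls.append((name, kwargs))
        return f"{name} ok"

    def snapshot(self) -> str:
        return f"<page url={self.url!r} actions={len(self.calls)}>"

    def close(self) -> None:
        self.closed = True


@dataclass
class Task:
    """Task logic that depends only on the BrowserBackend interface."""

    url: str
    backend_cls: Type[BrowserBackend]
    allowed_tools: FrozenSet[str] = frozenset({"click", "type"})
    backend: Optional[BrowserBackend] = None

    def setup(self) -> str:
        # The task, not the caller, instantiates the backend it was given.
        self.backend = self.backend_cls()
        self.backend.goto(self.url)
        return self.backend.snapshot()

    def step(self, tool: str, **kwargs) -> str:
        if self.backend is None:
            raise RuntimeError("call setup() before step()")
        if tool not in self.allowed_tools:
            raise ValueError(f"tool {tool!r} is not in the action whitelist")
        self.backend.call_tool(tool, **kwargs)
        return self.backend.snapshot()

    def teardown(self) -> None:
        if self.backend is not None:
            self.backend.close()
            self.backend = None


if __name__ == "__main__":
    # Placeholder URL; a real run would point at a served MiniWob page.
    task = Task(url="http://localhost/miniwob/click-test.html",
                backend_cls=InMemoryBackend)
    print(task.setup())
    print(task.step("click", element="button"))
    task.teardown()
```

Because `Task` touches only the abstract interface, swapping the browser implementation would mean passing a different `backend_cls`, with no change to the task logic.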

@ollmer


 miniwob with mcp browser backend, first draft

5ceeb60

@korbit-ai

korbit-ai Bot commented Oct 31, 2025

Based on your review schedule, I'll hold off on reviewing this PR until it's marked as ready for review. If you'd like me to take a look now, comment /korbit-review.

Your admin can change your review schedule in the Korbit Console

ollmer added 28 commits

October 31, 2025 19:18

@ollmer


 actions whitelist, fixes, support new order of the agent env creation in the loop

450dacf

@ollmer


 miniwob config

2e2b8a6

@ollmer


 llm config

630569a

@ollmer


 fixes, use firefox

8be56ce

@ollmer


 plan_react agent with function calling and sonnet llm

9acd97d

@ollmer


 fixes

cfc85c6

@ollmer


 fix done state parsing

f278c0f

@ollmer


 fixes

4e27c3a

@ollmer


 refactor loop step_info

d1953d2

@ollmer


 return page snapshot to mcp playwright results

5656d0b

@ollmer


 fix loop

b06c4e2

@ollmer


 vision support

f5ad036

@ollmer


 fix agent_info as dict

a827344

@ollmer


 remove tapeagents dep from backends core, fixes

4117e0a

@ollmer


 python playwright backend draft

a3fa1c9

@ollmer


 fixes

955e0d3

@ollmer


 remove tapeagents dep, add task-level obs postprocess

61a537f

@ollmer

fix

b82aef0

@ollmer


 fix action space

645ee2d

@ollmer


 playwright backend

02dee09

@ollmer


 fix obs format

f591f36

@ollmer


 simplest react agent with markdown observations, images and tool calls

01e0719

@ollmer


 fix mcp close

dba5978

@ollmer


 async playwright backend

ecf59d5

@ollmer


 fixes

d42dfd7

@ollmer


 format

55da7cf

@ollmer


 fix pw actions

1f090c2

@ollmer


 fix tapeagent

f2c480a

ollmer added 22 commits

November 25, 2025 13:48

@ollmer


 pass backend cls, instantiate backend in task

6664b69

@ollmer


 get html from playwright mcp

ffebf6b

@ollmer


 better abstract class

3378b56

@ollmer


 init files

323978d

@ollmer


 add base benchmark class to study

e2cd4b9

@ollmer


 move action and tool classes to actions module

20502a8

@ollmer


 improve entrypoint

dfbc005

@ollmer


 new react toolcall agent, inspired by tapeagents but independent

7a682a0

@ollmer


 few comments

29ba1c4

@ollmer


 simplify history format

d9c9216

@ollmer

fix

cb6d213

@ollmer

fix

cc23893

@ollmer


 simpler tool call object

b8e5c3a

@ollmer


 format

3d88daf

@ollmer


 history compaction

768d37c

@ollmer


 tool schemas in the action module

e28eb0f

@ollmer


 better task interface, support old bgym tasks in the new env

f10615f

@ollmer


 support new tasks interface

a203e46

@ollmer


 async playwright backend

362de79

@ollmer

fix

4fe4e48

@ollmer


 universal rendering of any dict observation that contains only texts and images

e6f1f5d

@ollmer

fix

e7aa807

ollmer force-pushed the browser_backends branch from 6a1caf2 to e7aa807

November 26, 2025 16:35

ollmer added 7 commits

November 26, 2025 16:38

@ollmer


 remove tape agent

212c0f4

@ollmer


 revert tapeagent changes

8be1174

@ollmer

fix

cdd9b54

@ollmer


 html pruning

1befd83

@ollmer


 max obs size limit, function to prepare pair of turn data for rl training

462038e

@ollmer


 workarena bench, reuse bgym task inside

cf68ef6

@ollmer


 fixes

805c717


1 participant
