Codex/update readme experiments#94

Open

fishsure wants to merge 2 commits into

main from

codex/update-readme-experiments

Conversation

@fishsure fishsure commented May 26, 2026

Collaborator

No description provided.

fishsure added 2 commits May 26, 2026 11:49

6ede22b  Add StepPO recipes and dataset docs
0e3cd5e  docs: update Agent-R1 README and experiments

gemini-code-assist Bot reviewed

May 26, 2026

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request significantly expands the Agent-R1 framework by adding support for broader algorithms and benchmarks, including StepPO, RLOO, REINFORCE++ Baseline, and GiGPO. It introduces complete task recipes, data preparation scripts, and text-based environments for ALFWorld, HotpotQA, Paper Search, and WebShop. Additionally, the core PPO trainer and advantage estimators have been refactored to support these multi-step agent tasks. Feedback on the changes highlights a potential TypeError in the _to_hashable helper function within core_algos.py when encountering None values, suggesting a safer type check to handle them as hashable scalars.

agent_r1/trainer/ppo/core_algos.py

def _to_hashable(value):
    """Convert common observation objects to hashable keys for GiGPO grouping."""
    if isinstance(value, (int, float, str, bool)):

@gemini-code-assist gemini-code-assist Bot May 26, 2026

medium

The _to_hashable function does not handle None values, which will cause a TypeError if any observation field is None. It is safer to allow None as a hashable scalar value.

Suggested change

-    if isinstance(value, (int, float, str, bool)):
+    if value is None or isinstance(value, (int, float, str, bool)):

References

Be careful not to confuse variables with similar names or purposes. Verify the type and origin of a variable before assuming its structure (e.g., list vs. scalar).
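
For illustration, the suggested None check can be sketched in a standalone helper. Only the first branch mirrors the suggestion above; the rest of `_to_hashable` is not shown in this thread, so the function name `to_hashable_sketch` and the dict, sequence, and `repr` fallbacks below are assumptions, not the repository's code.

```python
from typing import Any, Hashable


def to_hashable_sketch(value: Any) -> Hashable:
    """Hypothetical stand-in for _to_hashable; not the repository's implementation."""
    # None and plain scalars are already hashable, so return them unchanged.
    if value is None or isinstance(value, (int, float, str, bool)):
        return value
    # Assumed fallback: a dict becomes a tuple of (key, value) pairs sorted by key.
    if isinstance(value, dict):
        items = ((str(k), to_hashable_sketch(v)) for k, v in value.items())
        return tuple(sorted(items, key=lambda kv: kv[0]))
    # Assumed fallback: lists and tuples become tuples of hashable items.
    if isinstance(value, (list, tuple)):
        return tuple(to_hashable_sketch(v) for v in value)
    # Assumed last resort: use the string representation as the key.
    return repr(value)


# Observations with None fields can now be used as dictionary keys for grouping.
groups = {}
for obs in [{"room": None, "items": ["key"]}, {"items": ["key"], "room": None}]:
    groups.setdefault(to_hashable_sketch(obs), []).append(obs)
```

In this sketch the two observations differ only in key order, so they map to the same grouping key.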

@0russwest0 0russwest0 force-pushed the main branch from 6be3ae1 to 898c36d Compare

May 28, 2026 09:42

Labels

None yet

1 participant

@fishsure
