Code Review
This pull request significantly expands the Agent-R1 framework by adding support for more algorithms (StepPO, RLOO, REINFORCE++ Baseline, and GiGPO) and more benchmarks. It introduces complete task recipes, data preparation scripts, and text-based environments for ALFWorld, HotpotQA, Paper Search, and WebShop. The core PPO trainer and advantage estimators have also been refactored to support these multi-step agent tasks. The review feedback highlights a potential TypeError in the _to_hashable helper function in core_algos.py when it encounters None values, and suggests a safer type check that treats None as a hashable scalar.
The _to_hashable function does not handle None values, which will cause a TypeError if any observation field is None. It is safer to allow None as a hashable scalar value.
References
- Be careful not to confuse variables with similar names or purposes. Verify the type and origin of a variable before assuming its structure (e.g., list vs. scalar).
6be3ae1 to 898c36d
No description provided.