FibonacciDude/PaRL

Parameter Averaging in Reinforcement Learning (PaRL). Inspired by OpenAI's Requests for Research 2.0.


Research project

Inspired by OpenAI's Requests for Research 2.0, I decided to explore the effect of averaging the parameters of multiple parallel workers in reinforcement learning. To test this, I wrote a PPO implementation (drawing on John Schulman's code and OpenAI's Spinning Up code) in which, instead of averaging gradients at every step, each worker takes multiple steps in its own environment and the workers' model parameters are then averaged.
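For concreteness, below is a minimal sketch of this scheme in PyTorch. It is not the repository's code: `PolicyNet`, `local_ppo_update`, and the toy loss are hypothetical stand-ins for the real actor-critic and PPO objective.

```python
# Minimal sketch of parameter averaging across parallel workers.
# PolicyNet and local_ppo_update are hypothetical stand-ins, not PaRL's code.
import copy
import torch
import torch.nn as nn

class PolicyNet(nn.Module):
    """Toy policy network standing in for the PPO actor-critic."""
    def __init__(self, obs_dim=4, act_dim=2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, 64), nn.Tanh(), nn.Linear(64, act_dim)
        )

    def forward(self, obs):
        return self.net(obs)

def local_ppo_update(worker_model, steps):
    """Placeholder for `steps` local PPO updates on a worker's own environment.
    In the real project this would collect rollouts and apply the PPO loss."""
    optimizer = torch.optim.Adam(worker_model.parameters(), lr=3e-4)
    for _ in range(steps):
        obs = torch.randn(32, 4)                 # stand-in for a rollout batch
        loss = worker_model(obs).pow(2).mean()   # stand-in for the PPO objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def average_parameters(global_model, worker_models):
    """One communication round: set each global parameter to the element-wise
    mean of the workers' parameters, then re-sync every worker."""
    with torch.no_grad():
        for name, param in global_model.named_parameters():
            stacked = torch.stack(
                [dict(w.named_parameters())[name] for w in worker_models]
            )
            param.copy_(stacked.mean(dim=0))
    for w in worker_models:
        w.load_state_dict(global_model.state_dict())

global_model = PolicyNet()
workers = [copy.deepcopy(global_model) for _ in range(4)]  # one copy per worker

for communication_round in range(10):
    for w in workers:
        local_ppo_update(w, steps=5)   # multiple local steps before communicating
    average_parameters(global_model, workers)  # one parameter exchange per round
```

Compared with gradient averaging, which exchanges gradients at every optimizer step, this design communicates only once per round of several local updates; that gap is where the reward-per-communication savings below come from.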

Measured in reward per communication round, the parameter-averaging model converged to optimal behavior faster than the gradient-averaging baseline; measured in reward per environment step, the two converged equally fast. From my analysis, parameter averaging therefore reduced the communication among parallel workers while maintaining the same performance as the baseline.

Link to Requests for Research 2.0: https://openai.com/index/requests-for-research-2/
