Add RL examples #463


Open · foksly wants to merge 4 commits into master from hive-rl

Conversation

foksly (Collaborator) commented Mar 18, 2022 (edited)

Current plan:

TODO:

  • make a PPO run with more than 1 peer
  • run baseline PPO on Atari games (adapt from here)
  • run hivemind optimizer with target batch size large enough to average every ~30 seconds

Later:

  • find out what caused the problem with use_local_updates + cuda
  • figure out how to use a learning rate schedule (e.g., disable the default one and pass hivemind.Optimizer(scheduler=...)); see the sketch below
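
A possible shape for that scheduler item, as a minimal sketch: it assumes the usual `hivemind.Optimizer` pattern of passing an optimizer factory together with a scheduler factory; the model, run id, decay rate, and batch sizes below are placeholders rather than this PR's actual configuration.

```python
import hivemind
import torch

model = torch.nn.Linear(8, 2)  # stand-in for the actual policy network


def make_scheduler(opt: torch.optim.Optimizer) -> torch.optim.lr_scheduler.LambdaLR:
    # hivemind.Optimizer advances this scheduler once per *global* (collaborative) step,
    # which is why the RL library's own per-update schedule would be disabled.
    return torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=lambda epoch: 0.999 ** epoch)


optimizer = hivemind.Optimizer(
    dht=hivemind.DHT(start=True),   # peers other than the first also pass initial_peers=[...]
    run_id="ppo_breakout",          # placeholder run name
    params=model.parameters(),
    optimizer=lambda params: torch.optim.Adam(params, lr=2.5e-4),
    scheduler=make_scheduler,       # stepped by hivemind based on the global epoch
    batch_size_per_step=2048,       # samples contributed per local optimizer step (placeholder)
    target_batch_size=16384,        # tune so that peers average roughly every ~30 seconds
    verbose=True,
)
```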

foksly changed the title from Create RL to RL examples on Mar 18, 2022
codecov bot commented Mar 18, 2022 (edited)

Codecov Report

Merging #463 (1eb7522) into master (712e428) will decrease coverage by 0.05%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master     #463      +/-   ##
==========================================
- Coverage   85.80%   85.75%   -0.06%
==========================================
  Files          80       80
  Lines        7794     7794
==========================================
- Hits         6688     6684       -4
- Misses       1106     1110       +4

Impacted Files                       Coverage Δ
hivemind/averaging/matchmaking.py    87.50% <0.00%> (-1.79%) ⬇️
hivemind/dht/node.py                 91.44% <0.00%> (-0.24%) ⬇️
hivemind/utils/asyncio.py           100.00% <0.00%> (+0.86%) ⬆️
hivemind/dht/dht.py                  91.51% <0.00%> (+1.21%) ⬆️

borzunov changed the title from RL examples to Add RL examples on Mar 19, 2022
Member

@foksly gentle reminder: do you still have time for the PR?

justheuristic (Member) commented Jun 20, 2022 (edited)

Great job!
Initial request: please trigger auto-merge (sync with master) and apply linters

black .
isort .

return exp_name


class AdamWithClipping(torch.optim.Adam):
justheuristic (Member) commented Jun 20, 2022

note to @mryab: we've recently merged the same clipping functionality here:
https://github.com/learning-at-home/hivemind/blob/master/hivemind/moe/server/layers/optim.py#L48

Would you prefer if we...

  • keep everything as is, accept some code duplication?
  • extract moe.server.layers.optim to utils.optim and use it here?
  • keep wrapper in hivemind.optim and import from there?
  • insert your option here :)

mryab (Member) commented Jun 20, 2022 (edited)

I'm 50:50 between the "keep here, accept duplication" and "move OptimizerWrapper and ClippingWrapper to hivemind.optim.wrapper" solutions, so ultimately, it's @foksly's call

Member

The utils option is also acceptable, but I'm slightly against that folder becoming too bloated. That said, it looks like a reasonable place to put such code, so any of these three solutions is fine by me (as long as you don't import the wrapper from hivemind.moe).
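
For reference, a minimal sketch of what such a clipping wrapper can look like; this is an illustration only and may differ from both the `AdamWithClipping` class in this diff and the version already merged in `hivemind/moe/server/layers/optim.py` (the `max_grad_norm` argument name is an assumption):

```python
import torch


class AdamWithClipping(torch.optim.Adam):
    """Adam that clips the global gradient norm before every update."""

    def __init__(self, params, *args, max_grad_norm: float = 1.0, **kwargs):
        self.max_grad_norm = max_grad_norm
        super().__init__(params, *args, **kwargs)

    def step(self, closure=None):
        # Clip the combined gradient norm across all parameter groups, then delegate to Adam.
        parameters = [p for group in self.param_groups for p in group["params"] if p.grad is not None]
        torch.nn.utils.clip_grad_norm_(parameters, self.max_grad_norm)
        return super().step(closure)
```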

@@ -0,0 +1,45 @@
# Training PPO with decentralized averaging

This tutorial will walk you through the steps to set up collaborative training of an off-policy reinforcement learning algorithm, [PPO](https://arxiv.org/pdf/1707.06347.pdf), to play Atari Breakout. It uses the [stable-baselines3 implementation of PPO](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html); hyperparameters for the algorithm are taken from [rl-baselines3-zoo](https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/hyperparams/ppo.yml), and collaborative training is built on `hivemind.Optimizer` to exchange information between peers.
justheuristic (Member) commented Jun 20, 2022

nit: i believe PPO is on-policy

foksly (Collaborator, Author)

I also have the same belief :)
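
To make the tutorial snippet above more concrete, here is a hedged sketch of how stable-baselines3 PPO might be wired to `hivemind.Optimizer`; it is not the code from this PR, the hyperparameters are placeholders, and it assumes the Atari extras for gym are installed:

```python
import hivemind
import torch
from stable_baselines3 import PPO

dht = hivemind.DHT(start=True)  # additional peers would also pass initial_peers=[...]

model = PPO("CnnPolicy", "BreakoutNoFrameskip-v4", verbose=1)

# Replace the policy's local Adam with hivemind.Optimizer so that peers accumulate
# gradients toward a shared target batch size and average once it is reached.
model.policy.optimizer = hivemind.Optimizer(
    dht=dht,
    run_id="ppo_breakout",
    params=model.policy.parameters(),
    optimizer=lambda params: torch.optim.Adam(params, lr=2.5e-4),
    batch_size_per_step=model.batch_size,  # samples per optimizer.step() in SB3's PPO loop (approximation)
    target_batch_size=16384,               # per the TODO above: average roughly every ~30 seconds
    use_local_updates=False,
    verbose=True,
)

# Caveat: SB3's built-in learning-rate schedule writes to the optimizer's param
# groups every rollout; per the TODO above, it likely needs to be disabled in
# favor of hivemind.Optimizer's scheduler argument.
model.learn(total_timesteps=1_000_000)
```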

Member

[just in case] feel free to ping me if you need any help with black / isort

Reviewers

justheuristic and mryab left review comments. At least 1 approving review is required to merge this pull request.