Add RL examples #463


Open · foksly wants to merge 4 commits into master from hive-rl

Conversation

foksly (Collaborator) commented Mar 18, 2022 (edited)

Current plan:

TODO:

  • make a PPO run with more than 1 peer
  • run baseline PPO on Atari games (adapt from here)
  • run hivemind optimizer with target batch size large enough to average every ~30 seconds

Later:

  • find out what caused the problem with use_local_updates + cuda
  • figure out how to use a learning rate schedule (e.g., disable the default one and pass hivemind.Optimizer(scheduler=...)); see the sketch below
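
A possible shape for that scheduler item, as a minimal sketch: it assumes the usual `hivemind.Optimizer` pattern of passing an optimizer factory together with a scheduler factory; the model, run id, decay rate, and batch sizes below are placeholders rather than this PR's actual configuration.

```python
import hivemind
import torch

model = torch.nn.Linear(8, 2)  # stand-in for the actual policy network


def make_scheduler(opt: torch.optim.Optimizer) -> torch.optim.lr_scheduler.LambdaLR:
    # hivemind.Optimizer advances this scheduler once per *global* (collaborative) step,
    # which is why the RL library's own per-update schedule would be disabled.
    return torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=lambda epoch: 0.999 ** epoch)


optimizer = hivemind.Optimizer(
    dht=hivemind.DHT(start=True),   # peers other than the first also pass initial_peers=[...]
    run_id="ppo_breakout",          # placeholder run name
    params=model.parameters(),
    optimizer=lambda params: torch.optim.Adam(params, lr=2.5e-4),
    scheduler=make_scheduler,       # stepped by hivemind based on the global epoch
    batch_size_per_step=2048,       # samples contributed per local optimizer step (placeholder)
    target_batch_size=16384,        # tune so that peers average roughly every ~30 seconds
    verbose=True,
)
```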

foksly changed the title from Create RL to RL examples on Mar 18, 2022
codecov bot commented Mar 18, 2022 (edited)

Codecov Report

Merging #463 (1eb7522) into master (712e428) will decrease coverage by 0.05%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master     #463      +/-   ##
==========================================
- Coverage   85.80%   85.75%   -0.06%
==========================================
  Files          80       80
  Lines        7794     7794
==========================================
- Hits         6688     6684       -4
- Misses       1106     1110       +4

Impacted Files                       Coverage Δ
hivemind/averaging/matchmaking.py    87.50% <0.00%> (-1.79%) ⬇️
hivemind/dht/node.py                 91.44% <0.00%> (-0.24%) ⬇️
hivemind/utils/asyncio.py           100.00% <0.00%> (+0.86%) ⬆️
hivemind/dht/dht.py                  91.51% <0.00%> (+1.21%) ⬆️

borzunov changed the title from RL examples to Add RL examples on Mar 19, 2022
Member

@foksly gentle reminder: do you still have time for the PR?

justheuristic (Member) commented Jun 20, 2022 (edited)

Great job!
Initial request: please trigger auto-merge (sync with master) and apply linters

black .
isort .

return exp_name


class AdamWithClipping(torch.optim.Adam):
justheuristic (Member) commented Jun 20, 2022

note to @mryab: we've recently merged the same clipping functionality here:
https://github.com/learning-at-home/hivemind/blob/master/hivemind/moe/server/layers/optim.py#L48

Would you prefer if we...

  • keep everything as is, accept some code duplication?
  • extract moe.server.layers.optim to utils.optim and use it here?
  • keep wrapper in hivemind.optim and import from there?
  • insert your option here :)

mryab (Member) commented Jun 20, 2022 (edited)

I'm 50:50 between the "keep here, accept duplication" and "move OptimizerWrapper and ClippingWrapper to hivemind.optim.wrapper" solutions, so ultimately, it's @foksly's call

Member

The utils option is also acceptable, but I'm slightly against that folder becoming too bloated. That said, it looks like a reasonable place to put such code, so any of these three solutions is fine by me (as long as you don't import the wrapper from hivemind.moe).
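
For reference, a minimal sketch of what such a clipping wrapper can look like; this is an illustration only and may differ from both the `AdamWithClipping` class in this diff and the version already merged in `hivemind/moe/server/layers/optim.py` (the `max_grad_norm` argument name is an assumption):

```python
import torch


class AdamWithClipping(torch.optim.Adam):
    """Adam that clips the global gradient norm before every update."""

    def __init__(self, params, *args, max_grad_norm: float = 1.0, **kwargs):
        self.max_grad_norm = max_grad_norm
        super().__init__(params, *args, **kwargs)

    def step(self, closure=None):
        # Clip the combined gradient norm across all parameter groups, then delegate to Adam.
        parameters = [p for group in self.param_groups for p in group["params"] if p.grad is not None]
        torch.nn.utils.clip_grad_norm_(parameters, self.max_grad_norm)
        return super().step(closure)
```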

@@ -0,0 +1,45 @@
# Training PPO with decentralized averaging

This tutorial will walk you through the steps to set up collaborative training of an off-policy reinforcement learning algorithm, [PPO](https://arxiv.org/pdf/1707.06347.pdf), to play Atari Breakout. It uses the [stable-baselines3 implementation of PPO](https://stable-baselines3.readthedocs.io/en/master/modules/ppo.html); hyperparameters for the algorithm are taken from [rl-baselines3-zoo](https://github.com/DLR-RM/rl-baselines3-zoo/blob/master/hyperparams/ppo.yml), and collaborative training is built on `hivemind.Optimizer` to exchange information between peers.
justheuristic (Member) commented Jun 20, 2022

nit: i believe PPO is on-policy

foksly (Collaborator, Author)

I also have the same belief :)
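
To make the tutorial snippet above more concrete, here is a hedged sketch of how stable-baselines3 PPO might be wired to `hivemind.Optimizer`; it is not the code from this PR, the hyperparameters are placeholders, and it assumes the Atari extras for gym are installed:

```python
import hivemind
import torch
from stable_baselines3 import PPO

dht = hivemind.DHT(start=True)  # additional peers would also pass initial_peers=[...]

model = PPO("CnnPolicy", "BreakoutNoFrameskip-v4", verbose=1)

# Replace the policy's local Adam with hivemind.Optimizer so that peers accumulate
# gradients toward a shared target batch size and average once it is reached.
model.policy.optimizer = hivemind.Optimizer(
    dht=dht,
    run_id="ppo_breakout",
    params=model.policy.parameters(),
    optimizer=lambda params: torch.optim.Adam(params, lr=2.5e-4),
    batch_size_per_step=model.batch_size,  # samples per optimizer.step() in SB3's PPO loop (approximation)
    target_batch_size=16384,               # per the TODO above: average roughly every ~30 seconds
    use_local_updates=False,
    verbose=True,
)

# Caveat: SB3's built-in learning-rate schedule writes to the optimizer's param
# groups every rollout; per the TODO above, it likely needs to be disabled in
# favor of hivemind.Optimizer's scheduler argument.
model.learn(total_timesteps=1_000_000)
```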

Member

[just in case] feel free to ping me if you need any help with black / isort

Reviewers

justheuristic and mryab left review comments. At least 1 approving review is required to merge this pull request.