
clip in PPOLoss #2334

Answered by albertbou92
majid5776 asked this question in Q&A

Hi.
As you know, when the actor network in PPO takes an action, it samples it using mu (the mean) and sigma (the standard deviation) of the policy distribution.
How are mu and sigma implemented in TorchRL?
Also, is it possible to replace the clip ratio, which is always 0.2, with 1/sigma?
Thank you.



Replies: 1 comment


Hello!

As you mention, the action sampling is defined by the actor and is independent of the algorithm.

Essentially, TorchRL has a ProbabilisticActor class that can be used to handle the probabilistic sampling. You can add it at the end of your model and specify the distribution you want (e.g. TanhNormal). As long as the outputs of your model (the keys of the output TensorDict) match the inputs expected by the distribution, the ProbabilisticActor will automatically sample the action and add it to the output TensorDict.

An example of how to define your ProbabilisticActor can be found in the sota-implementation for PPO for MuJoCo environments: https://github.com/pytorch/rl/blob/main/sota-implementations/ppo/utils_mujoco.py#L83
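
To illustrate the idea, here is a minimal sketch of such an actor. The observation/action sizes, layer widths, and key names are placeholder assumptions, not copied from the linked file:

```python
import torch
from tensordict.nn import NormalParamExtractor, TensorDictModule
from torchrl.modules import MLP, ProbabilisticActor, TanhNormal

obs_dim, act_dim = 17, 6  # assumed environment dimensions

# Backbone that outputs 2 * act_dim values; NormalParamExtractor splits them
# into a "loc" (mu) and a positive "scale" (sigma).
policy_net = torch.nn.Sequential(
    MLP(in_features=obs_dim, out_features=2 * act_dim, num_cells=[64, 64]),
    NormalParamExtractor(),
)
policy_module = TensorDictModule(
    policy_net, in_keys=["observation"], out_keys=["loc", "scale"]
)

# Because the "loc"/"scale" keys match what TanhNormal expects, the
# ProbabilisticActor samples the action and writes it to the output TensorDict.
actor = ProbabilisticActor(
    module=policy_module,
    in_keys=["loc", "scale"],
    out_keys=["action"],
    distribution_class=TanhNormal,
    return_log_prob=True,
)
```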

I am not sure I understood the 1 / sigma part. I don't think there is an existing module that inverts sigma to obtain 1 / sigma, but if you need that for some reason you could create a very simple custom TensorDictModule that does it in the forward method and append it at the end of your model, before the ProbabilisticActor, as in the sketch below.
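
A rough sketch of what that could look like, reusing `policy_module` from the sketch above (the module and key names here are just illustrative assumptions):

```python
import torch
from tensordict.nn import TensorDictModule, TensorDictSequential
from torchrl.modules import ProbabilisticActor, TanhNormal

class InvertScale(torch.nn.Module):
    # Returns the reciprocal of the incoming scale tensor.
    def forward(self, scale):
        return 1.0 / scale

# Overwrites the "scale" entry of the TensorDict with 1 / scale.
invert_scale = TensorDictModule(
    InvertScale(), in_keys=["scale"], out_keys=["scale"]
)

# Chain it after the policy backbone so the ProbabilisticActor sees 1 / sigma.
actor = ProbabilisticActor(
    module=TensorDictSequential(policy_module, invert_scale),
    in_keys=["loc", "scale"],
    out_keys=["action"],
    distribution_class=TanhNormal,
    return_log_prob=True,
)
```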

As for the PPO clip parameter, it can be set in the PPO loss class:
https://github.com/pytorch/rl/blob/main/torchrl/objectives/ppo.py#L640
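
For instance, a minimal sketch of fixing the clip ratio when building the loss, assuming you already have `actor` and `critic` modules:

```python
from torchrl.objectives import ClipPPOLoss

# clip_epsilon is the PPO clip ratio; 0.2 is also the default value.
loss_module = ClipPPOLoss(
    actor_network=actor,
    critic_network=critic,
    clip_epsilon=0.2,
)
```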

Answer selected by majid5776
Category
Q&A
