clip in PPOLoss #2334
Hi.
As you know, when the actor network in PPO takes an action, it samples using mu (the mean) and sigma (the variance) from the policy distribution.
How are mu and sigma implemented in TorchRL?
Related to that: I want to use 1/sigma in place of the clip ratio, which is always 0.2. Is that possible?
Thank you.
-
Hello!
As you mention, the action sampling is defined by the actor and is independent of the algorithm.
Essentially, TorchRL has a ProbabilisticActor class that can be used to handle the probabilistic sampling. You can add it at the end of your model and specify the distribution you want (e.g. TanhNormal). As long as the outputs of your model (the keys of the output TensorDict) match the inputs expected by the distribution, the ProbabilisticActor will automatically sample the action and add it to the output TensorDict.
An example of how to define your ProbabilisticActor can be found in the sota-implementation for PPO for MuJoCo environments: https://github.com/pytorch/rl/blob/main/sota-implementations/ppo/utils_mujoco.py#L83
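For reference, here is a minimal sketch of how a Gaussian policy head can feed a ProbabilisticActor. The key names ("observation", "loc", "scale"), layer sizes, and `action_dim` are illustrative assumptions, not values from the linked example:

```python
import torch.nn as nn
from tensordict.nn import TensorDictModule
from tensordict.nn.distributions import NormalParamExtractor
from torchrl.modules import ProbabilisticActor, TanhNormal

action_dim = 6  # illustrative

# Backbone outputs 2 * action_dim values, split into loc (mu) and scale (sigma).
backbone = nn.Sequential(
    nn.LazyLinear(64),
    nn.Tanh(),
    nn.LazyLinear(2 * action_dim),
    NormalParamExtractor(),  # splits the last dim into "loc" and "scale"
)
policy_module = TensorDictModule(
    backbone, in_keys=["observation"], out_keys=["loc", "scale"]
)

# ProbabilisticActor reads "loc"/"scale" and writes a sampled "action".
actor = ProbabilisticActor(
    module=policy_module,
    in_keys=["loc", "scale"],
    out_keys=["action"],
    distribution_class=TanhNormal,
    return_log_prob=True,
)
```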
I am not sure I understood the 1/sigma part. I don't think there is an existing module that flips sigma to obtain 1/sigma, but if you need that for some reason you could create a very simple custom TensorDictModule that does it in its forward method and concatenate it at the end of your model, before the ProbabilisticActor, as sketched below.
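A hedged sketch of such a module, building on the policy sketch above (the clamp threshold and key names are arbitrary choices for illustration):

```python
import torch
from tensordict.nn import TensorDictModule, TensorDictSequential

def invert_scale(scale: torch.Tensor) -> torch.Tensor:
    # Replace sigma with 1/sigma; clamp to avoid division by zero.
    return 1.0 / scale.clamp_min(1e-6)

# TensorDictModule can wrap a plain callable: read "scale", overwrite "scale".
inverse_scale = TensorDictModule(invert_scale, in_keys=["scale"], out_keys=["scale"])

# Insert it between the policy backbone and the ProbabilisticActor
# (policy_module, TanhNormal and ProbabilisticActor as in the sketch above).
model = TensorDictSequential(policy_module, inverse_scale)
actor = ProbabilisticActor(
    module=model,
    in_keys=["loc", "scale"],
    out_keys=["action"],
    distribution_class=TanhNormal,
    return_log_prob=True,
)
```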
As for the PPO clip parameter, it can be fixed in the PPO loss class:
https://github.com/pytorch/rl/blob/main/torchrl/objectives/ppo.py#L640
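Concretely, the clip ratio is the `clip_epsilon` argument of `ClipPPOLoss` (it defaults to 0.2). A minimal sketch, assuming `actor` from above and a `value_module` critic defined elsewhere:

```python
from torchrl.objectives import ClipPPOLoss

# value_module stands in for whatever critic/value network you have defined.
loss_module = ClipPPOLoss(
    actor_network=actor,
    critic_network=value_module,
    clip_epsilon=0.2,  # the PPO clip ratio; 0.2 is the default
)
```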