In `annotated_deep_learning_paper_implementations/labml_nn/normalization/deep_norm/__init__.py` (line 112, commit 90e21b5), DeepNorm is computed as

`return self.layer_norm(x + self.alpha * gx)`

Should this instead be

`return self.layer_norm(self.alpha * x + gx)`?
The suggested form is how it is implemented in the torchscale library:
https://github.com/microsoft/torchscale/blob/4d1e0e82e5adf86dd424f1463192635b73fc8efc/torchscale/architecture/decoder.py#L130
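
For context, the DeepNet paper ("DeepNet: Scaling Transformers to 1,000 Layers") defines DeepNorm as x_{l+1} = LayerNorm(α · x_l + G_l(x_l, θ_l)), i.e. α scales the residual stream x rather than the sub-layer output. Below is a minimal sketch of that formulation; the class and argument names (`DeepNormSketch`, `sublayer`, `d_model`) are placeholders for illustration and not the labml_nn API.

```python
import torch
import torch.nn as nn


class DeepNormSketch(nn.Module):
    """Sketch of DeepNorm as stated in the DeepNet paper:
    x_{l+1} = LayerNorm(alpha * x_l + G_l(x_l)).
    Not the labml_nn implementation; names are hypothetical."""

    def __init__(self, d_model: int, alpha: float, sublayer: nn.Module):
        super().__init__()
        self.alpha = alpha                  # residual scaling constant
        self.sublayer = sublayer            # G(x): e.g. attention or FFN block
        self.layer_norm = nn.LayerNorm(d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gx = self.sublayer(x)               # sub-layer output G(x)
        # alpha scales the residual stream x, not the sub-layer output gx
        return self.layer_norm(self.alpha * x + gx)
```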