Is the argument to torch.sqrt going negative, or hitting exactly zero? Either one will give you NaNs: a negative input NaNs in the forward pass, and an exact zero makes the gradient of sqrt blow up in the backward pass. You can fix that with:
eps = 1e-8
self.grad_mag = torch.sqrt(self.grad_x**2 + self.grad_y**2 + eps)
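For context, here is a minimal sketch of where the NaN comes from in the backward pass (gx and gy are just stand-ins for your grad_x and grad_y); it fires whenever both components are exactly zero at some element:
import torch

# d/dgx sqrt(gx**2 + gy**2) = gx / sqrt(gx**2 + gy**2), which is 0/0 = NaN
# when both components are exactly zero
gx = torch.tensor(0.0, requires_grad=True)
gy = torch.tensor(0.0, requires_grad=True)
mag = torch.sqrt(gx**2 + gy**2)
mag.backward()
print(gx.grad)  # tensor(nan)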
You could also switch optimizers to see if this fixes the issue:
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
Or, more defensively, clamp before the sqrt and check for bad values while you debug:
import torch

# Example in the forward pass
eps = 1e-8  # small floor to keep the sqrt argument strictly positive
grad_magnitude = self.grad_x**2 + self.grad_y**2

# Sanity check: squares shouldn't be negative, but reduced-precision or fused
# CUDA kernels can occasionally produce tiny negative values
if (grad_magnitude < 0).any():
    print("Warning: negative values detected in grad_magnitude")

# Clamp before the sqrt so the backward pass never divides by zero
self.grad_mag = torch.sqrt(torch.clamp(grad_magnitude, min=eps))
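Note the difference between the two fixes: adding eps inside the sqrt shifts every magnitude slightly, while clamping only touches elements already at or below eps and leaves everything else exact.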
The issue might be floating-point precision differences between the CUDA and CPU kernels, which would explain why the NaNs only show up in the backward pass on one device.
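If you want to pin down exactly which op produces the first NaN, PyTorch's anomaly detection will raise an error pointing at the offending op. It's slow, so only enable it while debugging; model, inputs, and loss below are stand-ins for whatever your training loop actually computes:
import torch

# Anomaly mode records the forward-pass traceback for every op and raises
# an error identifying the op whose backward produced the first NaN
with torch.autograd.detect_anomaly():
    loss = model(inputs).sum()  # stand-in for your real loss computation
    loss.backward()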