Is the argument to torch.sqrt going negative, or hitting exactly zero? Either one will give you NaNs: a negative input NaNs in the forward pass, and an exact zero makes the gradient of sqrt blow up in the backward pass. You can fix that with:
eps = 1e-8
self.grad_mag = torch.sqrt(self.grad_x**2 + self.grad_y**2 + eps)
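For context, here is a minimal sketch of where the NaN comes from in the backward pass (gx and gy are just stand-ins for your grad_x and grad_y); it fires whenever both components are exactly zero at some element:
import torch

# d/dgx sqrt(gx**2 + gy**2) = gx / sqrt(gx**2 + gy**2), which is 0/0 = NaN
# when both components are exactly zero
gx = torch.tensor(0.0, requires_grad=True)
gy = torch.tensor(0.0, requires_grad=True)
mag = torch.sqrt(gx**2 + gy**2)
mag.backward()
print(gx.grad)  # tensor(nan)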
You could also switch optimizers to see if this fixes the issue:
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
Or, more defensively, clamp before the sqrt and check for bad values while you debug:
import torch

# Example in the forward pass
eps = 1e-8  # small floor to keep the sqrt argument strictly positive
grad_magnitude = self.grad_x**2 + self.grad_y**2

# Sanity check: squares shouldn't be negative, but reduced-precision or fused
# CUDA kernels can occasionally produce tiny negative values
if (grad_magnitude < 0).any():
    print("Warning: negative values detected in grad_magnitude")

# Clamp before the sqrt so the backward pass never divides by zero
self.grad_mag = torch.sqrt(torch.clamp(grad_magnitude, min=eps))
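Note the difference between the two fixes: adding eps inside the sqrt shifts every magnitude slightly, while clamping only touches elements already at or below eps and leaves everything else exact.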
The issue might be floating-point precision differences between the CUDA and CPU kernels, which would explain why the NaNs only show up in the backward pass on one device.
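If you want to pin down exactly which op produces the first NaN, PyTorch's anomaly detection will raise an error pointing at the offending op. It's slow, so only enable it while debugging; model, inputs, and loss below are stand-ins for whatever your training loop actually computes:
import torch

# Anomaly mode records the forward-pass traceback for every op and raises
# an error identifying the op whose backward produced the first NaN
with torch.autograd.detect_anomaly():
    loss = model(inputs).sum()  # stand-in for your real loss computation
    loss.backward()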