Updating: trying to guess the issue, as the author doesn't provide complete code

Is the value under the square root going negative, or hitting exact zero? You can fix that with:

eps = 1e-8  # small epsilon keeps the sqrt argument strictly positive
self.grad_mag = torch.sqrt(self.grad_x**2 + self.grad_y**2 + eps)
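
For context, here is a minimal, self-contained sketch of why the eps matters (the zero tensors below are just stand-ins for your grad_x / grad_y): when the sum of squares is exactly zero, the backward of torch.sqrt divides by zero and the resulting gradients come out as NaN, while the eps version stays finite:

import torch

gx = torch.zeros(3, requires_grad=True)  # stand-in for grad_x hitting exact zero
gy = torch.zeros(3, requires_grad=True)  # stand-in for grad_y hitting exact zero

torch.sqrt(gx**2 + gy**2).sum().backward()
print(gx.grad)  # tensor([nan, nan, nan])

gx.grad, gy.grad = None, None
eps = 1e-8
torch.sqrt(gx**2 + gy**2 + eps).sum().backward()
print(gx.grad)  # tensor([0., 0., 0.]) -- finite again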

You could also switch optimizers to see if this fixes the issue:

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
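
If you do try Adam, note that it also has its own eps term (default 1e-8) in the denominator of its parameter update, which you can raise as an extra numerical-stability knob. A minimal sketch, assuming a stand-in linear model and a made-up loss in place of your own:

import torch
import torch.nn as nn

model = nn.Linear(8, 1)  # stand-in for your model

# lr and eps values here are only examples
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, eps=1e-6)

x = torch.randn(4, 8)
loss = model(x).pow(2).mean()  # stand-in loss

optimizer.zero_grad()
loss.backward()
optimizer.step()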

Or, more completely:

import torch

# Example in the forward pass
eps = 1e-8  # small epsilon to avoid negative/zero values in the sqrt due to precision
grad_magnitude = self.grad_x**2 + self.grad_y**2

# Check for any potential negative values (this shouldn't happen, but useful for debugging)
if (grad_magnitude < 0).any():
    print("Warning: Negative values detected in grad_magnitude")

# Clamp the value to avoid NaN errors in the backward pass
self.grad_mag = torch.sqrt(torch.clamp(grad_magnitude, min=eps))

The issue might be floating-point precision differences between CUDA and CPU, which could be what is causing the NaNs in the backward pass.
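
If you want to see exactly which operation's backward produces the first NaN (and then compare a CPU run against a CUDA run of the same batch), autograd's anomaly detection will point you at the offending forward line. A minimal sketch with a toy forward pass standing in for your model:

import torch

torch.autograd.set_detect_anomaly(True)  # backward raises at the op whose gradient became NaN

# Toy forward that reproduces a sqrt whose input hits zero; substitute your real model/loss.
gx = torch.zeros(3, requires_grad=True)
gy = torch.zeros(3, requires_grad=True)
loss = torch.sqrt(gx**2 + gy**2).mean()

try:
    loss.backward()
except RuntimeError as err:
    # The error's forward traceback points at the line responsible for the NaN gradient.
    print(err)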
