I installed NVIDIA Nsight Visual Studio Edition 2025.01 in Visual Studio 2022.
I want to debug my code, but stepping over with F10 does not work: the debugger always stops at a location that has no breakpoint.
I checked the documentation, but I couldn't figure it out.
Here are some of the code locations where the debugger force-breaks:
__device__ __host__ TVector3(T e0, T e1, T e2) : x(e0), y(e1), z(e2) {}
template <typename T>
__device__ __host__ inline TVector3<T> operator+(const TVector3<T>& u, const TVector3<T>& v) {
    return TVector3(u.x + v.x, u.y + v.y, u.z + v.z);
}
1 Answer
The debugger stops in "weird" places because the compiler inlined or optimized the template/device functions (and/or you don't have device debug symbols), so the source-to-binary mapping is not 1:1.
Quick steps to fix (do these in your Debug build)
Build with device debug info and disable device optimizations:
nvcc: add -G (generate debug info for device code); by default this also turns off optimizations for device code.
Only use -G for local debug builds (it drastically changes the generated code and its performance).
Build host with debug info and no optimizations:
MSVC: C/C++ → Optimization = Disabled (/Od) and Debug Information Format = /Zi.
Clean + full rebuild.
Force a non-inlined function where you want a reliable breakpoint:
Use noinline on the functions you debug:
GCC/Clang/nvcc: __attribute__((noinline))
MSVC: __declspec(noinline)
Example:
__device__ __host__ __attribute__((noinline)) TVector3<T> operator+(...) { ... }
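Since the question uses Visual Studio, where MSVC is the host compiler and does not understand the GCC-style attribute, one way to keep a single source file is to hide the qualifier behind a macro. The following is a minimal sketch, not the only way to do it: the macro names VEC_HD and VEC_NOINLINE are made up for this example, the TVector3 struct is reduced to what the question shows, and under nvcc it relies on CUDA's own __noinline__ qualifier, which is a hint to the compiler rather than a guarantee.

```cpp
// Hypothetical helper macros (names invented for this sketch).
#if defined(__CUDACC__)
  // Compiled by nvcc: use the CUDA qualifiers.
  #define VEC_HD __host__ __device__
  #define VEC_NOINLINE __noinline__
#elif defined(_MSC_VER)
  // Plain MSVC host build.
  #define VEC_HD
  #define VEC_NOINLINE __declspec(noinline)
#else
  // Plain GCC/Clang host build.
  #define VEC_HD
  #define VEC_NOINLINE __attribute__((noinline))
#endif

// Reduced version of the vector type from the question.
template <typename T>
struct TVector3 {
    T x, y, z;
    VEC_HD TVector3(T e0, T e1, T e2) : x(e0), y(e1), z(e2) {}
};

// Kept out of line so a breakpoint inside the body maps to one real function.
template <typename T>
VEC_HD VEC_NOINLINE TVector3<T> operator+(const TVector3<T>& u, const TVector3<T>& v) {
    return TVector3<T>(u.x + v.x, u.y + v.y, u.z + v.z);
}
```

You would normally apply this only in Debug builds (for example by defining VEC_NOINLINE as empty in Release), because forcing small operators out of line costs performance.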
Start the correct debug session in Nsight:
- Use "Start CUDA Debugging" (the Nsight CUDA debugger) to hit breakpoints in device code. The normal VS debugger only handles host code.
Verify symbols and sources:
- Check Modules window / Nsight logs to ensure PDB / debug symbols for the binary are loaded and match source files.
Short-run kernel for easier stepping:
- Run a tiny grid (1 block, 1 thread) while debugging so it’s reproducible and easier to step.
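As an illustration of the tiny-grid suggestion, here is a minimal sketch of a single-thread launch that exercises the operator+ from the question. The kernel name debugAddKernel, the input values, and the output buffer are assumptions made up for this example; only TVector3 and operator+ come from the question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Reduced version of the vector type from the question.
template <typename T>
struct TVector3 {
    T x, y, z;
    __device__ __host__ TVector3(T e0, T e1, T e2) : x(e0), y(e1), z(e2) {}
};

template <typename T>
__device__ __host__ inline TVector3<T> operator+(const TVector3<T>& u, const TVector3<T>& v) {
    return TVector3<T>(u.x + v.x, u.y + v.y, u.z + v.z);
}

// Hypothetical debug kernel: a single thread adds two vectors and stores the result.
__global__ void debugAddKernel(float* out) {
    TVector3<float> a(1.0f, 2.0f, 3.0f);
    TVector3<float> b(4.0f, 5.0f, 6.0f);
    TVector3<float> c = a + b;  // a convenient line for a device breakpoint
    out[0] = c.x;
    out[1] = c.y;
    out[2] = c.z;
}

int main() {
    float* dOut = nullptr;
    if (cudaMalloc(&dOut, 3 * sizeof(float)) != cudaSuccess) {
        std::fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    // One block, one thread: only a single thread ever reaches the breakpoint.
    debugAddKernel<<<1, 1>>>(dOut);

    // Wait for the kernel and surface any launch or execution error.
    cudaError_t err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        std::fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
        cudaFree(dOut);
        return 1;
    }

    float hOut[3] = {0.0f, 0.0f, 0.0f};
    cudaMemcpy(hOut, dOut, sizeof(hOut), cudaMemcpyDeviceToHost);
    std::printf("%f %f %f\n", hOut[0], hOut[1], hOut[2]);

    cudaFree(dOut);
    return 0;
}
```

With a 1x1 launch, stepping cannot jump between warps or blocks, so any stop at an unexpected line is more likely to come from inlining or missing debug info than from thread scheduling.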