How to debug cuda in Visual Studio with "step over"

Question

I installed NVIDIA Nsight Visual Studio Edition 2025.01 in Visual Studio 2022.

I want to debug my code, but I can't step over (F10): the debugger keeps stopping at locations where there is no breakpoint.

I checked the documentation, but I couldn't figure it out.

Here are some of the locations where the debugger forcibly breaks:

template <typename T>
struct TVector3 {  // members inferred from the constructor's initializer list
    T x, y, z;
    __device__ __host__ TVector3(T e0, T e1, T e2) : x(e0), y(e1), z(e2) {}
};
template <typename T>
__device__ __host__ inline TVector3<T> operator+(const TVector3<T>& u, const TVector3<T>& v) {
    return TVector3<T>(u.x + v.x, u.y + v.y, u.z + v.z);
}

Answer (accepted) · Luis Yuman · 2025-10-31 04:16:31Z

The debugger stops at "weird" places because the compiler has inlined or optimized the template/device functions (and/or the device debug symbols are missing), so the mapping between source lines and binary instructions is not 1:1.

Quick steps to fix (apply these to your Debug configuration):

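Build device code with debug info and without optimizations:
- nvcc: add -G (generate debug information for device code). By default -G also turns off device-code optimizations, so no separate flag is needed.
- In a Visual Studio CUDA project this is CUDA C/C++ → Device → Generate GPU Debug Information = Yes.
- Only use -G for local debug builds (it drastically changes the generated code and slows kernels down).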
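To check that a .cu file really was built with -G, one option is the `__CUDACC_DEBUG__` macro, which the nvcc documentation lists as defined when compiling CUDA sources in device-debug mode. This is a minimal sketch: `builtWithDeviceDebug` and `warnIfNoDeviceDebug` are hypothetical helper names, and it assumes the macro is visible in the host compilation pass of your toolkit version.

```cpp
#include <cstdio>

// nvcc defines __CUDACC_DEBUG__ when compiling CUDA sources in device-debug
// mode (-G). Hypothetical helper: true only for such builds.
constexpr bool builtWithDeviceDebug() {
#if defined(__CUDACC__) && defined(__CUDACC_DEBUG__)
    return true;
#else
    return false;
#endif
}

// Hypothetical helper: print a one-line warning when -G is missing.
inline void warnIfNoDeviceDebug() {
    if (!builtWithDeviceDebug()) {
        std::fprintf(stderr,
                     "note: built without nvcc -G; kernel stepping may be unreliable\n");
    }
}
```

Calling `warnIfNoDeviceDebug()` at the start of `main` makes a non-debug device build obvious before you start stepping. Compiled with a plain C++ compiler, the helper simply reports false.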
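Build host code with debug info and without optimizations:
- MSVC: C/C++ → Optimization = Disabled (/Od) and Debug Information Format = Program Database (/Zi).
- For host code inside .cu files (compiled through nvcc), also pass -g.
- Then do a clean, full rebuild.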
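Prevent inlining of the functions where you want a reliable breakpoint:
- Mark those functions as noinline:
  - CUDA (__device__ or __host__ __device__ functions): __noinline__ (a hint the compiler follows when it can)
  - GCC/Clang host code: __attribute__((noinline))
  - MSVC host code: __declspec(noinline)
- Example:
  template <typename T>
  __device__ __host__ __noinline__ TVector3<T> operator+(const TVector3<T>& u, const TVector3<T>& v) { ... }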
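If the same header has to compile both with nvcc and with a plain host compiler, the qualifiers can be wrapped in macros. This is a sketch, not code from the question: `HD` and `NOINLINE` are hypothetical macro names, and `__noinline__` remains only a hint.

```cpp
#include <cassert>

// Hypothetical portability macros so one source builds with nvcc or plain C++.
#ifdef __CUDACC__
#define HD __host__ __device__
#define NOINLINE __noinline__              // CUDA qualifier, honored when possible
#elif defined(_MSC_VER)
#define HD
#define NOINLINE __declspec(noinline)      // MSVC host-only build
#else
#define HD
#define NOINLINE __attribute__((noinline)) // GCC/Clang host-only build
#endif

template <typename T>
struct TVector3 {
    T x, y, z;
    HD TVector3(T e0, T e1, T e2) : x(e0), y(e1), z(e2) {}
};

// NOINLINE asks the compiler to keep a real call here, so a breakpoint inside
// the body maps to one location instead of being copied into every caller.
template <typename T>
HD NOINLINE TVector3<T> operator+(const TVector3<T>& u, const TVector3<T>& v) {
    return TVector3<T>(u.x + v.x, u.y + v.y, u.z + v.z);
}
```

With `NOINLINE` on `operator+`, stepping into it with F11 and over it with F10 should land on the same source lines on every call.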
Start the correct debug session in Nsight:
- Use Nsight's "Start CUDA Debugging" command to debug device code. The regular Visual Studio debugger (F5) only steps through host code.
Verify symbols and sources:
- Check the Modules window and the Nsight output to confirm that debug symbols (PDBs for host code) are loaded for the binary and match the current source files.
Use a short run while stepping:
- Launch a tiny grid (1 block, 1 thread) while debugging so the run is reproducible and easier to step through.
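Shrinking the launch to 1 block × 1 thread only produces the same result if the kernel does not assume one thread per element. A grid-stride loop is one common way to get that. The sketch below assumes a simple element-wise add; `addArraysStrided` and `addKernel` are hypothetical names. The loop body is a `__host__ __device__` function, so it can also be exercised on the CPU.

```cpp
#include <cassert>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Grid-stride loop: the caller handles elements start, start+stride, ...
// Every element is covered for ANY positive stride, so a 1x1 debug launch
// computes the same result as a full-size launch.
HD void addArraysStrided(const float* a, const float* b, float* out,
                         int n, int start, int stride) {
    assert(stride > 0);  // stride 0 would loop forever
    for (int i = start; i < n; i += stride) {
        out[i] = a[i] + b[i];
    }
}

#ifdef __CUDACC__
__global__ void addKernel(const float* a, const float* b, float* out, int n) {
    int start = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = gridDim.x * blockDim.x;
    addArraysStrided(a, b, out, n, start, stride);
}
// Normal run: addKernel<<<(n + 255) / 256, 256>>>(dA, dB, dOut, n);
// Debug run:  addKernel<<<1, 1>>>(dA, dB, dOut, n);  // one thread walks every element
#endif
```

Switching between the two launches changes only how many threads share the work, not which elements get written.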
