Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Commit b066bde

Browse files
Chapter 5: Debugging and Profiling
1 parent ee97f7b commit b066bde

File tree

2 files changed

+182
-1
lines changed

2 files changed

+182
-1
lines changed
Lines changed: 181 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,181 @@
1+
# Chapter 5: Debugging and Profiling CUDA Programs
2+
3+
In this chapter, we will explore various tools and techniques to debug and profile CUDA programs. Debugging helps identify and fix errors in your code, while profiling helps optimize performance by analyzing how your code executes on the GPU.
4+
5+
## Table of Contents
6+
7+
1. [Introduction](#introduction)
8+
2. [Debugging CUDA Programs](#debugging-cuda-programs)
9+
- [Using cuda-gdb](#using-cuda-gdb)
10+
- [Using Nsight Eclipse Edition](#using-nsight-eclipse-edition)
11+
- [Using Nsight Visual Studio Edition](#using-nsight-visual-studio-edition)
12+
3. [Profiling CUDA Programs](#profiling-cuda-programs)
13+
- [Using Nsight Systems](#using-nsight-systems)
14+
- [Using Nsight Compute](#using-nsight-compute)
15+
4. [Sample Code](#sample-code)
16+
5. [Additional Resources](#additional-resources)
17+
18+
## Introduction
19+
20+
Debugging and profiling are essential steps in CUDA development. They help ensure your code runs correctly and efficiently on the GPU. This guide will introduce you to the tools and techniques needed to debug and profile CUDA programs effectively.
21+
22+
## Debugging CUDA Programs
23+
24+
### Using cuda-gdb
25+
26+
`cuda-gdb` is a powerful debugger for CUDA applications. It allows you to set breakpoints, step through code, and inspect variables.
27+
28+
1. **Compile with Debug Information**:
29+
```sh
30+
nvcc -g -G -o debapro debapro.cu
31+
```
32+
33+
2. **Start cuda-gdb**:
34+
```sh
35+
cuda-gdb ./debapro
36+
```
37+
38+
3. **Set Breakpoints**:
39+
```sh
40+
(cuda-gdb) break main
41+
```
42+
43+
4. **Run the Program**:
44+
```sh
45+
(cuda-gdb) run
46+
```
47+
48+
5. **Step Through Code**:
49+
```sh
50+
(cuda-gdb) next
51+
```
52+
53+
6. **Inspect Variables**:
54+
```sh
55+
(cuda-gdb) print variable_name
56+
```
57+
58+
### Using Nsight Eclipse Edition
59+
60+
Nsight Eclipse Edition provides an integrated development environment for debugging CUDA applications.
61+
62+
1. **Open Nsight Eclipse Edition**.
63+
2. **Create a Debug Configuration**:
64+
- Go to **Run > Debug Configurations**.
65+
- Create a new CUDA C/C++ Application configuration.
66+
- Set the project and application to your `debapro` executable.
67+
3. **Set Breakpoints and Start Debugging**.
68+
69+
### Using Nsight Visual Studio Edition
70+
71+
Nsight Visual Studio Edition integrates with Visual Studio to provide debugging capabilities for CUDA applications.
72+
73+
1. **Open Visual Studio**.
74+
2. **Create a Debug Configuration**:
75+
- Go to **Debug > Attach to Process**.
76+
- Select the CUDA application you want to debug.
77+
3. **Set Breakpoints and Start Debugging**.
78+
79+
## Profiling CUDA Programs
80+
81+
### Using Nsight Systems
82+
83+
Nsight Systems provides system-wide performance analysis, helping you identify bottlenecks in your application.
84+
85+
1. **Run Nsight Systems**:
86+
```sh
87+
nsys profile ./debapro
88+
```
89+
90+
2. **Analyze the Report**:
91+
- This command generates a report file (e.g., `report.qdrep`).
92+
- Open this file in the Nsight Systems GUI for detailed analysis.
93+
94+
### Using Nsight Compute
95+
96+
Nsight Compute provides detailed analysis of CUDA kernel performance.
97+
98+
1. **Run Nsight Compute**:
99+
```sh
100+
ncu ./debapro
101+
```
102+
103+
2. **Analyze the Output**:
104+
- Nsight Compute will provide detailed metrics about the kernel execution.
105+
106+
## Sample Code
107+
108+
Here is a sample CUDA program for matrix multiplication:
109+
110+
```cpp
111+
#include <iostream>
112+
#include <cuda_runtime.h>
113+
114+
#define N 16 // Size of the matrix (N x N)
115+
116+
__global__ void matrixMul(const float *A, const float *B, float *C, int width) {
117+
int row = blockIdx.y * blockDim.y + threadIdx.y;
118+
int col = blockIdx.x * blockDim.x + threadIdx.x;
119+
float value = 0;
120+
121+
if (row < width && col < width) {
122+
for (int k = 0; k < width; ++k) {
123+
value += A[row * width + k] * B[k * width + col];
124+
}
125+
C[row * width + col] = value;
126+
}
127+
}
128+
129+
int main() {
130+
int size = N * N * sizeof(float);
131+
float h_A[N * N], h_B[N * N], h_C[N * N];
132+
133+
// Initialize matrices
134+
for (int i = 0; i < N * N; ++i) {
135+
h_A[i] = static_cast<float>(rand()) / RAND_MAX;
136+
h_B[i] = static_cast<float>(rand()) / RAND_MAX;
137+
}
138+
139+
float *d_A, *d_B, *d_C;
140+
cudaMalloc((void **)&d_A, size);
141+
cudaMalloc((void **)&d_B, size);
142+
cudaMalloc((void **)&d_C, size);
143+
144+
cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);
145+
cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice);
146+
147+
dim3 threadsPerBlock(16, 16);
148+
dim3 blocksPerGrid((N + threadsPerBlock.x - 1) / threadsPerBlock.x, (N + threadsPerBlock.y - 1) / threadsPerBlock.y);
149+
matrixMul<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, N);
150+
151+
cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost);
152+
153+
// Verify the result
154+
for (int i = 0; i < N; ++i) {
155+
for (int j = 0; j < N; ++j) {
156+
float sum = 0;
157+
for (int k = 0; k < N; ++k) {
158+
sum += h_A[i * N + k] * h_B[k * N + j];
159+
}
160+
if (fabs(sum - h_C[i * N + j]) > 1e-5) {
161+
std::cerr << "Result verification failed at element (" << i << ", " << j << ")" << std::endl;
162+
exit(EXIT_FAILURE);
163+
}
164+
}
165+
}
166+
167+
std::cout << "Test PASSED" << std::endl;
168+
169+
cudaFree(d_A);
170+
cudaFree(d_B);
171+
cudaFree(d_C);
172+
173+
return 0;
174+
}
175+
```
176+
177+
## Resources to check
178+
179+
- [NVIDIA Developer Blog on Profiling and Debugging CUDA Applications](https://developer.nvidia.com/blog/new-video-tutuorial-profiling-and-debugging-nvidia-cuda-applications/)2
180+
- [CUDA Profiler User’s Guide](https://docs.nvidia.com/cuda/profiler-users-guide/index.html)4
181+
- [NVIDIA Nsight Developer Tools](https://developer.nvidia.com/nsight-visual-studio-edition)5

‎README.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -70,7 +70,7 @@
7070
- [x] [Chapter 2: Basic Syntax](Chapter2-BasicSyntax)
7171
- [x] [Chapter 3: Simple CUDA Vector Addition](Chapter3-EasyCudaProject)
7272
- [x] [Chapter 4: Intermediate Algorithms](Chapter4-IntermediateAlgo)
73-
73+
-[x][Chapter 5: Debugging and Profiling](Chapter5-DebuggingandProfiling)
7474

7575
## Going Beyond: Advanced CUDA
7676

0 commit comments

Comments
(0)

AltStyle によって変換されたページ (->オリジナル) /