Commit b066bde

committed

Chapter 5: Debugging and Profiling

1 parent ee97f7b commit b066bdeCopy full SHA for b066bde

File tree

2 files changed

+182

-1

lines changed

Chapter5-DebuggingandProfiling
- README.md
README.md

2 files changed

+182

-1

lines changed

`‎Chapter5-DebuggingandProfiling/README.md`

Lines changed: 181 additions & 0 deletions

Original file line number	Diff line number	Diff line change
`@@ -0,0 +1,181 @@`
	`1`	`+# Chapter 5: Debugging and Profiling CUDA Programs`
	`2`	`+`
	`3`	`+In this chapter, we will explore various tools and techniques to debug and profile CUDA programs. Debugging helps identify and fix errors in your code, while profiling helps optimize performance by analyzing how your code executes on the GPU.`
	`4`	`+`
	`5`	`+## Table of Contents`
	`6`	`+`
	`7`	`+1. [Introduction](#introduction)`
	`8`	`+2. [Debugging CUDA Programs](#debugging-cuda-programs)`
	`9`	`+ - [Using cuda-gdb](#using-cuda-gdb)`
	`10`	`+ - [Using Nsight Eclipse Edition](#using-nsight-eclipse-edition)`
	`11`	`+ - [Using Nsight Visual Studio Edition](#using-nsight-visual-studio-edition)`
	`12`	`+3. [Profiling CUDA Programs](#profiling-cuda-programs)`
	`13`	`+ - [Using Nsight Systems](#using-nsight-systems)`
	`14`	`+ - [Using Nsight Compute](#using-nsight-compute)`
	`15`	`+4. [Sample Code](#sample-code)`
	`16`	`+5. [Additional Resources](#additional-resources)`
	`17`	`+`
	`18`	`+## Introduction`
	`19`	`+`
	`20`	`+Debugging and profiling are essential steps in CUDA development. They help ensure your code runs correctly and efficiently on the GPU. This guide will introduce you to the tools and techniques needed to debug and profile CUDA programs effectively.`
	`21`	`+`
	`22`	`+## Debugging CUDA Programs`
	`23`	`+`
	`24`	`+### Using cuda-gdb`
	`25`	`+`
	`26`	+`cuda-gdb` is a powerful debugger for CUDA applications. It allows you to set breakpoints, step through code, and inspect variables.
	`27`	`+`
	`28`	`+1. Compile with Debug Information:`
	`29`	+ ```sh
	`30`	`+ nvcc -g -G -o debapro debapro.cu`
	`31`	+ ```
	`32`	`+`
	`33`	`+2. Start cuda-gdb:`
	`34`	+ ```sh
	`35`	`+ cuda-gdb ./debapro`
	`36`	+ ```
	`37`	`+`
	`38`	`+3. Set Breakpoints:`
	`39`	+ ```sh
	`40`	`+ (cuda-gdb) break main`
	`41`	+ ```
	`42`	`+`
	`43`	`+4. Run the Program:`
	`44`	+ ```sh
	`45`	`+ (cuda-gdb) run`
	`46`	+ ```
	`47`	`+`
	`48`	`+5. Step Through Code:`
	`49`	+ ```sh
	`50`	`+ (cuda-gdb) next`
	`51`	+ ```
	`52`	`+`
	`53`	`+6. Inspect Variables:`
	`54`	+ ```sh
	`55`	`+ (cuda-gdb) print variable_name`
	`56`	+ ```
	`57`	`+`
	`58`	`+### Using Nsight Eclipse Edition`
	`59`	`+`
	`60`	`+Nsight Eclipse Edition provides an integrated development environment for debugging CUDA applications.`
	`61`	`+`
	`62`	`+1. Open Nsight Eclipse Edition.`
	`63`	`+2. Create a Debug Configuration:`
	`64`	`+ - Go to Run > Debug Configurations.`
	`65`	`+ - Create a new CUDA C/C++ Application configuration.`
	`66`	+ - Set the project and application to your `debapro` executable.
	`67`	`+3. Set Breakpoints and Start Debugging.`
	`68`	`+`
	`69`	`+### Using Nsight Visual Studio Edition`
	`70`	`+`
	`71`	`+Nsight Visual Studio Edition integrates with Visual Studio to provide debugging capabilities for CUDA applications.`
	`72`	`+`
	`73`	`+1. Open Visual Studio.`
	`74`	`+2. Create a Debug Configuration:`
	`75`	`+ - Go to Debug > Attach to Process.`
	`76`	`+ - Select the CUDA application you want to debug.`
	`77`	`+3. Set Breakpoints and Start Debugging.`
	`78`	`+`
	`79`	`+## Profiling CUDA Programs`
	`80`	`+`
	`81`	`+### Using Nsight Systems`
	`82`	`+`
	`83`	`+Nsight Systems provides system-wide performance analysis, helping you identify bottlenecks in your application.`
	`84`	`+`
	`85`	`+1. Run Nsight Systems:`
	`86`	+ ```sh
	`87`	`+ nsys profile ./debapro`
	`88`	+ ```
	`89`	`+`
	`90`	`+2. Analyze the Report:`
	`91`	+ - This command generates a report file (e.g., `report.qdrep`).
	`92`	`+ - Open this file in the Nsight Systems GUI for detailed analysis.`
	`93`	`+`
	`94`	`+### Using Nsight Compute`
	`95`	`+`
	`96`	`+Nsight Compute provides detailed analysis of CUDA kernel performance.`
	`97`	`+`
	`98`	`+1. Run Nsight Compute:`
	`99`	+ ```sh
	`100`	`+ ncu ./debapro`
	`101`	+ ```
	`102`	`+`
	`103`	`+2. Analyze the Output:`
	`104`	`+ - Nsight Compute will provide detailed metrics about the kernel execution.`
	`105`	`+`
	`106`	`+## Sample Code`
	`107`	`+`
	`108`	`+Here is a sample CUDA program for matrix multiplication:`
	`109`	`+`
	`110`	+```cpp
	`111`	`+#include <iostream>`
	`112`	`+#include <cuda_runtime.h>`
	`113`	`+`
	`114`	`+#define N 16 // Size of the matrix (N x N)`
	`115`	`+`
	`116`	`+__global__ void matrixMul(const float A, const float B, float *C, int width) {`
	`117`	`+ int row = blockIdx.y * blockDim.y + threadIdx.y;`
	`118`	`+ int col = blockIdx.x * blockDim.x + threadIdx.x;`
	`119`	`+ float value = 0;`
	`120`	`+`
	`121`	`+ if (row < width && col < width) {`
	`122`	`+ for (int k = 0; k < width; ++k) {`
	`123`	`+ value += A[row * width + k] * B[k * width + col];`
	`124`	`+ }`
	`125`	`+ C[row * width + col] = value;`
	`126`	`+ }`
	`127`	`+}`
	`128`	`+`
	`129`	`+int main() {`
	`130`	`+ int size = N * N * sizeof(float);`
	`131`	`+ float h_A[N * N], h_B[N * N], h_C[N * N];`
	`132`	`+`
	`133`	`+ // Initialize matrices`
	`134`	`+ for (int i = 0; i < N * N; ++i) {`
	`135`	`+ h_A[i] = static_cast<float>(rand()) / RAND_MAX;`
	`136`	`+ h_B[i] = static_cast<float>(rand()) / RAND_MAX;`
	`137`	`+ }`
	`138`	`+`
	`139`	`+ float d_A, d_B, *d_C;`
	`140`	`+ cudaMalloc((void **)&d_A, size);`
	`141`	`+ cudaMalloc((void **)&d_B, size);`
	`142`	`+ cudaMalloc((void **)&d_C, size);`
	`143`	`+`
	`144`	`+ cudaMemcpy(d_A, h_A, size, cudaMemcpyHostToDevice);`
	`145`	`+ cudaMemcpy(d_B, h_B, size, cudaMemcpyHostToDevice);`
	`146`	`+`
	`147`	`+ dim3 threadsPerBlock(16, 16);`
	`148`	`+ dim3 blocksPerGrid((N + threadsPerBlock.x - 1) / threadsPerBlock.x, (N + threadsPerBlock.y - 1) / threadsPerBlock.y);`
	`149`	`+ matrixMul<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, d_C, N);`
	`150`	`+`
	`151`	`+ cudaMemcpy(h_C, d_C, size, cudaMemcpyDeviceToHost);`
	`152`	`+`
	`153`	`+ // Verify the result`
	`154`	`+ for (int i = 0; i < N; ++i) {`
	`155`	`+ for (int j = 0; j < N; ++j) {`
	`156`	`+ float sum = 0;`
	`157`	`+ for (int k = 0; k < N; ++k) {`
	`158`	`+ sum += h_A[i * N + k] * h_B[k * N + j];`
	`159`	`+ }`
	`160`	`+ if (fabs(sum - h_C[i * N + j]) > 1e-5) {`
	`161`	`+ std::cerr << "Result verification failed at element (" << i << ", " << j << ")" << std::endl;`
	`162`	`+ exit(EXIT_FAILURE);`
	`163`	`+ }`
	`164`	`+ }`
	`165`	`+ }`
	`166`	`+`
	`167`	`+ std::cout << "Test PASSED" << std::endl;`
	`168`	`+`
	`169`	`+ cudaFree(d_A);`
	`170`	`+ cudaFree(d_B);`
	`171`	`+ cudaFree(d_C);`
	`172`	`+`
	`173`	`+ return 0;`
	`174`	`+}`
	`175`	+```
	`176`	`+`
	`177`	`+## Resources to check`
	`178`	`+`
	`179`	`+- [NVIDIA Developer Blog on Profiling and Debugging CUDA Applications](https://developer.nvidia.com/blog/new-video-tutuorial-profiling-and-debugging-nvidia-cuda-applications/)2`
	`180`	`+- [CUDA Profiler User’s Guide](https://docs.nvidia.com/cuda/profiler-users-guide/index.html)4`
	`181`	`+- [NVIDIA Nsight Developer Tools](https://developer.nvidia.com/nsight-visual-studio-edition)5`

`‎README.md`

Lines changed: 1 addition & 1 deletion

Original file line number	Diff line number	Diff line change
`@@ -70,7 +70,7 @@`
`70`	`70`	`- [x] [Chapter 2: Basic Syntax](Chapter2-BasicSyntax)`
`71`	`71`	`- [x] [Chapter 3: Simple CUDA Vector Addition](Chapter3-EasyCudaProject)`
`72`	`72`	`- [x] [Chapter 4: Intermediate Algorithms](Chapter4-IntermediateAlgo)`
`73`		`-`
	`73`	`+-[x][Chapter 5: Debugging and Profiling](Chapter5-DebuggingandProfiling)`
`74`	`74`
`75`	`75`	`## Going Beyond: Advanced CUDA`
`76`	`76`

0 commit comments

Comments

(0)

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Commit b066bde

File tree

2 files changed

2 files changed

`‎Chapter5-DebuggingandProfiling/README.md`

`‎README.md`

0 commit comments