# Chapter 2: Simple CUDA Vector Addition

## Overview

In this chapter, we'll create a basic CUDA program that performs vector addition on the GPU. The program consists of a CUDA kernel that adds corresponding elements from two input vectors and stores the result in an output vector.

### Code Explanation

Let's go through the code step by step:

1. **Kernel Function (`vectorAdd`)**:
   - The `vectorAdd` function is the heart of our CUDA program. It runs on the GPU and performs the vector addition.
   - It takes four arguments:
     - `A`: Pointer to the first input vector.
     - `B`: Pointer to the second input vector.
     - `C`: Pointer to the output vector (where the result will be stored).
     - `size`: Number of elements in each vector.
   - Inside the kernel, each thread computes its global index, adds the corresponding elements of `A` and `B`, and stores the sum in `C`.
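   The pattern described above can be sketched as a minimal kernel (the exact code in `vector_addition.cu` may differ slightly):

   ```cuda
   __global__ void vectorAdd(const float* A, const float* B, float* C, int size) {
       // Global index of this thread across the whole grid.
       int i = blockIdx.x * blockDim.x + threadIdx.x;
       // Guard: skip threads past the end when size is not a multiple of the block size.
       if (i < size) {
           C[i] = A[i] + B[i];
       }
   }
   ```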
2. **Main Function**:
   - The `main` function sets up the host (CPU) and device (GPU) memory, initializes the input vectors, launches the kernel, and retrieves the result.
   - Key steps in the `main` function:
     - Allocate memory for the vectors (`h_A`, `h_B`, and `h_C`) on the host.
     - Initialize the input vectors (`h_A` and `h_B`) with sample values.
     - Allocate memory for the vectors (`d_A`, `d_B`, and `d_C`) on the device (GPU).
     - Copy data from host to device using `cudaMemcpy`.
     - Launch the `vectorAdd` kernel with appropriate grid and block dimensions.
     - Copy the result back from the device to the host.
     - Print the result (output vector `h_C`).

3. **Memory Allocation and Transfer**:
   - We allocate memory for the vectors on both the host and the device.
   - `cudaMalloc` allocates memory on the device.
   - `cudaMemcpy` transfers data between host and device; the direction is given by flags such as `cudaMemcpyHostToDevice` and `cudaMemcpyDeviceToHost`.
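   As a sketch, assuming `float` vectors of `size` elements:

   ```cuda
   size_t bytes = size * sizeof(float);

   // Allocate device buffers.
   float *d_A, *d_B, *d_C;
   cudaMalloc(&d_A, bytes);
   cudaMalloc(&d_B, bytes);
   cudaMalloc(&d_C, bytes);

   // Copy the inputs from host to device.
   cudaMemcpy(d_A, h_A, bytes, cudaMemcpyHostToDevice);
   cudaMemcpy(d_B, h_B, bytes, cudaMemcpyHostToDevice);
   ```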
4. **Kernel Launch**:
   - We launch the `vectorAdd` kernel using the `<<<numBlocks, threadsPerBlock>>>` syntax.
   - `numBlocks` and `threadsPerBlock` determine the grid and block dimensions; together they must cover all `size` elements.
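   A common way to choose these values (256 threads per block is a typical choice, not a requirement):

   ```cuda
   int threadsPerBlock = 256;
   // Round up so every element is covered even when size % threadsPerBlock != 0.
   int numBlocks = (size + threadsPerBlock - 1) / threadsPerBlock;
   vectorAdd<<<numBlocks, threadsPerBlock>>>(d_A, d_B, d_C, size);
   ```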
5. **Clean Up**:
   - We free the allocated device memory using `cudaFree`.
   - We also release the host vectors (`h_A`, `h_B`, and `h_C`) to avoid memory leaks.
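   For example, assuming the host arrays were allocated with `new[]`:

   ```cuda
   // Release device memory.
   cudaFree(d_A);
   cudaFree(d_B);
   cudaFree(d_C);

   // Release host memory.
   delete[] h_A;
   delete[] h_B;
   delete[] h_C;
   ```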
## How to Compile and Run

1. **Compile the Code**:
   - Open your terminal or command prompt.
   - Navigate to the folder containing `vector_addition.cu`.
   - Compile the code using `nvcc` (the NVIDIA CUDA Compiler):
     ```
     nvcc vector_addition.cu -o vector_addition
     ```
2. **Run the Executable**:
   - Execute the compiled binary:
     ```
     ./vector_addition
     ```
   - You'll see the result of the vector addition printed to the console.