Commit 915ed1b (parent 71a5509)
Chapter 2: Simple CUDA Vector Addition
5 files changed: +123, -0

Chapter2-EasyCudaProject/README.md
# Chapter 2: Simple CUDA Vector Addition

## Overview

In this chapter, we'll create a basic CUDA program that performs vector addition on the GPU. The program consists of a CUDA kernel that adds corresponding elements from two input vectors and stores the result in an output vector.

### Code Explanation

Let's go through the code step by step:

1. **Kernel Function (`vectorAdd`)**:
   - The `vectorAdd` function is the heart of our CUDA program. It runs on the GPU and performs the vector addition.
   - It takes four arguments:
     - `A`: Pointer to the first input vector.
     - `B`: Pointer to the second input vector.
     - `C`: Pointer to the output vector (where the result will be stored).
     - `size`: Number of elements in each vector.
   - Inside the kernel, each thread computes the sum of one pair of corresponding elements from `A` and `B` and stores the result in `C`.

2. **Main Function**:
   - The `main` function sets up the host (CPU) and device (GPU) memory, initializes the input vectors, launches the kernel, and retrieves the result.
   - Key steps in `main`:
     - Allocate memory for the vectors (`h_A`, `h_B`, and `h_C`) on the host.
     - Initialize the input vectors (`h_A` and `h_B`) with sample values.
     - Allocate memory for the vectors (`d_A`, `d_B`, and `d_C`) on the device (GPU).
     - Copy data from host to device using `cudaMemcpy`.
     - Launch the `vectorAdd` kernel with appropriate grid and block dimensions.
     - Copy the result back from the device to the host.
     - Print the result (output vector `h_C`).

3. **Memory Allocation and Transfer**:
   - We allocate memory for the vectors on both the host and the device.
   - `cudaMalloc` allocates memory on the device.
   - `cudaMemcpy` transfers data between host and device.

4. **Kernel Launch**:
   - We launch the `vectorAdd` kernel using the `<<<numBlocks, threadsPerBlock>>>` syntax.
   - `numBlocks` and `threadsPerBlock` determine the grid and block dimensions.

5. **Clean Up**:
   - We free the allocated device memory using `cudaFree`.
   - We also `delete[]` the host vectors (`h_A`, `h_B`, and `h_C`) to avoid memory leaks.
## How to Compile and Run

1. **Compile the Code**:
   - Open your terminal or command prompt.
   - Navigate to the folder containing `vector_addition.cu`.
   - Compile the code with `nvcc` (the NVIDIA CUDA compiler):
     ```
     nvcc vector_addition.cu -o vector_addition
     ```
   ![Compile Code Image](part1.png)

2. **Run the Executable**:
   - Execute the compiled binary:
     ```
     ./vector_addition
     ```
   - You'll see the result of the vector addition printed to the console.

   ![Executable Code Image](part2.png)
Chapter2-EasyCudaProject/part1.png (16.5 KB)

Chapter2-EasyCudaProject/part2.png (55.1 KB)

Chapter2-EasyCudaProject/vector_addition.cu
```cuda
#include <iostream>

// Kernel for vector addition
__global__ void vectorAdd(float* A, float* B, float* C, int size) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < size) {
        C[tid] = A[tid] + B[tid];
    }
}

int main() {
    const int N = 1024; // Size of vectors
    const int threadsPerBlock = 256;
    const int numBlocks = (N + threadsPerBlock - 1) / threadsPerBlock;

    // Allocate memory for vectors on the host
    float* h_A = new float[N];
    float* h_B = new float[N];
    float* h_C = new float[N];

    // Initialize vectors with some values (you can modify this)
    for (int i = 0; i < N; ++i) {
        h_A[i] = i;
        h_B[i] = 2 * i;
    }

    // Allocate memory for vectors on the device (GPU)
    float *d_A, *d_B, *d_C;
    cudaMalloc(&d_A, N * sizeof(float));
    cudaMalloc(&d_B, N * sizeof(float));
    cudaMalloc(&d_C, N * sizeof(float));

    // Copy data from host to device
    cudaMemcpy(d_A, h_A, N * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, h_B, N * sizeof(float), cudaMemcpyHostToDevice);

    // Launch the kernel
    vectorAdd<<<numBlocks, threadsPerBlock>>>(d_A, d_B, d_C, N);

    // Copy result back to host
    cudaMemcpy(h_C, d_C, N * sizeof(float), cudaMemcpyDeviceToHost);

    // Clean up device memory
    cudaFree(d_A);
    cudaFree(d_B);
    cudaFree(d_C);

    // Print the result (you can modify this)
    for (int i = 0; i < N; ++i) {
        std::cout << h_C[i] << " ";
    }
    std::cout << std::endl;

    delete[] h_A;
    delete[] h_B;
    delete[] h_C;

    return 0;
}
```
‎README.md

@@ -67,4 +67,5 @@
 ### Follow The Guide

 - [x] [Chapter 1: Basics of CUDA Programming](Chapter1-Basics)
+- [x] [Chapter 2: Simple CUDA Vector Addition](Chapter2-EasyCudaProject)
