# Chapter 1: Basics of CUDA Programming Syntax

## 1. Including Headers

In C++ (and CUDA), headers declare the functions and classes a program can use. A commonly used header is `<iostream>`, which provides stream input/output such as `std::cout`. For example:

```cpp
#include <iostream>
```

## 2. Data Types

### a. Integers (`int`, `short`, `long`)

- **`int`**: Represents whole numbers (e.g., 5, -10, 0); typically 32 bits wide.
- **`short`**: An integer type no wider than `int` (typically 16 bits), so it covers a smaller range.
- **`long`**: An integer type at least as wide as `int` (32 or 64 bits, depending on the platform).

### b. Floating-Point Numbers (`float`, `double`)

- **`float`**: Single-precision floating-point, 32 bits (e.g., `3.14f`, `-0.001f`; the `f` suffix makes the literal a `float` rather than a `double`).
- **`double`**: Double-precision floating-point, 64 bits. It carries more significant digits than `float` but uses twice the memory, and double-precision arithmetic is often slower on GPUs.

### c. Other Types

- **`char`**: Represents individual characters (e.g., `'A'`, `'b'`, `'$'`).
- **`bool`**: Represents `true` or `false` values.

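The precision difference between `float` and `double` can be made concrete with a small host-side check. The sketch below is illustrative only: the helper names `survivesFloat` and `survivesDouble` are made up for this example, and it assumes IEEE 754 floating-point, which mainstream CPUs and CUDA GPUs use.

```cpp
// Hypothetical helpers (not part of any library): each one stores an integer
// in a floating-point variable and reports whether it comes back unchanged.

// float has a 24-bit significand, so not every integer above 2^24 fits exactly.
bool survivesFloat(long long value) {
    float stored = static_cast<float>(value);
    return static_cast<long long>(stored) == value;
}

// double has a 53-bit significand, so it holds much larger integers exactly.
bool survivesDouble(long long value) {
    double stored = static_cast<double>(value);
    return static_cast<long long>(stored) == value;
}
```

With these definitions, `survivesFloat(16777216)` is true, `survivesFloat(16777217)` is false because 2^24 + 1 needs 25 significant bits, and `survivesDouble(16777217)` is true.
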
## 3. Constants

- Use `const` to define constants: values that cannot be modified after they are initialized.
- Example:

  ```cpp
  const float PI = 3.14159f;
  ```

## 4. Loops

### a. `for` Loop

- Repeats a block of code a specified number of times.
- Example:

  ```cpp
  for (int i = 0; i < 10; ++i) {
      // Code to execute
  }
  ```

### b. `while` Loop

- Repeats a block of code while a condition is true.
- Example:

  ```cpp
  int count = 0;
  while (count < 5) {
      // Code to execute
      ++count;
  }
  ```

## 5. The `int main()` Function

- Every C++ program defines a `main` function, and so does the host side of a CUDA program.
- It is the entry point of the program, where execution begins.
- The `int` before `main` indicates that the function returns an integer to the operating system: 0 conventionally signals success, and a nonzero value signals an error.

## 6. Writing a Simple Kernel

- A **kernel** is a function that runs on the GPU; every thread in a launch executes the same kernel code.
- Kernels are where a CUDA program does its parallel work.
- Kernels are declared with the `__global__` specifier and must return `void`.

Example of a simple kernel:

```cpp
// Each thread adds one pair of elements: C[tid] = A[tid] + B[tid].
__global__ void vectorAdd(const float* A, const float* B, float* C, int size) {
    // Global index of this thread across the whole grid.
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    // The grid may contain more threads than elements, so guard the access.
    if (tid < size) {
        C[tid] = A[tid] + B[tid];
    }
}
```

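## 7. Thread Indexing

- CUDA runs a kernel with many threads, grouped into blocks; the blocks together form the grid.
- `threadIdx.x` is a thread's index within its block, `blockIdx.x` is the index of that block within the grid, and `blockDim.x` is the number of threads per block.
- For a one-dimensional launch, each thread's unique global index is `blockIdx.x * blockDim.x + threadIdx.x`.
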
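The index formula can be checked by hand on the CPU. The following is a minimal host-side sketch, not CUDA device code: `globalIndex` is a hypothetical helper whose parameters stand in for the built-in `blockIdx.x`, `blockDim.x`, and `threadIdx.x`.

```cpp
// Host-side stand-in for the expression a kernel evaluates on the GPU.
// The parameters mirror the CUDA built-ins with the same meaning.
int globalIndex(int blockIdx_x, int blockDim_x, int threadIdx_x) {
    // Skip all threads in the preceding blocks, then add the offset
    // of this thread inside its own block.
    return blockIdx_x * blockDim_x + threadIdx_x;
}
```

For example, with 256 threads per block, thread 5 of block 2 gets `globalIndex(2, 256, 5)`, which is 2 * 256 + 5 = 517. Block 0 covers indices 0 through 255 and block 1 starts at 256, so consecutive blocks cover consecutive ranges without overlap.
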
## 8. Memory Allocation

- The CPU (host) and the GPU (device) typically have separate memory, so buffers are allocated on both sides.
- `cudaMalloc` allocates memory on the device; the size is given in bytes.
- `cudaMemcpy` copies data between host and device; its last argument (for example `cudaMemcpyHostToDevice` or `cudaMemcpyDeviceToHost`) selects the direction.

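## 9. Launching the Kernel

- A kernel is launched with the `<<<numBlocks, threadsPerBlock>>>` syntax, placed between the kernel name and its argument list, e.g. `vectorAdd<<<numBlocks, threadsPerBlock>>>(A, B, C, size)`.
- `numBlocks` sets the number of blocks in the grid and `threadsPerBlock` sets the number of threads in each block, so the launch creates `numBlocks * threadsPerBlock` threads in total.
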
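Choosing `numBlocks` usually means rounding up, so that every element gets a thread even when the element count is not a multiple of the block size. A minimal sketch of that calculation, where the helper name `blocksFor` is an assumption made for this example:

```cpp
// Hypothetical helper: number of blocks needed so that
// numBlocks * threadsPerBlock >= size (integer division rounded up).
int blocksFor(int size, int threadsPerBlock) {
    return (size + threadsPerBlock - 1) / threadsPerBlock;
}
```

For 1000 elements and 256 threads per block, `blocksFor(1000, 256)` is 4, which launches 1024 threads. The 24 surplus threads are the ones that the `if (tid < size)` check in the kernel turns away.
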
## 10. Clean Up

- After using GPU memory, free it with `cudaFree`.
- Also release host memory: use `delete` for objects allocated with `new`, and `delete[]` for arrays allocated with `new[]`.

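
Sections 6 through 10 fit together into one small program. The following is a sketch under stated assumptions rather than a reference implementation: the array length `N`, the block size of 256, the host and device variable names, and the `check` error-handling helper are all choices made for this example. It needs `nvcc` and a CUDA-capable GPU to run.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative helper (not a CUDA API): stop with a message if a runtime call fails.
static void check(cudaError_t err, const char* what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

__global__ void vectorAdd(const float* A, const float* B, float* C, int size) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid < size) {
        C[tid] = A[tid] + B[tid];
    }
}

int main() {
    const int N = 1000;                      // assumed problem size
    const size_t bytes = N * sizeof(float);  // cudaMalloc and cudaMemcpy take bytes

    // Host buffers, allocated with new[] and released with delete[].
    float* hA = new float[N];
    float* hB = new float[N];
    float* hC = new float[N];
    for (int i = 0; i < N; ++i) {
        hA[i] = static_cast<float>(i);
        hB[i] = 2.0f * static_cast<float>(i);
    }

    // Device buffers.
    float* dA = nullptr;
    float* dB = nullptr;
    float* dC = nullptr;
    check(cudaMalloc(&dA, bytes), "cudaMalloc dA");
    check(cudaMalloc(&dB, bytes), "cudaMalloc dB");
    check(cudaMalloc(&dC, bytes), "cudaMalloc dC");

    // Copy the inputs from host to device.
    check(cudaMemcpy(dA, hA, bytes, cudaMemcpyHostToDevice), "copy A to device");
    check(cudaMemcpy(dB, hB, bytes, cudaMemcpyHostToDevice), "copy B to device");

    // Round the block count up so every element gets a thread.
    const int threadsPerBlock = 256;
    const int numBlocks = (N + threadsPerBlock - 1) / threadsPerBlock;
    vectorAdd<<<numBlocks, threadsPerBlock>>>(dA, dB, dC, N);
    check(cudaGetLastError(), "kernel launch");

    // This copy waits for the kernel to finish before reading the result.
    check(cudaMemcpy(hC, dC, bytes, cudaMemcpyDeviceToHost), "copy C to host");

    // Compare against the same sum computed on the host.
    bool ok = true;
    for (int i = 0; i < N; ++i) {
        if (hC[i] != hA[i] + hB[i]) {
            ok = false;
        }
    }
    std::printf("%s\n", ok ? "vectorAdd OK" : "vectorAdd MISMATCH");

    // Clean up device memory, then host memory.
    cudaFree(dA);
    cudaFree(dB);
    cudaFree(dC);
    delete[] hA;
    delete[] hB;
    delete[] hC;
    return ok ? 0 : 1;
}
```

Saved as a `.cu` file (for instance `vector_add.cu`, a name chosen here for illustration), this should build with `nvcc vector_add.cu -o vector_add`.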