What is Kernel in CUDA Programming

Posted by Unknown at 07:13 | 4 comments

Basics of CUDA Programming: Part 5


Kernels

CUDA C extends C by allowing the programmer to define C functions, called kernels, that, when called, are executed N times in parallel by N different CUDA threads, as opposed to only once like regular C functions.

A kernel is defined using the __global__ declaration specifier, and the number of CUDA threads that execute that kernel for a given kernel call is specified using a new <<<…>>> execution configuration syntax. Each thread that executes the kernel is given a unique thread ID that is accessible within the kernel through the built-in threadIdx variable.


Syntax:

Kernel_Name<<< GridSize, BlockSize, SMEMSize, Stream >>>(arg,..);


Where:
GridSize : the number of thread blocks in the grid.
BlockSize : the number of threads per block.
SMEMSize : the number of bytes of shared memory allocated dynamically per block at runtime (optional, defaults to 0).
Stream : the stream on which the kernel will execute (optional, defaults to the default stream).
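
For illustration, a launch that fills in all four parameters might look like the sketch below; the kernel name scaleKernel, the device pointer d_data, and the chosen sizes are our own assumptions, not part of the original post.

// Kernel that stages its data through dynamically sized shared memory
__global__ void scaleKernel(float* data, float factor)
{
    extern __shared__ float tile[];   // size comes from SMEMSize at launch
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    tile[threadIdx.x] = data[i] * factor;
    __syncthreads();
    data[i] = tile[threadIdx.x];
}

// Launch: 4 blocks of 256 threads, 256 floats of dynamic shared memory per
// block, on a user-created stream; d_data is assumed to hold 4 * 256 floats
// already resident on the device.
cudaStream_t stream;
cudaStreamCreate(&stream);
scaleKernel<<<4, 256, 256 * sizeof(float), stream>>>(d_data, 2.0f);

Leaving the last two arguments out, i.e. <<<GridSize, BlockSize>>>, launches the kernel with no dynamic shared memory on the default stream.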

Sample Example:


// Kernel definition
__global__ void VecAdd(float* A, float* B, float* C)
{
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
}

int main()
{
    ...
    // Kernel invocation with N threads
    VecAdd<<<1, N>>>(A, B, C);
    ...
}
Here, each of the N threads that execute VecAdd() performs one pair-wise addition.
You may be wondering how a grid is organized in terms of blocks and threads; Read this Post
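
For readers who want a complete, compilable version of the example above, one possible host-side sketch is shown below; the vector length N, the sample data, and the printed check are our own assumptions and not part of the guide's example.

// Hypothetical full program built around the VecAdd kernel above
#include <cuda_runtime.h>
#include <cstdio>

__global__ void VecAdd(float* A, float* B, float* C)
{
    int i = threadIdx.x;
    C[i] = A[i] + B[i];
}

int main()
{
    const int N = 256;                     // assumed vector length (one block)
    const size_t bytes = N * sizeof(float);

    float h_A[N], h_B[N], h_C[N];
    for (int i = 0; i < N; ++i) { h_A[i] = (float)i; h_B[i] = 2.0f * i; }

    // Allocate device buffers and copy the inputs over
    float *d_A, *d_B, *d_C;
    cudaMalloc((void**)&d_A, bytes);
    cudaMalloc((void**)&d_B, bytes);
    cudaMalloc((void**)&d_C, bytes);
    cudaMemcpy(d_A, h_A, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_B, h_B, bytes, cudaMemcpyHostToDevice);

    // Kernel invocation with N threads in a single block
    VecAdd<<<1, N>>>(d_A, d_B, d_C);

    // Copy the result back and check one element (10 + 20 = 30)
    cudaMemcpy(h_C, d_C, bytes, cudaMemcpyDeviceToHost);
    printf("C[10] = %f\n", h_C[10]);

    cudaFree(d_A); cudaFree(d_B); cudaFree(d_C);
    return 0;
}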


Feel free to comment...

References
CUDA C Programming Guide, NVIDIA
CUDA, NVIDIA

4 comments:

  1. int* dev_dynamic_ptr;
    cudaMalloc((void**)&dev_dynamic_ptr, dynamic_size);

  2. cuda program

  3. #include <iostream>
    #include <vector>
    #include <omp.h>

    int main() {
        const int SIZE = 1001; // Array size is 1001 to include 0 to 1000
        const int NUM_THREADS = 10;
        int sum = 0; // For the total sum of the array
        std::vector<float> averages(NUM_THREADS, 0.0f); // To store averages computed by each thread

        // Initialize the array with values 0 to 1000
        std::vector<int> array(SIZE);
        for (int i = 0; i < SIZE; ++i) {
            array[i] = i;
        }

    #pragma omp parallel num_threads(NUM_THREADS)
        {
            int id = omp_get_thread_num(); // Get the thread ID
            int chunk = SIZE / NUM_THREADS;
            int start = id * chunk;
            // The last thread also takes the remainder so index 1000 is not skipped
            int end = (id == NUM_THREADS - 1) ? SIZE : start + chunk;
            int thread_sum = 0; // Sum for each thread

            for (int i = start; i < end; ++i) {
                thread_sum += array[i];
            }

            float thread_average = static_cast<float>(thread_sum) / (end - start);
            averages[id] = thread_average; // Store the average for the thread

    #pragma omp atomic
            sum += thread_sum; // Update the global sum atomically
        }

        // Output the result array (averages) to console
        std::cout << "Averages: ";
        for (float avg : averages) {
            std::cout << avg << " ";
        }
        std::cout << std::endl;

        // Output the sum of the whole array
        std::cout << "Sum of array: " << sum << std::endl;

        return 0;
    }

  4. ؤتواؤلنالو

