There are two arrays, A and B, that correspond to each other; their space is allocated while the kernel is running. A[i] holds a position and B[i] holds its value. Every thread does the following:
- If the current thread's data is already in the arrays, update B;
- Otherwise, expand A and B and insert the current thread's data into the arrays.
- The initial size of A and B is zero.
Is the above scheme supported by CUDA?
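For reference, the per-thread logic described above can be sketched as a CUDA kernel. All names (`A`, `B`, `size`, `positions`, `values`) are illustrative, and the growth step is deliberately left empty, since that is exactly the open question:

```cuda
// Hypothetical sketch of the per-thread update-or-insert logic.
// A, B, and size are assumed to live in global memory.
__global__ void updateOrInsert(int *A, float *B, int *size,
                               const int *positions, const float *values, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    // search the currently stored positions for this thread's position
    for (int i = 0; i < *size; ++i) {
        if (A[i] == positions[tid]) {
            B[i] = values[tid];   // position already present: update its value
            return;
        }
    }
    // position not found: A and B would have to be enlarged and the
    // element inserted here -- the step standard CUDA does not provide
}
```

Note that even the search/update step as sketched is racy when many threads read and write the arrays concurrently; a real implementation would need atomics or other synchronization.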
Could you please clarify point #1? – Vitality Commented Sep 13, 2013 at 8:15
Point #1 means that A[i] and B[i] store the position and value of the i-th element; the current thread may update B[i] if the position of the current thread's element is already in array A. – taoyuanjl Commented Sep 13, 2013 at 8:46
1 Answer
Concerning point #2, you would need something like C's realloc(), which, as far as I know, is not supported in CUDA device code. You can write your own realloc() following this post:

CUDA: Using realloc inside kernel

but I do not know how efficient that solution would be.
Alternatively, you could pre-allocate a "large" amount of global memory, sized for the worst-case memory occupation scenario.
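A minimal sketch of that pre-allocation approach, assuming a worst-case capacity bound `MAX_ELEMS` and an atomically incremented element counter (all names are illustrative, not a definitive implementation):

```cuda
#define MAX_ELEMS 4096   // assumed worst-case bound, chosen for illustration

__global__ void updateOrInsert(int *A, float *B, int *count,
                               const int *positions, const float *values, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    int pos   = positions[tid];
    float val = values[tid];

    // linear search over the elements inserted so far
    int m = *count;
    for (int i = 0; i < m; ++i) {
        if (A[i] == pos) { B[i] = val; return; }
    }

    // not found: claim the next free slot with an atomic counter
    int slot = atomicAdd(count, 1);
    if (slot < MAX_ELEMS) {
        A[slot] = pos;
        B[slot] = val;
    }
}
```

Here A and B (capacity `MAX_ELEMS`) and count (zero-initialized) would be allocated with cudaMalloc before the launch. This sketch can still insert duplicates when two threads holding the same new position race past the search at the same time; the point is only that a fixed worst-case allocation plus atomicAdd sidesteps in-kernel reallocation entirely.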