
I have recently started GPU development using OpenCL, and I have been playing around with code that pushes the limits.

During this I have run into situations where the computation time on the GPU is long enough that the GUI becomes unresponsive and/or the GPU task takes so long that the device driver is reset.

While I understand why this happens, and I am not looking for an explanation of why, what I am hoping to understand is how far I can push computation on a GPU that is also being used by the system for GUI operations.

Are there any guidelines or best practices for this type of interaction?

Are there any programming methods that would allow long-running GPU computation while still keeping the GUI responsive?

I know the basic recommendation would be to split the GPU task into relatively small pieces. Assume that this is not possible, since I am exploring the limits of GPU programming.

Links to any relevant online discussions would also be very useful.

Jim K

asked Aug 18, 2013 at 23:47
  • You are probably interested in asynchronous programming; that is the usual "solution" for keeping your application responsive while performing tasks in the background. It depends on what language you are using: in C++ there is Boost, and the latest C++11 standard also offers support for async tasks/methods. Commented Aug 19, 2013 at 0:16
  • Sorry if I did not make it clear. It is the windowing system itself that becomes unresponsive, not my GUI; in fact, I am using the command line. Commented Aug 19, 2013 at 1:03

2 Answers


To answer your question: no, there is nothing you can do to run a long kernel and keep a functioning GUI on a single GPU. If you want long-running kernels and a functioning GUI, you must use a dedicated GPU for computing. If you want a responsive GUI while computing on the same GPU, you must use short-running kernels. You could complain every week on the AMD or Nvidia forums, begging for this feature.

The only platform-independent way to divide the work that comes to mind is to limit the amount of work sent to the GPU so that each batch finishes in roughly 1/60th of a second (for 60 Hz displays), and to put the CPU thread to sleep for a short while between batches so other applications can send tasks to the GPU. You may have to adjust that time budget to find a value that does not affect the user.
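
A minimal sketch of this pattern in C, assuming an already-created `queue` and `kernel`; the chunk size and the 2 ms pause are placeholder values to tune, the non-NULL global work offset requires OpenCL 1.1 or later, and nanosleep is POSIX:

    #include <CL/cl.h>
    #include <time.h>

    /* Enqueue the work in small chunks, blocking after each one and
     * pausing briefly so the display driver can use the GPU. */
    void run_in_chunks(cl_command_queue queue, cl_kernel kernel,
                       size_t total_items, size_t chunk_items)
    {
        for (size_t offset = 0; offset < total_items; offset += chunk_items) {
            size_t count = total_items - offset;
            if (count > chunk_items)
                count = chunk_items;

            clEnqueueNDRangeKernel(queue, kernel, 1,
                                   &offset, /* global work offset (OpenCL 1.1+) */
                                   &count,  /* global work size */
                                   NULL,    /* let the runtime pick a local size */
                                   0, NULL, NULL);
            clFinish(queue); /* block until this chunk is done */

            /* yield the GPU briefly so the GUI stays responsive */
            struct timespec pause = { 0, 2 * 1000 * 1000 }; /* 2 ms */
            nanosleep(&pause, NULL);
        }
    }

Shrink chunk_items until each enqueue finishes well under one refresh interval (about 16 ms at 60 Hz); in practice you would time a chunk and adapt the size.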

answered Aug 20, 2013 at 7:12

1 Comment

This is what I expected; I just wanted to have it confirmed by someone with more OpenCL knowledge.

One solution is to use two display devices: one for the OS and another for computation. But there are real benefits to breaking up a long run; a sketch of such a control loop follows.

For example, suppose a GPU task will take 10 days. How do you know the task is actually running properly during that 10-day period? Breaking it into segments of a few seconds lets the controlling program report progress, and also lets it save state periodically so the task can resume after a power failure.

If you want to use multiple GPUs to further accelerate the computation, then breaking the task into smaller segments is essential. A small segment of work can be given to each GPU as it finishes the previous one, so all GPUs remain fully loaded until the task is complete. If instead the task is divided into one large portion per GPU, it will be difficult or impossible to size the pieces so that the GPUs all finish at the same time.
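
A minimal sketch of such a control loop in C; `run_segment` and `save_checkpoint` are hypothetical placeholders for the real kernel dispatch and state-save logic, and the segment counts are arbitrary:

    #include <stdio.h>

    #define TOTAL_SEGMENTS    10000
    #define SEGMENTS_PER_SAVE   100

    /* hypothetical placeholders for the real dispatch and save logic */
    static void run_segment(int seg)     { (void)seg; /* enqueue one slice, then clFinish */ }
    static void save_checkpoint(int seg) { (void)seg; /* persist state to disk */ }

    int main(void)
    {
        for (int seg = 0; seg < TOTAL_SEGMENTS; ++seg) {
            run_segment(seg); /* one few-second slice of the total task */

            /* progress report: the controller always knows how far along it is */
            printf("\rcompleted %d / %d segments", seg + 1, TOTAL_SEGMENTS);
            fflush(stdout);

            /* periodic state save: resume from here after a power failure */
            if ((seg + 1) % SEGMENTS_PER_SAVE == 0)
                save_checkpoint(seg + 1);
        }
        printf("\n");
        return 0;
    }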

I believe most GPU workloads can be broken into segments of a few seconds each without any significant performance loss, so in this sense breaking up the task does not detract from the goal of 'pushing the limits' of GPU computation. If the controlling program dispatches work continuously to the GPU that drives the OS display, it may still impact the responsiveness of the display. A solution that does not reduce performance is to access the machine remotely, using Remote Desktop, VNC, or similar.

answered Aug 19, 2013 at 1:53

2 Comments

I know all of this is true from general principles of programming. But since I am probing the limits, I am still in this situation (the reason being an expected future use of these techniques). What I currently have (again, I am probing the limits) is 1M work items with a work-group size of 4. The total time is about 5 seconds (tunable). While this is executing, the system GUI does not work. Again, this is contrived because I am probing the limits.
I agree with ScottD. If your problem takes 5 seconds, you need to divide it up. GPU tasks should run in less than a second, or your system will become quite unresponsive. Running 100 kernels of 50 ms each will be no less efficient than one kernel of 5000 ms.
