I have a kernel that runs perfectly on my laptop CPU (i5-3427U). I am now trying to get it to run on the built-in GPU (HD 4000).
The build fails there. The full error is here, but I can't make much of it. The curious part (in my eyes) is that the following fails to build:
    unsigned int dist_histogram[n_bins];
    for (i = 0; i < n_bins; i++) {
        dist_histogram[i] = 0;
    }
    for (i = 0; i < n_bins; i++) {
        atomic_add(&g_r[i], dist_histogram[i]);
    }
but this builds and runs:
    unsigned int dist_histogram[n_bins];
    for (i = 0; i < n_bins; i++) {
        dist_histogram[i] = 0;
        atomic_add(&g_r[i], dist_histogram[i]);
    }
What's going on here?
The full kernel is pasted here. The OS is 64-bit OS X 10.9.2. I've tried modifying the bottom three loops in several other ways, without success. The issue appears to be related to accessing anything in distances or modifying values in dist_histogram, but that still doesn't explain to me why the first example above fails to build.
1 Answer
Given the nature of the build log and the fact that the same code builds just fine on other devices, this is almost certainly a bug in Apple's OpenCL implementation. Their OpenCL implementation for the Intel integrated graphics family is still fairly immature, and many people (myself included) have run into various issues with it.
I recommend producing a minimal code example that reproduces the problem and reporting it using the Apple Bug Reporter.
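A minimal reproducer along those lines might look like the sketch below. It is a reduction of the two snippets from the question, not the original kernel: the kernel names `repro_split_loops` and `repro_fused_loop`, the value of `N_BINS`, and the exact type of `g_r` are hypothetical, and `n_bins` is assumed to be a compile-time constant (OpenCL C does not support variable-length arrays). The `#ifndef __OPENCL_VERSION__` shims are not part of OpenCL; they are only there so the same file can also be compiled as plain C on the host as a quick sanity check.

```c
/* Host-only shims: let this file compile as plain C for a sanity check.
   An OpenCL compiler defines __OPENCL_VERSION__ and skips this section. */
#ifndef __OPENCL_VERSION__
#define __kernel
#define __global
/* Non-atomic stand-in for OpenCL's atomic_add; returns the old value. */
static unsigned int atomic_add(volatile unsigned int *p, unsigned int val)
{
    unsigned int old = *p;
    *p = old + val;
    return old;
}
#endif

#define N_BINS 16 /* hypothetical value; assumed compile-time constant */

/* Variant reported in the question as failing to build on the HD 4000:
   zeroing and accumulation happen in two separate loops. */
__kernel void repro_split_loops(volatile __global unsigned int *g_r)
{
    unsigned int dist_histogram[N_BINS];
    int i;

    for (i = 0; i < N_BINS; i++) {
        dist_histogram[i] = 0;
    }
    for (i = 0; i < N_BINS; i++) {
        atomic_add(&g_r[i], dist_histogram[i]);
    }
}

/* Variant reported in the question as building and running:
   zeroing and accumulation share a single loop. */
__kernel void repro_fused_loop(volatile __global unsigned int *g_r)
{
    unsigned int dist_histogram[N_BINS];
    int i;

    for (i = 0; i < N_BINS; i++) {
        dist_histogram[i] = 0;
        atomic_add(&g_r[i], dist_histogram[i]);
    }
}
```

Whether this reduced form still triggers the build failure is not known. If it does not, parts of the full kernel would need to be added back until the failure reappears, and that smallest failing version is the one worth attaching to the bug report.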