PyOpenCL, OpenCL: can't build program on GPU

Question 1

I have a piece of kernel source that runs on the GTX 970 in my PC but won't compile on my early 2015 MacBook Pro with Intel Iris 6100 (1536 MB) graphics.

import pyopencl as cl

platform = cl.get_platforms()[0]
device = platform.get_devices()[1]  # The GPU is the second device on this platform
ctx = cl.Context([device])  # Tell CL to use the GPU
queue = cl.CommandQueue(ctx)  # Create a command queue for the target device.
# program = cl.Program(ctx, kernelsource).build()
print(platform.get_devices())

The get_devices() output shows two devices: <pyopencl.Device 'Intel(R) Core(TM) i5-5287U CPU @ 2.90GHz' on 'Apple' at 0xffffffff> and <pyopencl.Device 'Intel(R) Iris(TM) Graphics 6100' on 'Apple' at 0x1024500>.
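Indexing get_devices()[1] relies on the GPU being listed second. A slightly more robust sketch, assuming the first platform is Apple's and using pyopencl's device_type filter on Platform.get_devices, asks the runtime for GPU devices explicitly (pick_gpu is a hypothetical helper name):

```python
def pick_gpu(platform_index=0):
    """Return the first GPU device on the given platform (sketch)."""
    # Imported here so the helper can be defined even where pyopencl is absent.
    import pyopencl as cl

    platform = cl.get_platforms()[platform_index]
    # Ask for GPU devices only, instead of relying on the order of get_devices().
    gpus = platform.get_devices(device_type=cl.device_type.GPU)
    if not gpus:
        raise RuntimeError("no GPU device found on platform %r" % platform.name)
    return gpus[0]
```

Usage would then be ctx = cl.Context([pick_gpu()]).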

The kernel runs correctly on the CPU, but building the program for the GPU fails with:

---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-44-e2b6e1b931de> in <module>()
 3 ctx = cl.Context([device]) # Tell CL to use GPU
 4 queue = cl.CommandQueue(ctx) # Create a command queue for the target device.
----> 5 program = cl.Program(ctx, kernelsource).build()
 6 
 7 
/usr/local/lib/python2.7/site-packages/pyopencl-2015.2.4-py2.7-macosx-10.11-x86_64.egg/pyopencl/__init__.pyc in build(self, options, devices, cache_dir)
 393 self._context, self._source, options, devices,
 394 cache_dir=cache_dir),
--> 395 options=options, source=self._source)
 396 
 397 del self._context
/usr/local/lib/python2.7/site-packages/pyopencl-2015.2.4-py2.7-macosx-10.11-x86_64.egg/pyopencl/__init__.pyc in _build_and_catch_errors(self, build_func, options, source)
 428 # Python 3.2 outputs the whole list of currently active exceptions
 429 # This serves to remove one (redundant) level from that nesting.
--> 430 raise err
 431 
 432 # }}}
RuntimeError: clbuildprogram failed: BUILD_PROGRAM_FAILURE - 
Build on <pyopencl.Device 'Intel(R) Iris(TM) Graphics 6100' on 'Apple' at 0x1024500>:
Cannot select: 0x7f94b30a5110: i64,ch = dynamic_stackalloc 0x7f94b152a290, 0x7f94b30a4f10, 0x7f94b3092c10 [ORD=7] [ID=54]
 0x7f94b30a4f10: i64 = and 0x7f94b30a4c10, 0x7f94b3092b10 [ORD=7] [ID=52]
 0x7f94b30a4c10: i64 = add 0x7f94b30a6610, 0x7f94b3092a10 [ORD=7] [ID=49]
 0x7f94b30a6610: i64 = shl 0x7f94b3092d10, 0x7f94b3092e10 [ID=46]
 0x7f94b3092d10: i64 = bitcast 0x7f94b30a4810 [ID=41]
 0x7f94b30a4810: v2i32 = IGILISD::MOVSWZ 0x7f94b3092710, 0x7f94b30a2810, 0x7f94b30a2810, 0x7f94b30a2810 [ID=32]
 0x7f94b3092710: i32,ch = CopyFromReg 0x7f94b152a290, 0x7f94b3092610 [ORD=5] [ID=22]
 0x7f94b3092610: i32 = Register %vreg60 [ORD=5] [ID=1]
 0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
 0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
 0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
 0x7f94b3092e10: i64 = bitcast 0x7f94b30a3f10 [ID=38]
 0x7f94b30a3f10: v2i32 = IGILISD::MOVSWZ 0x7f94b30a4510, 0x7f94b30a2810, 0x7f94b30a2810, 0x7f94b30a2810 [ID=29]
 0x7f94b30a4510: i32 = Constant<2> [ID=19]
 0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
 0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
 0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
 0x7f94b3092a10: i64 = bitcast 0x7f94b30a4b10 [ID=40]
 0x7f94b30a4b10: v2i32 = IGILISD::MOVSWZ 0x7f94b30a4e10, 0x7f94b30a2810, 0x7f94b30a2810, 0x7f94b30a2810 [ID=31]
 0x7f94b30a4e10: i32 = Constant<7> [ID=21]
 0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
 0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
 0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
 0x7f94b3092b10: i64 = bitcast 0x7f94b3092910 [ID=39]
 0x7f94b3092910: v2i32 = IGILISD::MOVSWZ 0x7f94b30a5010, 0x7f94b30a4210, 0x7f94b30a2810, 0x7f94b30a2810 [ID=30]
 0x7f94b30a5010: i32 = Constant<-8> [ID=20]
 0x7f94b30a4210: i32 = Constant<-1> [ORD=3] [ID=10]
 0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
 0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
 0x7f94b3092c10: i64 = bitcast 0x7f94b3092810 [ID=35]
 0x7f94b3092810: v2i32 = IGILISD::MOVSWZ 0x7f94b30a2810, 0x7f94b30a2810, 0x7f94b30a2810, 0x7f94b30a2810 [ID=27]
 0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
 0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
 0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
 0x7f94b30a2810: i32 = Constant<0> [ORD=1] [ID=7]
In function: trajectories
(options: -I /usr/local/lib/python2.7/site-packages/pyopencl-2015.2.4-py2.7-macosx-10.11-x86_64.egg/pyopencl/cl)
(source saved as /var/folders/p2/jd7m10gs5k1_q6hx5kvktkcc0000gn/T/tmpWQmCKr.cl)

Any suggestions as to why this won't run? I am running an early 2015 MacBook Pro with macOS Sierra 10.12.5, and print cl.version.VERSION shows pyopencl 2015.2.4.

Here is the kernel code:

kernelsource = """
__kernel void trajectories(
 // TODO: adjust argtypes above if this is changed
 const int N,
 const int dim,
 __constant float* data,
 const int nrParticles,
 __global float* pos,
 __global float* vel,
 const int nrSteps,
 __global float* trj, 
 __global float* sigarr, 
 const float sigma, 
 const float mass, 
 const float alpha, // alpha is resistance in reverse. 
 const float dt
){
 int i,k,step;
 float h, sigsum, hexp; 
 int pidx = get_global_id(0); // global ID used as particle index
 int ofs = pidx * nrSteps * dim;
 int accofs = ofs + (nrSteps-1) * dim; // use last trj point to tmp store acc vector
 float v[dim];
 float sigma2 = sigma*sigma;
 float m = mass / sigma2;
 float dt_over_m = dt /m;
 for(step=0; step<nrSteps; step++){
 for(k=0; k<dim; k++)
 {
 trj[accofs+k]=0;
 } 
 for(i=0; i<N; i++)
 {
 h=0; // to store ||data[i]-x||**2
 for(k=0; k<dim; k++)
 { 
 v[k] = pos[pidx*dim+k] - data[i*dim + k];
 h += v[k]*v[k]; //h == force1p_sum
 };
 hexp = exp(-h/sigma2)/sigma2;
 for(k=0; k<dim; k++)
 { 
 trj[accofs+k] += -(hexp) * v[k]; 
 }; 
 };
 sigsum = 0;
 for(k=0; k<dim; k++)
 { 
 vel[pidx*dim+k] = alpha * vel[pidx*dim+k] + dt_over_m * trj[accofs+k]; // vel = alpha*vel + acc*dt 
 pos[pidx*dim+k] += dt * vel[pidx*dim+k]; // pos = pos + vel*dt
 sigsum += vel[pidx*dim+k] * vel[pidx*dim+k]; // v^2 for kinetic energy
 trj[ofs+step*dim+k] = pos[pidx*dim+k]; // write to result vector
 };
 sigarr[pidx*nrSteps+step] = sigsum; // sig = | vel | 
 }
 for(step=0; step<nrSteps-2; step++)
 {
 sigarr[pidx*nrSteps+step] = sigarr[pidx*nrSteps+step+2] - sigarr[pidx*nrSteps+step+1];
 };
 sigarr[pidx*nrSteps+nrSteps-1] = sigarr[pidx*nrSteps+nrSteps-2] = 0; 
}
"""

Thanks

Jiajun

Question 2

Can you share the kernel code? It returns BUILD_PROGRAM_FAILURE, so there has to be something wrong with the kernel code.

Question 3

clBuildProgram should also give you diagnostic output telling you which lines contain the problem. If you can't make sense of it, post it together with the source as @parallelhighway suggests, and we can try to help.
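In pyopencl, that diagnostic output surfaces in two places. On BUILD_PROGRAM_FAILURE, pyopencl puts each device's build log into the exception text (the "Build on <pyopencl.Device ...>:" part of the traceback above); after a successful build, warnings can be queried with Program.get_build_info and cl.program_build_info.LOG. A minimal sketch, with build_with_log as a hypothetical helper name:

```python
def build_with_log(ctx, source, options=None):
    """Build `source` for every device in `ctx`, printing compiler output (sketch)."""
    # Imported here so the helper can be defined even where pyopencl is absent.
    import pyopencl as cl

    program = cl.Program(ctx, source)
    try:
        program.build(options=options or [])
    except cl.RuntimeError as err:
        # The exception message already contains the per-device build log.
        print(err)
        raise
    # A successful build can still emit warnings; print any non-empty log.
    # (A build served from pyopencl's cache may report an empty log.)
    for dev in ctx.devices:
        log = program.get_build_info(dev, cl.program_build_info.LOG)
        if log and log.strip():
            print("Build log for %s:\n%s" % (dev.name, log))
    return program
```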

Question 4

Hi, I added the kernel code in. Thanks

Question 5

You should query the build log in such cases. For kernel-code errors like this one, you can also use an offline compiler; OpenCL vendors generally provide one.

You can find Intel's OpenCL offline compiler here: https://software.intel.com/en-us/articles/programming-with-the-intel-sdk-for-opencl-applications-development-tools

AMD has a tool called CodeXL that can also compile kernels offline, so you can check whether your kernel code compiles.

Here is the ARM OpenCL offline compiler: https://developer.arm.com/products/software-development-tools/graphics-development-tools/mali-offline-compiler/downloads

Intel's compiler supports up to OpenCL 2.1, while ARM's supports up to 1.1, so you can use either of them to compile your kernel code and find bugs or errors easily.

The problem in your kernel is the following line:

float v[dim];

The OpenCL C specification does not allow variable length arrays, and the offline compiler reports the following error:

ERROR: <source>:22:12: error: variable length arrays are not supported in OpenCL

Fixing that line removes the error. From now on, you can check whether your kernel compiles by running it through the offline compiler.

EDIT: A footnote in the specification states that variable length arrays are not supported. You can see it here:

https://www.khronos.org/registry/OpenCL/specs/opencl-2.0-openclc.pdf#page=31
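One way to fix the float v[dim]; line while still choosing dim from Python is to make the array length a preprocessor macro and define it through a build option, so the compiler sees a constant instead of a variable length array. The macro name DIM and the helper dim_build_options are illustrative assumptions. The kernel's runtime dim argument must then match the value used at build time, and each distinct value needs its own build.

```python
import operator

# Only the changed part of the kernel is shown; DIM is a hypothetical macro name.
KERNEL_SNIPPET = """
#ifndef DIM
#error "DIM must be defined, e.g. with -DDIM=3"
#endif
    float v[DIM];   /* length fixed at build time, so not a VLA */
"""


def dim_build_options(dim):
    """Return OpenCL C compiler options that define DIM as a compile-time constant."""
    dim = operator.index(dim)  # accept ints (incl. numpy ints), reject floats
    if dim <= 0:
        raise ValueError("dim must be a positive integer, got %r" % (dim,))
    return ["-DDIM=%d" % dim]
```

The build call would then become program = cl.Program(ctx, kernelsource).build(options=dim_build_options(dim)).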

Question 6

Hi, you are right: when I replace it with a fixed length, it works. What I don't quite understand is that I have been using variable length arrays before on the CPU and the Nvidia 970 GPU, and they all work, just not on the Intel Iris GPU. Any idea why this happens? Also, dim is the dimension of my data, so it needs to be a variable unless I change it manually every time. Is there any workaround? Many thanks

Question 7

You can create the v array on the host (CPU) and pass it to the kernel as an argument. Declaring a variable length array inside the kernel is not allowed.
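A minimal sketch of that suggestion, assuming a new, hypothetical kernel argument __global float* vtmp that replaces float v[dim]: the host allocates nrParticles * dim floats, and each work item only touches its own slice vtmp[pidx*dim + k]. The helper names below are illustrative, not part of pyopencl.

```python
FLOAT32_NBYTES = 4  # sizeof(float) in OpenCL C


def scratch_nbytes(nr_particles, dim):
    """Bytes needed for one dim-length float vector per particle."""
    return nr_particles * dim * FLOAT32_NBYTES


def make_scratch_buffer(ctx, nr_particles, dim):
    """Allocate the per-particle scratch buffer on the device (sketch)."""
    # Imported here so the helpers can be defined even where pyopencl is absent.
    import pyopencl as cl

    # No host data is needed: the kernel writes vtmp before reading it.
    return cl.Buffer(ctx, cl.mem_flags.READ_WRITE,
                     size=scratch_nbytes(nr_particles, dim))

# Kernel side (hypothetical argument name vtmp), replacing every v[k]:
#   __global float* vtmp,
#   ...
#   vtmp[pidx*dim + k] = pos[pidx*dim + k] - data[i*dim + k];
```

If the host code sets scalar argument types with set_scalar_arg_dtypes (as the TODO comment in the kernel suggests), the new buffer argument needs a None entry in that list.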

parallel highway · Accepted Answer · 2017-07-17 09:11:20Z
