kernel execution returns CL_INVALID_VALUE


I implemented an OpenCL calculation, and all of the stages seem to be running fine (context creation, device memory allocation, kernel building, etc.). However, executing the kernel with clEnqueueNDRangeKernel returns CL_INVALID_VALUE. I’m a little lost as to what could be causing the error, since the Khronos OpenCL specification (v1, rev43) doesn’t list this return value in the section on Kernel Execution. Any comments/suggestions on things to look at?


I assume you’re talking about the current NVIDIA SDK.
Try using fewer threads (128 or 64).

If that doesn’t help, you can post the code.
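One way to try fewer threads is to pass an explicit local work size instead of letting the runtime pick one. A minimal sketch of the call, assuming `queue`, `kernel`, and `err` already exist in the host code (not runnable on its own without an OpenCL context and device):

```c
/* Sketch: request a smaller work-group size explicitly.
   Assumes `queue` (cl_command_queue), `kernel` (cl_kernel),
   and `err` (cl_int) are already set up. */
size_t global_work_size = 4096; /* total work-items; must be a multiple
                                   of local_work_size */
size_t local_work_size  = 64;   /* try 64 or 128 instead of the
                                   runtime's default */

err = clEnqueueNDRangeKernel(queue, kernel,
                             1,                 /* work dimensions */
                             NULL,              /* global offset (must be NULL in CL 1.0) */
                             &global_work_size,
                             &local_work_size,  /* was NULL: implementation-chosen */
                             0, NULL, NULL);    /* no wait events */
```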

Yes, that’s correct. I’m using the OpenCL 1.0 conformant release for Linux (on top of cudadriver_2.2_linux_32_185.18.08-beta).

I didn’t set the number of threads, so I must be using the default value (whatever that is; I don’t know it). I’ll see how to change it and let you know how it goes. As additional information, I’m using a GeForce 8600 GT, and the examples that come with the SDK seem to work fine.

Many thanks for the tip!

I had the same problem with OpenCL 1.0 on CUDA 3.0.1 (beta). Getting CL_INVALID_VALUE back from a clEnqueueNDRangeKernel call is indeed confusing. I suspect it’s a previous error surfacing late. In my case, I was passing a `cl_mem` as the fourth argument of clSetKernelArg instead of a pointer to it (`cl_mem *`).
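The mistake described above looks roughly like this (a sketch, assuming `kernel` and a valid buffer handle `buf` already exist; not runnable without an OpenCL context):

```c
/* Assumes `kernel` (cl_kernel) and `buf` (a valid cl_mem) exist. */
cl_int err;

/* Wrong: passes the handle itself where a pointer is expected.
   Since cl_mem is an opaque pointer type, this may even compile,
   and the error can surface later, e.g. as CL_INVALID_VALUE from
   clEnqueueNDRangeKernel on some implementations. */
/* err = clSetKernelArg(kernel, 0, sizeof(cl_mem), buf); */

/* Right: the fourth argument is the ADDRESS of the cl_mem handle. */
err = clSetKernelArg(kernel, 0, sizeof(cl_mem), &buf);
```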

Be careful with argument sizes. As far as I understand, clEnqueueNDRangeKernel can return CL_INVALID_VALUE if one of the arguments passed to the kernel has a different size than in the kernel function’s declaration. For example, the kernel may expect a 10-byte struct, but you pass only 6 bytes; in that case you will get this error code.

Yes, be careful with argument sizes. A size_t argument is a good example: in the kernel, size_t may be 32 bits, while in the host code it may be 64 bits. Some implementations will let this slide, others won’t.