I am trying to do a reduction on a large array of data, using local memory. I found that the most local memory I can use in my kernel is 49152 bytes, regardless of whether the elements are long, int, or short. Any more and I get error -5 when launching the kernel.
My local work size is set to 128, and my global work size is a very large multiple of 128.
I get the same results whether I launch the kernel on the CPU or the GPU.
Any ideas what might cause this?
I assume you are using an NVIDIA Fermi card of some sort? The maximum shared (local) memory for this card is
CL_DEVICE_LOCAL_MEM_SIZE: 48 KByte
(which is 49152 bytes). You can query this value in your code by calling clGetDeviceInfo() with CL_DEVICE_LOCAL_MEM_SIZE. You say that you get the same result on the CPU, but I’m not sure which CPU you have or which platform you are using for it. My CPU (Intel Core 2 Quad using the AMD OpenCL platform) shows
CL_DEVICE_LOCAL_MEM_SIZE: 32 KByte
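To make the limit concrete, here is a rough sketch of the arithmetic, not anything taken from your code. The helper names max_local_elements and local_request_fits are made up for illustration, and the byte budget is passed in as a plain number; in a real host program you would fill it from clGetDeviceInfo() with CL_DEVICE_LOCAL_MEM_SIZE. The fixed-width host types stand in for the OpenCL C types (long is 64-bit, int is 32-bit, short is 16-bit in a kernel).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: largest number of elements of elem_size bytes
   that fit in a local memory budget of local_mem_bytes.
   Returns 0 if elem_size is 0. */
static size_t max_local_elements(size_t local_mem_bytes, size_t elem_size)
{
    if (elem_size == 0)
        return 0;
    return local_mem_bytes / elem_size;
}

/* Hypothetical helper: returns 1 if count elements of elem_size bytes
   fit in the budget, 0 otherwise. The comparison uses a division so
   that count * elem_size can never overflow. */
static int local_request_fits(size_t local_mem_bytes, size_t count,
                              size_t elem_size)
{
    return elem_size != 0 && count <= local_mem_bytes / elem_size;
}
```

With the 49152-byte figure above, max_local_elements(49152, sizeof(int64_t)) gives 6144 longs, sizeof(int32_t) gives 12288 ints, and sizeof(int16_t) gives 24576 shorts, which is why the byte limit is the same whatever the element type. With the 32768 bytes my CPU reports, the same call gives 8192 ints, so a size that works on the GPU may not fit on the CPU device. Checking local_request_fits() on the host before setting the local memory kernel argument lets you catch an oversized request before the launch fails.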