When I try to compile a kernel (it has a lot of functions), I get an error that depends on the device I use.
AMD: The kernel’s compilation returns "E013:Insufficient Private Resources! ".
NVIDIA: There isn’t any error during the compilation but when I execute the kernel it returns CL_OUT_OF_RESOURCES.
Is there any solution to this problem?
Thanks in advance.
Since this is a long kernel, it could be that the final version uses too many registers or too much local memory. Nvidia has a build option that caps the register count per work-item, as far as I remember (-cl-nv-maxrregcount in OpenCL, the counterpart of nvcc's -maxrregcount), but I don’t know if AMD has an equivalent. This option causes registers to be spilled into memory, slowing down your code, but I think it will fix your problem, at least on Nvidia cards.
Dividing your kernel into a few simpler pieces, while incurring extra overhead due to more kernel calls, may still be better than spilling registers to memory.
Thanks so much.
Do you know where I can get information about this build option?
AMD APP SDK documentation: http://developer.amd.com/tools/hc/AMDAPPSDK/documentation/Pages/default.aspx
Sorry, never used this option before so I haven’t looked for it in AMD documents.
For Nvidia, look at Chapter 5 of the CUDA C Programming Guide. Also look at the OpenCL documentation for clBuildProgram. I have not needed to try this yet, but I think you should be able to achieve the desired effect by adding -cl-nv-maxrregcount=N (N being the register limit; this is the OpenCL spelling of nvcc's -maxrregcount, from Nvidia's cl_nv_compiler_options extension) to the options string passed to clBuildProgram for your Nvidia GPU. As I said, you will have to find the AMD equivalent, if there is one.
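In case it helps, here is a rough sketch of how the options string could be chosen per vendor before calling clBuildProgram. I have not tested it on real hardware. The helper name build_options_for_vendor and the limit of 32 registers are made up for illustration, the check for "NVIDIA" in the vendor string is an assumption about what the driver reports, and the option spelling -cl-nv-maxrregcount=N is what I understand Nvidia's cl_nv_compiler_options extension to accept, so please verify it against Nvidia's documentation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not part of OpenCL): fills `out` with the build
 * options to pass to clBuildProgram, based on the device vendor string.
 * Returns 0 on success, -1 if `out` is too small to hold the options. */
int build_options_for_vendor(const char *vendor, int max_regs,
                             char *out, size_t out_size)
{
    assert(vendor != NULL && out != NULL && out_size > 0);
    out[0] = '\0';

    /* Assumption: Nvidia devices report a vendor string containing "NVIDIA". */
    if (strstr(vendor, "NVIDIA") != NULL && max_regs > 0) {
        int n = snprintf(out, out_size, "-cl-nv-maxrregcount=%d", max_regs);
        if (n < 0 || (size_t)n >= out_size) {
            out[0] = '\0';   /* do not hand a truncated option to the compiler */
            return -1;
        }
    }
    /* Any other vendor: options stay empty, since no equivalent is known here. */
    return 0;
}

/* Usage sketch (needs the OpenCL headers and a valid program/device):
 *
 *   char vendor[256], opts[64];
 *   clGetDeviceInfo(device, CL_DEVICE_VENDOR, sizeof vendor, vendor, NULL);
 *   if (build_options_for_vendor(vendor, 32, opts, sizeof opts) == 0)
 *       err = clBuildProgram(program, 1, &device, opts, NULL, NULL);
 */
```

Keeping the Nvidia-only flag out of the AMD build matters because a compiler that does not recognise an option may reject the whole build.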