I have five kernels that process a finite amount of data. Multiple cl_mem objects are created, some of which are used in only a single kernel and some of which are shared across kernels. I keep getting CL_MEM_OBJECT_ALLOCATION_FAILURE while enqueuing the 3rd kernel. However, when I reduce the data, I get the error while enqueuing the 4th kernel instead (the 3rd kernel enqueue works fine). None of the clCreateBuffer calls return an error. I suspected a memory issue. For the first (larger) data set, almost 42 MB of global memory (cl_mem objects) had been allocated before the 3rd kernel enqueue failed. For the second (smaller) data set, only 1.48 MB of global memory had been allocated before the 4th kernel enqueue failed. My device capability queries report CL_DEVICE_MAX_MEM_ALLOC_SIZE as 256 MB and CL_DEVICE_GLOBAL_MEM_SIZE as 1024 MB, so I am allocating much less than these limits. Fearing it could be a problem in the kernel code, I commented out the entire kernel body, leaving only the parameters, and I still get the same error. So I am completely lost in understanding this issue. The callback notification function set on the context (in clCreateContext) didn’t provide any additional details. Is there any way to find out which memory object allocation failed, and for what reason?
You are seeing the effect of so-called delayed (or lazy) allocation, a very common technique in OpenCL implementations.
Basically, when an OpenCL buffer is created, it is not physically allocated on the device right away. Instead, allocation is delayed until the buffer is first used. This is why clCreateBuffer succeeds and the error only shows up at a kernel enqueue, and why the failing kernel changes with the size of your data.
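To make the mechanism concrete, here is a small toy model in plain C. It is only a sketch and does not use the OpenCL API: `sim_device`, `sim_buffer`, `sim_create_buffer` and `sim_enqueue_kernel` are hypothetical names, and the capacity is an arbitrary number chosen for illustration. It assumes a runtime that records the size at creation time and only backs the buffer with device memory when a kernel first uses it.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these names belong to the OpenCL API. */
enum {
    SIM_OK = 0,
    SIM_ALLOC_FAILURE = -4  /* same value as CL_MEM_OBJECT_ALLOCATION_FAILURE */
};

typedef struct {
    size_t capacity;  /* bytes the simulated device can actually back */
    size_t resident;  /* bytes physically allocated so far */
} sim_device;

typedef struct {
    size_t size;      /* requested size in bytes */
    int    resident;  /* 0 until the first kernel that uses the buffer */
} sim_buffer;

/* Plays the role of clCreateBuffer in a lazy implementation: it only
   records the size, so it cannot fail for lack of device memory. */
sim_buffer sim_create_buffer(size_t size)
{
    sim_buffer b = { size, 0 };
    return b;
}

/* Plays the role of a kernel enqueue: every argument must be resident
   before launch, so this is where the allocation failure surfaces. */
int sim_enqueue_kernel(sim_device *dev, sim_buffer **args, size_t nargs)
{
    assert(dev != NULL && args != NULL);
    for (size_t i = 0; i < nargs; ++i) {
        sim_buffer *b = args[i];
        if (b->resident)
            continue;  /* already backed by an earlier kernel */
        if (b->size > dev->capacity - dev->resident)
            return SIM_ALLOC_FAILURE;
        dev->resident += b->size;
        b->resident = 1;
    }
    return SIM_OK;
}
```

In this model, creating buffers of 60, 30 and 40 units on a device with 100 units of capacity always succeeds; the first two kernels launch, and the third one fails because its buffer is the first that no longer fits. With smaller buffers the same failure simply moves to a later kernel.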
Lazy allocation is allowed by the standard and is not a problem per se. The problem is that many implementations do not support the other side of it, which is buffer eviction (i.e. moving a buffer off the device when it is not in use). To work around this, you will have to manage your buffers manually, releasing them when they are no longer needed.
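As a rough way to see what manual management buys you, the sketch below estimates peak device memory for a sequence of kernels, with and without releasing each buffer after the last kernel that uses it. It is plain C and makes no OpenCL calls; `buffer_plan` and `peak_resident_bytes` are hypothetical names, and it assumes that the implementation allocates a buffer at its first use and frees it as soon as it is released.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one buffer: its size and the 0-based
   indices of the first and last kernel that touch it. */
typedef struct {
    size_t size;
    int    first_kernel;
    int    last_kernel;
} buffer_plan;

/* Peak number of bytes resident over num_kernels launches.
   release_after_last_use == 0: a buffer stays resident once first used.
   release_after_last_use != 0: a buffer is freed right after its last
   kernel, which is what manual release aims for. */
size_t peak_resident_bytes(const buffer_plan *bufs, size_t nbufs,
                           int num_kernels, int release_after_last_use)
{
    assert(bufs != NULL || nbufs == 0);
    size_t peak = 0;
    for (int k = 0; k < num_kernels; ++k) {
        size_t live = 0;
        for (size_t i = 0; i < nbufs; ++i) {
            int started  = bufs[i].first_kernel <= k;
            int finished = release_after_last_use && bufs[i].last_kernel < k;
            if (started && !finished)
                live += bufs[i].size;
        }
        if (live > peak)
            peak = live;
    }
    return peak;
}
```

In real host code, "releasing" would correspond to calling clReleaseMemObject on a buffer once the last kernel that uses it has completed (for example after clFinish on the queue). Whether the device memory is actually returned at that point depends on the implementation, so treat the estimate as a planning aid rather than a guarantee.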