Looking for a better explanation of CL_DEVICE_MEM_BASE_ADDR_ALIGN

Hi, I was looking through the OpenCL 1.2 specification (this also applies to the OpenCL 2.0 draft spec) and came across a device property, CL_DEVICE_MEM_BASE_ADDR_ALIGN, which does not seem to be adequately explained.

4.2 Querying devices


The minimum value is the size (in bits) of the largest OpenCL built-in data type supported by the device (long16 in FULL profile, long16 or int16 in EMBEDDED profile) for devices that are not of type CL_DEVICE_TYPE_CUSTOM.

This does not explain what CL_DEVICE_MEM_BASE_ADDR_ALIGN is. All this is telling me is that for FULL profile…

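cl_uint align = 0;
// device is assumed to be a valid cl_device_id obtained elsewhere
clGetDeviceInfo(device, CL_DEVICE_MEM_BASE_ADDR_ALIGN, sizeof(align), &align, NULL);

assert(align >= sizeof(cl_long16) * 8); // FULL profile (cl_long16 is the host-side type for long16)
assert(align >= sizeof(cl_long16) * 8 || align >= sizeof(cl_int16) * 8); // EMBEDDED profile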

The only other mention of this I can find is in the description of clCreateSubBuffer() (5.2.1 Creating Buffer objects).

returned in errcode_ret if there are no devices in
context associated with buffer for which the origin
value is aligned to the

I don’t quite understand. Is the documentation saying that when you create a sub-buffer, its origin only has to be aligned to CL_DEVICE_MEM_BASE_ADDR_ALIGN for one device in the context? Surely you want the origin to be aligned for all devices in the context. For example, if you are allowed to create a sub-buffer that is not aligned to device A’s CL_DEVICE_MEM_BASE_ADDR_ALIGN (because it is aligned for some other device in the context), what happens when you launch a kernel on device A that takes a pointer to that sub-buffer?

Even if I’ve interpreted the specification correctly, I feel that the explanation of CL_DEVICE_MEM_BASE_ADDR_ALIGN could be improved in the specification.


In that case, clEnqueueNDRangeKernel will return CL_MISALIGNED_SUB_BUFFER_OFFSET, as specified in the clEnqueueNDRangeKernel docs.
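To make the two stages concrete, here is a rough sketch of the checks as I read the spec. It is plain C and does not call OpenCL; the function names are made up for illustration, and the alignment values stand in for what clGetDeviceInfo would report for CL_DEVICE_MEM_BASE_ADDR_ALIGN on each device (in bits, assumed here to be multiples of 8).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: nonzero if origin (in bytes) is a multiple of a
 * device's CL_DEVICE_MEM_BASE_ADDR_ALIGN value (reported in bits). */
int origin_is_aligned(size_t origin, unsigned int align_bits)
{
    size_t align_bytes = align_bits / 8; /* bits -> bytes */
    return align_bytes != 0 && origin % align_bytes == 0;
}

/* Hypothetical helper: how many devices in the context accept this origin.
 * My reading: sub-buffer creation is rejected only when this count is 0,
 * while an enqueue on one particular device is rejected when
 * origin_is_aligned() is false for that device. */
size_t count_aligned_devices(size_t origin,
                             const unsigned int *align_bits,
                             size_t num_devices)
{
    size_t count = 0;
    for (size_t i = 0; i < num_devices; ++i) {
        if (origin_is_aligned(origin, align_bits[i])) {
            ++count;
        }
    }
    return count;
}
```

With two devices reporting 1024 and 4096 bits (128 and 512 bytes), an origin of 128 is accepted by the first device only, so creating the sub-buffer would succeed but enqueuing a kernel that uses it on the second device would fail.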

My understanding is that a cl_mem object belongs to the context, which can have multiple devices. However, every enqueue command goes to a cl_command_queue object, which is tied to a single device (see section 5.1). So you only need to create the sub-buffer with an origin aligned for the device that will use it; you don’t need to find an alignment that suits every device in the context.
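If it helps, picking an origin for a given device could look something like the sketch below. Again this is plain C with hypothetical helper names (not part of the OpenCL API), and it assumes the value queried for CL_DEVICE_MEM_BASE_ADDR_ALIGN is a nonzero multiple of 8 bits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: convert a CL_DEVICE_MEM_BASE_ADDR_ALIGN value,
 * which is reported in bits, to bytes. */
size_t align_bits_to_bytes(unsigned int align_bits)
{
    /* Assumption: the device reports a nonzero multiple of 8 bits. */
    assert(align_bits != 0 && align_bits % 8 == 0);
    return (size_t)align_bits / 8;
}

/* Hypothetical helper: round a desired sub-buffer origin (in bytes) up to
 * the next multiple of the target device's alignment. */
size_t align_origin_up(size_t origin, unsigned int align_bits)
{
    size_t align_bytes = align_bits_to_bytes(align_bits);
    return (origin + align_bytes - 1) / align_bytes * align_bytes;
}
```

The rounded value would then go into the origin field of the cl_buffer_region passed to clCreateSubBuffer. For a device reporting 1024 bits, a requested origin of 100 bytes becomes 128.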


Thanks for the reply. That makes sense.

I’ll try to leave some feedback in the OpenCL 2.0 thread, though, as I think the documentation should be clarified.