Experiments with memory object allocation flags

There have been a handful of threads that discuss the memory object allocation flags (CL_MEM_COPY_HOST_PTR, CL_MEM_ALLOC_HOST_PTR, and CL_MEM_USE_HOST_PTR), but I’ve run some experiments and they lead to different conclusions.

First, if a memory object is created with the CL_MEM_WRITE_ONLY flag, then the host pointer must be null. If the object is created with the CL_MEM_READ_ONLY or CL_MEM_READ_WRITE flags, then the host pointer must not be null.

If the host pointer is not null, then one of three allocation settings is possible: CL_MEM_USE_HOST_PTR, CL_MEM_COPY_HOST_PTR, and CL_MEM_COPY_HOST_PTR | CL_MEM_ALLOC_HOST_PTR. When I try to create a memory object with a non-null host pointer and either no allocation flags or CL_MEM_ALLOC_HOST_PTR alone, I get a run-time error. From what I’ve seen, these aren’t valid options.

If the host pointer is null, then you can create the memory object with no allocation flags.

This is how I think the flags work:
CL_MEM_USE_HOST_PTR - The memory object uses the pre-allocated memory identified by the host pointer.
CL_MEM_COPY_HOST_PTR - OpenCL allocates memory for the memory object separate from that identified by the host pointer.
CL_MEM_COPY_HOST_PTR | CL_MEM_ALLOC_HOST_PTR - OpenCL allocates host-accessible memory for the memory object separate from that identified by the host pointer.

If anyone has code that contradicts this, please let me know. I’m running Ubuntu and sending kernels to an ATI 5850 graphics card.


Please read the OpenCL specification, since I think some of your statements contradict it. Although a specific vendor might support some deviations, following the specification keeps your code portable to other conformant vendors.

For example, the CL_MEM_READ_WRITE, CL_MEM_WRITE_ONLY, and CL_MEM_READ_ONLY flags specify how the device accesses this memory object and have no relationship to the host pointer. It is the remaining flags that affect the host pointer, so I wonder what your experimental software specified for those remaining flags that might have suggested this conclusion.

Also, the only valid combinations of the remaining flags are: no remaining flags set, CL_MEM_USE_HOST_PTR, CL_MEM_ALLOC_HOST_PTR, CL_MEM_COPY_HOST_PTR, and CL_MEM_ALLOC_HOST_PTR | CL_MEM_COPY_HOST_PTR.

A host pointer is only needed for these combinations: CL_MEM_USE_HOST_PTR, CL_MEM_COPY_HOST_PTR, and CL_MEM_ALLOC_HOST_PTR | CL_MEM_COPY_HOST_PTR. The first is definitional; the latter two need it so that a copy can take place from host to device memory after the device memory is allocated.
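
To make these rules concrete, here is a minimal sketch of the flag and host-pointer check as I read it from the specification. It is not taken from any vendor's implementation. The flag constants are local copies of the bit values in the Khronos cl.h header so that the snippet compiles without an OpenCL SDK, and the names check_buffer_flags, flag_check, FLAGS_* and run_examples are hypothetical helpers invented for illustration. The two failure results loosely stand in for the CL_INVALID_VALUE and CL_INVALID_HOST_PTR errors that clCreateBuffer reports.

```c
#include <assert.h>
#include <stddef.h>

/* Local copies of the cl_mem_flags bit values from cl.h (assumption: used
   here only so the sketch builds without the OpenCL headers). */
#define CL_MEM_READ_WRITE     (1ul << 0)
#define CL_MEM_WRITE_ONLY     (1ul << 1)
#define CL_MEM_READ_ONLY      (1ul << 2)
#define CL_MEM_USE_HOST_PTR   (1ul << 3)
#define CL_MEM_ALLOC_HOST_PTR (1ul << 4)
#define CL_MEM_COPY_HOST_PTR  (1ul << 5)

/* Hypothetical result type, not part of OpenCL. */
typedef enum {
    FLAGS_OK = 0,
    FLAGS_INVALID_VALUE,    /* flag combination itself is not allowed */
    FLAGS_INVALID_HOST_PTR  /* host pointer does not match the flags  */
} flag_check;

/* Hypothetical helper: checks a flags/host_ptr pair against the rules
   described in the post above. */
flag_check check_buffer_flags(unsigned long flags, const void *host_ptr)
{
    unsigned long access =
        flags & (CL_MEM_READ_WRITE | CL_MEM_WRITE_ONLY | CL_MEM_READ_ONLY);

    /* At most one device-access flag may be set (none means read-write). */
    if (access & (access - 1))
        return FLAGS_INVALID_VALUE;

    /* USE_HOST_PTR cannot be combined with ALLOC_HOST_PTR or COPY_HOST_PTR. */
    if ((flags & CL_MEM_USE_HOST_PTR) &&
        (flags & (CL_MEM_ALLOC_HOST_PTR | CL_MEM_COPY_HOST_PTR)))
        return FLAGS_INVALID_VALUE;

    /* A host pointer is required exactly when USE or COPY is requested;
       the device-access flags play no part in this decision. */
    int needs_ptr = (flags & (CL_MEM_USE_HOST_PTR | CL_MEM_COPY_HOST_PTR)) != 0;
    if (needs_ptr != (host_ptr != NULL))
        return FLAGS_INVALID_HOST_PTR;

    return FLAGS_OK;
}

/* A few of the cases discussed in this thread. */
void run_examples(void)
{
    int data = 0;
    assert(check_buffer_flags(CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR, &data) == FLAGS_OK);
    assert(check_buffer_flags(CL_MEM_ALLOC_HOST_PTR, NULL) == FLAGS_OK);
    assert(check_buffer_flags(CL_MEM_READ_ONLY, &data) == FLAGS_INVALID_HOST_PTR);
    assert(check_buffer_flags(CL_MEM_USE_HOST_PTR | CL_MEM_COPY_HOST_PTR, &data) == FLAGS_INVALID_VALUE);
}
```

Under this reading, a non-null host pointer with no allocation flags, or with CL_MEM_ALLOC_HOST_PTR alone, is rejected because of the pointer, not because the flags themselves are invalid.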

Finally, the combinations “no remaining flags set” and CL_MEM_COPY_HOST_PTR mean the device memory is non-mappable: without CL_MEM_ALLOC_HOST_PTR it is not host accessible through clEnqueue[Map|Unmap], only through clEnqueue[Read|Write]Buffer. The combinations CL_MEM_ALLOC_HOST_PTR and CL_MEM_ALLOC_HOST_PTR | CL_MEM_COPY_HOST_PTR mean the device memory is mappable: with CL_MEM_ALLOC_HOST_PTR set it is accessible through clEnqueue[Map|Unmap] in addition to clEnqueue[Read|Write]Buffer.

I’ve read the spec and I’ve read your postings on this subject. What I’ve written about has to do with the run-time errors I get when I run real code.

If I create a write-only memory object with a non-null host pointer, I get a run-time error. If you can do so without an error, I’ll be certain that my concerns are implementation-specific.

bwatt is right. What you describe are bugs in the implementation you are using. I recommend filing bug reports on the vendor’s support system.

Perhaps this is an area where the conformance tests are too lax and should be strengthened.

I found the same OpenCL bugs on my Mac, which has an Nvidia GPU. It seems odd that Nvidia and AMD have precisely the same bugs in their implementations.

I’d be interested to know if any implementations don’t have these bugs.

Please, I’m not being critical of you; I’m actually very interested in this behavior and your insights. Can you simplify your software to a specific failure and post it here? I’d like to give it a try on some of the implementations I have access to.

I don’t quite understand what you mean. I have successfully created a buffer object with CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR and a non-NULL pointer.
Windows, ATI stream SDK 2.2, cat. 10.10 driver.

Restating what I typed above: the CL_MEM_WRITE_ONLY flag specifies how the device accesses the memory object, and has NO relationship to the host pointer. The CL_MEM_USE_HOST_PTR flag requires that a host pointer be specified. Therefore “CL_MEM_WRITE_ONLY | CL_MEM_USE_HOST_PTR and non-NULL host pointer” is a correct combination.