Hi! I have a small question about the usage of this flag. When it is used with clCreateBuffer, does that mean that once the kernel finishes executing, the data is automatically written back to the host pointer?

So if I create a buffer with clCreateBuffer and CL_MEM_USE_HOST_PTR, run a kernel on it, and have the host wait for that kernel to finish, will the data be in the host vector without my invoking a read on the buffer?


The host buffer is not necessarily up-to-date when your kernel ends because its content can be cached in device memory.

You have to use clEnqueueMapBuffer / clEnqueueUnmapMemObject to ensure that the host buffer is updated with the latest values.

Thanks for the information.

But what is the actual meaning of CL_MEM_USE_HOST_PTR? If I use it with clCreateBuffer, does that create a cl_mem object that has the same address as the buffer I allocated on the host, so that the data is placed in the same location as the host-allocated memory? Or does it create a new, separate memory location where the buffer will reside?

The reason I am asking is that on Intel Haswell the GPU and the CPU are on the same die and share the L3 cache. So if the host buffer and the device buffer share the same address, my idea would be to create a mechanism where the two cooperate so that each gets the data it needs faster. I have to test it out, but it would be great if someone could give me some pointers, or just tell me: hey stupid, stop wasting your time, because it does not work.


Yes, this is possible. Please see:

But then do I have to use clEnqueueMapBuffer to remap the buffer, or can I write directly into the array and expect the GPU to see it?


Yes, in OpenCL 1.x you must call clEnqueueMapBuffer before accessing the buffer on the host side, and clEnqueueUnmapMemObject before using it on the device side.

Thanks for all the information. This has helped me a lot.