"clReleaseMemObject" too long


I am currently writing an algorithm for a 9800 GT with OpenCL. I used the OpenCL Visual Profiler to look at performance and found that “clReleaseMemObject” takes as much time as “clEnqueueReadBuffer” for the same memory object. I just want to deallocate GPU memory to free up space; I don’t need to read the buffers back.
Do you know why “clReleaseMemObject” takes so much time?
Is it an Nvidia issue, or is it the same on ATI GPUs?
Do you know a faster way to deallocate memory?

Thanks a lot.

When you say it takes as much time as “clEnqueueReadBuffer”, do you mean a blocking or a non-blocking read?

clReleaseMemObject should be instant. (All it does is reduce the reference count of the mem object by one.) However, this does not guarantee that the memory is freed. That depends on whether other things are still using the memory object (executing kernels, copies, etc.) and on when the resource manager for the device gets around to freeing it. If clReleaseMemObject is not instant, then there is a performance bug in the driver you’re using. Which vendor’s driver is this?
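
As a rough illustration of that reference-counting behaviour, here is a toy model in plain C. It is only a sketch: `toy_mem` and all the `toy_*` functions are made-up names, not part of OpenCL, and a real driver tracks far more state than two counters. The point is just that the release call itself is a constant-time decrement, while the actual free happens whenever the last reference and the last pending command are both gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for a cl_mem object; not the real driver structure. */
typedef struct {
    unsigned refcount;  /* references held by the host application */
    unsigned pending;   /* enqueued commands that still use the buffer */
    void    *storage;   /* stands in for the device allocation */
} toy_mem;

/* Free the storage once nothing references or uses the object any more. */
static void toy_try_free(toy_mem *m) {
    if (m->refcount == 0 && m->pending == 0 && m->storage != NULL) {
        free(m->storage);
        m->storage = NULL;
    }
}

/* Create an object holding one host reference. */
static toy_mem toy_create(size_t bytes) {
    toy_mem m = { 1, 0, malloc(bytes) };
    return m;
}

/* A queued command (kernel, copy) keeps the object alive until it completes. */
static void toy_enqueue_use(toy_mem *m) { m->pending++; }

/* Called when a queued command finishes; may trigger the deferred free. */
static void toy_command_done(toy_mem *m) {
    assert(m->pending > 0);
    m->pending--;
    toy_try_free(m);
}

/* Models the release call: decrement the count and return, never wait. */
static void toy_release(toy_mem *m) {
    assert(m->refcount > 0);
    m->refcount--;
    toy_try_free(m);
}

/* True once the backing storage has actually been freed. */
static bool toy_is_freed(const toy_mem *m) { return m->storage == NULL; }
```

In this model, calling `toy_release` while a command is still pending returns immediately and leaves the storage allocated; the storage only goes away in the later `toy_command_done` call.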

Sorry it took me so long to reply. :oops:

The issue disappeared with the new CUDA 3.0 driver from Nvidia, so it was obviously a driver problem.

Thanks for your help!

I’m experiencing the same issue.

I’m using the C++ bindings (cl.hpp and the cl::Buffer class defined in it), but that should make essentially no difference.

(i) If I call clReleaseMemObject right after a series of kernel executions, it takes 0.17 seconds.

(ii) If I call a blocking clEnqueueReadBuffer between the kernels and clReleaseMemObject, clEnqueueReadBuffer takes 0.17 seconds and clReleaseMemObject takes only negligible time (1e-5 sec).

So apparently clReleaseMemObject waits for kernels to finish.
Do you have any ideas?

CUDA version: 4.2
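
For anyone who wants to repeat the comparison between (i) and (ii), a minimal host-side timing sketch in plain C might look like the following. Everything named here (`step_fn`, `time_step`, `toy_ctx`, `step_wait`, `step_release`) is hypothetical; the two stand-in steps do no real work. In an actual program they would wrap a blocking call such as `clFinish(queue)` and the `clReleaseMemObject(buffer)` call, so that the wait for the kernels and the release are timed separately.

```c
#include <time.h>

/* Hypothetical callback type: wraps one host-side step to be timed. */
typedef void (*step_fn)(void *ctx);

/* Return the wall-clock seconds spent inside step(ctx). */
static double time_step(step_fn step, void *ctx) {
    struct timespec t0, t1;
    timespec_get(&t0, TIME_UTC);   /* C11 wall-clock timestamp */
    step(ctx);
    timespec_get(&t1, TIME_UTC);
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
}

/* Stand-in state; a real version would hold the queue and the buffer. */
typedef struct {
    int waited;
    int released;
} toy_ctx;

/* Stand-in for "block until all queued kernels have finished". */
static void step_wait(void *ctx) { ((toy_ctx *)ctx)->waited = 1; }

/* Stand-in for "release the memory object". */
static void step_release(void *ctx) { ((toy_ctx *)ctx)->released = 1; }
```

Timing `step_wait` and then `step_release` with `time_step` gives two separate numbers. If the release is only slow when the wait is skipped, the time is being spent waiting for the kernels rather than in the release itself.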