Persistent storage on an OpenCL GPU

Sorry for the grammatical errors. Re-posting after correction.

Hi. Is there any way for the GPU to keep data in persistent storage (across kernel invocations), so that we don’t have to send the same data again every time we invoke the kernel?

For example, suppose the kernel takes two arguments: constant_data (say 10,000 bytes that never change) and variable_data (say 1,000 to 10,000 bytes that differ on every invocation). If the host program invokes the kernel 10,000 times, each time with the same constant_data but different variable_data, I have to pay the overhead of sending the same constant_data every time, even though it never changes. Can the constant data be kept on the GPU across invocations, so that each subsequent invocation only sends the variable data?

A case where this issue comes up is pattern matching. Suppose a kernel implements a pattern-matching algorithm, the host receives data as a stream, and the host invokes the kernel repeatedly, each time passing as arguments the next chunk of the stream and the set of patterns (which is always the same). Then the cost of sending the patterns is paid again on every invocation. In such a case it would be useful to store the patterns on the GPU, so that each kernel invocation from the host only supplies the next chunk of data to match against.


A question related to this previous post of mine.

When we set a kernel argument to a memory object using clSetKernelArg() and then launch the kernel multiple times using clEnqueueNDRangeKernel(), will the memory object previously set as the argument be transferred from the host to the GPU on every invocation of the kernel?

Persistent data storage is provided through buffers, i.e. cl_mem objects, which can be written through the host API (clEnqueueWriteBuffer) or by a kernel's output. A cl_mem object, once allocated, persists until it is released. If a buffer is created with the flag CL_MEM_READ_ONLY, the implication is that kernels will not write to it, so an implementation is free to place it in constant memory or to optimize out writes. (The specification states that the behaviour of writing to such a buffer from a kernel is undefined.)

In your pattern-matching example, it would make more sense to allocate a cl_mem buffer object, write the constant patterns into it once using clEnqueueWriteBuffer, and then simply pass the handle to the buffer object as one of the kernel's arguments. Since this argument is the same for all calls, you need to call clSetKernelArg for it only once; its value persists for the lifetime of the kernel object.

In practice, keeping the number of arguments small reduces the amount of data that has to be transferred by value at each kernel invocation. And, as noted above, since kernel arguments are persistent, refrain from resetting them unless their values have really changed.

Yes, I think this is the solution. I will try it out and report back here. Thanks for taking the time on this.