How do I rerun a kernel with a new data set?

biokhar · June 19, 2012, 1:48am

I have a kernel that I successfully set up and ran. Now I want to take the results of that run, change the input, and rerun the kernel on the new data, repeating this 40,000 times.

Is it possible to replace the data in GPU memory without repeating the whole setup process (creating the context, command queue, program, kernel, memory buffers and kernel arguments)?

My kernel takes 5 arrays and calculates a score, which I then read back to the CPU. Based on the score, I change one of the arrays and want to rerun the kernel with the changes. How do I update that array on the GPU?

Jan23 · June 19, 2012, 8:20am

If you want to call a kernel several times in a loop, you can do most things just once, before the loop: creating the context, queue, program and kernel, plus creating the memory buffers that stay the same for each execution and setting them as kernel arguments.
However, kernel arguments that change have to be set on each pass through the loop, and of course the kernel has to be enqueued each time.
If a kernel execution relies on the previous one having finished (which is your case, if I understand you correctly), you should also call clEnqueueBarrier() in your loop (this applies only to OpenCL 1.1; OpenCL 1.2 deprecates it in favor of clEnqueueBarrierWithWaitList()).

These are general guidelines that have proved useful for me; as I don’t know your use case, I might have forgotten something.

notzed · June 19, 2012, 5:41pm

You just write the new value and call the kernel again. All the memory is ‘static’ and all the other values will stay around unless you change them.

If you’re using CL_MEM_USE_HOST_PTR or CL_MEM_COPY_HOST_PTR, then call clEnqueueWriteBuffer() after making your changes on the host, or use clEnqueueMapBuffer() / make the changes / clEnqueueUnmapMemObject(). Either way, the data is updated and valid on the device.

You don’t need barriers or anything like that, despite what Jan23 suggests: by default, queues are in-order, so everything enqueued on a queue (reads, writes, kernel executions) runs in the order you enqueue it. Barriers and events are what you need with an out-of-order queue, or when you have multiple queues to synchronise.

biokhar · June 20, 2012, 5:55pm

Thank you so much for the help. I tried to make this work, but I’m not setting it up right.

Here is how I set up the relevant parts of my host code:

```c
cl_mem memObjects[9] = { 0, 0, 0, 0, 0, 0, 0, 0, 0 };
memObjects[3] = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, NUM_ATOMS*sizeof(float), atom_X, NULL);
errNum |= clSetKernelArg(CheckKernel, 3, sizeof(cl_mem), &memObjects[3]);

for (i = 0; i < 40000; i++) {
    errNum = clEnqueueNDRangeKernel(commandQueue, CheckKernel, 1, NULL, globalWorkSize, localWorkSize, 0, NULL, NULL);
    clFinish(commandQueue);
    errNum = clEnqueueReadBuffer(commandQueue, memObjects[7], CL_TRUE, 0, NUM_RAYS*sizeof(float), distances, 0, NULL, NULL);

    //use results to move atom_X
    float* X;
    X = (float*)clEnqueueMapBuffer(commandQueue, memObjects[3], CL_TRUE, CL_MAP_WRITE, 0, NUM_ATOMS*sizeof(float), 0, NULL, NULL, &errNum);
    for (j = 0; j < NUM_ATOMS; j++) {
        atom_X[j] = atom_X[j] + 2; //2 is just to test
    }
    errNum = clEnqueueUnmapMemObject(commandQueue, memObjects[3], (void *)X, 0, NULL, &errNum);

    //rerun kernel with new position
}
```

Is this the correct direction to go? Also, I’m getting this error:

error: cannot convert ‘cl_int*’ to ‘_cl_event**’ for argument ‘6’ to ‘cl_int clEnqueueUnmapMemObject(_cl_command_queue*, _cl_mem*, void*, cl_uint, _cl_event* const*, _cl_event**)’

Thanks again

notzed · June 20, 2012, 8:27pm

You have to use the pointer returned by clEnqueueMapBuffer(), i.e. X, not atom_X, in your update code. CL_MEM_COPY_HOST_PTR just means the buffer’s initial contents are copied from atom_X; after the clCreateBuffer() call, atom_X is never used again. You probably also need CL_MAP_READ | CL_MAP_WRITE as the flags, since you’re doing a read-modify-write. I’ve not used map/unmap myself, though, so I’m going from the man pages.

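The error is expected; you’re just not calling it properly (see the man page, or the prototype you pasted). clEnqueueUnmapMemObject() returns its error code, and its sixth argument is a cl_event* for an optional event, so pass NULL there instead of &errNum.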