Array in global memory - strange results

phaethon · October 28, 2012, 8:04pm

I am attempting to implement an algorithm to solve ODEs in OpenCL. I’ve been struggling with the best way to do this, and I have an idea in mind that I think should work in theory, but when I try to start with just a simple case I’m getting unexpected results. I would very much appreciate any help.

Here’s what I have:


__kernel void calculate(__global float *n_array, __global float *n_avg, int step)
{
  int j = get_global_id(0);
  n_array[j] += n_array[j] * some constants;
  n_avg[step] += n_array[j];
}

I’m executing this with 1000 global work-items (so that “j” ranges from 0 to 999), and the kernel is launched 1000 times in a for loop (so that the kernel body executes a total of 1000*1000 times). I place the initial values in “n_array” with clEnqueueWriteBuffer, and this is working correctly. The issue is that n_avg ends up with very unexpected values. I suspect it has something to do with n_avg being shared memory, and perhaps the timing is such that values aren’t being written properly. When I pick out a single value of j (i.e. if(j==10) ), it works exactly as expected.

The issue is adding all the values to the n_avg[step] variable. Say I have 1000 values of n_array[j] = 1: I would expect n_avg[step] to equal 1000 (since I added 1 to it 1000 times), but this is not always what happens.

I know I may not have been clear here; I am still learning OpenCL, so I don’t know exactly how to frame my question. If you think you can help but need more information, please let me know and I will do my best to provide it. Thank you very much for taking the time to look at my question.

utnapishtim · October 29, 2012, 10:15am

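n_avg[step] is read and written concurrently by all work-items, and += is not an atomic operation, so updates from different work-items overwrite each other. This cannot work.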
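
To see why concurrent access breaks the sum, here is a minimal sketch in plain C (not OpenCL) that replays by hand one possible interleaving of two work-items. The function name lost_update_demo is made up for illustration, and real hardware may interleave the steps differently, which is why the result varies from run to run.

```c
/* Emulates two work-items executing `n_avg[step] += n_array[j]` at the
 * same time. Each `+=` is really three steps: read, add, write back. */
float lost_update_demo(void)
{
    float n_avg = 0.0f;                     /* shared accumulator, like n_avg[step] */
    const float n_array[2] = {1.0f, 1.0f};  /* one value per work-item */

    /* Both work-items read the accumulator before either has written. */
    float seen_by_item0 = n_avg;            /* reads 0 */
    float seen_by_item1 = n_avg;            /* also reads 0 */

    /* Each writes back its own result; the second write replaces the first. */
    n_avg = seen_by_item0 + n_array[0];     /* n_avg becomes 1 */
    n_avg = seen_by_item1 + n_array[1];     /* n_avg is still 1, not 2 */

    return n_avg;
}
```

With 1000 work-items the same thing happens on a larger scale, so the final value depends on how many of the updates happened to collide.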

You have to split up your kernel into two kernels: in the first one, do the computation of the vector n_array, then in the second one, do a reduction of n_array.
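
As a rough illustration of the second step, here is a sketch of a tree-style sum written in plain C so it can be run without an OpenCL device. It is an emulation under stated assumptions, not a ready-made kernel: the inner loop over lid stands in for the work-items of one work-group, scratch stands in for a __local buffer, and the names reduce_group, reduce_sum and GROUP_SIZE are hypothetical. The first kernel would simply be the original one with the n_avg line removed.

```c
#include <stddef.h>

#define GROUP_SIZE 8  /* hypothetical work-group size; must be a power of two */

/* Emulates one work-group summing up to GROUP_SIZE values. */
static float reduce_group(const float *in, size_t count)
{
    float scratch[GROUP_SIZE];  /* stands in for a __local buffer */

    /* Each work-item loads one value; out-of-range items contribute 0. */
    for (size_t lid = 0; lid < GROUP_SIZE; ++lid)
        scratch[lid] = (lid < count) ? in[lid] : 0.0f;
    /* In a kernel, barrier(CLK_LOCAL_MEM_FENCE) is needed here. */

    /* Halve the number of active work-items on each pass. Item lid only
     * writes scratch[lid] and only reads scratch[lid + stride], so no two
     * items touch the same element within a pass. */
    for (size_t stride = GROUP_SIZE / 2; stride > 0; stride /= 2) {
        for (size_t lid = 0; lid < stride; ++lid)
            scratch[lid] += scratch[lid + stride];
        /* In a kernel, barrier(CLK_LOCAL_MEM_FENCE) is needed here too. */
    }
    return scratch[0];  /* work-item 0 would write this partial sum out */
}

/* Sums n values: one partial sum per group, then the partial sums are added. */
float reduce_sum(const float *n_array, size_t n)
{
    float total = 0.0f;
    for (size_t base = 0; base < n; base += GROUP_SIZE) {
        size_t count = (n - base < GROUP_SIZE) ? n - base : GROUP_SIZE;
        total += reduce_group(n_array + base, count);
    }
    return total;
}
```

In a real kernel each work-group would store its partial sum in a __global array indexed by get_group_id(0), and the partial sums would then be added either on the host or by running the reduction again on that smaller array.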

You can learn about parallel reduction in this introductory article for instance:

http://developer.amd.com/Resources/documentation/articles/Pages/OpenCL-Optimization-Case-Study-Simple-Reductions.aspx

phaethon · November 4, 2012, 12:54pm

It seems like that link you provided is down, any chance you could explain what it said? Thanks!

notzed · November 4, 2012, 4:50pm

Just do an internet search for “parallel reduction” or “parallel sum” and see where that leads you. It is a widely researched and documented topic.