Array in global memory - strange results

I am attempting to implement an algorithm to solve ODEs in OpenCL. I’ve been struggling with the best way to do this, and I have an idea in mind that I think should work in theory, but when I actually try to start with just a simple case I’m having unexpected results. I would very much appreciate any help.

Here’s what I have:

__kernel void calculate(__global float *n_array, __global float *n_avg, int step)
{
    int j = get_global_id(0);
    n_array[j] += n_array[j] * SOME_CONSTANTS;  /* placeholder for the actual coefficients */
    n_avg[step] += n_array[j];
}

I’m executing this with 1000 global work items (so that “j” ranges from 0–999), and the kernel is enqueued 1000 times in a for loop (so that it executes a total of 1000*1000 times). I place values in “n_array” initially with clEnqueueWriteBuffer, and that part works correctly.

The issue is that n_avg ends up with very unexpected values. I suspect it has something to do with the memory being shared, and perhaps the timing is such that values aren’t being written properly. When I pick out a single value of j (i.e. if(j==10)), it works exactly as expected. The problem is adding all the values into n_avg[step]: if I have 1000 values with n_array[j] = 1, I would expect n_avg[step] to equal 1000 (since I added 1 to it 1000 times), but this is not always what happens.

I know I may not have been clear here; I am still learning OpenCL, so I don’t know exactly how to frame my question. If you think you can help but need more information, please let me know and I will do my best to provide it. Thank you very much for taking the time to look at my question.

n_avg[step] is read-modify-written concurrently by all work-items, so the updates race with each other and some of them are lost; this cannot work as written.

You have to split your kernel into two kernels: in the first one, do the computation of the vector n_array; then in the second one, do a reduction (a parallel sum) of n_array into n_avg[step].

You can learn about parallel reduction in this introductory article for instance:

It seems the link you provided is down; any chance you could explain what it said? Thanks!

Just do an internet search for “parallel reduction” or “parallel sum” and see where that leads you. It is a widely researched and documented topic.