Error when reading the same memory positions from different threads


I have a problem while reading a couple of positions in a double array from different threads.

I enqueue the kernel with:

nelements = nx*ny;
err = clEnqueueNDRangeKernel(queue,kernelTvl2of,1,NULL,&nelements,NULL,0,NULL,NULL);

kernelTvl2of contains (among other things) the code:

size_t k = get_global_id(0);
u1_[k] = (float)u1[k];

and forwardgradient has the code:

void forwardgradient(global double *f, global double *fx, global double *fy, int ker, int nx, int ny){
unsigned int rowsnotlast = ((nx)
fx[ker] = f[ker+1] - f[ker];
fy[ker] = f[ker+nx] - f[ker];
fx[ker] = f[ker+1] - f[ker];
fx[0] = f[4607];
fx[1] = f[4608];
fx[2] = f[4608] - f[4607];
fx[3] = f[ker];
fx[4] = f[ker+1];
fx[5] = f[ker+1] - f[ker];
fx[ker] = 0;
fy[ker] = 0;
if(ker%nx == nx-1){
fx[6] = f[4608];

When I get the contents of the first positions of fx, they are:

-6 0 6 -6 0 6 -6

And here’s my problem: when I read f[ker+1] or f[4608] from the thread with id 4607, I get a ‘0’ (second and fifth positions of the output array), but from other threads I get a ‘-6’ (last position of the output array).

Does anyone have a clue about what I’m doing wrong, or where I could look?

Thanks a lot,


From the OpenCL specification, on memory consistency: “Global memory is consistent across work-items in a single work-group at a work-group barrier, but there are no guarantees of memory consistency between different work-groups executing a kernel.”

Work-item 4607 and work-item 4608 are probably not part of the same work-group, so barrier(CLK_GLOBAL_MEM_FENCE) does not make global memory changes visible between them.
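
One common way around this is to split the work into two passes, so that every element is written before any neighbour is read. Below is a minimal sketch of that structure in plain C, not OpenCL code: each loop stands in for one kernel launch and each loop index for a work-item id. The names convert_pass, gradient_pass and run_two_passes are hypothetical. The boundary handling (zero gradient in the last column and the last row) is an assumption, since the posted forwardgradient is cut off, and so is the guess that the array read by forwardgradient is the one written earlier in the same kernel.

```c
#include <stddef.h>

/* Pass 1: stands in for a first kernel. "Work-item" k writes only f[k].
   Mirrors the posted line u1_[k] = (float)u1[k]. */
static void convert_pass(const double *src, double *f, size_t k)
{
    f[k] = (float)src[k];
}

/* Pass 2: stands in for a second kernel. "Work-item" k reads f[k],
   f[k+1] and f[k+nx], which were all written during pass 1.
   Assumption: the gradient is set to 0 on the last column / last row. */
static void gradient_pass(const double *f, double *fx, double *fy,
                          size_t k, size_t nx, size_t ny)
{
    int last_col = (k % nx == nx - 1);
    int last_row = (k / nx == ny - 1);

    fx[k] = last_col ? 0.0 : f[k + 1] - f[k];
    fy[k] = last_row ? 0.0 : f[k + nx] - f[k];
}

/* Each loop models one complete kernel launch over nx*ny work-items.
   The second loop starts only after the first has finished. */
void run_two_passes(const double *src, double *f, double *fx, double *fy,
                    size_t nx, size_t ny)
{
    size_t n = nx * ny;

    for (size_t k = 0; k < n; ++k)   /* launch 1: write every f[k] */
        convert_pass(src, f, k);

    for (size_t k = 0; k < n; ++k)   /* launch 2: read neighbours  */
        gradient_pass(f, fx, fy, k, nx, ny);
}
```

In OpenCL terms, each loop would become its own kernel, enqueued with a separate clEnqueueNDRangeKernel call on the same in-order command queue. The second kernel then starts only after the first has completed, so it sees all of the first kernel's global memory writes regardless of which work-group produced them.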