Semaphore problem with the atom_xchg function

Hi all,

I have an issue using the atomic function atom_xchg on my Nvidia card to implement a semaphore. I need to lock, for one work-item at a time, an array of floats named “v” that is passed as an argument to a kernel function (code below). When I launch the kernel, the semaphore is initialized to 0 (meaning the resource is available).

Code :
__kernel void my_ker (__global float* v, __global int* semaphor)
{
    // Retrieve the index of the work-item
    int g = get_global_id(0);

    // Wait for the resource to be available
    while (atom_xchg (semaphor, 1) != 0) {}

    // Use the array for the current work-item
    v[g] = (float)g;

    // Unlock the resource
    atom_xchg (semaphor, 0);
}


But the program crashes every time and I do not understand why. Could someone please help me and explain why it does not work correctly?

You cannot implement per-work-item semaphores this way.

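E.g. on a GPU, if you have, say, 64 work-items running concurrently, the first work-item will get the semaphore but every other one will not, so they all keep spinning in the loop. Remember that on a GPU the work-items that are scheduled together (a warp on Nvidia hardware) execute the same instructions at the same time; the code cannot diverge at the work-item level. The work-item that holds the semaphore therefore never reaches the unlock while the others are still spinning, and the whole compute unit goes into an infinite loop.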
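
To make the lockstep argument concrete, here is a toy model in plain host-side C (not OpenCL, and not how any real GPU is actually implemented). It assumes a simplified scheduler in which all lanes evaluate the spin-loop condition together and nobody moves past the loop until every lane has left it. The names LANES, MAX_STEPS, xchg and run_spin_loop are made up for this sketch.

```c
#define LANES 4         /* hypothetical lockstep width, kept tiny for readability */
#define MAX_STEPS 1000  /* stop the model after this many loop iterations */

/* Stand-in for atom_xchg: store new_val and return the previous value. */
static int xchg(int *p, int new_val)
{
    int old = *p;
    *p = new_val;
    return old;
}

/* Models `while (atom_xchg(semaphor, 1) != 0) {}` executed in lockstep by
 * `lanes` work-items. Returns the number of loop iterations needed before
 * every lane has left the loop, or -1 if that never happens within
 * MAX_STEPS. */
static int run_spin_loop(int lanes)
{
    int semaphore = 0;      /* 0 = free, 1 = taken */
    int spinning[LANES];

    if (lanes > LANES)
        lanes = LANES;
    for (int lane = 0; lane < lanes; ++lane)
        spinning[lane] = 1;

    for (int step = 1; step <= MAX_STEPS; ++step) {
        int still_spinning = 0;

        /* One lockstep evaluation of the loop condition on all active lanes. */
        for (int lane = 0; lane < lanes; ++lane) {
            if (spinning[lane] && xchg(&semaphore, 1) == 0)
                spinning[lane] = 0;  /* got the lock; idles until the rest exit */
            still_spinning += spinning[lane];
        }

        /* The unlock sits after the loop, so in this model it can only run
         * once no lane is spinning any more. */
        if (still_spinning == 0)
            return step;
    }
    return -1;  /* the lock holder never got the chance to release */
}
```

In this model run_spin_loop(1) returns 1, because a single lane takes the lock on its first try and moves on. run_spin_loop(4) returns -1: lane 0 takes the lock, lanes 1 to 3 keep seeing 1, and the release that would free them is never reached.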

Do a search on the various OpenCL forums for more information, or just give up and use a more appropriate algorithm.

Per-work-group semaphores are possible but not very useful, since global memory isn’t guaranteed to be consistent across work-groups while a kernel is running.