I am studying OpenCL 1.1 from a book (ISBN 978 9321749642). On synchronization, it says that synchronization is limited to work-items within a single work-group, and that OpenCL gives no synchronization guarantee between work-items in different work-groups.
I reviewed the OpenCL 3.0 specification, and section 3.2.4 (Synchronization) says very little about whether this limitation has been lifted.
The book gives the following example of work-item synchronization within a work-group:
void kernel _fcn_name (...args...)
{
<code snippet>....
barrier(...flags...);
<code snippet>....
}
But a CUDA example I looked at earlier simply placed a synchronization function (whose exact name I cannot recall) after the kernel launch in the host code:
host code:
<code snippet...>
<cuda kernel> ()....
synchronization function call
<code snippet...>
This way, all CUDA threads launched by the kernel are forced to complete before the synchronization call returns. Can something similar be done in OpenCL?
OpenCL has a built-in barrier() function, but I am not sure whether it can be called from the host.