After reading the OpenCL 1.1 standard I still can’t tell whether an in-order command queue guarantees memory visibility between any pair of commands (not only kernels), according to their enqueueing order.
OpenCL standard 1.1 section 5.11 states:
If the CL_QUEUE_OUT_OF_ORDER_EXEC_MODE_ENABLE property of a command-queue is not set, the commands enqueued to a command-queue execute in order. For example, if an application calls clEnqueueNDRangeKernel to execute kernel A followed by a clEnqueueNDRangeKernel to execute kernel B, the application can assume that kernel A finishes first and then kernel B is executed. If the memory objects output by kernel A are inputs to kernel B then kernel B will see the correct data in memory objects produced by execution of kernel A.
What about a non-blocking clEnqueueWriteBuffer followed by a clEnqueueNDRangeKernel whose kernel uses that buffer’s contents?
AFAIK, ‘finishes execution’ does not imply that corresponding writes are visible (due to relaxed consistency). For example, section 5.10 states specifically:
The clEnqueueBarrier command ensures that all queued commands in command_queue have finished execution before the next batch of commands can begin execution. The clEnqueueBarrier command is a synchronization point.
In other words, should I rely on other rules related to ‘synchronization points’ (events, etc.), or do I get memory synchronization out of the box for all commands in an in-order queue?
In-order queues insert barriers implicitly. You only need to worry about synchronization if you use multiple queues or out-of-order queues.
Thank you! Could you point me to where the OpenCL 1.1 standard states that directly? In other words, where can I find evidence that ‘In-order queues insert barriers implicitly’?
The snippet you copied states “commands enqueued to a command-queue”. “Command” implies every function with “Enqueue” in its name. Worth noting, “submission” is not the same as “execution”: if you submit two kernels that use the same READ_ONLY buffer but write into separate buffers, your runtime may decide to execute both in parallel even on in-order queues.
The problem I see with these quotes is that ‘in-order’ relates strictly to the command execution order. There is no clear relation to memory visibility between commands in the general case, because the only sentence that mentions visibility is the ‘For example’ one, and it talks about kernels only. I can’t find any notion of ‘implicit synchronization points’ for such queues in the standard. I can’t simply infer that ‘finished execution’ means ‘writes of this command are visible’, because the term ‘end of execution’ is not defined anywhere in the text.
Frankly, I’ve never thought of it this way. From a purely practical standpoint, “command is finished” means “the OpenCL device is in a well-defined state the runtime can reason about”. If the standard allowed a driver to use an invalid memory object Vulkan style, it would make the whole concept of a command queue meaningless: why not simply send commands to a cl_device directly?
What you are saying is reasonable, no doubt. What confuses me is that a clear definition of ‘command is finished’ would not be hard to add to the glossary of the standard, yet it’s not there. It looks like all I’ve got as a developer is implied ‘common sense’ reasoning. Nevertheless, thank you for your time.