Timing transform feedback

Why should this stall the CPU?

The only way to ensure that all previously given OpenGL commands complete in “finite time” is for the client to send those commands to the server. If the server command queue is full, then the client (CPU) must stall until that command queue empties out enough to accept all of the waiting commands.

If an OpenGL implementation is threaded, then it is possible for a GL client thread to exist that ensures all OpenGL commands are completed in “finite time”; in that case, glFlush is effectively a no-op. But this is implementation-dependent; it’s better to assume that glFlush will induce a CPU stall than to assume that it won’t.

The reason for using glFlush() for synchronization across multiple contexts is that we have to deal with totally independent GL servers.

ARB_sync has nothing to do with synchronization across multiple contexts specifically. It introduces sync objects and fence syncs; you can use those just fine in a single context application.

The purpose of fences is to be able to test whether the GPU has completed a particular task (defined by the point where you sent the sync command), or simply to be able to do a glFinish, but only up to a certain point in the command stream.
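For example, a minimal sketch of that (assuming ARB_sync / GL 3.2 core entry points; the timeout value and error handling are purely illustrative):

    /* Put a fence after a batch of commands. */
    GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    glFlush(); /* make sure the fence actually reaches the server */

    /* Non-blocking test: has everything before the fence completed? */
    GLint signaled = GL_UNSIGNALED;
    glGetSynciv(fence, GL_SYNC_STATUS, 1, NULL, &signaled);

    /* Or a bounded blocking wait, i.e. a glFinish up to the fence only. */
    GLenum status = glClientWaitSync(fence, 0, 1000000000); /* 1 s in ns */
    /* status is GL_ALREADY_SIGNALED, GL_CONDITION_SATISFIED,
       GL_TIMEOUT_EXPIRED or GL_WAIT_FAILED */

    glDeleteSync(fence);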

Look at how the spec words it: it says nothing about only flushing with multiple contexts. It simply says that you need to ensure that a fence is flushed manually at some point before you can start waiting on it.

It does mention multiple contexts a bit later, but only to say that it requires more work than the single context case. You still need that flush regardless of how many contexts you have.
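To make the multi-context case concrete, here is a hedged sketch (context creation and buffer sharing are assumed and omitted; the names are hypothetical): the producer fences and flushes, and the consumer makes the GPU wait instead of the CPU.

    /* Producer context: write into a shared buffer, then fence it. */
    /* ... commands writing sharedBuffer ... */
    GLsync produced = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    glFlush(); /* still required: the fence must reach the server */

    /* Consumer context (sync objects are shared across contexts):
       block the GPU, not the CPU, until the producer's work is done. */
    glWaitSync(produced, 0, GL_TIMEOUT_IGNORED);
    /* ... commands reading sharedBuffer ... */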

Very good point. Even though drivers are multithreaded nowadays and I’m pretty sure they try to avoid application-side CPU stalls at all costs, it is better not to build on such assumptions.

While that is true, I would point out that ARB_sync’s main use cases are still synchronization across multiple GL contexts and synchronization between OpenCL and OpenGL operations. I cannot really think of a particular use case where ARB_sync provides more flexibility than the regular explicit and implicit synchronization mechanisms, but please prove me wrong, as I’m interested.

Although it is possible to use sync objects in a single context, there is probably a very limited number of use cases where they are useful. Frankly, I cannot find even one. Potentially, they can be used to signal the client that some operation has finished on the server side by unblocking ClientWaitSync(), but I have never had a need for that.

Waiting on the server side for something to finish inside a single context is meaningless, since GL guarantees in-order execution of issued commands. Please correct me if I’m wrong.

So, as Daniel already said, the main use cases are synchronization across multiple GL contexts and synchronization between OpenCL and OpenGL.

Frankly, I cannot find even one

Really? I can think of a few:

1: Finding out when a read into a PBO has finished. That way, you’re not stalling the CPU waiting for one (see the sketch after this list).

2: Finding out when the GPU is finished with a buffer, so that you’re free to write to it without stalling the CPU, and without allocating another block of memory (i.e. being able to use unsynchronized mapping without flushing the buffer).

3: Asking if a timer is finished without blocking the CPU to wait for it.

4: Finding out if an occlusion query is finished without blocking the CPU for it (though that one isn’t so important these days thanks to conditional rendering).

And that’s just what I thought up off the top of my head.
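Use case 1, for instance, could look roughly like this (a sketch only; the PBO is assumed to be created and sized already, and pbo, width and height are hypothetical names):

    /* Kick off an asynchronous read into a pixel pack buffer. */
    glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
    glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, 0);
    GLsync readFence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    glFlush();

    /* A frame or two later, poll with a zero timeout instead of stalling. */
    if (glClientWaitSync(readFence, 0, 0) == GL_ALREADY_SIGNALED) {
        glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
        void *pixels = glMapBuffer(GL_PIXEL_PACK_BUFFER, GL_READ_ONLY);
        /* ... consume the pixel data ... */
        glUnmapBuffer(GL_PIXEL_PACK_BUFFER);
        glDeleteSync(readFence);
    }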

Alfonse, you have a point; however, I think there are much better ways to deal with such issues.

What kind of timer? You mean a timer query? You can check whether a timer query has completed without sync objects.

Same here. You can always query whether the results of an occlusion query are available; the same goes for all asynchronous queries. Also, ARB_occlusion_query2 goes even further: in theory it can make the result available as soon as the first fragment passes the depth test, so, again theoretically, there is no need to wait for the query to finish.
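Something like this, assuming query is a timer or occlusion query object whose result was requested earlier (a sketch, not tied to any particular code in this thread):

    /* Poll availability instead of blocking on the result. */
    GLint available = 0;
    glGetQueryObjectiv(query, GL_QUERY_RESULT_AVAILABLE, &available);
    if (available) {
        GLuint result = 0;
        glGetQueryObjectuiv(query, GL_QUERY_RESULT, &result);
        /* ... use the occlusion count / elapsed time ... */
    }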

This use case sounds interesting (unlike the rest). I was not aware of it, since I’m not using VBO mapping.

Nevertheless, thank you for the hints on single-context usage of sync objects.

I also apologize for drawing the discussion away from the OP’s main question, but I hope it was useful to clear up some facts about timing and synchronization in OpenGL.