Quick context: I’m working on a Qt-based application where Qt provides its own GL context for offscreen rendering. We need to do our own offscreen rendering separate from Qt’s update loop, so I’m trying to break our rendering into its own thread. I’m now trying to synchronize the output of that thread to Qt’s offscreen context.
The general approach is:
1. Main thread ticks, prepares the data needed for a render, then queues a request to the render thread.
2. Render thread wakes up, receives the request, and writes its output to a texture (the render output texture).
3. Qt’s scene thread (with the offscreen context current and its FBO bound) calls glCopyImageSubData to copy the contents of the render output texture into the texture attached to the offscreen FBO (the offscreen texture).
Synchronization between 2 and 3 is done through a mutex and a GL sync object. The scene thread waits on a RenderFenceReady mutex. Once the render thread is done processing output, it calls glFenceSync, writes that sync object to a shared location, then signals the RenderFenceReady mutex. The scene thread then wakes up, calls glWaitSync on the fence, then finally issues the glCopyImageSubData from the render output texture to the offscreen texture. The fence sync object is released immediately after this.
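Roughly, the GL side of that handoff looks like this (a simplified sketch; names like renderOutputTex, offscreenTex, and the signal/wait wrappers are placeholders for our actual objects):

```cpp
// Render thread, after the last draw call into the render output texture:
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
shared.fence = fence;       // write the sync object to the shared location
signalRenderFenceReady();   // hypothetical wrapper around the mutex signal

// Scene thread (Qt offscreen context current, FBO bound):
waitForRenderFenceReady();  // blocks until the render thread signals
glWaitSync(shared.fence, 0, GL_TIMEOUT_IGNORED);  // server-side wait, no CPU block
glCopyImageSubData(renderOutputTex, GL_TEXTURE_2D, 0, 0, 0, 0,
                   offscreenTex,    GL_TEXTURE_2D, 0, 0, 0, 0,
                   width, height, 1);
glDeleteSync(shared.fence);  // fence released immediately after the copy
```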
What’s happening is that on Intel machines, glCopyImageSubData appears to execute before the render thread has finished writing its output, so not all draw calls are reflected in the copied result. Forcing a glFinish on the render thread at the end of the output makes the problem go away, which supports that theory. Other vendors don’t seem to have the problem, but that might just be because their GPUs finish rendering faster.
It’s easy to blame drivers for everything, and it’s far more likely I’m just doing something wrong, but I’m at a loss as to what. I’ve carefully read the spec’s rules on how changes to objects propagate across contexts. It seems as though I might be running afoul of visibility rules 3/4, since glCopyImageSubData doesn’t actually bind the texture, so I tried adding an explicit bind right before the copy. I’ve also tried attaching the texture to an FBO bound as GL_READ_FRAMEBUFFER and using glCopyTexSubImage2D instead.
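For reference, the two workarounds I tried look roughly like this (sketch; readFbo is a hypothetical FBO with the render output texture attached as a color attachment):

```cpp
// Attempt 1: explicitly bind the source texture in the consuming context
// right before the copy, in case the visibility rules require a bind:
glBindTexture(GL_TEXTURE_2D, renderOutputTex);
glCopyImageSubData(renderOutputTex, GL_TEXTURE_2D, 0, 0, 0, 0,
                   offscreenTex,    GL_TEXTURE_2D, 0, 0, 0, 0,
                   width, height, 1);

// Attempt 2: read through an FBO instead of using the image copy path:
glBindFramebuffer(GL_READ_FRAMEBUFFER, readFbo);  // renderOutputTex attached
glBindTexture(GL_TEXTURE_2D, offscreenTex);       // destination of the copy
glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, width, height);
```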
I’m pretty certain this doesn’t qualify as incoherent memory access, but just in case, I also tried issuing a glMemoryBarrier.
What’s intriguing/annoying is that the same process works for the other type of render request passed to the thread, where the data is read back to the CPU and handed to a video encoder instead. The operative difference in that path is that it uses glClientWaitSync at the end of the request and maps the buffer for read once the wait completes. But if fence syncs themselves were the problem, there would likely be flickering inside the video as well.
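The working readback path, for comparison (sketch; the PBO plumbing and unmap/cleanup are elided). One API fact worth noting here: GL_SYNC_FLUSH_COMMANDS_BIT makes glClientWaitSync flush the commands up to the fence before waiting.

```cpp
// Render thread: render, queue the readback into a PBO, then fence.
glBindBuffer(GL_PIXEL_PACK_BUFFER, pbo);
glReadPixels(0, 0, width, height, GL_RGBA, GL_UNSIGNED_BYTE, nullptr);
GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

// Later, once the request completes: client-side wait, then map for read.
glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, timeoutNs);  // blocks the CPU
void* ptr = glMapBufferRange(GL_PIXEL_PACK_BUFFER, 0, bufferSize, GL_MAP_READ_BIT);
// ... hand ptr to the video encoder, then unmap and delete the fence ...
```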
Does the process of mutex -> insert fence -> wait fence -> copy described above seem correct? Any other insights?