Synchronizing a texture between threads

cmartel · October 23, 2019, 7:26pm

Quick context: I’m working on a Qt-based application where Qt provides its own GL context for offscreen rendering. We need to do our own offscreen rendering separate from Qt’s update loop, so I’m trying to break our rendering into its own thread. I’m now trying to synchronize the output of that thread to Qt’s offscreen context.

The general approach is:

1. Main thread ticks, prepares data necessary for a render, then queues a request to the render thread.
2. Render thread wakes up, receives the request, and writes the output to a texture (the render output texture).
3. Qt’s scene thread (with offscreen context active and FBO bound) calls glCopyImageSubData to copy the contents of the render output texture into the texture attached to the offscreen context FBO (the offscreen texture).

Synchronization between 2 and 3 is done through a mutex and a GL sync object. The scene thread waits on a RenderFenceReady mutex. Once the render thread is done processing output, it calls glFenceSync, writes that sync object to a shared location, then signals the RenderFenceReady mutex. The scene thread then wakes up, calls glWaitSync on the fence, then finally issues the glCopyImageSubData from the render output texture to the offscreen texture. The fence sync object is released immediately after this.
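
For reference, the handoff described above could be sketched roughly as follows. This is a minimal illustration, not the application's actual code: FenceMailbox and SyncHandle are hypothetical names, a condition variable is assumed for the "RenderFenceReady" wake-up, and the GL calls appear only as comments so the sketch builds without GL headers. The glFlush after glFenceSync is there because a fence that is waited on from another context has to be flushed to the GPU first.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Hypothetical stand-in for a GLsync handle, so the sketch builds without GL headers.
using SyncHandle = std::uintptr_t;

// One-slot mailbox that hands a fence from the render thread to the scene thread.
class FenceMailbox {
public:
    // Render thread: call after glFenceSync() and glFlush() on the render context.
    void publish(SyncHandle fence) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            assert(!has_fence_);  // the previous fence must have been consumed
            fence_ = fence;
            has_fence_ = true;
        }
        ready_.notify_one();
    }

    // Scene thread: blocks until a fence has been published, then takes ownership.
    SyncHandle take() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return has_fence_; });
        has_fence_ = false;
        return fence_;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    SyncHandle fence_ = 0;
    bool has_fence_ = false;
};

// Render thread, after the last draw call (GL calls shown as comments):
//   GLsync s = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
//   glFlush();  // make sure the fence is submitted before another context waits
//   mailbox.publish(reinterpret_cast<SyncHandle>(s));
//
// Scene thread, with the offscreen context current:
//   GLsync s = reinterpret_cast<GLsync>(mailbox.take());
//   glWaitSync(s, 0, GL_TIMEOUT_IGNORED);  // GPU-side wait, returns immediately
//   glCopyImageSubData(/* arguments elided */);
//   glDeleteSync(s);
```

Note that glWaitSync only orders the commands that follow it on the GPU; it does not keep the source texture alive or stop another thread from reusing it.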

What’s happening is that on Intel machines, the glCopyImageSubData seems to be issued before the render thread is done writing the output, so not all draw calls are reflected in the copied output. Forcing a glFinish on the render thread at the end of the output makes the problem go away, which seems to confirm this. Other vendors don’t seem to have this problem, but that might just be because their GPUs render faster.

It’s easy to blame drivers for everything, and it’s far more likely I’m just doing something wrong, but I’m at a loss as to what it is. I’ve carefully read the spec on visibility propagation across threads. It seems as though I’m running afoul of visibility rules 3/4 because glCopyImageSubData doesn’t actually bind the texture, so I tried adding a bind right before. I’ve also tried binding it as GL_READ_FRAMEBUFFER for use with glCopyTexSubImage2D instead.

I’m pretty certain this doesn’t qualify as incoherent memory access, but just in case, I also tried issuing a glMemoryBarrier.

What’s intriguing/annoying is that the same process works for the other type of render request passed to the thread, where the data is read back to the CPU and passed to a video encoder instead. The operative difference in that path is that it calls glClientWaitSync at the end of the request and maps the result for reading once it’s ready. But if the problem were with fence syncs specifically, there would likely be flickering in the video as well.

Does the process of mutex → insert fence → wait fence → copy described above seem correct? Any other insights?

cmartel · November 26, 2019, 9:41pm

Just bumping this with the solution I eventually found.

“The fence sync object is released immediately after this.”

This is close to the problem. What I forgot to mention and take into account is that the texture is also released immediately after calling glCopyImageSubData. The GL specification is pretty clear that deleting a texture will release the name, but the data will linger as long as necessary if it’s still “in use”. The specification is also very clear on what constitutes being “in use”:

A buffer, texture, sampler, or renderbuffer object is in use if any of the following conditions are satisfied:
• the object is attached to any container object
• the object is bound to a context bind point in any context
• any other object contains a view of the data store of the object.

None of these include “the object is part of a call to glCopyImageSubData”.

What was causing the varying behaviour was indeed render-speed-specific rather than driver-specific. The renderer backend has a resource pool that reuses similar textures instead of allocating fresh ones every time. Once released, the render output texture could be handed straight back to the render thread for the next frame, so depending on render speed, the copy could end up reading either the fully formed following frame or a partially rendered one.
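
One possible way to avoid this kind of early reuse is to keep a released texture out of the pool until the consumer's copy is known to have finished. The sketch below is only an illustration under that assumption, not the fix that was actually applied: DeferredTexturePool, TextureId and the `done` callback are hypothetical, and the GL fence is replaced by a plain callable so the code builds without a GL context.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

using TextureId = std::uint32_t;  // hypothetical stand-in for a GL texture name

class DeferredTexturePool {
public:
    // Hand out a recycled texture if one is free, otherwise a fresh id.
    TextureId acquire() {
        collect();
        if (!free_.empty()) {
            TextureId id = free_.back();
            free_.pop_back();
            return id;
        }
        return next_id_++;
    }

    // Return a texture, but keep it out of circulation until `done` reports true.
    // In real code `done` might poll a fence inserted right after the copy, e.g.
    // check that glClientWaitSync(fence, 0, 0) does not return GL_TIMEOUT_EXPIRED.
    void release(TextureId id, std::function<bool()> done) {
        assert(static_cast<bool>(done));  // an empty callback would never complete
        pending_.push_back({id, std::move(done)});
    }

private:
    struct Pending {
        TextureId id;
        std::function<bool()> done;
    };

    // Move every texture whose consumer has finished back to the free list.
    void collect() {
        for (auto it = pending_.begin(); it != pending_.end();) {
            if (it->done()) {
                free_.push_back(it->id);
                it = pending_.erase(it);
            } else {
                ++it;
            }
        }
    }

    std::vector<TextureId> free_;
    std::vector<Pending> pending_;
    TextureId next_id_ = 1;
};
```

With this arrangement, a texture released while its copy is still in flight is skipped by acquire(), and the render thread gets a different texture for the next frame instead of overwriting the one being copied.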