Reusing FBO or texture contents in multiple frames

armmm · April 7, 2023, 8:18am

Hi, I am working with OpenGL ES 3.0. I am at the point of optimizing my rendering implementation and have run into issues.

I am using offscreen FBOs and rendering into textures so that I can do some processing before rendering the final processed texture on screen. My initial implementation updated the entire texture that is rendered on screen every frame. However, from one frame to the next only a small part of the texture changes (maybe 5% of it), which I can represent with a few triangles, so I thought I might be able to keep the rest of the texture intact and render only the part that has changed.

I have tried a few different approaches and none of them have worked:

Calling glDrawArrays() with only the triangles that changed since the previous frame.
Using a stencil buffer to mask the area which needs to be changed.
Using a scissor box to only render the bounding rectangle of the area that has changed.
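For the scissor-box approach, the bounding rectangle of the changed triangles has to be expressed in window coordinates of the FBO's viewport before it is passed to glScissor(). A minimal sketch, assuming the changed vertices are available on the CPU in normalized device coordinates (apply any vertex-shader transform first); `Vec2`, `Rect`, and `dirty_scissor_rect()` are hypothetical names for illustration:

```c
#include <assert.h>

/* Hypothetical helper types for this sketch (not part of OpenGL ES). */
typedef struct { float x, y; } Vec2;      /* vertex position in NDC, range [-1, 1] */
typedef struct { int x, y, w, h; } Rect;  /* window-space box, in glScissor() argument order */

static float clampf(float v, float lo, float hi) { return v < lo ? lo : (v > hi ? hi : v); }

/* floor/ceil for values already clamped to int range (avoids linking libm). */
static int floor_to_int(float f) { int i = (int)f; return (f < (float)i) ? i - 1 : i; }
static int ceil_to_int(float f)  { int i = (int)f; return (f > (float)i) ? i + 1 : i; }

/* Bounding box of n NDC vertices mapped through the viewport (vx, vy, vw, vh),
   rounded outward to whole pixels and clamped to the viewport. */
static Rect dirty_scissor_rect(const Vec2 *v, int n, int vx, int vy, int vw, int vh)
{
    assert(v && n > 0 && vw > 0 && vh > 0);

    float minx = v[0].x, maxx = v[0].x, miny = v[0].y, maxy = v[0].y;
    for (int i = 1; i < n; ++i) {
        if (v[i].x < minx) minx = v[i].x;
        if (v[i].x > maxx) maxx = v[i].x;
        if (v[i].y < miny) miny = v[i].y;
        if (v[i].y > maxy) maxy = v[i].y;
    }

    /* Same mapping as glViewport(): xw = vx + (xndc + 1) * vw / 2. */
    float x0 = clampf(vx + (minx + 1.0f) * 0.5f * vw, (float)vx, (float)(vx + vw));
    float x1 = clampf(vx + (maxx + 1.0f) * 0.5f * vw, (float)vx, (float)(vx + vw));
    float y0 = clampf(vy + (miny + 1.0f) * 0.5f * vh, (float)vy, (float)(vy + vh));
    float y1 = clampf(vy + (maxy + 1.0f) * 0.5f * vh, (float)vy, (float)(vy + vh));

    Rect r;
    r.x = floor_to_int(x0);
    r.y = floor_to_int(y0);
    r.w = ceil_to_int(x1) - r.x;
    r.h = ceil_to_int(y1) - r.y;
    return r;
}
```

Under these assumptions, the result would be used as `glEnable(GL_SCISSOR_TEST); glScissor(r.x, r.y, r.w, r.h);` before drawing the changed triangles. The box is rounded outward so that pixels partially covered by an edge stay inside it. As the replies point out, scissoring only limits which pixels are written; on a tile-based GPU it does not by itself avoid loading the texture's previous contents.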

All of these approaches gave me the same result: parts of the texture look OK at times, parts of it are garbage, and it flickers heavily. The only way to avoid the flickering has been to call glClear without a scissor box enabled, which is something I obviously want to avoid if possible.

I am using the same FBO for each frame and this is the only thing I use the FBO for.

The question here is: am I trying to do something that is not possible? Or do I have to render the entire texture first and then render the parts that have changed (which I guess could still be an improvement if the original shaders are complex)?

Dark_Photon · April 7, 2023, 1:09pm

OpenGL ES. Is this with a mobile GPU? Which GPU and driver version?

No, this is totally possible and reasonable. However, especially if you’re using a mobile GPU, you have to think like a driver here.

A few primer questions:

How is your texture being updated? Only from GPU rendering via FBO? Or from CPU-side updates (e.g. glTexSubImage2D())?
Is your rendering flow: 1) render to FBO/texture, 2) render texture to screen (rinse/repeat)?
Are you fairly sure you’re not triggering a mid-frame pipeline flush?
Which GPU and driver version?

What is prompting you to render texture changes incrementally? A known bottleneck? Or a suspicion on the bottleneck?
Are you seeing any performance issues with the “update everything” approach?

armmm · April 7, 2023, 1:50pm

Thank you for the response. I am working with a mobile GPU:
ARM, Mali-T760, OpenGL ES 3.2 v1.r13p0-00rel0.5f9f712b40bc6b0a4ce7beeede5b0216

I have tried this on my laptop running Ubuntu as well, with the same effects.

To answer your other questions:

The texture is being updated only by rendering via FBO.
The flow is: 1) render to FBO/texture (I want this to be incremental), 2) render that texture into another FBO, which I believe in turn renders into a texture that ends up on screen. I am using Qt and its QQuickFramebufferObject, which essentially gives you an FBO to render into; that is what I render into in my 2nd step.
What could cause a pipeline flush? I am not calling glFinish or glFlush in my rendering code, but I am changing some state such as the viewport, enabling/disabling blending, depth test, etc., which I assume can’t cause it? Once I render into Qt’s FBO, I don’t know what happens after that, or whether that is even relevant.
GPU and driver as written above.
I have measured the times required for these two rendering passes and found that the first one is the bottleneck. I am rendering into two 1024x1024 textures in that pass, and that is taking too long. The second pass renders the resulting textures into one output texture and takes about 50% of the time. I made the measurements by calling glFinish() after every rendering pass to make sure I get relevant results. Obviously this slows everything down further, but at least I get to see where my bottleneck is.

Dark_Photon · April 9, 2023, 12:16am

Ok. So tile-based GPU, with extremely slow DRAM backing your render targets. You have to structure your rendering for this as a primary constraint.

For minimum memory bandwidth (and to avoid dependencies between frames and render targets):

1. At the beginning of rendering to a render target, you should glClear all buffers completely (all write masks enabled, scissor testing disabled).
2. After rendering to a render target, call glInvalidateFramebuffer() or glDiscardFramebufferEXT() on all rendered buffers whose contents you no longer care about (typically DEPTH and STENCIL – i.e. helper buffers).
3. Avoid all operations in the middle of rendering to a render target that may trigger a pipeline flush and/or a stall (e.g. calling glFinish(), waiting on a sync object, reconfiguring FBOs, overrunning the geometry buffer, updating buffer objects, reading back pixel data, etc.).
4. Avoid all FBO reconfiguration.
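The four rules above can be expressed as a small check over the calls issued for one render pass. This is only a sketch: the `Cmd` tags, the `RULE*` flags, and `check_render_pass()` are hypothetical names, not OpenGL ES API, and real driver behaviour is more nuanced (whether a particular call actually flushes depends on the GPU and driver).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical tags for the calls issued while one FBO is bound.
   These are labels for this sketch, NOT OpenGL ES enums. */
typedef enum {
    CMD_CLEAR_FULL,         /* glClear() of all buffers, scissor off, all write masks on */
    CMD_CLEAR_PARTIAL,      /* scissored or write-masked glClear() */
    CMD_DRAW,               /* glDrawArrays() / glDrawElements() */
    CMD_FINISH,             /* glFinish() */
    CMD_CLIENT_WAIT_SYNC,   /* glClientWaitSync() */
    CMD_READ_PIXELS,        /* glReadPixels() */
    CMD_BUFFER_UPDATE,      /* glBufferSubData() and similar */
    CMD_FBO_RECONFIG,       /* glFramebufferTexture2D() / glFramebufferRenderbuffer() */
    CMD_INVALIDATE_HELPERS  /* glInvalidateFramebuffer() on DEPTH/STENCIL */
} Cmd;

/* One flag per rule in the list above. */
enum {
    RULE1_NO_FULL_CLEAR  = 1 << 0,
    RULE2_NO_INVALIDATE  = 1 << 1,
    RULE3_MID_PASS_FLUSH = 1 << 2,
    RULE4_FBO_RECONFIG   = 1 << 3
};

/* Returns a bitmask of the rules violated by one render pass, given the
   calls issued from the start of the pass to its end. */
static unsigned check_render_pass(const Cmd *cmds, size_t n)
{
    unsigned violations = 0;
    assert(cmds != NULL || n == 0);

    /* Rule 1: open with a full clear so no old contents are loaded. */
    if (n == 0 || cmds[0] != CMD_CLEAR_FULL)
        violations |= RULE1_NO_FULL_CLEAR;

    /* Rule 2: invalidate helper buffers last so they are not stored. */
    if (n == 0 || cmds[n - 1] != CMD_INVALIDATE_HELPERS)
        violations |= RULE2_NO_INVALIDATE;

    for (size_t i = 0; i < n; ++i) {
        switch (cmds[i]) {
        case CMD_FINISH:
        case CMD_CLIENT_WAIT_SYNC:
        case CMD_READ_PIXELS:
        case CMD_BUFFER_UPDATE:
            violations |= RULE3_MID_PASS_FLUSH;  /* Rule 3: may split the pass */
            break;
        case CMD_FBO_RECONFIG:
            violations |= RULE4_FBO_RECONFIG;    /* Rule 4: may force a CPU-side stall */
            break;
        default:
            break;
        }
    }
    return violations;
}
```

In the intended pattern, a pass starts with a full glClear(), issues only draws, and ends with glInvalidateFramebuffer() on DEPTH/STENCIL. The COLOR attachment of a render-to-texture target stays out of the invalidate list, because it is sampled in a later pass.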

Violating #1 will force the GPU/driver to read the prior contents of the render target buffer(s) from memory before tile rasterization (added DRAM bandwidth). Violating #2 will cause the GPU/driver to write out the contents of these buffer(s) to memory after tile rasterization (added DRAM bandwidth). Violating #3 will cause multiple tile-based rasterization passes to render to your render target (with whole-framebuffer write and read passes in between – added DRAM bandwidth), and can potentially trigger rendering artifacts (with downsampling, occlusion testing, blending, etc.). Violating #4 will likely trigger implicit synchronization (read: CPU-side stall), as GL ES drivers often manage rendering and rendering commands per framebuffer, so reconfiguring an FBO often needs to wait for pending operations on that FBO to complete. If you absolutely must reconfigure FBOs in your draw loop, use a ring buffer of them, and make sure you don’t reconfigure an FBO until at least 2-3 frames have elapsed since you last issued draw work for it.
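The ring-buffer suggestion at the end of that paragraph could look roughly like this. It is a sketch under assumptions: one ring of three FBOs, a frame counter supplied by the caller, and a minimum age of 3 frames (the upper end of "2-3 frames"); `FboRing`, `fbo_ring_init()`, and `fbo_ring_acquire()` are hypothetical names. Only the bookkeeping is shown; the returned name would be bound with glBindFramebuffer() as usual.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed tunables for this sketch, not values from any spec. */
#define FBO_RING_SIZE 3
#define MIN_FRAMES_BEFORE_RECONFIG 3

typedef struct {
    unsigned fbo[FBO_RING_SIZE];             /* framebuffer names, e.g. from glGenFramebuffers() */
    int64_t last_used_frame[FBO_RING_SIZE];  /* frame that last issued draw work to each FBO; -1 = never */
    unsigned next;                           /* next slot in round-robin order */
} FboRing;

static void fbo_ring_init(FboRing *r, const unsigned names[FBO_RING_SIZE])
{
    for (unsigned i = 0; i < FBO_RING_SIZE; ++i) {
        r->fbo[i] = names[i];
        r->last_used_frame[i] = -1;
    }
    r->next = 0;
}

/* Hands out the next FBO for draw work in `frame`. *may_reconfig reports whether
   enough frames have passed since that FBO was last drawn to that reconfiguring it
   (e.g. with glFramebufferTexture2D()) is unlikely to wait on pending GPU work. */
static unsigned fbo_ring_acquire(FboRing *r, int64_t frame, bool *may_reconfig)
{
    unsigned slot = r->next;
    int64_t last = r->last_used_frame[slot];
    assert(frame >= last);

    *may_reconfig = (last < 0) || (frame - last >= MIN_FRAMES_BEFORE_RECONFIG);
    r->last_used_frame[slot] = frame;   /* draw work for this FBO is issued this frame */
    r->next = (slot + 1) % FBO_RING_SIZE;
    return r->fbo[slot];
}
```

With one acquire per frame and three FBOs, every FBO is exactly 3 frames old when it comes around again, so reconfiguration is allowed. Acquiring a second FBO within the same frame shortens that distance, and the flag reports it.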

Which GPU does your laptop have?

If it’s the onboard GPU in your CPU chip or on your motherboard, that’s not too surprising, as these are typically tilers as well, for the same reason your Mali-T760 is (sloooww DRAM backing the framebuffers, not fast graphics RAM and high-speed memory buses as with discrete GPUs).

Ok, so you don’t have to worry about texture ghosting, like you would with CPU-side texture updates. But…

You do need to think about dependencies between render passes though.

With a full update “from scratch” of each render target, you can glClear() all of your buffers to avoid DRAM reads of the prior buffer/texture contents when beginning rasterization.

However, with partial updates, you’re forcing the GPU/driver to pull in the old contents. This is extra DRAM bandwidth and time, which by itself makes the pass slower. You need enough savings later on to outweigh this extra cost.
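A rough way to see the extra cost is to count DRAM bytes per pass under a simplified model, assuming every attachment is loaded in full when it is not cleared and stored in full when it is not invalidated (real drivers may skip untouched tiles). The formats are assumptions: RGBA8 color and packed D24S8 depth/stencil, 4 bytes per pixel each. `Attachment` and `pass_dram_bytes()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* One attachment of a render target and the per-pass load/store behaviour
   a tiler is assumed to apply (simplified, whole-attachment model). */
typedef struct {
    uint32_t bytes_per_pixel;  /* e.g. 4 for RGBA8 or D24S8 (assumed formats) */
    bool load;                 /* prior contents read from DRAM (no full glClear) */
    bool store;                /* contents written back to DRAM (not invalidated) */
} Attachment;

/* DRAM bytes moved for one pass over a width x height render target. */
static uint64_t pass_dram_bytes(uint32_t width, uint32_t height,
                                const Attachment *att, size_t count)
{
    uint64_t pixels = (uint64_t)width * height;
    uint64_t total = 0;
    for (size_t i = 0; i < count; ++i) {
        uint64_t size = pixels * att[i].bytes_per_pixel;
        if (att[i].load)  total += size;  /* tile load before rasterization */
        if (att[i].store) total += size;  /* tile store after rasterization */
    }
    return total;
}
```

For the two 1024x1024 textures in the first pass, this model gives 2 x 4 MiB = 8 MiB per frame for a full update (color stored only, depth/stencil cleared and invalidated) versus 2 x 16 MiB = 32 MiB for a partial update that clears and invalidates nothing. Even with depth/stencil cleared and invalidated, preserving color doubles the color traffic (load plus store), so the shading saved by drawing only about 5% of the triangles has to outweigh an extra 4 MiB read per texture.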

Also, lots of render passes at high res are of course going to be slower than one.

Generally a pipeline flush is something that causes rasterization of a specific framebuffer to be interrupted in the middle for some reason. Things like:

calling glFinish()
waiting on sync objects,
doing readbacks,
reconfiguring FBOs,
running out of geometry buffer space (too many primitives),
etc.

Ordinary render-state changes between draws probably won’t do this. But it’s going to depend on the implementation of the GPU and driver.

A vendor-specific GPU profiling tool will show you whether this is occurring or, more generally, why rendering seems to slow down at a particular point (e.g. for you, in that first render-to-texture (RTT) render pass).

I will say that “flickering” can be a sign that you are triggering a pipeline flush. I mention this because you talked about flickering.

That said, the flickering might be caused by your intermediate rendering results not being preserved like you think they should be.

Ok first…

For good performance, mobile GPUs depend completely on deferring rasterization for a render target (think FBO) until everything for that render target has been received, transformed (vertex shaders), and the resulting work binned into screen tiles. As such, the GPU expects to rasterize a frame 1-2 frames after the app queues the work and the GPU transforms the vertices. When you call glFinish(), you completely prevent the GPU/driver from doing this. This makes phases of your frame take more time than they otherwise would, and completely thwarts the overlap of queuing and transform work with rasterization work. This glFinish() profiling method will also keep you from hitting the same intra-render-pass dependency bottlenecks that you’re going to need to identify and resolve at some point.

So for profiling on a mobile GPU, I’d highly recommend pulling out the mobile GPU vendor’s profiling tools and just see how your frame is being scheduled. This will most likely make it obvious what you’re bottlenecked on, and either suggest solutions or give you specific questions to ask of the GPU vendor.

Well, we’ve established that the way you’re profiling isn’t a good measure of the performance of a render pass on a mobile GPU. So I’d put a big question mark on those results. The fact that you’re not initially performing a glClear() for that first render pass could be what’s slowing it down, since it forces the GPU to read in the old contents of the COLOR buffer (and possibly DEPTH and STENCIL, if you’re not clearing them either) from slow DRAM.

I would suggest disabling your partial updates, going back to full updates, and profiling that with the vendor profiling tool. Missing glClear() and glInvalidateFramebuffer()/glDiscardFramebufferEXT() calls could be the cause of your performance problems.

Also re the flickering…

Could you post pictures of the flickering artifacts you said you were getting?
Or describe in more detail what you mean by that?
Are parts of the scene solid and parts missing … some frames?
Do the parts that flicker correspond to the parts that you’re updating incrementally or the parts that should be preserved from prior frames?
Which render targets/textures are you assuming preserve their prior contents?
You’re not assuming the final render target preserves its contents, correct? Because for efficiency’s sake it often does not.
What is the final render target? An EGL window? PBuffer? Pixmap?
Do you know which pass is causing the flickering? If so, talk more about what you’re doing there.