What is the most efficient way to use FBOs?

r-jaoui · June 28, 2021, 12:35pm

Hello !

I have a quick question: if, for example, I have a rendering context with a number of textures that I need to render into, is it better to have a single “static” framebuffer on which I change the attachment, or should I have one FBO per texture? If it depends, what does it depend on? What are the benefits of each approach (in terms of memory usage, limits such as the number of FBOs, and speed)?

I also have the same question about the stencil buffer, which in my case is only used while rendering and doesn’t need to store data outside of the draw calls.

Thanks !

Dark_Photon · June 29, 2021, 12:01am

It depends on the implementation of the driver. Either:

1. Consult the GPU driver implementer’s OpenGL (or OpenGL ES) developer guides or recommendations. And/or…
2. Do some profiling in a decent GPU profiling tool to ensure that the GL work for each texture is being queued and executed without instigating “implicit sync” bubbles.

A few real-world examples:

On Imagination Tech PowerVR OpenGL ES drivers a few years back, we were seeing very poor performance when doing render-to-texture (RTT) to multiple textures in a frame through a single FBO shared by all RTT passes (changing the COLOR0 attachment between passes). On the advice of ImgTech devtech, we switched from using one FBO to a small pool of FBOs used in round-robin fashion. This alleviated the GL command queuing hang-ups.

As it turned out, under the hood the driver was using the framebuffer/FBO as “the” container for all work queued in the pipeline for a specific render pass. Whenever we tried to reconfigure the FBO to render into the next texture, the driver flushed the pipeline to complete all work queued on that FBO for the previous texture before it would let the FBO be reconfigured. You couldn’t go hog-wild creating a bunch of FBOs for this though, because there was a fixed per-FBO GPU memory cost, and total GPU memory sizes on mobile were/are extremely limited.
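
To make the round-robin idea concrete, here is a minimal sketch of the bookkeeping only. It is not the code we actually shipped: `RoundRobinFboPool` and `FboHandle` are hypothetical names, and the handles are plain integers standing in for FBO names from `glGenFramebuffers()`, so the sketch runs without a GL context.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Stand-in for a GLuint FBO name (assumption: in a real app these would be
// created once up front with glGenFramebuffers()).
using FboHandle = std::uint32_t;

// Hypothetical helper: hands out FBOs from a small fixed pool in round-robin
// order, so consecutive RTT passes do not reconfigure the FBO that the
// previous pass just used.
class RoundRobinFboPool {
public:
    explicit RoundRobinFboPool(std::vector<FboHandle> fbos)
        : fbos_(std::move(fbos)) {
        assert(!fbos_.empty());  // a pool needs at least one FBO
    }

    // Returns the next FBO in the cycle. The caller would then bind it with
    // glBindFramebuffer() and attach the target texture to
    // GL_COLOR_ATTACHMENT0 with glFramebufferTexture2D().
    FboHandle acquire() {
        FboHandle fbo = fbos_[next_];
        next_ = (next_ + 1) % fbos_.size();
        return fbo;
    }

    std::size_t size() const { return fbos_.size(); }

private:
    std::vector<FboHandle> fbos_;
    std::size_t next_ = 0;
};
```

The pool size is the knob to tune: it has to be large enough that an FBO is not reused while the driver still has work queued on it, and small enough to respect the per-FBO memory cost mentioned above. The right number is something to find by profiling, not something this sketch can tell you.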

As another example, on NVIDIA desktop OpenGL drivers years ago, I found out through their devtech guys that reconfiguring an FBO (e.g. by changing attachments) was particularly expensive when the resolution and/or texture format(s) changed. They recommended keeping pools of FBOs binned by size and format. That way, when the next RTT task came along, the app could try to pick an FBO that had already been set up for the correct size and format(s), so that the attach could be completed quickly.
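
A binned lookup along those lines could be sketched roughly as follows. This is an illustration under assumptions, not NVIDIA's recommended code: `FboKey`, `BinnedFboCache`, and the creation callback are hypothetical names, the handles are plain integers rather than real FBO names, and for brevity it keeps one FBO per bin where a real pool might keep several.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <tuple>
#include <utility>

using FboHandle = std::uint32_t;  // stand-in for a GLuint FBO name

// Bin key: render target size plus internal format (e.g. the GLenum value
// of GL_RGBA8). Hypothetical type, introduced only for this sketch.
struct FboKey {
    int width;
    int height;
    std::uint32_t format;

    bool operator<(const FboKey& o) const {
        return std::tie(width, height, format) <
               std::tie(o.width, o.height, o.format);
    }
};

// Hypothetical cache: returns an FBO previously used for the same size and
// format if one exists, otherwise asks the callback to create one.
class BinnedFboCache {
public:
    using CreateFn = std::function<FboHandle(const FboKey&)>;

    explicit BinnedFboCache(CreateFn create) : create_(std::move(create)) {}

    FboHandle get(const FboKey& key) {
        auto it = cache_.find(key);
        if (it != cache_.end()) {
            return it->second;  // reuse: same size and format as before
        }
        FboHandle fbo = create_(key);  // real code: glGenFramebuffers() etc.
        cache_.emplace(key, fbo);
        return fbo;
    }

    std::size_t size() const { return cache_.size(); }

private:
    CreateFn create_;
    std::map<FboKey, FboHandle> cache_;
};
```

With this shape, an RTT pass asks for `cache.get({width, height, format})` and then only swaps the attachment on an FBO whose previous configuration already matches, which is the cheap case described above.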

Anyway, these examples are probably way more than you care about, but hopefully illustrate that the “best option” varies based on the driver implementation.

Bottom line: You can try doing RTT to multiple textures via one FBO. But before you accept that approach, profile this on every driver you care about in a decent GPU profiling tool where you can see the scheduling of the CPU queuing and GPU execution and can verify that you’re getting the optimum queuing behavior that you want.

(Related: In Vulkan you have to manage all this explicitly. But in OpenGL, you at least need to be aware that it’s going on. Different drivers will implement these under-the-covers details differently, with varying impacts on app rendering performance.)