Do I need 2 FBOs + textures for proper double buffering?

decode · April 4, 2021, 8:09am

Hello there,

It may be my dizzy Sunday morning brain, but I am wondering whether double buffering while using an FBO for post-processing only makes sense if I use 2 FBOs, or 1 FBO with 2 texture color attachments.

Think about:
If the double-buffered display is currently reading from buffer A, and for that it’s reading from the FBO’s color texture,
and in the background the next frame is already being rendered offscreen into the same texture,
then I have the same problem as if I were only doing single buffering… right?

So for display buffer A I need an offscreen texture for post-processing,
and for display buffer B I need a separate offscreen texture, right?

Please tell me if I am totally wrong here… but I just can’t find info about this on the web or in my books,
and it sounds logical to me.

Thanx
kai

zicklag · April 4, 2021, 2:22pm

I’m no expert, but I don’t think you need two framebuffers.

If you are rendering to the offscreen FBO, the pixels in that FBO only make it to the screen once you draw() the FBO’s color texture onto a full-screen quad in the window, right? So your double buffering would still work: once you swap_buffers(), you can no longer draw to or change the buffer that’s on the screen until you draw and swap_buffers() again.

Dark_Photon · April 4, 2021, 2:58pm

It’s not quite clear, but I think you’re concerned about how best to ensure parallel execution of:

GPU execution (rendering) of frame N, and
CPU queuing of frame N+1.

Right?

Also, your processing flow is not quite clear to me, but it sounds something like:

Frame N+0: Render to tex A (via FBO A), display tex A on double-buffered window.
Frame N+1: Render to tex A (via FBO A), display tex A on double-buffered window.
…

and due to concerns about conflicts over tex A between Frames N and N+1, you’re contemplating adding a new texture B, and alternating between textures A and B, like this:

Frame N+0: Render to tex A (via FBO A), display tex A on double-buffered window.
Frame N+1: Render to tex B (via FBO A), display tex B on double-buffered window.
…

Possibly with 1 FBO or 2. And you’re wondering which is better, or whether there’s really any point in all of this. Am I even close?

And which GPU(s) are you targeting by the way?

decode · April 4, 2021, 3:11pm

Hello there
thanx for the answer!
Yes, you described it perfectly… my concern is that tex A is used for frame N while being newly rendered offscreen for frame N+1.
Right now I am working on an Intel HD 630 and also on an NVIDIA GeForce.
But basically the approach should be the same for any modern card…
So… can you give me an answer to the question?
thanx !!

Alfonse_Reinheart · April 4, 2021, 3:19pm

How would that happen? OpenGL is a synchronous API. Unless you’re using image load/store or SSBOs, all OpenGL commands execute as if in the order they are given to the context.

Dark_Photon · April 4, 2021, 3:21pm

Ok, great.

Well, in terms of just ensuring correct rendering, you can just use one FBO and one texture (FBO A and tex A). This is because it sounds like there’s no displayed dependency between the content of the render targets (e.g. tex A and the window) across frames.

However, for best performance, you may want to consider a different option…

You’re right to be thinking about this. It’s important to ensure that how you’re submitting the work to the GPU parallelizes well in the driver and doesn’t trigger implicit sync (where the CPU/app has to wait for the GPU/back-end driver to “catch up” before continuing).

This isn’t behavior specified in the GL/GLES spec. It’s going to depend on the implementation of your graphics drivers. So your best guide is going to be the GPU vendor performance recommendation guides. And the profiling results you get from GPU vendor profiling tools are king here. These should let you visualize whether you’re getting the desired CPU/GPU parallelism with your submitted rendering work. That said…

Given experience with several desktop and mobile drivers, as a starting place, I would suggest picking a technique that:

Allocates a ring-buffer pool of N FBOs,
Round-robin renders to 1 FBO from this pool each frame,
Never changes the resolution or the format of an FBO,
(by binding texture(s) with different resolution/format from the last binding),
Never changes anything about an FBO until N frames after it is rendered to.

where “N” is the number of frames you want the CPU to queue ahead of GPU execution.
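The round-robin selection in that list can be sketched in a few lines of C. This is only a minimal illustration under stated assumptions: `FboRing`, `fbo_ring_init`, `fbo_ring_next` and the ring size of 3 are hypothetical names and values, not part of OpenGL, and the framebuffer names are assumed to have been created and fully configured once at startup (e.g. via `glGenFramebuffers` and a one-time attachment setup).

```c
#include <assert.h>
#include <stddef.h>

/* "N": how many frames the CPU may queue ahead of the GPU.
   The value 3 is an assumption for illustration only. */
#define FBO_RING_SIZE 3

/* Hypothetical helper type, not part of OpenGL. */
typedef struct {
    unsigned int fbo[FBO_RING_SIZE]; /* framebuffer names, created once at startup */
    unsigned long frame;             /* number of frames started so far */
} FboRing;

/* Store framebuffer names that were created and configured up front.
   Nothing about these FBOs is changed after this point. */
void fbo_ring_init(FboRing *ring, const unsigned int names[FBO_RING_SIZE])
{
    assert(ring != NULL && names != NULL);
    for (size_t i = 0; i < FBO_RING_SIZE; ++i)
        ring->fbo[i] = names[i];
    ring->frame = 0;
}

/* Pick the FBO for the frame about to be queued and advance the ring.
   A given FBO comes up again only FBO_RING_SIZE frames later. */
unsigned int fbo_ring_next(FboRing *ring)
{
    assert(ring != NULL);
    unsigned int name = ring->fbo[ring->frame % FBO_RING_SIZE];
    ring->frame++;
    return name;
}
```

Each frame you would call `fbo_ring_next()` once and pass the result to `glBindFramebuffer(GL_FRAMEBUFFER, name)` before rendering the offscreen pass. Whether this actually removes a stall on your driver is something only a profiler run can confirm.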

Why?

On some GL/GLES drivers, the FBO is “the” container for all unexecuted rendering work for a particular render target, and re-configuring/re-rendering with the same FBO may trigger a GPU full pipeline flush (including fragment work) before CPU queuing is allowed to continue. That’s an implicit sync, which introduces a major bubble in CPU/GPU queuing. The ring-buffer of N FBOs avoids reusing an FBO until all the previous rendering work associated with it is through the GPU pipeline.

Further, some drivers treat a reconfig of the FBO (e.g. change of resolution and/or formats) effectively as a full delete and recreate of the framebuffer, which is very heavyweight. Think full pipeline flush. FBOs are already about the most heavyweight object in GL/GLES. So we want to avoid that cost at runtime. So I would avoid changing the resolution/formats that an FBO is rendering to, even if you do decide to dynamically rebind new textures to it at render time (e.g. if it starts 512x512 RGBA8, that never changes going forward). And it may go without saying, but don’t dynamically create and delete FBOs at render time.

Finally, don’t go crazy with creating tons of FBOs. There is some memory cost per FBO, separate from the space required by its attachments. And if you’re very GPU memory constrained (e.g. on mobile), this matters.

So far I’ve only talked about FBOs, not the textures they’re rendering to. That’s because so long as you’re only changing the content of the texture by rendering to it through the GPU pipeline, that should pipeline very well in the driver and the GPU. So you probably don’t need different textures for this.

Where you “do” end up with problems is when you try to upload new content to a texture “from the CPU” (e.g. glTexSubImage2D()) while the GPU is still rendering using the previous content in that texture on the GPU. Then you end up in driver-specific voodoo land where it will either: 1) “ghost” the texture behind the scenes and upload the content to the new texture, 2) block the CPU update until the GPU is finished rendering with the old contents of the texture, or 3) try to save-off the uploaded texture data so it can defer the update until later. Best bet: do all of your CPU texture uploads at startup, not at render time. However, this texture update “from the CPU” case isn’t the one you’re talking about. You’re updating the texture’s contents with the GPU pipeline. So I think you’ll be fine with one texture.

And again, always run a GPU profiler so you can see how your rendering work is parallelizing on the CPU and the target GPU (or not!). These profiling tools can make it pretty easy to see when an unintended synchronization is happening in the driver.

Don’t know much about Intel. But NVIDIA was one where you definitely don’t want to change the resolution or format of an FBO at runtime. Reuse from a pool of FBOs per res/format combination.
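One way to organize “a pool per res/format combination” is a small lookup table keyed by width, height and internal format. The following is a rough sketch, not a description of any driver or library: `FboPoolTable`, `fbo_pool_lookup` and `MAX_POOLS` are hypothetical names, and the returned index is assumed to select a pre-built pool of FBOs that you maintain elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed upper bound on distinct resolution/format combinations. */
#define MAX_POOLS 8

/* Hypothetical key: one entry per resolution/format combination. */
typedef struct {
    int width;
    int height;
    unsigned int format; /* internal format enum value, e.g. that of GL_RGBA8 */
    int in_use;          /* 0 = free slot, 1 = occupied */
} FboPoolKey;

/* Must be zero-initialized before first use, e.g. FboPoolTable t = {0}; */
typedef struct {
    FboPoolKey keys[MAX_POOLS];
} FboPoolTable;

/* Return the pool index for this combination, claiming a free slot the
   first time the combination is seen. Returns -1 if the table is full. */
int fbo_pool_lookup(FboPoolTable *t, int width, int height, unsigned int format)
{
    assert(t != NULL);
    int free_slot = -1;
    for (int i = 0; i < MAX_POOLS; ++i) {
        const FboPoolKey *k = &t->keys[i];
        if (k->in_use) {
            if (k->width == width && k->height == height && k->format == format)
                return i; /* existing pool: its FBOs never change size or format */
        } else if (free_slot < 0) {
            free_slot = i; /* remember the first free slot */
        }
    }
    if (free_slot >= 0) {
        t->keys[free_slot].width = width;
        t->keys[free_slot].height = height;
        t->keys[free_slot].format = format;
        t->keys[free_slot].in_use = 1;
    }
    return free_slot;
}
```

In keeping with the advice above, the first lookup for each combination would ideally happen at load time, so that no FBO is created or reconfigured in the middle of rendering.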

decode · April 4, 2021, 4:00pm

Wow… thank you very much for the detailed answer!!

I will work it through and see if I can make use of all the input.

take care and stay healthy!

Dark_Photon · April 4, 2021, 5:52pm

Sure thing.

Also, if in your current frameloop you don’t have any reason to render to more than 1 texture, you may be just fine (max perf-wise) creating 1 FBO and binding 1 texture to it at startup, and always rendering to that same FBO/texture every frame. There are no FBO reconfigs in that case, and it’s pretty common. So it’s likely the driver devs would have optimized for that use case. You’ll just have to profile and see.