Ok, great.
Well, in terms of just ensuring correct rendering, you can just use one FBO and one texture (FBO A and tex A). This is because it sounds like there’s no displayed dependency between the content of the render targets (e.g. tex A and the window) across frames.
However, for best performance, you may want to consider a different option…
You’re right to be thinking about this. It’s important to ensure that how you’re submitting the work to the GPU parallelizes well in the driver and doesn’t trigger implicit sync (where the CPU/app has to wait for the GPU/back-end driver to “catch up” before continuing).
This isn’t behavior specified in the GL/GLES spec. It’s going to depend on the implementation of your graphics drivers. So your best guide is going to be the GPU vendor performance recommendation guides. And the profiling results you get from GPU vendor profiling tools are king here. These should let you visualize whether you’re getting the desired CPU/GPU parallelism with your submitted rendering work. That said…
Given experience with several desktop and mobile drivers, as a starting place, I would suggest picking a technique that:
- Allocates a ring-buffer pool of N FBOs,
- Round-robin renders to 1 FBO from this pool each frame,
- Never changes the resolution or the format of an FBO,
(by binding texture(s) with different resolution/format from the last binding),
- Never changes anything about an FBO until N frames after it is rendered to.
where “N” is the number of frames you want the CPU to queue ahead of GPU execution.
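To make the round-robin idea concrete, here is a minimal sketch of just the bookkeeping, with no GL calls in it. The names (`FboPool`, `fbo_pool_init`, `fbo_pool_acquire`) and the pool size of 3 are assumptions for illustration, not anything from a real library. In a real app the `fbo` values would be names you got from `glGenFramebuffers()` at startup, with attachments already bound, and you would pass the returned name to `glBindFramebuffer()`.

```c
#include <assert.h>
#include <stdint.h>

#define FBO_POOL_SIZE 3 /* "N": assumed value, tune it per platform */

/* Hypothetical pool entry. */
typedef struct {
    unsigned int fbo;         /* framebuffer name; placeholder for a glGenFramebuffers() result */
    uint64_t last_used_frame; /* frame index of the most recent render to this FBO */
    int ever_used;            /* 0 until the slot is rendered to for the first time */
} FboSlot;

typedef struct {
    FboSlot slots[FBO_POOL_SIZE];
    uint64_t frame;           /* index of the frame about to be recorded */
} FboPool;

/* Call once at startup with FBOs that are already created and fully configured. */
void fbo_pool_init(FboPool *pool, const unsigned int fbo_names[FBO_POOL_SIZE]) {
    for (int i = 0; i < FBO_POOL_SIZE; ++i) {
        pool->slots[i].fbo = fbo_names[i];
        pool->slots[i].last_used_frame = 0;
        pool->slots[i].ever_used = 0;
    }
    pool->frame = 0;
}

/* Returns the FBO to render to this frame, then advances to the next frame. */
unsigned int fbo_pool_acquire(FboPool *pool) {
    FboSlot *slot = &pool->slots[pool->frame % FBO_POOL_SIZE];

    /* Invariant from the list above: an FBO is not touched again
       until N frames after it was last rendered to. */
    assert(!slot->ever_used ||
           pool->frame - slot->last_used_frame >= FBO_POOL_SIZE);

    slot->ever_used = 1;
    slot->last_used_frame = pool->frame;
    pool->frame++;
    return slot->fbo;
}
```

You would call `fbo_pool_acquire()` once per frame. Note that this only spaces out reuse on the CPU side. It does not prove the GPU is actually done with the old work, which is why the profiler is still the final word.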
Why?
On some GL/GLES drivers, the FBO is “the” container for all unexecuted rendering work for a particular render target, and re-configuring/re-rendering with the same FBO may trigger a GPU full pipeline flush (including fragment work) before CPU queuing is allowed to continue. That’s an implicit sync, which introduces a major bubble in CPU/GPU queuing. The ring-buffer of N FBOs avoids reusing an FBO until all the previous rendering work associated with it is through the GPU pipeline.
Further, some drivers treat a reconfig of the FBO (e.g. change of resolution and/or formats) effectively as a full delete and recreate of the framebuffer, which is very heavyweight. Think full pipeline flush. FBOs are already about the most heavyweight object in GL/GLES. So we want to avoid that cost at runtime. So I would avoid changing the resolution/formats that an FBO is rendering to, even if you do decide to dynamically rebind new textures to it at render time (e.g. if it starts 512x512 RGBA8, that never changes going forward). And it may go without saying, but don’t dynamically create and delete FBOs at render time.
Finally, don’t go crazy with creating tons of FBOs. There is some memory cost per FBO, separate from the space required by its attachments. And if you’re very GPU memory constrained (e.g. on mobile), this matters.
So far I’ve only talked about FBOs, not the textures they’re rendering to. That’s because so long as you’re only changing the content of the texture by rendering to it through the GPU pipeline, that should pipeline very well in the driver and the GPU. So you probably don’t need different textures for this.
Where you “do” end up with problems is when you try to upload new content to a texture “from the CPU” (e.g. glTexSubImage2D()) while the GPU is still rendering using the previous content in that texture on the GPU. Then you end up in driver-specific voodoo land where it will either: 1) “ghost” the texture behind the scenes and upload the content to the new texture, 2) block the CPU update until the GPU is finished rendering with the old contents of the texture, or 3) try to save-off the uploaded texture data so it can defer the update until later. Best bet: do all of your CPU texture uploads at startup, not at render time. However, this texture update “from the CPU” case isn’t the one you’re talking about. You’re updating its contents with the GPU pipeline. So I think you’ll be fine with one texture.
And again, always run a GPU profiler so you can see how your rendering work is parallelizing on the CPU and the target GPU (or not!). These profiling tools can make it pretty easy to see when an unintended synchronization is happening in the driver.
Don’t know much about Intel. But NVIDIA was one where you definitely don’t want to change the resolution or format of an FBO at runtime. Reuse from a pool of FBOs per res/format combination.
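As a rough sketch of what “a pool of FBOs per res/format combination” could look like, here is the lookup side only, again with no GL calls. Everything here is hypothetical naming (`FboCache`, `fbo_cache_register`, `fbo_cache_find`, `fbo_ring_next`), and the `next_name` counter just stands in for `glGenFramebuffers()`. The assumption is that you register every resolution/format combination at startup and only look up and cycle at render time.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_RINGS 8 /* assumed upper bound on distinct res/format combinations */
#define RING_SIZE 3 /* assumed number of FBOs per combination */

typedef struct {
    int width, height;
    unsigned int format; /* e.g. the GLenum for the color format */
} FboKey;

typedef struct {
    FboKey key;
    unsigned int fbos[RING_SIZE]; /* placeholder framebuffer names */
    unsigned int next;            /* index of the FBO to hand out next */
} FboRing;

typedef struct {
    FboRing rings[MAX_RINGS];
    size_t count;
    unsigned int next_name; /* stand-in for glGenFramebuffers() */
} FboCache;

static int key_equal(FboKey a, FboKey b) {
    return a.width == b.width && a.height == b.height && a.format == b.format;
}

/* Render-time lookup: never creates anything. Returns NULL if not registered. */
FboRing *fbo_cache_find(FboCache *cache, FboKey key) {
    for (size_t i = 0; i < cache->count; ++i) {
        if (key_equal(cache->rings[i].key, key)) {
            return &cache->rings[i];
        }
    }
    return NULL;
}

/* Startup-time registration: creates the ring for this combination once. */
FboRing *fbo_cache_register(FboCache *cache, FboKey key) {
    FboRing *ring = fbo_cache_find(cache, key);
    if (ring != NULL) {
        return ring; /* already registered, reuse it */
    }
    if (cache->count == MAX_RINGS) {
        return NULL; /* cache is full */
    }
    ring = &cache->rings[cache->count++];
    ring->key = key;
    ring->next = 0;
    for (int i = 0; i < RING_SIZE; ++i) {
        ring->fbos[i] = ++cache->next_name; /* real code: create and configure the FBO here */
    }
    return ring;
}

/* Hands out the ring's FBOs round-robin; an FBO never changes res/format. */
unsigned int fbo_ring_next(FboRing *ring) {
    unsigned int fbo = ring->fbos[ring->next];
    ring->next = (ring->next + 1) % RING_SIZE;
    return fbo;
}
```

The point of splitting register from find is that a miss at render time shows up as a NULL you can log, rather than silently creating FBOs mid-frame.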