Ping pong blurring with RT, not FBO switching?

Prune · April 17, 2010, 4:18pm

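I assume it would be faster to switch render targets (draw buffers) within one FBO than to switch FBOs:

```c
// Ping-pong within one FBO by switching its draw buffer.
// Assumes _tex0 is attached at GL_COLOR_ATTACHMENT0 and _tex1 at
// GL_COLOR_ATTACHMENT1, and that the source image starts in _tex1.
glFramebufferDrawBufferEXT(_fbo, GL_COLOR_ATTACHMENT0);    // write _tex0
glBindMultiTextureEXT(GL_TEXTURE0, GL_TEXTURE_2D, _tex1);  // read _tex1
HorizontalBlur();
glFramebufferDrawBufferEXT(_fbo, GL_COLOR_ATTACHMENT1);    // write _tex1
glBindMultiTextureEXT(GL_TEXTURE0, GL_TEXTURE_2D, _tex0);  // read _tex0
VerticalBlur();
```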

Is this valid? Are there any cases where this will not work?

Also, is there any point in calling glTextureBarrierNV() between the two passes, given they are different textures?

(I’m only targeting NV GPUs, GTX 280 or higher.)

[Edit:] A related question is whether glBlitFramebuffer (for AA resolve) can be done between two render targets of the same FBO, and whether there would be any speed improvement.

Prune · April 26, 2010, 2:28pm

Hello…

Dark_Photon · April 26, 2010, 6:29pm

I would time it, because at least at one time on NVidia this definitely wasn’t true (and it may still not be).

An NVidia guy at the time said it was very expensive to change the resolution or pixel format of an FBO, which matched what I was seeing: rendering to the framebuffer and copying to a texture was faster than rendering to texture (and that’s even with a downsample on the copy from the framebuffer). He instead suggested creating an FBO for each resolution/format permutation and picking an FBO from a list based on that. That gave a marked speed improvement, and was faster than render-to-framebuffer-then-copy-to-texture.
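The one-FBO-per-permutation approach can be sketched as a small cache keyed by (width, height, format), so no FBO ever has its attachments changed to a different size or format. This is a minimal sketch: `FboKey`, `FboCache`, `fbo_cache_get` and `create_fbo` are hypothetical names, and the `create_fbo` stub only hands out fake handles where real code would call glGenFramebuffers and attach textures of the matching size and format.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical key: one FBO per resolution/format permutation. */
typedef struct {
    int width;
    int height;
    unsigned format; /* e.g. a GL internal format such as GL_RGBA16F */
} FboKey;

typedef struct {
    FboKey key;
    unsigned fbo; /* framebuffer handle */
} FboEntry;

#define FBO_CACHE_MAX 16

typedef struct {
    FboEntry entries[FBO_CACHE_MAX];
    size_t count;
    unsigned next_handle; /* used only by the stub below */
} FboCache;

/* Stub standing in for glGenFramebuffers plus attaching textures of the
   requested size/format; here it just returns a new fake handle. */
static unsigned create_fbo(FboCache *cache, FboKey key)
{
    (void)key;
    return ++cache->next_handle;
}

/* Returns the FBO for this permutation, creating it on first use. */
unsigned fbo_cache_get(FboCache *cache, int width, int height, unsigned format)
{
    FboKey key = { width, height, format };
    for (size_t i = 0; i < cache->count; ++i) {
        const FboKey *k = &cache->entries[i].key;
        if (k->width == width && k->height == height && k->format == format)
            return cache->entries[i].fbo;
    }
    assert(cache->count < FBO_CACHE_MAX);
    cache->entries[cache->count].key = key;
    cache->entries[cache->count].fbo = create_fbo(cache, key);
    return cache->entries[cache->count++].fbo;
}
```

A linear search is enough here since a frame typically touches only a handful of permutations.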

However, it’s been 4 years or so (an eternity) since then, so you should definitely time both and see. And don’t assume the performance characteristics are necessarily the same across GPU vendors.

SergeyH · April 27, 2010, 12:05am

On NVidia hardware under Linux I observed the following:

- Switching textures with different formats/sizes in a single FBO is the slowest.
- Switching FBOs is much faster, but still too slow: it can be done at most 2-3 times per frame without severely hurting FPS.
- Switching between textures of the same format in a single FBO is the fastest. It can be done many times per frame, and we use this option whenever possible.

One trick: sometimes it seems “necessary” to use textures of different sizes. One such example is estimating average frame luminance in HDR rendering. To avoid the expensive switches, we use textures of the same size and control the region of interest with glViewport.
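A minimal sketch of the viewport bookkeeping for that trick, assuming each luminance reduction pass halves the region of interest (rounding down, clamped to 1) until it reaches 1x1. `Viewport` and `reduction_viewports` are hypothetical names; in real code, pass k would call glViewport(0, 0, v[k+1].w, v[k+1].h) and sample only the v[k] region of the previous same-size texture.

```c
#include <assert.h>

typedef struct { int w, h; } Viewport;

/* Fills out[] with the region sizes from the full-resolution source down
   to 1x1 and returns how many were written. Every pass renders into a
   texture of the same (full) size; only the glViewport region shrinks. */
int reduction_viewports(int w, int h, Viewport *out, int max_regions)
{
    assert(w > 0 && h > 0);
    int n = 0;
    while (n < max_regions) {
        out[n].w = w;
        out[n].h = h;
        ++n;
        if (w == 1 && h == 1)
            break;
        w = w > 1 ? w / 2 : 1;
        h = h > 1 ? h / 2 : 1;
    }
    return n;
}
```

For a 1280x720 frame this gives 11 regions, from 1280x720 down to 1x1, all inside textures of the original size. Rounding odd sizes down drops a row or column, so an exact average would need the shader to account for that.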

Prune · April 27, 2010, 1:25am

Interesting. What hardware were you using?

BTW, it seems to me that estimating the luminance only needs to be done every few frames, especially since exposure control is normally low-pass filtered.
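A sketch of that idea, assuming a simple first-order low-pass filter (an exponential moving average) with a fixed per-frame coefficient: the expensive luminance reduction runs only every `interval` frames, while the adapted value keeps easing toward the last measurement every frame. `Exposure`, `exposure_should_measure` and `exposure_update` are hypothetical names; a frame-rate-independent version would derive the coefficient from the frame time instead.

```c
#include <assert.h>

/* Hypothetical exposure state. */
typedef struct {
    int interval;   /* frames between luminance measurements */
    float alpha;    /* filter coefficient in (0, 1]; larger adapts faster */
    float measured; /* latest measured average luminance */
    float adapted;  /* filtered luminance used for exposure */
    long frame;
} Exposure;

/* Nonzero on frames where the luminance reduction should be run. */
int exposure_should_measure(const Exposure *e)
{
    return e->frame % e->interval == 0;
}

/* Call once per frame; `luminance` is only read on measurement frames. */
void exposure_update(Exposure *e, float luminance)
{
    assert(e->interval > 0);
    if (exposure_should_measure(e))
        e->measured = luminance;
    /* First-order low-pass toward the most recent measurement. */
    e->adapted += e->alpha * (e->measured - e->adapted);
    e->frame++;
}
```

Because the filter already smooths over many frames, holding the measurement constant between samples changes the adapted value only slightly.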

In my case, I’m switching resolutions in several places: HDR bloom, where I blur and combine two to three reduced resolutions; rendering to a cube map for reflections, etc.; and half-resolution ambient occlusion. Since I usually have at least a couple of these going on at a time, using same-size textures for everything might become a memory issue…
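To put rough numbers on the memory concern, here is a small calculation assuming RGBA16F targets (8 bytes per texel) and a 1280x720 base resolution, both chosen only for illustration. It compares several full-size scratch textures against a reduced chain at 1/2, 1/4 and 1/8 size; `texture_bytes` and `reduced_chain_bytes` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes for one uncompressed 2D texture without mipmaps. */
size_t texture_bytes(int w, int h, int bytes_per_texel)
{
    assert(w > 0 && h > 0 && bytes_per_texel > 0);
    return (size_t)w * (size_t)h * (size_t)bytes_per_texel;
}

/* Total bytes for `levels` targets that halve in size each step,
   starting at (w/2, h/2), as in a typical bloom reduction chain. */
size_t reduced_chain_bytes(int w, int h, int levels, int bytes_per_texel)
{
    size_t total = 0;
    for (int i = 0; i < levels; ++i) {
        w = w > 1 ? w / 2 : 1;
        h = h > 1 ? h / 2 : 1;
        total += texture_bytes(w, h, bytes_per_texel);
    }
    return total;
}
```

Under these assumptions, three full-size scratch textures cost 22,118,400 bytes, while the 1/2, 1/4, 1/8 chain costs 2,419,200 bytes, a little over 9x less.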

SergeyH · April 27, 2010, 2:26am

Dual 9800GT a couple of years ago, single GTX285 now. Targeting dual 1280x720 with 8xMSAA.

The idea of estimating luminance only every few frames is interesting; I’ll keep it in mind. Currently in our engine it isn’t the bottleneck, but it isn’t the fastest part either.

Using same-sized textures for everything isn’t the best option in my opinion; some middle ground should be found. As for the memory issue, it is often possible to reuse the same texture when intermediate results can be discarded, and in some cases it is possible to allocate a large texture (for example 2048x2048) and render to different parts of it using glViewport.

Prune · April 27, 2010, 10:40am

I guess if there are several reductions by halving the size, they can all fit in a texture 1.5x1 times the starting size (2x1 if POT, but I find everything works fine with NPOT on my hardware).
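One way to realize the 1.5x1 layout: the full-size level occupies the left w x h block, and each halved level is stacked down a column of width w/2 to its right. Since the stacked heights sum to less than h, everything fits in 1.5w x h. A minimal sketch; `Rect` and `pack_reduction_atlas` are hypothetical names, and each rect would be passed to glViewport(x, y, w, h) when rendering that level.

```c
#include <assert.h>

typedef struct { int x, y, w, h; } Rect;

/* Packs a chain of halving reductions into one atlas of size 1.5w x h.
   Stops once either dimension would drop below 1 (a simplification; a
   full mip-style chain would clamp to 1 instead). Returns the number of
   rects written to out[]. */
int pack_reduction_atlas(int w, int h, Rect *out, int max_levels,
                         int *atlas_w, int *atlas_h)
{
    assert(w > 1 && h > 1);
    int n = 0;
    if (n < max_levels)
        out[n++] = (Rect){ 0, 0, w, h };   /* starting size, left block */
    int x = w, y = 0;
    int lw = w / 2, lh = h / 2;
    while (n < max_levels && lw >= 1 && lh >= 1) {
        out[n++] = (Rect){ x, y, lw, lh }; /* stack in the right column */
        y += lh;
        lw /= 2;
        lh /= 2;
    }
    *atlas_w = w + w / 2;
    *atlas_h = h;
    return n;
}
```

When sampling one region while rendering another, texture coordinates need to be clamped to the source rect so bilinear filtering doesn’t bleed in texels from neighbouring levels. Also, under the core GL rules of that era, reading one region of a texture level while rendering into another region of the same level counts as a feedback loop; NV_texture_barrier is what defines the case where the texels read and written do not overlap.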

By the way, as a bit of an aside: I thought that glGenerateMipmap could only be used on POT textures, but I don’t get any GL errors if I run it on an NPOT texture. When was this requirement relaxed?
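For reference, GL sizes each level of an NPOT mipmap chain as max(1, floor(base / 2^level)) per dimension, so a full chain has floor(log2(max(w, h))) + 1 levels. A small sketch of those two rules; `mip_level_size` and `mip_level_count` are hypothetical helpers, not GL functions.

```c
#include <assert.h>

/* Size of mip level `level` along one axis: max(1, floor(base / 2^level)). */
int mip_level_size(int base, int level)
{
    assert(base > 0 && level >= 0 && level < 31);
    int s = base >> level;
    return s > 0 ? s : 1;
}

/* Number of levels in a full chain: floor(log2(max(w, h))) + 1. */
int mip_level_count(int w, int h)
{
    assert(w > 0 && h > 0);
    int m = w > h ? w : h;
    int levels = 1;
    while (m > 1) {
        m >>= 1;
        ++levels;
    }
    return levels;
}
```

For a 1280x720 texture this gives 11 levels, with level 4 at 80x45.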