FBO performance

On NVIDIA hardware, I’ve noticed that rendering to an FBO is slower than rendering to the render target associated with my rendering context (the back buffer). I would like to render to an FBO for postprocessing; however, this slowdown is prohibitive. At first, I was rendering to non-power-of-two (NPOT) textures. I thought this might be the source of my problem, so I made the render target a standard power-of-two, non-square texture. I noticed a slight performance increase, but nothing major. Then I made the texture square, but got no further improvement. I was also rendering to a 64-bit RGBA surface (16-bit float per channel), so I changed it to a 32-bit RGBA surface (8 bits per channel). Still, performance was poor.

One odd thing I’m doing that might be the cause:

I use the stencil buffer to clip against an arbitrary volume (a depth-only pass is rendered at the beginning of each frame), and I clear the stencil buffer whenever the clip volume changes. Perhaps some path in the driver isn’t optimized for this?

I have noticed that if I disable the clipping volumes when rendering to the back buffer, I get approximately the same performance as when rendering to an FBO. If I’m rendering to an FBO and disable the clipping volumes, performance doesn’t change at all. I’m guessing that the shaders are always being executed, even for fragments that are rejected by the stencil test.

Any ideas on what could be causing this? Are there some things that can cause rendering to an FBO to be slower than rendering to the back buffer? This behavior occurs on both my GeForce 6600GT and GeForce 7800GT. I’m currently using driver 84.25, but this has been happening since the first driver to support depth-stencil render buffers came out.

Any help would be greatly appreciated.

Thanks,
Kevin Bray

> Any ideas on what could be causing this? Are there some things that can cause rendering to an FBO to be slower than rendering to the back buffer?

I’m no expert, but I believe this is expected (and logical).
The reasons to use an FBO are if you wish to use the framebuffer contents later on (e.g., as you mention, as a texture for postprocessing), if you need an area that is larger than the window, or if the window may be covered by another window; not for any speed reasons.

I’d add that if you do need the back buffer as a texture later on, using an FBO will most likely be the fastest option.

By any chance are you doing any glCopyTexImage*/glCopyTexSubImage* calls with your FBO surface? I don’t know about the current driver, but a year or so ago I got really bad performance on nVidia hardware when I used those calls with an FBO. I wonder if you’re doing something similar.

If I recall correctly, you have to use something like http://oss.sgi.com/projects/ogl-sample/registry/EXT/packed_depth_stencil.txt in order to use a stencil buffer in an FBO on nVidia hardware.

That’s right… Packed depth stencil is needed. I ran into the same problem just a few weeks ago.

Originally posted by zed:
> > Any ideas on what could be causing this? Are there some things that can cause rendering to an FBO to be slower than rendering to the back buffer?
> I’m no expert, but I believe this is expected (and logical).
> The reasons to use an FBO are if you wish to use the framebuffer contents later on (e.g., as you mention, as a texture for postprocessing), if you need an area that is larger than the window, or if the window may be covered by another window; not for any speed reasons.

I would expect rendering to an FBO to be just as fast as rendering to the primary surface. It is in D3D, as far as I can tell. It would be somewhat useless otherwise… I’d be better off using CopyTexSubImage, which is currently significantly faster than rendering to an FBO.

> By any chance are you doing any glCopyTexImage*/glCopyTexSubImage* calls with your FBO surface? I don’t know about the current driver, but a year or so ago I got really bad performance on nVidia hardware when I used those calls with an FBO. I wonder if you’re doing something similar.

Hmmm… not currently, but this is something I’ll likely be doing in the future. Thanks for the heads up on this.

> If I recall correctly, you have to use something like http://oss.sgi.com/projects/ogl-sample/registry/EXT/packed_depth_stencil.txt in order to use a stencil buffer in an FBO on nVidia hardware.
Yep, I’m already using it. My problem isn’t actually rendering to the stencil buffer; my problem is performance. When I render to the primary surface, it seems that the shader is not run if the stencil test fails. When I render to an FBO, it seems like the shader is run every time regardless. Of course, this is just a guess, since I can’t directly observe whether shaders are being run. Performance-wise, though, it seems like this is the case.

Does anyone know whether early Z is disabled when rendering to an FBO?

Thanks,
Kevin Bray

There were a couple of posts several months ago about early Z working for 8-bit-per-component render targets, but not for floating-point ones. Perhaps early stencil doesn’t work on any of them?

> It is in D3D as far as I can tell.
It is if you’re rendering to a buffer, not a texture. When rendering to a swizzled texture (one stored in a tiled, non-linear memory layout), you’re going to lose some performance.

If you need full-speed rendering to a render target, use a renderbuffer as your destination. If you need to texture from it, you’re going to have to live with some performance loss… unless you use a texture rectangle.

A texture rectangle (as opposed to an NPOT 2D texture) is generally not swizzled on nVidia hardware, so you should be able to render to it at regular speed.

Originally posted by Korval:
> > It is in D3D as far as I can tell.
> It is if you’re rendering to a buffer, not a texture. When rendering to a swizzled texture, you’re going to lose some performance.
>
> If you need full-speed performance for render targets, use a renderbuffer as your destination target. If you need to texture with it, you’re going to have to live with some performance loss… unless you use a texture rectangle.
>
> A texture rectangle (as opposed to an NPOT texture) is generally not swizzled on nVidia hardware, so you should be able to render to it with regular performance.

Hmmm… very good to know. Is rendering to a texture slower than rendering to a buffer even when the texture is a standard power-of-two texture?

Thanks for the info!

Kevin B.