FBO performance...

Like others, I’ve noticed slowdowns using FBOs on NVidia hardware, so last night I ran some quick tests to see what I could learn about their performance. To test, I simply rendered to an FBO the same size as my window’s client area (512x512) instead of rendering to the main window. Every configuration I tried showed a very similar, and quite large, slowdown. The configurations were:

  • RGBA32 color renderbuffer + D24S8 renderbuffer
  • RGBA32 color texture2D + D24S8 renderbuffer
  • RGBA32 color textureRECT + D24S8 renderbuffer

One exception was rendering to a shadow (depth) texture with no color texture attached. This ran very fast (most likely because 2x Z-only rendering kicked in).

All that said, this contradicts the performance characteristics I’ve been told about in the past. Here is what I’ve heard:

  • Rendering to renderbuffers is faster than rendering to texture2Ds.
  • Rendering to a textureRect is just as fast as rendering to a renderbuffer.
  • Rendering to a texture2D is slower than rendering to a renderbuffer or textureRect.

Now then, are these things true? Are the slowdowns I’m seeing expected? If these are simply limitations of the current driver, which configuration should I prefer for the best rendering performance (specifically, for post-processing and HDR)?

Note that I have a 7800GT with the 93.81 drivers.

Thanks,
Kevin B

FBOs are not mainly intended to replace the main front or back buffer. Compare against rendering to the window and copying into a texture with glCopyTexSubImage2D instead, and you’ll see how fast FBOs can be: on my machine they are really faster.

Rendering to a texture might be slower than rendering to a framebuffer because filters and parameters are applied to it.

Jide,

I want to render to a GL_RGBA16F FBO so I can do HDR rendering. However, the time it currently takes to render my scene goes from about 25 ms to about 55 ms on average on my 7800GT. This drop occurs with any FBO, not just a floating-point one. As things stand, HDR would not be possible unless I greatly scaled back my shaders. I don’t think I’m doing an unreasonable amount of work, though; I suspect I’m hitting an unoptimized driver path. Is there a way to work around this? Specifically, is there a way to render to an RGBA16F render target without killing my framerate?

Kevin B

Rendering to a texture might be slower than rendering to a framebuffer because filters and parameters are applied to it.
It doesn’t matter what filtering or clamping you use; those settings are irrelevant when you render to a texture.

I’ve just tested these:
- render to the 1280x1024 screen, then glCopyTexSubImage2D into a 2048x1024 texture
- render directly to a 1280x1024 region of a 2048x1024 texture

There was no significant difference in my game. In this test, rendering to the texture took about 50% of total frame time, so I would have noticed at least a 30% difference if I had run into a performance drop like the one you described.
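To put numbers on that reasoning, here is a small back-of-the-envelope helper in C (an illustration only, not code from either engine). It assumes that only the offscreen pass gets slower while the rest of the frame costs the same as before, and the example below plugs in the 25 ms to 55 ms ratio reported earlier as the slowdown of that pass.

```c
/* Estimated whole-frame time multiplier when only part of the frame slows down.
 * offscreen_fraction: share of the original frame time spent in the
 *                     render-to-texture pass (0.0 to 1.0).
 * pass_slowdown:      how many times slower that pass becomes (e.g. 2.2).
 * Assumption: the remainder of the frame is unaffected. */
double frame_time_multiplier(double offscreen_fraction, double pass_slowdown)
{
    return (1.0 - offscreen_fraction) + offscreen_fraction * pass_slowdown;
}
```

With half the frame in the offscreen pass and a 2.2x slowdown of that pass, frame_time_multiplier(0.5, 55.0 / 25.0) comes out at 1.6, i.e. roughly 60% longer frames, well above the 30% threshold.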

My configuration is exactly the same (7800GT + 93.81), so I believe the performance problem is actually in your application.

  1. Make sure you don’t have anything like automatic mipmap generation enabled for your FBO texture.
  2. Make sure the viewport and scissor rectangle are always set properly when you render to or clear this texture.

If you’re using a 2D texture, you can use POT or NPOT dimensions. I don’t know whether NPOT causes any performance problems; I’m not using it because I want to remain compatible with GeForce FX. So, for 1280x1024 I use a 2048x1024 texture. Without the scissor test, I would waste almost 40% of memory bandwidth clearing and rendering the unused 768x1024 pixels.
With 800x600 I use a 1024x1024 texture, which means over 50% of the pixels are unused.
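The padding arithmetic above can be checked with a short C sketch (the helper names are made up for illustration). It rounds each dimension up to the next power of two, as you would for a POT texture on GeForce FX-class hardware, and reports what fraction of the padded texture lies outside the region actually rendered, which is the area the scissor test keeps you from clearing and shading.

```c
#include <stdint.h>

/* Smallest power of two >= n. Assumes 1 <= n <= 2^31 (always true for
 * texture dimensions), so the shift cannot overflow. */
uint32_t next_pow2(uint32_t n)
{
    uint32_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Fraction of texels in the padded POT texture that lie outside the
 * w x h region that is actually rendered. */
double unused_fraction(uint32_t w, uint32_t h)
{
    double used  = (double)w * (double)h;
    double total = (double)next_pow2(w) * (double)next_pow2(h);
    return 1.0 - used / total;
}
```

For 1280x1024 this gives 0.375 (the 768x1024 strip), and for 800x600 about 0.54. In GL terms you would then call glViewport(0, 0, w, h) and glScissor(0, 0, w, h) with GL_SCISSOR_TEST enabled before clearing or rendering into the padded texture.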

k_szczech,

Very interesting results.

The one thing I’ve found that does look like an issue is shadow mapping. If I turn off all shadows, my framerate is nearly identical to rendering to the main window. All my lights are dynamic, so I’m constantly switching back and forth between the offscreen FBO and the shadow map texture. So perhaps switching between FBOs is what’s hurting me. Are you doing any FBO switching during the frame that might invalidate my conclusion?

I’m certain my viewport and scissor rect cover the entire render target when clearing. Also, I chose my test resolution (a 512x512 window) to keep everything as identical as possible: it rules out NPOT rendering performance as well as rendering to a target whose size differs from the main window.

Kevin B

While running with the instrumented driver that comes with NVPerfKit, I noticed that the GPU0/fast_z_count counter is roughly 25,000,000 (I’m assuming this is fast Z rejections per second?) when rendering to an FBO, regardless of the FBO type. When rendering to the back buffer, that number rises to about 45,000,000. This matches the kind of hit I’m experiencing, so the slowdown looks related to early Z. It could be something I’m doing that shuts off early Z, but at this point it doesn’t look like it: I’m literally not doing anything differently when rendering to an FBO vs. the main window. Are there perhaps different criteria for keeping early Z enabled when rendering to an FBO?

Thanks,
Kevin B

Okay, it seems I’ve found the problem. It appears that early Z rejection is disabled if stencil test is enabled and the current render target is an FBO. Is this a known driver limitation? For any NVidia people on the forum, I can create a demo app that repros this pretty easily.

Thanks,
Kevin B

It appears that early Z rejection is disabled if stencil test is enabled and the current render target is an FBO.
Are you using packed_depth_stencil?

Are you using packed_depth_stencil?
Yes. AFAIK, that’s the only way to get stencil support with FBOs on current hardware.

Kevin B

Originally posted by ebray99:
It appears that early Z rejection is disabled if stencil test is enabled and the current render target is an FBO.
Having read your post, I was eager to find out whether my program runs into the same problem. Guess what: I’m experiencing the same issue when rendering soft shadows with a modified shadow wedge algorithm (on a GeForce 7950 GTX with the newest drivers). And yes, the only way to use stencil in FBOs is the packed depth-stencil format.

Have you tried different FBO configurations (16/32 bit and so on)?

This problem is serious for me, since a huge amount of geometry has to be rendered that could really benefit from early Z rejection. I’m not modifying Z in either the vertex or the fragment shader.

If you’ve come up with any new information, please post it here.

Thank you,

Guenther

Why won’t you submit a test case to NVidia? They can tell you whether it’s a bug or not.

Why won’t you submit a test case to NVidia? They can tell you whether it’s a bug or not.
As far as I know, that has to be requested by someone at NVidia. If that’s not the case, please point out how I can go about submitting a test app and I’ll gladly do so.

Kevin B

Do you switch not only color buffers but also depth buffers during your rendering?

Originally posted by ebray99:
Why won’t you submit a test case to NVidia? They can tell you whether it’s a bug or not.
As far as I know, that has to be requested by someone at NVidia. If that’s not the case, please point out how I can go about submitting a test app and I’ll gladly do so.

Kevin B

It looks like you have to be a registered developer to use the submit function…
You could PM cass directly; he will surely be able to help.

They sure are slow. It takes about 2 ms just to render one projected shadow in my engine, on my GeForce 7800.

halo:
They sure are slow. It takes about 2 ms just to render one projected shadow in my engine, on my GeForce 7800.
You aren’t by chance sequentially rendering to textures with different resolutions and/or formats using the same FBO, are you? I saw major slowdowns, even compared to rendering to the framebuffer and copying to a texture (RTFB-CTT), until I (at spasi’s recommendation) created a separate FBO per resolution/format and dynamically selected one based on that. Apparently, reconfiguring an FBO to switch resolution or format is quite expensive.
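A minimal sketch of that idea in C, assuming FBOs are keyed on (width, height, internal format): look up an existing FBO first and only create and attach a new one on a cache miss, so no FBO is ever reconfigured to a different size or format. The FboCache/fbo_cache_lookup names, the fixed capacity, and storing the format as a plain unsigned (standing in for a GLenum) are illustrative assumptions; the actual GL work (glGenFramebuffersEXT, attaching the texture and depth/stencil buffer, checking completeness) is left to the caller.

```c
#include <stddef.h>

#define FBO_CACHE_CAPACITY 16  /* arbitrary illustrative limit */

/* One cached FBO, identified by the size and format of its color target. */
typedef struct {
    unsigned width;
    unsigned height;
    unsigned format;  /* stands in for a GLenum internal format */
    unsigned fbo;     /* GL framebuffer name, filled in by the caller */
} FboEntry;

typedef struct {
    FboEntry entries[FBO_CACHE_CAPACITY];
    size_t count;
} FboCache;

/* Returns the entry for (width, height, format). On a miss a new slot is
 * reserved and *is_new is set to 1, telling the caller to create and attach
 * a fresh FBO; on a hit *is_new is 0 and the existing FBO is reused as-is.
 * Returns NULL (with *is_new = 0) if the cache is full. */
FboEntry *fbo_cache_lookup(FboCache *cache, unsigned width, unsigned height,
                           unsigned format, int *is_new)
{
    for (size_t i = 0; i < cache->count; ++i) {
        FboEntry *e = &cache->entries[i];
        if (e->width == width && e->height == height && e->format == format) {
            *is_new = 0;
            return e;
        }
    }
    if (cache->count == FBO_CACHE_CAPACITY) {
        *is_new = 0;
        return NULL;
    }
    FboEntry *e = &cache->entries[cache->count++];
    e->width = width;
    e->height = height;
    e->format = format;
    e->fbo = 0;  /* caller stores the real framebuffer name after creation */
    *is_new = 1;
    return e;
}
```

When switching render targets, call fbo_cache_lookup; only if is_new is set do you generate the FBO and attach its buffers, otherwise you just bind entry->fbo.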

You aren’t by chance sequentially rendering to textures with different resolutions and/or formats using the same FBO are you?
Nope, I was simply rendering to a single offscreen render target. As it turned out, everything was related to GL_STENCIL_TEST being enabled. If that is enabled, early Z appears to shut off. As soon as GL_STENCIL_TEST is disabled, things speed up again.

It looks like you have to be a registered developer to use the submit function…
I am a registered developer over there, but I don’t log on all that often and didn’t realize that option was available. One of the guys at NVidia also contacted me, so I’ve submitted a bug report along with a repro app.

Kevin B