glCopyTexSubImage faster than FBO?

I am currently looking into FBOs; until now I have always used glCopyTexSubImage. I came across this tutorial.

The downloadable file (src+bin) for the tutorial includes timing for doing the same thing with FBOs and with glCopyTexSubImage.

The problem is that on the 2-3 systems I tested the demo on (Quadro 5500 FX, Geforce 6600 GT, GTX 480), glCopyTexSubImage either outperformed FBOs or was on par with them.
This also held true when I compiled the source and increased the size of the dynamic texture from 256 to 1024.

Can anyone explain this?
Is the demo also giving the same results on your system?

Thanks in advance.

I'm not sure; I haven’t retimed this lately. But years ago I recall that there was an NVidia issue with FBOs where changing the size or format of the attachments on an FBO was pretty expensive. So much so that I too was getting better performance by rendering to the system framebuffer and then copying to a texture (glCopyTexSubImage) than by rendering directly to a texture with an FBO.

The solution an NVidia guy offered was to keep one FBO per res+format combination you need to render to. That way you never force an FBO to go through this expensive reconfiguration. As I said, I haven’t retimed this lately, so I don’t know if this is still necessary for best performance.
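To make the "one FBO per res+format" idea concrete, here is a minimal sketch of a lookup cache, not code from the demo or from NVidia. The names `FboCache`, `FboHandle`, `FboKey`, and the factory callback are all hypothetical; the sketch assumes the caller supplies a function that does the actual OpenGL setup, so the caching logic itself needs no GL context.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <map>
#include <tuple>
#include <utility>

// Hypothetical handle type; with real OpenGL this would be a GLuint FBO name.
using FboHandle = std::uint32_t;

// Key: (width, height, internal format). With real OpenGL the format would be
// a GLenum such as GL_RGBA8.
using FboKey = std::tuple<int, int, std::uint32_t>;

class FboCache {
public:
    // The factory creates and fully configures one FBO for the given key.
    // In a real renderer it would call glGenFramebuffers, attach a texture of
    // that size and format with glFramebufferTexture2D, and verify the result
    // with glCheckFramebufferStatus. (Assumption: supplied by the caller.)
    using Factory = std::function<FboHandle(int, int, std::uint32_t)>;

    explicit FboCache(Factory factory) : factory_(std::move(factory)) {}

    // Returns the FBO for this size+format, creating it only on first use,
    // so an existing FBO never has its attachments reconfigured.
    FboHandle get(int width, int height, std::uint32_t format) {
        const FboKey key{width, height, format};
        auto it = cache_.find(key);
        if (it == cache_.end()) {
            it = cache_.emplace(key, factory_(width, height, format)).first;
        }
        return it->second;
    }

    // Number of distinct size+format combinations created so far.
    std::size_t size() const { return cache_.size(); }

private:
    Factory factory_;
    std::map<FboKey, FboHandle> cache_;
};
```

Each frame you would call `cache.get(w, h, format)` and bind the returned FBO, instead of re-attaching differently sized textures to a single FBO. Whether this still matters on current drivers is exactly the open question here.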

I haven’t looked at that demo yet though. Are they changing the rendered res or format an FBO is targeting?

I have witnessed the same behaviour on older hardware with older drivers. My guess is that glCopyTexSubImage2D is just a very old, and thus very optimized, path.

Updating the drivers helped to bring glBlitFramebuffer on par with (if not faster than) glCopyTexSubImage2D on the same old hardware.

Mind you, glCopyTexSubImage2D is not allowed on multisampled framebuffers, so glBlitFramebuffer is the “safer bet” :)

Plus, glCopyTexSubImage2D from the normal framebuffer is only guaranteed to work for pixels passing the ownership test, so it can easily break due to tooltips, popups, a window slightly off screen, etc.

This is not a problem particular to glCopyTexSubImage2D; it applies to all operations on the window framebuffer. Even glBlitFramebuffer can be affected by it.

No, the demo is pretty straightforward.
It uses the standard techniques for working with FBOs, as featured in all the other tutorials on the net.
This is why I am asking; maybe I overlooked something in the tutorial.

Even if glCopyTexSubImage has a few drawbacks, it is important to know that for regular usage you can still use it.
I was under the impression, as advertised by the vendors, that FBOs are always superior to the “old” way.

That might not be true. We still have to do trial and error.

Please let me know if you find anything useful or run the demo.

Ran demo last night on a GTX480.

Even at full-screen (1280x1024), whether AA was cranked up or off, the frame rate for both paths (FBO vs. glCopyTexSubImage) was ~1400fps. The difference between them was noise (down in the couple fps range).

If you convert this fps to milliseconds, you’ll see what I mean.

This is apparently a poor test for timing, at least on a modern GPU.