I’m developing a 3D engine which is both OpenGL & Direct3D capable.
I’m seeing quite bad OpenGL performance (possibly pipeline stalls) when running under Linux with the NVIDIA driver (up to 50 ms/frame). When the same machine is booted into Windows, performance is as expected (below 10 ms/frame).
I’m seeing this both on a laptop with a GeForce GT 540M and on a desktop machine with a GTX 580.
On Mac OS X, the same OpenGL rendering code also works without performance issues on NVIDIA hardware. Also, Linux + AMD hardware seems to work fine.
The performance issue seems to be proportional to the number of times I change the surfaces bound to the FBO (I use a single FBO object). Therefore forward rendering without shadows works fine, but anything like adding shadows or post-processing, or doing deferred rendering, starts to bog down performance.
Actually, I’ve narrowed things down a bit … it is not the number of surface changes after all, but rather the number of draw calls that go to the FBO instead of the backbuffer.
For example, a forward-rendered, complex scene without the bloom post-effect has no problems, as it goes directly to the backbuffer. But the same scene with bloom on must be rendered to the FBO first so that it can be operated on, and for a complex scene that causes a >20 ms performance hit.
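To make the difference concrete, here’s roughly what the two frame paths look like. This is a schematic sketch, not my engine’s actual API — the GL calls are replaced by hypothetical stand-ins that just record a trace, and `sceneFbo` and the pass helpers are illustrative:

```cpp
#include <string>
#include <vector>

// Stand-ins for the real GL calls (glBindFramebuffer and the draw calls);
// they record a trace instead of drawing so the sketch stays self-contained.
static std::vector<std::string> gl_trace;
static void bindFramebuffer(unsigned int fbo) {
    gl_trace.push_back(fbo == 0 ? "bind backbuffer" : "bind FBO");
}
static void drawScene()          { gl_trace.push_back("draw scene"); }
static void drawBloomPasses()    { gl_trace.push_back("bloom passes"); }
static void drawFullscreenQuad() { gl_trace.push_back("composite to backbuffer"); }

// Forward path: every draw call targets the backbuffer -- fast everywhere.
void renderFrameNoBloom() {
    bindFramebuffer(0);   // 0 = default framebuffer (backbuffer)
    drawScene();
}

// Bloom path: the *same* scene draw calls now target an FBO so the result
// can be post-processed -- this is the path that takes the >20 ms hit.
void renderFrameWithBloom(unsigned int sceneFbo) {
    bindFramebuffer(sceneFbo);  // scene goes to an offscreen color target
    drawScene();                // identical draw calls, different target
    drawBloomPasses();          // bright-pass + blur on the FBO texture
    bindFramebuffer(0);
    drawFullscreenQuad();       // composite scene + bloom to the backbuffer
}
```

The scene content and draw calls are identical in both paths; only the render target they land in differs.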
I’ve only seen anything like this in one specific scenario — and I’ve been rendering frames with a number of FBOs, with NVidia, on Linux (for many years), on GTX 580s, GTX 480s, GTX 285s (and others), just like you are.
The only time I’ve seen anything like this is when you’re hitting up against (or flat blowing past) GPU memory capacity. When you do, that means the driver can/will start tossing textures and such off the board to try to make room so it can keep everything it needs for rendering batches on there, and that can result in massive frame time hits as it tries frantically to play musical chairs with CPU and GPU memory to render your frame. This includes your shadow textures, which may be swapped off the board to make room for other things when you’re not rendering to them.
So check how much memory you’re using. Use NVX_gpu_memory_info; it is trivial to query and well worth your while. In my experience, you should never see the “evicted” number > 0 (on Linux). If you do, you’re blowing past GPU memory. Shut down/restart X via logout/login or Ctrl-Alt-Bkspc (or just reboot) to reset the count to 0.
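If you don’t have that query handy, it’s only a few glGetIntegerv calls. A minimal sketch — the token values are from the NVX_gpu_memory_info extension spec (check the extension string for "GL_NVX_gpu_memory_info" first); I’m taking glGetIntegerv as a function parameter here just so the sketch doesn’t depend on your particular GL loader:

```cpp
#include <cstdio>

// Tokens from the NVX_gpu_memory_info extension spec (values reported in KB).
#define GL_GPU_MEMORY_INFO_DEDICATED_VIDMEM_NVX         0x9047
#define GL_GPU_MEMORY_INFO_TOTAL_AVAILABLE_MEMORY_NVX   0x9048
#define GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX 0x9049
#define GL_GPU_MEMORY_INFO_EVICTION_COUNT_NVX           0x904A
#define GL_GPU_MEMORY_INFO_EVICTED_MEMORY_NVX           0x904B

// Pass in your glGetIntegerv; a function parameter keeps this loader-agnostic.
typedef void (*GetIntegervFn)(unsigned int pname, int* params);

// Prints the memory stats and returns the eviction count; anything > 0
// means the driver has had to kick resources off the board.
int reportGpuMemory(GetIntegervFn getIntegerv) {
    int total_kb = 0, avail_kb = 0, evicted_kb = 0, evictions = 0;
    getIntegerv(GL_GPU_MEMORY_INFO_TOTAL_AVAILABLE_MEMORY_NVX,   &total_kb);
    getIntegerv(GL_GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX, &avail_kb);
    getIntegerv(GL_GPU_MEMORY_INFO_EVICTION_COUNT_NVX,           &evictions);
    getIntegerv(GL_GPU_MEMORY_INFO_EVICTED_MEMORY_NVX,           &evicted_kb);
    std::printf("GPU memory: %d/%d KB free, %d evictions (%d KB evicted)\n",
                avail_kb, total_kb, evictions, evicted_kb);
    return evictions;
}
```

Print this once a second or so while your heavy scene runs; the moment the eviction count ticks up, you know you’re over budget.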
Also, if you’ve got one of those GPU memory and performance wasting desktop compositors enabled, disable it (for KDE, use the kcontrol GUI to disable effects/compositing, or just Shift-Alt-F12).
As far as controlling which textures get kicked off first, glPrioritizeTextures is generally said to be a no-op. And while NVidia hasn’t updated their GPU Programming Guide in a good while (3 years), the advice there may give some clue as to how to influence texture/render-target GPU residency priority (see below). But the best advice is: just never fill up GPU memory, and then you don’t have to worry about this.
The problem indeed seems to be using a single FBO. As a test, I switched to using another FBO for shadow-map rendering (switching between shadow maps and the main view is the most frequent render-target change for me), and most of the “unexpected” performance hit went away. Rendering as a whole is still a constant factor slower than on Windows with OpenGL, but it’s much more consistent now.
Now I just need to implement the multiple-FBO mechanism properly and transparently to the caller.
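For the record, the mechanism I have in mind is roughly a cache of one FBO per attachment combination, so callers keep requesting render targets as before and never see the change. A sketch — the names and the `genFramebuffer` hook are hypothetical, and real code would also do the glFramebufferTexture2D calls and a completeness check where indicated:

```cpp
#include <cstddef>
#include <map>
#include <vector>

using TextureId = unsigned int;  // GL texture names
using FboId     = unsigned int;  // GL framebuffer names

class FboCache {
public:
    // genFramebuffer stands in for glGenFramebuffers; injected so the
    // sketch stays self-contained.
    explicit FboCache(FboId (*genFramebuffer)()) : gen_(genFramebuffer) {}

    // Called by the renderer exactly as before ("transparently to the
    // caller"): given the attachments for this pass, return the FBO to bind.
    FboId fboFor(const std::vector<TextureId>& attachments) {
        auto it = cache_.find(attachments);
        if (it != cache_.end())
            return it->second;   // reuse: bind only, no attachment churn
        FboId fbo = gen_();      // first use: create and configure once
        // ... real code: glFramebufferTexture2D each attachment, then
        // check glCheckFramebufferStatus == GL_FRAMEBUFFER_COMPLETE ...
        cache_.emplace(attachments, fbo);
        return fbo;
    }

    std::size_t size() const { return cache_.size(); }

private:
    FboId (*gen_)();
    std::map<std::vector<TextureId>, FboId> cache_;
};
```

The point is that each attachment set is configured exactly once; after that a render-target switch is a plain glBindFramebuffer, which is the cheap path on this driver.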
The implication of your statement is that there’s a newer, improved version. However, IIRC from the NVidia post, it’s not that this is a path that was written inefficiently, but simply that it’s a slow path: it said that reconfiguring the resolution or internal format of an FBO is expensive, and to avoid doing that a lot.