I have been working on a deferred renderer for some time now, and I'm down to the performance-improvement part.
I have a model with 36 meshes totaling about 74,000 faces. The geometry is rendered from VBOs into the G-buffer, and rendering just the geometry takes around 1 ms (on a GeForce GT 440 under Linux).
Now my question is: does anyone have experience with this and can tell me whether this is about the best I can get? It is very hard to benchmark and compare against others, so I guess the easiest answer will come from experience.
I know I really should be profiling my OpenGL calls, but that is a real hassle since I'm working on Linux, and I just want to know whether there is a big difference compared with what someone else has already achieved.
You can do a fair amount of basic "kick the tires" testing without even touching the code. First, it doesn't look like you are triangle-setup bound. Even if you have the slower variant of the GT 440, it should be capable of setting up around 594,000 tris/ms, so you're only reaching about 10% of that rate.
Next, resize your window a little. Do you see an improvement with small reductions in window size, and worse performance when you make it slightly larger? If so, you're at least partially fill-bound. You can establish this a number of other ways (with code changes, though): draw the same number of tris but draw them smaller, reduce the number of G-buffer channels you're writing to, etc.
Are you doing any state changes between drawing these faces? If so, you're at the mercy of your CPU and its speed there. Comment those out and check the timing. Also, to make sure you're profiling just G-buffer rasterization time, ensure your timer starts only after the G-buffer has been bound, basic setup is done, and you've issued a glFinish(). And before you stop your timer, issue another glFinish(); otherwise you may not be timing what you think you're timing.
My shader for this test just outputs 3 channels with position/normal/diffuse (white) and no effects. I have tried both writing the position directly to a 16F framebuffer without packing, and packing only the view distance into a 32-bit framebuffer. While the 32-bit framebuffer is faster in general, the number of channels and the data size don't really affect the G-buffer rasterization time that much.
I guess it could also be a driver problem on Linux; I will test the timings on Windows and repost if that turns out to be the case, for future reference for others.