Render performance to GBuffer

I have been working on a deferred renderer for some time now, and I'm down to the performance-improvement part.
I have a model with 36 meshes totaling about 74,000 faces. The geometry is rendered as VBOs into the G-buffer, and rendering the geometry alone takes around 1 ms (using a GeForce GT 440 on Linux).
Now my question is: does anyone have experience with this and can tell me whether this is about the best I can get? It is very hard to benchmark and compare against others, and I guess the easiest answer comes from experience.

I know I really should be profiling my OpenGL calls, but it's all a real hassle since I'm working on Linux, and I just want to know whether there is a big difference from what someone else has already measured.

Thanks in advance

You can do a fair amount of basic “kick the tires” tests without even touching the code. First, it doesn’t look like you are triangle setup bound. Even if you have the slower version of a GT440, it should be capable of rasterizing around 594,000 tris/ms, so you’re only getting around 10% of that rate.

Next, resize your window a little. Do you see an improvement with small reductions in window size and worse performance if you resize the window a little larger? If so, then you’re at least partially fill-bound. You can find this out a number of other ways (with code changes though), such as drawing the same number of tris but drawing them smaller, reducing the number of G-buffer channels you’re writing to, etc.

Are you doing any state changes between drawing these faces? If so, you're at the mercy of your CPU and its speed there. Comment these out and check the timing. Also, to make sure you're profiling just G-buffer rasterization time, ensure your timer starts after G-buffer binding and basic setup are done and you've called glFinish(). And before you stop your timer, ensure you've done a glFinish(), otherwise you may not be timing what you think you're timing.
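A minimal sketch of that timing pattern (a GL context is assumed to be current, and `drawGBufferGeometry()` is a placeholder for your actual geometry pass):

```cpp
#include <chrono>
// GL headers and context setup assumed; drawGBufferGeometry() is a
// hypothetical stand-in for the G-buffer draw calls being measured.

double timeGBufferPassMs() {
    // G-buffer FBO binding and state setup happen before this point.
    glFinish();  // drain all previously queued GL work first
    auto t0 = std::chrono::steady_clock::now();

    drawGBufferGeometry();  // issue the geometry draw calls
    glFinish();             // block until the GPU has actually finished them

    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Without the second glFinish(), the timer stops as soon as the commands are queued, not when the GPU has executed them.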

Thanks for the reply. I wasn't calling glFinish for correct timings, but now I see that the rasterization is even slower :( at 1.5 ms.
The timer now works this way:

//all bindings/state changes

My shader for this test just outputs 3 channels with position/normal/diffuse (white) and no effects. I have tried both outputting the position directly to a 16F framebuffer without packing, and packing only the view distance into a 32-bit framebuffer. While the 32-bit framebuffer is faster in general, the number of channels and the data size don't really affect the G-buffer rasterization time that much.

I guess it could also be a driver problem on Linux; I will test the timings on Windows and repost if that is the case, for future reference for others.

You are measuring CPU time. That has little to do with the time the GPU needs to render your stuff. You want to use ARB_timer_query to measure actual rendering time on the GPU.
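The basic usage looks like this (a sketch assuming a context with ARB_timer_query or GL 3.3+; `drawGBufferGeometry()` again stands in for the pass being measured):

```cpp
// One-time setup: create the query object.
GLuint query;
glGenQueries(1, &query);

// Each frame: bracket the work to be measured.
glBeginQuery(GL_TIME_ELAPSED, query);
drawGBufferGeometry();  // the G-buffer pass to measure
glEndQuery(GL_TIME_ELAPSED);

// Fetch the result later (e.g. next frame) to avoid stalling the pipeline.
GLuint64 elapsedNs = 0;
glGetQueryObjectui64v(query, GL_QUERY_RESULT, &elapsedNs);
double elapsedMs = elapsedNs / 1.0e6;
```

Unlike the glFinish() approach, this measures GPU execution time directly and doesn't force a CPU/GPU sync, as long as you delay reading the result.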

ARB_timer_query is useful if you know for sure that you are GPU limited. If you don’t, it’s best to start with CPU timers IMO, as what you end up getting from the GPU timers in this case is useless.

Could be there’s a trick I don’t know about though, so please provide tips if you know what I’m talking about.