Fill rate, memory bandwidth, and performance

I’m running a few experiments to see how various common bottlenecks affect performance, and I’m surprised to see that my GeForce3 Ti200 really has problems drawing things on screen. Without writing anything (by culling everything), and with VAR, I get about 19 Mtri/s (flat colors, no texture enabled, basic OpenGL pipeline and no array data optimization). But when it comes to actually writing to the screen, the time to render my 48 meshes (1.9K polygons each) increases dramatically: throughput drops to 8.9 Mtri/s for an almost fully occluded screen (fullscreen 1024*768, 32 bpp). Things get better when the screen is less occluded (with the same number of meshes). I’m very surprised (if not disappointed) to see that fill rate is such a problem. Is this normal? Is there a way to improve things? Is there a common pitfall I fell into?
Thanks…
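
To put the two figures above into per-frame terms, here is a small back-of-the-envelope sketch in C. It assumes "1.9K polygons" means 1900 triangles per mesh, and the function names are made up for illustration. Under that assumption the frame costs roughly 4.8 ms when everything is culled and roughly 10.2 ms when the screen is filled, so about 5.4 ms per frame would be attributable to pixel work, provided nothing else differs between the two runs.

```c
#include <stdio.h>

/* Triangles submitted per frame: number of meshes times triangles per mesh. */
double tris_per_frame(int meshes, double tris_per_mesh) {
    return meshes * tris_per_mesh;
}

/* Frame time in milliseconds for a given throughput in triangles per second. */
double frame_ms(double tris, double tris_per_sec) {
    return 1000.0 * tris / tris_per_sec;
}

/* Prints the frame-time budget implied by the figures quoted in the post.
   Assumption: 1.9K polygons per mesh = 1900 triangles per mesh. */
void print_frame_budget(void) {
    double tris   = tris_per_frame(48, 1900.0);
    double culled = frame_ms(tris, 19.0e6);  /* everything culled, no pixels written */
    double filled = frame_ms(tris, 8.9e6);   /* almost fully occluded screen */

    printf("triangles per frame: %.0f\n", tris);
    printf("culled: %.2f ms/frame\n", culled);
    printf("filled: %.2f ms/frame\n", filled);
    printf("extra time while pixels are written: %.2f ms/frame\n", filled - culled);
}
```

Calling `print_frame_budget()` from `main` prints the four figures; the interesting one is the last, since it is the part that geometry throughput alone cannot explain.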

NVIDIA’s cards are renowned for having tremendous fill rates. I did a comparison between a GeForce2 and a Wildcat 4210 (a very expensive professional card), and the GeForce2 wiped its arse on the Wildcat when it came to fill rate (and the Wildcat doesn’t support multitexturing, so you’d expect it to make up for this in its fill rate to account for multipass renders… but nope, it doesn’t). The Wildcat beat the GeForce2 to death on T&L, though.

Thanks for your answer, knackered!
Could you tell me what kind of test you ran to compare fill rate capabilities? It could help me find where my error (or wrong interpretation) is.
(This problem is really getting on my nerves these days…)
Thanks!

Regardless of how powerful cards get, fill rate will continue to be the dominant limitation for most applications out there.

OK Humus, I’m glad to see that this is a common problem…
But nVidia claims that the GeForce3 can do some kind of early Z-test to increase performance. This should give the best results when you draw front to back. The problem is that things don’t get better when I do this. Has anyone ever seen the effect of front-to-back drawing on nVidia cards?

Yes, rendering front-to-back should give you a good speed up on GF3/4 and Radeons. You may want to run my old little benchmark GL_EXT_reme to see that this is the case. Real world situations won’t get quite as good performance increases though.
Another thing to check for is whether you do a lot of state changes. Do you often switch textures for instance?
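
For reference, a minimal sketch of what "rendering front-to-back" can look like on the application side: sort the meshes by distance from the eye before issuing the draw calls. The `MeshRef` struct, its fields, and the function names are hypothetical, and sorting by the squared distance of a bounding-volume center is only one possible approximation of depth order.

```c
#include <stdlib.h>

/* Hypothetical per-mesh record holding only what the sort needs. */
typedef struct {
    int   id;         /* index into the application's own mesh table */
    float center[3];  /* bounding-volume center in world space */
    float dist2;      /* squared distance to the eye, filled in by the sort */
} MeshRef;

/* qsort comparator: nearest mesh first. */
static int by_distance(const void *a, const void *b) {
    float da = ((const MeshRef *)a)->dist2;
    float db = ((const MeshRef *)b)->dist2;
    return (da > db) - (da < db);
}

/* Orders meshes nearest-first relative to the eye position. */
void sort_front_to_back(MeshRef *meshes, size_t count, const float eye[3]) {
    for (size_t i = 0; i < count; ++i) {
        float dx = meshes[i].center[0] - eye[0];
        float dy = meshes[i].center[1] - eye[1];
        float dz = meshes[i].center[2] - eye[2];
        meshes[i].dist2 = dx * dx + dy * dy + dz * dz;
    }
    qsort(meshes, count, sizeof *meshes, by_distance);
}
```

After the sort, the application would walk the array in order and issue one glDrawElements call per mesh. With only 48 meshes the sort itself should be negligible next to the draw calls.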

Thanks Humus.
No, and that’s the point: the simplest possible pipeline, with no texture enabled, no state changes, just a call to glDrawElements…

>>Yes, rendering front-to-back should give you a good speed up on GF3/4 and Radeons.<<

Shouldn’t it give a speedup on ALL cards? IIRC my GF2 MX and TNT2 both benefit from this (they don’t have early Z-out).

Humus, I used your little benchmark, but could you be more specific about how the tests are really done? For instance, when counting vertices/sec, are writes to the framebuffer allowed? What is the object being drawn in the overdraw test? How are you counting pixel and texel fill rates?
Did you include wglSwapBuffers in your timings? (This is really important when dealing with small objects: it seems to be the step that takes the most time, even with a swap interval of 0. But that’s another problem…)
Thanks

Originally posted by zed:
[b]>>Yes, rendering front-to-back should give you a good speed up on GF3/4 and Radeons.<<

Shouldn’t it give a speedup on ALL cards? IIRC my GF2 MX and TNT2 both benefit from this (they don’t have early Z-out).[/b]

Yes, that’s true, but the difference is much smaller than on Radeons and GF3/4, as they have HierarchicalZ/occlusion culling, which benefits quite a lot from front-to-back rendering. There are some bandwidth savings for older cards too, but no fill rate increase as such. With HierarchicalZ you can reach effective fill rates far beyond the theoretical maximum.
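
To illustrate why draw order matters so much here, the following is a tiny software model, not a description of how the hardware works: it draws a number of screen-covering layers into a depth buffer with a "less than" test and counts how many fragments pass, that is, how many would have to be shaded and written. The function name and the one-depth-per-layer simplification are assumptions made for the sketch.

```c
#include <stdlib.h>

/* Draws `layers` screen-covering quads in the given order, each at a constant
   depth (smaller = nearer), using a GL_LESS-style depth test.
   Returns the number of fragments that pass the test, or -1 on allocation failure. */
long count_passing_fragments(int width, int height, const float *depth, int layers) {
    size_t pixels = (size_t)width * (size_t)height;
    float *zbuf = malloc(pixels * sizeof *zbuf);
    long passed = 0;

    if (zbuf == NULL) {
        return -1;
    }
    for (size_t p = 0; p < pixels; ++p) {
        zbuf[p] = 1.0f;  /* depth buffer cleared to the far plane */
    }
    for (int l = 0; l < layers; ++l) {
        for (size_t p = 0; p < pixels; ++p) {
            if (depth[l] < zbuf[p]) {  /* fragment is nearer: shade and write it */
                zbuf[p] = depth[l];
                ++passed;
            }
        }
    }
    free(zbuf);
    return passed;
}
```

With four layers drawn nearest-first, only the first layer passes; drawn farthest-first, all four pass. On a card that can reject whole blocks of hidden fragments early, the rejected layers cost very little, which is where the "effective" fill rate above the theoretical maximum comes from. On older cards each rejected fragment is still processed individually, so mostly the color and depth writes are saved.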

Originally posted by Rml4o:
Humus, I used your little benchmark, but could you be more specific about how the tests are really done? For instance, when counting vertices/sec, are writes to the framebuffer allowed? What is the object being drawn in the overdraw test? How are you counting pixel and texel fill rates?
Did you include wglSwapBuffers in your timings? (This is really important when dealing with small objects: it seems to be the step that takes the most time, even with a swap interval of 0. But that’s another problem…)
Thanks

Writes to the framebuffer are allowed in the T&L tests; otherwise you wouldn’t see the animation.

In the overdraw test it’s just projective textured quads that cover the whole screen. So pixels = width * height of the screen. The same goes for the fill rate tests, except that the texturing is not projective.

I do call glFinish() at the end, before I calculate the final time, so that all operations are done before I take the final timing.
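
A rough sketch of that timing pattern, written so it compiles without a GL context: the drawing and the synchronization are passed in as callbacks. All names here (`time_frames`, `gl_step_fn`, `stub_step`) are hypothetical. In a real program the `draw` callback would issue the glDrawElements calls and the buffer swap, and `sync` would call glFinish(), which does not return until all previously issued GL commands have completed.

```c
#include <time.h>

/* Callback type for one step of GL work (hypothetical name). */
typedef void (*gl_step_fn)(void);

/* Wall-clock time in seconds (C11 timespec_get). */
static double now_seconds(void) {
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Times `frames` calls to draw(), bracketed by sync() so that work queued
   before the measurement is excluded and work queued during it is included. */
double time_frames(int frames, gl_step_fn draw, gl_step_fn sync) {
    sync();                          /* drain anything already queued */
    double start = now_seconds();
    for (int i = 0; i < frames; ++i) {
        draw();                      /* may return before the GPU has finished */
    }
    sync();                          /* wait until all queued work is done */
    return now_seconds() - start;
}

/* No-op stand-in so the harness can be exercised without a GL context. */
static int stub_calls = 0;
static void stub_step(void) { ++stub_calls; }
```

The design point is that only the total between the two synchronization points is a reliable number. Because a driver may queue commands and return early, timing an individual call in between can mostly measure where the driver happened to block, not how long that call's work took on the GPU.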

In my test, I take several different measurements, mainly:
-the time taken by the calls to glDrawElements
-the time taken by wglSwapBuffers
But I don’t really understand some of the results:
When drawing front to back, glDrawElements takes the most time, and when drawing back to front it’s the opposite (wglSwapBuffers takes the most time). Strange… Besides, the total time is the same in both cases.
So it makes me wonder what these two measurements really correspond to…