Fragment program slowdown: is this normal?


On my GeForce FX 5700, when I use ARB_vertex_program plus NV_fragment_program instead of standard OpenGL lighting, performance drops to less than 50% (and that's with NV_fragment_program, not even ARB_fragment_program). Is this normal? The program isn't doing anything special, just diffuse + specular bump mapping. Can this really be right?


The GeForce FX does have fixed-function lighting hardware, so computing lighting in a vertex program can sometimes be slower than using the OpenGL fixed-function pipeline.

Can you post your vertex and fragment programs?

Yeah, that doesn’t sound strange.
Standard OpenGL lighting is computed per vertex, not per fragment, so there are no complex fragment calculations. With fragment programs, the fragment processor has much more work to do.
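To put a rough number on "much more work": assuming, purely for illustration, a 1024×768 window covered exactly once (no overdraw) and a scene of 20,000 vertices (both figures are hypothetical, not from this thread), per-pixel lighting evaluates the lighting math about 39 times as often as per-vertex lighting does:

```latex
\frac{N_\text{fragments}}{N_\text{vertices}}
  = \frac{1024 \times 768}{20\,000}
  = \frac{786\,432}{20\,000}
  \approx 39
```

Any overdraw multiplies the fragment count further.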

Yes, of course I know there’s a lot more per-fragment work with bump mapping and fragment programs, but a drop THAT large seems strange to me, even more so given what I’ve read about ATI hardware, which seems to handle this much better. I knew the GeForce FX is not as good as the ATI Radeon with the “newest” extensions, but I’m surprised it seems to be that bad.

I will post the vertex and fragment programs soon after some rework.


I don’t know how well you sort your geometry front to back; with display lists this can be especially tricky.
If it isn’t sorted well, the GPU may have to shade a lot of pixels that end up hidden.

Try adding a Z-only pass first (no textures, glColorMask set to all false, only depth testing and depth writes, no alpha test (!), etc.).

Then, in the subsequent passes, use only depth testing (GL_LEQUAL) with depth writes disabled (and still no alpha test).

It may speed up your app; it should be worth a try.


I second the recommendation of doing a Z-only pass first (and of making sure you never modify Z in your fragment program).

It’s very likely that a single-textured, vertex-lit object will render at twice the speed of a normal-mapped, per-pixel-lit one. Typically, the frame rate will go from about 400 fps to 200 fps, or something like that. This is because of the higher memory traffic and the (much) higher fragment-processing load.

The good news is that 200 fps is still a plenty good frame rate: you could do 8x as much fragment work and still have a decent frame rate.
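Converting to frame time shows why a drop from 400 to 200 fps is less alarming than it looks. Frame time in milliseconds is 1000 / fps, so in that example the per-pixel lighting costs about 2.5 ms per frame. For the "8x" figure, assume pessimistically that the entire 5 ms frame is fragment-bound and scales linearly:

```latex
\begin{aligned}
t &= \frac{1000}{\text{fps}}\ \text{ms} \\
t_{400} &= 2.5\ \text{ms}, \qquad t_{200} = 5\ \text{ms}, \qquad t_{200} - t_{400} = 2.5\ \text{ms} \\
8 \times 5\ \text{ms} &= 40\ \text{ms} \;\Rightarrow\; \frac{1000}{40} = 25\ \text{fps}
\end{aligned}
```

Even under that worst-case assumption, the frame rate stays around 25 fps. Since in practice only part of each frame is fragment work, the real figure would likely be higher.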

If you’re doing single-pass, per-pixel bumped specular and getting something like 50 fps, then something else is wrong: CPU overhead, say, or you’re geometry-limited by insanely high-vertex-count models.