Shadow mapping for point lights demo

Humus · September 23, 2002, 7:22am

That would confirm that it’s the rendering to the cubemap that’s slow. It’s as if the card doesn’t support rendering to cubemaps natively and does a copy-to-texture operation instead. Still, that shouldn’t make it this slow.

harsman · September 23, 2002, 7:33am

I think render to texture is slower than copy to texture on nvidia hardware, especially with older drivers. It has improved with recent driver releases but it might still be slower. I haven’t done or seen any recent benchmarks.

Lars · September 23, 2002, 8:40am

It could also be because of the 5 general combiners you are using. There is an NVIDIA presentation that explains how expensive the combiners are, but I don’t remember which one it was.
On GeForce3 and up, I think only two general combiners are completely free; up to four cost twice as much, and up to eight cost four times as much (no guarantee). So if you can reduce the number of stages, it could get a bit faster.
I saw that you were not using the final combiner (just passing D). Maybe you can squeeze something from one general combiner into the final combiner and reduce the number of general combiners used to 4.

Maybe the following works:

glCombinerInputNV(GL_COMBINER2_NV, GL_ALPHA, GL_VARIABLE_A_NV, GL_TEXTURE2_ARB, GL_SIGNED_IDENTITY_NV, GL_BLUE);
glCombinerInputNV(GL_COMBINER2_NV, GL_ALPHA, GL_VARIABLE_B_NV, GL_ZERO, GL_UNSIGNED_INVERT_NV, GL_BLUE);
glCombinerInputNV(GL_COMBINER2_NV, GL_ALPHA, GL_VARIABLE_C_NV, GL_CONSTANT_COLOR0_NV, GL_SIGNED_IDENTITY_NV, GL_BLUE);
glCombinerInputNV(GL_COMBINER2_NV, GL_ALPHA, GL_VARIABLE_D_NV, GL_ZERO, GL_UNSIGNED_INVERT_NV, GL_BLUE);
glCombinerOutputNV(GL_COMBINER2_NV, GL_ALPHA, GL_DISCARD_NV, GL_DISCARD_NV, GL_SPARE0_NV, GL_NONE, GL_NONE, GL_FALSE, GL_FALSE, GL_FALSE);

glCombinerInputNV(GL_COMBINER3_NV, GL_RGB, GL_VARIABLE_A_NV, GL_ZERO, GL_SIGNED_IDENTITY_NV, GL_RGB);
glCombinerInputNV(GL_COMBINER3_NV, GL_RGB, GL_VARIABLE_B_NV, GL_ZERO, GL_SIGNED_IDENTITY_NV, GL_RGB);
glCombinerInputNV(GL_COMBINER3_NV, GL_RGB, GL_VARIABLE_C_NV, GL_ZERO, GL_UNSIGNED_INVERT_NV, GL_RGB);
glCombinerInputNV(GL_COMBINER3_NV, GL_RGB, GL_VARIABLE_D_NV, GL_ZERO, GL_UNSIGNED_INVERT_NV, GL_RGB);
glCombinerOutputNV(GL_COMBINER3_NV, GL_RGB, GL_DISCARD_NV, GL_DISCARD_NV, GL_SPARE0_NV, GL_NONE, GL_NONE, GL_FALSE, GL_FALSE, GL_TRUE);

glFinalCombinerInputNV(GL_VARIABLE_E_NV, GL_TEXTURE0_ARB, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
glFinalCombinerInputNV(GL_VARIABLE_F_NV, GL_PRIMARY_COLOR_NV, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
// Was it E_TIMES_F ??? Not sure, but something like that

glFinalCombinerInputNV(GL_VARIABLE_A_NV, GL_E_TIMES_F_NV, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
glFinalCombinerInputNV(GL_VARIABLE_B_NV, GL_SPARE0_NV, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
glFinalCombinerInputNV(GL_VARIABLE_C_NV, GL_ZERO, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
glFinalCombinerInputNV(GL_VARIABLE_D_NV, GL_ZERO, GL_UNSIGNED_IDENTITY_NV, GL_RGB);

This should do the same but using only 4 general combiners. I can’t test it, because I only have a GeForce2 Go and don’t like the emulation mode that much.

Lars

edit: some code layout changes

[This message has been edited by Lars (edited 09-23-2002).]

Lars · September 23, 2002, 8:49am

…it is even simpler: just do the operation from combiner 2 in combiner 4, and then remove one combiner. No need for the final combiner.

glCombinerInputNV(GL_COMBINER4_NV, GL_RGB, GL_VARIABLE_C_NV, GL_TEXTURE0_ARB, GL_UNSIGNED_IDENTITY_NV, GL_RGB);
glCombinerInputNV(GL_COMBINER4_NV, GL_RGB, GL_VARIABLE_D_NV, GL_PRIMARY_COLOR_NV, GL_UNSIGNED_IDENTITY_NV, GL_RGB);

I hope it gets a bit faster…

Lars

[This message has been edited by Lars (edited 09-23-2002).]

Humus · September 23, 2002, 10:03am

Hmm, yeah, that should work. But I don’t think that’s the bottleneck anyway. If it were, there would be a larger difference between hi-res and low-res framerates. From MZ’s post above:

maximized window on 1152 x 864 desktop: 20 fps
fullscreen 640 x 480: 30 fps

Purely in terms of fill rate, the 640x480 score should be more than three times higher. I also don’t really think a GF3 should be that much slower than a Radeon 8500. I get 130 fps at 640x480 and 60 fps at 1152x864.