Some time ago we wrote about a very strange problem: the driver stalls for a dramatically long time when fragment program (FP) uniforms are changed (or set for the very first time) and some geometry is drawn. We got hardly any replies, apart from other people complaining about much the same problem and the official advice to pre-render everything (which is not a good solution in most cases).
But the problem has been found and localized: nVidia drivers don't like exact values such as ±0.0f, ±0.5f and ±1.0f in FP uniform constants! Changing even one bit of the mantissa of these “magic” values fixes almost all of our problems. To all appearances, given the “constant” nature of FP uniforms, the driver thinks it can improve the shader to make it much faster and more powerful (sic), so it creates a unique shader variant for this particular subset of FP uniform values!
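A minimal sketch of that workaround, in Python for brevity: before uploading a uniform, check whether it is exactly ±0.0, ±0.5 or ±1.0 as a 32-bit float and, if so, flip its lowest mantissa bit. The helper names are hypothetical, and it is an assumption (untested here) that a one-bit change is enough on every driver version; for ±0.0 the result is a float32 denormal, which the GPU may flush back to zero numerically.

```python
import struct

# Values reported in this thread to trigger driver-side shader specialization.
# Compared via abs(), so -0.0, -0.5 and -1.0 are covered as well.
MAGIC_VALUES = {0.0, 0.5, 1.0}


def to_float32(v: float) -> float:
    """Round a Python float to the nearest 32-bit float (what glUniform1f receives)."""
    return struct.unpack("<f", struct.pack("<f", v))[0]


def is_magic(v: float) -> bool:
    """True if v, as a float32, is exactly ±0.0, ±0.5 or ±1.0."""
    return abs(to_float32(v)) in MAGIC_VALUES


def nudge_magic_uniform(v: float) -> float:
    """Hypothetical helper: flip the lowest mantissa bit of a magic value.

    Non-magic values are only rounded to float32. A magic value moves by one ulp;
    ±0.0 becomes the smallest float32 denormal (about ±1.4e-45).
    """
    f = to_float32(v)
    if not is_magic(f):
        return f
    (bits,) = struct.unpack("<I", struct.pack("<f", f))
    return struct.unpack("<f", struct.pack("<I", bits ^ 1))[0]
```

For example, 1.0 becomes roughly 1.0000001 and 0.5 becomes roughly 0.50000006, which is numerically negligible in lighting code.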
So please bear in mind that the driver may decide to perform some optimizations right in the middle of your application's execution.
Hope that helps someone who, like us, has wasted more than one week trying to find out why lags sometimes occur.
And please post a link to that test program on the forum too. I think many of us would like to try it to confirm the problem (lots of quick, free test data for you).
The same goes for GLSTATE uniform semantics. If you want to use state uniforms directly in your fragment shader, bear all these magic values in mind.
This applies even to glstate.light[0].position.
Tested on GeForceFX, GeForce6 and GeForce7 with ForceWare 93.71 (the latest official drivers).
I hadn't had enough time to make a test app, but now I'm ready to post it.
I hoped that this bug would be fixed, but it is not fixed yet, so we've made a small test application that creates several VBOs using the same shader, which is duplicated a number of times to force the effect to appear. It renders 200 quads with 200 copies of the same shader, each with its own VBO.
Keys 0, 1, 2, 3, 4 and 5 change one uniform, which appears in the lighting calculation as a simple additive term ('H' displays a help dialog).
0 - uniform is 0.300 (default)
1 - uniform is 0.000 exactly
2 - uniform is 0.001 exactly
3 - uniform is 0.500 exactly
4 - uniform is 0.999 exactly
5 - uniform is 1.000 exactly
After you press a key, the program measures the time of the next frame.
As you can see, the first time we set this uniform to one of the “dangerous” values (0, 0.5, 1), we get a big lag.
Nothing special: the shader is very simple (if it were more complex, the delay would be much worse, but this is enough to see that the lag really takes place).
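The key handling and next-frame measurement can be sketched like this (Python, with the actual GL work passed in as callbacks). The class and callback names are hypothetical and not taken from the test program; the key-to-value table follows the list above. Since GL calls are asynchronous, the render callback is assumed to end with glFinish() so that the timed interval includes any driver stall.

```python
import time

# Keys 0..5 of the test application and the uniform value each one sets.
KEY_TO_UNIFORM = {
    "0": 0.300,  # default
    "1": 0.000,
    "2": 0.001,
    "3": 0.500,
    "4": 0.999,
    "5": 1.000,
}


class NextFrameTimer:
    """Hypothetical sketch: on a key press, set the uniform and time only the next frame."""

    def __init__(self, set_uniform, render_frame):
        self.set_uniform = set_uniform    # assumed wrapper around glUniform1f or similar
        self.render_frame = render_frame  # assumed to draw all quads and end with glFinish()
        self.measure_next = False
        self.last_frame_ms = None

    def on_key(self, key):
        # Keys outside 0..5 (e.g. 'H' for help) do not touch the uniform.
        if key in KEY_TO_UNIFORM:
            self.set_uniform(KEY_TO_UNIFORM[key])
            self.measure_next = True

    def frame(self):
        if not self.measure_next:
            self.render_frame()
            return
        start = time.perf_counter()
        # Without glFinish() inside render_frame, deferred GPU work could fall
        # outside this interval and hide the stall.
        self.render_frame()
        self.last_frame_ms = (time.perf_counter() - start) * 1000.0
        self.measure_next = False
```

Timing only the first frame after the change is the point: a recompilation happens once per new value class, so later frames look normal again.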
Tested on my 6800GO and I get the big delay only for the value 0.0 (key 1)!
Btw, why does the text in the menu bar change after pressing it a second time?
Nothing strange happens on the 8800; I don't see any lag on it or on NV30 either.
It doesn't happen on GeForceFX (the driver is not as optimized as for NV40) or on the 8 series (they have “real” FP uniforms instead of “fake” ones, so the optimization is not necessary).
nVidia says they have a routine in their driver that optimizes the FP in place when a uniform is changed. They say this optimization should be very fast, but as you can see, that is sometimes not true. They don't want users to have control over this process, but I think it would sometimes be very convenient to have a switch to turn it off.
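To see why the lag appears only the first time a “dangerous” value is used, here is a purely hypothetical model of the behaviour described above (the real driver internals are not public): the driver buckets each uniform value, and every new combination of buckets costs one shader recompilation, after which the specialized variant is cached. Which values actually get their own bucket differs between GPUs and drivers (the 6800GO report above only lagged for 0.0).

```python
from typing import Dict, Tuple


def classify(v: float) -> str:
    """Bucket a uniform value the way a specializing driver might (assumption)."""
    if v == 0.0:               # also true for -0.0
        return "zero"
    if abs(v) in (0.5, 1.0):
        return repr(v)         # "0.5", "-1.0", ...
    return "generic"


class SpecializingShaderCache:
    """Toy model: one compiled shader variant per combination of uniform buckets."""

    def __init__(self) -> None:
        self.variants: Dict[Tuple[str, ...], str] = {}
        self.compilations = 0

    def draw(self, uniforms: Tuple[float, ...]) -> str:
        key = tuple(classify(u) for u in uniforms)
        if key not in self.variants:
            # The expensive step; in a real driver this is the stall the app sees.
            self.compilations += 1
            self.variants[key] = f"variant#{self.compilations}"
        return self.variants[key]
```

In this model, switching between 0.3 and 0.7 never recompiles, while the first draw with 0.0 does, and subsequent draws with 0.0 reuse the cached variant.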
they have “real” FP uniforms instead of “fake” ones, so the optimization is not necessary
The pre-G80 cards have real FP uniforms too. It’s just that there are instructions that can be used to eliminate the uniform if it is a specific constant.
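As an illustration of what “eliminating the uniform” can mean, here is a toy peephole rewrite (not nVidia's actual compiler) for a multiply-add whose multiplier is a uniform, using ARB_fragment_program-style opcode names. It ignores NaN/Inf inputs, which a real compiler would have to consider, and it does not cover 0.5.

```python
from typing import Tuple

# (opcode, dst, src...) in a toy ARB_fragment_program-like form.
Instr = Tuple[str, ...]


def specialize_mad(dst: str, a: str, u_name: str, u_value: float, b: str) -> Instr:
    """Rewrite MAD dst, a, u, b (dst = a*u + b) when uniform u has a special value."""
    if u_value == 0.0:
        return ("MOV", dst, b)          # a*0 + b == b: the uniform disappears
    if u_value == 1.0:
        return ("ADD", dst, a, b)       # a*1 + b == a + b
    if u_value == -1.0:
        return ("SUB", dst, b, a)       # a*(-1) + b == b - a
    return ("MAD", dst, a, u_name, b)   # generic case: keep reading the uniform
```

Each special case saves an operand fetch or an instruction, but it is only valid for that exact value, which is why a change to or from such a value forces a different shader.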
Longs Peak is scheduled to have the ability to define certain uniforms as constants, so that such optimizations will be performed only if the user specifies that the uniform is const.