nVidia FP uniforms driver optimization lags

Hi all!

Some time ago we wrote about a very strange problem: the driver stalls for a dramatic amount of time when FP uniforms are changed (or set for the very first time) and some geometry is drawn. We got no replies at all, apart from other people complaining about much the same problem and the official advice to pre-render everything (which is not really feasible in most cases).

But the problem has been found and localized: nVidia drivers don't like exact values such as ±0.0f, ±0.5f and ±1.0f in FP uniform constants!!! Changing even a single bit of the mantissa of these "magic" values fixes almost all of our problems. Apparently, given the "constant" nature of FP uniforms, the driver thinks it can improve the shader to make it much faster and more powerful (sic), and it creates a unique version of the shader for this subset of FP uniform values!
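For anyone who wants to try the workaround described above, here is a minimal sketch of a CPU-side filter you could run on a value before uploading it. The function names are hypothetical (nothing like this ships with the driver or the test app), and moving a value one ULP towards zero, plus mapping ±0.0f to the smallest normal float, is our own assumption about what "changing a bit of the mantissa" should mean in practice.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: true for the values reported in this thread as
   triggering the lag: +/-0.0f, +/-0.5f and +/-1.0f.
   Note that -0.0f == 0.0f, so both zeros are caught by the first test. */
static int is_magic_uniform_value(float v)
{
    return v == 0.0f || v == 0.5f || v == -0.5f || v == 1.0f || v == -1.0f;
}

/* Hypothetical helper: returns v unchanged unless it is a magic value;
   otherwise returns a nearby float that the driver should not recognize. */
static float nudge_magic_uniform(float v)
{
    uint32_t bits;

    if (!is_magic_uniform_value(v))
        return v;

    memcpy(&bits, &v, sizeof bits);
    if (v == 0.0f) {
        /* Keep the sign and use the smallest *normal* float (FLT_MIN,
           bit pattern 0x00800000) instead of a denormal, since denormals
           may be flushed back to zero. */
        bits = (bits & 0x80000000u) | 0x00800000u;
    } else {
        /* For +/-0.5f and +/-1.0f, decrementing the bit pattern gives the
           adjacent float one ULP closer to zero (1.0f -> 0.99999994f). */
        bits -= 1u;
    }
    memcpy(&v, &bits, sizeof v);

    assert(!is_magic_uniform_value(v));
    return v;
}
```

You would then upload `nudge_magic_uniform(value)` instead of `value`, e.g. through `glUniform1f` for GLSL or `glProgramLocalParameter4fARB` for ARB fragment programs. Moving towards zero keeps results inside [-1, 1] for values that get clamped later; the error is far below anything visible.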

So please bear in mind that some optimizations may take place right in the middle of your application's execution.

I hope this helps someone who, like us, has wasted more than one week trying to find out why lags sometimes occur.

Thank you, this is useful information.

I would advise you to submit a test case to Nvidia; they should fix it.

By the way, depending on the shader's complexity, this lag varies from 50 ms to 200 ms, which is unacceptable by any measure.

Thanks from me as well; a 50 ms pause while my app is running would screw things up spectacularly!

And please post a link to that test program on the forum too. I think many of us would like to try it to confirm the issue (lots of quick, free test data for you).

By the way,

The same goes for GLSTATE uniform semantics. If you want to use state uniforms directly in your fragment shader, bear all these magic values in mind, even for glstate.light[0].position.

Tested on GeForceFX, GeForce6 and GeForce7 with ForceWare 93.71 (the latest official drivers).

Okay, back to this topic…

I didn't have enough time to make a test app before, but now I'm ready to post it.

I hoped this bug would be fixed, but it isn't yet, so we've made a small test application that creates several VBOs using the same shader, copied a number of times to make the effect visible. It renders 200 quads with 200 copies of the same shader, each quad with its own VBO.
Keys 0, 1, 2, 3, 4 and 5 change one uniform, which enters the lighting calculation as a simple additive term ('H' displays a help dialog):
0 - uniform is 0.300 (default)
1 - uniform is 0.000 exactly
2 - uniform is 0.001 exactly
3 - uniform is 0.500 exactly
4 - uniform is 0.999 exactly
5 - uniform is 1.000 exactly
After you press a key, the program measures the duration of the next frame.
As you can see, when we set this uniform to one of the "dangerous" values (0, 0.5, 1) for the first time, we get a big lag.
Nothing special: the shader is very simple (if it were more complicated, the delay would be much worse, but this is enough to show that the lag really happens).
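If you want to reproduce the measurement in your own code rather than run the posted binary, the sketch below shows roughly what such a harness needs. It is not taken from the test program's sources; `uniform_for_key`, `now_seconds` and `time_next_frame` are hypothetical names, and the draw callback is assumed to set the uniform, draw everything and then call `glFinish` so that the driver's work lands inside the timed interval.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Hypothetical mapping of keys '0'..'5' to the uniform values listed in
   the post. Returns 0 for any other key, 1 on success. */
static int uniform_for_key(char key, float *out)
{
    static const float values[] = { 0.300f, 0.000f, 0.001f,
                                    0.500f, 0.999f, 1.000f };
    assert(out != NULL);
    if (key < '0' || key > '5')
        return 0;
    *out = values[key - '0'];
    return 1;
}

/* Wall-clock time in seconds from a monotonic clock (POSIX). */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Times one frame. draw_frame is a caller-supplied callback (hypothetical)
   that should set the uniform, issue all 200 draw calls and then block
   until the GPU has finished (e.g. glFinish), so any driver-side
   recompilation is included in the measured time. */
static double time_next_frame(void (*draw_frame)(void *), void *ctx)
{
    double start = now_seconds();
    draw_frame(ctx);
    return now_seconds() - start;
}
```

Without the `glFinish` (or an equivalent sync) the call may return before the driver or GPU has done the work, and the lag would show up in some later frame instead.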

Link to the test program with sources: http://slil.ru/24377623

By the way, the NV30 and G80 generations are free from this issue, so it affects all GeForce6 and GeForce7 chips.

I can confirm your results, I see it on my 7900gs too. OS is Vista with latest beta drivers.

Tested on my 6800GO and I get the big delay only for the value 0.0 (key 1)!
Btw, why does the text in the menu bar change after pressing a key a second time?

The text in the title bar changes because it shows the previous uniform value, the current one, and the duration of the next frame after the uniform has changed.

What about one of the built-in uniforms, e.g. the light diffuse color, or a vertex attribute, e.g. glColor?

Vertex attributes, of course, don't show this behavior.

Built-in uniforms behave like ordinary uniforms (as I said above, the same goes for GLSTATE uniform semantics).

Confirmed here on NVidia GeForce 6800 Ultra AGP8X (1.0-9773 drivers) with Athlon 64 3500+ on WinXP:

val = 0.300 -> delay = 0.014
val = 0.000 -> delay = 0.572
val = 0.001 -> delay = 0.014
val = 0.500 -> delay = 0.649
val = 0.999 -> delay = 0.014
val = 1.000 -> delay = 0.650

I look forward to the NVidia explanation on that one.

I’d try this on the higher end cards at work (various GeForce 7s & 8s) but we only run Linux on those.

(Not) confirmed on Geforce 8800GTX 768mb ForceWare 158.22 on WindowsXP 32bit.

All measurements are 0.014 with vsync enabled and 0.002 without vsync. But I do see a visible delay when switching to 0.0 and 0.5…

Same results on a Geforce 8800GTS 640mb ForceWare 160.03 on WindowsXP 32bit.

def

There is nothing strange about the 8800; I don't see any lags on it or on NV30 either.

It doesn't happen on GeForceFX (the driver is not as optimized as it is for NV40) or on the 8 series (they have “real” FP uniforms instead of “fake” ones, so the optimization is not necessary).

nVidia says they have a routine in their driver that optimizes the FP in place when a uniform is changed. They say this optimization should be very fast, but as you can see, that is sometimes not true. They don't want users to have control over this process, but I think it would sometimes be very convenient to have a switch to turn it off.
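To make the behavior described in this thread easier to reason about, here is a purely illustrative model; it is NOT NVIDIA's driver code, and everything in it (the value classes, the variant cache, the `allow_specialization` flag standing in for the wished-for switch) is a hypothetical assumption. It only captures the observation that the lag appears the first time a magic value is set and goes away once the specialized variant exists.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical classes of uniform values a specializing driver might
   treat differently; CLASS_GENERIC is the unspecialized program. */
enum value_class { CLASS_GENERIC, CLASS_ZERO, CLASS_HALF, CLASS_ONE, CLASS_COUNT };

struct program_model {
    bool compiled[CLASS_COUNT];  /* which variants have already been built */
    bool allow_specialization;   /* hypothetical "turn it off" switch */
};

/* Signs are folded together here for simplicity. */
static enum value_class classify(float v)
{
    if (v == 0.0f)                return CLASS_ZERO;
    if (v == 0.5f || v == -0.5f)  return CLASS_HALF;
    if (v == 1.0f || v == -1.0f)  return CLASS_ONE;
    return CLASS_GENERIC;
}

/* Models setting a uniform: returns true if a new variant has to be
   built (the expensive step that shows up as a frame-time spike). */
static bool set_uniform(struct program_model *p, float v)
{
    enum value_class c = p->allow_specialization ? classify(v) : CLASS_GENERIC;
    assert(c < CLASS_COUNT);
    if (p->compiled[c])
        return false;            /* variant cached: no lag */
    p->compiled[c] = true;       /* first use of this class: lag */
    return true;
}
```

With the switch off, every value maps to the generic program, which is exactly the trade-off being asked for: no spikes, at the cost of never getting the specialized (possibly faster) code.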

GeForce 8800GTX - Linux-x86_64 driver 100.14.06 - Dell 690 (2 Dual core 3.2 Xeons)

Key 0: val = 0.300 -> delay = 0.001
Key 1: val = 0.000 -> delay = 0.518
Key 2: val = 0.001 -> delay = 0.001
Key 3: val = 0.500 -> delay = 0.537
Key 4: val = 0.999 -> delay = 0.159
Key 5: val = 1.000 -> delay = 0.539

Cycling through the keys again, the delays all ranged from 0.200 to 0.400 sec.

they have “real” FP uniforms instead of “fake” ones, so the optimization is not necessary
The pre-G80 cards have real FP uniforms too. It’s just that there are instructions that can be used to eliminate the uniform if it is a specific constant.

Longs Peak is scheduled to have the ability to define certain uniforms as constants, so that such optimizations will be performed only if the user specifies that the uniform is const.

Originally posted by Korval:
The pre-G80 cards have real FP uniforms too.
Are you really sure about that? I have different information on this.

We should lobby to get an application that changes uniforms into the next Spec suite to penalize such driver behaviour.

Philipp