glTranslate - wrong behaviour?

f-error · September 22, 2008, 2:00pm

I stumbled upon a problem with how calls to
glTranslate(…) are handled on NVIDIA graphics cards.
It seems to be a bug/feature specific to NVIDIA adapters,
since I could not reproduce the error on ATI or Intel hardware in any way.
I am not sure whether this is the right place for such a matter, so it would be
nice if somebody could confirm that or point me somewhere else.
(It takes some effort to explain the problem thoroughly, so I would rather not
spread the issue out here if it’s clearly the wrong place.
I tried the NVIDIA forums, but those don’t seem to be the best place either.)

thanks for your concern

ZbuffeR · September 22, 2008, 2:09pm

This is the first time I have heard of an implementation problem with glTranslate.
All previous problems were due to error(s) between the keyboard and the chair.

Zengar · September 22, 2008, 2:11pm

I very much doubt that glTranslate would cause any problems; that part of the driver probably hasn’t changed in the last 5 years. You can submit bugs to NVIDIA through the NVIDIA developer website. Anyway, if you describe your problem here, we will probably be able to find a solution.

f-error · September 22, 2008, 2:55pm

Well, it is not exactly a problem with glTranslate. It seems to be a problem with how the card reacts to it.
So let’s give it a try.

I am not a native speaker, so if something sounds fishy, please consider that it might be a communication problem.

I ran into this “effect” when I was drawing a “dataTable” as a two-dimensional arrangement of fields in the x-z plane, using JOGL.
To sum it up first:
calling glTranslate(1.0, 0.0, 0.0) 10000 times, followed
by glTranslate(-10000.0, 0.0, 0.0), does not put the “focus” back at the center (at least not on any of my NVIDIA cards).

My first thoughts were about “unclean matrices” or floating-point errors: something like doing a rotation and then not being able to do 100% accurate translations afterwards because the values are no longer exactly representable.
Therefore I ran a few tests using clean matrices and whole-number values that IEEE 754 can represent exactly.
The error remained.
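The point about values that IEEE 754 can or cannot represent exactly can be checked on the CPU with a minimal sketch. It assumes strict single-precision arithmetic (the volatile store is there to force it), and roundTripDrift is a hypothetical helper name, not something from the test program. It models only the sum of the steps, not what any driver does.

```cpp
#include <cassert>

// Hypothetical helper: adds `step` to an accumulator `count` times, then
// subtracts the single "back" translation step * count.
// Everything is kept in float to mimic single-precision matrix entries.
float roundTripDrift(float step, int count) {
    assert(count >= 0);
    volatile float position = 0.0f;  // volatile forces a float store each step
    for (int i = 0; i < count; ++i) {
        position = position + step;
    }
    const float sum = position;
    const float back = step * static_cast<float>(count);
    return sum - back;  // 0 means the round trip was exact
}
```

With whole-number steps such as roundTripDrift(1.0f, 10000) or roundTripDrift(100000.0f, 10) the result is exactly zero, while a step like 0.1f, which has no exact binary representation, leaves a nonzero remainder.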
At first I was developing on a GeForce 7600 GS with a very old driver from 2006. Unfortunately, updating the driver did not change the behaviour.
After that I tried switching hardware, which partially “solved” the problem.
I tried a few ATI cards (about 4 or 5; I don’t remember the exact hardware but can get the info if necessary) and some Intel chips (i855, i945). Additionally, I tried some more NVIDIA cards (6200, GeForce Go, …), and the effect happened on all the NVIDIA cards.

So what did I do?
To get things clean I used “lesson 4” from NeHe and put some very basic code in it.

1) glTranslate(0.0, 0.0, -4.0) to get a better view at the center.
2) draw a simple object (glutsphere …) at the center
3) repeat a translation along the e.g. x-axis with a stepping size of 1.0 for about 10000 or more repetitions
4) translate “back” in one step
5) draw the same object again
To be sure I implemented this in Java using JOGL and in C++
with the same results.
On all NVIDIA cards I have tested so far, the second object is not placed at the center. It drifts along the axis used for translation, and the drift increases with the distance of the back-and-forth translation.
Additionally, this effect is directly influenced by the viewport.
Resizing or stretching the window (which results in “reshape” followed by a reset of the viewport) makes the second object “jump” along the axis.
On all cards from ATI and Intel this does not happen.
Both objects are placed exactly at the center, no matter how far the back-and-forth translation goes.
I should mention that this effect only occurs if the two translations are done with different numbers of “steps”.
glTranslate(10000, …) followed by glTranslate(-10000, …)
results in correct placement.
(The same is true if the back translation is done with as many steps as the forth translation.)
I made a little test program which shows the drifting object and the modelview matrix, which seems to be correct.
One can clearly see the effects of the translation, and the matrix is exactly the same right before drawing either of the two objects.
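What the modelview matrix should contain after these steps can be modelled on the CPU. The following is a sketch under the assumption that glTranslate post-multiplies the current matrix by a translation matrix, as the OpenGL specification describes. Mat4, translate and stepOutAndBack are illustrative names, and nothing here reflects how any driver actually implements the call.

```cpp
#include <array>
#include <cassert>

// Column-major 4x4 matrix, the layout used by glLoadMatrixf and glGetFloatv.
using Mat4 = std::array<float, 16>;

Mat4 identity() {
    Mat4 m{};  // value-initialised to all zeros
    m[0] = m[5] = m[10] = m[15] = 1.0f;
    return m;
}

// M = M * T(x, y, z): only the fourth column of M changes.
void translate(Mat4& m, float x, float y, float z) {
    for (int row = 0; row < 4; ++row) {
        m[12 + row] = m[row] * x + m[4 + row] * y + m[8 + row] * z + m[12 + row];
    }
}

// Mirrors the test: move the view back, step out along x, return in one call.
Mat4 stepOutAndBack(int steps, float stepSize) {
    assert(steps >= 0);
    Mat4 m = identity();
    translate(m, 0.0f, 0.0f, -4.0f);
    for (int i = 0; i < steps; ++i) {
        translate(m, stepSize, 0.0f, 0.0f);
    }
    translate(m, -stepSize * static_cast<float>(steps), 0.0f, 0.0f);
    return m;
}
```

In this model, stepOutAndBack(10000, 1.0f) returns the same matrix as a single translation by (0, 0, -4), which is consistent with the matrix readback described above.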

It doesn’t look like I can upload any code here, so maybe I can get hold of some webspace so that you can re-check.
That is, if there isn’t a simple one-line explanation for this matter and how it’s going to be solved.

system · September 23, 2008, 8:26am

calling glTranslate(1.0, 0.0, 0.0) 10000 times, followed
by glTranslate(-10000.0, 0.0, 0.0) does not result in placing “focus” at the center. (at least not on any of my NVIDIA cards)

I would say that’s normal.
When you call glRotate, glTranslate and these other matrix operations, the driver uses the FPU and SSE to do the matrix computation. The SSE unit probably introduces more precision issues than the FPU unit.

If you want to avoid that, use glPushMatrix() and glPopMatrix(), or do your own matrix math and upload the result with glLoadMatrixf().
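A rough sketch of the “own matrix math” route, assuming a hypothetical application-side MatrixStack class that imitates the push/pop semantics of the GL modelview stack. pop() restores a saved copy rather than computing an inverse, so no rounding can creep in on the way back; the top matrix could then be handed to glLoadMatrixf.

```cpp
#include <array>
#include <cassert>
#include <vector>

using Mat4 = std::array<float, 16>;  // column-major, as glLoadMatrixf expects

// Hypothetical stand-in for the GL modelview stack, kept by the application.
class MatrixStack {
public:
    MatrixStack() {
        Mat4 identity{};
        identity[0] = identity[5] = identity[10] = identity[15] = 1.0f;
        stack_.push_back(identity);
    }

    // Like glPushMatrix: duplicate the current top.
    void push() {
        const Mat4 copy = stack_.back();
        stack_.push_back(copy);
    }

    // Like glPopMatrix: discard the top, the saved copy becomes current again.
    void pop() {
        assert(stack_.size() > 1);
        stack_.pop_back();
    }

    // top = top * T(x, y, z), the same convention as glTranslatef.
    void translate(float x, float y, float z) {
        Mat4& m = stack_.back();
        for (int row = 0; row < 4; ++row) {
            m[12 + row] += m[row] * x + m[4 + row] * y + m[8 + row] * z;
        }
    }

    // top().data() is a pointer to 16 floats suitable for glLoadMatrixf.
    const Mat4& top() const { return stack_.back(); }

private:
    std::vector<Mat4> stack_;
};
```

Because the saved matrix is copied bit for bit, the state after pop() equals the state before push() even when the translations in between use steps such as 0.1f that are not exactly representable.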

ZbuffeR · September 23, 2008, 9:24am

I think I recall there is an option in the NVIDIA Control Panel to enable/disable the use of CPU-optimized instructions. You might want to check.

f-error · September 23, 2008, 10:59am

Well, it’s not about “avoiding” the problem. (I have already found a few ways, by changing the way I translate [skipping immediate mode, e.g.].) Basically, this behaviour breaks the rules of group theory, since the inverse element is broken.
So if the translations used in OpenGL are provided by some external mathematical system, I cannot rely on “follow the laws of matrices and it works”.

So I am very interested in how to “disable” this effect.
But it seems I need to catch up a bit on that topic.
Could you explain a bit more what the NVIDIA cards do and <the others> don’t?

Zengar · September 23, 2008, 11:26am

Numerical errors will always be present if you use “too many” matrices. Computers are not ideal computing devices; this is what you learn in numerics. I have no idea why it only affects NVIDIA cards; this will be driver-dependent. To avoid this problem, you can implement your own matrix tracking that deals with precision errors.

f-error · September 23, 2008, 12:30pm

I have already addressed those errors. I also cannot see any reason why only NVIDIA cards would be affected by numerical errors (i.e. not being able to store non-representable values in floating-point datatypes).
Besides, adding up 1.0 in a loop should not produce numerical errors, since it is a whole number that IEEE 754 can represent exactly.
And as I mentioned before, I am not using “too many matrices”.
There is no rotation or any transformation other than the ones I mentioned in my description.

I check the matrix 3 times:

1) right before drawing the first sphere
2) right after the translation loop
3) right before drawing the second sphere

In states 1) and 3) the modelview matrix is identical; in state 2) there is one different value, which is the loop translation of e.g. 10000 units (exactly as it should be).
Those matrices look exactly the same on any graphics card and do not show any differences, but the rendered image does.
Additionally, changing the viewport changes the drift of the second object (on NVIDIA cards) but does not affect the matrices.
So it seems to be some error outside the view of OpenGL,
because the matrix says that the sphere was drawn at the center,
but the image proves it wrong.
This should imply that the distortion happens on the graphics card itself, as if NVIDIA cards were unable to add up those numbers, whereas every other vendor’s card can do it.

Maybe you could try this yourself, so we can be sure we know what we are talking about.
I uploaded my C++ test program (yay, finally the free webspace from my ISP is useful):
http://www.muckelzwerg.de/test/
(Use ‘+’ and ‘-’ to increase and decrease the translation distance. It starts at 0.)

CatDog · September 23, 2008, 5:56pm

Tested on GF7950GX2: the sphere makes some strange movements and leaves the view at around 23000.

CatDog

f-error · September 24, 2008, 4:05am

Thanks. At least I am not imagining things.

To address the issue of “too many matrices”:
What is too many? Every glTranslate results in one matrix being pushed on the stack, or am I wrong?
Hence I tried to create the drift with a smaller number of calls to glTranslate.
If I increase the step size to 100000 (one hundred thousand), I can reduce the loop to 10 repetitions and the drift is still clearly visible.
I don’t think that calling glTranslate 11 times is “too many”.
So it appears there is some strange behaviour regarding translations on NVIDIA cards, and it becomes visible if you stack many translations or use very long distances.
But it should be no problem to calculate and use units up to “a million”, or am I missing something in the specs?
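On the question of how large the units may get: in IEEE 754 single precision every whole number up to 2^24 = 16777216 is exactly representable, so sums that stay below that limit, such as ten steps of 100000, incur no rounding at all. A small sketch to check this, assuming float is IEEE 754 single precision; addsExactly is a hypothetical helper:

```cpp
#include <cassert>
#include <limits>

// Every whole number up to this magnitude is exactly representable in float:
// 2^24 = 16777216, because the float significand holds 24 bits.
constexpr float kFloatExactIntegerLimit =
    static_cast<float>(1 << std::numeric_limits<float>::digits);

// True if value + step computed in float equals the mathematically exact sum.
// double serves as the reference; it is exact for the magnitudes used here.
bool addsExactly(float value, float step) {
    assert(value >= 0.0f && step >= 0.0f);
    volatile float sum = value + step;  // force rounding to single precision
    const double reference = static_cast<double>(value) + static_cast<double>(step);
    return static_cast<double>(sum) == reference;
}
```

addsExactly(900000.0f, 100000.0f) holds, so the last of ten steps of 100000 is still exact. Only at the limit itself does it break down: addsExactly(16777216.0f, 1.0f) is false, because 16777217 has no float representation.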

f-error · September 24, 2008, 7:38am

Today I contacted a friend who works as a game developer.
He asked the company’s CG specialist, and this guy also voted for the FPU/SSE optimization.
I’d like to look further into this topic, so the next questions are:

1) How is the error generated?
It is not visible to OpenGL (the matrix is clean), so it only applies
when the card calculates the positions of the vertices.
Should the FPU really have trouble adding the value 100000 ten times and subtracting a million from the result?
2) How can this behaviour be proved?
I did not find an option in the NVIDIA Control Panel, or anywhere else, to change the CPU optimization.
If ATI and Intel do not rely on single-precision calculations,
there should be some way to get definite proof that this is the cause.
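One way to probe the first question on the CPU is to model a mechanism that would produce such a drift. The sketch below is purely a hypothesis, not confirmed driver behaviour: it assumes an implementation that keeps a combined projection-times-modelview value in single precision and updates it incrementally on every translation. The x scale of a perspective projection depends on the window’s aspect ratio and is often not exactly representable, so the accumulated steps and the single back step can round differently. combinedDrift and all of its parameters are made-up names.

```cpp
#include <cassert>

// HYPOTHETICAL model, not taken from any driver: the x translation of a
// combined projection * modelview matrix, updated once per translate call.
// focal * height / width stands in for the projection's x scale factor.
float combinedDrift(float focal, int width, int height, int steps, float stepSize) {
    assert(width > 0 && height > 0 && steps >= 0);
    const float xScale = focal * static_cast<float>(height) / static_cast<float>(width);
    volatile float accumulated = 0.0f;  // volatile forces single-precision stores
    for (int i = 0; i < steps; ++i) {
        volatile float delta = xScale * stepSize;
        accumulated = accumulated + delta;
    }
    const float forth = accumulated;
    const float back = xScale * (stepSize * static_cast<float>(steps));
    return forth - back;  // nonzero means the object would not return to center
}
```

Under this assumption an 800x600 window with focal = 1 gives an x scale of 0.75 and no drift, while 900x300 gives 1/3 and a nonzero drift, which would at least be consistent with the drift changing when the window is resized. It proves nothing about the actual driver.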

system · September 24, 2008, 9:05am

I got the same result with my 8600M: the sphere moves in a strange way and then goes offscreen.
I tried with my own library, which just uses C++, and got the correct result: the sphere doesn’t move.
I tried with the same lib but with SSE, and the sphere doesn’t move.
PS: I just call glLoadMatrixf before rendering the sphere.

I’m sure NVIDIA can answer your question; otherwise, just stop using these GL matrix functions.

f-error · September 24, 2008, 10:25am

Yes, using loadMatrix will “avoid” the error, since it prevents the multiplication of matrices.
Of course, using multMatrix you get the same error as with glTranslate, because you just spare GL the creation of the matrix.
So basically this means that you can get distorted rendering if you have some “incompatible” matrices stacked for multiplication.
If you never multiply and just load some “absolute” transformation instead, you will not see this effect.
The only way to avoid this error and still use multiplication of matrices seems to be calling the inverse after each transformation, to get back to the identity matrix (the neutral element of the group).

When the problem first occurred, I was drawing a lot of objects one after another.
To save transformations, I calculated the “difference transformation” between every object and its successor.
With a lot of objects, this results in a lot of matrix multiplications:
“draw obj1 - transform - draw obj2 - transform - draw obj3 …”
Certainly, it is possible to replace the difference transformations with absolute transformations, so you won’t have
glTranslate or multMatrix between the draw calls but loadMatrix instead.
At this point I cannot provide a simple example that cannot be “repaired” in this way.
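The difference between chaining “difference transformations” and loading absolute ones can be sketched with plain floats. This assumes objects laid out along one axis at a fixed spacing; chainedPositions, absolutePositions and maxDisagreement are hypothetical helpers, not code from the test program.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// x positions reached by adding the same "difference" step object after object.
std::vector<float> chainedPositions(float spacing, int count) {
    assert(count >= 0);
    std::vector<float> xs;
    for (int i = 0; i < count; ++i) {
        xs.push_back(xs.empty() ? 0.0f : xs.back() + spacing);
    }
    return xs;
}

// x positions computed independently per object, as one would do before
// loading an absolute matrix for each object.
std::vector<float> absolutePositions(float spacing, int count) {
    assert(count >= 0);
    std::vector<float> xs;
    for (int i = 0; i < count; ++i) {
        xs.push_back(spacing * static_cast<float>(i));
    }
    return xs;
}

// Largest absolute disagreement between the two placements.
float maxDisagreement(float spacing, int count) {
    const std::vector<float> chained = chainedPositions(spacing, count);
    const std::vector<float> absolute = absolutePositions(spacing, count);
    float worst = 0.0f;
    for (std::size_t i = 0; i < chained.size(); ++i) {
        worst = std::max(worst, std::fabs(chained[i] - absolute[i]));
    }
    return worst;
}
```

With a spacing of 1.0f both placements agree exactly over 10000 objects. With a spacing such as 0.1f, which is not exactly representable, the chained positions wander away from the absolute ones, because each addition rounds while the absolute version rounds only once per object.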
The only (and much more abstract) unrepairable problem I have
is the use of an external library that provides such lists of transformations, since they are mathematically 100% correct.
It’s not that easy to convince math professionals that 10 times 1 is different from 1 times 10.

Do you happen to know how I could contact NVIDIA about this matter?
I could not find any suitable contacts on their site.
Maybe I should try the developers board?

system · September 24, 2008, 6:20pm

developer@nvidia.com
You can send them a description and source code.

If you never multiply and just load some “absolute” transformation instead, you will not see this effect.
The only way to avoid this error, and still use multiplication of matrices seems to be calling the inverse after each transformation, to get back to the identity matrix. (neutral element of the group)

In classic OpenGL, you are supposed to use glPushMatrix and glPopMatrix, and that should guarantee that you get back your original matrix.
In modern GL (have a look at the GL 3.0 spec), all matrix functions are declared deprecated.

Anyway, contact them and maybe they can fix it.