Too far from origin (again)

Dark_Photon · March 4, 2010, 5:01pm

Define “zoom”. Do you mean, move the eyepoint up near that CAD building so it fills a large portion of the field-of-view?

devdept · March 4, 2010, 11:42pm

Yes, exactly.

Alberto

Dark_Photon · March 5, 2010, 8:36am

Ok, let’s just think for a second about what this means. Suppose your units are meters just for ease of discussion.

Suppose you model the building around its own local origin (center of building is 0,0,0). The tallest building on earth is 828 meters. That means that when you represent the building’s vertex positions in float32 (where you get ~7 decimal digits of precision), your coordinates are going to be accurate to 0.001m-0.0001m (i.e. ~1mm or maybe slightly better), assuming you compute them exactly and then just store them in float32.

So representing the building to the required accuracy with float32 is no problem! Take the vertex 1,1,1 in building space. We can represent that pretty much exactly, right?

What about transforms? The MODELING transform (which positions this building, modeled about its own local origin, into the world) is going to have a translate component of (800000, 1999900, 0) (these are your numbers):


MODELING = ( . . .  800000 )
           ( . . . 1999900 )
           ( . . .       0 )
           ( 0 0 0       1 )

Similarly, since you’ve just said that the eye is close to the building, the VIEWING transform is going to have a translate component of very near (-800000, -1999900, 0).


VIEWING = ( . . .  -800000 )
          ( . . . -1999900 )
          ( . . .        0 )
          ( 0 0 0        1 )

So what’s the problem? See, those big numbers just ate up all or nearly all of the ~7 decimal digits of precision we have with float32 just to represent the sheer magnitude of the numbers, leaving little or nothing for sub-meter accuracy. As you’re computing MODELVIEW, right after you stacked the VIEWING transform on with that huge translate (e.g. with gluLookAt), you’ve just trashed any chance your MODELVIEW has of preserving sub-meter accuracy.

In other words, your “world coordinates” are huge (compared to float32 precision), and that’s what causes the problem.

Take your 1,1,1 building object-space point. After you transform by the MODELING transform, you get (800001, 1999901, 1). You’ve got 6-7 digits to the left of the decimal point, so you’ve only got 0-1 left to the right of it. So when you represent this in float32, what you actually get is accurate to maybe the nearest tenth of a meter if you’re lucky. You’ve just lost nearly all of your sub-meter precision.
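To put numbers on this, here is a small sketch that emulates float32 storage in Python. The helper names `to_float32` and `float32_spacing` are made up for this illustration, and the code assumes IEEE 754 binary32 with round-to-nearest. It reports the gap between adjacent float32 values at building scale and at world scale, and shows a 1 mm offset surviving near the local origin but vanishing once the coordinate is expressed in world space.

```python
import math
import struct


def to_float32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def float32_spacing(x: float) -> float:
    """Gap between adjacent float32 values near x (nonzero, normal range)."""
    # abs(x) = m * 2**exponent with 0.5 <= m < 1; binary32 keeps 24 significant bits.
    _, exponent = math.frexp(abs(x))
    return 2.0 ** (exponent - 24)


# Building-local coordinates: spacing is far below a millimeter.
print(float32_spacing(828.0))        # 2**-14 m, about 0.06 mm

# World coordinates from the thread: spacing is centimeters to decimeters.
print(float32_spacing(800001.0))     # 0.0625 m
print(float32_spacing(1999901.0))    # 0.125 m

# A 1 mm offset survives locally but is rounded away in world space.
print(to_float32(1.001))             # very close to 1.001
print(to_float32(800001.001))        # 800001.0
```

The spacing is only the best case for a single stored value; rounding inside the matrix arithmetic can make the visible error larger.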

In diagram form:


eye-space = MODELVIEW * object-space
eye-space = (VIEWING * MODELING) * object-space

  ^ small            ^ HUGE!!!         ^ small

So looking at this diagram, you can see that the two HUGEs tend to cancel each other out (for objects close to the eye anyway, which is all you care about). If only there was a way to compute the aggregate MODELVIEW transform more accurately so that you didn’t lose all that precision in computing it…

Well, there is. Use doubles, if that provides you sufficient precision (they have ~15 decimal digits of precision instead of ~7 decimal digits for float32), which should be enough for your example. If doubles don’t offer enough, use 64-bit integers or something: whatever it takes to compute the aggregate transform accurately.
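A minimal sketch of that idea, under stated assumptions: the matrices are pure translations (no rotation), the eye position (800000.3, 1999899.7, 10) is a hypothetical value chosen for illustration, and float32 is emulated on the CPU with `struct`. The helpers `f32`, `translation` and `matmul` are illustrative names, not part of any OpenGL API. The first path multiplies VIEWING * MODELING in double precision and rounds the result to float32 once; the second rounds every stored element and every intermediate value to float32.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def translation(tx, ty, tz):
    """4x4 translation matrix as nested lists (row-major)."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def matmul(a, b, rnd=lambda v: v):
    """4x4 product a * b; rnd is applied to inputs and to every partial result."""
    out = [[0.0] * 4 for _ in range(4)]
    for i in range(4):
        for j in range(4):
            acc = 0.0
            for k in range(4):
                acc = rnd(acc + rnd(rnd(a[i][k]) * rnd(b[k][j])))
            out[i][j] = acc
    return out


MODELING = translation(800000.0, 1999900.0, 0.0)     # numbers from the thread
VIEWING = translation(-800000.3, -1999899.7, -10.0)  # hypothetical eye position

# Path 1: aggregate in double precision, cast to float32 only at the end.
mv_double_then_cast = [[f32(v) for v in row] for row in matmul(VIEWING, MODELING)]

# Path 2: everything stored and computed in float32.
mv_all_float32 = matmul(VIEWING, MODELING, rnd=f32)

print([row[3] for row in mv_double_then_cast][:3])  # close to (-0.3, 0.3, -10)
print([row[3] for row in mv_all_float32][:3])       # (-0.3125, 0.25, -10.0)
```

In this setup the float32-only path is off by 12.5 mm in x and 50 mm in y, because the large viewing translation was already rounded before the two translations cancelled. The double-then-cast path keeps the small residual translation accurate to well under a micrometer.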

devdept · March 5, 2010, 9:23am

Thanks Dark Photon,

I will make a sample to check if I have understood everything well.

Thanks again,

Alberto

dorbie · March 6, 2010, 9:08pm

Aleksandar:

You are correct. If you maintain viewing and model matrices with high precision and have local object space coordinates, then the modelview result automatically produces a transformation matrix with low numbers for objects around the viewer. Object space numbers then transform to eyespace with high precision and never see large numbers.

You are not missing anything. This is the right way to do things, and with a shader-based implementation and software matrix stacks passed in as uniforms there’s nothing left for developers to complain about here.

Do you really not understand, or do you not want to understand?
Everything you have said has been known since the dawn of computer graphics, and nobody denies it. But there are some cases where it is expensive to rebuild lists, buffers or whatever technology is used. Building a whole planet is such a case. I still firmly claim that the proposed method is VERY useful in some particular cases (not for CAD drawings, certainly), and that it cannot be reproduced in fixed functionality.

I’m sorry for the late answer…

I understand perfectly well.

It is well known and it works in any system including CAD and planets.

There is no need to rebuild lists etc. The vertex numeric positions never change. Only the view_matrix * model_matrix result changes. You can use a fixed offset, but that’s a bit outmoded IMO; you can simply maintain continuous double-precision view and model matrices and it’ll work beautifully (casting to single before you send in your uniforms).

You can store coordinates as double precision, but in fact that is overkill and there is no hardware support. All you really need is to maintain double precision matrix offsets.

If you have a complaint you’re certainly not articulating it well.

You mention display lists. I say fix your code and use VBOs; the days of writing War and Peace with branch bloat in your dispatch and using display lists to sort out the mess should be over. At a minimum you can call your display lists with no transforms or only local transforms in there (if that) and it’ll still work.

People have been asking for double precision graphics hardware for a long time. I hope the HW guys are not foolish enough to listen, at least not for a few more generations.

Alfonse_Reinheart · March 7, 2010, 1:36am

If you have a complaint you’re certainly not articulating it well.

He’s talking about cases where the geometry itself has large numbers that must be represented by doubles rather than floats.

Imagine a single mesh that has millimeter precision that must extend out +/- 10,000 kilometers from the origin. The vertices themselves must be represented by doubles.

Of course, the right thing to do in that case is to break up the mesh into pieces.

Aleksandar · March 7, 2010, 4:59am

I have to make some things clearer, obviously. Imagine that you have to model Earth. Its semi-major axis is 6,378,137.0 m. Using floats for calculation or display does not allow any object smaller than a few hundred meters to be displayed at all. The only way to handle that problem is to restrict the minimum height of the viewer to at least 2000m, or to divide the planet into blocks. Each block can have its own coordinate system, with the origin in the center of the block. Thus far everything perfectly fits into our story of using single precision…

The size of the blocks depends on the resolution we want to achieve. For example, if we want decimeter precision, we need to confine one block to a diameter less than, let’s say, 150km. So, in order to implement our Virtual Earth, we have to deal with hundreds or thousands of local coordinate systems. As long as we are inside the boundary of a single block, that is not important. But when we are crossing a border, we have to deal with many blocks. In which coordinate system should we draw all of them? If we use a single coordinate system, we have to rebuild all visible blocks except one (the one whose CS we are using). On the other hand, we can draw each block in its own coordinate system, but this way we can have a large translation (the thing we wanted to avoid) and gaps at the boundaries (because of differences in calculations).
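As a rough aid for picking block sizes, the sketch below computes the float32 spacing at a given distance from a block origin, and the largest coordinate magnitude that still keeps that spacing under a target. Both function names are hypothetical, and the calculation only covers representational spacing in IEEE 754 binary32. It ignores error accumulated in transforms, which is one reason a practical limit such as the 150km mentioned above can be much more conservative than the bound printed here.

```python
import math


def float32_spacing(x: float) -> float:
    """Gap between adjacent float32 values near x (nonzero, normal range)."""
    _, exponent = math.frexp(abs(x))  # abs(x) = m * 2**exponent, 0.5 <= m < 1
    return 2.0 ** (exponent - 24)     # binary32 keeps 24 significant bits


def max_extent_for_spacing(target: float) -> float:
    """Power-of-two bound B such that every |x| < B has spacing <= target."""
    # Values in [2**(e-1), 2**e) are spaced 2**(e-24) apart.
    exponent = math.floor(math.log2(target)) + 24
    return 2.0 ** exponent


# Planet-centered coordinates at the semi-major axis quoted in the thread.
print(float32_spacing(6378137.0))      # 0.5 m

# Edge of a block 150 km across, measured from the block center.
print(float32_spacing(75000.0))        # 2**-7 m, about 8 mm

# Largest coordinate magnitude that still stores values within 0.1 m.
print(max_extent_for_spacing(0.1))     # 2**20 m, about 1049 km
```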

To make things even worse, blocks of 150km in diameter cannot be monolithic. In order to use the spatial coherency of the terrain we are walking through, we have to subdivide them. The only solution is to juggle multiple coordinate systems and gap filling.

So, if we have to deal with huge objects, we must divide them into many subobjects, each with its own CS. It is tricky and error prone (and even slow if we have to refill VBOs). In my tests I have shown that the overhead of sending two floats instead of one for each vertex coordinate of such a huge object is not significant (a few percent), and the implementation is clean and fast. Of course, small objects should not use two floats for each coordinate value. Objects inside the terrain block are represented in the way you have explained.
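The post does not show how the two floats are built or combined, so the following is only a sketch of one common high/low split, not necessarily the scheme used in those tests. Each double is stored as a float32 `hi` part plus a float32 `lo` remainder, and the eye-relative position is formed by subtracting the high parts and the low parts separately, which is the kind of arithmetic a vertex shader could do in single precision. The names `split_double` and `relative_to_eye` and the sample coordinates are assumptions.

```python
import struct


def f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def split_double(x: float):
    """Split a double into a float32 high part and a float32 remainder."""
    hi = f32(x)
    lo = f32(x - hi)  # remainder computed in double on the CPU, then rounded
    return hi, lo


def relative_to_eye(vertex: float, eye: float) -> float:
    """Eye-relative coordinate using float32 operations only after the split."""
    v_hi, v_lo = split_double(vertex)
    e_hi, e_lo = split_double(eye)
    # Large parts cancel exactly or nearly so; small parts carry the detail.
    return f32(f32(v_hi - e_hi) + f32(v_lo - e_lo))


vertex_x = 6378137.123   # hypothetical planet-scale coordinate, meters
eye_x = 6378130.456      # hypothetical eye coordinate, meters

naive = f32(f32(vertex_x) - f32(eye_x))
split = relative_to_eye(vertex_x, eye_x)

print(naive)   # 6.5, off by about 17 cm
print(split)   # close to the true difference of 6.667
```

The cost is one extra float per coordinate, which matches the "two floats instead of one" trade-off described above.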

I stopped using DLs two years ago, when the new spec declared them deprecated. But even before that I used DLs like VBOs, just for storing vertices, not transformations. The only reason for using DLs was their speed (and they are still faster than VBOs).

Double precision support has existed since the GeForce GTX 260, or, to be more precise, on CUDA compute capability 1.3 devices (GTX260, GTX280, GTX285, GTX295, Tesla S1070, Tesla C1060, Quadro Plex 2200 D2, Quadro FX 5800, FX 4800). The problem is that DP operations are expensive on these GPUs. I hope Fermi will change that. (For the broader audience: OpenGL still does not support DP operations. Everything mentioned above concerns CUDA and OpenCL. But it is just a matter of time before it is included.)

devdept · March 8, 2010, 1:13am

Hi Dark Photon,

Can you please check this essential GLUT example and confirm that it uses the approach you recommended?

Thanks,

Alberto

Dark_Photon · March 8, 2010, 7:18am

Exactly! You got it.

(And after hacking away the Windows-isms, I can confirm it works perfectly here on NVidia/Linux.)

Aleksandar · March 11, 2010, 8:20am

Dorbie, as you can see, “a long time” lasted just 5 days, because OpenGL 4.0 supports 64-bit double precision!!!
The revolution has really started!

devdept · March 12, 2010, 1:05am

Does it mean that doing:

glVertex3d(x,y,z);

will pass real doubles?

Thanks,

Alberto

Alfonse_Reinheart · March 12, 2010, 2:01am

will pass real doubles?

It will if you happen to be running a GL 4.0 implementation. And it will do so at half performance. And while the HD 5xxx cards have sold reasonably well, they’re far from the majority at the moment.

Also, ATI isn’t exactly known for quality drivers, and right now, they’re the only GL 4.0 game in town. When NVIDIA finally gets around to releasing Fermi, you could expect some reliability. Though it’ll still cost you half performance.

Or, you know, you could do some simple subtraction on the CPU and get it all to work on any GL implementation.

Aleksandar · March 12, 2010, 4:07am

Cutting performance in half is a very frivolous estimate. How fast it will be, we will see when Fermi finally comes. Current hardware has a serious problem with doubles because the number of DP computation units is very small (apart from the fact that DP operations are generally slower than SP).

Alfonse_Reinheart · March 12, 2010, 11:20am

Cutting performance in half is a very frivolous estimate.

It’s the only estimation we have. NVIDIA says that all double-precision operations happen at half the speed of single-precision. Sure, they may well be lying, but I’d wait for benchmarks to come out before deciding on that.

It should also be pointed out that the best chance for NVIDIA to survive the coming GPU/CPU merger (since they don’t make CPUs, that pushes them out of the game) is to make GPUs that are useful to as many people as possible. And that means pushing features like double-precision and IEEE-754-2008, which are things that scientific analysis and such really, really want.

dorbie · March 12, 2010, 1:33pm

Dorbie, as you can see, “a long time” lasted just 5 days, because OpenGL 4.0 supports 64-bit double precision!!!
The revolution has really started!

Yea, I noticed that too

Groan! It’s not what I’d call a revolution, this will be the red headed stepchild for a long time to come.

If you’ve been tracking the GPGPU stuff you’ll know that some hardware already had a few double precision floating point units, but they’re vastly outnumbered by single precision units. At best the rest would be left to emulation on single precision if it’s even possible. Consider yourself saved by the GPGPU war, but they may only have given you enough rope to hang yourself with.

It’s still advisable to use DP transforms in software and cast to single precision modelview matrices.

dorbie · March 12, 2010, 1:39pm

Oh God, you’re making me ill. Stop it.

I hope nobody sees a shiny new OpenGL 4 DP feature and anticipates dispatching their DP verts one at a time to it.

One thing to note here is that real DP uniforms and matrix transformation are just as important as attributes for this class of problem, and even more so for some applications, because you can always promote attributes for multiple DP xforms.

dorbie · March 12, 2010, 1:41pm

Thanks for the explanation, but this is known, the pertinent part is your last sentence.

Aleksandar · March 13, 2010, 2:48am

I like your temper!

Of course I’m interested in GPGPU. All my previous posts make it obvious. By splitting some uniforms and only position coordinates into two floats per DP value, I have solved several problems with my terrain. If next generation hardware really can perform fast DP operations, I’ll “transfer” more calculation (e.g. Geographic to Cartesian transformation) to the GPU.

According to NVIDIA’s Next Generation CUDA Compute Architecture: Fermi (http://www.nvidia.com/content/PDF/fermi_white_papers/NVIDIAFermiComputeArchitectureWhitepaper.pdf)

The Fermi architecture has been specifically designed to offer unprecedented performance in double precision; up to 16 double precision fused multiply-add operations can be performed per SM, per clock, a dramatic improvement over the GT200 architecture.

The chart on page 9 shows a 4.2x speed gain compared to the GT200 architecture. I cannot claim that it is true, but we will see…

devdept · March 15, 2010, 1:39am

Sorry Dorbie,

I’m not an OpenGL expert like you are. BTW, it was only an example; we are not passing one vertex at a time in our program either.

Considering the performance drop of using GPU DP, we are not interested in this precision any more.

Thanks,

Alberto

Pierre_Boudier · March 15, 2010, 6:36am

if you are only interested in using double precision in your object * mvp, then performance on high end hardware will not be bad. you might even not notice any drop at all.

on HD5870 (amd):

you have ~550 Gflops of DP
your primitive rate is 850M triangles
then you have ~650 flops per triangle
with non indexed vertices, you have 250 flops / vertex before you are ALU bound

in practice, many other parts of the GPU will impact performance, but it is pretty rare to be limited by the length of the vertex shader.
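The back-of-envelope budget above can be reproduced in a few lines. The throughput and primitive-rate figures are the ones quoted in the post; the assumptions added here are three unshared vertices per triangle for the non-indexed case and a flop count of 16 multiplies plus 12 adds for one 4x4 matrix times 4-vector product. With those assumptions the per-vertex budget comes out near 216 flops, the same order as the roughly 250 quoted.

```python
DP_FLOPS_PER_SECOND = 550e9   # ~550 Gflops double precision, as quoted
TRIANGLES_PER_SECOND = 850e6  # 850M triangles/s primitive rate, as quoted
VERTICES_PER_TRIANGLE = 3     # assumption: non-indexed, no vertex sharing

# Flops for one 4x4 matrix times 4-vector: 16 multiplies + 12 adds (assumption
# about how flops are counted; fused multiply-adds would be counted differently).
MAT4_VEC4_FLOPS = 16 + 12

flops_per_triangle = DP_FLOPS_PER_SECOND / TRIANGLES_PER_SECOND
flops_per_vertex = flops_per_triangle / VERTICES_PER_TRIANGLE
transforms_per_vertex = flops_per_vertex / MAT4_VEC4_FLOPS

print(round(flops_per_triangle))        # about 647
print(round(flops_per_vertex))          # about 216
print(round(transforms_per_vertex, 1))  # about 7.7 matrix-vector products
```

So a single double-precision object * mvp product uses only a fraction of the per-vertex budget before the ALUs become the limit, which is consistent with the claim that the drop may not be noticeable.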