# Multiple Z Buffer Formulas

The last post I found on this topic is from March 2000:
http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=7;t=000017

It would be nice if OpenGL 2.x could support multiple z buffer formulas.

For the application I am writing, I need a large visibility range,
e.g. (let us assume the length unit is meters)
near clip plane: less than 1 meter (zero would be best)
far clip plane: greater than 50,000 meters
max. distance between two z-buffer values: 0.001 meter

The ‘standard’ z buffer formula (with its log2(far/near) distribution) doesn’t meet these needs.

A formula with a uniform “(far-near)/(2^zBits)” distribution would be nice.
e.g.
zval = (ze-near) / ((far-near)/(2^zBits))

with:
far = far clip plane
near = near clip plane
zBits = z buffer depth
ze = distance between eye and object

The “(far-near)/(2^zBits)” part can be precalculated in the glFrustum() call.

.oO( DepthBufferFormula(enum formula, float reserved1, float reserved2);

formula = GL_LOG2SOMETHING | GL_LINEAR | …
reserved1, reserved2 = parameters for more specific formulas
)O°

FYI: With the formula above, a 32-bit z buffer at a resolution of 1 millimeter would cover about 4,295 km (2^32 mm), a sizeable part of the earth.
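A minimal sketch of the uniform mapping proposed above, to make the arithmetic concrete. The function names are made up for illustration and are not part of any OpenGL API; the clamp for ze == far is an added assumption so the result always fits in zBits.

```python
def linear_depth_value(ze, near, far, z_bits):
    """Map an eye-space distance to an integer depth value with uniform steps."""
    step = (far - near) / (2 ** z_bits)   # distance covered by one depth value
    zval = int((ze - near) / step)
    return min(zval, 2 ** z_bits - 1)     # assumption: clamp ze == far into the last bucket


def linear_range(step, z_bits):
    """Distance between near and far plane that a uniform step can cover."""
    return step * 2 ** z_bits


range_m = linear_range(0.001, 32)         # 1 mm steps, 32-bit buffer
print(f"{range_m / 1000:.0f} km")         # about 4295 km
```

The covered range depends only on the step size and the bit count, not on where the near plane sits.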

Shinta, who is still asking himself what the point is of making the z buffer resolution depend on the near clip plane.

Nowadays, you can compute whatever depth value you want (just don’t expect optimal performance when you do). The z-buffer itself is just comparing numbers, so that isn’t the problem. The “problem” comes from how the z is computed. If you want to change this, feel free to do it in a fragment program.

> The “problem” comes from how the z is computed.
> If you want to change this, feel free to do it in a fragment program.

.oO(I could also sort all objects by distance and draw them from back to front, without using the z buffer… that would save memory and might be faster. - One can overdo it with shaders, don’t you think?)°

Shinta, “less is more”

Not saying that this wouldn’t be useful, but since you’re asking for what sounds like hardware modifications…
Wouldn’t a w-buffer capable card be able to handle this for you already? It would just be up to you to provide the w coordinate.

(Korval, I hope I’m not talking gibberish here: w-buffers are exposed through some OpenGL extension, right?)

> Not saying that this wouldn’t be useful, but since you’re asking for what sounds like hardware modifications…
> Wouldn’t a w-buffer capable card be able to handle this for you already? It would just be up to you to provide the w coordinate.

I haven’t done anything with w-buffers yet, so I don’t really know how to use them. Is it possible to do the ‘normal’ z-buffer functions with a w-buffer? (And if yes, how?)

Shinta

The current z-buffer formula makes depth values linear in screen space, which enables all sorts of performance benefits for hardware: z-compression, multi sample compression etc. Having general z-buffer formats would defeat all this, but if you don’t care about performance, it’s perfectly possible to roll your own z-buffer via shaders. It’s perfectly possible to put view space z into a 32-bit floating point render target, for example, which is as much precision as you’re gonna get on most hardware, since that is all the precision transform math is done with.

A w-buffer generally requires a per fragment divide which is expensive in hardware and that’s probably the reason it isn’t supported on newer hw.

> what’s the point of making the z buffer resolution depend on the near clip plane?

Because, most of the time, what is close to us is far more important than what is far away.

Besides, the math worked out nicely that way, too.

> that would save memory and might be faster.

I seriously doubt it would be faster (I don’t know about saving memory). We’re not talking about a highly complex shader here. We’re just talking about computing the z-depth in another way by using the shader (linearly rather than the normal way). You’d use maybe 10 opcodes at the outside.

> Because, most of the time, what is close to us is far more important than what is far away.

Right, “most of the time”, but not “every” time.

Let me tell you what I need:
Imagine a city model the size of Hamburg (textured, with LOD and a digital ground model (DGM)). All visible objects (after occlusion and backface culling) need to be displayed at once and in realtime.
Okay, the rendering speed is acceptable (>15 fps), but the objects really far from the ‘viewer’ flicker all the time.
Building A is sometimes drawn in front of building B even though it’s behind B, because of the lack of depth buffer precision.

What is the point in having a depth buffer precision of nanometers at the near clipping plane and of miles at the far clipping plane?
I looked at the formula (I hope it was the right one) and I couldn’t find any clue why it is done that way. It’s not that the calculation is faster or nicer or something. It’s only that objects near the viewer usually are more important. If that’s the only reason, I don’t understand why there are GL_FLAT vs. GL_SMOOTH for the ShadeModel or GL_NEAREST vs. GL_LINEAR for GL_TEXTURE_MAG_FILTER, but nothing for the depth buffer.

> You’d use maybe 10 opcodes at the outside.

I will, if you promise me the speed won’t drop below 95% of the current speed.

@ harsman:

> The current z-buffer formula makes depth values linear in screen space which enables all sorts of performance benefits for hardware, z-compression, multi sample compression…

I don’t get it (really). Could someone please explain it to me?

Shinta

> I will, if you promise me the speed don’t be slower than 95% of the current speed.

You realize, of course, that your problem is unique, yes? As such, because it is unique, you cannot expect IHVs to just drop in a feature that is useless for 99.9% of the applications, especially considering that they have given you the tools to do exactly what you want (at reduced performance). It’s like asking for hardware bump mapping, even though we have fragment programs that can do it for us, just because we don’t want to spend the performance to do it ourselves.

> What is the point in have a depth buffer precision at the near clipping plane in nanometers and at the far clipping plane in miles?

Because most people don’t use such ranges. Most people use reasonable scales for near and far clip planes, like between, say, 0.5 meters and 1 km or so. Maybe 10 km for the far plane, but that’s kinda pushing it.

Also, you can play tricks on the z-buffer. You can always render in depth-based sections, where you render the far portions first, then clear the z-buffer, and render nearer portions. Because you respecify the range for the near/far clip, you can get the precision you want.

> It’s not that the calculation is faster or nicer or something.

Well, considering that you later expressed ignorance as to the express purpose or the expected results of the calculations as done normally, it seems somewhat dubious to say that it isn’t “faster” or “nicer”.

I can’t explain it because I don’t fully understand it. But, then again, I also don’t care because for virtually all cases of interest, the behavior of the normal z-buffer is exactly what we want.

> If that’s the only reason, I don’t understand why there are GL_FLAT vs. GL_SMOOTH for the ShadeModel or GL_NEAREST vs. GL_LINEAR for GL_TEXTURE_MAG_FILTER, but nothing for the depth buffer.

The reason is obvious (and your analogy doesn’t make any sense): without hardware support for Gouraud interpolation (which everything uses, not just mostly everything) you can’t get realistic graphics (let alone decent looking non-photorealistic graphics), period. Without hardware support for bilinear texture sampling, you get horrible pixelation artifacts. You could only recently work around that in a fragment program, but most people are going to want the filtering in hardware. Features are added to hardware based on user need, not whim. And, just because something is a priority for you doesn’t make it a priority for others.

With your formula, depth values aren’t linear in screen space. That is, if you have three vertices of a projected triangle (in screen space) and their associated depth values, you can’t just perform a linear interpolation between them when rasterising said triangle. This will lead to incorrect results because of perspective foreshortening (similar to the texture warping you would see in old Playstation 1 games or in older real time software renderers).

To combat this, you need to perform a perspective divide per fragment, which is fairly expensive to implement in hardware. In addition to this, the assumption that depth is linear in screen space offers plenty of optimisation opportunities when it comes to compression, which is important because z-buffer bandwidth is often a bottleneck when using anti-aliasing.
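A small numeric sketch of the interpolation argument, under simplifying assumptions: a 2D eye space with points given as (x, z), projection by x/z, and the usual perspective depth written as a [0, 1] value. The function names are invented for this example. It compares what a rasteriser would get at the screen-space midpoint of an edge when interpolating linearly against the value at the true surface point.

```python
def window_depth(ze, near, far):
    """Standard perspective depth in [0, 1] for an eye-space distance ze."""
    return far * (ze - near) / (ze * (far - near))


def true_eye_z_at_screen_midpoint(p0, p1):
    """Eye-space z of the point on segment p0-p1 that projects onto the
    midpoint of the two projected endpoints. Points are (x, z) pairs."""
    (x0, z0), (x1, z1) = p0, p1
    xs_mid = 0.5 * (x0 / z0 + x1 / z1)          # midpoint in screen space
    dx, dz = x1 - x0, z1 - z0
    t = (xs_mid * z0 - x0) / (dx - xs_mid * dz)  # segment parameter that projects to xs_mid
    return z0 + t * dz


near, far = 1.0, 100.0
p0, p1 = (-1.0, 2.0), (1.0, 10.0)

z_true = true_eye_z_at_screen_midpoint(p0, p1)
z_lerp = 0.5 * (p0[1] + p1[1])                  # naive linear interpolation of eye z
d_true = window_depth(z_true, near, far)
d_lerp = 0.5 * (window_depth(p0[1], near, far) + window_depth(p1[1], near, far))

print(z_true, z_lerp)   # differ: eye-space z is not linear in screen space
print(d_true, d_lerp)   # agree: the standard depth value is
```

In this setup the interpolated standard depth matches the true value, while interpolated eye-space z is off by several units, which is the foreshortening error described above.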

I don’t think having a linear distribution would be useless, I’m just trying to explain to you why it typically isn’t supported natively on modern hardware.

Try this vertex program as a starting point. Play around until satisfied.

```
!!ARBvp1.0
TEMP temp_pos;
TEMP w;
DP4 temp_pos.x,vertex.position,state.matrix.mvp.row[0];
DP4 temp_pos.y,vertex.position,state.matrix.mvp.row[1];
##not really required, but certainly useful
DP4 temp_pos.z,vertex.position,state.matrix.mvp.row[2];

##compute 1/w
DP4 w.x,vertex.position,state.matrix.mvp.row[3];
RCP w.w,w.x;

##do the perspective divide *only* on the x/y components
MUL result.position.xy,temp_pos,w.w;
##put whatever you want into result.position.z
##the expected (unclipped) range is [-1;1], which the
##depth range then maps to [0;1] in the depth buffer
##e.g. MOV result.position.z,temp_pos.z;
##the closest to the "normal" behaviour would be
##MUL result.position.z,temp_pos.z,w.w;

##disable implicit perspective divide. We already did it
MOV result.position.w,1.0;

##special care must be taken with texcoords
##we "turned off" the perspective divide, so we lose
##"perspective correct" texturing. We'll use "projected texturing"
##to compensate
MUL result.texcoord[0],vertex.texcoord[0],w.w;
MUL result.texcoord[1],vertex.texcoord[1],w.w;
##more texcoords as required

END
```

untested

Take note that
a) this is not necessarily position invariant (in x/y) with default transformations.
b) if you use ARB_fragment_program you must use TXP everywhere, instead of TEX.
c) if you use ATI_fragment_shader you must use SWIZZLE_STQ_DQ for texture lookups. And you cannot do 3D texturing this way.
d) dependent reads will be difficult …
e) intersecting faces may intersect in unexpected places.

Or you may try this one. It’s a bit less intrusive because it does not circumvent the perspective divide. But your z computations will be in a different (dynamic) range, and thus get a bit more complex.

This is IMO a better tradeoff for “production” code, because it makes fragment operations (texture lookups) faster at the expense of vertex operations.

```
!!ARBvp1.0
TEMP temp_pos;
TEMP oow;

##we need the post-transform z and w, and we can't read from
##a result register, so transform into a temporary first
DP4 temp_pos.x,vertex.position,state.matrix.mvp.row[0];
DP4 temp_pos.y,vertex.position,state.matrix.mvp.row[1];
DP4 temp_pos.z,vertex.position,state.matrix.mvp.row[2];
DP4 temp_pos.w,vertex.position,state.matrix.mvp.row[3];

##compute 1/w
RCP oow.x,temp_pos.w;

##move the x, y, and w components to the result unaltered
MOV result.position.xyw,temp_pos;
##put whatever you want into result.position.z
##the expected (unclipped) range is [-w;w]
##e.g.
##MUL temp_pos.z,temp_pos.z,oow.x;              ##z/w, range [-1;1]
##MAD temp_pos.z,temp_pos.z,0.5,0.5;            ##remap to [0;1]
##MUL temp_pos.z,temp_pos.z,temp_pos.z;         ##square
##MAD temp_pos.z,temp_pos.z,2.0,-1.0;           ##back to [-1;1]
##MUL result.position.z,temp_pos.z,temp_pos.w;  ##back into [-w;w] range

##the normal behaviour would be
##MOV result.position.z,temp_pos.z;

##"normal" texcoord handling
MOV result.texcoord[0],vertex.texcoord[0];
##etc
END
```
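To illustrate what "put whatever you want into result.position.z" amounts to in the second program, here is a hedged sketch of the arithmetic for one possible choice, a depth that is linear in eye distance. It assumes a standard perspective projection (so clip-space w equals the eye distance) and the default depth range of [0, 1]; the function names are hypothetical.

```python
def clip_z_for_linear_depth(ze, w, near, far):
    """Clip-space z that survives the divide by w as a depth linear in ze."""
    ndc = 2.0 * (ze - near) / (far - near) - 1.0   # linear in ze, range [-1, 1]
    return ndc * w                                  # pre-multiply, the divide undoes this


def window_depth_after_divide(z_clip, w):
    """What the fixed-function stage does afterwards: divide by w,
    then map [-1, 1] to [0, 1] (assumed default depth range)."""
    return 0.5 * (z_clip / w) + 0.5


near, far = 1.0, 10000.0
for ze in (1.0, 5000.5, 10000.0):
    w = ze   # assumption: standard perspective projection
    d = window_depth_after_divide(clip_z_for_linear_depth(ze, w, near, far), w)
    print(ze, d)
```

This only describes the value at each vertex. Between vertices the hardware still interpolates linearly in screen space, so large triangles would deviate from a truly linear depth, which is the caveat raised earlier in the thread.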

@ Korval:

> You realize, of course, that your problem is unique, yes?

Oh, yes, I did. But I can also imagine a lot more applications where a linearly distributed z-buffer would be useful. (e.g. VRML viewers, like the cosmoplayer; I don’t know what the developers were thinking, but it has the worst z-buffer I’ve ever seen.)

@ harsman:
Ahhhh, now I see. That interpolation is a reason for the formula that I can accept. It’s bad that it was never mentioned in any of the sources I read. So thank you for enlightening me.

> I don’t think having a linear distribution would be useless,…

*jumpingandwavingthehands* See, Korval, I’m not the only one… *g*

@ zeckensack:
Thx for the code. I’ll try it…

Shinta

> But I can also imagine alot more application, where a linear distributed zbuffer would be usefull.

Wouldn’t a floating-point logarithmic depth buffer suffice? Or do you really need depth to be linear in screen-space?

> Imagine a city model with the size of Hamburg (textured, with LOD and digital ground model (DGM)). It needs to be display all visible objects (occlusion and backface culled) at once and in realtime.

Wouldn’t dynamic LOD and multiple passes at different ranges of depth help? You’re not the first person who wants to draw a city on a GPU. Perhaps you should look at other similar projects and what they have done.

> Wouldn’t a floating-point logarithmic depth buffer suffice?

I don’t think floating-point values would increase the resolution, unless they bring more bits than the usual 24 or 32 bits for depth buffers.

> Or do you really need depth to be linear in screen-space?

If I understand harsman’s answers right, the depth buffer is already linear in screen-space, so that the depth buffer values can be linearly interpolated between the calculated vertices.
I would need the depth buffer to be linear in viewer-space (left-handed iirc), to get more precision near the far clipping plane. Or, alternatively, more depth bits (>32).

> Wouldn’t dynamic LOD and multiple passes at different ranges of depth help?

LOD is done already. The multiple passes at different ranges would require sorting the objects, and if I do that, I could draw the model from back to front in one pass without even needing a depth buffer (okay, intersecting faces won’t work).

Shinta

Hi,

I haven’t read all the posts in this thread, so please correct me if I’m heading in the wrong direction, but I just want to point out a few things.

W-buffer : Actually, in perspective view OpenGL manages its depth buffer as a “W-Buffer” (as mentioned by Direct3D) and in orthographic view OpenGL manages the depth buffer as a “Z-buffer” (as mentioned by Direct3D). So, talking about W-buffer support in OpenGL is pointless. BTW, it is clearly stated in the spec that the depth buffer should be called the “DEPTH” buffer, not the “Z” buffer as many people tend to call.

Depth output in fragment programs : I for one would never use the fragment program option to output a depth value. It kills performance because many (if not all) graphics cards perform an optimization called “Early Z test” which greatly improves fillrates, and this feature is simply broken when modifying the depth output inside a fragment program. So unless you NEED to replace depth or unless performance is not an issue, try something else.

Depth precision in general : if you get nanometer precision at the near plane and miles precision at the far plane, you’re more than likely to have misconfigured your near/far ratio. The most common issue is to set a very close near plane. Many programmers set it to 0.1 by default or even lower. But in real life, you rarely view objects closer than 1 meter. The second common issue is to use a low-precision depth buffer. Always call glGetIntegerv with GL_DEPTH_BITS at least once in your program (typically, just after the framebuffer initialization) in order to check the depth buffer depth, even if you think you have configured it in the pixel format. For instance, old NVIDIA cards (up to GeForce4) will give you a 16-bit depth buffer if you don’t request a stencil buffer; and requesting an 8-bit stencil buffer provides a 24-bit depth buffer. So always make sure you have at least 24-bit depth precision before investigating algorithms such as depth replacement or improving the near/far ratio.

HTH
Vincent

@vincoof:
“Depth precision in general :…”
Let’s do some calculations:
To get the z-buffer value (zw) from the z-value in eye coords (ze), calculate:
zw(ze) := s * far * (near - ze) / (ze * (near - far))

Solved for ze:
ze(zw) := s * far * near / (zw * near + s * far - far * zw)

with:
s := 2^DepthBits (number of distinct z-buffer values)
far = far clip plane
near = near clip plane

ze(s) - ze(s-1) := max. distance between two z-buffer values

Let’s take some real numbers:
s := 2^24 = 16777216 (that’s the default; 32 bits would be much better, as you’ll see)
far := 3000 (meters)
near := 3 (meters)

ze(s) = 3000
ze(s-1) = 2999.821375515
ze(s) - ze(s-1) = 0.178624 (too big!)
ze(1) - ze(0) = 0.000000178635 (too small, needlessly small)

If I took your suggested 1 (meter) for the near clip plane:
ze(s) - ze(s-1) = 0.536167 (far too big!)
ze(1) - ze(0) = 0.00000005958478

0.5362 (over half a meter!) isn’t anywhere near a good value. (As I wrote in my first post, 0.01 would be great.)
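The calculation above can be reproduced with a few lines of Python. This is just a sketch that transcribes the two formulas from this post; the helper names `step_at_far` and `step_at_near` are added here for convenience.

```python
def zw(ze, near, far, s):
    """z-buffer value for an eye-space distance ze (s = 2^DepthBits)."""
    return s * far * (near - ze) / (ze * (near - far))


def ze(zw_value, near, far, s):
    """Eye-space distance that maps to the z-buffer value zw_value."""
    return s * far * near / (zw_value * near + s * far - far * zw_value)


def step_at_far(near, far, s):
    """Distance between the last two z-buffer values: ze(s) - ze(s-1)."""
    return ze(s, near, far, s) - ze(s - 1, near, far, s)


def step_at_near(near, far, s):
    """Distance between the first two z-buffer values: ze(1) - ze(0)."""
    return ze(1, near, far, s) - ze(0, near, far, s)


s = 2 ** 24
for near in (3.0, 1.0):
    print(near, step_at_far(near, 3000.0, s), step_at_near(near, 3000.0, s))
```

With a 24-bit buffer and far = 3000 this gives a step of roughly 0.18 m at the far plane for near = 3 and roughly 0.54 m for near = 1, while the step at the near plane is well below a micrometer in both cases.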

Solve ze(s) - ze(s-1) = maxZRes for near, to get the optimal near value for a given far clip plane and a required max. (or should I say, min.) depth buffer resolution:
optNear(far, s, maxZRes) := -far * (maxZRes - far) / (s * maxZRes + far - maxZRes)

Some real values:
optNear(3000, 2^24, 0.1) = 5.354665
If I want to display something near the far clip plane (3000 m) at max. 1 dm depth resolution and I only have a 24-bit depth buffer, my near clip plane needs to be at least 5.35 meters away.

optNear(3000, 2^32, 0.1)= 0.020954 (The same only with 32 bit depth buffer.)

optNear(3000, 2^32, 0.01)= 0.209532 (my wished resolution)
optNear(5000, 2^32, 0.01)= 0.5820077
optNear(10000, 2^32, 0.01)= 2.327762 (my wished max. distance)

You see, it only works acceptably with a 32-bit depth buffer (or better). A different z-buffer formula would be great, but I know (now) that it won’t be specified in OpenGL.
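For anyone who wants to try other combinations, the optNear formula can be written as a small Python function. This is a direct transcription with the sign folded into the numerator; `opt_near` is just a name chosen here.

```python
def opt_near(far, s, max_z_res):
    """Smallest near plane that still gives a depth step of max_z_res
    at the far plane, for s = 2^DepthBits distinct depth values."""
    return far * (far - max_z_res) / (s * max_z_res + far - max_z_res)


for far, bits, res in [
    (3000.0, 24, 0.1),
    (3000.0, 32, 0.1),
    (3000.0, 32, 0.01),
    (5000.0, 32, 0.01),
    (10000.0, 32, 0.01),
]:
    print(far, bits, res, round(opt_near(far, 2 ** bits, res), 6))
```

The result grows roughly with far² / (s * maxZRes), so doubling the far plane pushes the required near plane out by about a factor of four.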

Shinta

Some math never hurts

However, the point is that if you have a 3D model of a 3km² scene, you’re not likely to have modeled anything at the half-meter precision. And if this really happens, you still have to render the object 3 kilometers away to let Z start fighting.

Otherwise, the solution to the problem is to render the scene multiple times, splitting the scene in depth. For instance, first you render the scene with near==200 and far==10000, then you clear the depth buffer only (not the color buffer) and draw the scene again, but with near==1 and far==200, on top of the previously drawn scene.
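A rough sketch of what the split buys in terms of precision, assuming a 24-bit depth buffer and the standard depth formula discussed earlier in the thread. `worst_step` is a made-up helper that evaluates the distance between the last two depth values, i.e. the coarsest step inside a slice.

```python
def worst_step(near, far, z_bits=24):
    """Largest eye-space gap between adjacent depth values (at the far plane)."""
    s = 2 ** z_bits
    return far * (far - near) / ((s - 1) * near + far)


single_pass = worst_step(1.0, 10000.0)

# Slices in drawing order: far slice first, then clear depth, then near slice.
slices = [(200.0, 10000.0), (1.0, 200.0)]
per_slice = [worst_step(n, f) for n, f in slices]

print(single_pass)   # several meters with a single near==1, far==10000 pass
print(per_slice)     # a few centimeters or less per slice
```

In a real renderer each slice would set its own projection with that near/far pair and clear only the depth buffer between passes, as described above; objects straddling a slice boundary have to be drawn in both passes.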

> However, the point is that if you have a 3D model of a 3km² scene, you’re not likely to have modeled anything at the half-meter precision.

Right, I (not quite by myself) have modeled whole cities like Berlin, Hamburg and Cologne at 5-10 cm precision, really. I work for a company doing geoinformatics stuff (3D GIS, aerial photo scanning, orthophoto generation, …, whole-city realtime visualisation).

> Otherwise, the solution to the problem is to render the scene multiple times, splitting the scene in depth. …