How to do a linear Z buffer

I expect you will also find that cases like this are wrong:

\ <--*eye


Or actually this (I think):

\ <--*eye


But for the most part stuff might look passable.

Option 1:
Try a 1D 16-bit texture ramp, with texgen on eye Z, mapped into the depth buffer as I suggested earlier. It costs you a texture unit, though.
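For Option 1, the eye-Z texgen can be an eye-linear plane that maps the near plane to s = 0 and the far plane to s = 1. The 16-bit ramp then just turns s into a depth value. The sketch below is plain Python rather than GL calls: it computes those plane coefficients and evaluates them the way eye-linear texgen would. It assumes the usual OpenGL eye space (visible points have negative z) and that the plane is specified while the modelview is identity, so GL does not transform it by the inverse modelview. The function names are illustrative, not part of any API. The resulting tuple is what you would pass to glTexGenfv(GL_S, GL_EYE_PLANE, ...) with GL_TEXTURE_GEN_MODE set to GL_EYE_LINEAR.

```python
# Sketch only: eye-linear texgen plane for a linear depth ramp (Option 1).
# Assumes OpenGL eye space (visible points have z_eye < 0) and that the
# plane is specified while the modelview is identity, so it is used as-is.

def linear_depth_texgen_plane(near, far):
    """Plane (p1, p2, p3, p4) giving s = (-z_eye - near) / (far - near)."""
    if not 0.0 < near < far:
        raise ValueError("need 0 < near < far")
    scale = 1.0 / (far - near)
    return (0.0, 0.0, -scale, -near * scale)

def eval_texgen(plane, eye_point):
    """Eye-linear texgen with w = 1: dot(plane, (x, y, z, 1))."""
    x, y, z = eye_point
    p1, p2, p3, p4 = plane
    return p1 * x + p2 * y + p3 * z + p4

def quantize(s, bits=16):
    """Index of the nearest entry in a 2**bits-entry ramp, with s clamped to [0, 1]."""
    levels = (1 << bits) - 1
    s = min(max(s, 0.0), 1.0)
    return round(s * levels)

plane = linear_depth_texgen_plane(1.0, 101.0)
s = eval_texgen(plane, (0.0, 0.0, -51.0))  # halfway between near and far
print(plane, s, quantize(s))
```

This sketch does not cover getting the ramp value into the depth buffer; that part depends on the hardware path you use.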

Option 2:
For your depth effect, use render to texture (I suspect you do already), then use a dependent texture read in your effects fragment program to relinearize the depth buffer. Again use a 1D texture ramp, this time with the corrective mapping that converts the readback value to linear eye-space z.
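For Option 2, the ramp has to invert the standard perspective depth mapping. This is a minimal sketch, assuming a glFrustum-style projection with glDepthRange(0, 1), where the window depth for eye distance D = -z_eye is d = f/(f - n) - f n / ((f - n) D). Each table entry, indexed by quantized d, holds the normalized linear distance (D - n)/(f - n). The names and the table size are placeholders.

```python
# Sketch only: the 1D relinearization ramp for Option 2.
# Assumes a glFrustum/gluPerspective projection and glDepthRange(0, 1).

def window_depth(dist, near, far):
    """Window depth in [0, 1] for eye distance dist = -z_eye."""
    return far / (far - near) - far * near / ((far - near) * dist)

def linear_from_window_depth(d, near, far):
    """Invert window_depth, then normalize: 0 at the near plane, 1 at the far plane."""
    z_ndc = 2.0 * d - 1.0
    dist = 2.0 * far * near / ((far + near) - z_ndc * (far - near))
    return (dist - near) / (far - near)

def build_ramp(size, near, far):
    """Ramp entry i holds the linear depth for window depth i / (size - 1)."""
    return [linear_from_window_depth(i / (size - 1), near, far) for i in range(size)]

ramp = build_ramp(256, 1.0, 1000.0)
print(ramp[0], ramp[128], ramp[-1])
```

Because d crowds toward 1 for most of the scene, a small table spends most of its entries near the near plane. With n = 1 and f = 1000, the middle entry of a 256-entry table still sits within the first 1% of the depth range, so table size (and filtering) matters for far-field accuracy.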

I think both will work. One costs a texture unit and texgen; the other assumes a readback or render-to-texture effects implementation and costs a dependent texture read in your effects pass.

Take your pick.

[This message has been edited by dorbie (edited 06-06-2003).]

Thanks for the suggestions.

Option 1 would be difficult since we perform multiple post passes after the scene is finished.

Option 2 is what I would have preferred, but it seems that dependent texture reads don’t work with depth textures.

One option I’ve been toying with is the copy-depth-to-color extension, but it seems like it would really slow down the game a lot.

I would have thought a depth-to-color copy would be possible without breaking the bank. Try 16-bit z to 16-bit luminance (is that supported?).

Another option might be a variation on Option 1: in one of your passes, write linear z from a texture into a color component, or even destination alpha (with 16-bit destination components), and copy that to a texture. You may be able to use a color component inexpensively if you have something like a depth buffer pass early on, or alpha if it’s available and doable. Depending on the effect you’re after, 8 bits of linear depth may even be enough instead of a higher-precision format. I don’t know what you’re after (depth of field, volume fog or some other cool thing); the required precision varies between these applications.
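If linear z goes into a color or alpha component, the encoding is a plain scale and round, and the precision question reduces to a fixed step of (f - n)/(2^bits - 1) eye-space units. This sketch assumes the stored value is eye distance normalized between the near and far planes and clamped; the function names are hypothetical.

```python
# Sketch only: packing normalized linear eye distance into a k-bit channel.
# Names are hypothetical; near/far are the clip-plane distances.

def encode_linear_z(dist, near, far, bits=8):
    """Scale eye distance to [0, 1], clamp, and round to an integer code."""
    levels = (1 << bits) - 1
    t = (dist - near) / (far - near)
    t = min(max(t, 0.0), 1.0)
    return int(round(t * levels))

def decode_linear_z(code, near, far, bits=8):
    """Map an integer code back to eye distance."""
    levels = (1 << bits) - 1
    return near + (far - near) * code / levels

def step_size(near, far, bits):
    """Eye-space distance between adjacent codes; constant across the range."""
    return (far - near) / ((1 << bits) - 1)

for bits in (8, 16):
    print(bits, step_size(1.0, 256.0, bits))
```

With n = 1 and f = 256, 8 bits gives a step of exactly one eye-space unit, and the worst-case round-trip error is half of that.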

[This message has been edited by dorbie (edited 06-06-2003).]

I don’t doubt that 8 bits would be more than enough for what we want to use it for, but looping through our objects and rendering an extra 150,000 polygons seems out of the question.

I really should try the depth-to-color extension; maybe it’s not as bad as I imagine. It just seems like a lot of expensive extra copying. What it needs is the ability to use DEPTH_STENCIL_TO_RGBA_NV as a parameter to glCopyTexSubImage.

I think I’m going to test that extension and post again.

Why not just texgen to destination alpha (a modified Option 1) if 8 bits is enough, and read it directly back as color? It should be a breeze.

[This message has been edited by dorbie (edited 06-06-2003).]

Sorry, I forgot to mention that: my alpha channel is already in use, in both SRC_ALPHA and DST_ALPHA operations.

Can you use a different alpha channel for blending than what you put into the color buffer?

So COPY_DEPTH_TO_COLOR didn’t go so well. A copy pixels operation seems to be even slower than glReadPixels. Plus I had to enable a 24-bit depth buffer with 8-bit stencil (costing 10% of my FPS), since 16-bit depth copying seems “fairly uninteresting” to the people who wrote the extension.

I have a feeling I must be doing something wrong to get such horrible performance: literally 10 seconds for each CopyPixels call. Strangely, nothing shows up on screen other than the existing framebuffer contents, so it is quite likely I am doing something incorrectly.

I’m not sure what it might be; I’ve never tried any CopyPixels operation before. But AFAIK all my pixel zooms and raster positions are set up correctly.

No, but if you could hold off on using destination alpha in the first pass, you could use it then and clear it after your readback.

You could also look into auxiliary buffers, but I haven’t used them. I think you can write to one of these puppies in your fragment shader independently of the rest of the stuff you’re doing. You’d just send the texture unit output there, then read it back. Like I say, I’ve never used this, so I could be talking out my ass on this particular point.

[This message has been edited by dorbie (edited 06-06-2003).]

Originally posted by cass:

I’m relatively agnostic about the whole this-mapping-is-better-than-that-one debates. I’m happy with 24 bits.

Even 24 bits breaks down when you don’t have nice geometry in the distance. Almost all z-fighting was eliminated with a linear Z; I just wish I could have solved some of the extreme anomalies that were occurring. Most of the small anomalies up close were ignorable, considering how much better the stuff in the distance looked. Being able to scale how fast you lose depth precision without changing your near plane is certainly a smart feature, giving different applications the control needed to perfect their scenes.
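To put numbers on the distance problem, compare the eye-space resolution of a fixed-point linear buffer with a standard perspective z-buffer of the same width. The perspective figure is a first-order estimate from the derivative of window depth with respect to eye distance, dd/dD = f n / ((f - n) D^2), assuming a glFrustum-style projection and glDepthRange(0, 1). The function names are illustrative.

```python
# Sketch only: first-order depth resolution, perspective vs linear fixed point.
# Assumes glFrustum-style depth with glDepthRange(0, 1).

def perspective_step(dist, near, far, bits):
    """Approximate eye-distance change per depth code at distance dist."""
    levels = (1 << bits) - 1
    # Derivative of window depth with respect to eye distance.
    slope = far * near / ((far - near) * dist * dist)
    return 1.0 / (levels * slope)

def linear_step(near, far, bits):
    """Constant eye-distance change per code for a linear buffer."""
    return (far - near) / ((1 << bits) - 1)

near, far = 1.0, 10000.0
for dist in (1.0, 100.0, far):
    print(dist, perspective_step(dist, near, far, 24), linear_step(near, far, 24))
```

With n = 1 and f = 10000, a 24-bit perspective buffer resolves roughly 6 eye-space units at the far plane, against about 0.0006 everywhere for 24-bit linear; in the perspective case the step grows with D^2, which is why pushing the near plane out is the usual lever.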

[This message has been edited by JelloFish (edited 06-06-2003).]

Originally posted by dorbie:

You could also look into auxiliary buffers, but I haven’t used them. I think you can write to one of these puppies in your fragment shader independently of the rest of the stuff you’re doing. You’d just send the texture unit output there, then read it back. Like I say, I’ve never used this, so I could be talking out my ass on this particular point.

Ya, aux buffers sound like the right way to do it, but I really have no idea how to output to one of those using register combiners. Time for some research, I guess.

Originally posted by dorbie:
No, but if you could hold off on using destination alpha in the first pass, you could use it then and clear it after your readback.

Ya, I guess it would be relatively cheap compared to everything else to draw only the objects that use the alpha component twice: once to output linear depth in alpha, and again for the correct RGBA pass. That might only be 50k polys.

[This message has been edited by JelloFish (edited 06-06-2003).]

You can still do some RGB in the first pass AND linear z in alpha at the same time using an additional texture unit. Hopefully you can work things out that way.

I believe there are two sources of gains in current Z implementations:

  1. Hierarchical Z approaches
  2. Compressed Z approaches

I believe they are conceptually orthogonal (though they can probably gain extra efficiency by being combined in clever ways).

Here’s my mental model of hierarchical Z:

A coarse grid (say, on an 8x8 or 16x16 basis) stores the highest and lowest values found within a block, possibly at some lower precision like 16 bits (with appropriate rounding). At that point, Z testing can be done for many cases in a simple operation that throws away an entire block (64 or 256 pixels; you’d probably get decent gains even at 4x4).
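A minimal model of the coarse min/max grid: each tile keeps the lowest and highest depth it contains, and a primitive's depth range over a tile is compared against those values before any per-pixel work. This sketch assumes a GL_LESS depth test and a depth buffer whose dimensions divide evenly by the tile size. It stores the 16-bit tile values conservatively (min rounded down, max rounded up) so rounding can never cause a wrong reject. It models the idea only, not any vendor's hardware, and the names are hypothetical.

```python
import math
from typing import List, Tuple

Depth = List[List[float]]

def build_tiles(zbuf: Depth, tile: int) -> List[List[Tuple[float, float]]]:
    """(min, max) depth per tile x tile block; dimensions must divide by tile."""
    h, w = len(zbuf), len(zbuf[0])
    tiles = []
    for ty in range(0, h, tile):
        row = []
        for tx in range(0, w, tile):
            vals = [zbuf[y][x] for y in range(ty, ty + tile) for x in range(tx, tx + tile)]
            row.append((min(vals), max(vals)))
        tiles.append(row)
    return tiles

def conservative_16bit(tile_range):
    """Quantize to 16 bits, rounding min down and max up so tests stay safe."""
    lo, hi = tile_range
    q = 65535
    return (math.floor(lo * q) / q, math.ceil(hi * q) / q)

def classify(tile_range, frag_min, frag_max):
    """Coarse GL_LESS test of a primitive's depth range over one tile."""
    tmin, tmax = tile_range
    if frag_min >= tmax:
        return "reject"     # every incoming fragment is behind every stored pixel
    if frag_max < tmin:
        return "accept"     # every fragment passes; per-pixel reads can be skipped
    return "per-pixel"      # ambiguous: fall back to the full-resolution test

zbuf = [[0.2 if x < 4 else 0.9 for x in range(8)] for y in range(8)]
tiles = build_tiles(zbuf, 4)
print(tiles[0], classify(tiles[0][0], 0.5, 0.6), classify(tiles[0][1], 0.5, 0.6))
```

An "accept" still has to write depth values; it only saves reading and comparing them.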

Here’s my mental model of Z compression:

A block of Z values (say, 4x4 or 8x8) is compressed using some mechanism that could be lossless if Z is “well behaved”. If lossless compression cannot be accomplished, then uncompressed Z is stored in memory. When the memory controller reads in the data, it decompresses it on the fly if the block is compressed. You have to reserve memory for a full, uncompressed block across the entire framebuffer, because the compressibility of each block can change quickly. The win is that the memory controller needs to read much less data if the block is compressed, so you get a speed-up as long as memory transfer is your bottleneck.
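The fixed-reservation scheme can be modeled in a few lines. Every block always owns a full uncompressed slot, a small per-block flag says whether the slot holds a compressed payload, and the saving shows up only as fewer words transferred, never as less memory. The "compressor" is an assumption chosen for simplicity: a 4x4 block of integer depths compresses only if it lies exactly on a plane, in which case it is stored as an anchor value plus x and y steps. ZBlockStore and the other names are hypothetical.

```python
# Sketch only: compressed-or-not Z blocks with a full slot always reserved.
BLOCK = 4
WORDS_PER_BLOCK = BLOCK * BLOCK  # reserved per block, compressed or not

def try_plane(block):
    """Return (z0, dx, dy) if the block is exactly planar, else None."""
    z0 = block[0][0]
    dx = block[0][1] - z0
    dy = block[1][0] - z0
    for y in range(BLOCK):
        for x in range(BLOCK):
            if block[y][x] != z0 + x * dx + y * dy:
                return None
    return (z0, dx, dy)

class ZBlockStore:
    def __init__(self, num_blocks):
        self.memory = [[0] * WORDS_PER_BLOCK for _ in range(num_blocks)]
        self.compressed = [False] * num_blocks  # small per-block flag table
        self.words_transferred = 0               # models memory bandwidth

    def write(self, i, block):
        plane = try_plane(block)
        if plane is not None:
            self.memory[i][:3] = list(plane)     # only 3 of 16 words touched
            self.compressed[i] = True
            self.words_transferred += 3
        else:
            self.memory[i] = [v for row in block for v in row]
            self.compressed[i] = False
            self.words_transferred += WORDS_PER_BLOCK

    def read(self, i):
        if self.compressed[i]:
            self.words_transferred += 3
            z0, dx, dy = self.memory[i][:3]
            return [[z0 + x * dx + y * dy for x in range(BLOCK)] for y in range(BLOCK)]
        self.words_transferred += WORDS_PER_BLOCK
        flat = self.memory[i]
        return [flat[y * BLOCK:(y + 1) * BLOCK] for y in range(BLOCK)]

store = ZBlockStore(1)
store.write(0, [[100 + 3 * x + 7 * y for x in range(BLOCK)] for y in range(BLOCK)])
print(store.compressed, store.words_transferred)
```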

Possible synergies: Use hierarchical Z values to drive the interpolation for compression, a la DXT5 compression. Use the hierarchical Z data to determine whether the block is compressed or not.

Another possible Z compression model would be to pick one value, store some number of derivatives relative to it, and then store a per-pixel offset from this implied surface, very similar to ADPCM for audio.
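The derivative-plus-offset variant can be sketched as prediction plus residual: predict each pixel from an anchor value and two plane steps, store a small signed residual per pixel, and fall back to raw storage when any residual does not fit. The 24-bit depth, 4-bit residuals and 4x4 block size are illustrative assumptions, not figures from any hardware, and the function names are hypothetical.

```python
# Sketch only: anchor + plane steps + per-pixel residuals (loosely ADPCM-like).
BLOCK = 4

def encode_block(block, residual_bits=4, depth_bits=24):
    """Return (encoding, payload_bits); encoding is None when stored raw."""
    z0 = block[0][0]
    dx = block[0][1] - z0
    dy = block[1][0] - z0
    lo = -(1 << (residual_bits - 1))
    hi = (1 << (residual_bits - 1)) - 1
    residuals = []
    for y in range(BLOCK):
        for x in range(BLOCK):
            r = block[y][x] - (z0 + x * dx + y * dy)
            if not lo <= r <= hi:
                return None, BLOCK * BLOCK * depth_bits  # does not fit: raw
            residuals.append(r)
    # Anchor and both steps are assumed to take depth_bits each.
    return (z0, dx, dy, residuals), 3 * depth_bits + BLOCK * BLOCK * residual_bits

def decode_block(enc):
    """Rebuild the block from the plane prediction plus residuals."""
    z0, dx, dy, residuals = enc
    return [[z0 + x * dx + y * dy + residuals[y * BLOCK + x] for x in range(BLOCK)]
            for y in range(BLOCK)]

block = [[1000 + 5 * x + 9 * y + (x * y) % 3 for x in range(BLOCK)] for y in range(BLOCK)]
enc, bits = encode_block(block)
print(bits, decode_block(enc) == block)
```

For a 4x4 block that fits, this stores 136 bits instead of 384.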

I’m pretty sure that I don’t have all the details right here, but these models have, so far, served me well in predicting behavior, so I stick to them :slight_smile:

Jwatte, I think your mental model of 1 is pretty close, and I hope it is tied to some region-based rasterization that rejects blocks of fragments at some resolution; I think you said this. I think the real optimization as it relates to linear screen-space z would come from a linear subdivision and lerp of min & max depth for the coarse z regions on rasterized primitives.
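Here is why coarse z benefits from depth that is linear in screen space. If a primitive's depth over the screen is a plane z = a x + b y + c, its minimum and maximum over a rectangular tile lie at the tile's corners, so the tile's range costs four evaluations (or a lerp of edge values) rather than a per-pixel scan. This is a sketch with a hypothetical function name. Using the full tile when the primitive covers only part of it gives a range that is wider than necessary but still safe for both reject and accept tests.

```python
# Sketch only: depth range of a screen-space-affine primitive over one tile.

def plane_range_over_tile(a, b, c, x0, y0, size):
    """Min/max of z = a*x + b*y + c over [x0, x0+size] x [y0, y0+size].

    An affine function over a rectangle takes its extremes at the corners,
    so four evaluations suffice; no per-pixel scan is needed.
    """
    corners = ((x0, y0), (x0 + size, y0), (x0, y0 + size), (x0 + size, y0 + size))
    zs = [a * x + b * y + c for x, y in corners]
    return min(zs), max(zs)

print(plane_range_over_tile(0.001, -0.002, 0.5, 16.0, 32.0, 8.0))
```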

The mental model of 2 is less clear. MY mental model is corrupted by knowledge of what SGI called ‘compressed’ Z. There’s also the whole issue of variable-size representation. “Compressed” sounds good to a software guy, but despite my layman’s knowledge of hardware issues, I’ve learned at least to be cautious about anything that implies variable-sized representations and possible reallocation. Oh well, I could waffle on more about guaranteeing lossless compression in a worst-case scenario, but why bother; z is never worst case. Old Chinese proverb say: when your best guess is “chocolate donut”, it’s time to stop feeling the elephant.

[This message has been edited by dorbie (edited 06-07-2003).]

Note that I didn’t say variable-size allocation. My intuition is that that would be “exciting” to implement in hardware :slight_smile: What I’m envisioning is something where a block either is compressed, or isn’t, but you reserve the full size for the block.

However, the Z buffer reader (or writer) circuitry can read only 1/4 or 1/8 of the “reserved” space for the block, in the case that the block is compressed. This is a SPEED gain, but NOT a storage gain, which I think is somewhat unintuitive for someone who has traditionally used “compression” to mean “saves bytes” :slight_smile:

Any hardware guys care to comment? I’m fishing for education here!

What if your application doesn’t render from front to back and just throws polygons at it in random order? Wouldn’t the average case then be similar to just storing the z buffer uncompressed?

I imagine that the hierarchical scheme, and all other lossless compression schemes, take a hit when the buffer needs to be updated.

ATI has its HyperZ solution. I read something in an NVIDIA doc about a color and z compression unit making such-and-such GPU faster. I guess everyone is doing it.

Even if you throw polygons in a random order, I think the idea is that there will be fairly decent-sized polygons (or areas) of the screen where a plane plus some delta (the “ADPCM” method) could fully represent the block. Each block is probably fairly small (4x4, 8x8, that kind of size). You still get a bandwidth win for the blocks that compress, and there’s no change for the blocks that don’t (such as along polygon edges, I’d imagine).

I have uploaded my stuff at

This thing uses 2 methods to deal with the problem.

  1. using ARB_vp as we discussed here

  2. my own personal trick --> glhMergedPerspectivef()

I’m wondering if #2 will work on everyone’s cards.

The problem is that ortho z isn’t perspective correct under linear 2D interpolation, so it won’t work in hardware. This is very similar to an idea discussed earlier in this thread, i.e. taking eye-space z and passing it straight through. You do an ortho transform to screen z, which really just linearly remaps eye z to 0.0 to 1.0 between the clip planes, so these are pretty much different shades of the same idea. To make this correct you need perspective-correct fragment interpolation, and cass pointed out that this makes things like a fast coarse-z hardware implementation difficult.

You need to test with an appropriate scenario (see my ASCII art above), or you risk things looking OK in some cases without actually being correct.
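The perspective-correctness problem can be shown in one dimension. Take a segment in eye space, project its endpoints with a unit-focal-length pinhole, and walk across the screen-space span between them. Linearly interpolating eye z in screen space disagrees with the true depth, while interpolating 1/z (what perspective-correct interpolation does) matches it. Coordinates are (x, z) pairs with the eye looking down -z, and all names are hypothetical.

```python
# Sketch only: screen-linear vs perspective-correct interpolation of eye z.

def project(x_eye, z_eye):
    """Screen x for a unit-focal-length pinhole looking down -z."""
    return x_eye / -z_eye

def true_depth_at(screen_x, p0, p1):
    """Eye z where the view ray through screen_x hits segment p0-p1."""
    (x0, z0), (x1, z1) = p0, p1
    # Solve x0 + s*(x1 - x0) = -screen_x * (z0 + s*(z1 - z0)) for s.
    s = (-x0 - screen_x * z0) / ((x1 - x0) + screen_x * (z1 - z0))
    return z0 + s * (z1 - z0)

def screen_linear_depth(t, z0, z1):
    """Eye z interpolated linearly in screen space (the incorrect way)."""
    return (1 - t) * z0 + t * z1

def perspective_correct_depth(t, z0, z1):
    """Interpolate 1/z linearly in screen space, then invert."""
    return 1.0 / ((1 - t) / z0 + t / z1)

p0, p1 = (-1.0, -2.0), (4.0, -20.0)
mid_x = 0.5 * (project(*p0) + project(*p1))
print(screen_linear_depth(0.5, p0[1], p1[1]),
      perspective_correct_depth(0.5, p0[1], p1[1]),
      true_depth_at(mid_x, p0, p1))
```

With these endpoints the screen-linear value at the midpoint is -11, while the true depth is about -3.64, which is the kind of error long sloped polygons expose.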

My take on this was to do away with all this business of how much precision we have and take whatever comes out of the modelview and interpolate it, since any concept applied later couldn’t undo the limitations of that transformation. I deliberately wanted to avoid any scaling & mapping because it loses precision. It was naive but still worth a thought.

The other issue is storage representation if you go for a linear mapping as you have. I was suggesting a float in my straw-man scheme; if you use fixed point, IMHO that would have undesirable consequences. Nonlinear depth precision is a good thing for perspective scenes. Having it tied to near/far can be a bad thing, especially when far/near is high, but that doesn’t mean linear is desirable. So storage becomes important when you consider what you want to do with z in your scenario.

The debate & questions over what representation & precision etc. evaporates if you take the eyespace z from the modelview and simply store it as a float, but apparently it’s just not practical.
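To see why storing eye distance as a float behaves so differently from fixed point, compare the spacing between adjacent float32 values with the constant step of a 24-bit linear fixed-point buffer. Float spacing grows with magnitude, so it behaves like relative precision. This assumes eye distance is stored directly as IEEE float32; no hardware discussed here does that, so treat it as a thought experiment. The function names are hypothetical.

```python
import struct

# Sketch only: float32 spacing vs a fixed-point linear depth step.

def float32_spacing(x):
    """Gap from float32(x) to the next larger float32; x positive and finite."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    here = struct.unpack("<f", struct.pack("<I", bits))[0]
    nxt = struct.unpack("<f", struct.pack("<I", bits + 1))[0]
    return nxt - here

def fixed_linear_spacing(near, far, bits=24):
    """Constant step of a fixed-point buffer storing linear depth near..far."""
    return (far - near) / ((1 << bits) - 1)

fixed = fixed_linear_spacing(1.0, 10001.0)
for dist in (1.0, 1000.0, 8192.0):
    print(dist, float32_spacing(dist), fixed)
```

With n = 1 and f = 10001, float32 spacing is finer than the 24-bit fixed step below a distance of 8192 and coarser beyond it.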

Originally posted by dorbie:
taking eye space z and passing it straight through

You can’t do that because it will break GL. This is because a user is free to apply ANY transform he likes to the projection matrix.

Yes, I know he said there is a major penalty, but it’s kind of weird. That would mean you would get worse performance just by switching a scene to an orthographic projection.

I think the method will work fine, just like it does for ortho projections. There should not be artifacts.

I think the w-buffer just stores z values (or -z values), and I have no idea whether it’s float or not. These ideas are not far from each other, but since we don’t have a w-buffer, we can do this instead.

The debate & questions over what representation & precision etc. evaporates if you take the eyespace z from the modelview and simply store it as a float, but apparently it’s just not practical.

Let’s just say that we take the window z values mapped from near to far (not remapped to 0.0 to 1.0) and store them as floats.

Why not do this instead, and store them as floats? 32 bpp floats or more.
I have never understood why someone would want to convert a float to an integer and store that instead of the float.

/edit/ damn quotes!

[This message has been edited by V-man (edited 07-06-2003).]