Full support for early-z would be nice

Originally posted by Humus:
We now have a document in the ATI SDK that explains all the peculiarities of HyperZ:
http://ati.amd.com/developer/SDK/AMD_SDK_Samples_May2007/Documentations/Depth_in-depth.pdf

It’s a nice document. I remember that when I was using stencil testing, I got blocky errors, as if HiZ or something was getting confused.
I figured there was some incompatibility between shaders and stencil testing, since using the fixed pipe worked fine.

Wasn’t it said that even glPolygonOffset turns off early-z and hi-Z?

The only thing that makes sense is forcing early-Z when the Z value is written in the fragment shader. The condition is that the written depth value has to be at the same distance from the camera as, or closer than, the linearly interpolated version.
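A minimal sketch of a shader that guarantees that bound, assuming GLSL and the default depth range where a smaller window-space z is closer to the camera (depthOffset and the whole computation are made-up placeholders; for which depth functions this particular bound keeps early-Z safe is a separate question):

/* Sketch only: a fragment shader that computes its own depth but clamps
 * the result so the written value is never farther from the camera than
 * the interpolated depth in gl_FragCoord.z. */
static const char *clamped_depth_fs =
    "uniform float depthOffset;   /* made-up example parameter */      \n"
    "void main() {                                                     \n"
    "    gl_FragColor = vec4(1.0);                                     \n"
    "    float myDepth = gl_FragCoord.z - depthOffset;                 \n"
    "    /* never write farther than the interpolated depth */         \n"
    "    gl_FragDepth = clamp(min(myDepth, gl_FragCoord.z), 0.0, 1.0); \n"
    "}                                                                 \n";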

Originally posted by Lindley:
Is there no way to make the hardware a bit more flexible, eg allow the programmer to decide when early-z should be attempted?
Theoretically it could be application controlled through some extension, which could help in a few cases where the application doesn’t care about potential errors that could appear as a result of using HiZ and/or EarlyZ where not appropriate. However, as the actual cases that are accelerated differ between hardware, it would likely create more problems than it solves. I wouldn’t trade deterministic behavior for slightly better performance.

Originally posted by Lindley:
It’s extremely simple to get a depth buffer into a state you want it to be in. It’d be awesome if you could do that any way you like (rendering, glDrawPixels, attaching a previously set up buffer to an FBO) and then request that the values in that buffer be used for early-Z on the next render pass.
EarlyZ (the per-pixel culling) can remain active for that stuff. HiZ would be much more problematic, as would depth compression. Taking a set of depth samples and converting them to a set of HiZ and depth compression parameters is a non-trivial task. Since it would seldom be used, it’s not something I’d suggest anyone attempt to put in hardware.
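For what it’s worth, the “attach a previously set up buffer to an FBO” part already works today with EXT_framebuffer_object; whether the driver keeps early-Z (let alone HiZ) acceleration for such a prefilled depth attachment is exactly the open question. A minimal sketch, assuming GLEW (or similar) for extension loading and that color_tex and depth_tex are placeholder textures created elsewhere, the latter with a GL_DEPTH_COMPONENT internal format:

#include <GL/glew.h>

/* Bind a color texture plus a previously filled depth texture to an FBO
 * and set state so the prefilled depth values are used for culling only. */
GLuint bind_prefilled_depth(GLuint color_tex, GLuint depth_tex)
{
    GLuint fbo;
    glGenFramebuffersEXT(1, &fbo);
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);
    glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                              GL_TEXTURE_2D, color_tex, 0);
    glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_DEPTH_ATTACHMENT_EXT,
                              GL_TEXTURE_2D, depth_tex, 0);
    if (glCheckFramebufferStatusEXT(GL_FRAMEBUFFER_EXT) !=
        GL_FRAMEBUFFER_COMPLETE_EXT)
        return 0;                       /* incomplete, bail out */

    /* No glClear on the depth attachment: keep the prefilled values and
     * disable depth writes so the pass only tests against them. */
    glEnable(GL_DEPTH_TEST);
    glDepthMask(GL_FALSE);
    return fbo;
}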

Originally posted by Lindley:
The only thing that would need to change, I think, is the definition of when a frame ends. Currently, as I understand it, early-z and a bunch of other things are reset on every glClear(GL_DEPTH_BUFFER_BIT). It’d be wonderful if there was a glEndFrame() or somesuch that reset all those things without actually clearing anything.
In what way would glEndFrame() differ from glClear()? glClear doesn’t actually write anything to the physical depth buffer. That’s what’s called “Fast Z-Clear” in HyperZ. It’s only clearing the auxiliary data that belongs to that buffer for HiZ and compression.

This anti-GPGPU bigotry is getting annoying.
Let’s take a few steps back here and look at the situation:

GPGPU tries to use hardware in a way it was not designed to be used. GPUs are, as the name implies, designed for doing graphics. They are not for general computation.

If you want to do something else with the GPU, ok, that’s fine. But don’t expect the GPUs or the graphics APIs to be redesigned to fit your needs. That’s what GPGPU is about, working around the difficulties that arise from “abusing” the hardware.

If you want a special piece of hardware designed for doing fast general computations, with no annoying graphics API that prevents you from doing what you want, get a cluster or supercomputer or whatever…

Sorry if this post sounds a bit harsh, but I think some GPGPU people need to be reminded what they are actually doing.

In any event, it looks like the EXT_depth_bounds_test extension may do what I want.

I guess what we really need is an expert system that can recommend available extensions based on desired functionality…

Originally posted by Korval:
This anti-GPGPU bigotry is getting annoying.
It’s not bigotry. It’s wanting my rendering API to not be co-opted into something else.

I’m sure “your” API will survive without you enlightening other people on how hardcore you are about the meaning of GL.

Originally posted by Korval:
I don’t care what you do with the hardware. I don’t mind things like CUDA or CTM. These are exactly as they should be: specialized APIs for specialized tasks.

This is better for both parties. Graphics developers don’t have to worry about their APIs getting bogged down by unnecessary baggage, and GP-GPU guys get an API that is designed specifically for their needs.

It’s win-win.
It is naive of you to think of GPGPU and rendering as completely unconnected worlds. As HW gets smarter, techniques used in GPGPU can become increasingly useful in new graphics algorithms. Think about the possibilities opened up by new extensions like geometry shaders with transform feedback - they already bypass a big chunk of the traditional graphics pipeline.

And it works the other way too: some parts of the graphics pipeline aren’t exposed in CUDA/CTM, yet they can be abused to perform useful computation in the GPGPU domain.

So, it’s not all black-or-white as you paint it. If, by chance, GL exposes a feature that doesn’t help you draw a spinning cube on the screen, that doesn’t immediately mean the “API is getting bogged down by unnecessary baggage”.
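To make the transform feedback point concrete, here is roughly what that looks like through EXT_transform_feedback (a sketch, not a complete program: prog is assumed to be an already compiled program object with a geometry shader attached, "capturedPos" is a made-up varying it writes, and capture_buf is a buffer object sized for the output):

#include <GL/glew.h>

/* Capture geometry-shader output into a buffer object and skip
 * rasterization entirely - no fragments, no framebuffer. */
void capture_geometry(GLuint prog, GLuint capture_buf, GLsizei vert_count)
{
    const char *varyings[] = { "capturedPos" };
    glTransformFeedbackVaryingsEXT(prog, 1, varyings,
                                   GL_INTERLEAVED_ATTRIBS_EXT);
    glLinkProgram(prog);                 /* varyings take effect on link */

    glUseProgram(prog);
    glBindBufferBaseEXT(GL_TRANSFORM_FEEDBACK_BUFFER_EXT, 0, capture_buf);

    glEnable(GL_RASTERIZER_DISCARD_EXT); /* bypass the rest of the pipeline */
    glBeginTransformFeedbackEXT(GL_TRIANGLES);
    glDrawArrays(GL_TRIANGLES, 0, vert_count);
    glEndTransformFeedbackEXT();
    glDisable(GL_RASTERIZER_DISCARD_EXT);
}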

Think about the possibilities opened up by new extensions like geometry shaders with transform feedback - they already bypass a big chunk of the traditional graphics pipeline.
Which is why transform feedback is exposed in OpenGL.

I don’t really see your argument here.

some parts of the graphics pipeline aren’t exposed in CUDA/CTM, yet they can be abused to perform useful computation in the GPGPU domain.
Then they clearly ought to be exposed in those APIs. What you’re saying is that there isn’t a good GP-GPU API.

So complain to them about that. Ask for more features in their GP-GPU APIs.

So, it’s not all black-or-white as you paint it.
Hardware is hardware. And a graphics API is a graphics API. If hardware can do something that a graphics API can’t make any real use of, then that hardware need not be exposed to the graphics API.

For example, if there are certain operations added to graphics cards to allow them to handle sound processing, OpenGL shouldn’t be extended to access those features unless they can actually have some specific use in graphics rendering.

As HW gets smarter, techniques used in GPGPU can become increasingly useful in new graphics algorithms.
No one here said we shouldn’t have a feature because it’s useful in GPGPU. It’s the other way round: if it’s only useful in GPGPU, it shouldn’t go into GL.

We are just saying that GPGPU is not a valid argument for any feature in GL.

I don’t think anyone in their right mind would object to an API that could do it all equally well.

The fear that some folks have is that you can’t do it all equally well, that devoting areas of the API to one thing will cloud areas of the other thing.

I think as time goes on the lines will continue to blur, but to what extent I don’t know. It’ll be interesting to see where it all goes…

Originally posted by Overmind:
If you want a special piece of hardware designed for doing fast general computations, with no annoying graphics API that prevents you from doing what you want, get a cluster or supercomputer or whatever…
That’s going to be several orders of magnitude more expensive though, which is kind of the reason why people are showing interest in using video cards for general computations, and why IHVs are also showing interest in this field. I think people have to accept that GPGPU is here to stay and it may to some degree affect our graphics APIs too. GPGPU will be relevant to game development as well, as people move certain types of computations to the GPU. Havok FX comes to mind, which does some effect physics on the GPU.

I think people have to accept that GPGPU is here to stay and it may to some degree affect our graphics APIs too.
Why?

More importantly, if graphics programmers don’t push back against GPGPU creep into our APIs, the graphics API ends up unnecessarily complicated and the GPGPU crowd still doesn’t get a functional API of its own.

Havok FX comes to mind, which does some effect physics on the GPU.
But those physics are specifically designed for rendering. It’s stuff that the CPU doesn’t need to know about. In effect, it’s an extension of rendering.

Slightly off-topic:

Is there a way to do depth-testing within a range? So instead of a fragment’s depth being tested with GL_LESS, GL_EQUAL, etc., it has a ‘range’ or ‘margin’ in the Z direction. A fragment that lies within this range would pass (or fail, depending on the depth func). This is useful for rendering bullet-hole decals on corners of walls without them sticking out, for shadow mapping and other projections, and perhaps also for CSG. Would it have drawbacks on things like early-z?
Or is this what GL_EXT_depth_bounds_test does? If so, that extension seems NV-only. Is it even part of GL 2.x?

On GPGPU:

GPGPU, as the name implies, is a GPU hack. If the GPGPU crowd wants the concept to mature, they’ll want their own independent API. Like Korval said, it is better for all parties.

Originally posted by remdul:
Slightly off-topic:

Is there a way to do depth-testing within a range? So instead of a fragment’s depth being tested with GL_LESS, GL_EQUAL, etc., it has a ‘range’ or ‘margin’ in the Z direction. A fragment that lies within this range would pass (or fail, depending on the depth func). This is useful for rendering bullet-hole decals on corners of walls without them sticking out, for shadow mapping and other projections, and perhaps also for CSG. Would it have drawbacks on things like early-z?
Or is this what GL_EXT_depth_bounds_test does? If so, that extension seems NV-only. Is it even part of GL 2.x?

That’s what it does. It also makes early-z behave a bit differently, although I haven’t experimented enough to say for certain how yet.

Since it’s an EXT extension rather than NV, it’s possible it might work with ATI cards. You’d have to check, though.
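For reference, usage is roughly this. A minimal sketch, assuming GLEW for the extension check; note that per the EXT spec the bounds are compared against the depth value already stored in the depth buffer at each pixel, not against the incoming fragment’s depth:

#include <GL/glew.h>

/* Enable the depth bounds test if the extension is present.
 * zmin/zmax are window-space depth values in [0,1]. */
int set_depth_bounds(double zmin, double zmax)
{
    if (!glewIsSupported("GL_EXT_depth_bounds_test"))
        return 0;                       /* not exposed by this driver */

    glEnable(GL_DEPTH_BOUNDS_TEST_EXT);
    glDepthBoundsEXT(zmin, zmax);       /* fragments whose stored depth at
                                           that pixel falls outside the
                                           range are discarded */
    return 1;
}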

On GPGPU:

GPGPU, as the name implies, is a GPU hack.

More or less. But then, email was originally an FTP hack.

If the GPGPU crowd wants the concept to mature, they’ll want their own independent API. Like Korval said, it is better for all parties.
And we’ve got a couple. Unfortunately, they aren’t compatible with older cards. Plus, there are some applications for which the 3Dish stuff inside OpenGL is a useful side feature.

The concept will mature. I’m certain of that. The prospect of a 10x speedup using hardware that’s already commonly included in machines is too tempting to corporations.

This GPGPU stuff will find its way into games too. After all, we’re really just talking about general purpose computations here, and anyone with one eye open can see that computations are only going to get more intensive.

Think about histograms for HDR, or FFT for water simulations…

Sometimes what’s good for the gander is good for the goose. The more people from different fields pushing the envelope and fleshing things out the better :wink:

On a related note, it’d be nice if there were some way to override the standard mapping between positions in the depth buffer and positions in the render target.

Something along the lines of glDepthCoord, akin to glTexCoord.

This would allow you to use full-window depth information when rendering to, for example, 1/9th of a texture. While the GPGPU applications are more obvious, this could apply to a graphics program if you were attempting to mimic the effects of glDepthBounds to show “slices”, except with the source being a prerendered texture instead of a 3D scene.
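There’s no glDepthCoord today, of course. The closest thing I can think of is a sketch like the one below (all names made up): bind the previously rendered full-window depth as a depth texture, sample it with whatever coordinate you like, and forward the result to gl_FragDepth. The obvious catch is that writing gl_FragDepth is exactly the kind of thing that tends to switch early-Z off, and the depth texture needs GL_TEXTURE_COMPARE_MODE set to GL_NONE so plain lookups return the raw depth value.

/* Hypothetical stand-in for a "glDepthCoord": remap stored depth to an
 * arbitrary position in the render target via a fragment shader. */
static const char *remap_depth_fs =
    "uniform sampler2D depthTex;   /* full-window depth from a prior pass */\n"
    "varying vec2 depthCoord;      /* our stand-in for glDepthCoord */      \n"
    "void main() {                                                          \n"
    "    gl_FragColor = vec4(0.0);                                          \n"
    "    gl_FragDepth = texture2D(depthTex, depthCoord).r;                  \n"
    "}                                                                      \n";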