Specific way to implement hidden object culling

A function that would return whether or not a polygon would have any pixel drawn would be useful in the following way (assuming current hardware allows this to be implemented quickly). Of course it doesn’t actually draw anything; it just performs depth checks to see if it WOULD have drawn anything.

Solid polygons would be (roughly) sorted front to back and drawn in that order. Objects (and regions of space) can have low-polygon bounding volumes, and using this z-check they can be trivially rejected if entirely covered. Transparent polygons are sorted back to front and drawn after the solid polygons have been handled.
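The trivial-reject step can be sketched in software terms. This is purely illustrative, not a real hardware API: the `visible_samples` name, the flat z-buffer layout, and the screen-space bounding rectangle are all made up for the example; the real check would run against the hardware depth buffer.

```c
#include <stddef.h>

/* Hypothetical sketch: count how many samples of an object's screen-space
 * bounding rectangle would pass the depth test against the current z-buffer.
 * A result of zero means the object is entirely covered and can be rejected. */
size_t visible_samples(const float *zbuf, int width,
                       int x0, int y0, int x1, int y1,
                       float nearest_z /* nearest depth of the bounding volume */)
{
    size_t passed = 0;
    for (int y = y0; y <= y1; ++y)
        for (int x = x0; x <= x1; ++x)
            if (nearest_z < zbuf[y * width + x])   /* GL_LESS-style depth test */
                ++passed;
    return passed;
}
```

Returning a count rather than a yes/no is deliberate: it matches what a z-pass counter in hardware would naturally expose.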

This would make determining which objects are completely obscured much more efficient and less complex. The biggest uses would be when a camera is near a large occluding object and when a camera is indoors (a special case of the previous example, I realize). By performing such checks hierarchically, it should be very quick as long as the z-check function can be made fast enough.
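As a rough sketch of the hierarchical idea, assuming a made-up bounding-volume tree and abstracting the z-check itself behind a callback (the node layout, field names, and the toy `demo_visible` predicate are all invented for illustration):

```c
#include <stddef.h>

/* Hypothetical node in a bounding-volume hierarchy. If a node's bounding
 * volume would draw no pixels, its whole subtree is rejected untested. */
typedef struct CullNode {
    int id;                      /* object id, or -1 for a pure grouping node */
    struct CullNode *children;   /* array of child nodes */
    int child_count;
} CullNode;

typedef int (*VisibleFn)(const CullNode *node); /* nonzero if any pixel would pass */

/* Appends ids of potentially visible leaf objects to 'out'; returns new count. */
size_t collect_visible(const CullNode *node, VisibleFn would_draw,
                       int *out, size_t n_out)
{
    if (!would_draw(node))
        return n_out;            /* entire subtree trivially rejected */
    if (node->child_count == 0) {
        out[n_out++] = node->id;
        return n_out;
    }
    for (int i = 0; i < node->child_count; ++i)
        n_out = collect_visible(&node->children[i], would_draw, out, n_out);
    return n_out;
}

/* Toy stand-in for the z-check: treats odd-numbered objects as occluded. */
static int demo_visible(const CullNode *n) { return n->id < 0 || (n->id % 2) == 0; }
```

The payoff is that one cheap test on a low-polygon bounding volume can skip an arbitrarily expensive subtree.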



My opinion is that this task should be performed by higher level APIs like SGI’s Optimizer.


I still think object based culling should be the application’s responsibility. What you suggest requires running through the entire transform and rasterization stages just to test Z. You could just as easily throw the geometry down the pipe as usual, if you can’t guarantee out of your own logic that it will be completely invisible. Otherwise you’d just double up on Z buffer traffic, you’d synchronize, you’d turn around bus transfer, all very bad things compared with having to do it on your own. Also note that transform capacity is relatively abundant on most of the current architectures.

If it weren’t for the fact that this stalls the pipeline, it would be a reasonable idea. However, basically, you have to make a test render (of a bounding box, for example) and check to see if it actually drew pixels. So, you have to wait until all the drawing up to that point is done, make the test render, and check to see if it drew. When you are checking to see if it drew, the GPU is idle.
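The stalling pattern described above looks roughly like this. Every name here is an illustrative stand-in, not a real API: a counter models the queued command stream, and the synchronous read models the drain that idles the GPU.

```c
typedef unsigned int GLuint;

/* Illustrative stand-ins for the driver: a counter models the command queue,
 * and reading a result back drains it. None of these names are a real API. */
static int queued_commands = 0;
static void draw_bounding_box(void) { ++queued_commands; }  /* queued like any draw */
static GLuint read_pixels_passed(void)
{
    queued_commands = 0;  /* synchronous read forces a full drain; GPU then idles */
    return 0;             /* pretend the test render was fully occluded */
}

/* The stalling pattern: make the test render, then immediately ask if it drew. */
static int object_is_visible(void)
{
    draw_bounding_box();              /* goes down the pipe behind all prior drawing */
    return read_pixels_passed() > 0;  /* CPU blocks here until everything finishes */
}
```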

If the PC had a reasonable architecture, the time getting data from the GPU would be more reasonable.

If you aren’t pushing the hardware anyway, this is a good idea. But, if you’re trying to milk every ounce of performance out, then this will only add hardware stalls. Now, these stalls may be better than rendering the geometry over again, but given how long it takes to do a read from the GPU, it may not be particularly fast.

This sort of thing should probably stay as an extension.

Actually, 90% of the reason I posted this suggestion was to see what the state of graphics hardware was on this issue. Obviously, if the hardware can’t provide the functionality quickly, it’s unusable. Mode changes and rendering everything twice are unacceptable. It needs to be a seamless operation that can be performed while normal drawing proceeds. If it were, it would be invaluable. Having to assure visibility quickly in software is a dumb way to do it (I worked on Asheron’s Call, where we had to implement such a system) when the hardware is so close to providing such a seamless, simple, and (potentially) efficient system.

Both the extensions I posted should be accelerated with recent NVidia HW/drivers (although I haven’t checked which ones, so YMMV). The NV extension should also be relatively efficient, as it avoids stalls as much as possible.

I’m no hardware guy, but incrementing a counter on ztest pass doesn’t sound that hard, and I’m sure someone mentioned even Voodoo1 could do this (via Glide performance counters).

I’m an EE, and I don’t believe it would be very difficult to implement in hardware either, unless the designer had made some very specialized trade-offs in I/O, making it slow to return the answer. I’ll have to time those functions to see if they are acceptable.

The problem is probably not the hardware layer itself, rather the software that feeds it. Dig this: even the utterly outdated Savage4 graphics chip uses a 4MB DMA buffer queue from driver to hardware.

All of this stuff has to be flushed before anything is in its final state, ready to be queried. One key point of high performance, which has been exploited to death, is having the graphics hardware and the CPU work in parallel. OpenGL intentionally specifies that any operation you request won’t finish any time soon, unless you call glFinish. This is pipelining at its best. And if you request information that is the result of some previously queued operations, the only correct way to provide it is to first flush the pipe, and that is where you lose performance.

I guess what would really be needed is the ability to specify a hierarchy of objects inside their bounding volumes, which the card itself would then use to provide trivial object culling. This would eliminate any back and forth, although that’s quite an addition!

I’ve never cared much for the whole programmable-GPU stuff (I like what it can do for polygons, but I hate not knowing how long it will take), but when implementations start understanding high-level concepts like “objects”, I’m done with OpenGL. That is a realm that OpenGL, as a low-level library, should never enter into.

Well, what we’re running up against is the problem with slapping a generic interface on very specific hardware concepts. If hardware is specialized to the point that it’s more and more difficult to interact with and control the low-level details, then it kind of pushes one towards throwing more and more functionality into the custom hardware itself. Several times in computer graphics there has been a cycle of adding custom hardware which then becomes so powerful that it becomes the general hardware! Whether or not this will occur again in the mainstream arena remains to be seen. Anyhow, this discussion is clearly way outside the scope of OpenGL and should probably be terminated. I’ll write a couple of hardware guys I know to suggest possible improvements along the lines we discussed if the standard functions are too slow. Thanks all.

Korval, OpenGL and DirectX have had objects for years, and neither appeared to have suffered much from it.

The hardware doesn’t have an understanding of objects any more than a general purpose CPU has. The whole object thing is just a clean way of giving stuff an interface.

zeckensack, please read the NV_OCCLUSION_QUERY spec. The interface is similar to fences, and doesn’t require a pipeline flush before query.

EDIT: Confused implementation and interface… need caffeine.

[This message has been edited by Maj (edited 03-13-2002).]

Originally posted by Maj:
zeckensack, please read the NV_OCCLUSION_QUERY spec. The interface is similar to fences, and doesn’t require a pipeline flush before query.

Did that, and it looks very nice. That one essentially negates the sync problem by having the application do some of the involved work. For special cases where you can do some useful stuff in between starting and finishing the query, this sounds great.
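For reference, the fence-like pattern looks roughly like the sketch below. The function names and enum values follow the NV_occlusion_query spec, but the bodies here are self-contained stand-ins so the sketch runs without a GL context (the real entry points come from the driver), and `demo_work` is a made-up placeholder for useful CPU work done between starting and finishing the query.

```c
typedef unsigned int GLuint;
typedef unsigned int GLenum;

#define GL_PIXEL_COUNT_NV           0x8866
#define GL_PIXEL_COUNT_AVAILABLE_NV 0x8867

/* Stand-ins for the real NV_occlusion_query entry points: the result becomes
 * "available" only after a few polls, modelling the GPU catching up. */
static int ticks_until_ready = 3;
static void glBeginOcclusionQueryNV(GLuint id) { (void)id; ticks_until_ready = 3; }
static void glEndOcclusionQueryNV(void) {}
static void glGetOcclusionQueryuivNV(GLuint id, GLenum pname, GLuint *params)
{
    (void)id;
    if (pname == GL_PIXEL_COUNT_AVAILABLE_NV)
        *params = (ticks_until_ready-- <= 0);
    else
        *params = 42;  /* pretend 42 samples passed the depth test */
}

/* The fence-like pattern: issue the query, keep the CPU busy with other
 * work, and only read the count once the driver reports it available. */
static GLuint query_pixels_passed(GLuint query, void (*useful_work)(void))
{
    GLuint available = 0, count = 0;
    glBeginOcclusionQueryNV(query);
    /* ... render the bounding volume here ... */
    glEndOcclusionQueryNV();
    while (!available) {
        useful_work();  /* no stall: overlap CPU work with the GPU */
        glGetOcclusionQueryuivNV(query, GL_PIXEL_COUNT_AVAILABLE_NV, &available);
    }
    glGetOcclusionQueryuivNV(query, GL_PIXEL_COUNT_NV, &count);
    return count;
}

/* Placeholder for whatever the application can usefully do in the meantime. */
static void demo_work(void) {}
```

The key design point is exactly the one made above: the application, not the driver, decides what to do while the result is in flight, so no forced flush is needed.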