portal visibility using hw acceleration?

lemo · May 6, 2003, 3:26pm

I’m experimenting with portals in my toy 3D engine, and I’ve tried a couple of solutions for portal visibility determination. One thing I’m considering is letting the z-buffer do the work: I start rendering the world from the current sector, then I want to ‘render’ that sector’s portal polygons in a special way. I don’t want to touch anything in the framebuffer or z-buffer; I just want to know whether any fragment of a portal polygon would have made it to the framebuffer had it been rendered as a normal polygon (i.e. whether any part of it, however small, would pass the z-buffer test).

Is there any way I can do this with opengl? (render a primitive and get back a bool: is it visible or not)

Thanks!

zeckensack · May 6, 2003, 5:48pm

Yes, there is a way: GL_NV_occlusion_query.
But my guess is that it would be prohibitively slow for that task.

What you IMO really need is a PVS (potentially visible set), that’s an (offline) preprocessing step where you determine which other ‘sectors’ might be visible for the current one.

[ASCII floor plan: sectors 1, 2 and 3 side by side in a row, with sector 4 below sector 1]
cameras inside 1 might look into 2, 3 and 4
cameras inside 2 might look into 1 and 3
cameras inside 3 might look into 1 and 2
cameras inside 4 might look into 1

Non-trivial task, but as long as you don’t allow your sector boundaries to break into pieces during runtime, it’s a preprocessing step.
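
The visibility lists above could be stored as a simple per-sector lookup. A minimal C++ sketch, assuming sectors are numbered 1 to 4 as in the example; the names `Pvs`, `buildExamplePvs` and `potentiallyVisible` are made up for illustration, and the table is hard-coded here rather than computed by a real preprocessing step:

```cpp
#include <cassert>
#include <map>
#include <set>

// Potentially visible set: for each sector, the other sectors a camera
// inside it might see. Built offline, only read at runtime.
using Pvs = std::map<int, std::set<int>>;

// Hard-codes the four-sector example from the post (illustrative only).
Pvs buildExamplePvs() {
    return {
        {1, {2, 3, 4}},
        {2, {1, 3}},
        {3, {1, 2}},
        {4, {1}},
    };
}

// True if `target` has to be considered when the camera is in `cameraSector`.
bool potentiallyVisible(const Pvs& pvs, int cameraSector, int target) {
    if (cameraSector == target) return true;  // the camera's own sector is always drawn
    auto it = pvs.find(cameraSector);
    assert(it != pvs.end() && "unknown sector");
    return it->second.count(target) > 0;
}
```

At runtime the renderer would only visit sectors for which `potentiallyVisible` returns true, and would typically still frustum-cull them.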

lemo · May 6, 2003, 6:45pm

Thanks! I would like to avoid any preprocessing, like building a PVS (for many reasons, including that the geometry of ‘sectors’ can change)

Why do you think that NV_occlusion_query would be slow? As far as I can tell, it should be pretty easy for the video card to know whether some part of a primitive would pass the z-buffer/stencil tests. It shouldn’t be slower than actually rendering the primitive (which is a very fast operation).

And how many cards implement this extension? I was thinking of something using gl select (ie. glRenderMode(GL_SELECT)), but I haven’t tried it yet.

Coriolis · May 6, 2003, 7:31pm

PVS is inferior to runtime portal visibility determination in terms of the amount of geometry culled. PVS answers the question for every single point in a large volume when you only care about a single point inside that volume, so sometimes it can be a huge overestimate.

If you don’t have that many portals, it should be fast enough to do the portals in software. It may even be faster, since you don’t have to read back data from the graphics card, and (more importantly I expect) you don’t have to synchronize with the graphics card. The things I’ve read about occlusion query say to only do it right before drawing an expensive model due to high poly count or complex shaders.

The only concrete advantage I can think of to doing it on the video card with occlusion queries is you can have dynamic objects block a portal at no speed hit. Unless this case is extremely common, I’d really recommend that you just do it in software.

lemo · May 6, 2003, 7:43pm

Right now I’m computing the portal visibility in my own code, but I have the feeling that it can be improved. Playing with my implementation, I came to the conclusion that doing a ‘perfect’ portal visibility determination in software is too expensive.

I got better results with an over-estimation, but I think there is room for improvement. While looking for alternatives, I thought that querying the card for visibility would give ‘perfect’ visibility and good performance, but I probably don’t fully understand the importance of CPU/GPU parallelism.

MickeyMouse · May 6, 2003, 10:03pm

If you feel culling with portals is too slow, you may combine them with BSPs for each sector…
…or simply don’t cull faces within the sector behind the currently processed portal, but cull only the portals leading out of that sector…

lemo · May 6, 2003, 10:18pm

Thanks for the suggestions. Actually it’s pretty close to what I’m doing right now, but I’m somewhat limited by the fact that I’m trying to use the data from an old game (Duke3D). You can check out my work at www.dukenukem3d.net. Normally Duke3D maps are pretty easy to render, but I found some maps with a large number of small sectors visible at the same time.

Right now I end up with fast but not 100% accurate portal visibility code (not 100% accurate in that it may result in overdraw), and after some profiling it seems that I can get decent performance by optimizing the sector drawing itself (I’m thinking of grouping the primitives with the same texture together, as binding the texture seems to be the bottleneck right now).
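
Grouping by texture can be sketched as a sort over the per-sector draw list. This is only an illustration, assuming opaque geometry drawn with depth testing (so draw order does not affect the image); `DrawItem`, `groupByTexture` and `countTextureBinds` are hypothetical names, and no OpenGL calls are made:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// One batch of primitives sharing a texture. `texture` stands in for an
// OpenGL texture object name; the other fields are illustrative.
struct DrawItem {
    unsigned texture;
    int firstVertex;
    int vertexCount;
};

// Reorders the items so that equal textures are adjacent. stable_sort keeps
// the original relative order among items that share a texture.
void groupByTexture(std::vector<DrawItem>& items) {
    std::stable_sort(items.begin(), items.end(),
                     [](const DrawItem& a, const DrawItem& b) {
                         return a.texture < b.texture;
                     });
}

// Counts the binds a renderer would issue if it rebinds only when the
// texture differs from the previous item's texture.
int countTextureBinds(const std::vector<DrawItem>& items) {
    int binds = 0;
    for (std::size_t i = 0; i < items.size(); ++i) {
        assert(items[i].vertexCount >= 0);
        if (i == 0 || items[i].texture != items[i - 1].texture) ++binds;
    }
    return binds;
}
```

For a list that alternates between two textures, the bind count drops from one per item to two after `groupByTexture`.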

Anyway, thanks for all the comments. It’s the first time I’m posting here, and I’m getting the feeling that it’s a great place to get advice from some OpenGL experts.

Tom_Nuydens · May 6, 2003, 11:20pm

The problem with NV_occlusion_query is that it can have an undefined amount of latency. Due to deep pipelining in the hardware, it’s quite possible that the result of a query does not become available in the course of the current frame. You can force the query to finish, of course, but that will do your performance more harm than good. A better idea is to try and work around the latency somehow.

I would suggest doing an occlusion query for each portal in frame N, and getting the results in frame N+x. This introduces x frames of latency, which means that the sector behind a portal will appear x frames too late. One or two frames should suffice for x.
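
The bookkeeping for the "query in frame N, read in frame N+x" idea might look like the sketch below. It is a simulation with no OpenGL calls: the class name `DelayedPortalVisibility` is invented, results are passed in as plain booleans, and treating portals without a result as visible is an assumption of this sketch. A real renderer would store query object names instead and read their pixel counts once an entry reaches the front of the history.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <map>

// Keeps the query results of the last `delay` frames and answers
// visibility questions from the oldest stored frame.
class DelayedPortalVisibility {
public:
    explicit DelayedPortalVisibility(int delayFrames) : delay_(delayFrames) {
        assert(delayFrames >= 1);
    }

    // Call at the end of each frame with that frame's results:
    // portal id -> "at least one fragment passed the depth test".
    void endFrame(const std::map<int, bool>& results) {
        history_.push_back(results);
        if (history_.size() > static_cast<std::size_t>(delay_)) history_.pop_front();
    }

    // Uses the results recorded `delay` frames ago. Until enough history
    // exists, or for a portal with no result, conservatively reports visible.
    bool isVisible(int portal) const {
        if (history_.size() < static_cast<std::size_t>(delay_)) return true;
        const std::map<int, bool>& old = history_.front();
        auto it = old.find(portal);
        return it == old.end() ? true : it->second;
    }

private:
    int delay_;
    std::deque<std::map<int, bool>> history_;
};
```

With a delay of 2, a portal that is hidden in frame 0 and becomes visible in frame 1 is still reported hidden in frame 2 and visible from frame 3 on, which is the late appearance described above.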

To reduce the popping this will cause, you can try to draw your portals slightly oversized during the occlusion query. This may not work for Duke3D-style environments unless you give them a thickness (e.g. draw a box instead of a flat polygon). The oversized portal will become visible a little earlier than the actual portal, which will hopefully prevent a pop.

That said, though, be aware that occlusion queries aren’t free – you have to render something, which means you burn fillrate. A software implementation may still give better results. If the small sectors are a problem, have you considered merging them into larger ones?

– Tom

[This message has been edited by Tom Nuydens (edited 05-07-2003).]

147-2 · May 7, 2003, 12:32pm

I have read a lot here about occlusion and portal visibility. Well, I’m working on a portal engine myself, and I’ll tell you how I establish portal vis.

When rendering starts, you have an initial view frustum. When looking thru a portal, you view the next sector thru the frustum defined by the planes that pass through your camera and each edge (line segment) of the portal. If the portal is “cut” by one of the planes of your original frustum, throw that plane into the new frustum too. I did this with a “variable plane frustum stack” in which frusta may have any number of planes. I use a stack of these frusta because it seems appropriate considering how the rendering is done. I don’t think occlusion is even in the equation, because sectors should be “convex volumes” and therefore should have no faces occluding other faces.
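
Building one plane per portal edge could be sketched as follows. This is an illustration under stated assumptions, not the poster's code: the portal is a convex polygon, the normals are oriented toward the portal's centroid so the vertex winding does not matter, and the names `Vec3`, `Plane`, `Frustum`, `portalFrustum` and `contains` are invented. Adding the planes of the parent frustum that cut the portal is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// A point p is on the inner side of a plane when dot(n, p) + d >= 0.
struct Plane { Vec3 n; double d; };
using Frustum = std::vector<Plane>;  // any number of planes

// One plane through the eye and each edge of a convex portal polygon.
Frustum portalFrustum(Vec3 eye, const std::vector<Vec3>& portal) {
    assert(portal.size() >= 3);
    Vec3 centroid{0, 0, 0};
    for (const Vec3& v : portal) {
        centroid.x += v.x; centroid.y += v.y; centroid.z += v.z;
    }
    const double inv = 1.0 / static_cast<double>(portal.size());
    centroid = {centroid.x * inv, centroid.y * inv, centroid.z * inv};

    Frustum frustum;
    for (std::size_t i = 0; i < portal.size(); ++i) {
        const Vec3 a = sub(portal[i], eye);
        const Vec3 b = sub(portal[(i + 1) % portal.size()], eye);
        Vec3 n = cross(a, b);
        // Flip the normal if it points away from the inside of the portal.
        if (dot(n, sub(centroid, eye)) < 0) n = {-n.x, -n.y, -n.z};
        frustum.push_back({n, -dot(n, eye)});  // plane passes through the eye
    }
    return frustum;
}

// True if the point lies on the inner side of every plane.
bool contains(const Frustum& frustum, Vec3 p) {
    for (const Plane& plane : frustum)
        if (dot(plane.n, p) + plane.d < 0) return false;
    return true;
}
```

A stack of such frusta, one pushed per portal traversed, would match the “variable plane frustum stack” described above.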

Humus · May 7, 2003, 12:48pm

Portal rendering with occlusion queries isn’t as bad as you might think. It does indeed work pretty well. I’ve tried it in practice with quite good results.

zeckensack · May 7, 2003, 1:07pm

It depends.
Your demos seem to be pretty heavy, fragment-operation-wise. That might help hide the cost of the occlusion queries.

lemo · May 7, 2003, 2:45pm

I plan to try a few variations (including some occlusion queries) and I will put the results on the website, soon I hope

Humus, what kind of occlusion queries did you use (NV_occlusion_query?)

Zeckensack, yes, I’m trying hard to optimize the portal visibility because I want to be able to spend time per fragment on some fancy stuff (shadows, lighting, bump mapping, multitextures, …)

Coriolis · May 7, 2003, 3:05pm

If you’re dealing with auto-generated portals instead of manually placed portals, pretty much everything I said is backwards. PVS will tend to be better than dynamic portal calculation, simply due to the number of computations you avoid and because the sectors tend to be pretty small. Manual portalling with software clipping is better than PVS with automatic portalling, but if you are stuck with automatic portalling, PVS with frustum culling (possibly with limited recursion) is probably a better way to go.

Humus · May 7, 2003, 5:32pm

Originally posted by zeckensack:
It depends.
Your demos seem to be pretty heavy, fragment-operation-wise. That might help hide the cost of the occlusion queries.

Well, I used some per-pixel lighting with shadow mapping, so quite heavy. I also combined it with AABBs and frustum culling, and did the occlusion culling last, in case the other operations hadn’t culled it already.
Either way, the portal is a fairly small polygon, especially compared to the full rooms you may be able to cull. The latency can be a problem, but there are plenty of tasks you can do while waiting for a query to complete. For my demo, though, I didn’t care and used GL_HP_occlusion_test, since it’s slightly simpler to use, and saw pretty significant speedups.
Not sure if I will ever complete that demo though. It’s in terrible shape right now and I haven’t touched it in a long time.

dorbie · May 8, 2003, 12:17pm

The key requirement is to keep the pipeline busy while performing the test. It’s not going to take too long to get back once it’s performed.

You’d better make sure that each draw stage can do useful work while waiting and you obviously don’t occlude against the full screen each time.

The test itself is also not free. It is very likely that you’ll waste more time testing (ignoring latency for now) than you save, except perhaps if you have some very geometry-intensive stuff to draw. This occlusion query was originally targeted at geometry-limited scenes with complex occlusion scenarios where you can afford to waste some pixel fill time testing occlusion (think million+ polygon CAD models of engine block assemblies), with the hope that you win on the geometry front (and maybe on fill a bit too for some scenarios) because you don’t have to transform the culled geometry; instead you just transform the bounds and test the z fragments.

It ain’t gonna help you with duke nukem databases because the fill overhead for the test is not amortized by the savings in geometry you send to the pipe, and coarse Z already helps with redundant fill issues.

This is the key to using this extension successfully: you have to expect that you can amortize the additional fill overhead and possible pipeline stalls against savings in geometry and state changes from not drawing hidden stuff. Yea, you might save fill too in a contrived case, but it’s very unlikely today thanks to coarse Z.

imported_jwatte · May 8, 2003, 6:19pm

The test actually might take as many fragment ops as it would take to fill that same area with a flat color – then you want to go back and re-draw that area with actual geometry.

I second the recommendation for calculating portal visibility dynamically at runtime. The only pre-calculation necessary is the location and shape of the portals, plus the classification of each portal’s two sides into cell memberships.

Using screen-space rectangles for portal-through-portal visibility is probably good enough, too – especially since applying scissoring at that point is trivial.
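
The rectangle approach can be sketched as repeated intersection of screen-space bounds. A minimal illustration, assuming each portal has already been projected to an integer bounding rectangle in window coordinates; `Rect`, `isEmpty`, `intersect` and `clipToPortal` are hypothetical names:

```cpp
#include <algorithm>
#include <cassert>

// Half-open screen rectangle: covers [x0, x1) horizontally, [y0, y1) vertically.
struct Rect { int x0, y0, x1, y1; };

bool isEmpty(const Rect& r) { return r.x0 >= r.x1 || r.y0 >= r.y1; }

Rect intersect(const Rect& a, const Rect& b) {
    return {std::max(a.x0, b.x0), std::max(a.y0, b.y0),
            std::min(a.x1, b.x1), std::min(a.y1, b.y1)};
}

// Narrows `window` to the part seen through the portal and returns true,
// or leaves `window` untouched and returns false if the portal's bounds
// fall entirely outside the current window.
bool clipToPortal(Rect& window, const Rect& portalBounds) {
    assert(!isEmpty(window));
    const Rect clipped = intersect(window, portalBounds);
    if (isEmpty(clipped)) return false;
    window = clipped;
    return true;
}
```

Starting from the full viewport, each portal traversed narrows the window; the final rectangle's origin and size (`x0`, `y0`, `x1 - x0`, `y1 - y0`) are the kind of values a scissor rectangle takes. The test is conservative: a portal whose bounding rectangle overlaps the window may still be hidden.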

lemo · May 12, 2003, 9:47pm

I put together some basic code to perform portal visibility using GL_NV_occlusion_query, and based on this simple experiment I drew the following conclusions:

it works as expected: perfect portal visibility
in complex worlds, with many portals visible at the same time, it is slower than the software equivalent (although it was much better at eliminating obstructed portals)

Overall it seems to confirm what most of you stated above. But my implementation wasn’t perfect by any means: I did a primitive portal-by-portal test for visibility, which doesn’t take advantage of GPU/CPU parallelism.

PS: the tests were made on a GeForce4 Ti 4200, and I don’t have any rigorous numbers on the performance, I used a very simple timing using ::GetTickCount().
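
For somewhat steadier numbers than a single tick-count difference, the frame time can be averaged over many frames with a monotonic clock. A portable sketch using `std::chrono`; the function name `averageFrameMilliseconds` and the idea of passing the frame as a callable are assumptions for illustration, not part of the test described above:

```cpp
#include <cassert>
#include <chrono>

// Runs `frame` the given number of times and returns the average
// wall-clock time per call in milliseconds.
template <typename Frame>
double averageFrameMilliseconds(Frame&& frame, int frames) {
    assert(frames > 0);
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < frames; ++i) frame();
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / frames;
}
```

When timing OpenGL work this way, the frame callable would need to end with a buffer swap or `glFinish()` so that queued GPU work is included in the measurement.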