How fast is occlusion culling supposed to be?

Hi, I’m just attempting to use the NV_occlusion_query extension, and I’m finding it difficult to get an acceptable boost out of it. Basically my loop consists of drawing the land, then testing objects with more than 100 polygons against it. Then I read back the pixel counts from the various tests and cull objects as necessary. Afterwards I do not clear the depth buffer, since the occlusion tests do not write depth, and I render the non-occluded objects. Now even in a scene where I am rendering 20000 fewer polygons (including the polys for the occlusion tests), occlusion testing is still slower than just rendering the 20000 (bumpmapped, diffuse and specular) polygons. The scene consists of about 100k polygons.

Is saving 20000 polygons not enough for occlusion culling to pay off, or is occlusion culling only useful in scenes where millions of polygons can be avoided?

Does anybody have any success stories from practical applications with 100-150k polys per scene - occluded polys?

My guess is that you’re not doing anything while the occlusion query results are pending. In that case, you’ve effectively got a glFinish at that point in your code, so your CPU sits idle. Letting the video card finish drawing while your CPU thinks about game logic or the next scene or whatever is faster; even though more work is done overall, it gets done in parallel, so it takes less time.

The trick to getting value out of occlusion query is structuring your code to hide the latency of the query. It’s not as bad as glFinish() to request the results of a query (more like glFinishFence()), but even glFinishFence() is expensive if you “finish” it too soon after issuing it.

Thanks -
Cass
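
A toy timing model may make the latency point concrete. This is only a sketch: the function name and the millisecond figures are made up for illustration, and it assumes CPU and GPU work can overlap perfectly, which a real driver will not quite give you.

```python
def frame_time_ms(cpu_ms, gpu_ms, overlapped):
    """Very rough model of one frame.

    overlapped=False: the query result is requested right after issuing it,
    so the CPU waits for the GPU and the two times add up.
    overlapped=True: the CPU does its other work while the GPU draws,
    so the frame takes as long as the slower of the two.
    """
    if overlapped:
        return max(cpu_ms, gpu_ms)
    return cpu_ms + gpu_ms


# Hypothetical numbers: 10 ms of CPU work, 30 ms of GPU work per frame.
stalled = frame_time_ms(cpu_ms=10, gpu_ms=30, overlapped=False)
hidden = frame_time_ms(cpu_ms=10, gpu_ms=30, overlapped=True)
print(stalled, hidden)  # prints: 40 30
```

In this model the total amount of work is the same in both cases; only the waiting changes.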

I have a feeling stalling is not the issue in my case. I’ve moved some workload in between the occlusion test and the occlusion query, which has brought the framerate closer, but occlusion testing is still slower than just rendering. An example is a scene that runs at 25fps with 90k polys, which goes to 23fps with 50k polys. My occlusion test rendering has color and depth writes disabled, and the vertex program that is bound has a minimal number of lines.

Is a 30k polygon saving something I should see an FPS improvement from?

1: Are you using bounding boxes, or the regular geometry in your occlusion tests?

2: How much overlap are you getting between rendering occlusion queries and regular geometry?

3: How many total queries are you running? It may be better for you to render an occlusion query for several close-together objects.

Originally posted by Korval:
[b]1: Are you using bounding boxes, or the regular geometry in your occlusion tests?

2: How much overlap are you getting between rendering occlusion queries and regular geometry?

3: How many total queries are you running? It may be better for you to render an occlusion query for several close-together objects.[/b]

1: I am using 12-polygon geospheres for the occlusion testing

2: I’m not sure what you mean, please clarify.

3: There are about 200 tests being done. So you think it would work faster if I somehow grouped the tests by pixel space?

1: I am using 12-polygon geospheres for the occlusion testing

I’d suggest a bounding box, especially if the objects in question are rather box-like to begin with. Boxes, especially non-axis-aligned ones, tend to bound tighter, thus giving fewer false positives.

2: I’m not sure what you mean, please clarify.

The idea is that, while you’re waiting for an occlusion query to finish, you should be spending that time sending more geometry, or, at the very least, doing some CPU processing. Are you doing that, or are you calling “GetOcclusionQueryivNV” immediately afterwards? How much work are you doing between queries?

3: There are about 200 tests being done. So you think it would work faster if I somehow grouped the tests by pixel space?

Absolutely. I would suggest having far fewer tests (maybe 25-50 tops). Just group close-together objects into one query. You can even keep using the same bounding volumes; just draw several of them. The more queries you make, the more useless drawing you do.
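
One possible way to do the grouping, sketched under assumptions: bucket objects by a coarse grid over their positions and issue one query per bucket. The tuple layout, the cell size and the function name are hypothetical, not taken from anyone's engine, and a grid is just one simple choice of "close together".

```python
from collections import defaultdict


def group_for_queries(objects, cell_size):
    """Bucket objects by the grid cell that contains their center.

    objects: list of (name, x, z) tuples (hypothetical layout).
    cell_size: width of one square grid cell in world units.
    Returns a dict mapping (cell_x, cell_z) to the list of names in that cell.
    """
    groups = defaultdict(list)
    for name, x, z in objects:
        cell = (int(x // cell_size), int(z // cell_size))
        groups[cell].append(name)
    return dict(groups)


# Made-up scene: two trees near each other, a hut and a rock further away.
objects = [
    ("tree_a", 1.0, 2.0),
    ("tree_b", 3.5, 1.0),
    ("hut", 12.0, 2.0),
    ("rock", -0.5, 0.5),
]
groups = group_for_queries(objects, cell_size=10.0)
print(len(groups), groups[(0, 0)])  # prints: 3 ['tree_a', 'tree_b']
```

Each bucket would then get a single query wrapped around all of its bounding volumes. If the pixel count for the bucket comes back zero, every object in it can be skipped; otherwise draw them all. A larger cell size means fewer queries but coarser culling.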

[This message has been edited by Korval (edited 06-23-2003).]

The spheres fit the objects we have quite well. All objects are drawn in a batch, and then some time is spent doing unrelated CPU calculations, then the query is made.

So you think 200 tests to save 40000 polygons is too many tests? It’s interesting that the extension is that slow, so that the cost per test is greater than rendering 200 triangles. I guess I will have to try some sort of grouping approach.

All objects are drawn in a batch, and then some time is spent doing unrelated CPU calculations, then the query is made.

So, you send all of your occlusions at once? I’m not sure if I like that. Certainly, it doesn’t help in the case of one object occluding another.

So you think 200 tests to save 40000 polygons is too many tests? It’s interesting that the extension is that slow, so that the cost per test is greater than rendering 200 triangles. I guess I will have to try some sort of grouping approach.

It’s not that the extension, in and of itself, is that slow. It’s the state changes. For each query, you’re making an additional state change (swapping in the query shaders, etc).

Admittedly, since you’re rendering all your queries simultaneously, you’re mitigating some of this problem.

Not incidentally, you’re using a non-trivial amount of fillrate. Given that each query renders 1/40th of the screen at 800x600, that’s around 2.28MPixels per frame. At 23fps, that’s around 52.6MPixels per second. Hardly trivial. If your application is at all fillrate bound, this could be an issue (are you using shadow volumes?).
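
The fillrate estimate can be reproduced with a few lines of arithmetic. This is a sketch: the 1/40th-of-the-screen coverage is the post's guess rather than a measurement, and the function and parameter names are made up. With exactly 200 queries the same assumptions give 2.4 MPixels per frame; the 2.28 MPixels figure matches about 190 queries.

```python
def query_fill_pixels(width, height, screen_divisor, num_queries):
    """Pixels rasterized per frame by the occlusion queries alone.

    Assumes every query covers 1/screen_divisor of the screen.
    """
    pixels_per_query = (width * height) // screen_divisor
    return pixels_per_query * num_queries


per_frame_200 = query_fill_pixels(800, 600, 40, 200)  # 2,400,000
per_frame_190 = query_fill_pixels(800, 600, 40, 190)  # 2,280,000

# Per-second cost at 23 frames per second.
per_second_190 = per_frame_190 * 23  # 52,440,000
print(per_frame_200, per_frame_190, per_second_190)
```

Either way the result is on the order of 50 MPixels per second spent on queries, which is the point being made.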

Since the extension allows it, how many pixels, per frame, are you drawing in your queries?

An example is a scene that runs at 25fps with 90k polys

Whoa, I just did the math on that. You’re only getting 2.25M Tris/sec? What hardware are you using? More importantly, what extensions are you rendering with? Are you using VBO/VAO/VAR? A GeForce2 should be able to get twice that with good use of VAR. If not, before you get too deep into occlusion culling, make sure you’re sending your verts in an optimal fashion.
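
For reference, the math here is just polygons per frame times frames per second. A small sketch (the function name is made up) using the two figures quoted earlier in the thread:

```python
def tris_per_second(polys_per_frame, fps):
    """Triangle throughput implied by a poly count and a frame rate."""
    return polys_per_frame * fps


without_culling = tris_per_second(90_000, 25)  # 2,250,000
with_culling = tris_per_second(50_000, 23)     # 1,150,000
print(without_culling, with_culling)
```

Throughput drops by almost half when culling is on, which suggests the frame time is being spent somewhere other than pushing triangles.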

So for another example scene:
occlusion on: 20fps, 57k polys. GL_PIXEL_COUNT_NV returns 0 pixels, 143 objects culled by occlusion (none of the occlusion tests passed).

occlusion off: 25fps, 95k polys

Basically the only objects remaining are the ones too low-poly to be worth testing for occlusion, plus the land and water.
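
Converting those frame rates to frame times shows what the occlusion pass costs. The first part is plain arithmetic. The second part is a rough estimate that assumes frame time scales linearly with polygon count (a purely geometry-bound app), which is an assumption for illustration and may well not hold.

```python
def frame_ms(fps):
    """Frame time in milliseconds for a given frame rate."""
    return 1000.0 / fps


on_ms = frame_ms(20)    # occlusion on, 57k polys
off_ms = frame_ms(25)   # occlusion off, 95k polys
extra_ms = on_ms - off_ms

# ASSUMPTION: frame time is proportional to polygon count.
expected_on_ms = off_ms * 57_000 / 95_000
estimated_overhead_ms = on_ms - expected_on_ms
print(on_ms, off_ms, extra_ms, estimated_overhead_ms)
```

So the frame with occlusion on is 10 ms slower despite drawing 38k fewer polygons, and under the linear assumption the tests themselves would be costing roughly 26 ms per frame.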

So, you send all of your occlusions at once? I’m not sure if I like that. Certainly, it doesn’t help in the case of one object occluding another.

Isn’t that the way it’s meant to be done? Send all your tests, then query after the CPU has done something else while the tests are finishing. I’m only using the land to occlude the objects; object-to-object occlusion is disabled.

Not incidentally, you’re using a non-trivial amount of fillrate. Given that each query renders 1/40th of the screen at 800x600, that’s around 2.28MPixels per frame. At 23fps, that’s around 52.6MPixels per second. Hardly trivial. If your application is at all fillrate bound, this could be an issue (are you using shadow volumes?).

All these tests are done at 640*480, and I am pretty sure that I am geometry bound (if I decrease the polys or VP instructions, the fps goes up). Also, the tests shouldn’t be taking up much more fill rate than drawing the actual geometry, especially not with color/depth writes and texturing disabled, compared to normal geometry using all 8 general combiner stages and 4 texture units.

Whoa, I just did the math on that. You’re only getting 2.25M Tris/sec? What hardware are you using? More importantly, what extensions are you rendering with? Are you using VBO/VAO/VAR? A GeForce2 should be able to get twice that with good use of VAR. If not, before you get too deep into occlusion culling, make sure you’re sending your verts in an optimal fashion.

Yeah, I’d love to get our PPS up, but our vertex programs are extremely long; as a result we are geometry bound, even while using strips and VAR. No doubt the app has some optimizations to do on the CPU side of things as well.

[This message has been edited by JelloFish (edited 06-24-2003).]
