Occlusion query

Tips needed.
How many occlusion queries can I have running at the same time? Like if I want to do 100 or maybe 1000 and I check for results 5 frames later, is that OK in general?

If the OpenGL spec doesn’t impose a limit, then there is no fixed limit; you can have as many in flight as you like.

I’ve had over 1000 on my GFFX (results returned later in the same frame).

There’s some minor cost for issuing queries, so you should avoid doing them for every triangle in a model, for example, but one per object should be fine.

The main thing to be careful about is accidental synchronization issues, but if you’re checking the results several frames later, that should not be a problem.

The important thing for you to do though is just test that you can get the perf you want for the way that you intend to use occlusion queries. That’s the only way to be sure anyway.
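For what it’s worth, one common way to organize hundreds of queries whose results are read back a few frames later is a small ring buffer of query sets. This is only a sketch; the ring scheme, the names, and the DELAY of 3 are my own choices, not something from this thread:

```c
/* Sketch: ring-buffer query IDs over DELAY frames so the result you read
   back was issued DELAY-1 frames ago and is almost certainly ready,
   avoiding a CPU/GPU sync. */
#include <assert.h>

#define DELAY 3  /* frames between issuing a query and reading it back */

/* Slot whose queries are issued this frame. */
static int issue_slot(unsigned frame) { return (int)(frame % DELAY); }

/* Oldest slot: its queries were issued DELAY-1 frames ago, so their
   results should be available without a stall. */
static int readback_slot(unsigned frame) { return (int)((frame + 1) % DELAY); }

/* Per frame, roughly (GL calls shown as comments, since they need a
 * live context):
 *   s = issue_slot(frame);
 *   for each object i:
 *       glBeginQueryARB(GL_SAMPLES_PASSED_ARB, queries[s][i]);
 *       ...draw bounding volume of object i...
 *       glEndQueryARB(GL_SAMPLES_PASSED_ARB);
 *   o = readback_slot(frame);
 *   for each object i:
 *       glGetQueryObjectuivARB(queries[o][i], GL_QUERY_RESULT_ARB, &samples);
 *       visible[i] = (samples > 0);
 */
```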

Each bounding geometry would be a few triangles (10?).
Is it better to use a low-poly bounding shape instead of the actual model with hundreds or even 3,000 polys?

Is there a performance difference between normal rendering and an OQ?
Which part of the pipeline limits OQs?
Yes, I know an OQ doesn’t render anything to a buffer, but does the end of the fragment pipeline just increment a counter instead of writing to a buffer?

The OQ counting machinery runs in parallel all the time for free. What costs (a little) is reporting back the results.

There are two common ways to use OQs. One is to disable writes and render some simple, conservative bounding volume. The second is more useful for multi-pass rendering. Each time you render an object, issue an OQ. On subsequent passes of that object, skip rendering it if it was fully occluded.
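The first pattern looks roughly like this in ARB_occlusion_query terms. This is a sketch, not anyone’s actual code: to keep it compilable without a GL context, the GL entry points here are local stand-in stubs with a fake sample count, and the bounding-box draw is a placeholder.

```c
/* Sketch of the "disable writes, draw a conservative bound" pattern.
   The GL functions below are stubs with the real names, tokens, and
   signatures; a real program would link against OpenGL instead. */
#include <assert.h>

typedef unsigned int  GLuint;
typedef unsigned int  GLenum;
typedef unsigned char GLboolean;
#define GL_SAMPLES_PASSED_ARB 0x8914
#define GL_QUERY_RESULT_ARB   0x8866
#define GL_FALSE 0
#define GL_TRUE  1

/* --- stubs: the fake "GPU" reports a fixed sample count --- */
static unsigned stub_samples = 0;
static void glColorMask(GLboolean r, GLboolean g, GLboolean b, GLboolean a)
{ (void)r; (void)g; (void)b; (void)a; }
static void glDepthMask(GLboolean d) { (void)d; }
static void glBeginQueryARB(GLenum target, GLuint id) { (void)target; (void)id; }
static void glEndQueryARB(GLenum target) { (void)target; }
static void glGetQueryObjectuivARB(GLuint id, GLenum pname, unsigned *out)
{ (void)id; (void)pname; *out = stub_samples; }
static void draw_bounding_box(void) { stub_samples = 42; /* pretend 42 samples passed */ }

/* Returns nonzero if the object's conservative bound produced any samples. */
static int bounding_box_visible(GLuint query)
{
    unsigned samples = 0;
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);  /* no color writes */
    glDepthMask(GL_FALSE);                                /* no depth writes */
    glBeginQueryARB(GL_SAMPLES_PASSED_ARB, query);
    draw_bounding_box();                                  /* cheap conservative bound */
    glEndQueryARB(GL_SAMPLES_PASSED_ARB);
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    glDepthMask(GL_TRUE);
    /* NOTE: fetching the result immediately can stall; in practice you would
       check it later, as discussed earlier in the thread. */
    glGetQueryObjectuivARB(query, GL_QUERY_RESULT_ARB, &samples);
    return samples > 0;
}
```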

This second model becomes a lot more interesting with the notion of predicated rendering, which is an upcoming feature.


By predicated rendering… Do you mean glBeginConditionalRenderNVX() with query index of issued occlusion query? This thing works great even now )))

Hi Jackis,

Yes, that’s what I meant, but since we never officially exposed that extension, that’s not the exact one I was talking about. We’re working on a multi-vendor one now.

I’d be interested in hearing more about your current use of the conditional render. If you have the inclination, please contact me at cass@nvidia.com.

Thanks -

Actually, we use it just as you’ve mentioned above. There are two main things in our engine that use this feature.

The first is terrain. Our lighting scheme obliges us to have a texture-blended result before the correct lighting evaluation (because we have some additive-multiplicative specular tune-ups to make it more realistic; in fact this is totally NPR, but since the results are good and friendly-looking, why not use some hacks, like your team did in the Nalu demo? )) So the first pass is a simple texture blend (4–8 textures per block on average), sorted by camera distance so that self-occluded blocks get culled by the queries. It also issues a query for each block. Then we grab the screen texture and say: this is our blended terrain surface. After that we do a lighting pass, with parallax bump mapping and so on, where each block’s render call is framed by BeginConditional() and EndConditional(). From some viewpoints it gives up to a 40% gain with no CPU stalls, which is really important for our tasks, because in some cases we are totally CPU bound.

The second is the procedural objects layer (grass, trees with shadows, rocks with shadows, houses with shadows, surface decals/imprints and so on). The first CPU pass decides which cells from the scrolled net are to be drawn this frame. It also issues a query for every cell by rendering its AABB. This CPU job is done right after terrain rendering. After that the whole scene graph is rendered (all the cities, objects and so on), and only then are the procedural objects actually rendered, so with very high probability the result is already known. Some months ago I used a simple GetQueryObject(), and that was quite good because of no stalls, but with conditional render this render call was simplified.
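A toy model of this two-pass framing, for readers following along. The NVX conditional render extension was never publicly specified, so the prototypes below are assumptions, and the bodies are stubs that just simulate “skip the draw when the query saw zero samples”:

```c
/* Pass 1 issues one occlusion query per block; pass 2 wraps each
   expensive draw in Begin/EndConditional keyed to that query.
   Everything here is a self-contained simulation, not real GL. */
#include <assert.h>

typedef unsigned int GLuint;

#define NUM_BLOCKS 4
/* Pretend pass 1 produced these per-query sample counts. */
static unsigned samples_for_query[NUM_BLOCKS] = { 5, 0, 12, 0 };
static unsigned current_query;
static int expensive_draws;  /* counts pass-2 draws that actually ran */

/* Assumed prototypes; the real NVX entry points were never published. */
static void glBeginConditionalRenderNVX(GLuint q) { current_query = q; }
static void glEndConditionalRenderNVX(void) {}

static void draw_block_lighting(void)
{
    /* Simulates the GPU discarding the draw when the query saw 0 samples. */
    if (samples_for_query[current_query] > 0)
        expensive_draws++;
}

static int render_pass2(void)
{
    expensive_draws = 0;
    for (GLuint q = 0; q < NUM_BLOCKS; ++q) {
        glBeginConditionalRenderNVX(q);
        draw_block_lighting();
        glEndConditionalRenderNVX();
    }
    return expensive_draws;
}
```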

Thank you.

I’ve posted about this before, but occlusion query still seems broken on the GFFX (it works, but incurs a performance hit).
I’m using it the standard way:

lay down depth pass with occlusion test
… do stuff
read back results
draw the lit meshes (i.e. everything you see onscreen) only if they pass the occlusion test

I’ll post a demo on my site soon which exhibits the behaviour; with GF6+ there’s no slowdown.
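One way to keep that readback from stalling, sketched here with an assumed helper: poll GL_QUERY_RESULT_AVAILABLE_ARB first, and conservatively draw the mesh when the result isn’t ready yet instead of blocking.

```c
#include <assert.h>

/* Conservative decision: if the query result isn't available yet, draw the
   mesh anyway rather than stalling; only skip when we know samples == 0. */
static int should_draw(unsigned result_available, unsigned samples_passed)
{
    if (!result_available)
        return 1;               /* not ready: don't stall, just draw */
    return samples_passed > 0;  /* ready: draw only if something was visible */
}

/* Per mesh, after the depth pass (GL calls need a context, so shown as
 * comments):
 *   glGetQueryObjectuivARB(q, GL_QUERY_RESULT_AVAILABLE_ARB, &avail);
 *   if (avail) glGetQueryObjectuivARB(q, GL_QUERY_RESULT_ARB, &samples);
 *   if (should_draw(avail, samples)) { ...draw lit mesh... }
 */
```

The trade-off is that a not-yet-ready query costs you one redundant draw rather than a pipeline stall.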

This second model becomes a lot more interesting with the notion of predicated rendering, which is an upcoming feature.
Will this be a feature of new hardware or is this something we can look forward to on current hardware?

Korval, as far as I know, the extension has been there since the GFFX :slight_smile: , it was just never exposed to the public. According to Jackis, it seems to work fine on current hardware.

It’s supported in hardware on GeForce 6 series and beyond.

what’s a “scrolled net”?


It’s my own name for a data structure which maintains a rectangle-to-rectangle index remapping for a fixed-size rectangle, without invalidating every cell when moving along it.
Maybe there is some other name for it, I don’t know.


One more question.
Will this extension’s interface stay as-is, or will it be changed in some manner? For example, I think the ability to do a fuzzier test would be greatly appreciated. I mean, I’d like to discard not only those batches whose queries returned a zero sample count, but also those below any user-defined threshold,
like glConditionalRenderThresholdNVX(GLuint samples_to_discard).
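Until something like that exists, a threshold can be emulated with the plain readback path mentioned earlier in the thread: fetch the sample count with glGetQueryObjectuivARB and make the decision yourself. A trivial sketch (the function is mine, not part of any extension):

```c
#include <assert.h>

/* Emulated fuzzy test: draw the batch only if the query's sample count
   reaches a user-defined threshold. The built-in conditional render
   only tests against zero. */
static int passes_threshold(unsigned samples_passed, unsigned threshold)
{
    return samples_passed >= threshold;
}
```

The downside versus true conditional render is that this requires the result on the CPU, so it needs the usual care about when you read it back.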

Thank you.

That could be nice for LOD stuff, say, while the camera’s moving quickly. Though I would wonder if such an addition would impose performance penalties.

By the way, the DX10 documentation describes conditional rendering in terms of predicate “hints,” in the sense that there’s no guarantee the predicate will be honored if the occlusion mesh hasn’t finished drawing, which keeps the predicates from actually stalling the pipeline. I don’t know if the ARB’s solution will avoid this issue, but it seems not at all unreasonable to me, perhaps even unavoidable. So while the occlusion mechanism is automated to a greater extent, there still seems to be a need to “space” your query issue and use to some extent. Still, this seems awfully slick to me.

There’s been discussion of thresholded queries, but I’m honestly not sure where that stands at this point.

Over time I’d expect more generality in how pipeline queries can be used to make decisions or set pipeline state.


One more thing I’ve noticed just now: every pair of bare calls

glBeginQueryARB(GL_SAMPLES_PASSED_ARB, queryIndex);
glEndQueryARB(GL_SAMPLES_PASSED_ARB);

without asking for any object’s result takes about 0.001 ms of CPU time on my 6800GT with 91.31 drivers. I mean, 1000 occlusion query tests take 1 millisecond of CPU, and that’s quite a lot of time, I think.
I expected it to be more efficient; I thought it was just a trigger on/off action. I’m a bit surprised.

[EDIT]: that was a test with AABB drawing inside BeginQuery()/EndQuery() (drawn with glBegin()/glEnd() just for the test). Without any drawing, the pair takes about 0.25 microseconds to execute, which is more like a simple lightweight API call (about 700 ticks).
Right now I’m going to switch to a single VBO and draw all the AABBs without glBegin()/glEnd(), and I expect some of the overhead will be gone.
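The batching idea sketched out, with assumed names: all AABB vertices live in one shared VBO bound once per frame, and each box becomes one glDrawArrays call between Begin/EndQuery instead of an immediate-mode glBegin()/glEnd() block.

```c
#include <assert.h>

#define VERTS_PER_BOX 24  /* 6 faces x 4 vertices, drawn as GL_QUADS */

/* First vertex of box i inside the shared VBO. */
static int box_first_vertex(int i) { return i * VERTS_PER_BOX; }

/* Per frame (GL calls as comments; they need a live context):
 *   glBindBufferARB(GL_ARRAY_BUFFER_ARB, aabb_vbo);   // bind once
 *   glVertexPointer(3, GL_FLOAT, 0, 0);
 *   for each box i:
 *       glBeginQueryARB(GL_SAMPLES_PASSED_ARB, queries[i]);
 *       glDrawArrays(GL_QUADS, box_first_vertex(i), VERTS_PER_BOX);
 *       glEndQueryARB(GL_SAMPLES_PASSED_ARB);
 */
```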