occlusion query not 'almost free'

zed, I’ve only just started looking into occlusion queries, so I read the spec for this extension. Although I haven’t finished reading it, I’m almost sure you aren’t using them efficiently.

The spec contains many interesting details that suggest you may have missed some steps. I’m still not sure, since I only learned about them yesterday. But you never know…

Hope this helps.

:smiley:

Originally posted by knackered:
:smiley:
And I’m still not sure, because he showed only a few lines of code. It’s only one part, not the whole.

For example, we don’t know how he creates the queries, how he handles the bounding volumes, or how he manages the CPU side…

Originally posted by zed:

on my computer (5900XT) occlusion on runs at ~200fps, occlusion off at ~300fps; the only difference is the inclusion of those 2 statements,

Well, I read almost all the thread. :slight_smile:

If I read the spec correctly, that is not enough. There should be more than two statements of difference.

Here is a code example from the spec:

 
        GLuint queries[N];
        GLuint sampleCount;
        GLint available;
        GLint bitsSupported;

        // check to make sure functionality is supported
        // (glGetQueryivARB takes the query target first and writes a GLint)
        glGetQueryivARB(GL_SAMPLES_PASSED_ARB, GL_QUERY_COUNTER_BITS_ARB,
                        &bitsSupported);
        if (bitsSupported == 0) {
            // render scene without using occlusion queries
        }

        glGenQueriesARB(N, queries);
        ...
        // before this point, render major occluders
        glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
        glDepthMask(GL_FALSE);
        // also disable texturing and any fancy shaders
        for (i = 0; i < N; i++) {
            glBeginQueryARB(GL_SAMPLES_PASSED_ARB, queries[i]);
            // render bounding box for object i
            glEndQueryARB(GL_SAMPLES_PASSED_ARB);
        }

        glFlush();

        // Do other work until "most" of the queries are back, to avoid
        // wasting time spinning
        i = N*3/4; // instead of N-1, to prevent the GPU from going idle
        do {
            DoSomeStuff();
            glGetQueryObjectivARB(queries[i],
                                  GL_QUERY_RESULT_AVAILABLE_ARB,
                                  &available);
        } while (!available);

        glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
        glDepthMask(GL_TRUE);
        // reenable other state, such as texturing
        for (i = 0; i < N; i++) {
            glGetQueryObjectuivARB(queries[i], GL_QUERY_RESULT_ARB,
                                   &sampleCount);
            if (sampleCount > 0) {
                // render object i
            }
        }
 

So either I’m totally wrong about occlusion queries (which could well be true, since I’m just beginning with them), or you are.

As I understand this sample code, using occlusion queries takes more than 2 additional function calls.

Can someone say whether I’m wrong or not?

i was using them more like in the second/third example in the spec (since i’m drawing objects with multiple passes)

i.e. this code (with about 200 draw commands)

// configure shader 0
for (i = 0; i < N; i++) {
    glBeginQueryARB(GL_SAMPLES_PASSED_ARB, queries[i]);
    // render object i
    glEndQueryARB(GL_SAMPLES_PASSED_ARB);
}

true, for occlusion to be useful u also have to query the result.
but for me just adding the above 2 lines drops the framerate from 300fps to 200fps, thus somehow just adding those 2 lines throws the rendering onto a slow path.

in the captain courgette game i posted u can turn occlusion on/off. this uses occlusion with the query results as well (thus the occlusion path uses fewer draw commands), though even here it is still slower with occlusion on than with it off

Why do you render all faces? Instead, after the first z-fill pass, you can render only the bounding boxes of the octree nodes (or objects) inside occlusion queries.
This should remove a lot of octree nodes or objects from further passes.

yooyo

Yes, as yooyo and I understand it, you may have forgotten to ‘register’ the occlusion tests with bounding boxes.
If you draw all your objects in full to ‘register’ your occlusion queries, the GPU has a lot of work to do just to test whether samples pass or fail.

Also, try out the single pass.

you can render only boundary boxes of octree (or objects) using occlusion query
i can see many cases where this can be slower, esp. in my case where 95+% of the actual geometry is NOT occluded; using BBs will give an even worse result.

anyways yooyo, jide, you’re missing the main point of why i posted this topic, which is:
enabling occlusion query as specified in the spec (if u are already drawing the object) should come at little or no cost. BUT in my app it comes at a big cost (either from a bug in the drivers or from falling off the fast path)

// pieces from the spec follow

In multipass rendering situations, however, occlusion queries can
almost always save fill rate, because wrapping an object with an
occlusion query is generally cheap. See “Usage Examples” for an
illustration.

i’m using occlusion query like the following part in the spec

// First rendering pass plus almost-free visibility checks

Actually, does anyone know anything about the status of NV_conditional_render (or something like that)? I thought occlusion queries could become one of the easiest and most efficient optimizations with this extension, but there seems to be no plan to implement it yet.

Jan.

Hmmm…

I still have mixed feelings towards hardware occlusion-queries.

I think an extremely efficient software rasterizer (one that renders only bounding-volume geometry to a software 32-bit z-buffer) combined with Hierarchical Occlusion Maps (HOMs) implemented with a 1-frame dependency is a better choice.

Since render calls are asynchronous, you can render your scene while rendering the next frame’s HOM. This will lead to almost free occlusion culling, provided you have an efficient technique for checking projected bounding volumes against your HOM in screen space.

Best regards,
Roquqkie
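
The screen-space check mentioned above could look roughly like the C sketch below. It is a deliberately simplified stand-in: it tests a projected bounding rectangle against a flat software depth buffer rather than a real hierarchical occlusion map, the function name `rect_may_be_visible` is hypothetical, and it assumes smaller depth values are closer to the camera.

```c
#include <assert.h>
#include <stddef.h>

/* Conservative test: returns 1 if any pixel covered by the rectangle
 * [x0,x1] x [y0,y1] (inclusive) stores a depth farther away than
 * nearest_z, i.e. the bounding volume might be visible. Returns 0 if
 * the rectangle is fully hidden or entirely off screen.
 * Assumption: smaller depth = closer to the camera. */
int rect_may_be_visible(const float *depth, int width, int height,
                        int x0, int y0, int x1, int y1, float nearest_z)
{
    assert(depth != NULL && width > 0 && height > 0);

    /* Clamp the rectangle to the buffer. */
    if (x0 < 0) x0 = 0;
    if (y0 < 0) y0 = 0;
    if (x1 >= width) x1 = width - 1;
    if (y1 >= height) y1 = height - 1;
    if (x0 > x1 || y0 > y1)
        return 0; /* nothing of the rectangle lies on screen */

    for (int y = y0; y <= y1; y++) {
        for (int x = x0; x <= x1; x++) {
            if (nearest_z < depth[(size_t)y * (size_t)width + (size_t)x])
                return 1; /* stored depth here is behind the volume */
        }
    }
    return 0;
}
```

A hierarchical version would typically try a coarse level first and only descend where the coarse test is inconclusive; the per-pixel loop here is just the base case.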

Originally posted by Jan:
Actually, does anyone know something about the status of NV_conditional_render (or so)? I thought occlusion queries could become one of the easiest and most efficient optimizations with this extension, but there seems to be no plan to implement it, yet.

Jan.

I’m interested in this too. It’s been almost a year since we were told more information was coming “soon”.

http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=3;t=011740

Originally posted by zed:
enabling occlusion query as specified in the spec (if u are already drawing the object) should come at little or no cost. BUT in my app it comes at a big cost (either from a bug in the drivers or from falling off the fast path)

How complex is your scene (you seem to stress that it isn’t very)? The relative impact of enabling a feature like this would actually be inversely proportional to the scene complexity.

To take an analogy from assembly programming, if you add 5 cycles to a 50 cycle loop, you’re increasing the length of the loop by 10%. If you add 1 cycle to a 3 cycle loop, you’re increasing the length by 33%.

IOW, since your scene is simple the overhead is relatively large. When your scene is more complex (ie, you’re getting sub-50 fps) you probably won’t notice the overhead (might drop by 1 or 2 fps).

…Chambers
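
To put numbers on the loop analogy above, here is a minimal C sketch. The helper names (`relative_overhead_percent`, `fps_with_fixed_overhead`) are made up for illustration, and the second helper assumes the query cost is a fixed number of milliseconds per frame, which nobody in this thread has actually established.

```c
#include <assert.h>

/* Percentage by which a loop (or frame) grows when `added` units of
 * work are appended to `base` units. Hypothetical helper. */
double relative_overhead_percent(double base, double added)
{
    assert(base > 0.0);
    return 100.0 * added / base;
}

/* Frame rate after adding a fixed per-frame cost in milliseconds.
 * Assumption: the cost does not scale with scene complexity. */
double fps_with_fixed_overhead(double fps, double overhead_ms)
{
    assert(fps > 0.0 && overhead_ms >= 0.0);
    double frame_ms = 1000.0 / fps;
    return 1000.0 / (frame_ms + overhead_ms);
}
```

`relative_overhead_percent(50, 5)` gives 10 and `relative_overhead_percent(3, 1)` gives about 33.3, matching the figures in the analogy. Under the fixed-cost assumption, the same overhead takes a proportionally smaller bite out of a 50 fps scene than out of a 300 fps one.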

Originally posted by Adrian:
I’m interested in this too. It’s been almost a year since we were told more information was coming “soon”.
http://www.opengl.org/discussion_boards/cgi_directory/ultimatebb.cgi?ubb=get_topic;f=3;t=011740

I was also interested, and I still am.
Maybe it’s time to get an update from the vendor? After all, NV will be working on proto-GF8 right now.
By the way, how many people are really using occlusion queries? I’m quite curious.

Hi all,

So I’ve been pointed to this thread. I’m part of NVIDIA’s developer technology group. And I figured I’d comment a little.

Occlusion queries aren’t completely free, but shouldn’t have that much overhead. zed, I got your sample application from Cass and ran some tests. I was able to set up a system similar to yours (AMD, FX5900U, 71.83, which is the latest beta driver; not sure where you got the 71.90… :stuck_out_tongue: ) and was able to repro a 1.16ms increase in frame time between “using occlusion” and not. I also ran the sample on a GF6800U and did not see any difference between the two modes. It is possible that something wonky is going on here, and I’ve filed an internal bug. We’ll see if we can get to the bottom of this. Not sure how long it will take though. Just thought I’d post and mention that, for now, it seems not to occur on the GF6xxx series.

Also, any more info that you have on this issue would of course be helpful. :smiley:

Thanks!

-B

“repro a 1.16ms increase” doesn’t sound like much, but that’s out of 4-5ms total i take it, so it is a sizable chunk

here’s the test app if anyone wants to try it (spacebar changes between occlusion on/off)
http://motueka.homeip.net/kea_occ.rar
it would perhaps be helpful to know if it affects other gffx cards, eg gf5200, gf5700, or is it just restricted to the 5900.
this has become a bit more important for me now, since i’ve made the camera angle more horizontal (and thus meshes have a higher chance of being occluded), also i’ve started using occlusion for light coronas as well
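
Converting the frame rates quoted in this thread into frame times makes the 1.16ms figure easier to judge. A minimal C sketch follows; the function names are hypothetical, and the 300fps / 200fps inputs mentioned below are the numbers reported earlier for the 5900XT, not new measurements.

```c
#include <assert.h>

/* Frame time in milliseconds for a given frame rate. */
double frame_time_ms(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

/* Extra milliseconds per frame implied by a drop from fps_off to fps_on. */
double frame_time_increase_ms(double fps_off, double fps_on)
{
    return frame_time_ms(fps_on) - frame_time_ms(fps_off);
}

/* Share of the total frame time (in percent) taken by a given overhead. */
double overhead_share_percent(double overhead_ms, double total_ms)
{
    assert(total_ms > 0.0);
    return 100.0 * overhead_ms / total_ms;
}
```

`frame_time_increase_ms(300, 200)` comes to roughly 1.67ms (3.33ms up to 5ms), and `overhead_share_percent(1.16, 5.0)` is roughly 23%, which is why a seemingly small number is a sizable chunk of a 4-5ms frame.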

I get the error
‘The procedure entry point SDL_HasRDTSC could not be located in the dynamic link library SDL.dll’

you must be using a different version of sdl (sorry i should have included it)
anyways sdl.dll and sdl_mixer.dll are included here
http://motueka.homeip.net/CC_ver2.rar ~600kb

Thanks, but now I get the error
“textures/terrain/heights_terrain.tga texture not found”

The textures folder was in the first download but not the second. Regardless, that particular texture isn’t in either download.

sorry i should have explained better
http://motueka.homeip.net/kea_occ.rar is the version to run
this -> http://motueka.homeip.net/CC_ver2.rar is only so u can get sdl.dll and sdl_mixer.dll

but to make it even more straightforward ive uploaded a new version of http://motueka.homeip.net/kea_occ.rar with sdl.dll + sdl_mixer.dll included

The heights_terrain.tga texture is not in any of the downloads.