occlusion query not 'almost free'

I decided to try out occlusion queries at the same time as I lay down the depth/ambient pass. From what I read this should come at very little cost, but I lose about 10% of my framerate (this is with just the following code, i.e. no querying of results) with roughly 200-300 draw calls.

glBeginQuery(GL_SAMPLES_PASSED, queriesID);
// draw geometry
glEndQuery(GL_SAMPLES_PASSED);

Why such a big loss?

  • OpenGL shaders are always enabled, i.e. I'm not doing a pass with the color mask off. Could this have something to do with it? (Though the example in the spec that I follow does it my way.)
  • queriesID is a number that I've chosen, but I've also tried with glGenQueries(…)

Since, from my testing with glGetQueryObjectivARB, 95-100% of the stuff I send is visible, it's really not worth implementing occlusion query testing.

What have other people experienced?

From the spec:

// First rendering pass plus almost-free visibility checks
glDisable(GL_BLEND);
glDepthFunc(GL_LESS);
glDepthMask(GL_TRUE);
// configure shader 0
for (i = 0; i < N; i++) {
    glBeginQueryARB(GL_SAMPLES_PASSED_ARB, queries[i]);
    // render object i
    glEndQueryARB(GL_SAMPLES_PASSED_ARB);
}

The first question that leaps to mind for me is: how are you drawing your primitives? I would expect (I haven't used occlusion queries) that if the data is being pushed across the AGP bus, you would see a performance hit. But that's just a guess…

I've never found occlusion queries to be entirely free; however, IME the performance loss when querying happens approx. a whole frame after the queried geometry is more in the range of a few percent at worst (for a few dozen different queries in a frame, for scenes that are more CPU/geometry/AGP limited than fillrate/fragment limited).

Do you reuse the same queriesID for several glBeginQuery calls in the same frame?
Did you try cycling several IDs over several frames?
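One way to act on the "cycle several IDs over several frames" suggestion is to keep a small ring of query-ID sets: issue this frame's queries with one set, and only read back the set that was issued a couple of frames earlier, so fetching GL_QUERY_RESULT rarely has to wait on the GPU. Below is a minimal sketch of just the index bookkeeping, assuming a hypothetical FRAMES_IN_FLIGHT of 3 and MAX_OBJECTS of 4; the GL calls themselves are left out and plain integers stand in for the IDs that glGenQueries() would return.

```c
#include <assert.h>

/* Hypothetical sizes, chosen only for illustration. */
#define FRAMES_IN_FLIGHT 3 /* frames' worth of query IDs kept alive */
#define MAX_OBJECTS      4 /* objects queried per frame */

/* In a real renderer each entry would hold an ID returned by glGenQueries();
   plain integers are used here so the bookkeeping runs without a GL context. */
static unsigned query_ids[FRAMES_IN_FLIGHT][MAX_OBJECTS];

/* Fill the table with distinct placeholder IDs (stand-in for glGenQueries). */
static void init_query_ids(void) {
    unsigned next = 1;
    for (int f = 0; f < FRAMES_IN_FLIGHT; f++)
        for (int i = 0; i < MAX_OBJECTS; i++)
            query_ids[f][i] = next++;
}

/* Row whose IDs are used for glBeginQuery/glEndQuery on this frame. */
static int issue_slot(unsigned long frame) {
    return (int)(frame % FRAMES_IN_FLIGHT);
}

/* Row issued FRAMES_IN_FLIGHT - 1 frames ago: the oldest queries still in
   flight, so the ones most likely to have a result available. Only
   meaningful once that many frames have actually been issued. */
static int read_slot(unsigned long frame) {
    assert(frame + 1 >= FRAMES_IN_FLIGHT);
    return (int)((frame + 1) % FRAMES_IN_FLIGHT);
}

/* Query ID to use for object i on the given frame. */
static unsigned query_id_for(unsigned long frame, int i) {
    return query_ids[issue_slot(frame)][i];
}
```

On frame f a renderer would wrap object i in glBeginQuery/glEndQuery using query_id_for(f, i), then check GL_QUERY_RESULT_AVAILABLE (via glGetQueryObjectiv) for the IDs in row read_slot(f) before fetching GL_QUERY_RESULT. Note that this only avoids readback stalls; it would not by itself explain a slowdown that appears even when no results are read back.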

The first question that leaps to mind for me is: how are you drawing your primitives? I would expect (I haven't used occlusion queries) that if the data is being pushed across the AGP bus, you would see a performance hit.
I'm already drawing the data (as it's an ambient/depth pass), so I'm not actually drawing any extra data.

I've never found occlusion queries to be entirely free; however, IME the performance loss when querying happens approx. a whole frame after the queried geometry is more in the range of a few percent at worst.
The thing is, I'm not actually asking for any results back; the 10% loss comes just from adding the two lines
glBeginQuery(GL_SAMPLES_PASSED, queriesID);
glEndQuery(GL_SAMPLES_PASSED);
Actually, now that I think about it, it's even worse than 10% (as I was ignoring the other stuff in the render frame, e.g. the light pass, alpha passes etc.); it's more likely closer to a 20% loss.
Perhaps it's something to do with me doing more than a handful of occlusion queries; then again, 200-300 isn't really that many, is it?
Also, perhaps it has something to do with me having an occlusion query buffer of 1500-5000 IDs (i.e. each mesh gets an ID at the start). Then again, I did try with glGenerate/deleteIds…(…) to generate free IDs per frame, but it still runs slow.

Hi Zed, what hardware and driver are you using?

GeForce FX 5900, driver 70.90

How much is your performance loss in milliseconds?
FPS is such a bad way to measure performance since it's nonlinear…
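To see why FPS is nonlinear, convert each frame rate to a frame time (1000 / fps milliseconds). The same 10 fps drop costs very different amounts of time depending on where it starts: going from 300 to 290 fps adds only about 0.11 ms per frame, while going from 60 to 50 fps adds about 3.3 ms. A small sketch of the conversion (the function names are illustrative, not from any library):

```c
/* Convert a frame rate to a per-frame time in milliseconds. */
static double fps_to_ms(double fps) {
    return 1000.0 / fps;
}

/* Extra milliseconds per frame when the frame rate drops from fps_before
   to fps_after (positive means each frame got slower). */
static double frame_cost_ms(double fps_before, double fps_after) {
    return fps_to_ms(fps_after) - fps_to_ms(fps_before);
}
```

Applied to the ~300 fps (occlusion off) vs ~200 fps (occlusion on) figures reported later in this thread, frame_cost_ms(300.0, 200.0) comes to roughly 1.67 ms per frame (about 3.33 ms vs 5.0 ms).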

A couple of hours of testing and I'm still not much further.
I wrote a simple GLUT app (but there's no difference between occlusion and non-occlusion); I'll add shaders later to see if that makes a difference.

But anyway, in my app the problem also occurs in immediate mode.
Of all the materials I tested, the only one that didn't show a difference between occlusion and non-occlusion was standard-pipeline wireframe tris.
Even standard-pipeline filled tris seemed to be 2-3% slower.

With some of the materials (esp. those using GLSL) there's a huge difference: a ~50% slowdown.

I'm downloading new drivers (71.81) to see if they make a difference; if not, I'll play around with the GLUT app a bit more.

How much is your performance loss in milliseconds?
FPS is such a bad way to measure performance since it's nonlinear…
Aye, milliseconds are just as bad as FPS.

The new drivers are no better, and a further couple of hours of testing got me no further; I couldn't replicate the problem in a GLUT app.
I believe it's a driver problem that only shows up in the particular way I'm doing the rendering.
I'm willing to send you the app, Cass.

Please do send me a small app that illustrates the problem, if you don’t mind, Zed.

Originally posted by cass:
Please do send me a small app that illustrates the problem, if you don’t mind, Zed.
I've sent it to your ru address (not the NVIDIA one).

Sorry, like I mentioned, I couldn't replicate the problem in a GLUT app, so I sent a simplified version of my program.
The space key toggles occlusion on/off (1-6 changes the screen resolution).
Like I said, it's not too important for me at the moment, as less than 10% of meshes are being occluded, but this might change dramatically in the future, as I'm looking into doing occlusion of light volumes.

Thanks for looking into it, Cass.

Cass or Zed, would you be so kind as to let us know how this ends? I'm considering the same approach at this time…

Thanks,
SeskaPeel.

Sure - I’ll follow up with what the issue is/was for Zed’s performance drop.

Zed, do you have any idea why you can’t repro the slow-down in a GLUT program? That seems fishy.

Yeah, I know it does, but I have no idea. My app's engine is big and complicated, unfortunately, so it's very hard to replicate the same situation in a GLUT app.
You tried the program, I assume? Did it not show a difference between the two states?
On my computer (5900XT), occlusion on runs at ~200 fps and occlusion off at ~300 fps; the only difference is the inclusion of those two statements.

Perhaps it's something unrelated, like the point size being set to, say, 20.0, and this is somehow causing the rendering to go down a different path whenever occlusion is on.

I assume you have some driver tool that lets you ignore the two statements. Perhaps run it with that enabled, so that the glBeginQuery/glEndQuery calls are ignored; if the framerate goes up, perhaps that will shed light on the problem.

Any further info, Cass? At least confirmation that it's a driver bug that is going to be fixed at a later stage, so I can retain the occlusion culling in the pipeline.

Hi Zed, no confirmation yet. Still in the queue.
I’ll let you know.

Thanks -
Cass

Since I’m also interested in how this turns out, I would be glad if the response could be posted on the forum… maybe as a new thread for obvious reasons.
Thank you!

SeskaPeel, Obli, are you also seeing a loss of framerate when you use occlusion queries?
I'm willing to send people a sample app that tests for the speed loss. Perhaps I'm the only person seeing this? It would help Cass out if it happens on a wide variety of NVIDIA hardware.

billybolluxbouncingballs at yahoo.co.uk

Sorry, I am not actually using them, just planning their use in something I'm doing here.
I could, however, help with the testing if you need this information.
I sent a mail to the address you specified. Maybe we should consider it a faster/alternative way to work/communicate on this?