nv_occlusion and syncing

i have a question that bothers me. i read in a thread not long ago that when issuing a few occlusion queries, the app should let the gpu work and keep the cpu busy in the meantime, and only then fetch the query results from the card. so my question is: will it be faster to let the cpu work? i don't see why, because it will not speed up the gpu, and that means the occlusion test is slow, as we could see from that thread. so what is going on? can someone explain this cpu/gpu parallelism thing?

okay… what happens if you call glDrawElements? nothing, immediately. except that somewhere in the driver there is, say, a queue<OpenGLEvent>, and the call just does a queue.push_back(OpenGLEvent(GLE_DRAW_ELEMENTS, OpenGLEventParams(params)));

meanwhile the gpu itself is probably busy with something else… when the gpu has nothing left to do, it looks into the queue and pop_front()s the next item to work on.

that means if you just issue a query, it goes into a possibly long queue and has to wait its turn. the query itself does not take long, it is simply rendering geometry as well, but you have to wait until it gets processed. just sitting there waiting for it is called stalling the pipeline, and it's stupid. until it's done, you should do some calculations on the cpu, because you have this time "for free"
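here is a toy model of that idea. everything in it is made up for illustration (real drivers are far more complicated), but it shows why demanding a result early stalls and doing cpu work first does not:

```cpp
#include <queue>

// Toy model of the driver's command queue from the post above. Each
// command takes some amount of GPU time; asking for a query result
// forces the CPU to wait until everything queued ahead has executed.
struct Command { int gpuCost; };

struct CommandQueue {
    std::queue<Command> pending;

    void submit(int gpuCost) { pending.push({gpuCost}); }

    // Total GPU time left in the queue; executing it empties the queue.
    int drain() {
        int total = 0;
        while (!pending.empty()) {
            total += pending.front().gpuCost;
            pending.pop();
        }
        return total;
    }
};

// Demand the result immediately: the whole drain time is a pure stall.
int stallIfReadImmediately(CommandQueue& q) {
    return q.drain();
}

// Do 'cpuWork' units of useful CPU work first: the GPU drains the queue
// in parallel, so the CPU only stalls for whatever is left over.
int stallIfWorkDoneFirst(CommandQueue& q, int cpuWork) {
    int remaining = q.drain() - cpuWork;
    return remaining > 0 ? remaining : 0;
}
```

for example, with 7 time units of commands queued and 5 units of useful cpu work available, the stall drops from 7 to 2.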

you can, for example, render the primary portal-cell, the one you're in. then you issue the query in which you draw the portal to the next one. now, while waiting for the answer, you can process all the meshes in the current portal (the boned/skinned animations etc.) and start drawing them as well. THEN you check the result of the query and, depending on it, draw the next portal-cell. etc…

that way you're parallelising the tasks, doing both at the same time: rendering and calculating…
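in case it helps, here is that portal loop as a rough sketch. the glGenOcclusionQueriesNV / glBeginOcclusionQueryNV / glEndOcclusionQueryNV / glGetOcclusionQueryuivNV calls and the GL_PIXEL_COUNT_NV token are the real GL_NV_occlusion_query entry points; drawCell, drawPortalShape and updateAndDrawMeshes are hypothetical engine functions, not real API:

```
GLuint query, pixelCount;
glGenOcclusionQueriesNV(1, &query);

drawCell(currentCell);                 // 1. render the cell we are standing in

glBeginOcclusionQueryNV(query);
drawPortalShape(portalToNextCell);     // 2. draw just the portal polygon under the query
glEndOcclusionQueryNV();

updateAndDrawMeshes(currentCell);      // 3. useful work while the gpu catches up:
                                       //    skinning, animation, drawing the cell's meshes

glGetOcclusionQueryuivNV(query, GL_PIXEL_COUNT_NV, &pixelCount);
if (pixelCount > 0)                    // 4. portal visible: continue into the next cell
    drawCell(nextCell);
```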

if you just

q = query();
if(report(q)==visible) draw();

then in this part here:

q = query();
if(report(q)==visible) draw();

you’re spending time simply waiting for an answer…
this is stupid.
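the fix is just reordering, in the same pseudocode (do_other_useful_work stands in for whatever your engine has to do anyway):

```
q = query();                 // goes into the queue and returns immediately
do_other_useful_work();      // cpu time you get "for free" while the gpu works
if (report(q) == visible)    // by now the answer is likely ready, little or no stall
    draw();
```

with GL_NV_occlusion_query you can also poll GL_PIXEL_COUNT_AVAILABLE_NV via glGetOcclusionQueryuivNV to check whether the result is ready without blocking at all.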

hope you got some information in this which you didn't have before and which is useful…

Yes, Dave’s answer is correct. The command stream has a certain amount of latency built into it. Requesting the result of a query immediately requires the driver to wait until the command stream is completely drained before it can return. Same problem happens when you do a glReadPixels.

If you can find a way to structure your code so that you don’t need the results immediately (like the way Dave suggests), you’ll be much better off.

Thanks -

thanx, but that is what i already know. what i don't understand is: if you don't have any cpu calculations to do, would it be faster to put an empty loop, like
for (int i=0;i<10000;i++);
before querying the results? or are you just saying that doing other rendering before the query, rather than after, is better, because that way it's like "for free"?

you don't get the result faster if you do a for-loop. that's just stupid (better use Sleep(5) or something)

but before querying the result, do some OTHER stuff. your engine is not only drawing queries on screen, it is, for example, updating physics etc. you can update, say, one of your meshes after you've sent the draw call and the query, so you do useful calculations instead of waiting for an answer…

or do it like this:

while(possibly_visible_stuff_in_list) {
    draw_all_sure_visible_stuff();
    start_queries();
    do_all_physics_updates_for_the_visible_stuff();
    check_queries_and_fill_sure_visible_stuff_into_the_visible_list_drop_the_not_visible_keep_possibly_visible_in_the_possibly_visible_list();
}

now THOSE are function names

hope you NOW got it

Okapota : If you really have no other calculations to do then you just call the function and wait until it finishes. You cannot speed up the process.

If you can’t structure your rendering code to hide the latency of the occlusion query, then you’re probably better off not using it at all.

It’s pretty unlikely that you can’t reorganize the way you draw in a way that allows you to hide the latency, though.


well, i have no code yet, i was just interested in the subject. when i have code, i'll remember it. thanx.

I don’t know if this is of any use to you at all, but i’ll tell you how i’ve structured my rendering code. The renderer is designed to work on dynamic objects (all of them can move and change shape & size). It requires a simplified representation of the objects in 3D and 2D space.

  1. Perform frustum culling and remove all non-visible objects and object groups.

  2. Approximate the screen size of the visible objects, and mark insignificantly small objects as non-visible. Store a depth value for the visible objects (or groups) and choose potential occluders.

  3. Render potential occluders.

  4. For the rest of the visible objects:
    A. Send a visibility query for a batch of objects to the rendering subsystem.
    B. While waiting for the result, test the screen areas of the next batch of not-yet-occlusion-checked visible objects against the areas of previously occluded objects.
    C. Mark objects that are behind already-occluded objects as non-visible. Put the visible ones into the next visibility-query batch.
    D. Get the result of the visibility query. If any objects were non-visible, store their 2D areas in the list of occluded objects (so that the occlusion-query result can be reused in part C), and render all the visible ones.
    E. If there are no objects left, stop; otherwise go to part A.

This system allows previous occlusion queries to be reused, which speeds up the rendering system quite a lot. It produced a good speed-up from occlusion queries even in a software version, where i read a whole screen full of floating-point Z-buffer values back from a GF2MX graphics card (over 3000 occlusion queries at ~14 FPS). I haven’t had much time to test it properly on a gf3 or gf4 because i’m really busy with work and school right now.
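A minimal sketch of the area-reuse test in steps B/C, assuming objects have been reduced to a screen-space rectangle plus a depth value (Rect, contains and culledByPreviousQueries are my names for illustration, not from the renderer above):

```cpp
#include <cstddef>
#include <vector>

// Hypothetical screen-space bounding rectangle, in pixels.
struct Rect { int x0, y0, x1, y1; };

// True if 'inner' lies completely inside 'outer'.
bool contains(const Rect& outer, const Rect& inner) {
    return inner.x0 >= outer.x0 && inner.y0 >= outer.y0 &&
           inner.x1 <= outer.x1 && inner.y1 <= outer.y1;
}

// Steps B/C of the scheme above: before spending a hardware query on an
// object, test its screen rectangle against the areas of objects that a
// previous query already reported as occluded. If the object sits entirely
// inside such an area and is no closer to the camera, whatever hid that
// area hides this object too, so it can be culled without a query.
bool culledByPreviousQueries(const Rect& obj, float objDepth,
                             const std::vector<Rect>& occludedAreas,
                             const std::vector<float>& occludedDepths) {
    for (std::size_t i = 0; i < occludedAreas.size(); ++i)
        if (objDepth >= occludedDepths[i] && contains(occludedAreas[i], obj))
            return true;  // hidden by the same occluders, no query needed
    return false;         // still a candidate: goes into the next query batch
}
```

The depth comparison is deliberately conservative: only objects fully inside an occluded area and at least as deep are dropped; everything else still goes through a real query.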