Bindless Stuff

It works perfectly! :slight_smile:

But I have a minor objection about glGetQueryObjectiv(timeQuery, GL_QUERY_RESULT_AVAILABLE, &available);

It shouldn't raise an error (glGetError() -> GL_INVALID_OPERATION) if it is called before the first time query. In order to simplify the code and avoid additional checking (as you have already said, we query the time for the previous frame if we want to skip waiting), it would be more convenient if glGetQueryObjectiv returned 0, or even -1, in the error case.

Do you like ping pong? :smiley:

What you can do is actually create two query timers: one times the current frame, while the other holds the previous frame's result, which you fetch at the end of the current frame. You still have a test this way, though (see the sketch below).
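A minimal sketch of that ping-pong scheme, assuming GL 3.3 / ARB_timer_query with the GL_TIME_ELAPSED target; DrawScene() and the frame counter are illustrative names, not something from this thread:

// once, at init
GLuint timeQuery[2];
glGenQueries(2, timeQuery);
unsigned frame = 0;

// each frame
GLuint curr = timeQuery[frame % 2];
GLuint prev = timeQuery[(frame + 1) % 2];

glBeginQuery(GL_TIME_ELAPSED, curr);
DrawScene();                           // whatever is being timed
glEndQuery(GL_TIME_ELAPSED);

// the previous frame's result should be ready by now, but skip
// the very first frame, where 'prev' was never begun
if (frame > 0) {
    GLuint64 gpuTimeNs = 0;
    glGetQueryObjectui64v(prev, GL_QUERY_RESULT, &gpuTimeNs);
}
++frame;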

An error is probably required because of:

  • The precedent set by the query object (which is a great concept but a mess)
  • The query object is probably only allocated at glBeginQuery, so before that you are calling glGetQueryObjectiv on nothing but a reserved name.

Yes, it is nice. :wink:

But I don't need two timers; one is sufficient. Even one requires a separate critical section to avoid read/write conflicts (because the application can query the frame rate at will).

It is also very nice that errors are silently ignored by OpenGL. :smiley:
So I don't have to worry about them (unless they cause some delay in the pipeline).

One thing to note about the timer query, from the spec:

The timer is started or stopped when the effects from all previous commands on the GL client and server state and the framebuffer have been fully realized.

and from issue (8) “How predictable/repeatable are the results returned by the timer query?” of http://www.opengl.org/registry/specs/ARB/timer_query.txt

Note that modern GPUs are generally highly pipelined, and may be
processing different primitives in different pipeline stages
simultaneously. In this extension, the timers start and stop when the
BeginQuery/EndQuery commands reach the bottom of the rendering pipeline.
What that means is that by the time the timer starts, the GL driver on
the CPU may have started work on GL commands issued after BeginQuery,
and the higher pipeline stages (e.g., vertex transformation) may have
started as well.

The most noticeable effect of this I could see was that, in certain situations when drawing a lot of stuff, the timer query results would appear to be much faster on a large window than on a small window, since the GPU completes quite a lot of the commands issued in the current frame between BeginQuery + EndQuery before the previous frame (+ clear) finishes and the timer query starts.
I guess that timing the same code at different places in the scene could give quite different results depending on the previously issued commands.
Putting a glFinish before starting the query gave me accurate results for both large + small windows (at the expense of FPS, for benchmarking only), with no glFinish needed at the end; see the sketch below.
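The placement described above would look roughly like this (benchmarking only; timeQuery is assumed to be an already-generated query object):

glFinish();                                // drain all previously issued work first
glBeginQuery(GL_TIME_ELAPSED, timeQuery);
DrawScene();
glEndQuery(GL_TIME_ELAPSED);

// no glFinish needed at the end: GL_QUERY_RESULT itself waits for completion
GLuint64 elapsedNs = 0;
glGetQueryObjectui64v(timeQuery, GL_QUERY_RESULT, &elapsedNs);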

I have to admit that I don't understand what you mean by the large/small window stuff. Of course the rendering is affected by previous commands. Furthermore, on laptops the driver lowers the clock rate if the load is too light to demand full power (for example, the 8600M has three levels of power consumption and clock rates).

For the purpose of testing, you don't need to use the timer query; it is aimed at real-time LOD management. glFinish is enough for testing. But, as I've already mentioned, you should call glFinish at the end of your drawing, or you can get a frame rate even 3x higher than the real one (I tried that on my application last night: from about 115 fps, measured by the timer query, I got more than 300 fps when there were no glFinish/glFlush commands at the end). A sketch of that kind of measurement follows.
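To illustrate the difference (a hedged sketch, not the exact code from that test; nowMs() stands in for any CPU clock, e.g. QueryPerformanceCounter or gettimeofday):

double t0 = nowMs();
DrawScene();
glFinish();     // without this you time command submission, not rendering,
                // which is how the inflated ~300 fps reading arises
double frameMs = nowMs() - t0;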

For the purposes of measuring how long DrawScene() takes on the GPU, no glFinish() calls should be necessary. The elapsed time query runs on the GL server (in the GPU) inline with all other GL commands. So everything before the glBeginQuery() is complete when the timer starts, and everything before the glEndQuery() is complete before the timer stops. Of course, you'll need some kind of synchronization with the CPU (app thread) to pick up the result; one non-blocking option is sketched below.
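For example, one non-blocking way to do that synchronization is to poll for availability and read the result only once it is ready:

GLint available = 0;
glGetQueryObjectiv(timeQuery, GL_QUERY_RESULT_AVAILABLE, &available);
if (available) {
    GLuint64 elapsedNs = 0;
    glGetQueryObjectui64v(timeQuery, GL_QUERY_RESULT, &elapsedNs);
}
// otherwise, just try again next frame instead of stalling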

I completely agree with you. glFinish is necessary only if the ARB/EXT_timer_query extension is not used (and that was the context in which I mentioned glFinish).

But, generally, GPU power management makes it very hard to determine at which core frequency a measurement is taken. Furthermore, it is not guaranteed that the frequency does not change during a particular measurement. It would be useful if GPU frequencies could be locked, for the purpose of fair measurement or for boosting speed in some particular cases. :wink:

Everything before the glBeginQuery() will be complete, but if you don't have a glFinish() before glBeginQuery(), then some of the commands issued after glBeginQuery() may also be completed before the timer starts, if the driver can get to them before the previously queued commands have completed.

e.g. if you had:

task1
task2
glBeginQuery(GL_TIME_ELAPSED, query);
task3
task4
task5
glEndQuery(GL_TIME_ELAPSED);

Then if task1 + task2 take a long time and task3 + task4 can be completed while task1 + task2 are still running, then the timer query will only be measuring the time taken for task5 to complete, since task3 + task4 finished before task1 + task2, and the timer query only starts when task1 + task2 are complete.

Then if task1 + task2 take a long time and task3 + task4 can be completed while task1 + task2 are still running

GPUs are required to do things in order. Thus, task3 is guaranteed to complete after task2.

Ok, it might not complete task3 before task2 is complete, but it might have completed a good chunk of the processing needed for task3 + task4 before task2 is complete.

This is mentioned in issue(8) of: http://www.opengl.org/registry/specs/ARB/timer_query.txt

In certain situations, this can have a large impact on measured time by timer query. In one case, I was getting 50% shorter time intervals measured with a large window compared to a small window.

Is it more accurate to say that GPUs are required to do things in order only when necessary? So task 3 could complete before task 2 and task 1, provided it didn't depend on the state of any of the resources used by them?

Is it more accurate to say that GPUs are required to do things in order only when necessary?

Not in any terms that the user is allowed to see. The GPU cannot report completion of task 3 before tasks 1 and 2 complete. So even if it does reorder things, you are forcibly insulated from the effects of that.

In the spirit of Rob’s sharing, I’ll throw this out there.

It’s interesting to apply Rob’s “streaming VBO” technique to static geometry too, and then take advantage of temporal coherence. That is, if you’ve uploaded a batch before and you haven’t orphaned its buffer yet, then… yep, you guessed it. You don’t need to upload it again. Just launch the batch from the old location, again.

In the ideal case (static/near-static scene, static/near-static viewpoint), you end up with perf that's pretty darn close to NVidia display lists (or bindless preloaded VBOs). Now that is sweet! :stuck_out_tongue: Worst case, it's about client-arrays perf, which isn't shabby. After all, you've got to get the data to the GPU at least once (though with GL4, you allegedly could use ARB_copy_buffer and a background thread to accelerate that). A rough sketch of the reuse test is below.
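Here is one way that reuse test could look, under some assumptions of mine (the stream is filled with glBufferSubData, orphaning bumps a generation counter, and Batch/DrawBatch/cursor are hypothetical names, not Rob's actual code):

typedef struct {
    const void *vertices;
    GLsizeiptr  size;
    GLintptr    offset;      // where this batch lives in the streaming VBO
    unsigned    generation;  // value of bufferGeneration at last upload
} Batch;

unsigned bufferGeneration = 0;   // incremented every time the VBO is orphaned

void DrawBatch(Batch *b, GLuint vbo, GLintptr *cursor)
{
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    if (b->generation != bufferGeneration) {
        // cache miss: the buffer was orphaned since this batch was uploaded
        b->offset = *cursor;
        glBufferSubData(GL_ARRAY_BUFFER, b->offset, b->size, b->vertices);
        b->generation = bufferGeneration;
        *cursor += b->size;
    }
    // cache hit (or fresh upload): launch the batch from the old location
    glVertexPointer(3, GL_FLOAT, 0, (const void *)b->offset);
    glDrawArrays(GL_TRIANGLES, 0, (GLsizei)(b->size / (3 * sizeof(GLfloat))));
}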

You can think of all kinds of ways to improve upon this to maximize perf (maximize cache “hits”).