EXT_timer_query not enough for me

This one will be long… Feel free not to read it :slight_smile:

I’m trying to maximize performance in my game using EXT_timer_query for GPU measurements and QueryPerformanceCounter for CPU measurements (yep, it’s under Windows).

My graph shows when an action was started on CPU/GPU and when it was finished. Looks something like this:

CPU: [Sky][Terrain][Physics  ][Water]
GPU: [Sky    ][Terrain   ]    [Water     ]

The example above shows that the GPU is idle while the CPU finishes physics, and the CPU is idle while the GPU finishes water rendering. In this example you could either calculate physics at the end or remove the glFinish at the end of the frame to get some extra performance.
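To make "idle" concrete, here is a minimal sketch of how the gaps in one row of the graph could be found. The names `Span`, `IdleGap` and `findIdleGaps` are made up for illustration, and it assumes the spans of a row are already sorted by start time and use the same time unit.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One box on the graph, e.g. [Terrain] on the GPU row (hypothetical type).
struct Span {
    std::string name;
    double start;
    double end;
};

// A stretch of time during which the processor had nothing to do.
struct IdleGap {
    double start;
    double end;
};

// Returns the gaps between consecutive spans of one row (CPU or GPU).
// Assumes the row is sorted by start time.
std::vector<IdleGap> findIdleGaps(const std::vector<Span>& row) {
    std::vector<IdleGap> gaps;
    for (std::size_t i = 1; i < row.size(); ++i) {
        assert(row[i].start >= row[i - 1].start);  // sorted input expected
        if (row[i].start > row[i - 1].end) {
            gaps.push_back({row[i - 1].end, row[i].start});
        }
    }
    return gaps;
}
```

For the GPU row above this would report a single gap between the end of Terrain and the start of Water.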

Now I get strange results when measuring glGetTexImage2D:

CPU:  [..some stuff..][getTexImage2D][...some stuff...]
GPU:  [.......some stuff........][getTexImage2D][....some stuff.....]

So, as expected, the CPU issues glGetTexImage2D and gets blocked until the GPU completes this command, but after the command is processed I still see the GPU doing something before it goes on.

I have the following theory for this:

Using EXT_timer_query I don’t get ‘begin’ and ‘end’ times from the GPU - I only get ‘time spent’. To create such a graph I’m using the following approach:

  1. CPU start time (for one action) is exactly what I measured with QueryPerformanceCounter
  2. CPU end time is exactly what I measured with QueryPerformanceCounter
  3. GPU start time is what I measured with QueryPerformanceCounter
  4. if GPU start time is less than the end time of the previous action, set it to the end time of the previous action
  5. GPU end time is GPU start time + time spent (queried with EXT_timer_query)
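The steps above could be sketched roughly like this. It is only an illustration: `Action`, `GpuInterval` and `reconstructGpuTimeline` are made-up names, all times are assumed to be already converted to milliseconds, and "end time of previous action" in step 4 is read as the end of the previous GPU interval.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One measured action, all values in milliseconds (assumed unit).
struct Action {
    double cpuStart;    // QueryPerformanceCounter reading before issuing commands
    double cpuEnd;      // QueryPerformanceCounter reading after issuing commands
    double gpuElapsed;  // 'time spent' reported by EXT_timer_query
};

// Reconstructed position of an action on the GPU row of the graph.
struct GpuInterval {
    double start;
    double end;
};

// Steps 3-5: take the CPU start as the GPU start, push it back to the end of
// the previous GPU interval if they would overlap, then add the elapsed time.
std::vector<GpuInterval> reconstructGpuTimeline(const std::vector<Action>& actions) {
    std::vector<GpuInterval> timeline;
    timeline.reserve(actions.size());
    for (const Action& a : actions) {
        assert(a.gpuElapsed >= 0.0);
        double start = a.cpuStart;                         // step 3
        if (!timeline.empty()) {
            start = std::max(start, timeline.back().end);  // step 4
        }
        timeline.push_back({start, start + a.gpuElapsed}); // step 5
    }
    return timeline;
}
```

Note that this sketch bakes in the assumption that the GPU executes the actions in submission order, which is exactly the part that breaks down below.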

Now this would work perfectly if the GPU executed commands in exactly the same order they were passed from the CPU, but it seems the driver is aware that the texture I’m trying to read will not be modified by any commands currently in the queue and therefore executes glGetTexImage2D before pending draw calls.
When I add glFlush before every glBeginQuery I force the driver not to overlap my queries, and then I see that glGetTexImage is synchronized on the graph (it ends at the same moment on GPU and CPU). I guess this confirms my theory.

Such a graph is a very useful tool and I would like it to be included in my framework from the first release. Unfortunately no one will accept it if it presents false information - you either show the wrong order of commands or you kill performance by using glFlush and preventing the driver from executing commands optimally.

Does anybody have a suggestion? I would really love to offer such a graph in my framework.

One alternative we’re looking at is to have a variant of NV_fence that returns a 64-bit timestamp.

I think this would better solve your problem, no?

Yes, that would be enough.
Now the question is which GPUs can support such a feature, and when it could be exposed in drivers.

I believe all the same hardware that supports EXT_timer_query would be able to support this variant.

As for when, I’m not sure whether there’s even a spec yet. Will check.

Ok, seems like I’ll just have to be patient (fortunately I’m good at that :slight_smile: ). In the meantime my framework will have a graph based on EXT_timer_query.