Draw calls being an overhead

Hi All,

I have heard that issuing too many draw calls creates a lot of overhead, and that this is why we do batching.
I have some basic questions about this.

  1. The overhead is due to the fact that we usually have to do a lot of “binds” between these draws, and these binds are expensive.
    Is this understanding correct?

  2. Will the new Direct State Access (DSA) features in GL remove this draw call overhead?

Can anyone please let me know?
Thanks!

It’s mostly a problem with driver architecture, which derives from the API design, so in general it’s not just draw calls you want to minimize but your contact with the API itself. For many calls the driver must perform consistency checks, which creates unnecessary extra work. The DSA extension, now adopted into core in GL 4.5, is meant to address some of this overhead, as well as remove some archaic requirements inherited from GL 1.x.
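
To give a concrete idea of what DSA removes, here is a minimal sketch (the function name and parameters are made up for illustration): with DSA you create and fill a buffer by name, without the bind-to-edit step, so the driver has less bound state to validate. Note that it only removes that pattern; the draw call itself still has its cost.

[CODE]
#include <GL/glew.h>   // or any loader exposing GL 4.5 / ARB_direct_state_access

void uploadVertexData(const float* data, GLsizeiptr size)
{
    // Classic GL: the buffer must be bound before it can be modified,
    // even if we only bind it to do the upload.
    GLuint vboClassic;
    glGenBuffers(1, &vboClassic);
    glBindBuffer(GL_ARRAY_BUFFER, vboClassic);
    glBufferData(GL_ARRAY_BUFFER, size, data, GL_STATIC_DRAW);

    // DSA (core since GL 4.5): the object is addressed by name directly,
    // so no bind-to-edit is needed and the driver skips that state churn.
    GLuint vboDSA;
    glCreateBuffers(1, &vboDSA);
    glNamedBufferData(vboDSA, size, data, GL_STATIC_DRAW);
}
[/CODE]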

Batching can help a lot with reducing overhead, but this comes at a very big price: in the worst case it affects your whole production pipeline. Your engine has to be able to batch, your asset creation has to be done with batching in mind, etc. If you ask me, things shouldn’t have to be like this.
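
As a rough sketch of what batching boils down to (the Mesh struct and function names are made up for illustration, and it assumes all meshes share one material and a pre-transformed vertex format): concatenate the geometry once up front, then replace N small draws with a single one.

[CODE]
#include <vector>
#include <GL/glew.h>

// Hypothetical mesh: pre-transformed xyz vertices sharing one material/shader.
struct Mesh { std::vector<float> vertices; };

static GLuint  batchVbo = 0;
static GLsizei batchVertexCount = 0;

// Concatenate all meshes of a batch into one VBO (done once, offline or at load).
void buildBatch(const std::vector<Mesh>& meshes)
{
    std::vector<float> combined;
    for (const Mesh& m : meshes)
        combined.insert(combined.end(), m.vertices.begin(), m.vertices.end());
    batchVertexCount = static_cast<GLsizei>(combined.size() / 3);

    glGenBuffers(1, &batchVbo);
    glBindBuffer(GL_ARRAY_BUFFER, batchVbo);
    glBufferData(GL_ARRAY_BUFFER, combined.size() * sizeof(float),
                 combined.data(), GL_STATIC_DRAW);
}

// One draw call for the whole batch instead of one per mesh.
void drawBatch()
{
    glBindBuffer(GL_ARRAY_BUFFER, batchVbo);
    glEnableVertexAttribArray(0);
    glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, nullptr);
    glDrawArrays(GL_TRIANGLES, 0, batchVertexCount);
}
[/CODE]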

That said, the real work towards reducing driver overhead has been done elsewhere: Mantle is the current flagship for this, and DX12 is expected to follow the same design philosophy. Khronos recently announced that “OpenGL Next” is already in progress, and AMD has offered Mantle as a base for it (which doesn’t mean it will end up being that at all). GLNext will be a complete reboot of the API to address modern architectures and cut down driver overhead, allowing for a better parallel design as well. Sadly I have no access to Mantle, but from what I’ve read it seems like an interesting proposal with a very lightweight API. It should technically be harder to use, as you’ll need to be aware of more close-to-the-metal details, but in my world the less clutter between me and my target, the better. I’m personally eager to get my hands on Mantle, DX12 and GLNext.

Even worse, draw call overhead can differ greatly between drivers.

I recently did a benchmarking test to analyze this exact problem, and to my never-ending surprise the test shows that on NVidia issuing draw calls with glDrawArrays/glDrawElements can be up to 20x (!!!) faster than on AMD! The test issued approx. 30,000 draw calls, which took 0.5 ms on NVidia and 10.5 ms on AMD on the same system, with just the graphics card exchanged and the most recent drivers installed.

For my application this means that batching would make NVidia suffer badly while offering only a modest performance increase on AMD, because all the batching work that would be needed comes at a very high cost of its own.

Seeing this result makes me really angry at AMD and their stupid Mantle API. Instead of declaring that a low-level API is needed for good performance, they should fix their GL drivers. NVidia has already proven that it can be done in a far more performant way.

[QUOTE=Nikki_k;1261704]I recently did a benchmarking test to analyze this exact problem … on NVidia issuing draw calls with glDrawArrays/glDrawElements can be up to 20x (!!!) faster than on AMD! The test issued approx. 30,000 draw calls, which took 0.5 ms on NVidia and 10.5 ms on AMD …[/QUOTE]

Out of curiosity, how did you measure those numbers? Due to the dual- (multi-?) threaded command queuing nature of current drivers, it’s very possible your measurements were not really measuring the draw calls at all.
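
For what it’s worth, a GL timer query (core since 3.3) is one way to take the CPU side out of the equation. A rough sketch: it measures only the GPU execution time of the commands in between, independent of how long glDraw* blocks the calling thread.

[CODE]
// Measure GPU execution time of a block of draw calls with a timer query.
// This is independent of how long glDraw* keeps the CPU thread busy.
GLuint query;
glGenQueries(1, &query);

glBeginQuery(GL_TIME_ELAPSED, query);
// ... issue the draw calls under test here ...
glEndQuery(GL_TIME_ELAPSED);

// Blocks until the result is available; fine for a benchmark, not a frame loop.
GLuint64 gpuTimeNs = 0;
glGetQueryObjectui64v(query, GL_QUERY_RESULT, &gpuTimeNs);
[/CODE]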

Although GL has been cleaned up lately, I personally am looking forward to working with a lower level API on the desktop.

I took the time before and after each glDraw* call using the CPU’s RDTSC, then added up the intervals for all the draw calls I made.
Of course this only measures the time spent in my main thread, but ultimately that’s the only thing that’s relevant: if a helper background thread can process the data on another CPU core while I prepare the next draw call, I won’t lose any time, but if all the preparation is done in the main thread, it will hit my code hard. It can make the difference between running at 60 fps and 20 fps.
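
Something along these lines (a simplified sketch rather than my actual code; __rdtsc is the compiler intrinsic, and the function name is just for illustration):

[CODE]
#ifdef _MSC_VER
#include <intrin.h>      // __rdtsc on MSVC
#else
#include <x86intrin.h>   // __rdtsc on GCC/Clang
#endif
#include <GL/glew.h>

static unsigned long long totalCycles = 0;

void timedDraw(GLenum mode, GLint first, GLsizei count)
{
    // Timestamp immediately before and after the draw call on the main thread;
    // this captures how long the driver keeps this thread busy, nothing more.
    unsigned long long start = __rdtsc();
    glDrawArrays(mode, first, count);
    totalCycles += __rdtsc() - start;
}
[/CODE]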