Re: VBO & DisplayList, which is faster

On what GClements mentioned (referring to the glDraw*Instanced*() draw calls, and similarly to indirect draws): many drivers/GPUs can actually pack different instances of an instanced draw call into shared thread groups. So that’s probably not a concern here.

Regardless, this is a GPU-side perf issue. The CPU time needed to dispatch all of these otherwise-separate object draw calls will be significantly reduced. And if you’re currently making many draw calls per frame and are CPU-side frame-time limited, switching to instanced draw calls nets you a huge perf++.
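To make the CPU-side saving concrete, here is a minimal sketch (not anyone's actual engine code) of grouping objects by mesh so that each group can go out as one instanced draw. The `Object` and `InstanceBatch` types and the `batchByMesh` helper are hypothetical names made up for illustration, and it assumes each object only needs a per-instance position.

```cpp
#include <array>
#include <cstddef>
#include <map>
#include <vector>

// Hypothetical scene object: which mesh it uses and where it sits.
struct Object {
    int meshId;
    std::array<float, 3> position;
};

// One batch == one instanced draw call for one mesh.
struct InstanceBatch {
    int meshId;
    std::vector<std::array<float, 3>> offsets;  // per-instance data
};

// Group objects by mesh, keeping batches in order of first appearance.
std::vector<InstanceBatch> batchByMesh(const std::vector<Object>& objects) {
    std::vector<InstanceBatch> batches;
    std::map<int, std::size_t> batchIndex;  // meshId -> index into batches
    for (const Object& obj : objects) {
        auto it = batchIndex.find(obj.meshId);
        if (it == batchIndex.end()) {
            it = batchIndex.emplace(obj.meshId, batches.size()).first;
            batches.push_back(InstanceBatch{obj.meshId, {}});
        }
        batches[it->second].offsets.push_back(obj.position);
    }
    return batches;
}
```

With a GL context you would then upload each batch's `offsets` into a VBO bound as a per-instance attribute (glVertexAttribDivisor(attrib, 1)) and issue one glDrawElementsInstanced() per batch, passing `offsets.size()` as the instance count. That is one draw call per mesh rather than one per object.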

Now with MDI rendering (i.e. the glMultiDraw*Indirect*() draw calls, where we’re talking about putting different objects in different GL_DRAW_INDIRECT_BUFFER subdraw records), that falls squarely in the category GClements is referring to. That said, again, this is purely a GPU-side perf issue. The CPU time needed to queue a few MDI draw calls (or a few instanced draw calls, for that matter) is almost zero. That’s a huge CPU-side perf++ if you’re currently massively CPU-side frame-rate limited. And there are GPU-side techniques to reduce the GPU-side perf cost if/when that becomes an issue.

A big part of the win with using instanced draw calls and/or MDI draw calls is the data and state reorg that you have to do to use them. Namely: 1) pack multiple objects in shared VBOs/IBOs, and 2) get rid of all of the often-needless GL state changes that you are doing between each of those original draw calls … so that it’s even possible to launch a bunch of object draws with a single draw call.
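As a rough illustration of point 1, here is a sketch of packing several meshes into one shared vertex array and one shared index array while emitting one indirect subdraw record per mesh. The `DrawElementsIndirectCommand` field layout is the one the GL spec defines for glMultiDrawElementsIndirect(); `Mesh`, `PackedScene` and `packMeshes` are hypothetical names, and the sketch assumes xyz-only float vertices and 32-bit indices. Using `baseInstance` as an object ID is just one common choice, not a requirement.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Record layout read from GL_DRAW_INDIRECT_BUFFER by glMultiDrawElementsIndirect().
struct DrawElementsIndirectCommand {
    std::uint32_t count;          // number of indices in this subdraw
    std::uint32_t instanceCount;  // number of instances
    std::uint32_t firstIndex;     // offset (in indices) into the shared IBO
    std::int32_t  baseVertex;     // value added to each index
    std::uint32_t baseInstance;   // first instance ID (used here as object ID)
};

// Hypothetical mesh: xyz positions plus mesh-local indices.
struct Mesh {
    std::vector<float> positions;
    std::vector<std::uint32_t> indices;
};

struct PackedScene {
    std::vector<float> vertices;                         // shared VBO contents
    std::vector<std::uint32_t> indices;                  // shared IBO contents
    std::vector<DrawElementsIndirectCommand> commands;   // indirect buffer contents
};

PackedScene packMeshes(const std::vector<Mesh>& meshes) {
    PackedScene scene;
    for (const Mesh& mesh : meshes) {
        assert(mesh.positions.size() % 3 == 0);  // assumes xyz floats only
        DrawElementsIndirectCommand cmd{};
        cmd.count         = static_cast<std::uint32_t>(mesh.indices.size());
        cmd.instanceCount = 1;
        cmd.firstIndex    = static_cast<std::uint32_t>(scene.indices.size());
        cmd.baseVertex    = static_cast<std::int32_t>(scene.vertices.size() / 3);
        cmd.baseInstance  = static_cast<std::uint32_t>(scene.commands.size());
        // Indices stay mesh-local; baseVertex relocates them at draw time.
        scene.vertices.insert(scene.vertices.end(),
                              mesh.positions.begin(), mesh.positions.end());
        scene.indices.insert(scene.indices.end(),
                             mesh.indices.begin(), mesh.indices.end());
        scene.commands.push_back(cmd);
    }
    return scene;
}
```

After uploading the three arrays to a VBO, an IBO and a buffer bound to GL_DRAW_INDIRECT_BUFFER, the whole set can be drawn with a single glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT, nullptr, drawcount, 0) call (GL 4.3 or ARB_multi_draw_indirect), with `drawcount` set to `commands.size()`. That only works if all of those objects can share the same GL state, which is point 2.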

Related thread: