glDrawElements vs glDrawArrays - Revisited

Related to…

It seems like you could save a lot of data if you could manage gl.TRIANGLE_STRIP and cull back faces. You could have the same number of vertices and texture coordinates as drawElements, minus the indices, at least for 2D graphics.

You appended a post to a 20-year-old thread that someone revived 1 year ago.

Please don’t revive old threads. Start new ones.

Yes, it seems like that would be the case, but generally speaking, it isn’t true for most meshes. Especially if you can use 16-bit indices (perhaps with multiple glDrawElementsBaseVertex calls).

Remember: performance is governed by two factors: the cost to read the data and the cost to process it.

Arrayed triangle strips of arbitrary meshes will tend to replicate vertices. They just have to; they cannot get perfect vertex reuse. As such, they will execute the same vertex shader on the same vertex data, without the GPU ever being able to detect that this is happening. So while arrayed strips have a lot of built-in reuse possibilities, there will still be a lot of duplicate vertex positions across the mesh.

That means duplicate reads and VS invocations.
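To make the duplication concrete, here is a minimal sketch that builds the vertex stream for a regular grid drawn as one arrayed strip per row of quads. The grid mesh and the function name `strip_vertices` are assumptions for illustration only; arbitrary meshes will behave differently, but the effect is the same in kind.

```python
def strip_vertices(cols, rows):
    """Vertex stream for a cols x rows grid of quads, one arrayed strip per row.

    Each strip zig-zags between the bottom and top vertex rows of its quad row.
    Vertices are (x, y) grid coordinates standing in for full vertex data.
    """
    stream = []
    for r in range(rows):
        for c in range(cols + 1):
            stream.append((c, r))      # bottom edge of this quad row
            stream.append((c, r + 1))  # top edge of this quad row
    return stream


stream = strip_vertices(4, 4)
sent = len(stream)          # vertices the GPU reads and runs the VS on
unique = len(set(stream))   # vertices that are actually distinct
print(sent, unique)
```

Every interior row of vertices appears in two strips, so it is read and shaded twice. With arrays there is no index to tell the GPU that the two copies are the same vertex.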

An optimized indexed triangle list at least provides the possibility of eliminating these things. The same index always corresponds to the same VS output data, so a post-T&L vertex cache can allow such optimized lists to avoid re-reading and re-processing data. The same vertex could be used 3 or 4 times in a mesh without invoking a data read (outside of the index) or a VS invocation.

Depending on how big your GPU’s post-T&L cache is, and the topology of your mesh, you can achieve nearly a 1:1 ratio of unique input vertices to VS invocations. You’ll only ever achieve this with arrayed triangle strips in certain very specific mesh topologies.
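A rough way to see how cache size and index order interact is to simulate the cache. The sketch below assumes a simple FIFO post-T&L cache keyed by index; that is a modelling assumption, not a description of any particular GPU, and the names `grid_indices` and `vs_invocations` are made up for this example.

```python
from collections import deque


def grid_indices(cols, rows):
    """Indexed triangle list for a cols x rows grid of quads, in row-major order."""
    stride = cols + 1  # vertices per row
    indices = []
    for r in range(rows):
        for c in range(cols):
            v0 = r * stride + c   # bottom-left
            v1 = v0 + 1           # bottom-right
            v2 = v0 + stride      # top-left
            v3 = v2 + 1           # top-right
            indices += [v0, v1, v2, v2, v1, v3]
    return indices


def vs_invocations(indices, cache_size):
    """Count cache misses (VS invocations) under an assumed FIFO cache."""
    cache = deque(maxlen=cache_size)  # oldest entry drops out when full
    misses = 0
    for i in indices:
        if i not in cache:
            misses += 1
            cache.append(i)
    return misses


indices = grid_indices(4, 4)
unique = len(set(indices))
for size in (0, 8, 64):
    print(size, vs_invocations(indices, size), unique)
```

When the cache can hold the whole working set, the miss count equals the number of unique vertices, which is the 1:1 case. As the cache shrinks, the count climbs toward one invocation per index, which is why the ordering of an indexed list matters.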

And even if you blow past your post-T&L cache, your pre-T&L cache could still have that vertex’s input data. So even if you have to run a VS invocation for a duplicate vertex again, you may not have to use bandwidth to do so.

Of course, there’s no such thing as a free lunch. To gain this benefit, you have to use indices. And this requires reading more data in general. But in most cases, it’s worth the additional bandwidth, especially if you can keep your indices at 16 bits.
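As a back-of-the-envelope check on the data sizes, here is a sketch comparing the two layouts for a regular grid. It assumes 16 bytes per vertex (a 2D position plus a texture coordinate as four 32-bit floats), 16-bit indices, and one arrayed strip per row of quads; all of those are illustrative assumptions, and the function names are hypothetical.

```python
def indexed_list_bytes(cols, rows, vertex_bytes=16, index_bytes=2):
    """Bytes for unique vertices plus a triangle-list index buffer."""
    unique_vertices = (cols + 1) * (rows + 1)
    index_count = cols * rows * 6  # two triangles per quad
    return unique_vertices * vertex_bytes + index_count * index_bytes


def arrayed_strip_bytes(cols, rows, vertex_bytes=16):
    """Bytes for one arrayed strip per row of quads, no index buffer."""
    strip_vertices = rows * 2 * (cols + 1)
    return strip_vertices * vertex_bytes


for cols, rows in ((1, 1), (32, 32)):
    print(cols, rows, indexed_list_bytes(cols, rows), arrayed_strip_bytes(cols, rows))
```

Under these assumptions a lone quad is smaller as a strip (64 bytes against 76), which matches the intuition in the question. For the 32 x 32 grid the indexed list comes out smaller (29712 bytes against 33792), and that is before counting the duplicate VS invocations the strip version incurs.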