Is triangle_strip useful with indexed rendering?


I'm asking myself whether it is worth the work to rearrange my mesh faces so that I can use GL_TRIANGLE_STRIP to draw the mesh. I thought indexed rendering makes it obsolete, since vertices with the same index and instance won't be processed more than once.

Does anyone know more?
Thanks in advance!

Using a triangle strip reduces the size of the index array for N triangles from 3N to N+2, provided the mesh can be drawn as a single strip. If you have many triangles and the per-vertex attribute data is small, the memory saving may be worthwhile. However, triangle strips don’t necessarily minimise the number of vertex shader invocations. A vertex with the same index will be processed more than once if its previously processed result is no longer in the cache.
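To put rough numbers on the index-count difference, here is a minimal sketch. The function names are made up for illustration, and the strip figure assumes the best case where the whole mesh fits in one unbroken strip, which real meshes rarely allow.

```cpp
#include <cassert>
#include <cstddef>

// Indices needed to draw n triangles as an independent triangle list.
constexpr std::size_t listIndexCount(std::size_t n) { return 3 * n; }

// Indices needed to draw n triangles as ONE unbroken strip (best case).
constexpr std::size_t stripIndexCount(std::size_t n) { return n == 0 ? 0 : n + 2; }

// Bytes of index data: bytesPerIndex is 2 for GL_UNSIGNED_SHORT, 4 for GL_UNSIGNED_INT.
constexpr std::size_t indexBytes(std::size_t indexCount, std::size_t bytesPerIndex) {
    return indexCount * bytesPerIndex;
}
```

For a 30,000-triangle mesh with 4-byte indices this gives 360,000 bytes for the list and 120,008 bytes for a single strip. Whether that difference matters depends on how large the vertex data is by comparison.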

To minimise vertex shader invocations, use disjoint triangles with the vertices ordered so as to maximise cache hits. This typically requires a different ordering than you would get from a triangle strip. The approach is termed “strip mining”, and involves rendering triangles in bands which are more than one triangle wide. Triangles are rendered across the width of the band in the inner loop and along the length of the band in the outer loop. With the width of the band optimised according to the cache size, vertices in the interior of a band will only be processed once while those on the edge will be processed twice (once for each band which shares that edge). This approach is most useful if the vertex shader is particularly expensive.
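One way to compare orderings without a GPU is to replay the index buffer through a small cache model and count misses. The sketch below assumes a plain FIFO cache of a fixed number of entries; that is a common simplification, not a description of how any particular GPU behaves, and the function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Counts the vertex shader invocations a FIFO cache model of `cacheSize`
// entries would need for the given index stream (one miss = one invocation).
std::size_t countCacheMisses(const std::vector<std::uint32_t>& indices,
                             std::size_t cacheSize) {
    assert(cacheSize > 0);
    std::deque<std::uint32_t> fifo;
    std::size_t misses = 0;
    for (std::uint32_t idx : indices) {
        if (std::find(fifo.begin(), fifo.end(), idx) != fifo.end()) {
            continue;  // hit: the transformed vertex is reused
        }
        ++misses;
        fifo.push_back(idx);
        if (fifo.size() > cacheSize) {
            fifo.pop_front();  // evict the oldest entry
        }
    }
    return misses;
}

// Average cache miss ratio: misses per triangle for a triangle list.
double acmr(const std::vector<std::uint32_t>& indices, std::size_t cacheSize) {
    assert(!indices.empty() && indices.size() % 3 == 0);
    return static_cast<double>(countCacheMisses(indices, cacheSize)) /
           static_cast<double>(indices.size() / 3);
}
```

Lower is better: 3.0 means no reuse at all, while a large regular mesh with ideal ordering approaches 0.5. Running the same mesh with several values of `cacheSize` shows how sensitive an ordering is to the cache size you guessed.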

Just use optimized triangle lists.

In certain circumstances, strips can be useful (particularly with primitive restarting). However, when and where these circumstances arise cannot usually be determined a priori. You generally need to have profiling data in hand to make it worthwhile.
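For reference, primitive restarting lets several strips share one index buffer by separating them with a reserved index value. A minimal sketch of building such a buffer, assuming 32-bit indices and the fixed restart value used by GL_PRIMITIVE_RESTART_FIXED_INDEX (the helper name `joinStrips` is made up):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// With GL_PRIMITIVE_RESTART_FIXED_INDEX enabled and GL_UNSIGNED_INT indices,
// the restart value is the largest representable index.
constexpr std::uint32_t kRestartIndex = 0xFFFFFFFFu;

// Concatenates several triangle strips into one index array, inserting the
// restart index between consecutive strips.
std::vector<std::uint32_t> joinStrips(
        const std::vector<std::vector<std::uint32_t>>& strips) {
    std::vector<std::uint32_t> out;
    for (const auto& strip : strips) {
        assert(strip.size() >= 3);  // a strip needs at least one triangle
        if (!out.empty()) {
            out.push_back(kRestartIndex);
        }
        out.insert(out.end(), strip.begin(), strip.end());
    }
    return out;
}
```

Each break costs one extra index, so a mesh that falls apart into many short strips loses much of the N+2 advantage.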

As such, it is best to just pick indexed lists. Maybe later, when you’re trying to squeeze out all the performance you can, you can do some profiling to find out if a stripped mesh could improve performance, bearing in mind that any improvement will be modest.

Basically just stick with the answer that is always not-wrong: optimized lists. They may not be perfect in every case, but they’re 90% of the way there. When the time comes to start looking for that 10% (and remember: this performance only matters if your rendering is bound by vertex T&L or vertex reading bandwidth, which it almost never is), you can investigate alternatives.

ok, thank you both.

When is a triangle list “optimised”?

The meshes I use are typically about 30k triangles in size. Only a few have more than 65536 vertices; for now I use unsigned int as the element / index type. My vertex looks like this:

struct Vertex {
    vec3 Position;
    vec2 Texcoord;
    vec3 Normal;
    vec3 Tangent;
};

I have a GTX 1050 Ti with the latest driver. How big is that “T&L cache” typically? I’d like to know more about it.

A triangle list is “optimized” when it makes good use of the post-vertex-shader and pre-vertex-shader caches (also called the post-T&L and pre-T&L vertex caches, for historical reasons).

Optimizing for the former reduces the number of vertex shader executions required to render meshes. Optimizing for the latter reduces the amount of non-sequential access that the GPU needs to do to pull in the vertex attribute data referenced by your indexed triangle lists.
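For the pre-vertex-shader side, a common and simple approach is to renumber the vertices in the order the index buffer first touches them, after the triangle order has been fixed. The following is a minimal sketch of that idea; the function name and the remap-table layout are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Rewrites `indices` in place so that vertices are numbered in order of first
// use. Returns newToOld, where newToOld[i] is the old index of the vertex that
// now lives at position i. Vertices that are never referenced are dropped.
std::vector<std::uint32_t> reorderByFirstUse(std::vector<std::uint32_t>& indices,
                                             std::size_t vertexCount) {
    constexpr std::uint32_t kUnassigned = 0xFFFFFFFFu;
    std::vector<std::uint32_t> oldToNew(vertexCount, kUnassigned);
    std::vector<std::uint32_t> newToOld;
    newToOld.reserve(vertexCount);
    for (std::uint32_t& idx : indices) {
        assert(idx < vertexCount);
        if (oldToNew[idx] == kUnassigned) {
            oldToNew[idx] = static_cast<std::uint32_t>(newToOld.size());
            newToOld.push_back(idx);
        }
        idx = oldToNew[idx];
    }
    return newToOld;
}
```

The caller then builds the new vertex buffer with `newVertices[i] = oldVertices[newToOld[i]]`, so attribute fetches walk through memory roughly front to back.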

As I recall, for many, many GPU generations (the last 14 years), shared memory on the GPU multiprocessor units (SMs) has been used for the post-T&L vertex cache. The number of vertices that will fit in this cache is dynamic, based on how “fat” your transformed vertex data is. The more varyings you output, the fewer transformed vertices will fit in this cache.

GPU shared memory used to be fairly limited. Not so anymore. Just pick a triangle order optimizer that degrades gracefully across various post-T&L cache sizes and you’ll be fine. Tom Forsyth’s is a good choice.

Another win, possibly bigger, is to be able to concatenate multiple primitives to a single draw call. You can do this with strips as well but it constrains the concatenated primitives to all be strips. A list is just so much more flexible.

ok, thank you all.

I render all my meshes in a single draw call using glMultiDrawElementsIndirect(…). I use gl_DrawID (GLSL 4.6) to get the current “material index” from an SSBO. From the material index I get the material, which contains “texture indices” that are used to fetch the texture sampler from another SSBO containing bindless textures.
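For readers following along, this is roughly what packing several triangle-list meshes into shared vertex and index buffers looks like on the CPU side. The command struct mirrors the layout the OpenGL specification defines for glMultiDrawElementsIndirect; `MeshRange` and `buildCommands` are hypothetical names, and the sketch assumes one instance per mesh.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Field layout defined by OpenGL for indexed indirect draws.
struct DrawElementsIndirectCommand {
    std::uint32_t count;          // number of indices to draw
    std::uint32_t instanceCount;  // number of instances
    std::uint32_t firstIndex;     // offset into the shared index buffer, in indices
    std::int32_t  baseVertex;     // value added to every index
    std::uint32_t baseInstance;   // first instance ID
};

// Hypothetical per-mesh description: sizes of one mesh's vertex and index data.
struct MeshRange {
    std::uint32_t vertexCount;
    std::uint32_t indexCount;
};

// Builds one command per mesh, assuming the meshes are stored back to back in
// the shared vertex and index buffers in the same order as `meshes`.
std::vector<DrawElementsIndirectCommand> buildCommands(
        const std::vector<MeshRange>& meshes) {
    std::vector<DrawElementsIndirectCommand> cmds;
    std::uint32_t firstIndex = 0;
    std::int32_t baseVertex = 0;
    for (const MeshRange& m : meshes) {
        assert(m.indexCount % 3 == 0);  // triangle lists only
        cmds.push_back({m.indexCount, 1u, firstIndex, baseVertex, 0u});
        firstIndex += m.indexCount;
        baseVertex += static_cast<std::int32_t>(m.vertexCount);
    }
    return cmds;
}
```

Because each mesh keeps its own zero-based indices and relies on `baseVertex`, any mesh with at most 65536 vertices could in principle stay on 16-bit indices inside a much larger shared buffer, as long as all meshes in the draw use the same index type.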

Before drawing everything, I dispatch a compute shader which does “invisible object culling” (against the view frustum) and discrete mesh LOD selection (depending on the size of each mesh’s bounding sphere on screen).

I tried a depth prepass, but there is no performance gain (I query the frame time) …

I’ll take a look at Tom Forsyth’s algorithm …