Which way to cover with TRIANGLE_STRIPs is faster?

GLRon · November 28, 2010, 5:44pm

I have a surface that a single non-overlapping triangle strip can’t cover. Which approach is more efficient?

(a) Creating two triangle strips, the first with 10 vertices and the second with 4 vertices (10 triangles total).

(b) Creating one triangle strip with 14 vertices (12 triangles total: one tri overlaps, and one tri degenerates into a line since all its points have the same y- and z-values).

If you know why, that would also be nice to know.
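
For reference, the triangle counts quoted in (a) and (b) follow from the rule that a triangle strip with n vertices produces n - 2 triangles. A minimal Python sketch of that arithmetic (the function name is made up for illustration):

```python
def strip_triangle_count(vertex_count):
    """Triangles produced by one triangle strip with the given vertex count."""
    return max(vertex_count - 2, 0)

# Option (a): two strips of 10 and 4 vertices.
option_a = strip_triangle_count(10) + strip_triangle_count(4)
# Option (b): one strip of 14 vertices (includes the overlapping and degenerate tris).
option_b = strip_triangle_count(14)
print(option_a, option_b)
```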

ugluk · November 28, 2010, 6:43pm

Strange question. The answer is:

b)

Why? Because calls into GL cost CPU time. If you consume too much of it, your app becomes CPU limited.
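
To put a rough number on the call overhead in immediate mode, here is a small Python sketch that counts API calls per option. It assumes exactly one `glVertex*` call per vertex plus one `glBegin`/`glEnd` pair per strip, and ignores any color, normal, or texture-coordinate calls; the helper name is hypothetical.

```python
def immediate_mode_calls(strip_vertex_counts):
    """Count glBegin + glVertex* + glEnd calls for a list of strips."""
    # Each strip costs one glBegin, one call per vertex, and one glEnd.
    return sum(1 + n + 1 for n in strip_vertex_counts)

calls_a = immediate_mode_calls([10, 4])  # option (a): two strips
calls_b = immediate_mode_calls([14])     # option (b): one strip
print(calls_a, calls_b)
```

Under these assumptions the single strip saves one `glBegin`/`glEnd` pair; the per-vertex calls are the same in both cases.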

GLRon · November 28, 2010, 6:47pm

Thanks for explaining!

Why a strange question? I started learning OpenGL yesterday and it wasn’t obvious whether to strive to optimize the # of triangles or the # of calls.

(Although I’m using immediate mode now, and I realize that in itself isn’t ideal for performance.)

ZbuffeR · November 28, 2010, 11:42pm

b) - “one tri overlaps”: this is likely to cause depth fighting and be visually problematic.

Try doing sensible things first, like avoid immediate mode, before doing more dangerous things that would bring limited improvements.

mhagain · November 29, 2010, 12:37am

The first thing you should do is read this:

http://hacksoflife.blogspot.com/2010/01/to-strip-or-not-to-strip.html

Triangle strips are seriously old hat; they were great in 1998, but much, much faster and more flexible ways of handling geometry are now available. These involve considerably less setup on your part, and they are what you should be using instead.

aqnuep · November 29, 2010, 1:00am

If you want to work with triangle strips, you should use primitive restart instead of degenerate or overlapping triangles.
However, I agree with mhagain that strips don’t provide much of a performance benefit nowadays. You should rather go with indexed triangles and try to move away from immediate mode, as it will always result in a CPU bottleneck due to the enormous number of API calls needed.
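
As an illustration of what primitive restart does to an index stream, the Python sketch below models it on the CPU: a reserved index value ends the current strip and starts a new one. The sentinel value `0xFFFF` and the helper name are assumptions for this example; in OpenGL 3.1 and later the value is chosen with `glPrimitiveRestartIndex` and the feature is enabled with `glEnable(GL_PRIMITIVE_RESTART)`.

```python
RESTART = 0xFFFF  # assumed sentinel value for this sketch

def split_on_restart(indices, restart=RESTART):
    """Split one index stream into separate strips at each restart index."""
    strips, current = [], []
    for idx in indices:
        if idx == restart:
            if current:
                strips.append(current)
            current = []
        else:
            current.append(idx)
    if current:
        strips.append(current)
    return strips

# The original poster's case: a 10-vertex strip and a 4-vertex strip in one stream.
stream = list(range(10)) + [RESTART] + list(range(10, 14))
strips = split_on_restart(stream)
triangles = sum(max(len(s) - 2, 0) for s in strips)
print(len(strips), triangles)
```

The stream is drawn with a single call, yet no overlapping or degenerate triangles are generated.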

GLRon · November 29, 2010, 1:03am

> Try doing sensible things first, like avoid immediate mode,

The context is that I’m studying a computer graphics textbook, and it is asking me to use triangle strips (and fans, and quads, and quad strips, and lines, and points, etc.) in immediate mode.

Textbooks usually do not cover the cutting edge of technology, but (hopefully) lay a strong foundation.

> Triangle strips are seriously old hat;

> You should rather go with indexed triangles and try to depart from immediate mode

Thanks, good to know. I read your article on indexed triangles, and will read it again once I get further along in my studies.

Dark_Photon · November 29, 2010, 5:40am

That’s great for getting started, but definitely not a few weeks down the road, when you have your “graphics legs” and want good performance. Look in the index for “vertex cache” (or “post T&L cache”, though that naming is a bit antiquated), and if you don’t find it, you might consider getting a newer graphics book to read alongside it.

A few more excellent blog posts to read regarding triangle strips and why they’re a has-been when it comes to GPU performance, and have been for many years:

ugluk · November 29, 2010, 7:11am

ZBuffer, I got my recommendation from this document:

Rendering Huge Triangle Meshes with OpenGL: Louis Bavoil

To connect 2 strips, use degenerate triangles, or the
GL_NV_primitive_restart extension.
For example, to connect 2 strips of array of indices A and B, you can use:
… A_n-2 A_n-1 A_n-1 B_0 B_0 B_1 B_2…
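
The stitching pattern quoted above can be sketched in Python as follows. The function names are made up for illustration, and the sketch assumes strip A has an even number of indices; with an odd count, the triangles of B would come out with flipped winding unless one more duplicate index is added.

```python
def stitch_strips(a, b):
    """Join two index strips with degenerates: ... A_n-1 A_n-1 B_0 B_0 B_1 ..."""
    return a + [a[-1], b[0]] + b

def strip_to_triangles(strip):
    """Expand a strip into triangles, skipping degenerate (zero-area) ones."""
    tris = []
    for i in range(len(strip) - 2):
        a, b, c = strip[i], strip[i + 1], strip[i + 2]
        if i % 2 == 1:
            a, b = b, a  # every other triangle is flipped to keep winding consistent
        if a == b or b == c or a == c:
            continue  # repeated index: degenerate triangle, nothing is drawn
        tris.append((a, b, c))
    return tris

A = [0, 1, 2, 3]
B = [4, 5, 6, 7]
joined = stitch_strips(A, B)
print(joined)
print(strip_to_triangles(joined))
```

The joined strip yields the same triangles as A and B drawn separately; the two extra indices cost four degenerate triangles at the seam.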

But I see the use of strips is mostly pointless, nowadays.

mhagain · November 29, 2010, 7:22am

> Anyway, if one uses tristrips, can one be reasonably certain they aren’t going to render slower than indexed triangles?

Strips will render slower than indexed triangles because strips are completely unable to make use of the GPU’s vertex cache, meaning that duplicate vertexes will need to be retransformed and/or vertex shaders will need to be run again for them.

The only way to make use of your vertex cache is to use indexes - because the cache stores indexes. You can, of course, order your indexes to replicate a strip layout if you want, but the point is that you need to use indexes.

Note that this only applies in cases where this is actually your application bottleneck. If your bottleneck is elsewhere then you won’t notice a difference. But at the same time that doesn’t mean that you should feel that it’s OK to use strips, because doing so could make this become your bottleneck!

ugluk · November 29, 2010, 7:44am

I’m so used to using indices that I completely forgot to mention that, yeah, I meant indexed tristrips. Now, triangles in a strip are adjacent, so maybe they make good use of the vertex cache?

But if I extrapolate your statements, it seems, that DrawArrays() would then also render slower than, say, DrawElements(), as there are no indices to cache.

About the bottleneck, yeah, I’ve read about it. You need to identify and eliminate them in sequence.

mhagain · November 29, 2010, 7:57am

> Now, triangles in a strip are adjacent, so maybe they make good use of the vertex cache?

Not really. See http://home.comcast.net/~tom_forsyth/blog.wiki.html#Strippers

The ultimate stripper will get you one vertex per triangle. But even a very quick and dirty indexer will get you that, and a good indexer will get close to 0.65 vertices per triangle for most meshes with a 16-entry FIFO vertex cache. The theoretical limit for a regular triangulated plane with an infinitely large vertex cache is 0.5 verts/tri.
So indexes at worst will typically give you the same as strips, with a good setup giving you ~1.5 times the throughput of the best strip.
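
The vertices-per-triangle figures can be explored with a small simulation. The Python sketch below models a FIFO post-transform cache keyed by index and counts misses for an indexed triangle list over a regular grid. It is a simplified model with made-up helper names, not a description of any particular GPU, and the row-major index order used here is deliberately naive rather than cache-optimized.

```python
from collections import deque

def cache_misses(indices, cache_size):
    """Count vertex transforms needed with a FIFO cache of the given size."""
    cache = deque(maxlen=cache_size)  # appending to a full deque evicts the oldest entry
    misses = 0
    for idx in indices:
        if idx not in cache:  # a hit does not reorder entries (FIFO, not LRU)
            misses += 1
            cache.append(idx)
    return misses

def grid_triangle_indices(quads_x, quads_y):
    """Indexed triangle list for a grid of quads, emitted row by row."""
    stride = quads_x + 1
    out = []
    for y in range(quads_y):
        for x in range(quads_x):
            v00 = y * stride + x
            v10 = v00 + 1
            v01 = v00 + stride
            v11 = v01 + 1
            out += [v00, v10, v01, v10, v11, v01]
    return out

indices = grid_triangle_indices(32, 32)
tri_count = len(indices) // 3
vertex_count = 33 * 33

fifo16 = cache_misses(indices, 16) / tri_count
unbounded = cache_misses(indices, vertex_count) / tri_count
print(fifo16, unbounded)
```

On this 32 x 32 grid the naive ordering with a 16-entry cache stays close to one vertex per triangle, which matches the "quick and dirty indexer" case, while a cache large enough to hold every vertex gives 1089 / 2048, or about 0.53, approaching the 0.5 limit as the grid grows. Getting near 0.65 with a small cache requires reordering the triangles, which this sketch does not attempt.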

> But if I extrapolate your statements, it seems, that DrawArrays() would then also render slower than, say, DrawElements(), as there are no indices to cache.

Correct, yes.
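
Converting DrawArrays-style data into DrawElements-style data amounts to deduplicating vertices and recording an index for each original slot. A minimal Python sketch of such a "quick and dirty indexer" follows; the function name is hypothetical, and it assumes vertices are hashable tuples that match exactly (no tolerance for nearly equal floats).

```python
def build_index_buffer(vertices):
    """Deduplicate a vertex list into (unique_vertices, indices)."""
    unique, lookup, indices = [], {}, []
    for v in vertices:
        if v not in lookup:
            lookup[v] = len(unique)  # first time seen: assign the next index
            unique.append(v)
        indices.append(lookup[v])
    return unique, indices

# Two triangles forming a quad, as unindexed "vertex soup" (6 vertices).
soup = [
    (0, 0, 0), (1, 0, 0), (0, 1, 0),
    (1, 0, 0), (1, 1, 0), (0, 1, 0),
]
unique, indices = build_index_buffer(soup)
print(len(unique), indices)
```

The two shared corners are stored once and referenced twice, which is what gives an index-keyed cache something to hit on.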

ugluk · November 29, 2010, 8:18am

Well, there’s still some hardware that does not have a vertex cache (like mobile phones and gadgets). For those, the good ol’ strips are still the way to go.

GLRon · November 29, 2010, 10:51am

> Look in the index for “vertex cache” (or “post T&L cache”,
> though that naming is a bit antiquated) and if you don’t find
> it, you might consider getting a newer graphics book to read
> along side it.

Thanks, I’ll keep that in mind. The next chapter covers vertex arrays and retained mode so at least it’s going in the right direction.

> Well, there’s still some hardware, that does not have vertex
> cache (like mobile phones and gadgets).

I’m also a full-time software developer writing an iOS app, so that’s a great clarification.

iOS programming also requires OpenGL ES, a subset of OpenGL.

mhagain · November 29, 2010, 12:28pm

Indexed triangles are advantageous for more than just the vertex cache. You can, for example, use them to group multiple strips (or fans, polygons, quads, etc) sharing the same state together without needing to fool around with primitive restart or degenerate triangles.
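
As a sketch of that grouping idea, the Python below expands a strip and a fan into one flat triangle index list that could be submitted in a single indexed draw. The helper names and the sample indices are made up for illustration.

```python
def strip_to_list(strip):
    """Expand strip indices into a flat triangle list, preserving winding."""
    tris = []
    for i in range(len(strip) - 2):
        a, b, c = strip[i:i + 3]
        tris += [b, a, c] if i % 2 else [a, b, c]
    return tris

def fan_to_list(fan):
    """Expand fan indices into a flat triangle list around fan[0]."""
    tris = []
    for i in range(1, len(fan) - 1):
        tris += [fan[0], fan[i], fan[i + 1]]
    return tris

# One strip and one fan sharing the same state, merged into a single index list.
batch = strip_to_list([0, 1, 2, 3]) + fan_to_list([4, 5, 6, 7])
print(batch)
```

No restart index or degenerate triangle is needed at the seam, because a plain triangle list has no connectivity between consecutive triangles.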

I would seriously recommend benchmarking them against strips before making any commitment to using strips, and definitely before making any investment in stripping your geometry.

ugluk · November 29, 2010, 4:27pm

Yeah, iPod/iPhone are Steve Jobs’s fodder for the masses. No vertex cache there, I think.