dovkruger, ultimately you need to do your own performance testing to prove to yourself which method is better. But if you don’t, just use well-optimized indexed TRIANGLES, with VBOs when possible. Talking about performance when using immediate mode (glBegin…glEnd) is probably pointless.
Look at the diagram on page 21 here:
OpenGL ES Hardware (Bitboys)
Then search for "post-T&L vertex cache".
Triangle-order optimization addresses vertex transform throughput first and vertex bandwidth second. If you're fill-bound right now, this is all academic and you won't see any difference.
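To put a number on how cache-friendly an index list is, you can simulate the cache offline. This is a rough sketch, assuming a plain FIFO cache; real hardware differs in size and replacement policy, and the names `countCacheMisses` and `acmr` are mine, not from any library. ACMR (average cache miss ratio) is transformed vertices per triangle, lower is better.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Counts how many vertices would have to be transformed (cache misses) when
// the index list is drawn through a FIFO post-T&L cache of the given size.
// ASSUMPTION: pure FIFO replacement; actual GPUs may behave differently.
std::size_t countCacheMisses(const std::vector<std::uint32_t>& indices,
                             std::size_t cacheSize)
{
    assert(cacheSize > 0);
    std::deque<std::uint32_t> cache;  // front = oldest entry
    std::size_t misses = 0;
    for (std::uint32_t idx : indices) {
        if (std::find(cache.begin(), cache.end(), idx) != cache.end())
            continue;                 // hit: a FIFO does not reorder on a hit
        ++misses;                     // miss: vertex must be transformed
        cache.push_back(idx);
        if (cache.size() > cacheSize)
            cache.pop_front();        // evict the oldest entry
    }
    return misses;
}

// Average cache miss ratio for an indexed TRIANGLES list:
// transformed vertices divided by triangle count.
double acmr(const std::vector<std::uint32_t>& indices, std::size_t cacheSize)
{
    assert(!indices.empty() && indices.size() % 3 == 0);
    return static_cast<double>(countCacheMisses(indices, cacheSize)) /
           static_cast<double>(indices.size() / 3);
}
```

Run it on your index buffer before and after reordering, with a few different cache sizes, and compare. It only tells you about vertex transform work, so it won't predict anything if you're fill-bound.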
Re NvTriStrip: it's a great training-wheels library, but it's very slow, and it doesn't handle degenerate geometry (e.g. an edge with more than 2 face neighbors, sometimes used in lower-LOD models).
Now read this:
Fast Vertex Cache Optimization (Tom Forsyth)
Just spend a few hours to read it, understand it, and implement it. You'll be glad you did. It's much faster than NvTriStrip and doesn't fall flat on degenerate geometry (in fact, it doesn't care about mesh topology).
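The heart of that algorithm is a per-vertex score: vertices recently used (high in a modeled LRU cache) and vertices with few triangles left score higher, and you greedily emit the triangle whose three vertices have the best total score. Here's a sketch of just the scoring function. The constants are the ones I recall from the article, so treat them as assumptions and check them against the text; the function and constant names are my own.

```cpp
#include <cassert>
#include <cmath>

// ASSUMPTION: tuning constants as recalled from Forsyth's article.
constexpr int   kCacheSize         = 32;    // size of the modeled LRU cache
constexpr float kCacheDecayPower   = 1.5f;
constexpr float kLastTriScore      = 0.75f; // fixed score for the 3 most recent verts
constexpr float kValenceBoostScale = 2.0f;
constexpr float kValenceBoostPower = 0.5f;

// cachePosition: 0 = most recently used, -1 = not in the cache.
// remainingTris: triangles that use this vertex and are not yet emitted.
float vertexScore(int cachePosition, int remainingTris)
{
    assert(cachePosition < kCacheSize);
    if (remainingTris <= 0)
        return -1.0f;                 // no triangle needs this vertex any more

    float score = 0.0f;
    if (cachePosition < 0) {
        // Not in the cache: no cache contribution.
    } else if (cachePosition < 3) {
        // Used by the last triangle: fixed score, so the optimizer is not
        // pushed into endlessly reusing the same edge.
        score = kLastTriScore;
    } else {
        // Falls off from 1 toward 0 as the vertex nears the cache's end.
        const float scaler = 1.0f / static_cast<float>(kCacheSize - 3);
        score = 1.0f - static_cast<float>(cachePosition - 3) * scaler;
        score = std::pow(score, kCacheDecayPower);
    }

    // Boost vertices with few triangles left so they get finished off
    // instead of being stranded as isolated leftovers.
    score += kValenceBoostScale *
             std::pow(static_cast<float>(remainingTris), -kValenceBoostPower);
    return score;
}
```

Note that nothing in there looks at edges or neighbors, only at per-vertex use counts and cache position, which is why mesh topology doesn't matter to it.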
Now, having optimized for vertex transform throughput with optimized indexed TRIANGLE primitives, what about using indexed TRIANGLE_STRIPs to perhaps save some index bandwidth? For best performance you don't want to break a batch up into one glDraw*Elements call per strip. So that leaves NV_primitive_restart or degenerate triangles. The problem with the former is:
That leaves using degenerate triangles to connect strips. That may net you a little bit of bandwidth shuffling indices around, depending on the tri-optimizer output, but that’s it. Check it and see.
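For reference, stitching strips into one index list looks roughly like this. It's a sketch, and `stitchStrips` is a made-up name. Between strips you repeat the last index of the previous strip and the first index of the next one, plus one extra repeat when needed so the next strip starts on an even position and keeps its winding.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Joins several triangle strips into one index list that can be drawn with a
// single GL_TRIANGLE_STRIP call, using zero-area (degenerate) triangles as
// connectors between consecutive strips.
std::vector<std::uint32_t> stitchStrips(
    const std::vector<std::vector<std::uint32_t>>& strips)
{
    std::vector<std::uint32_t> out;
    for (const auto& strip : strips) {
        assert(strip.size() >= 3);    // each strip holds at least one triangle
        if (!out.empty()) {
            out.push_back(out.back());       // repeat last index of previous strip
            out.push_back(strip.front());    // repeat first index of next strip
            // The next strip must begin at an even position, otherwise its
            // winding order flips. Pad with one more repeat if needed.
            if (out.size() % 2 != 0)
                out.push_back(strip.front());
        }
        out.insert(out.end(), strip.begin(), strip.end());
    }
    return out;
}
```

Two 2-triangle strips {0,1,2,3} and {4,5,6,7} come out as 10 indices, versus 12 for the same four triangles as TRIANGLES. Two 1-triangle strips come out as 9 indices versus 6. So whether you save anything depends entirely on how long the strips are.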
So just use well-optimized indexed TRIANGLES, with VBOs when possible, and benchmark everything else (including your immediate mode version and indexed TRIANGLE_STRIPs with degenerate tris) against that. If you find any method that’s faster, let us know!