This answer comes from an NVIDIA document called “OpenGL Performance”.
It is an excellent document.
These maximize reuse of the vertices shared within a given graphics primitive, and are all similarly fast.
These aggregate (potentially multiple) disjoint triangles and quads, and amortize function overhead over multiple primitives.
A bit slower than the independent triangles and quads.
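To make the vertex-reuse point concrete, here is a minimal sketch in C that counts how many vertices have to be submitted for the same number of triangles. It assumes the connected primitives in question are a single triangle strip or fan, where each triangle after the first adds only one new vertex. The function names are illustrative and do not come from the NVIDIA document.

```c
#include <assert.h>
#include <stddef.h>

/* Vertices submitted for n independent triangles: 3 per triangle. */
size_t independent_triangle_vertices(size_t n)
{
    return 3 * n;
}

/* Vertices submitted for ONE strip or fan holding n triangles:
 * the first triangle needs 3, every further triangle reuses 2
 * earlier vertices and adds 1 new one. */
size_t strip_vertices(size_t n)
{
    return n == 0 ? 0 : n + 2;
}

/* How many vertices the strip saves over independent triangles. */
size_t vertices_saved(size_t n)
{
    size_t independent = independent_triangle_vertices(n);
    size_t strip = strip_vertices(n);
    assert(strip <= independent);  /* a strip never needs more vertices */
    return independent - strip;
}
```

For 100 triangles this gives 300 vertices as independent triangles against 102 as one strip, which is why strips and fans come out ahead when the mesh can be arranged that way.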
DrawElements/DrawArrays Using wglAllocateMemoryNV(size,0,0,1)
Saves data in video memory, eliminating any bus bottleneck. Very poor read/write access.
DrawElements/DrawArrays Using wglAllocateMemoryNV(size,0,0,.5)
Saves data in AGP (uncached) memory, and allows hardware to pull it directly. Very poor read access, and writes must be sequential (see below).
Can encapsulate data in the most efficient manner for hardware, though they are immutable (i.e. once created, you can’t alter them in any way).
Compiled Vertex Arrays (glLockArraysEXT)
Copies locked vertices to AGP memory, so that the hardware can then pull them directly. Only one mode is supported (see q. 7 below).
DrawElements and DrawArrays using Vertex Arrays with Common Data Formats
Optimized to assemble primitives as efficiently as possible and to minimize function call overhead. 13 formats are supported (see q. 6).
The multiple function calls required per primitive result in relatively poor performance compared to the other options above.
All Other Vertex Arrays
Must be copied from application memory to AGP memory before the hardware can pull it. Since data can change between calls, data must be copied every time, which is expensive.
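The AGP (uncached) option in the list above says the memory must be written sequentially and is very poor to read from. A rough sketch of what that means for the fill loop is shown here in C. It only illustrates the access pattern: the `Vertex` layout and the function name are hypothetical, and the destination is an ordinary pointer, so in a real program `dst` would be the block returned by `wglAllocateMemoryNV` rather than normal memory.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vertex layout: position only, three floats. */
typedef struct { float x, y, z; } Vertex;

/* Copies `count` vertices into `dst` in strictly ascending address
 * order and never reads from `dst`, which is the pattern uncached
 * memory favours. Returns the number of floats written. */
size_t write_vertices_sequential(float *dst, const Vertex *src, size_t count)
{
    size_t out = 0;
    assert(dst != NULL && src != NULL);
    for (size_t i = 0; i < count; ++i) {
        dst[out++] = src[i].x;  /* each store goes to the next address */
        dst[out++] = src[i].y;
        dst[out++] = src[i].z;
    }
    return out;
}
```

The thing to avoid is any read-modify-write on the destination (for example `dst[i] += offset`) or scattered writes that jump around the buffer. Do that work in a normal system-memory copy first, then stream the finished data across in one pass.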
- T&L is automatic. The newer generation uses vertex shaders instead of fixed-function T&L, but the GeForce 3 and 4 still support T&L for compatibility.