glDrawElements vs glDrawArrays

I like the way glDrawArrays works; compared with glDrawElements, it looks more elegant.
The NVIDIA docs say that glDrawElements is faster because of potential vertex sharing and such, but is there really a big difference? Is this true on all cards (Windows, Linux)?

V-man

If you know the command specs, you know that glDrawElements only sends indices: the vertex array is pulled entirely by the hardware and transformed to the right coordinates (on some hardware). With glDrawArrays, each vertex's data is sent to the card (causing higher bandwidth usage), and you have to arrange the data differently. I also think NVIDIA's vertex cache only works with glDrawElements (but I don't know that 100%; I just think it should be so…).

Size of data sent to the card:

DrawArrays:
sizeof(vertex) * number of triangles * 3

DrawElements:
sizeof(vertex) * number of vertices + sizeof(index) * number of triangles * 3
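Plugging numbers into the two estimates makes the trade-off concrete. This is a small sketch with hypothetical function names; it assumes each unique vertex is stored exactly once for the indexed case and ignores any caching on the card.

```c
#include <assert.h>
#include <stddef.h>

/* Non-indexed triangle list (glDrawArrays): every triangle carries
   its own three vertices. */
size_t draw_arrays_bytes(size_t vertex_size, size_t triangles)
{
    return vertex_size * triangles * 3;
}

/* Indexed triangle list (glDrawElements): each unique vertex is stored
   once, plus three indices per triangle. */
size_t draw_elements_bytes(size_t vertex_size, size_t unique_vertices,
                           size_t index_size, size_t triangles)
{
    /* OpenGL index types are 1, 2 or 4 bytes wide
       (GL_UNSIGNED_BYTE, GL_UNSIGNED_SHORT, GL_UNSIGNED_INT). */
    assert(index_size == 1 || index_size == 2 || index_size == 4);
    return vertex_size * unique_vertices + index_size * triangles * 3;
}
```

For example, with a 32-byte vertex, 16-bit indices and a mesh of 1000 vertices and 2000 triangles (closed meshes have roughly twice as many triangles as vertices), draw_arrays_bytes(32, 2000) gives 192000 bytes while draw_elements_bytes(32, 1000, 2, 2000) gives 44000. If nothing is shared (6000 unique vertices for the same 2000 triangles), the indexed version needs 204000 bytes, slightly more than the non-indexed one.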

So in every case, the second will be far faster. That is true for today's cards and for PCI cards too (in fact, it matters even more on PCI cards because of their very limited bandwidth).

That is not necessarily true. If there is no (or very little) vertex reuse, then DrawArrays will be faster. Without vertex reuse, there is little need to bother with indexing into an array. Remember, regardless of whether you use DrawElements or DrawArrays, you will still need to transfer:

sizeof(vertex) * # triangles * 3

For a particle system, for example, DrawArrays would probably be as good as or better than DrawElements in terms of performance, since there is little vertex reuse.
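Whether indexing helps depends on how often each vertex is actually referenced. A rough measure is the number of indices divided by the number of distinct vertices they touch: close to 1.0 means essentially no reuse (the case described above), while a regular grid drawn with GL_TRIANGLES approaches 6.0. A sketch assuming 32-bit unsigned indices; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Average number of references per distinct vertex:
   index_count / (number of distinct indices). */
double vertex_reuse_ratio(const unsigned *indices, size_t index_count,
                          size_t vertex_count)
{
    unsigned char *seen;
    size_t i, distinct = 0;

    assert(index_count > 0);
    seen = calloc(vertex_count, 1);  /* one "already seen" flag per vertex */
    if (seen == NULL)
        return 0.0;
    for (i = 0; i < index_count; ++i) {
        assert(indices[i] < vertex_count);
        if (!seen[indices[i]]) {
            seen[indices[i]] = 1;
            ++distinct;
        }
    }
    free(seen);
    return (double)index_count / (double)distinct;
}
```

A quad split into two triangles that share an edge, {0,1,2, 2,1,3}, gives 6 / 4 = 1.5; six indices that all point at different vertices give 1.0.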

In general the DrawElements method should be faster, because if vertices are shared by two or more faces (and it should be like that), then they aren't sent through the bus twice or more.

But I've got no models (3DS models) for testing this. My models aren't optimized, so they consist of three times as many vertices as faces. In this case DrawElements is as "slow" as DrawArrays.
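Models with three times as many vertices as faces are usually plain "triangle soup": every face carries its own copies of its corners. Welding identical vertices turns that into the shared-vertex data glDrawElements can take advantage of. A minimal sketch, assuming a hypothetical position-only Vertex type; a real 3DS loader would also have to compare normals and texture coordinates, and would use a hash table instead of this quadratic search.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vertex layout: position only. */
typedef struct {
    float x, y, z;
} Vertex;

/* Exact comparison on purpose: only true duplicates are merged. */
static int same_vertex(Vertex a, Vertex b)
{
    return a.x == b.x && a.y == b.y && a.z == b.z;
}

/* Collapses duplicates in a triangle soup into unique vertices plus an
   index list. out_vertices and out_indices need room for count entries.
   Returns the number of unique vertices written. */
size_t weld_vertices(const Vertex *soup, size_t count,
                     Vertex *out_vertices, unsigned *out_indices)
{
    size_t unique = 0;
    size_t i, j;

    assert(count % 3 == 0);  /* three corners per face */
    for (i = 0; i < count; ++i) {
        /* Look for an identical vertex that was already kept. */
        for (j = 0; j < unique; ++j) {
            if (same_vertex(soup[i], out_vertices[j]))
                break;
        }
        if (j == unique)  /* not found: keep it as a new vertex */
            out_vertices[unique++] = soup[i];
        out_indices[i] = (unsigned)j;
    }
    return unique;
}
```

Two triangles forming a quad (six soup vertices) weld down to four vertices with the index list {0,1,2, 2,1,3}.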

A broad statement, but given the same data, e.g.
GLuint indices[] = {0,1,2,3,4,5};
glDrawElements(GL_TRIANGLES, 6, GL_UNSIGNED_INT, indices);
glDrawArrays(GL_TRIANGLES, 0, 6);

drawarrays is always faster than drawelements,
but the problem is that with most data you usually can't use drawarrays, though I believe there are exceptions.
I believe the NVIDIA extension VAR/fence only works with drawelements (I've never used it, so this could be wrong).

Originally posted by zed:
(1) drawarrays is always faster than drawelements
(2) but the problem is that with most data you usually can't use drawarrays.

  1. Only because you don’t use shared vertices

  2. You can always use drawarrays. It's just that very often you'll send the same data many times (each time a vertex is reused, its data must be sent again).
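Point 2 can be made concrete: any indexed mesh can be turned into a flat array for glDrawArrays by copying vertices[indices[i]] for every index, and that copy is exactly where the duplicated data comes from. A sketch with a hypothetical position-only Vertex type and function name:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical vertex layout: position only. */
typedef struct {
    float x, y, z;
} Vertex;

/* Builds the non-indexed stream glDrawArrays would need:
   out[i] = vertices[indices[i]]. Every reuse of a vertex becomes a
   full copy of its data. out needs room for index_count entries. */
void expand_indexed(const Vertex *vertices, size_t vertex_count,
                    const unsigned *indices, size_t index_count,
                    Vertex *out)
{
    size_t i;

    for (i = 0; i < index_count; ++i) {
        assert(indices[i] < vertex_count);
        out[i] = vertices[indices[i]];
    }
}
```

With an identity index list such as the {0,1,2,3,4,5} in zed's example, the output is simply a copy of the input, which is why both calls draw the same triangles in that case.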

“I believe the NVIDIA extension VAR/fence only works with drawelements”

It is very possible to use VAR and glDrawArrays.

Some people have had problems with “shared vertices, but not shared normals or tex coords at those same vertices”. I suppose the solution would be to have another array of normals or tex coords and give the pointer via

glNormalPointer()
glTexCoordPointer()
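One caveat: with standard OpenGL vertex arrays, a single index selects the same element from every enabled array (vertex, normal, texcoord) at once, so a separate normal array alone cannot pair one position with several different normals. The usual workaround is to split such vertices so that each distinct (position, normal) pair becomes its own vertex. The sketch below assumes a model that stores separate position and normal indices per triangle corner (as some model formats do); the Corner type and function name are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One triangle corner from a model that indexes positions and normals
   separately (hypothetical layout). */
typedef struct {
    unsigned pos;
    unsigned nrm;
} Corner;

/* Produces one index per corner for glDrawElements. Each distinct
   (pos, nrm) pair becomes one output vertex; out_pairs[k] says which
   position and normal output vertex k copies. Both output arrays need
   room for corner_count entries. Returns the number of output vertices. */
size_t unify_corners(const Corner *corners, size_t corner_count,
                     Corner *out_pairs, unsigned *out_indices)
{
    size_t unique = 0;
    size_t i, j;

    assert(corner_count % 3 == 0);
    for (i = 0; i < corner_count; ++i) {
        for (j = 0; j < unique; ++j) {
            if (out_pairs[j].pos == corners[i].pos &&
                out_pairs[j].nrm == corners[i].nrm)
                break;
        }
        if (j == unique)  /* new combination: emit a new vertex */
            out_pairs[unique++] = corners[i];
        out_indices[i] = (unsigned)j;
    }
    return unique;
}
```

Two triangles that share an edge but use different normals on each side (a hard crease) end up with six vertices; if the normals along the edge match, they share and only four are needed.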

Another thing that troubles me is which primitives to use. I can just use GL_TRIANGLES and make a single call to glDrawElements or glDrawArrays, but if I'm also using GL_TRIANGLE_STRIP and GL_TRIANGLE_FAN, then there are going to be a whole lot of calls to those functions!
Has anyone noticed that when you launch Quake 3, it says “using a single call to glDrawElements”.

V-man
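On the single-call question: a triangle strip can be rewritten as a plain triangle list, so several strips can be concatenated into one index array and drawn with a single glDrawElements(GL_TRIANGLES, ...) call. A sketch for one strip, assuming unsigned int indices; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Converts one triangle strip (n indices, n - 2 triangles) into a
   GL_TRIANGLES index list. Every second triangle swaps its first two
   indices so all triangles keep the same winding (front/back side).
   out must hold 3 * (n - 2) indices. Returns the number written. */
size_t strip_to_list(const unsigned *strip, size_t n, unsigned *out)
{
    size_t i, k = 0;

    assert(n >= 3);
    for (i = 0; i + 2 < n; ++i) {
        if (i % 2 == 0) {
            out[k++] = strip[i];
            out[k++] = strip[i + 1];
        } else {
            out[k++] = strip[i + 1];
            out[k++] = strip[i];
        }
        out[k++] = strip[i + 2];
    }
    return k;
}
```

The price is three indices per triangle instead of roughly one per triangle for a long strip.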

Nope. Quake doesn’t draw it with just one single call to glDrawElements. However, it draws most of the geometry with glDrawElements.

As Korval says, I was wrong that you can't use VAR + drawarrays; it's drawarrays + the vertex cache that doesn't work together.

But I still believe I'm right about drawarrays being quicker than drawelements in the example I gave above. Unfortunately, most models don't come nicely ordered like that (particle systems being an exception), because there's a lot of vertex sharing.

Quake 3 does its own backface culling and sends practically everything with glDrawElements( GL_TRIANGLES, … ),
which was the best way then, but not anymore now that hardware T&L is used.
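The post does not say how Quake 3 implements its culling; the sketch below is only a generic object-space version of the idea, assuming counter-clockwise front faces and an eye position in the same space as the vertex positions. Back-facing triangles are dropped from the index list before it is handed to glDrawElements. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    float x, y, z;
} Vec3;

static Vec3 vsub(Vec3 a, Vec3 b)
{
    Vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

static Vec3 vcross(Vec3 a, Vec3 b)
{
    Vec3 r = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
    return r;
}

static float vdot(Vec3 a, Vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* Keeps a triangle only if its front side faces the eye. Surviving
   indices are packed into out (room for index_count entries).
   Returns the number of indices kept. */
size_t cull_backfaces(const Vec3 *pos, const unsigned *indices,
                      size_t index_count, Vec3 eye, unsigned *out)
{
    size_t t, k = 0;

    assert(index_count % 3 == 0);
    for (t = 0; t < index_count; t += 3) {
        Vec3 a = pos[indices[t]];
        Vec3 b = pos[indices[t + 1]];
        Vec3 c = pos[indices[t + 2]];
        Vec3 n = vcross(vsub(b, a), vsub(c, a));  /* face normal */

        if (vdot(n, vsub(eye, a)) > 0.0f) {
            out[k++] = indices[t];
            out[k++] = indices[t + 1];
            out[k++] = indices[t + 2];
        }
    }
    return k;
}
```

With hardware T&L the GPU can reject back faces itself (glEnable(GL_CULL_FACE)), which is why a CPU pass like this tends not to pay off anymore.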

Isn't the vertex data already on the GPU when we call glDrawElements and glDrawArrays?
I assumed that array buffers are stored on the GPU.

You are replying to a post from nearly two decades ago. GPUs were different back then.

I made a test on my GeForce 1080 Ti, rendering a 0x1000 x 0x1000 grid.
So 16777216 quads, or 33554432 triangles.
Here are the results:
FPS 92 glDrawArrays( GL_QUADS, … )
FPS 58 glDrawArrays( GL_TRIANGLES, … )
FPS 16 glDrawElements( GL_QUADS, … )
FPS 11 glDrawElements( GL_TRIANGLES, … )
Other grid sizes gave similar speed differences.
So it looks to me that, from a performance point of view, glDrawArrays is the better option (in 2020).
The exceptions are if you are at risk of running out of memory, or if you are modifying the mesh between frames.
Also, it's worth noting that the best result (GL_QUADS) is theoretically illegal in modern OpenGL.

I know this thread is old, but it's the first result when searching for a comparison of glDrawArrays and glDrawElements, so I decided it was worth sharing my research here.
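For reference, the vertex and index counts behind the four configurations can be worked out directly. The sketch assumes each quad is sent as four vertices (GL_QUADS) or six (two independent triangles) when not indexed, and that the indexed variants share the (n + 1) x (n + 1) grid corners; it says nothing about where the data lives. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t vertices;  /* vertices stored in the vertex array */
    uint64_t indices;   /* indices stored (0 for glDrawArrays) */
} DrawCounts;

/* Counts for an n x n grid of quads. indexed: glDrawElements instead of
   glDrawArrays. triangles: GL_TRIANGLES (6 corners per quad) instead of
   GL_QUADS (4 corners per quad). */
DrawCounts grid_counts(uint64_t n, int indexed, int triangles)
{
    uint64_t per_quad = triangles ? 6 : 4;
    DrawCounts c;

    assert(n > 0);
    if (indexed) {
        c.vertices = (n + 1) * (n + 1);  /* shared grid corners */
        c.indices = per_quad * n * n;
    } else {
        c.vertices = per_quad * n * n;   /* every corner duplicated */
        c.indices = 0;
    }
    return c;
}
```

For n = 0x1000 this gives 67,108,864 vertices for glDrawArrays(GL_QUADS), 100,663,296 for glDrawArrays(GL_TRIANGLES), and 16,785,409 shared vertices plus 100,663,296 indices for indexed triangles (402,653,184 bytes of indices if they are 32-bit).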

Given those numbers, I’m going to assume that you’re using client-side arrays for either or both of the vertex arrays and element (index) arrays. I’d expect the numbers to be much closer if both were in GPU memory (buffer objects). In practical applications glDrawElements would be faster because you wouldn’t be duplicating the shared vertices, resulting in fewer vertex shader invocations.
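"Fewer vertex shader invocations" comes from the post-transform vertex cache: when an index repeats while its vertex is still cached, the shader does not have to run again. A crude model is a FIFO of the last few indices, counting one invocation per miss. Real GPU caches are not documented to behave exactly like this and the cache size is a free parameter, so the numbers are only useful for comparing index orders. The function name and CACHE_MAX limit are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_MAX 64

/* FIFO model of a post-transform cache holding the last cache_size
   indices. Returns the number of misses (modelled shader runs). */
size_t count_cache_misses(const unsigned *indices, size_t count,
                          size_t cache_size)
{
    unsigned fifo[CACHE_MAX];
    size_t filled = 0, head = 0, misses = 0;
    size_t i, j;

    assert(cache_size > 0 && cache_size <= CACHE_MAX);
    for (i = 0; i < count; ++i) {
        int hit = 0;
        for (j = 0; j < filled; ++j) {
            if (fifo[j] == indices[i]) {
                hit = 1;
                break;
            }
        }
        if (!hit) {
            ++misses;
            if (filled < cache_size) {
                fifo[filled++] = indices[i];
            } else {
                fifo[head] = indices[i];  /* evict the oldest entry */
                head = (head + 1) % cache_size;
            }
        }
    }
    return misses;
}
```

In this model an unindexed stream never repeats an index, so every vertex is a miss; the shared quad {0,1,2, 2,1,3} costs four misses instead of six.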

And how did you render them? A 1k X 1k grid has tons of vertex reuse, but glDrawArrays can’t do much of it, and it has far less reuse with GL_TRIANGLES.

Optimized indexed mesh data should mean that the number of vertices in your data is exactly 1k * 1k, or 1M-vertices. If you’re passing the same vertex arrays to your indexed rendering commands as your array commands, then it’s going to have to process the same bloated amount of data. So of course it will be slower.

If you compare optimized indexed arrays, then you’ll find improved performance.

I had optimised data for glDrawElements: exactly 0x1001 * 0x1001 vertices.
But before GClements' comment I didn't know you can store indices on the GPU.
I had never seen anyone do that before.
It was hard to google information about it, but I finally found an example of code doing that. I haven't tested it yet.
When you read the OpenGL documentation, it doesn't even suggest that the indices argument can be NULL.
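The index data for the grid can be generated once and uploaded with glBufferData into a buffer object bound to GL_ELEMENT_ARRAY_BUFFER. While such a buffer is bound, the last argument of glDrawElements is interpreted as a byte offset into it rather than a client pointer, which is why NULL (offset 0) works. A sketch of a row-by-row generator, assuming the (n + 1) x (n + 1) vertices are stored row-major and 32-bit indices (GL_UNSIGNED_INT); the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* GL_TRIANGLES indices for an n x n grid of quads whose vertex (x, y)
   is stored at y * (n + 1) + x. Two counter-clockwise triangles per quad
   (with y pointing up). out needs 6 * n * n entries.
   Returns the number of indices written. */
size_t grid_indices(unsigned n, unsigned *out)
{
    unsigned x, y;
    unsigned stride = n + 1;
    size_t k = 0;

    assert(n > 0);
    for (y = 0; y < n; ++y) {
        for (x = 0; x < n; ++x) {
            unsigned v0 = y * stride + x;  /* bottom-left  */
            unsigned v1 = v0 + 1;          /* bottom-right */
            unsigned v2 = v0 + stride;     /* top-left     */
            unsigned v3 = v2 + 1;          /* top-right    */

            out[k++] = v0; out[k++] = v1; out[k++] = v2;
            out[k++] = v2; out[k++] = v1; out[k++] = v3;
        }
    }
    return k;
}
```

The result would then be drawn with something like glDrawElements(GL_TRIANGLES, (GLsizei)count, GL_UNSIGNED_INT, (void *)0) while the element buffer is bound (in a core profile that binding is part of the VAO state).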

FWIW, it’s 0x1000 (=4096), not 1000 (hence the “16,777,216 quads, or 33,554,432 triangles”).

If it's 4096x4096 quads, that should be 4097x4097 vertices. If nothing's shared (i.e. 4096x4096x4 or 4096x4096x6 vertices), there isn't really much of an advantage to using glDrawElements; but there shouldn't be much of a disadvantage either (unless you do something dumb like storing the arrays in client memory). To get the maximum advantage, you need to sort the quads to obtain the maximum benefit from the vertex cache. Sorting the quads along a Hilbert curve or strip-mining with a strip width based upon the cache size will be an improvement over rendering them row-by-row.
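Strip-mining only changes the order in which quads are emitted: the grid is cut into vertical strips a few quads wide and each strip is walked row by row, so the vertices shared with the previous row of the strip are more likely to still be cached. A sketch using the same row-major vertex layout as a plain grid; strip_width is an assumed tuning parameter that would have to be matched to the actual cache size, the function name is hypothetical, and the Hilbert-curve ordering is not shown.

```c
#include <assert.h>
#include <stddef.h>

/* GL_TRIANGLES indices for an n x n grid of quads (vertex (x, y) at
   y * (n + 1) + x), emitted strip by strip: all rows of columns
   [0, w), then all rows of [w, 2w), and so on.
   out needs 6 * n * n entries. Returns the number of indices written. */
size_t grid_indices_strips(unsigned n, unsigned strip_width, unsigned *out)
{
    unsigned x0, x, y, x_end;
    unsigned stride = n + 1;
    size_t k = 0;

    assert(n > 0 && strip_width > 0);
    for (x0 = 0; x0 < n; x0 += strip_width) {
        x_end = (x0 + strip_width < n) ? x0 + strip_width : n;
        for (y = 0; y < n; ++y) {
            for (x = x0; x < x_end; ++x) {
                unsigned v0 = y * stride + x;
                unsigned v1 = v0 + 1;
                unsigned v2 = v0 + stride;
                unsigned v3 = v2 + 1;

                out[k++] = v0; out[k++] = v1; out[k++] = v2;
                out[k++] = v2; out[k++] = v1; out[k++] = v3;
            }
        }
    }
    return k;
}
```

With strip_width >= n this degenerates into the plain row-by-row order.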

It seems that storing indices on the GPU works. I added 2 more tests. Now it looks like this:

FPS 154		glDrawElements( GL_QUADS, ... , NULL ) + GL_ELEMENT_ARRAY_BUFFER
FPS 154		glDrawElements( GL_TRIANGLES, ... , NULL ) + GL_ELEMENT_ARRAY_BUFFER
FPS  92		glDrawArrays( GL_QUADS, ... )
FPS  58		glDrawArrays( GL_TRIANGLES, ... )
FPS  16		glDrawElements( GL_QUADS, ... )
FPS  11		glDrawElements( GL_TRIANGLES, ... )

Now I need to rewrite the code I've been writing for the last few months.
But I'm grateful for that information.
Thank you.