glVertex OR glDrawElements in DisplayLists?

Does anyone know the performance
differences of using glVertex

compared to

glDrawElements/glDrawArrays

with a display list?
I guess there is a difference, since it
is only the commands that are stored
and optimized.
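For concreteness, the two variants being compared might look roughly like this. This is a sketch only: it assumes an existing OpenGL 1.x context plus `verts`/`indices` arrays already filled in, and will not run on its own:

```c
/* Variant A: immediate-mode glVertex calls compiled into a display list */
GLuint listA = glGenLists(1);
glNewList(listA, GL_COMPILE);
glBegin(GL_TRIANGLES);
for (int i = 0; i < numIndices; ++i) {
    const GLuint v = indices[i];
    glVertex3fv(&verts[3 * v]);
}
glEnd();
glEndList();

/* Variant B: a vertex array drawn via glDrawElements, compiled the same way.
 * Per the spec, the array data is dereferenced at compile time, so the list
 * captures the same geometry as variant A. */
GLuint listB = glGenLists(1);
glNewList(listB, GL_COMPILE);
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, verts);
glDrawElements(GL_TRIANGLES, numIndices, GL_UNSIGNED_INT, indices);
glDisableClientState(GL_VERTEX_ARRAY);
glEndList();

/* Either list is later replayed with glCallList(listA) / glCallList(listB). */
```

Whether the driver stores the two lists differently internally is exactly the open question.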


My OpenGL book states that the array functions would be wrapped into normal glVertex calls internally, so it’s probably only a bit faster during display list creation.

Not necessarily correct. Yes, the specification explains vertex arrays in terms of glVertex/Color/Normal* etc. calls, but that is not how things need to be implemented. In fact, any implementation that does this is going to be horribly slow at rendering vertex arrays.

Using vertex arrays within display lists can be faster than using glVertex calls. In fact, it might be that the driver tries to optimize a display list built with glVertex calls into a format that looks like a vertex array internally.


The fact is that it is internally exactly the same. The driver won’t spawn glVertex calls, but it will move the same data over the bus in the end! Meaning, the format doesn’t change.

What is internally exactly the same?


There is no way to tell which would be faster without having the source code to the display list building part of the implementation. Neither one may be faster; the display list might be the exact same thing either way. The only way to find out is to benchmark it yourself.

The geometry representation is exactly the same internally. The only thing is that you replaced many calls with just one, but the order in which the geometry is processed should be exactly the same. So why should the driver produce different display list contents?

The geometry described is the same, that is correct. However, the tags that go with the geometry to describe what it is are not necessarily the same. A vertex array needs only a few tags to describe to the hardware what it is; with glVertex calls, by contrast, a tag might be stored for each vertex. If the display list is left in this state, the one built with vertex arrays has less to transfer over the bus, and could be faster (if rendering is bus limited). Since it is a display list, the driver can probably optimize the per-vertex tags away, and the data will look like a vertex array again. However, that takes time, and display list build time is sometimes important to applications as well. If the rendering is not bus limited, removing tags from the vertices is a waste of time.

Thus the answer is, it all depends on the OpenGL implementation and the system it is running on. Benchmarking it is the only way to find out which one is faster. If you do benchmark, make sure you look at display list build time as well if that is important to you.


Personally I think that display lists are a total waste of time. I’ve built them before and got no fps increase whatsoever. In fact I get better fps just by drawing each face separately using a vertex array. I’ve tried this on both an AMD Athlon 1.4 with a GTS2 and a Duron 800 @ 1000 with a GeForce DDR. Apart from the speed differences, neither system shows that display lists are faster. Either that or I’m really doing something wrong.

Kaos, we were comparing vertex arrays built into a display list vs glVertex() calls built into a display list. You’re comparing vertex arrays to vertex arrays compiled into a display list, or in other words immediate-mode vertex arrays vs display-list-mode vertex arrays.

Display list mode vertex arrays have the potential to be faster than immediate mode vertex arrays, because the data can be cached by the OpenGL implementation. Whether it actually makes any difference depends on where the bottleneck is. Display list vertex arrays can alleviate the AGP bus as a bottleneck. If you’re rendering something that is rasterization bound (lots of textures maybe?) it is not going to matter whether your vertex array is in a display list or not.


Sorry, I was just saying, that’s all… Plus please take into account I’ve only been coding OpenGL for 2-3 months (on an on/off basis). So I’ll leave you to the posting unless any of you friendly people want to help in this post area… Please don’t say “get NeHe”. Got it, used it and surpassed it in areas.

Originally posted by Korval:
There is no way to tell which would be faster without having the source code to the display list building part of the implementation. Neither one may be faster; the display list might be the exact same thing either way. The only way to find out is to benchmark it yourself.

I think barthold would have access to at least one good source code implementation.

“I think barthold would have access to at least one good source code implementation.”

True, but there’s no way to know how nVidia’s or ATI’s implementations work without someone who wrote it telling us. They don’t all have to do the same thing. One could envision an implementation that compiles glVertex calls into an internal vertex array format.

NVIDIA says that glVertex3f is much slower than glDrawElements(…), and I think that’s logical, because you have more calls to make when using glVertex3f. Even a display list can’t outweigh this. But if you’re not sure, take a big object, let’s say 20,000 or 30,000 faces, and render it both ways.

Then you can see the difference.

Yes, of course batched drawing is faster. But I think the question was whether the compiled display list would be faster.
Thus, compiling should be faster, whereas the end product should be the same?

Ok, I’ll chip in one more time :) There’s no way to determine up-front which way is faster; benchmarking it will tell you.

Yes, glVertex* has the potential to be slower, because there is more call overhead than with glDrawElements. However, that doesn’t matter a bit if it is not the bottleneck. (Maybe your app is severely rasterization limited.)

Rendering performance has at least 4 potential bottlenecks. Unfortunately performance issues often are an interaction of several of them.

  1. The speed at which data can be moved from the application to OpenGL (glVertex calls, vertex arrays, downloading textures, etc.)

  2. The speed at which data can be moved from OpenGL to the graphics card (DMA or fast-writes over the AGP bus)

  3. The speed at which geometry transformation, lighting, clipping, vertex shader programs etc can be done.

  4. The speed at which rasterization, (texture mapping, are you texture memory limited? z-buffering, fragment shading etc etc) can be done.

Each of these main points can be subdivided into many sub-items. It is the job of the OpenGL IHV to balance all these issues as best as they can. Unfortunately it all changes per platform (the Pentium-4 increased memory bandwidth by a huge amount over the Pentium-3). Therefore IHVs try to educate application developers on ‘dos and don’ts’ to get good performance. We also sometimes extend OpenGL to take advantage of the latest technology, in a way that does not expose the latest HW features directly. That is done so that OpenGL stays flexible, and a long term viable API.

In summary, benchmark it if you care about performance and you have a choice of rendering your data one way or the other. Of course you should not ignore any general performance guidelines from IHVs. But try to understand the context for these guidelines.


Originally posted by ffish:
I think barthold would have access to at least one good source code implementation.

Three by now. The first one was the drivers for the original Dynamic Pictures Oxygen family. DP then got bought by 3Dlabs, thus I got to learn their code. Then a while later 3Dlabs acquired the Wildcat division from Intergraph, and I got to learn that code. It is certainly interesting to see how different companies have solved the same problem (build an OpenGL engine) in different ways.