Display lists vs. vertex arrays

Warzywo · September 28, 2008, 1:54am

Hi, in your opinion, which is faster for rendering? I would like to render complex meshes like terrain as well as simpler ones like solid objects. When I run some tests I will post the results here, but I would like some input first :). Thanks

Zengar · September 28, 2008, 2:33am

Display lists (DLs) and VBOs should be about the same speed. It is preferable to use VBOs for everything (as they allow maximal flexibility), but display lists are still a good choice for static objects.

Warzywo · September 28, 2008, 4:44am

DLs are easier to implement ;), but vertex arrays are the fastest. I did some tests (without VBOs):

Renderer: ATI Radeon HD 2600 Pro AGP

terrain mesh - 1024 vertices:

normal ~ 195 fps
DLs ~ 202 fps
v.arrays ~ 205 fps

terrain mesh - 65536 vertices:

normal ~ 49 fps
DLs ~ 87 fps
v.arrays ~ 94 fps

resolution: 1430x940, multisampling 4x
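FPS is the reciprocal of frame time, so equal FPS gaps do not represent equal time savings. A minimal sketch (the names and dictionary layout are illustrative only) that converts the figures above into milliseconds per frame:

```python
def frame_ms(fps: float) -> float:
    """Convert a frame rate (frames per second) to frame time in milliseconds."""
    return 1000.0 / fps

# FPS figures reported above (Radeon HD 2600 Pro AGP, 1430x940, 4x multisampling).
results = {
    "1024 vertices":  {"normal": 195, "DLs": 202, "v.arrays": 205},
    "65536 vertices": {"normal": 49,  "DLs": 87,  "v.arrays": 94},
}

for mesh, by_method in results.items():
    print(mesh)
    for method, fps in by_method.items():
        print(f"  {method:9s} {fps:4d} fps = {frame_ms(fps):6.2f} ms/frame")
```

In milliseconds, the 1024-vertex results all fall within about 0.25 ms of each other, while on the 65536-vertex mesh vertex arrays save roughly 9.8 ms per frame over the "normal" path and about 0.86 ms over DLs. Note that these frame times also include fixed costs such as fill at this resolution with multisampling, so they do not isolate vertex throughput.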

Zengar · September 28, 2008, 5:45am

Actually, on a reasonable implementation, DLs should be faster than plain client-side vertex arrays.

zeoverlord · September 28, 2008, 7:51am

I don’t know about that. I once tried two different methods of rendering shadow volumes, with immediate mode and with VBOs. All the data was rewritten every time, but the VBO version was still much faster.
If I can do that, then surely the driver can too, with vertex arrays and DLs.
Judging from the numbers Warzywo posted, I’d say that is the case for both.
But vertex arrays are already formatted correctly, so I guess that would account for the difference.

Zengar, I agree with you that it might seem like DLs should be able to reach VBO speed, since there is a simple way to tell whether the data has changed (a display list only changes when it is recompiled), which means the driver doesn’t have to build new VBOs internally every time.
My guess is that they just don’t care; DLs have been pretty much deprecated for some time now (and officially so in OpenGL 3.0).

Warzywo, would you care to rerun that experiment with VBOs added to the mix? I think you will find that VBOs are the best option for static data.

Brolingstanz · September 28, 2008, 11:56am

The funny thing is that DX11 is now introducing DLs as part of its new multithreading (MT) solution. They amount to single-threaded command buffers, precompiled for deferred execution in the main render thread…

Korval · September 28, 2008, 1:47pm

Longs Peak Reloaded was going to provide something not entirely unlike display lists too.

knackered · September 29, 2008, 12:42am

Display lists are notoriously slow on ATI hardware. NVidia display lists are screamingly fast, though (just geometry).

Dark_Photon · September 30, 2008, 7:21am

I heartily second that on NVidia.

I should qualify that. Our product only uses display lists on NVidia for what was recently called here “geometry-only display lists”, i.e. no captured OpenGL state changes, just the raw batches (vertex array binds/enables and batch submission). NVidia is super fast at rendering these. I could not get VBOs to come close to them (and I tweaked formats and packings like crazy; and yes, I’m using a good indexed triangle optimizer).

For instance, some past numbers I captured:

Client arrays: 16.7ms draw
Server arrays (VBOs): 11.4ms draw (31% less draw time)
Geometry-only display lists: 7.4ms draw (56% less draw time)

These are draw times only; they exclude culling and other per-frame overhead.
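The percentages in the list above are reductions in draw time relative to client arrays. Worked out from the same three timings, the reductions and the equivalent speedup factors are:

```latex
\begin{aligned}
\text{VBOs:}\quad & \frac{16.7 - 11.4}{16.7} \approx 0.317, & \frac{16.7}{11.4} &\approx 1.46\times \\
\text{Display lists:}\quad & \frac{16.7 - 7.4}{16.7} \approx 0.557, & \frac{16.7}{7.4} &\approx 2.26\times \\
\text{DLs vs. VBOs:}\quad & & \frac{11.4}{7.4} &\approx 1.54\times
\end{aligned}
```

So in throughput terms the geometry-only display lists draw this data a bit more than twice as fast as client arrays and about 1.5 times as fast as the VBOs.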

Until I can get the same or better performance from VBOs, I don’t want to give up OpenGL display lists. OpenGL currently doesn’t expose enough information about the driver’s “fast path” to reproduce it, and perhaps not even enough functionality.

CatDog · September 30, 2008, 7:30am

Do you optimize your arrays anyway, before putting them into the lists? Does this make a difference?

CatDog

Dark_Photon · September 30, 2008, 9:55am

Yes, I always submit optimized arrays. ACMR (average cache miss ratio, i.e. post-transform vertex cache misses per triangle) is in the 0.7-0.9 range, so they aren’t too bad. But you do raise an interesting question: does NVidia’s display list implementation re-optimize triangle order?
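ACMR can be estimated offline by replaying an index buffer through a simulated vertex cache. A minimal sketch, assuming a simple FIFO cache of 16 entries (both the policy and the size are assumptions; real hardware differs between vendors and generations):

```python
from collections import deque

def acmr(indices, cache_size=16):
    """Average cache miss ratio: simulated vertex-cache misses per triangle.

    Models the post-transform vertex cache as a FIFO with `cache_size`
    entries. A hit does not reorder the FIFO; a miss inserts the vertex
    and evicts the oldest entry once the cache is full.
    """
    if not indices or len(indices) % 3 != 0:
        raise ValueError("expected a non-empty triangle list")
    cache = deque()
    misses = 0
    for v in indices:
        if v not in cache:
            misses += 1
            cache.append(v)
            if len(cache) > cache_size:
                cache.popleft()
    return misses / (len(indices) // 3)
```

For example, two triangles sharing an edge, `acmr([0, 1, 2, 2, 1, 3])`, give four misses over two triangles, i.e. 2.0. Since a large triangle mesh has roughly twice as many triangles as vertices, about 0.5 is the practical floor, which puts the 0.7-0.9 range in perspective.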

CatDog · September 30, 2008, 10:08am

Yes, that was exactly my question…

If I find some time somewhere, I’ll give it a try.

CatDog

CatDog · September 30, 2008, 5:38pm

Hmmm, at first glance, I cannot confirm your observations about display list speed. Here is what I did:

I’m drawing a cache-optimized static mesh (positions and normals only) using exactly 6 glDrawRangeElements calls: around 5 million triangles and 0.8 million vertices in total, on a GeForce 7950 GX2 under WinXP.

All vertices are stored in one interleaved VBO, and the indices in one element VBO.
–> 92 FPS
Now, instead of using VBOs, I wrap the 6 glDrawRangeElements calls in a display list and call that instead.
–> 27 FPS !!!

(And both of my CPU cores jump to max. With VBOs, it’s around 30%…)
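Taking the reported figures at face value (the triangle count is approximate), the two runs correspond to these frame times and triangle rates:

```latex
\begin{aligned}
t_{\text{VBO}} &= \frac{1000}{92} \approx 10.9\ \text{ms}, &
t_{\text{DL}}  &= \frac{1000}{27} \approx 37.0\ \text{ms}, \\
5\times10^{6} \times 92 &\approx 4.6\times10^{8}\ \text{tris/s}, &
5\times10^{6} \times 27 &\approx 1.35\times10^{8}\ \text{tris/s}
\end{aligned}
```

In other words, the display list version takes about 92/27 ≈ 3.4 times as long per frame.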

Huh? I’m stopping here, because my original question (does nVidia optimize display lists?) is irrelevant at this point.

CatDog

knackered · October 1, 2008, 4:41am

Things I’ve observed on NV hardware with display lists:
1/ You get better performance if you optimize the triangle order for ACMR before creating the list.
2/ You can actually beat NV display lists with VBOs if you pack multiple batches into the same VBO and offset the indices, so you don’t have to re-bind the pointers in between.
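A minimal sketch of the packing idea in item 2, assuming each batch arrives as its own vertex list plus a triangle index list local to that batch (the function name and tuple layout are hypothetical, for illustration). Vertices are concatenated, each batch's indices are shifted by that batch's vertex base, and per-batch draw parameters are recorded so a single VBO/element-buffer pair can serve every batch:

```python
def pack_batches(batches):
    """Pack (vertices, indices) batches into one vertex list and one index list.

    Assumes every batch has at least one vertex. Returns
    (vertices, indices, draws), where each draw is
    (first_index, index_count, min_vertex, max_vertex).
    """
    all_vertices, all_indices, draws = [], [], []
    for vertices, indices in batches:
        base = len(all_vertices)    # position of this batch's first vertex
        first = len(all_indices)    # position of this batch's first index
        all_vertices.extend(vertices)
        # Rebase the batch-local indices into the shared vertex array.
        all_indices.extend(i + base for i in indices)
        draws.append((first, len(indices), base, base + len(vertices) - 1))
    return all_vertices, all_indices, draws
```

With the two combined arrays uploaded once, each batch maps to `glDrawRangeElements(GL_TRIANGLES, min_vertex, max_vertex, index_count, GL_UNSIGNED_INT, offset)`, where `offset` is `first_index` times the index size in bytes (4 for `GL_UNSIGNED_INT`), and the vertex pointers only need to be set once.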

Dark_Photon · October 1, 2008, 6:16am

I’ll have to try that, knackered. In my stats above (which reflect other tests too), I had one batch per display list and one batch per VBO (two VBOs really, one for the indices) to keep things apples-to-apples. However, given your info, who knows; maybe NVidia display lists pack multiple batches into a single VBO pair behind the scenes… I wish this weren’t such black magic. (gluOptimizeBatches, anyone? Heck, I’ll take gluNVOptimizeBatches.)

knackered · October 1, 2008, 6:57am

I think they pack them into VBOs based on the order in which the display lists are created. I have a vague memory of doing a test and coming to that conclusion: create dlist#1 for an object, then create a lot of redundant dlists, then create the next real dlist#2, then render dlist#1, dlist#2, dlist#1, dlist#2, etc., and you get worse performance than if you hadn’t created the ones in between.
The “geometry display lists” idea has more legs than a football team. The IHV is better placed to format my static data than I am.

CatDog · October 1, 2008, 6:59am

That’s what I did above: six draw calls using different index offsets, with the VBO bound only once per frame.
On the other hand, I also created only one display list, containing the six draw calls. Strange.

CatDog

blackwind · October 1, 2008, 7:46am

Dark_Photon, is that for static geometry or dynamic (say, an animated character)?

knackered · October 1, 2008, 9:53am

Are you using VBOs in the display lists?

CatDog · October 1, 2008, 10:12am

Me? No. Just plain client arrays to compile the display list.

(blackwind, it’s all static geometry.)

CatDog