Hi, what in your opinion is faster for rendering? I would like to render complex meshes like terrain and simpler ones like solid objects. When I run some tests I will post the results here, but I would like some info first :). Thanks
Display lists (DLs) and VBOs should be about the same speed. It is preferable to use VBOs for everything (as they allow maximal flexibility), but display lists are still a good choice for static objects.
DLs are easier to implement ;), but vertex arrays are the fastest. I did some tests (VBOs not included):
Renderer: ATI Radeon HD 2600 Pro AGP
- terrain mesh - 1024 vertices:
normal ~ 195 fps
DLs ~ 202 fps
v.arrays ~ 205 fps
- terrain mesh - 65536 vertices:
normal ~ 49 fps
DLs ~ 87 fps
v.arrays ~ 94 fps
resolution: 1430x940, multisampling 4x
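FPS figures are easier to compare once converted to time per frame, since a gap of a few fps means very different things at 200 fps and at 50 fps. Below is a minimal sketch that converts the numbers posted above; the dictionary layout and the function name are only illustrative. With these numbers the three paths differ by about 0.25 ms per frame on the small mesh, while on the large mesh the "normal" path costs roughly 9 to 10 ms more per frame than the other two.

```python
# Frame rates as posted above (ATI Radeon HD 2600 Pro AGP).
results_fps = {
    "1024 vertices": {"normal": 195, "DLs": 202, "v.arrays": 205},
    "65536 vertices": {"normal": 49, "DLs": 87, "v.arrays": 94},
}


def frame_time_ms(fps):
    """Time spent on one frame, in milliseconds."""
    return 1000.0 / fps


# Same structure as results_fps, but holding milliseconds per frame.
frame_times = {
    mesh: {method: frame_time_ms(fps) for method, fps in methods.items()}
    for mesh, methods in results_fps.items()
}

for mesh, methods in frame_times.items():
    for method, ms in methods.items():
        print(f"{mesh:>15}  {method:<9} {ms:6.2f} ms/frame")
```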
Actually, on a reasonable implementation DLs should be faster than plain client-side vertex arrays.
I don’t know about that. I once tried two different methods of doing shadow volumes, one with immediate mode and one with VBOs. All the data was rewritten every frame, but the VBO version was still way faster.
If I can do that, then surely the driver can too with vertex arrays and DLs.
And from the numbers Warzywo posted, I’d say that is the case for both.
But vertex arrays are already formatted correctly, so I guess that would account for the difference.
Zengar, I agree with you that it might seem like DLs should be able to reach VBO speed, since there is a simple way to tell whether the data has changed, which means the driver doesn’t have to build new VBOs internally every time.
My guess is that they just don’t care; DLs have been pretty much deprecated for some time now (for real in OpenGL 3).
Warzywo, would you care to rerun that experiment with VBOs added to the mix? I think you will find that VBOs are the best for static data.
Funny thing is DX11 is now introducing DLs as part of its new MT solution. They amount to single-threaded command buffers, precompiled for deferred execution in the main render thread…
Longs Peak Reloaded was going to provide something not entirely unlike display lists too.
Display lists are notoriously slow on ATI hardware. NVidia display lists are screamingly fast, though (just geometry).
I heartily second that on NVidia.
And I should qualify. I only have our product using display lists on NVidia for (what was called here recently) “geometry-only display lists.” I.e. no OpenGL state transition capture. Just the raw batches (vtx array binds/enables and batch submission). NVidia is super fast rendering these. I could not get VBOs to touch them (and I tweaked formats and packings like crazy; and yes, I’m using a good indexed triangle optimizer).
For instance, some past numbers I captured:
- Client arrays: 16.7ms draw
- Server arrays (VBOs): 11.4ms draw (31% faster)
- Geometry-only display lists: 7.4ms draw (56% faster)
This is draw only, which excludes cull and other frame overhead.
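For anyone wanting to check the percentages against the raw timings, here is a small sketch. The function names are made up for illustration. It reads "X% faster" as "X% less draw time than client arrays", which gives about 31.7% and 55.7% for the numbers above, in line with the posted 31% and 56%. The same timings expressed as a speedup factor are about 1.46x and 2.26x.

```python
def percent_less_time(baseline_ms, measured_ms):
    """Reduction in draw time relative to the baseline, in percent."""
    return 100.0 * (baseline_ms - measured_ms) / baseline_ms


def speedup(baseline_ms, measured_ms):
    """How many times faster the measured path is than the baseline."""
    return baseline_ms / measured_ms


# Draw times as posted above, in milliseconds.
client_arrays_ms = 16.7
vbo_ms = 11.4
display_list_ms = 7.4

for name, ms in (("VBOs", vbo_ms), ("display lists", display_list_ms)):
    print(f"{name}: {percent_less_time(client_arrays_ms, ms):.1f}% less time, "
          f"{speedup(client_arrays_ms, ms):.2f}x")
```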
And until I can get the same or better performance from VBOs, I don’t want to get rid of OpenGL display lists. OpenGL currently does not publish enough info about the driver “fast path” to reproduce it, or perhaps even enough functionality.
Do you optimize your arrays anyway, before putting them into the lists? Does this make a difference?
Yes, optimized arrays are always submitted. ACMR is in the 0.7-0.9 range, so they aren’t too bad. But you do raise an interesting question: whether NVidia’s display list implementation re-optimizes triangle order.
Yes, exactly that was my question…
If I find some time somewhere, I’ll give it a try.
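For reference, the ACMR (average cache miss ratio) quoted above is the number of post-transform vertex cache misses per triangle. A rough way to estimate it offline is to run the index list through a simulated cache. The sketch below assumes a plain FIFO cache with 16 entries; both the replacement policy and the size are assumptions and differ between GPUs, so treat the result as an estimate only. A triangle list with no vertex reuse scores 3.0, and a well-ordered large mesh approaches 0.5.

```python
from collections import deque


def acmr(indices, cache_size=16):
    """Vertex cache misses per triangle, using a simple FIFO cache model."""
    if len(indices) % 3 != 0:
        raise ValueError("index count must be a multiple of 3")
    triangle_count = len(indices) // 3
    if triangle_count == 0:
        return 0.0
    cache = deque(maxlen=cache_size)  # the oldest entry falls out when full
    misses = 0
    for index in indices:
        if index not in cache:
            misses += 1
            cache.append(index)
    return misses / triangle_count


# Four triangles that share edges, as in an unrolled strip.
shared = [0, 1, 2, 2, 1, 3, 2, 3, 4, 4, 3, 5]
# Two triangles with no shared vertices.
unshared = [0, 1, 2, 3, 4, 5]

print(acmr(shared))    # 6 misses over 4 triangles
print(acmr(unshared))  # 6 misses over 2 triangles
```

Running the same function on an index buffer before and after a reordering pass shows whether the optimizer is actually helping for the assumed cache size.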
Hmmm, at first glance, I cannot second your observations concerning display list speed. Here is what I did:
I’m drawing a cache-optimized static mesh (only positions and normals) using exactly 6 glDrawRangeElements calls. Around 5 million triangles and 0.8 million vertices in total. On a GeForce 7950GX2 on WinXP.
All vertices stored in one interleaved VBO. Indices stored in one element VBO.
–> 92 FPS
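To make "interleaved" concrete: each vertex's position and normal sit next to each other in one buffer, rather than in two separate arrays. The sketch below builds such a buffer on the CPU side. The exact layout (three 32-bit floats of position followed by three 32-bit floats of normal, little-endian) is an assumption, since the post does not give it. STRIDE and NORMAL_OFFSET are the values one would then hand to the vertex and normal pointer calls.

```python
import struct

FLOAT_SIZE = 4
STRIDE = 6 * FLOAT_SIZE         # bytes per vertex: 3 position + 3 normal floats
NORMAL_OFFSET = 3 * FLOAT_SIZE  # the normal starts right after the position


def interleave(positions, normals):
    """Pack per-vertex (x, y, z) and (nx, ny, nz) tuples into one byte buffer."""
    if len(positions) != len(normals):
        raise ValueError("need exactly one normal per position")
    data = bytearray()
    for position, normal in zip(positions, normals):
        data += struct.pack("<6f", *position, *normal)
    return bytes(data)


positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
normals = [(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)]
buffer_bytes = interleave(positions, normals)

# Read back the normal of vertex 1 using the stride and offset.
second_normal = struct.unpack_from("<3f", buffer_bytes, 1 * STRIDE + NORMAL_OFFSET)
print(len(buffer_bytes), second_normal)
```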
Now, instead of using VBOs, I’m wrapping the 6 glDrawRangeElements calls into a display list and calling that instead.
–> 27 FPS !!!
(And both of my CPU cores jump to max. With VBOs, it’s around 30%…)
Huh? I’m stopping here, because my original question (is nVidia optimizing display lists?) becomes irrelevant at this point.
Things I’ve observed on NV hardware with dlists:
1/ You get better performance if you optimize the triangle order for ACMR before creating the list.
2/ You can actually beat NV display lists with VBOs if you pack multiple batches into the same VBO and offset the indices so you don’t have to re-bind the pointers in between.
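The packing described in point 2 can be sketched on the CPU side like this. It is only an illustration of the index offsetting, not anyone's actual code, and the function name and tuple layout are made up. Each mesh's indices are shifted by the number of vertices already in the shared array, and the per-batch range is recorded so each batch can later be drawn with one glDrawRangeElements call (min/max vertex as start/end, index count as count, first index times the index size as the byte offset) without re-binding anything.

```python
def pack_batches(meshes):
    """Merge (vertices, indices) pairs into one shared vertex and index array.

    Returns (vertices, indices, draws). Each entry of draws is
    (first_index, index_count, min_vertex, max_vertex) for one batch.
    """
    vertices, indices, draws = [], [], []
    for mesh_vertices, mesh_indices in meshes:
        base_vertex = len(vertices)   # where this mesh's vertices start
        first_index = len(indices)    # where this mesh's indices start
        vertices.extend(mesh_vertices)
        # Shift local indices so they point into the shared vertex array.
        indices.extend(i + base_vertex for i in mesh_indices)
        draws.append((first_index, len(mesh_indices),
                      base_vertex, base_vertex + len(mesh_vertices) - 1))
    return vertices, indices, draws


# Hypothetical input: a quad (two triangles) and a single triangle.
quad = ([(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0)], [0, 1, 2, 0, 2, 3])
tri = ([(2, 0, 0), (3, 0, 0), (2, 1, 0)], [0, 1, 2])

packed_vertices, packed_indices, draws = pack_batches([quad, tri])
print(packed_indices)
print(draws)
```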
Will have to try that, knackered. In my above stats (reflective of other tests) I had one batch per display list and one batch per VBO (2 really, one for indices) to keep things apples-to-apples. However, given your info, who knows – maybe NVidia dlists are packing multiple into a single VBO pair behind the scenes… Wish this weren’t such black magic. (gluOptimizeBatches anyone? Heck, I’ll take gluNVOptimizeBatches.)
I think they are packing them into VBOs based on the order in which the dlists are created. I have a vague memory of doing a test and coming to that conclusion. Create dlist#1 for an object, then create a lot of redundant dlists, then create the next real dlist#2, then render dlist#1, dlist#2, dlist#1, dlist#2, etc., and you get worse performance than if you hadn’t created the ones in between.
The “geometry display lists” idea has more legs than a football team. The IHV is better placed to format my static data than me.
That’s what I did above. Six draw calls using different index offsets. The VBO is bound only once per frame.
But on the other hand, I also only created one display list, containing the six draw calls. Strange.
Is that for static geometry or dynamic (say, an animated character)?
Are you using VBOs in the display lists?
Me? No. Just the plain client array to compile the display list.
(blackwind, it’s all static geometry.)