Display list performance

I need some help with display list performance.
Should I optimize the data I send in a display list in some way,
or leave that to the driver?

I mean, at the moment I'm doing this:
glBegin(GL_TRIANGLES); // no, triangle strips are not suitable for me …
glVertex3f(...); // repeated a lot of times
glEnd();

And I'm getting very poor performance when sending 10,000 vertices or more on a GeForce2 card, with 64Mb!

To be more precise:
My app reads in Lightwave models, complete with color, texture, etc.
When viewing the model inside Lightwave, it runs almost perfectly
smooth (I guess around 30fps, but certainly well above 5fps!).
When playing the exact same model in my own viewer, I
barely reach 5fps, sometimes even 2fps!
I don’t understand why, as :

  • LW does not use display lists (the mesh can be deformed)
  • I DO USE display lists, with surfaces sorted by material
    (i.e., I first set the material parameters, then send all the triangles
    that use them… etc.). I do not have a lot of
    state changes! All triangles are sent at once!
  • I even tried with simple cubes (5000 * 6 quads,
    simplest surface parameters…)
    => LW = smooth, Me = bleah!

-LW does not use strips (the mesh is not smooth, so strips are inadequate)
-Neither do I (same reason)

I use lighting with 1 light, single sided,
infinite viewer and light…

Should I do something else ?

PS: I use GLUT as the framework.
(Could that be a performance hit?)
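For what it's worth, here is a minimal sketch of the setup I describe (names and signatures are illustrative, not my actual code, and it needs a live GL context to run):

```c
#include <GL/gl.h>

/* Illustrative helper: compile one material group into a display list.
   verts/norms hold 3 floats per vertex; count vertices form count/3 triangles. */
GLuint compile_mesh(const GLfloat *verts, const GLfloat *norms, int count)
{
    GLuint list = glGenLists(1);
    glNewList(list, GL_COMPILE);      /* compile only, call later */
    glBegin(GL_TRIANGLES);            /* one begin/end for the whole group */
    for (int i = 0; i < count; ++i) {
        glNormal3fv(&norms[i * 3]);
        glVertex3fv(&verts[i * 3]);
    }
    glEnd();
    glEndList();
    return list;                      /* each frame: glCallList(list); */
}
```

The material parameters would be set once before calling this, so the list itself contains almost nothing but vertex data.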


Perhaps you (or GLUT for you) are setting up
the render state in a way which forces
software rendering. Look at all the
parameters and environment to see if there is
something which can force software (some
weird bit depth or texture format, some
esoteric render state, etc).

Some months ago, while playing with my GeForce, I noticed that building a display list with GL_COMPILE_AND_EXECUTE is a LOT LOT LOT LOT slower than compiling it with GL_COMPILE and then executing it with glCallList.

It had nothing to do with render states or anything like that… I posted it in this very forum (it was the old version back then) and some people confirmed they had the same behaviour!

I do not know if it was a bug in the drivers, and I do not know whether the behaviour is still the same…

But just in case you are using GL_COMPILE_AND_EXECUTE, try to switch to GL_COMPILE…
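In case it helps, the two patterns being compared are roughly these (a sketch only: the geometry calls are elided and a live GL context is required):

```c
#include <GL/gl.h>

void build_slow(GLuint list)
{
    /* Compiles AND draws in one pass -- the variant reported as slow here. */
    glNewList(list, GL_COMPILE_AND_EXECUTE);
    /* ... glBegin/glVertex/glEnd calls ... */
    glEndList();
}

void build_fast(GLuint list)
{
    /* Compile only, then execute the compiled list separately. */
    glNewList(list, GL_COMPILE);
    /* ... the same glBegin/glVertex/glEnd calls ... */
    glEndList();
    glCallList(list);
}
```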

If this solves the problem, I guess we need to send a message to nVidia…

Best regards.


P.S.: for those who wonder, I really did get a factor-10 increase in performance by doing what is described above!

Same for me on a TNT2 Ultra.
I was first using GL_COMPILE_AND_EXECUTE display lists and it was damn slow, slower than without display lists (I searched my code for bugs, etc., and found nothing…). Then, as a last resort, I tried GL_COMPILE followed by glCallList… MAGIC! Suddenly very fast.

We're well aware of the consequences of COMPILE_AND_EXECUTE. Consider, though: it is NOT equivalent to first compiling, then executing. In particular, if you do a Get inside the display list compilation, in one case you'll get the old value and in the other you'll get the new value.

COMPILE_AND_EXECUTE is another of those features that OpenGL should have never included in the first place, right up there with feedback, FRONT_AND_BACK rendering, and edge flags.

  • Matt
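The semantic difference can be sketched like this (a glGet executes immediately and is never compiled into the list, so the compile mode decides which value it sees; this needs a live GL context):

```c
#include <GL/gl.h>

void show_difference(GLuint list)
{
    GLfloat c[4];

    glColor3f(0.0f, 0.0f, 1.0f);        /* current color: blue */

    glNewList(list, GL_COMPILE);
    glColor3f(1.0f, 0.0f, 0.0f);        /* only recorded, not executed */
    glEndList();
    glGetFloatv(GL_CURRENT_COLOR, c);   /* old value: still blue */

    glNewList(list, GL_COMPILE_AND_EXECUTE);
    glColor3f(1.0f, 0.0f, 0.0f);        /* recorded AND executed */
    glEndList();
    glGetFloatv(GL_CURRENT_COLOR, c);   /* new value: red */
}
```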

Matt, believe me I know that !

The thing is, after having done the compilation once, you'd expect glCallList to be just as fast in either case…

This was not the case at all when I ran my tests. Using GL_COMPILE_AND_EXECUTE when creating the list resulted in a very slow display list at glCallList level…

If there is a good reason why there should be a difference, I would really like to know it !



I forgot to mention that I did not use glGet or any fancy stuff in my display list… It was pure glColor/glBegin/glVertex/glEnd calls…



OK, so to be more precise:

-The list is compiled with GL_COMPILE.
-I'm sure (!) I'm not in software rendering
mode (imagine 10000 multi-textured polys
with alpha blending in software? even 1fps?)
But why soooooo slow?
-No abusive state changes.
-No glGet at all!
-No glFinish/glFlush.
-No extensions used.
-ALL triangles are sent at once, with a single glBegin()/glEnd() in the DL.
-It is slow even with just one light
and no textures at all, and there are
more than 10000 vertices (to give a rough idea).

I can mail/post part of the code if you
can help…
(oh please…)

Let me get this straight…

If you compile the display list using COMPILE_AND_EXECUTE, future executions of that display list are significantly slower than that display list compiled as COMPILE?

How much slower? Is it slower than if you just issued those exact same commands in immediate mode? At minimum, display lists shouldn't be any slower.

I had always thought it was just the compiling process that was slow. If it’s more than that, maybe there’s a problem.

  • Matt


Yes, you got it: when using GL_COMPILE_AND_EXECUTE, the compile time is awfully long, PLUS the FUTURE executions of the list ARE SLOW! As far as I remember, they were almost as slow as issuing the commands directly (but that is highly subjective, as I did not try to time them…).

As I told you, I haven't tried again since I switched to GL_COMPILE + glCallList. I am going to try again today… Do you want me to send you an application if I manage to reproduce the problem?

Amerio, can you e-mail me your code (if it is not part of a commercial app !) ? I would like to understand what the problem is…

I’ll post the results of my tests here…



OK, just performed the tests again and I have the same behaviour !

It only happens on one of my HUGE models…

When using GL_COMPILE, I have 8-9 FPS.
When using GL_COMPILE_AND_EXECUTE, I have 4-5 FPS.

The thing is, I can not e-mail the model file (first, it is 16Mb big but moreover it is part of a project we did…). I am trying to find such a big model that would show the same behaviour…

Matt, or anyone at nVidia, have you got an FTP site I could upload such a model + the application + the source to ??? Although this program is nothing special, I’ll ask you not to disclose any part of it.



I already know what the issue is, but fixing it could very possibly be more trouble than it’s worth. I don’t know how to fix it, certainly.

  • Matt

Matt, that's not really a problem as long as people are aware of it! I keep telling people to use GL_COMPILE only since I discovered this! Maybe some commercial apps would benefit from knowing about it…

Can you explain what the problem is or does it touch a confidential part of the drivers ?



Nope, I can’t talk about anything that relates to the internals of our drivers.

  • Matt

I'm not aware of how NVidia's drivers are built, but…
Why not patch the driver so that when GL_COMPILE_AND_EXECUTE is used, the driver simply performs a GL_COMPILE followed by a glCallList? In a way, this is compile-then-execute: the behaviour is the same and the problem is solved.

I'm having a similar problem with my own code! I'm using display lists with GL_COMPILE on a GeForce 256 DDR, using a couple of loops and glBegin/glEnd to feed in vertices for GL_TRIANGLES, and my app is crawling along… I am feeding in a 361x361 array, which generates about 250,000 polys, but I've always been told about the massive power of the GeForce-based cards, so what gives? Any suggestions would be welcome. I'm not using GLUT either; I'm working directly through Windows.



I hope you are not suggesting that you are building the display list every frame.

Try breaking your model up into smaller "chunks" and multiple display lists. I've found that on the GF, if you put a model with lots of verts into a single dlist, it slows down a LOT. I ended up splitting the model into 4-5 dlists and now everything is dandy. Go figure! I try to stay around 2000 verts per dlist.

Good luck!

Originally posted by paddy:
I'm not aware of how NVidia's drivers are built, but…
Why not patch the driver so that when GL_COMPILE_AND_EXECUTE is used, the driver simply performs a GL_COMPILE followed by a glCallList? In a way, this is compile-then-execute: the behaviour is the same and the problem is solved.

No, this is not sufficient… the behavior is different between the two in certain cases when commands are executed immediately rather than entered into the display list.

  • Matt

I have the same problem.
The program just does:

glNewList(listnum, GL_COMPILE);
// draw a lot of triangles (about 50,000 vertices)
glEndList();

I profiled it using CProfiler from CodeGuru
and found that on some machines with GeForce cards the list compilation (glEndList) takes very long.
I tried it on several computers with MXs, a GeForce2 GTS, and a GeForce 256.

On two computers with an MX (one with Win98 on a PIII 500, the other with Win2000 on a PIII 450) it only takes 5-6 seconds.
On an Athlon 1100 (KT7, Win2000) with an MX it takes 26 sec.
On a PIII 500 (NT4, GeForce2 GTS): 90 sec.
On a PIII 733 (Win98, GeForce 256): 114 sec.

If I turn off the card's acceleration
(from the display settings, choose the lowest
2 performance settings), it takes less than 0.01 sec (probably no compilation?).
Also < 0.01 sec on a Voodoo 3 card.

I wonder if NVidia knows about this.