Display Lists are running slower than immediate mode

martin_marinov · May 18, 2002, 12:32pm

Hi

Maybe the problem is that DL compilation for this is very simple: just put the vertices in one array, the normals in another, and so on. So the normals get multiplied by 4 in this case, effectively increasing the geometry sent by 60% (4 vertices + 1 normal vs. 4 vertices + 4 normals). And since the display list is too large, the driver decides that it cannot fit into video memory, so you hit the bandwidth limitation when you glCallList().
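A rough sketch of that arithmetic, assuming positions and normals are each three floats and that the compiled list really does store one normal per vertex (which is only the guess made above, not confirmed driver behaviour). The function name is made up for illustration.

```python
# Assumption: every attribute (position or normal) is 3 floats (x, y, z).
FLOATS_PER_ATTR = 3


def floats_per_quad(vertices: int, normals: int) -> int:
    """Number of floats needed to describe one quad."""
    return (vertices + normals) * FLOATS_PER_ATTR


# Immediate mode as written in the app: one glNormal per face, four glVertex calls.
immediate_floats = floats_per_quad(vertices=4, normals=1)

# Hypothetical compiled form: the normal is replicated for every vertex.
compiled_floats = floats_per_quad(vertices=4, normals=4)

increase = (compiled_floats - immediate_floats) / immediate_floats
print(immediate_floats, compiled_floats, f"{increase:.0%}")  # prints: 15 24 60%
```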

I have no real idea what the driver does here; I’m only guessing, so I could be completely wrong, of course.
This does show, however, that immediate mode has its use cases, and maybe it’s better that it exists. Of course, developing a 3D game seems to me like the wrong use case for it.

Regards
Martin

mcraighead · May 18, 2002, 1:26pm

I’m in favor of throwing out immediate mode… I think immediate mode should be part of GLU.

Matt

martin_marinov · May 18, 2002, 1:52pm

Originally posted by mcraighead:
[b]I’m in favor of throwing out immediate mode… I think immediate mode should be part of GLU.

Matt[/b]

At least life will be easier for OpenGL driver developers after that.

Martin

Lev · May 18, 2002, 2:38pm

Immediate mode as part of GLU is a cool idea! It wouldn’t even be that hard to implement via vertex arrays. Since immediate mode isn’t aimed at performance anyway, a small GLU layer wouldn’t change that much.
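A minimal sketch of what such a layer might look like, written in Python with plain lists standing in for client-side vertex arrays. The class and method names are hypothetical, and the `draw_arrays` callback is only a stand-in for the point where a real layer would call glVertexPointer, glNormalPointer and glDrawArrays.

```python
class ImmediateEmulator:
    """Hypothetical immediate-mode layer that batches calls into arrays."""

    def __init__(self, draw_arrays):
        # Stand-in for glVertexPointer/glNormalPointer/glDrawArrays.
        self._draw_arrays = draw_arrays
        # OpenGL's initial current normal is (0, 0, 1).
        self._current_normal = (0.0, 0.0, 1.0)
        self._mode = None
        self._vertices = []
        self._normals = []

    def begin(self, mode):
        if self._mode is not None:
            raise RuntimeError("begin() called inside a begin/end pair")
        self._mode = mode
        self._vertices = []
        self._normals = []

    def normal3f(self, x, y, z):
        # Only updates the current state; nothing is stored yet.
        self._current_normal = (float(x), float(y), float(z))

    def vertex3f(self, x, y, z):
        if self._mode is None:
            raise RuntimeError("vertex3f() called outside begin/end")
        # Each vertex captures the current normal, so one per-face
        # normal ends up stored once per vertex in the arrays.
        self._vertices.append((float(x), float(y), float(z)))
        self._normals.append(self._current_normal)

    def end(self):
        if self._mode is None:
            raise RuntimeError("end() called without begin()")
        self._draw_arrays(self._mode, self._vertices, self._normals)
        self._mode = None


# Usage: record the batches instead of drawing them.
calls = []
im = ImmediateEmulator(lambda mode, v, n: calls.append((mode, v, n)))
im.begin("QUADS")
im.normal3f(0, 0, 1)
for x, y in [(0, 0), (1, 0), (1, 1), (0, 1)]:
    im.vertex3f(x, y, 0)
im.end()
```

Note that the single normal is stored four times here, the same kind of expansion that was guessed at earlier in the thread for compiled display lists.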

-Lev

[This message has been edited by Lev (edited 05-18-2002).]

knackered · May 18, 2002, 2:43pm

Originally posted by mcraighead:
[b]I’m in favor of throwing out immediate mode… I think immediate mode should be part of GLU.

Matt[/b]

I agree it should not have been part of GL in the first place, but in GLU. But weren’t vertex arrays missing from the original 1.0 version of OpenGL?
Does supporting immediate mode have that much of an impact on what you can do with the rest of GL these days? If so, then maybe it should be moved to GLU - but programs would have to be recompiled…

mcraighead · May 18, 2002, 6:17pm

That is of course the problem. GL1.0 had only immediate mode and display lists. There was no good, fast way to do dynamic geometry.

Matt

system · May 18, 2002, 9:30pm

There were vertex array extensions during 1.0 (GL_EXT_vertex_array).
Kind of strange that something as obvious as VAs wasn’t ready in 1.0.

Matt, is that really the explanation for the performance loss? What’s the full story?

V-man

mcraighead · May 18, 2002, 11:22pm

I have no way to know without running the app, but I suspect it’s as simple as the geometry expanding when extra normals are added.

Matt

knackered · May 19, 2002, 3:51am

Specifying a normal for every glVertex call and removing the glColor call gives these results (on a GF3 Ti500 with 8000 cubes and a static viewpoint):

WITH dlist: 34fps
WITHOUT dlist: 43fps

So no change there.

Matt, with all due respect, you have seen the entire app - except for the creation of the window and context - window is about 512x512, context is 32bit colour buffer, 24bit zbuffer, 0bit stencil buffer, double buffered.
wglMakeCurrent is issued once at initialisation.
There’s a very low priority thread dealing with window messages.

I’m not too bothered by this because, as I said, I don’t use immediate mode in anything important anyway - but I do hope that display lists work better with VAs, as I create these if VAR or VOB is not supported on the card.

mcraighead · May 20, 2002, 1:53pm

No, the only way I could know what was going on would be to run the app. [Which I probably don’t have time to do at present…]

Matt

Shag · May 20, 2002, 4:39pm

Knackered … this may seem insulting (not meant to be) … but you’re not doing your drawing based on Windows messages, are you?

You mentioned a low priority thread …

[This message has been edited by Shag (edited 05-20-2002).]

knackered · May 20, 2002, 11:24pm

No Shag, I’m not. The message thread just deals with resize, mouse, quit and char messages. The drawing happens in the main thread (WinMain) in this test app.

bsenftner · May 21, 2002, 7:28am

I’m interested in hearing what happens if you switch from using quads to triangles… Drivers seem to be so triangle-centric, I would not be surprised if that is the problem…

I remember having something similar occur when I first got a GeForce256 card: my display lists were slower than immediate mode and I could not figure out why. I wasted some time, got all depressed, and moved on to other areas of the app that were getting neglected. Then one morning I noticed that my frame rate was higher than normal… the sun spot or whatever it was must have ended, because from that day on my display lists have been suitably faster than immediate mode. I believe that nothing I did in the other portions of the app could have affected my display list render speed… so I’m interested in hearing how your investigations turn out.

Rml4o · May 27, 2002, 9:27am

I experienced the same performance problem with my GeForce3 Ti200 card. In some cases, immediate mode would give better performance than display lists. From a few experiments I deduced that display lists are worse than immediate mode when they are small. Display lists must be used in a clever way: you must create not-too-small ones, with as many shared vertices as possible, and large strips or fans or quads. Anyway, you should use VAR: it’s the best possible means of specifying geometry. And if you improve the sorting of your indices, you will benefit from the vertex cache.
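To illustrate the point about index ordering, here is a small sketch that counts misses in a simulated vertex cache. It assumes a simple FIFO cache of 16 entries; real hardware cache sizes and replacement policies vary, and the function names and test mesh (a row of 64 quads, two triangles each) are made up for the example.

```python
from collections import deque


def count_cache_misses(indices, cache_size=16):
    """Count misses for an index stream, assuming a FIFO vertex cache."""
    cache = deque(maxlen=cache_size)  # the oldest entry drops out when full
    misses = 0
    for index in indices:
        if index not in cache:
            misses += 1
            cache.append(index)
    return misses


def quad_indices(k):
    """Two triangles for quad k in a row; neighbouring quads share an edge."""
    a, b, c, d = 2 * k, 2 * k + 1, 2 * k + 2, 2 * k + 3
    return [a, b, c, c, b, d]


QUADS = 64

# Neighbouring quads drawn back to back: shared vertices are still cached.
sorted_order = [i for k in range(QUADS) for i in quad_indices(k)]

# Same triangles, but consecutive draws are 8 quads apart in the mesh.
scattered = [k for r in range(8) for k in range(r, QUADS, 8)]
scattered_order = [i for k in scattered for i in quad_indices(k)]

sorted_misses = count_cache_misses(sorted_order)
scattered_misses = count_cache_misses(scattered_order)
print(sorted_misses, scattered_misses)  # prints: 130 256
```

With the sorted order every vertex is transformed once (130 unique vertices), while the scattered order transforms nearly twice as many, even though both streams describe the same triangles.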

knackered · May 28, 2002, 12:36am

I don’t really care about immediate mode, I never use it - it was an experiment, for heaven’s sake!
I’m sure this performance difference was introduced long ago, and I’ve only just discovered it because I never use it.
The upshot is, I tried drawing the cubes as quads, then as triangles, specifying per-face normals, then specifying per-vertex normals, creating a small cube-sized display list, then creating a large 8000-cube display list, then creating a medium-sized 200-cube display list. No matter what I did, display lists were always slower than the uncompiled immediate mode commands.
It doesn’t worry me, because vertex arrays compiled into display lists are still faster than uncompiled vertex arrays - this was all I was concerned about.
I still don’t understand why the performance is bad with compiled immediate mode, but I’m past caring - and the test program I wrote has moved on to other things…