Display Lists are running slower than immediate mode

Sorry about this, I feel a little embarrassed at having to ask this question.
In my very simple test environment, I’m rendering 8000 cubes (made out of GL_QUADS).
I’m running this on a dual PIII 700MHz with a GeForce2 GTS and NVIDIA drivers about 1.5 months old.
I’m getting 36fps without using a display list, and 24fps using one. Both methods use immediate mode to send the vertices and normals.
Here’s the code (the Draw_Cube() function just issues 24 glVertex+glNormal calls):

//#define USEDISPLAYLIST
#ifdef USEDISPLAYLIST
static unsigned int displist=0;

  	if (!displist)
  	{
  	

  		displist = glGenLists(1);

  		glNewList(displist, GL_COMPILE);
  			glBegin(GL_QUADS);
  			for (float x=-(w*0.5f); x<(w*0.5f); x+=wstep)
  				for (float y=-(h*0.5f); y<(h*0.5f); y+=hstep)
  					for (float z=-(d*0.5f); z<(d*0.5f); z+=dstep)
  						context.Draw_Cube(x, y, z, sizex);
  			glEnd();
  		glEndList();
  	}
  	else
  		glCallList(displist);

#else
glBegin(GL_QUADS);
for (float x=-(w*0.5f); x<(w*0.5f); x+=wstep)
for (float y=-(h*0.5f); y<(h*0.5f); y+=hstep)
for (float z=-(d*0.5f); z<(d*0.5f); z+=dstep)
context.Draw_Cube(x, y, z, sizex);
glEnd();
#endif

Can anyone suggest why the display list is slower than the straight immediate mode?

[This message has been edited by knackered (edited 05-17-2002).]

No idea. But I think your code looks really neat and tidy

Mmm, cheers Robbo - your contribution is welcome…
I think I’ll give the new detonator drivers a go…

I think (just a guess, mind you) it’s because a display list stores all its data in some kind of linked internal structure.
So OpenGL has to do more steps than it does the other way (immediate mode).

If you compiled your display lists (which you could, since your data is static), I think you’d get more performance.

Hope this helps a bit.

Er… sorry,
I didn’t read all of your code.
You’re not in immediate mode there, you’re just calling the display list that was already compiled…
So, a thousand apologies.

As far as I can see, you have just one dlist? Maybe it’s too huge to be handled well.

have you tried something like that:

for (float x=-(w*0.5f); x<(w*0.5f); x+=wstep)
for (float y=-(h*0.5f); y<(h*0.5f); y+=hstep)
for (float z=-(d*0.5f); z<(d*0.5f); z+=dstep)
{
displist = glGenLists(1);
glNewList(displist, GL_COMPILE);
glBegin(GL_QUADS);
context.Draw_Cube(x, y, z, sizex);
glEnd();
glEndList();
}

You’ll then need to make displist an array holding all the dlists.

Sorry, I think I misread the context.

Thanks saian - I’ll give that a try.
(I don’t really want to use immediate mode, it’s just a test program - it shocked me, that’s all).
BTW, I’ve just installed the latest w2k detonator drivers, and the results in the same program are:-
WITH display lists:- 24fps (no change)
WITHOUT display lists:- 39fps !! (so that’s gone up by 3fps)

Mm, I get a 2fps increase on the previous display list frame rate if I change it to this:-

if (!displist)
{
displist = glGenLists(cubecnt);

  		unsigned int i=0;

  		for (float x=-(w*0.5f); x<(w*0.5f); x+=wstep)
  			for (float y=-(h*0.5f); y<(h*0.5f); y+=hstep)
  				for (float z=-(d*0.5f); z<(d*0.5f); z+=dstep)
  				{
  					
  					glNewList(displist+i, GL_COMPILE);
  						glBegin(GL_QUADS);
  							context.Draw_Cube(x, y, z, sizex);
  						glEnd();
  					glEndList();
  					
  					i++;
  				}
  			
  	}
  	else
  	{
  		for (unsigned int i=0; i<cubecnt; i++)
  			glCallList(displist+i);
  	}

you don’t use something like glPolygonMode(GL_FRONT_AND_BACK, GL_LINE), do you?

For lines, display lists have pretty poor performance (at least on my R8500).

[This message has been edited by kehziah (edited 05-17-2002).]

Is the slowdown still proportionately the same if you increase/reduce the number of primitives?

I’m just thinking that you might have exceeded or undershot some sweet-spot somewhere

from the red book

Very small lists may not perform well since there is some overhead when executing a list
A simple cube is maybe too small.
I think you have hit two limits. Method #1: a single list of 8000 cubes is too big and the driver doesn’t cache it on the card. Method #2: 8000 list calls per frame cause too much overhead.
What about say 50 cubes per list?
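
A rough sketch of how the 50-cubes-per-list idea might be laid out, as plain C++ bookkeeping with no GL calls. The names CubeBatch and makeBatches are made up for illustration; each batch would presumably get its own glNewList/glEndList pair wrapping Draw_Cube calls for its range of cubes, and 50 is only the figure suggested above, not a known sweet spot.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: one batch corresponds to one display list that
// holds `count` cubes, starting at cube number `first`.
struct CubeBatch {
    std::size_t first;
    std::size_t count;
};

// Split `cubeCount` cubes into batches of at most `cubesPerList` cubes.
// The last batch is smaller when cubeCount is not an exact multiple.
std::vector<CubeBatch> makeBatches(std::size_t cubeCount, std::size_t cubesPerList)
{
    assert(cubesPerList > 0);
    std::vector<CubeBatch> batches;
    for (std::size_t first = 0; first < cubeCount; first += cubesPerList) {
        const std::size_t remaining = cubeCount - first;
        const std::size_t count = remaining < cubesPerList ? remaining : cubesPerList;
        batches.push_back({first, count});
    }
    return batches;
}
```

With 8000 cubes and 50 per list this gives 160 batches, so 160 glCallList calls per frame instead of either 1 or 8000. Whether that actually helps on a given driver would have to be measured.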

By reading your code, I think the problem is because you are recreating the display list every frame.
Creating a display list is a heavy job, and will certainly cost more than simply drawing.
So try changing your test code so your DL is created only once, and then test by either calling the DL or using a pure Begin/End pair…

But he doesn’t do that…

if (!displist) create list
else call list

Shlomi.

Show us the code of the Draw_Cube function.
Maybe you’re doing some matrix transformation stuff that eats all your memory when compiled into a list.

lol amerio

Ok, here’s the draw_cube code - copy and pasted directly (no hidden wires):-

void kGLContext::Draw_Cube(float x, float y, float z, float size)
{
glColor3f(1.0f, 1.0f, 1.0f);

size*=0.5f;

glNormal3f(0.0f, 0.0f, -1.0f);
glVertex3f(x-size, y-size, z-size);
glVertex3f(x+size, y-size, z-size);
glVertex3f(x+size, y+size, z-size);
glVertex3f(x-size, y+size, z-size);

glNormal3f(0.0f, 0.0f, 1.0f);
glVertex3f(x-size, y-size, z+size);
glVertex3f(x+size, y-size, z+size);
glVertex3f(x+size, y+size, z+size);
glVertex3f(x-size, y+size, z+size);

glNormal3f(0.0f, 1.0f, 0.0f);
glVertex3f(x-size, y+size, z-size);
glVertex3f(x+size, y+size, z-size);
glVertex3f(x+size, y+size, z+size);
glVertex3f(x-size, y+size, z+size);

glNormal3f(-1.0f, 0.0f, 0.0f);
glVertex3f(x-size, y-size, z-size);
glVertex3f(x-size, y+size, z-size);
glVertex3f(x-size, y+size, z+size);
glVertex3f(x-size, y-size, z+size);

glNormal3f(0.0f, -1.0f, 0.0f);
glVertex3f(x+size, y-size, z-size);
glVertex3f(x-size, y-size, z-size);
glVertex3f(x-size, y-size, z+size);
glVertex3f(x+size, y-size, z+size);

glNormal3f(1.0f, 0.0f, 0.0f);
glVertex3f(x+size, y+size, z-size);
glVertex3f(x+size, y-size, z-size);
glVertex3f(x+size, y-size, z+size);
glVertex3f(x+size, y+size, z+size);
}

As you can see, there are fewer glNormal calls than I previously said…

kehziah: I tried it with fewer cubes, but the display lists are always slower - the lower the number of cubes, the smaller the difference - but dlists are always slower.
I’m now running it on my home machine, with a geforce3 ti500 in it, and it’s the same problem - although all frame rates are slightly higher (obviously).

I am guessing you are seeing driver limitations made by nvidia because they want their consumer based cards slower than their Quadro cards at professional apps. And professional apps use things like dlists. They need to make immediate mode fast for games like mdk2 which use immediate mode.

mdk2 uses immediate mode?? where did they get the engine programmers? from the dumps? When I hear(read) such things I almost agree to someone here in the forum (don’t remember who) who would like to see immediate mode banned from opengl.

-Lev

I feel like I’ve entered the twilight zone today - I mean, I can remember display lists being around 6X faster than immediate mode…or did I dream that year?

Yes, MDK2 uses immediate mode. (It also has a display list option, but it runs slower with display lists, for some strange reason.)

In this case, I’d suggest that the problem is your use of four vertices for each normal. This sort of nonuniform vertex usage is good for immediate mode and likely to be bad in other cases. It’s very hard to optimize that sort of usage.

Now, if you think overuse of immediate mode in apps is a bad thing, you haven’t seen the incredible stupidity of some GL apps.

- Matt

> I almost agree to someone here in the
> forum (don’t remember who) who would like
> to see immediate mode banned

Might have been me. I think any API designed for performance should be block streaming based; ideally with application access to driver-allocated buffers.

UNIX write() is bad. nVIDIA VAR/ATI MOB is good.

Anyway, display lists ought to work well when you compile them once and then draw them “forever”. It ought to be possible to optimize simple glBegin()/issue/glEnd() cases no matter what it is that you’re issuing, fairly simply, as the driver ought to be capable of expanding all current state per vertex when Vertex3f() is called. Perhaps the display list optimization isn’t that aggressive, though.

Regarding the initial code: the glCallList() should not be in an “else” as you still need to draw it after compiling it; this is just a cosmetic issue though. Also, if the lists are very large, they may be sub-optimal.
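
A minimal sketch of the compile-once, draw-every-frame flow described here, written without GL so it stands alone. drawCached, compileList and callList are hypothetical names; in the real program compileList would presumably wrap glGenLists/glNewList/.../glEndList and return the new list id, and callList would wrap glCallList.

```cpp
#include <cassert>

// Compile the list the first time through, then draw it on every call,
// including the call that compiled it (no "else" around the draw).
//   compileList: callable returning a non-zero list id
//   callList:    callable taking that id and drawing the list
template <typename CompileFn, typename CallFn>
void drawCached(unsigned int& listId, CompileFn compileList, CallFn callList)
{
    if (listId == 0) {
        listId = compileList();
        // glGenLists returns 0 on failure, so 0 is treated as "no list yet".
        assert(listId != 0);
    }
    callList(listId);
}
```

Another option would be to compile with GL_COMPILE_AND_EXECUTE instead of GL_COMPILE, which draws while compiling; which of the two a driver handles better is not something this thread settles.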

What can I say - I’ll try putting normal calls in between all vertex calls.
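
For what it’s worth, here is one way the "normal for every vertex" layout could look as plain data, so that each of the 24 vertices carries its own copy of the face normal. This is only a sketch: Vertex and buildCube are invented names, the face order differs from the Draw_Cube code above, and winding order is not handled. Each entry would map to one glNormal3f followed by one glVertex3f.

```cpp
#include <cassert>
#include <vector>

// Hypothetical uniform vertex format: every vertex has a normal and a position.
struct Vertex {
    float nx, ny, nz;
    float x, y, z;
};

// Build the 24 vertices (6 faces x 4 corners) of an axis-aligned cube
// centred at (cx, cy, cz) with edge length `size`.
std::vector<Vertex> buildCube(float cx, float cy, float cz, float size)
{
    const float h = size * 0.5f;
    const int cu[4] = {-1, 1, 1, -1};  // corner offsets along the first in-plane axis
    const int cv[4] = {-1, -1, 1, 1};  // corner offsets along the second in-plane axis
    std::vector<Vertex> out;
    out.reserve(24);
    for (int axis = 0; axis < 3; ++axis) {
        const int u = (axis + 1) % 3;
        const int v = (axis + 2) % 3;
        for (int sign = -1; sign <= 1; sign += 2) {
            float n[3] = {0.0f, 0.0f, 0.0f};
            n[axis] = static_cast<float>(sign);
            for (int k = 0; k < 4; ++k) {
                float p[3];
                p[axis] = sign * h;
                p[u] = cu[k] * h;
                p[v] = cv[k] * h;
                // The face normal is repeated for each of the four corners.
                out.push_back({n[0], n[1], n[2], cx + p[0], cy + p[1], cz + p[2]});
            }
        }
    }
    return out;
}
```

The same array could later be handed to vertex arrays unchanged, which is part of why a uniform per-vertex format tends to be easier for a driver to deal with than mixed state changes.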

I don’t use immediate mode, unless I’m doing a test program and want to knock some triangles up quick, and don’t want to link in my container class lib to do vertex arrays. I usually compile display lists from glDrawElements calls, dereferencing the attribute arrays at dlist compile time.
I’m just puzzled by this anomaly - compiled immediate mode is slower than non-compiled immediate mode…
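
To illustrate what "dereferencing the attribute arrays at dlist compile time" amounts to: when glDrawElements is recorded into a list, the indexed data is read once and the list keeps its own copy. The sketch below mimics that step on the CPU; Vec3 and dereference are hypothetical names, not anything from the code in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical attribute type (a position or a normal).
struct Vec3 {
    float x, y, z;
};

// Expand an indexed attribute array into a flat stream, one entry per index.
// This is roughly the copy that happens once, at list compile time.
std::vector<Vec3> dereference(const std::vector<Vec3>& attribs,
                              const std::vector<unsigned int>& indices)
{
    std::vector<Vec3> flat;
    flat.reserve(indices.size());
    for (unsigned int i : indices) {
        assert(i < attribs.size());
        flat.push_back(attribs[i]);
    }
    return flat;
}
```

After that copy, later changes to the application’s arrays do not affect the compiled list, which is why this approach suits static geometry.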

jwatte, bear in mind this is a test program, which was originally intended to test something other than the rendering part, so don’t worry about me missing drawing objects while display lists are compiling