Display lists not so speedy

I’m using display lists to make a tetris clone…

it’s running at 8 fps in sw mode, and 40 in hw mode on some new ati card (dunno which, it’s at the labs at my university)…

that seems really slow, because i’ve heard of hw processing 20,000 polys, and mine’s only doing, at most, 16x10x30 = 4800?

wow, 4800? more than i thought, but still,
there has to be some way to squeeze out more fps.

since i’m using display lists, that means to draw each block in the grid i have to call a push/pop matrix. is 300 push/pops a lot per frame? could that be what’s bogging me?

I know it’s not clearing the buffers or using the z buffer, cuz i don’t clear the buffers or use the z buffer!

is there any way to get around the push/pops w/o using global data? maybe i need to go look at some newbie display list tutorials…

hrmmm…

any help would be appreciated.

-Succinct

p.s. i know that the 20000 vs. 4800 comparison should also take the machines into account, because 20000 polys are okay for a graphics workstation, but i can’t imagine 4800 lit (but not textured) polys is too much for a hw renderer, even a crappy ati card, to push at 60 fps.

[This message has been edited by Succinct (edited 11-29-2000).]
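To put the numbers in the post above on a common scale, it helps to convert polygons per frame into polygons per second. The sketch below assumes that "16x10x30" means 16 polygons per block on a 10 by 30 grid (which matches the 300 push/pops mentioned); the function names are made up for illustration.

```c
/* Worst-case polygon count per frame: every cell of the grid is filled. */
long grid_polys(long polys_per_block, long cols, long rows)
{
    return polys_per_block * cols * rows;
}

/* Polygons submitted per second = polygons per frame * frames per second. */
long polys_per_second(long polys_per_frame, long fps)
{
    return polys_per_frame * fps;
}
```

Under that reading, grid_polys(16, 10, 30) gives 4800 polygons per frame, so 40 fps works out to 192,000 polygons per second and the 60 fps target would need 288,000.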

Originally posted by Succinct:
since i’m using display lists, that means to draw each block in the grid i have to call a push/pop matrix. is 300 push/pops a lot per frame? could that be what’s bogging me?

Possibly. Why do you push/pop? Just load the
identity matrix and call glTranslate() for
each block instead. You could also get some
more oomph by pre-rendering each “shape”
into a single display list, and the “field”
as one display list, and just re-rendering
the field display list when it changes
(which is only when a “shape” drops).
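A rough sketch of the "only re-render the field list when it changes" idea, written as a plain dirty flag so it compiles without a GL context. Everything here is hypothetical: the Field type, the 10 by 30 grid size, and the function names are not from the original code, and the OpenGL calls appear only as comments marking where they would go.

```c
#include <stdbool.h>
#include <string.h>

enum { COLS = 10, ROWS = 30 };       /* assumed grid size */

typedef struct {
    unsigned char cells[ROWS][COLS]; /* 0 = empty, nonzero = filled */
    bool dirty;                      /* true when the cached list is stale */
    int rebuilds;                    /* how many times the list was rebuilt */
} Field;

void field_init(Field *f)
{
    memset(f->cells, 0, sizeof f->cells);
    f->dirty = true;                 /* nothing has been compiled yet */
    f->rebuilds = 0;
}

/* Change one cell; mark the cached list stale only if the value changed. */
void field_set(Field *f, int row, int col, unsigned char value)
{
    if (f->cells[row][col] != value) {
        f->cells[row][col] = value;
        f->dirty = true;
    }
}

/* Called once per frame. Returns the number of blocks recorded when the
   list had to be rebuilt, or -1 when the cached list was reused. */
int field_draw(Field *f)
{
    int blocks = -1;
    if (f->dirty) {
        blocks = 0;
        /* glNewList(field_list, GL_COMPILE); */
        for (int row = 0; row < ROWS; row++) {
            for (int col = 0; col < COLS; col++) {
                if (f->cells[row][col]) {
                    /* glLoadIdentity();  (this also discards any view
                       transform, which would need to be reapplied)
                       glTranslatef((float)col, (float)row, 0.0f);
                       glCallList(block_list); */
                    blocks++;
                }
            }
        }
        /* glEndList(); */
        f->dirty = false;
        f->rebuilds++;
    }
    /* glCallList(field_list);  one call per frame instead of one per block */
    return blocks;
}
```

With this shape, the per-block work happens only on frames where a shape has landed; every other frame is a single list call for the whole field.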


Originally posted by Succinct:
I know it’s not clearing the buffers or using the z buffer, cuz i don’t clear the buffers or use the z buffer!

Clearing the screen, and using Z buffers,
is “almost free” on current hardware.
Try enabling it; perhaps there’s some magic
in the drivers or hardware which optimizes
the case where you ARE using Z buffering,
and you’ll get FASTER performance :)

Also, try turning on your profiler, or just
inserting a bunch of QueryPerformanceCounter
(Win32) or stimer (*ix) or system_time()
(BeOS) calls to know where you’re spending
the time. Try taking out things from your
display list – sure, it won’t do the right
thing, but you can time the performance with
and without various components and settings
that way, to get more data to guide you.
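A minimal sketch of that kind of timing harness, using the standard C11 timespec_get in place of the platform-specific timers named above. The names now_seconds, time_frames, and fake_frame are invented for this example, and fake_frame is only a stand-in for a real draw routine.

```c
#include <assert.h>
#include <time.h>

/* Wall-clock time in seconds, via the C11 timespec_get call. */
double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Run `frame` `count` times and return the average seconds per call. */
double time_frames(void (*frame)(void), int count)
{
    assert(count > 0);
    double start = now_seconds();
    for (int i = 0; i < count; i++)
        frame();
    /* With real GL code, a glFinish() here would make sure queued
       rendering work is included in the measurement. */
    return (now_seconds() - start) / count;
}

/* Stand-in workload; a real test would pass the actual draw function. */
volatile unsigned long sink;
void fake_frame(void)
{
    for (unsigned long i = 0; i < 100000; i++)
        sink += i;
}
```

Timing one variant of the draw function with lighting (or the push/pops, or the clears) and one without, then comparing the two averages, gives the kind of with-and-without data described above.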

thank you, sir! i’ll try that (the different display lists and stuff)

i’ve actually done a tetris clone, using a display list for each square/cube, four of which make up one tetris piece, doing pushes/pops, some blending, clearing and using the z-buffer, and lighting as well, getting around 300fps on a GeForce (1)… while at the same time, this exact code on my iMac with a Rage II gets 14fps…

could be your school’s ati cards are old and inferior… you might not have anything to worry about.

I take exception to the claim that clearing is “free” or “almost free”.

I know of at least one commercially shipping, popular OpenGL game that, when I first benchmarked it, was running a full 30% slower than it could have been at high resolutions and 32-bit color due to unnecessary clearing. (1 color and 2 depth clears per frame)

- Matt

although i haven’t the technical proof to back you up, i have the gut feeling that you’re right, mccraighead. clearing is more work, and more work means more time.

the general rule of thumb, i believe, is, “if it’s not necessary, don’t do it.”

Okay, I misspoke.

Clearing certainly uses up fill bandwidth. Doing
more than one clear seems excessive. That being
said, unless you’re on the cutting edge of fill
rate, that’s probably not your biggest problem
(unless you do it excessively). Indeed, most
of the time you probably only need to clear your
Z buffer, not your color buffer, because you’re
going to be drawing all of the screen anyway. And
clearing your Z buffer at some time other than
start of frame seems excessive, although I suppose
you could construct cases where it might be
useful (as opposed to just setting up your culling
such that you use the bottom half of the buffer
first, and then the top half, or something).
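One possible reading of "use the bottom half of the buffer first, and then the top half" is the trick of alternating the depth range and the depth comparison on each frame, so that values left over from the previous frame always lose and no depth clear is needed. The sketch below models only the arithmetic; the types and function names are hypothetical, and the comments note where glDepthRange and glDepthFunc would be called. It assumes every pixel is redrawn each frame, and it gives up half of the depth precision.

```c
typedef enum { DEPTH_LEQUAL, DEPTH_GEQUAL } DepthFunc;

typedef struct {
    double near_val;   /* window depth given to the nearest fragments */
    double far_val;    /* window depth given to the farthest fragments */
    DepthFunc func;    /* comparison an incoming fragment must pass */
} DepthSetup;

/* Even frames use [0, 0.5] with "less or equal"; odd frames use the
   reversed range [1, 0.5] with "greater or equal". */
DepthSetup depth_setup_for_frame(unsigned frame)
{
    DepthSetup s;
    if (frame % 2 == 0) {
        s.near_val = 0.0; s.far_val = 0.5; s.func = DEPTH_LEQUAL;
    } else {
        s.near_val = 1.0; s.far_val = 0.5; s.func = DEPTH_GEQUAL;
    }
    /* Real code would follow with:
       glDepthRange(s.near_val, s.far_val);
       glDepthFunc(GL_LEQUAL) or glDepthFunc(GL_GEQUAL); */
    return s;
}

/* Map a normalized depth z (0 = near plane, 1 = far plane) to window depth. */
double window_depth(DepthSetup s, double z)
{
    return s.near_val + z * (s.far_val - s.near_val);
}

/* Nonzero if an incoming fragment passes against the stored depth value. */
int depth_passes(DepthSetup s, double incoming, double stored)
{
    return s.func == DEPTH_LEQUAL ? incoming <= stored : incoming >= stored;
}
```

On an odd frame even the farthest new fragment lands at 0.5, which is at least as large as anything an even frame could have stored, so stale values never block new geometry; the mirror-image argument holds for even frames.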

I apologize for being quite fuzzy in my previous
statement, and hope there’s nothing to object to
in this opinion.