Performance Quandary

So I went back to determine why my rendering function is performing so slowly, and now I am completely puzzled.

Geforce II GTS

~300,000 Vertices
~300,000 Polygons

~10 fps

Vertex array + glDrawElements

Flat shading, Single sided lighting, local viewer, nothing else fancy, all in 1 mesh.

I was researching VBOs, but I am not sure they will help all that much. General suggestions? Why am I performing so slowly.

Hi. You are rendering 300K triangles per frame, which is a lot of geometry for your graphics card. You should render around 10K-20K triangles for 30 fps. If, as I suppose, you’re rendering a terrain application, you should use level of detail and load polygons progressively.

Try the following in your render loop:

static GLuint dl=0;
if (0==dl)

If that improves performance, you will benefit from VBOs.

3Mpolys/s isn’t shabby, but the Gf2 should be able to do more.
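For reference, the 3 Mpolys/s figure is just the triangle count per frame multiplied by the frame rate. A minimal sketch of that arithmetic in C follows; the function names are made up for illustration, and the sustained rate you feed into the budget function is an assumed input, not a measurement.

```c
#include <assert.h>

/* Triangles processed per second when the whole mesh is drawn once per frame. */
double triangles_per_second(double triangles_per_frame, double fps)
{
    assert(triangles_per_frame >= 0.0 && fps >= 0.0);
    return triangles_per_frame * fps;
}

/* Largest per-frame triangle count that still reaches target_fps,
   assuming the renderer sustains tris_per_sec (an assumed input). */
double triangle_budget(double tris_per_sec, double target_fps)
{
    assert(target_fps > 0.0);
    return tris_per_sec / target_fps;
}
```

With the numbers from the first post, triangles_per_second(300000, 10) gives 3,000,000. At that same rate, triangle_budget(3000000, 30) leaves 100,000 triangles per frame for 30 fps.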

[This message has been edited by zeckensack (edited 11-27-2003).]

I am going to go back and try display lists again. Initially, they seemed to offer no speed improvement. I read a thread which said that a single display list should not include everything, and that there should be multiple display lists instead.

I wish I could combine my vertex array + display list, since my vertex array rendering code is much cleaner than my immediate mode renderer. Perhaps I will skip to VBOs directly.


A single display list isn’t the ultimate solution. I made the suggestion because it’s quite easy to drop into the code.

The parallel here is that VBOs allow you to store geometry in driver-managed buffers, i.e. graphics card memory.
DLs often provide graphics memory buffers for geometry too, so the transfer bandwidth is roughly the same between the two.

In the long run, VBOs are better in almost every way imaginable (for geometry storage; there are some other nice things display lists can do for you).

>>General suggestions? Why am I performing so slowly.<<

You didn’t say how big the polygons are.
A GeForce II should be able to transform 15 MVertices/s.
Does it get faster if you shrink your window? Then you’re fillrate bound.
If not,

  • check the vendor string. Not NVIDIA? => check your pixel format code, reinstall the graphics driver.
  • check the renderer string. No AGP? => reinstall the chipset drivers.
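The two checks above could be sketched roughly like this in C. The helper names are hypothetical, and the code only inspects strings handed to it; in a real program they would come from glGetString(GL_VENDOR) and glGetString(GL_RENDERER), called with a current GL context. The exact text a driver reports varies, so treat the substring tests as a heuristic.

```c
#include <assert.h>
#include <string.h>

/* Returns 1 if the vendor string mentions NVIDIA, which suggests the
   hardware driver is active rather than a software fallback. */
int vendor_is_nvidia(const char *vendor)
{
    assert(vendor != NULL);
    return strstr(vendor, "NVIDIA") != NULL;
}

/* Returns 1 if the renderer string mentions AGP. */
int renderer_mentions_agp(const char *renderer)
{
    assert(renderer != NULL);
    return strstr(renderer, "AGP") != NULL;
}
```

Typical usage would be vendor_is_nvidia((const char *)glGetString(GL_VENDOR)). If that returns 0, the pixel format probably did not select an accelerated format.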

Thanks for responding.

Window size makes little difference.
Polygons can be huge or small; they’re all normalized to within 1-5 units.

I went ahead and implemented VBOs, but they also seem to offer no performance improvement.

Perhaps this is because I passed all the indices in as a single GL_UNSIGNED_INT list (300K+).
The only other thing I can try is to break it up into 65K blocks and use GL_UNSIGNED_SHORT.
But this is looking just like the display lists.
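Breaking the mesh into 65K blocks means each block needs its own vertex set, re-indexed locally so the indices fit in 16 bits. A minimal sketch of one way to do that in C is below. The struct and function names are hypothetical, the greedy split is just the simplest approach, and nothing in this thread establishes that 16-bit indices will actually be faster on this hardware.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    size_t first_index;   /* offset of this chunk in out_indices */
    size_t index_count;   /* number of indices (3 per triangle) */
    size_t first_vertex;  /* offset of this chunk in out_vertex_ids */
    size_t vertex_count;  /* distinct vertices used by this chunk */
} ChunkRange;

/* Greedily splits a 32-bit triangle index list into chunks that each use at
   most max_verts distinct vertices (max_verts <= 65536), so 16-bit local
   indices suffice. out_indices and out_vertex_ids need index_count entries.
   Returns the chunk count, or 0 on allocation failure or too many chunks. */
size_t split_into_u16_chunks(const uint32_t *indices, size_t index_count,
                             size_t vertex_count, size_t max_verts,
                             uint16_t *out_indices, uint32_t *out_vertex_ids,
                             ChunkRange *out_chunks, size_t max_chunks)
{
    assert(max_verts >= 3 && max_verts <= 65536 && index_count % 3 == 0);
    if (index_count == 0) return 0;

    uint32_t *remap = malloc(vertex_count * sizeof *remap);  /* global -> local */
    size_t *stamp = calloc(vertex_count, sizeof *stamp);     /* chunk that owns remap[g] */
    if (!remap || !stamp) { free(remap); free(stamp); return 0; }

    ChunkRange cur = {0, 0, 0, 0};
    size_t nchunks = 0, verts_out = 0, cur_stamp = 1;

    for (size_t t = 0; t < index_count; t += 3) {
        /* Count vertices of this triangle that the current chunk lacks. */
        size_t needed = 0;
        for (int k = 0; k < 3; ++k) {
            uint32_t g = indices[t + k];
            assert(g < vertex_count);
            int seen = (stamp[g] == cur_stamp);
            for (int j = 0; j < k; ++j) seen |= (indices[t + j] == g);
            needed += !seen;
        }
        /* Close the current chunk if this triangle would overflow it. */
        if (cur.vertex_count + needed > max_verts) {
            if (nchunks == max_chunks) { free(remap); free(stamp); return 0; }
            out_chunks[nchunks++] = cur;
            cur.first_index = t;          cur.index_count = 0;
            cur.first_vertex = verts_out; cur.vertex_count = 0;
            ++cur_stamp;
        }
        for (int k = 0; k < 3; ++k) {
            uint32_t g = indices[t + k];
            if (stamp[g] != cur_stamp) {          /* first use in this chunk */
                stamp[g] = cur_stamp;
                remap[g] = (uint32_t)cur.vertex_count++;
                out_vertex_ids[verts_out++] = g;
            }
            out_indices[t + k] = (uint16_t)remap[g];
        }
        cur.index_count += 3;
    }
    if (nchunks == max_chunks) { free(remap); free(stamp); return 0; }
    out_chunks[nchunks++] = cur;
    free(remap);
    free(stamp);
    return nchunks;
}
```

For each chunk you would gather the vertex data listed in out_vertex_ids into that chunk's buffer, then draw it with glDrawElements using GL_UNSIGNED_SHORT. Vertices shared across a chunk boundary end up duplicated, which is the price of the split.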


With new hardware (Geforce 5600 Ultra):

glDrawElements + VBO

~300K polygons
10 fps (same as the Geforce II), which points to the CPU being the limiting factor

I switched to an interleaved array with glDrawArrays, and this tripled the frame rate.

~30 fps
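For anyone following along, interleaving here means packing each vertex's normal and position next to each other in one array instead of keeping two separate arrays. A minimal sketch in C, assuming three floats each for normal and position (the struct and function names are made up for illustration; the normal-first order is chosen to match the GL_N3F_V3F interleaved format):

```c
#include <assert.h>
#include <stddef.h>

/* One vertex: normal first, then position, 6 floats in total. */
typedef struct {
    float normal[3];
    float position[3];
} InterleavedVertex;

/* Packs separate position and normal arrays (3 floats per vertex each)
   into a single interleaved array. */
void interleave_n3f_v3f(const float *positions, const float *normals,
                        size_t vertex_count, InterleavedVertex *out)
{
    assert(positions != NULL && normals != NULL && out != NULL);
    for (size_t i = 0; i < vertex_count; ++i) {
        for (int c = 0; c < 3; ++c) {
            out[i].normal[c]   = normals[3 * i + c];
            out[i].position[c] = positions[3 * i + c];
        }
    }
}

/* Size in bytes of an interleaved buffer holding vertex_count vertices. */
size_t interleaved_bytes(size_t vertex_count)
{
    return vertex_count * sizeof(InterleavedVertex);
}
```

The stride to pass to the pointer calls is sizeof(InterleavedVertex), and the position starts at offsetof(InterleavedVertex, position). As a size check, if the 300K triangles are drawn unindexed, that is 900,000 vertices, and interleaved_bytes(900000) comes to 21,600,000 bytes with 4-byte floats.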

HOWEVER, memory usage doubled. This seems to indicate that OpenGL is not copying my geometry (about 20MB) to video memory.
How can I force OpenGL to use video memory and/or AGP memory with VBOs?

UPDATE: I ran the same interleaved code using a regular vertex array, i.e. a buffer allocated in system memory, and got ~15 fps.

Even taking the best approach, VBO + interleaved, I only get 9-10 million triangles/sec.
This is with the infinite viewer light model and flat shading.

Anyone have any tips for better performance?
(I know about tri-stripification, but that causes visual problems!)


[This message has been edited by maximian (edited 11-29-2003).]


Wonderful. Now I hit a new problem.

I use the actc library to stripify my meshes.
But using this with glDrawArrays results in a large number of triangle strips (relatively speaking), which actually slows down my rendering. Does anybody know how I can avoid the following loop over a triangle strip list?

for( i = 0 to numStrips )

Using the recommended 20ish primitives per strip caused it to run even slower,
probably because of all the function calls and their overhead.
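One commonly used way to avoid a draw call per strip (not something proposed in this thread) is to stitch all strips into a single strip with degenerate triangles. You repeat the last index of one strip and the first index of the next, plus one extra copy when needed to keep the winding order. A minimal sketch in C, assuming the strips are available as index lists; the function name and parameters are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Joins several triangle strips into one by inserting degenerate
   (zero-area) triangles between them. out needs room for the total strip
   length plus 3 extra indices per join. Returns the stitched length. */
size_t stitch_strips(const uint32_t *const *strips, const size_t *strip_lens,
                     size_t num_strips, uint32_t *out, size_t out_capacity)
{
    size_t n = 0;
    for (size_t s = 0; s < num_strips; ++s) {
        size_t len = strip_lens[s];
        if (len == 0) continue;                  /* nothing to append */
        if (n > 0) {
            assert(n + 3 <= out_capacity);
            out[n] = out[n - 1];                 /* repeat last index so far */
            ++n;
            out[n++] = strips[s][0];             /* repeat first index of next strip */
            if (n % 2 == 1)                      /* next strip must start on an even */
                out[n++] = strips[s][0];         /* position to keep its winding */
        }
        assert(n + len <= out_capacity);
        for (size_t i = 0; i < len; ++i)
            out[n++] = strips[s][i];
    }
    return n;
}
```

The result can then be drawn with a single glDrawElements(GL_TRIANGLE_STRIP, ...) call. With glDrawArrays the same idea works by duplicating the vertices themselves rather than the indices. The degenerate triangles have zero area, so they should not produce pixels, but they do add a few vertices per join.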