Performance question

I’m getting quite confused by my performance problem.
So, to sum up:
-Win2000 + win AGP fix
-VIA chipset + latest AGP fix
-GeForce2/64Mb, Detonator 6.31 (latest)
-pure Win32 application
-using PerformanceCounter for fps
-disabled VSync

Okay, onto the performance :
-640x480x32 + stencil buffer (tried without)

Geometry = 8000 vertices, dispatched across a dozen display lists; let’s say 10 DLs, each with 1000 vertices arranged in 10 triangle strips of 100 vertices.
My stripping code achieves an efficiency of 1.25/3
(i.e. for a 300-vertex mesh, it will produce
triangle strips totalling 125 vertices: quite good! Not fast, but efficient).
No abusive state change (at all!)
No stack overflow or abuse (checked).
Each mesh is drawn using N small display lists
(around 500 vertices max), embedded in
another DL (hierarchical DLs).
Each inner DL is built with glDrawElements(GL_TRIANGLE_STRIP).

1 light, no texturing at all.

Problem is : 25 fps ! ONLY ???
in 640x480? (tried fullscreen and windowed)
Without DL (direct call to glDrawElements
with corresponding glVertexPointer each time)
-> 20fps.

My question is :
-Is it normal to get such a loooow framerate on such a card ?

Your help may be :
-a pointer to some src that achieves 60fps
with 10000 vertices in 10 objects…

(I know some of you can do this :-> )

Are you sure your frame rate counter works correctly? Do you perhaps call glFinish() or glFlush()?
I get that frame rate on a TNT with a stripped model of at least 10,000 polys, with lighting turned on. What light mode do you use? Spotlights, perhaps?

Have a look at the Balls! program on the NVIDIA site.

I tweaked my code in every way I could think of.
No glFinish nor glFlush at all.
No lighting at all : glDisable(GL_LIGHTING) -> fps x2 (no more)
So Lighting isn’t the bottleneck.
(I use both local light and infinite light)
No Texturing at all (so this is not an issue)
FPS counter is exact ! (checked so many times!..)
Geometry setup is probably not the issue: I tried without all the glRotate, glTranslate, etc. calls: negligible fps gain.
I looked at all the demo src I could get hold of. None helped! (I mean, either they show simple geometry, i.e. spheres or cubes…, or the NVidia code uses wglAllocateMemoryNV: I CANNOT USE THIS, as I want portable code)
I’m sure I’m not sending the geometry twice or making that kind of error (checked even with printf!).
Not fill-limited (a 64x64 window runs at the same speed as a 512x512)
Tried it on other cards: same fps, so this is not a hardware issue.

Maybe I could send part of my code to some of you, who may wish to help ?

Or, do you know of any freeware that could “track” this kind of bottleneck?

Thanks for your help!

Hmmm. This is strange. But maybe this helps: are you using a perspective projection setup with gluPerspective ? If yes, with what angle ?

From my observations (using almost the same hardware as you), if you either use very large angles or strange values for the angle, the performance can drop.

Be careful, if you want an angle of x degrees, you have to pass x/2 as parameter to gluPerspective.

No, you don’t have to pass a value of x/2 degrees. I never thought I’d see 180 degrees…

Oh, amerio, I could have a look at it.
I want NO texturing, NO stencil, NO LIGHTING;
those always make things so complicated.

> No, you don’t have to pass a value of x/2 degrees. I never thought I’d see 180 degrees…

Don’t get me wrong on that: if you want a view angle of 90 deg, pass 45. If you pass 90, you’ll get 180 (the image will be distorted). Just try it.

I do use gluPerspective(60.0, blahblah…)
Not an issue, I guess

I’ll send part of my code to whoever can help!
(well, the full app is far too big
to be posted :> )

Thanks for your efforts. Thanks !

>>if you want a view angle of 90 deg, pass 45. If you pass 90, you’ll get 180

I pass 90 degrees and get 90 degrees. You sure you’re not using some obscure version, or thinking about something else?

memo is right. gluPerspective takes half the field of view.
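
For what it’s worth, the gluPerspective reference page describes fovy as the full field-of-view angle, in degrees, in the y (vertical) direction, and builds the matrix from f = cot(fovy/2), so the halving happens inside the function. The horizontal angle is never passed directly; it follows from fovy and the aspect ratio, which may be where part of the disagreement comes from. A short worked example, assuming a 4:3 aspect ratio as for a 640x480 window (the aspect value is my assumption; nobody in the thread states what they pass):

```latex
\begin{align*}
f &= \cot\!\left(\tfrac{\mathrm{fovy}}{2}\right), \\
\mathrm{fov}_x &= 2\arctan\!\left(\mathrm{aspect}\cdot\tan\tfrac{\mathrm{fovy}}{2}\right), \\
\mathrm{fovy}=90^\circ,\ \mathrm{aspect}=\tfrac{4}{3}: \quad
  \mathrm{fov}_x &= 2\arctan\tfrac{4}{3} \approx 106.3^\circ, \\
\mathrm{fovy}=60^\circ,\ \mathrm{aspect}=\tfrac{4}{3}: \quad
  \mathrm{fov}_x &= 2\arctan\!\left(\tfrac{4}{3}\tan 30^\circ\right) \approx 75.2^\circ.
\end{align*}
```

So passing 90 gives a 90 degree vertical view, but on a 4:3 window the horizontal view is already wider than 100 degrees, which can look heavily distorted.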

amerio, what I think is that the bottleneck in your application is not the video card or OpenGL, but the computation of the scene geometry. You point out that the frame rate does not depend on, for example, the window dimensions. This sounds like pure processor overload…
You can check it: for example, if u r using MS VC++, u can turn profiling on (use the function timing setting). Then u can see exactly which function drags your fps down.
U may also write a function:
void DrawElements(…) { glDrawElements(…); }
and replace all glDrawElements calls with this DrawElements; this will help you separate the computational code from the OpenGL calls. It will also slow things down a little, but it’s only for profiling; later u can make the function a macro, or define it inline.
Try making such a function for every OpenGL API function which u call and which u think is slowing your program. After that, run the profiler and check where the fps goes.
I’m almost sure it will be the pure computation code…

You say the “computation” of the geometry might be the bottleneck, but my geometry is absolutely static ! Nothing is being morphed, or deformed. Each and every object
is in its own DL. Then I simply call all
DL… (remember, they all are made with GL_COMPILE, and calling some glDrawElements).
So tracking the glDrawElements calls makes no sense to me, as it is called only once: when I build up the DL…
After that, there is only some PushMatrix/Pop (one per object), and that’s all !
Maybe tracking the glCallList ?

Sounds like T&L is disabled, but I believe that it is always on with OpenGL…
What gives ?

Oh, about my src code : I’m trying to isolate some part of it to track down that bottleneck… (I’m not working on some kind
of demo : it’s a “wannabe” professional app…)

Do you know of some demo w/ src that DOES NOT use wglAllocateMemoryNV ?

The Steve’s Balls (or SphereMark) demo supports vertex array range but does not require it. In fact, you should easily be able to get tens of millions of triangles per second with that demo in display list mode.

  • Matt

Oh I’m going crazy !
Just wrote a simple test app, that draws a single strip with 20000 triangles in it.
No lighting, no textures, no nothing special ! Just the minimal app.

Guess what : 25fps.

Using DL, glDrawElements or begin/end doesn’t change a thing!

What on earth am I doing wrong???
I dug into SphereMark, but did not see
any super-cheat-tip-godmode!

I can send src and exe (under 49 KB) to anybody.

M.Steinberg: I sent it to you, as you offered your help earlier…

Sorry Matt, but I am using OpenGL under NT 4.0
and there is a BIG difference in performance between VAR and display lists!!! Could you please make a version of the balls source code available that uses display lists? I am really interested.
In fact, with VAR my gain is x2!! I did not use glFence: all my data is stored on board permanently and is never modified. Thus, I wonder how the display list mechanism could be faster than that? Moreover, why did NVIDIA create VAR if display lists can do the job?

The code is on our website.

I’m not sure if that is the latest version, but it probably has dlist support…

VAR is always going to be fastest, because it’s direct access to the HW. You should be able to do equally well with display lists in certain cases, though.

Again, you should be able to get >10 Mtris/sec with that app’s display list mode easily.

It’s not too difficult to get the theoretical-maximum numbers, either. I know I was able to get 30-some Mtris/sec with a GF2 Ultra – just turn off lighting and shrink the window. Not sure if I got that with dlist mode or VAR mode.

  • Matt


Are you using the latest OpenGL-optimized drivers for your 3D card? I saw a similar problem on a dual-boot machine (W98SE / W2000).
When using W2000, the FPS of most applications was quite a bit lower than when running the same apps on W98.
Most 3D card drivers are not optimized for W2000. Maybe you don’t even have hardware acceleration at all when running on W2000.

Good luck,

Rafael Del Rey

I use the latest Detonator 6.31.
The CPU is a 1 GHz AMD. Tests on a PIII show the same speed, so it’s not a CPU issue.
Ran it both under W98/W2000 : same speed
Ran it with other cards (ATI) : slower than ever.
And without HW acceleration, I can’t get even 1 fps.
All the NV demos run smoooooth! So yes, there is HW…
Am I doomed, or something ?
Want to see the code and exe ?

This isn’t going to solve your problem at all, but perhaps you could get a slight speedup by using glCallLists instead of calling glCallList for every single DL?