Performance question

Amerio, from your nick I figured you were Spanish. Well, if I'm not mistaken, I'm writing to you in Spanish since it comes to me more easily than English.
Let's see… something similar to yours is happening to me.
My program is the following:
3000 textured faces with one light.
Well, it runs extremely slowly on an amazing machine:
a computer with two Pentium II Xeon processors, both at 450 MHz, a professional OpenGL-compatible 3Dlabs accelerator with 32 MB of video RAM, and half a gig of RAM. Well, when I run it on this machine it is reeeeally slow, 30 fps. Maybe the same thing is happening to you as to me: the code needs heavy optimization: assembler, MMX, KNI…
Well, if you get yours running faster, please write me, OK? I'll do the same…
OK, thaaanks…
Suso.

Sorry, but I'm French, and 'amerio' is Italian ("parlo un poco italiano").
But from what I understand of your post, you've got quite a similar problem. You can mail me directly if you want…

Once again, I'd appreciate it if anybody would have a look at the small test app I wrote (200 lines), which draws A SINGLE STRIP with 20000 vertices in a display list (GL_COMPILE) and hardly reaches 25 fps on a GeForce2/64MB.
I MUST be doing something wrong. But what??
None of the sources I've seen have helped. I'm not joking. And I'm getting upset by this problem.
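The core of it boils down to this (a rough sketch, not my exact code; the vertex positions here are just placeholders):

    #include <GL/glut.h>

    static GLuint strip;

    /* Build one long strip in a display list, once at init time. */
    static void buildStrip(void)
    {
        int i;
        strip = glGenLists(1);
        glNewList(strip, GL_COMPILE);
        glBegin(GL_TRIANGLE_STRIP);
        for (i = 0; i < 20000; ++i)
        {
            glNormal3f(0.0f, 0.0f, 1.0f);
            /* zig-zag between two rows of vertices */
            glVertex3f((i / 2) * 0.004f, (i & 1) ? 0.1f : -0.1f, 0.0f);
        }
        glEnd();
        glEndList();
    }

    /* Per frame: just clear, call the list, swap. */
    static void display(void)
    {
        glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
        glCallList(strip);
        glutSwapBuffers();
    }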

Thanks…

amerio, if you wish, please send me your sample app - I would like to test it on my PC and profile it… I hope I'll soon report the results to you
e-mails:
medo@sirma.bg
martin_marinov@hotmail.com

Modern 3D hardware accelerates textured surfaces, so texturing is FASTER than solid or wireframe drawing!

Check the NVIDIA demos that come with the card. If you switch to wireframe or solid mode, performance gets worse.

I would suggest trying texture-mapped geometry and telling us if it gets any better.
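Something along these lines around your existing draw call (just a sketch; texId and modelList stand for your own texture object and display list):

    glEnable(GL_TEXTURE_2D);
    glBindTexture(GL_TEXTURE_2D, texId);  /* your texture object */
    glCallList(modelList);                /* same geometry, now textured */
    glDisable(GL_TEXTURE_2D);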

Thanks, but that is not my issue.
I'm not doing detail polys or that kind of thing. I DO NEED the geometry, as it forms the bodies of many objects…
Want to share source?
(getting depressed by now…)

Amerio, if you want, send me your test app (if it is about 200 lines) and I can tell you if I find something strange.
I have a testing app, for an engine I'm working on, where I'm drawing a model (with 39177 triangles, 50725 vertices, and 1 single texture) four times (that is 156708 triangles) at more than 43 FPS (that is more than 6.7 MTris/sec).
This is using a GL_TRIANGLES display list on a GF2 GTS under W2000. The testing app runs as a 640x480x32 windowed application with a depth buffer, separate specular enabled, and one infinite light. It is not optimized for this case, as it is a speed test of a general engine with objects that have positions, different properties, etc.
And it's drawing text using a font in a different texture…
I'm also getting similar results on a Radeon DDR card.
So you should expect good speed using display lists. The same program, not storing the model in a display list but using the NV vertex array range extension, gets more than 6.0 MTris/sec. Without display lists or the nVidia extension it gets (approx.) 1 MTris/sec on both cards.
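For reference, the display list part is nothing exotic; roughly this (a sketch, with the model data assumed to be loaded elsewhere):

    #include <GL/gl.h>

    typedef struct { int v[3]; } Tri;            /* three indices per triangle */
    extern float  vert[][3], norm[][3], uv[][2]; /* model data, loaded elsewhere */
    extern Tri    tri[];
    extern int    numTris;
    extern GLuint texId;

    GLuint buildModelList(void)
    {
        int i, j;
        GLuint model = glGenLists(1);
        glNewList(model, GL_COMPILE);
        glBindTexture(GL_TEXTURE_2D, texId);     /* the single texture */
        glBegin(GL_TRIANGLES);
        for (i = 0; i < numTris; ++i)
            for (j = 0; j < 3; ++j)
            {
                glTexCoord2fv(uv[tri[i].v[j]]);
                glNormal3fv(norm[tri[i].v[j]]);
                glVertex3fv(vert[tri[i].v[j]]);
            }
        glEnd();
        glEndList();
        return model;
    }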

One problem I remember having when I started programming in 3D was using a big texture (512x512) without creating mipmaps: drawing a small triangle with such a texture can end up referencing every pixel in the texture. You can try replacing your textures with 32x32 ones (for example); if that speeds things up a lot, you're probably not creating or using mipmaps.
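Creating them is a single GLU call at texture load time (a sketch; 'pixels' stands for your 512x512 RGB image, and you need <GL/glu.h> plus the GLU library):

    /* upload the image with a full mipmap chain instead of glTexImage2D */
    gluBuild2DMipmaps(GL_TEXTURE_2D, GL_RGB, 512, 512,
                      GL_RGB, GL_UNSIGNED_BYTE, pixels);

    /* and pick a min filter that actually uses the mipmaps */
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER,
                    GL_LINEAR_MIPMAP_LINEAR);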

Amerio, send me your code (if you want) and I’ll try to find the bug or anything strange. Maybe if we all work on it, we’ll catch it!!

To me, it sounds like a compilation problem.
Are you sure you are compiling (in MSVC) a release build with optimizations turned on, and linking with the right libs?
Kinda silly, but it might be the problem.

I've sent my small test app to some of you.
So far, none of you have spotted any kind of weakness in my code.
So it may be a compiler problem. But how?
I mean, my code simply calls a DL…
Anyhow, I've yet to find any source without NV extensions that runs above 50 fps with 20000 vertices.
Anyone at NV reading this? Could it be that the GeForce2 can only perform fast with NV extensions? (just wondering, not flaming)

No, it is absolutely possible to get fast performance with display lists. I’m not sure what you’re not doing right, but there must be something.

  • Matt

Amerio:

I do not find anything wrong in your program. You are hitting one of the worst cases for a graphics card: drawing a large number of small triangles (10240), filling a large part of the screen with up to 128 layers of overdraw (which is like filling the entire screen several times with small triangles), with no vertex reuse, from back to front, with depth testing and no culling at all (all of them facing the camera). As the triangles have one side 40 times longer than the other, you can see that rendering the box horizontally is slower than rendering it vertically.
If you move a little away from the camera (changing the eye distance from 3.0 to 5.0) you will find that you get 100 to 300 FPS, depending on the orientation.
I used VTune to see where the time went, and it is in the driver. With GPT you can see it happens in wglSwapBuffers, and this is because it has to wait, doing nothing, until the display list finishes rendering. If you move the fps() call before glutSwapBuffers(), you will notice a speed increase. You can do more work at that point (moving something, for example) and you will notice no speed decrease.
Anyway, if you find something different, please tell me.
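In other words (a sketch of the reordering I mean; model, fps() and updateScene() are just stand-ins for your list and your per-frame CPU work):

    /* before: fps() sits after the swap, so the whole
       wait-for-the-GPU time is spent doing nothing */
    glCallList(model);
    glutSwapBuffers();   /* blocks until the display list has rendered */
    fps();

    /* after: do CPU work between submitting the list and swapping,
       so it overlaps with the GPU rendering and is (almost) free */
    glCallList(model);
    fps();
    updateScene();
    glutSwapBuffers();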

This is the GPT info from a 4 sec. session:

Total frames analyzed = 420.
Total Time (secs)  Function
=================  ========
3.597312 GL11_wglSwapBuffers (<= notice this is from 4.0 secs)
0.231673 GL11_glCallList
0.033702 GL11_glEnd
0.012075 GL11_glBegin
0.007096 GL11_glClear
0.005875 GL11_glVertex3f
0.005312 GL11_glRotatef
0.004318 GL11_wglGetPixelFormat
0.002468 GL11_glDisable

Number of Calls  Function
===============  ========
10080 GL11_glVertex3f
2520 GL11_glNormal3f
1260 GL11_glRotatef
1260 GL11_glDisable
840 GL11_glPolygonMode
840 GL11_glPushMatrix
840 GL11_glPopMatrix
840 GL11_glMultMatrixf
437 GL11_wglGetCurrentDC
437 GL11_wglGetCurrentContext
420 GL11_glCallList
420 GL11_glGetFloatv
420 GL11_glLoadIdentity
420 GL11_glClear
420 GL11_glTranslatef
420 GL11_glBindTexture
420 GL11_glBegin
420 GL11_glEnd
420 GL11_glEnable
420 GL11_wglGetPixelFormat
420 GL11_wglSwapBuffers

This is just my opinion; I would appreciate it if any IHV could give theirs.

I agree with you: the test app is one of the worst cases one can think of.
But my target app is not such a nightmare.
Anyway, it shows almost the same framerate with the same number of polys (i.e. around 30 fps for around 10k polys drawn).
I looked at the VAR demo on the NV site. It runs damn fast! (yes, a lot of vertex reuse, but…)

About vertex reuse: I suspect few real-world apps will have a lot of vertex reuse. My app is a virtual reality engine => lots of objects with around 1000 polys each, so vertex reuse is limited by definition…
My target app shows around 10-20 objects at the same time, each with its own DL (static objects).

Too bad I shouldn't use NV extensions. But I wonder if they would even bring much speed (i.e. more than 2x?)

Aside: where can I get a free equivalent of VTune / GPT for Win32?

I ran the app and very quickly came to the conclusion that it was a fill rate limitation.

Default window size, 45 fps.

Shrank the window to as small as possible, 380 fps.

P3-700, BX chipset, NV10 SDR, 5.22 drivers (development system, so all I care about is stability), vsync off.

  • Matt

Though many games have 1000-poly models, those are actually bodies, so they're closed, and thus not all polys are visible at any given time. I got 26 fps for 20k polys, and I don't think that's very bad…

To Matt:
Sorry, but how come my fps rate IS CONSTANT when I shrink the window down to 16x16 on my system? (30 fps or so, at 640x480 OR 16x16… no speedup.)
And it runs at 400 fps on yours when you shrink the window?
Hmm? A driver issue? (yours is 5.22, mine 6.31) A VIA chipset issue? (you're an Intel addict, I'm an AMD slave)

Such a fill-rate problem is easy to track down. I noted earlier in this thread that it wasn't that (on MY system…)

I have tried shrinking the window, resizing the polys, etc…

To Michael:
You got 25 fps with 20k polys; I got 25 fps with 10k polys, using optimized strips, no textures, no lighting, and even tested with perfect geodesic spheres (the ideal case for a stripped object). No gain in fps.
Even with a 16x16 window.
And I don't really believe in a software-only fallback (that would be under 1 fps).

Too bad I can't send the full app.
But since the test app seems just as slow to me…

So okay, all of you tell me my code is not the problem. It might be the fill rate (?). But on my system and my client's platform, the speed is the same (AMD 1 GHz, VIA, 128 MB, etc).
And yes, I do need all those polys…
I'd like to get an answer on this. Am I asking too much of my GeForce2? Or is there any way I could improve the code / geometry (but what's better than strips???)

Thanks again! (hope I won't wear you out with my problem…)

I don't know, but maybe the fact that you've written it with GLUT could be a bottleneck. I've never even had a look at GLUT, so I'm probably wrong.

I could try a different driver, but the bottleneck is obvious: fill rate. Drivers would make no difference.

You should make sure vsync is off.
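(If you want to force it off from code rather than from the driver control panel, the WGL_EXT_swap_control extension does it; a sketch, guarded in case the extension is absent:)

    #include <windows.h>
    #include <GL/gl.h>

    typedef BOOL (APIENTRY *PFNWGLSWAPINTERVALEXTPROC)(int interval);

    void disableVsync(void)
    {
        PFNWGLSWAPINTERVALEXTPROC wglSwapIntervalEXT =
            (PFNWGLSWAPINTERVALEXTPROC)wglGetProcAddress("wglSwapIntervalEXT");
        if (wglSwapIntervalEXT)       /* only if WGL_EXT_swap_control exists */
            wglSwapIntervalEXT(0);    /* 0 = don't wait for vertical retrace */
    }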

The platform should be irrelevant in this case. You could be using a PCI card on a Pentium 200 MMX for all it matters here; it's the fill rate that is the bottleneck.

When I had vsync on, it only went from 45 to ~75 by shrinking the window. That’s the only thing I can think of.

  • Matt

I'm going to give up.
I'm SURE (is that bold enough?) that VSYNC is off: if I decrease the number of primitives, I go up to 300 fps and more.
Moreover, I get only around 30 fps (50 when the wind is blowing in the right direction). Should I conclude that my monitor refresh rate is 30-50 Hz? Naaaa.
So okay:
-my code is not the problem (you all tell me).
-drivers are not the problem (I trust you).
-GLUT is not the problem (pure Win32 code shows the same speed; yes, I tested).
-fill rate is the problem (okay, okay, it won't run faster with so many "large" polys facing the camera).
But it surprises me.
I tried making the polys very small, and the speed didn't increase that much.
So fill rate… (okay, don't knock my head! I'm just dubious)
Just compare with the VAR tutorial on the NV site (the one that fills the whole screen with a waving biiiiiig flag, 100k polys, at 150 fps!)
Please just tell me: "You ask too much of your GeForce2" or "You're doing it the wrong way" or anything… (sniff)

(anyway, it's sad that it simply runs FASTER on your machine, which isn't even a better computer/card).

Well, if what I see is that performance does increase, and dramatically so, when the window shrinks, I really can't help you.

  • Matt


I FOUND IT! (well, actually, all the credit goes to Carlos Abril. Oh, thank you!)

Here is an excerpt from his mail:
<<
I found that if you call glPolygonMode(GL_FRONT, GL_FILL) and glPolygonMode(GL_BACK, xxx) with xxx being any mode other than GL_FILL, your speed will slow down significantly (even if you have glCullFace(GL_BACK) and glEnable(GL_CULL_FACE)), so I suggest setting glPolygonMode(GL_FRONT_AND_BACK, GL_FILL).
>>
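Concretely, the mistake in my init code looked like this (a sketch; GL_LINE here is just an example of a non-FILL back mode):

    /* before: back faces in a non-FILL mode => big slowdown,
       even with back-face culling enabled */
    glPolygonMode(GL_FRONT, GL_FILL);
    glPolygonMode(GL_BACK, GL_LINE);

    /* after: one call, both faces filled */
    glPolygonMode(GL_FRONT_AND_BACK, GL_FILL);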

And it works! It solves my problem.
In my test app, I had actually made this mistake.
And now the speed is around 200 fps for 20k polys! (yeeeees!)
Tried it in my target app: fps x2.5! Yippee!
(now with textures, 3 lights, lightmaps!)

But it makes me think of this as a driver limitation rather than a GL limitation (just wondering), since culling is enabled. What do you think? Is it NV-specific, or multi-vendor?

(oh, I'm just so happy right now! THANKS!)