Strange loss of performance

To track down this error, I have lately been using a crippled version of the engine.

  1. Lights are off and remain so. No materials are set. The only thing I do for each instance is set the modelview matrix, either by multiplying with a matrix or by calling glTranslate (I tried both). No fog. And as far as I can tell, there is no other state change in there either.

  2. No textures in the crippled version. I am sending texture coordinates but I am not using them.

  3. I am not able to test that at the moment, but I doubt it. I do not think I should be able to reach 50M vertices/sec using plain vertex arrays:
    50M*(3+3+2)*4 bytes per second. That's 1.6 GB/s.

AGP 8X can theoretically push 2.1 gigabytes per second.

The GeForce 3 I am using is on an AGP 4X slot.

[This message has been edited by neomind (edited 01-21-2004).]

Well, if you switched to display lists, make sure your whole scene is visible to the camera. The NVidia drivers perform frustum culling on display lists, so it can mess up your benchmarking figures.

If I follow you, you're now drawing 3 million triangles per frame? What framerate do you get?

Y.

With 300 of the 10000 triangle objects I am getting a framerate of about 5-6 fps. I frustum cull the objects before sending them to rendering so it should not mess with my benchmark figures. Also, most of the scene is visible at all times.

Originally posted by neomind:
[b]I am rendering the display lists like this:
*glGenLists
*activate textures and array pointers
*glNewList
*glDrawElements
*glEndList
[/b]
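For reference, that sequence would look roughly like this (a sketch only, not a complete program: it assumes a current GL context, and the list name, array pointers, and index data are placeholders; note that glDrawElements inside a GL_COMPILE list captures the array data at compile time):

```c
/* Sketch of the sequence quoted above. */
GLuint list = glGenLists(1);

/* Bind textures and set up vertex array pointers *before* compiling. */
glVertexPointer(3, GL_FLOAT, 0, vertices);
glNormalPointer(GL_FLOAT, 0, normals);
glTexCoordPointer(2, GL_FLOAT, 0, texcoords);

glNewList(list, GL_COMPILE);
glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, indices);
glEndList();

/* Later, once per instance per frame: */
glPushMatrix();
glTranslatef(x, y, z);
glCallList(list);
glPopMatrix();
```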

You are only generating the display lists once, before the main render loop, right?

What happens to performance if you comment out any changes to the modelview matrix?

Personally I think you should just use VAR/VBO. There's no point worrying about performance unless you are using the optimum method. If you then still have a performance problem, it's worth investigating. I haven't found display lists that fast for rendering small amounts of geometry many times.

You are only generating the display lists once, before the main render loop, right?

Yes.

What happens to performance if you comment out any changes to the modelview matrix?

I think I have tried this with no change, but I will check it again later.

Personally I think you should just use VAR/VBO. No point worrying about performance unless you are using the optimum method.

The reason I want vertex arrays (and display lists) is that I want the game to run even on systems with few extensions, for many different reasons. I will use VAR or VBO in the final version and keep plain vertex arrays as the fall-back. And since it should also run on MacOS X and Linux, my choices are restricted even more (no wgl).

[This message has been edited by neomind (edited 01-22-2004).]

It reminds me of a file I found on the nVidia web site called BatchBatchBatch.pdf, by Matthias Wloka for a GDC talk.
In a nutshell: what matters most is the number of batches you send, not the number of triangles in each batch.
According to his numbers you need to send more than 130-200 triangles per batch to avoid being CPU limited. As you send 550 you are above this limit, but the rise in performance when you increase the number of triangles per object seems logical. 300 batches with a low triangle count must be a high number for a GeForce3?

I don’t have the link but it should be easy to find on their site.

[edit] Reading it again, it gives numbers equivalent to those you get. Please post the link to this file if you find it, for future readers.

[This message has been edited by Joel (edited 01-22-2004).]

http://developer.nvidia.com/docs/IO/8230/BatchBatchBatch.pdf

That article was enlightening. It seems to explain most of the performance problems I have (or at least their general characteristics). It was certainly something I didn't know; I had always thought that the cost of a batch was quite small.

Thank you for pointing this out. I wonder if the cost of a batch is smaller with VARs or VBOs?

Originally posted by neomind:
[b]I wonder if the cost of a batch is smaller with VARs or VBOs?[/b]

As many have said, it should be, since it is easier for the driver to fill the pipe (no copy, or a faster one). But as I didn't do any tests I can't really say. The one in the best position to tell us what the gain would be (apart from the nVidia guys) is probably you, if you bench it.

I don't know if VBO will help with the glDrawElements call overhead. I'm using VBOs, but calling it becomes a bottleneck beyond a few hundred models. I can only achieve about 6 Mtri/sec with 1000 objects of <100 faces each, but reach 10-14 Mtri/sec with 3 models each having a million (!!!) faces. (Using a Radeon 8500LE with separate arrays and 24-byte unaligned vertex data.)

That article was enlightening.

I thought so, too. Until Cass told me this applies to Direct3D only.

Oh, well. Guess I can’t do much more than to implement VBOs and see if there is any difference. I’ll benchmark using vertex arrays, display lists and VBOs and post to the forum.