Optimal Vertex Array Size

Hi,
At the moment I’m involved in producing an engine as part of a contracted project. The program needs to draw about 20-30 meshes whose polygon count totals around 100,000. I was wondering if anyone has any speed-up suggestions.

I'm seeing 7-10 fps on an Athlon 700 / GeForce 256 machine with 256MB RAM, running Win2K. I use compiled vertex arrays composed like this:

Vertex - 3 floats
Normal - 3 floats

We will shortly be adding 2-float texture coords, and probably 4-byte colour values per vertex as well.
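
For reference, the layout and pointer setup look roughly like this (the struct and the planned texcoord/colour fields are illustrative, not final):

    #include <GL/gl.h>

    typedef struct {
        GLfloat pos[3];      /* Vertex - 3 floats                */
        GLfloat normal[3];   /* Normal - 3 floats                */
        GLfloat uv[2];       /* planned: 2 float texture coords  */
        GLubyte colour[4];   /* planned: 4 byte colour           */
    } Vertex;

    void set_pointers(const Vertex *v)
    {
        GLsizei stride = sizeof(Vertex);

        glEnableClientState(GL_VERTEX_ARRAY);
        glEnableClientState(GL_NORMAL_ARRAY);

        glVertexPointer(3, GL_FLOAT, stride, v->pos);
        glNormalPointer(GL_FLOAT, stride, v->normal);

        /* Once the extra attributes go in:                          */
        /* glTexCoordPointer(2, GL_FLOAT, stride, v->uv);            */
        /* glColorPointer(4, GL_UNSIGNED_BYTE, stride, v->colour);   */
    }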

Lighting is enabled, as are the alpha and depth tests (although the alpha test passes about 90% of the time).

I have a feeling it is the transfer speed that is slowing things down. I tried doing backface culling on the CPU, but didn't see much of an improvement at all.

At present I am considering two things - triangle strips, and splitting the meshes up into smaller pieces.

Before I get going on the second, I'd be interested to hear (esp. from anyone who actually knows the answer) what people think the right kind of size for vertex arrays is. 

Any other speed-up suggestions will be gratefully received.

Thanks in advance,

Henry

Well, I’d first suggest trying to determine where the bottleneck is.

First, you should make sure it’s not fill rate before you go off on geometry optimizations. If you shrink the window a lot, what happens?

Second, make sure it’s not bound by per-vertex work (transform and lighting). This is a risk if you have a lot of lights or other expensive features enabled. The usual suspects if you want the best lighting performance: local viewer should be off, lights should be directional if possible, and the number of enabled lights should be minimized.
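
As a rough illustration of those settings (just the GL calls involved, nothing exotic):

    static void setup_lighting(void)
    {
        GLfloat lightDir[4] = { 0.0f, 0.0f, 1.0f, 0.0f };  /* w = 0 => directional */

        glLightModeli(GL_LIGHT_MODEL_LOCAL_VIEWER, GL_FALSE); /* local viewer off */
        glLightfv(GL_LIGHT0, GL_POSITION, lightDir);           /* directional light */

        glEnable(GL_LIGHTING);
        glEnable(GL_LIGHT0);   /* and keep the number of enabled lights small */
    }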

Shrinking your vertex arrays will only help in the case that you are getting poor cache utilization with too large a working set. I don’t know if this is a risk for you.

You might try using a display list if the data is static.
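
Something along these lines, if the geometry never changed (the index count and array here are placeholders):

    GLuint build_list(GLsizei indexCount, const GLushort *indices)
    {
        GLuint list = glGenLists(1);

        /* Compile once: the vertex array data is captured into the list. */
        glNewList(list, GL_COMPILE);
        glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, indices);
        glEndList();

        return list;   /* each frame: glCallList(list); */
    }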

You might try using VAR.
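
Very roughly, NV_vertex_array_range usage looks like the following (the entry points have to be fetched with wglGetProcAddress, error handling is omitted, and the priority value for AGP memory is an assumption on the reader's part, not a guarantee):

    /* Allocate fast (AGP/video) memory and hand the range to the driver. */
    void *setup_var(GLsizei bufferSize)
    {
        void *fastMem = wglAllocateMemoryNV(bufferSize,
                                            0.0f,   /* read frequency   */
                                            0.0f,   /* write frequency  */
                                            0.5f);  /* ~0.5 usually means AGP memory */

        glVertexArrayRangeNV(bufferSize, fastMem);
        glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);

        /* Copy vertex data into fastMem, point the vertex arrays at it,
           then draw as usual with glDrawElements.                        */
        return fastMem;
    }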

Triangle strips are DEFINITELY good, especially if the strips are long enough.

Consider: you have roughly 100K triangles and are getting 7-10 fps, so you’re doing 700K-1M triangles per second, and at 3 vertices per triangle that’s 2.1M-3M vertices per second. If you could use strips to bring that down to about 1.2 vertices per triangle, and sustained the same vertex rate, you’d get 1.75M-2.5M triangles per second.
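
At draw time the difference is just which primitive you hand to glDrawElements (the counts and index arrays here are purely illustrative):

    /* Independent triangles: 3 indices per triangle.                      */
    glDrawElements(GL_TRIANGLES, 3 * triCount, GL_UNSIGNED_SHORT, triIndices);

    /* One strip: N + 2 indices for N triangles, so the per-triangle cost
       approaches 1 vertex as strips get longer.                           */
    glDrawElements(GL_TRIANGLE_STRIP, stripIndexCount, GL_UNSIGNED_SHORT, stripIndices);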

  • Matt

Hi,

Thanks for the reply. Although I forgot to mention it, the app isn’t fill rate bound - like you say, window size doesn’t make any difference (until we get to the very high resolutions).

There’s only one light enabled, and local viewer is off. Disabling the light gives us another 2-3 fps.

Unfortunately, the data changes every frame (skeletal animation), so display lists are out. I looked at VAR and want to make use of it, but I want to get non-VAR performance up first. Although it’s a pretty useless comparison, the VAR demo gets about 3x my performance even with the extension disabled - probably down to its use of triangle strips, I’d guess.
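
For what it’s worth, the per-frame flow is roughly the following (function and variable names are just illustrative, using the interleaved layout from my first post):

    /* CPU skinning writes fresh positions/normals every frame, then the
       whole mesh is re-submitted from system memory.                     */
    skin_mesh(&mesh, &currentPose, vertices);

    glVertexPointer(3, GL_FLOAT, sizeof(Vertex), vertices[0].pos);
    glNormalPointer(GL_FLOAT, sizeof(Vertex), vertices[0].normal);
    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, indices);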

(Incidentally, Win2K runs 2-3 fps slower than NT 4.0 - someone told me this was down to an AGP bug in 2K. Can anyone shed any light on that?)

Thanks a lot,

Henry

I wonder if you’re getting AGP 1x in Win2K and AGP 2x in NT4… yes, Win2K does have some AGP bugs, although I don’t know what all of them are, and we’ve worked around most, if not all, of them by this point.

Yes, it sounds like triangle strips are the first thing you should try.

Another possibility is that the VAR demo is getting vertex cache reuse and your app isn’t. I can’t really go into much more detail on a public forum, but it both saves you bandwidth and improves the effective T&L rate (same number of vertices per second, but more triangles per vertex - up to a limit of 2 triangles per vertex, whereas strips can only get you 1 triangle per vertex). If you want any more detail on that, you’ll have to email me… the short version is that switching to VAR with independent triangles might very well end up faster than regular rendering with strips.
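
If it helps, here’s a rough way to get a feel for how much reuse an index list would get out of a small post-T&L cache. The FIFO behaviour and the 16-entry size here are assumptions for illustration, not vendor data:

    #define CACHE_SIZE 16   /* assumed cache size, purely illustrative */

    /* Returns vertices transformed per triangle for an independent-triangle
       index list: 3.0 means no reuse at all, lower is better.              */
    float verts_per_triangle(const unsigned short *idx, int triCount)
    {
        int cache[CACHE_SIZE];
        int head = 0, misses = 0, i, j, n = 3 * triCount;

        for (i = 0; i < CACHE_SIZE; i++)
            cache[i] = -1;

        for (i = 0; i < n; i++) {
            int hit = 0;
            for (j = 0; j < CACHE_SIZE; j++) {
                if (cache[j] == idx[i]) { hit = 1; break; }
            }
            if (!hit) {
                misses++;
                cache[head] = idx[i];
                head = (head + 1) % CACHE_SIZE;
            }
        }
        return (float)misses / (float)triCount;
    }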

  • Matt

NVIDIA mentions an optimal array size of 64 KB (higher values lead to cache problems). With 6 floats per vertex, that means you shouldn’t put more than ~2700 vertices in an array. I’m not completely sure about this, though, since I heard a guy from NVIDIA say that 10,000 vertices was still OK. Still wondering who’s right. But I would definitely not count on optimal performance with 300,000 vertices.
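
Taking the 64 KB figure at face value, the arithmetic and a simple way to split the draws would be something like this (illustrative only, assuming unindexed triangle data):

    enum {
        BATCH_BYTES     = 64 * 1024,                   /* suggested cap         */
        VERTEX_BYTES    = 6 * 4,                       /* 3 floats + 3 floats   */
        VERTS_PER_BATCH = BATCH_BYTES / VERTEX_BYTES   /* = 2730                */
    };

    void draw_in_batches(int vertexCount)
    {
        int first, count;
        for (first = 0; first < vertexCount; first += VERTS_PER_BATCH) {
            count = vertexCount - first;
            if (count > VERTS_PER_BATCH)
                count = VERTS_PER_BATCH;
            glDrawArrays(GL_TRIANGLES, first, count);  /* 2730 is a multiple of 3 */
        }
    }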

Also, may I suggest using compiled vertex arrays (CVAs), and/or allocating the vertex array directly in video memory (or AGP memory)? There is an extension for that. There is also a document among NVIDIA’s papers that explains how to optimize an application for the GeForce.
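
For the CVA part, it’s just a lock/unlock pair from EXT_compiled_vertex_array around the draw calls, roughly like this (names are illustrative):

    /* The lock promises the driver the data won’t change until unlock,
       which mostly pays off when the same vertices are drawn more than
       once (e.g. multi-pass rendering).                                 */
    glLockArraysEXT(0, vertexCount);
    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, indices);
    /* ...any further passes over the same vertices go here...           */
    glUnlockArraysEXT();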

Y.

Hi,

Thanks for the replies - Matt, I’ll e-mail you shortly about the cache coherence stuff, if that’s ok.

I’m using CVAs at the moment, although I don’t see performance gains with them - which makes me think my vertex buffers are just too big for the card to handle quickly. I’ve just built some profiling code into the renderer to see exactly how large the buffers are, and I’ll chop them down if I can.

Cheers,

Henry