Performance VBOs on Nvidia with GL_UNSIGNED_SHORT

Mars_999 · December 2, 2007, 12:44pm

I am trying to move from unsigned int to unsigned short for the IBO values. I have been told that short will give me 2x the performance of int?! I can see the 50% memory saving, but so far I haven’t seen any performance increase; then again, from what I can tell I am shader or texture bound. From what I read on Nvidia’s page, unsigned int is OK if you use glDrawRangeElements(), because the driver will then convert 32-bit indices to 16-bit for you when the range is small enough. Can anyone verify this? If that is the case, can I assume I am getting the best performance possible without recoding my renderer to use a 16-bit IBO, as long as my index range fits in 16 bits?
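Whether the driver really narrows indices behind glDrawRangeElements() is the open question here, so for comparison, a minimal sketch of doing the conversion once in the application, so the IBO is 16-bit from the start. The helper name `narrow_indices_u16` is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: copy 32-bit indices into a 16-bit array, but only if
 * every index fits in an unsigned short (<= 65535). Returns 1 on success and
 * 0 if some index is too large, in which case the mesh has to stay 32-bit
 * or be split into smaller pieces. dst is left untouched on failure. */
int narrow_indices_u16(const uint32_t *src, uint16_t *dst, size_t count)
{
    assert(count == 0 || (src != NULL && dst != NULL));

    /* First pass: check that the whole batch fits. */
    for (size_t i = 0; i < count; ++i) {
        if (src[i] > UINT16_MAX)
            return 0;
    }
    /* Second pass: the narrowing casts are now lossless. */
    for (size_t i = 0; i < count; ++i)
        dst[i] = (uint16_t)src[i];
    return 1;
}
```

On success, the 16-bit copy could be uploaded with glBufferData(GL_ELEMENT_ARRAY_BUFFER, count * sizeof(GLushort), dst, GL_STATIC_DRAW) and drawn with glDrawElements(GL_TRIANGLES, count, GL_UNSIGNED_SHORT, 0).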

Thanks

Zengar · December 2, 2007, 1:16pm

AFAIK, unsigned short gives the best performance.

Humus · December 2, 2007, 2:25pm

Yes, but expecting “2x the performance over int” seems like expecting a lot. Using shorts saves memory and some bandwidth; however, you are much more likely to be bottlenecked by vertex bandwidth than by index bandwidth, so the saving is relatively small. It’s usually much more worthwhile to spend time packing the data in the vertices, like passing position as shorts and normals as bytes instead of using floats for everything.
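A minimal sketch of the kind of vertex packing described above. The layout, the `extent` parameter and all names are hypothetical: positions are assumed to lie inside an origin-centred bounding box of half-size `extent`, and normals are assumed to be unit length. The float vertex takes 24 bytes; the packed one takes 12, versus only 2 bytes saved per index by going from int to short:

```c
#include <assert.h>
#include <stdint.h>

/* Everything as 32-bit floats: 24 bytes per vertex. */
typedef struct {
    float position[3];
    float normal[3];
} VertexF;

/* Hypothetical packed layout: 16-bit position inside a known bounding box,
 * 8-bit normal, each padded to 4 components for alignment: 12 bytes. */
typedef struct {
    int16_t position[4];
    int8_t  normal[4];
} VertexPacked;

/* Clamp to [-1, 1], scale, and add +-0.5 so the caller's truncating cast
 * rounds half away from zero. */
static float scale_snorm(float t, float max_value)
{
    if (t > 1.0f)  t = 1.0f;
    if (t < -1.0f) t = -1.0f;
    float s = t * max_value;
    return s >= 0.0f ? s + 0.5f : s - 0.5f;
}

/* extent: half-size of an origin-centred box enclosing the mesh (an
 * assumption; a real exporter would probably also store an offset). */
VertexPacked pack_vertex(const VertexF *v, float extent)
{
    assert(extent > 0.0f);
    VertexPacked p;
    for (int i = 0; i < 3; ++i) {
        p.position[i] = (int16_t)scale_snorm(v->position[i] / extent, 32767.0f);
        p.normal[i]   = (int8_t)scale_snorm(v->normal[i], 127.0f);
    }
    p.position[3] = 0;   /* padding, not read when the attribute size is 3 */
    p.normal[3]   = 0;
    return p;
}
```

These could then be bound as normalized attributes, e.g. glVertexAttribPointer(loc, 3, GL_SHORT, GL_TRUE, sizeof(VertexPacked), ...) for the position and GL_BYTE for the normal, with the vertex shader multiplying the position by `extent` again. The reconstruction is approximate, which is usually fine for normals but worth checking for large scenes.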

Jan · December 2, 2007, 2:28pm

Yes, I’ve done that. Using unsigned short instead of int does indeed give huge performance boosts and is definitely worth the effort (i.e. splitting meshes into pieces with at most 2^16 vertices).
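Splitting a mesh so that each piece fits 16-bit indices can be done greedily. A minimal sketch, assuming a plain triangle list; the function name, the output layout and the `max_verts` parameter are hypothetical, and vertices shared across a piece boundary simply get duplicated:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical greedy splitter. Input: a triangle list of 32-bit indices into
 * vertices [0, vertex_count); index_count / 3 triangles, trailing indices are
 * ignored. Each output chunk references at most max_verts (3..65536) distinct
 * vertices, so its local indices fit in uint16_t.
 * Outputs (all chunks concatenated, caller-allocated):
 *   out_idx     index_count local indices, same order as the input
 *   out_map     local -> global vertex index, at most index_count entries
 *   chunk_tris  triangles per chunk, at most index_count / 3 entries
 *   chunk_verts vertices per chunk, same size as chunk_tris
 * Returns the number of chunks (0 for empty input or allocation failure). */
size_t split_mesh_u16(const uint32_t *idx, size_t index_count,
                      size_t vertex_count, uint32_t max_verts,
                      uint16_t *out_idx, uint32_t *out_map,
                      size_t *chunk_tris, size_t *chunk_verts)
{
    assert(max_verts >= 3 && max_verts <= 65536u);
    int32_t *local = malloc(vertex_count * sizeof *local);
    if (local == NULL)
        return 0;
    for (size_t v = 0; v < vertex_count; ++v)
        local[v] = -1;                      /* -1: not in current chunk */

    size_t chunks = 0, map_base = 0, nverts = 0, ntris = 0;
    for (size_t t = 0; t + 2 < index_count; t += 3) {
        /* Vertices this triangle would add (a repeated vertex in a
         * degenerate triangle is over-counted, which is only conservative). */
        size_t fresh = 0;
        for (int k = 0; k < 3; ++k)
            if (local[idx[t + k]] < 0)
                ++fresh;

        if (nverts + fresh > max_verts) {
            /* Close the current chunk and clear its remap entries. */
            for (size_t i = 0; i < nverts; ++i)
                local[out_map[map_base + i]] = -1;
            chunk_tris[chunks] = ntris;
            chunk_verts[chunks] = nverts;
            ++chunks;
            map_base += nverts;
            nverts = 0;
            ntris = 0;
        }
        for (int k = 0; k < 3; ++k) {
            uint32_t g = idx[t + k];
            if (local[g] < 0) {
                local[g] = (int32_t)nverts;
                out_map[map_base + nverts++] = g;
            }
            out_idx[t + k] = (uint16_t)local[g];
        }
        ++ntris;
    }
    if (ntris > 0) {                        /* flush the last chunk */
        chunk_tris[chunks] = ntris;
        chunk_verts[chunks] = nverts;
        ++chunks;
    }
    free(local);
    return chunks;
}
```

Each chunk c could then get its own VBO gathered from the vertices listed in `out_map`, and be drawn with glDrawElements(GL_TRIANGLES, 3 * chunk_tris[c], GL_UNSIGNED_SHORT, offset). The greedy pass keeps the input triangle order, so a mesh already sorted for the vertex cache stays sorted within each piece, at the cost of some duplicated boundary vertices.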

Of course, you will only see the speed-up if index data is actually the bottleneck. With 2.2 million triangles (5.5 million vertices), my app has almost only this bottleneck. When you are shader bound, the gain might be much smaller. But hey, one bottleneck fewer is a good thing.

I use the non-range glDrawElements, since computing the range is non-trivial in my usage scenario, and with at most 20000 triangles to render per batch it isn’t that important anyway (ATI seems to ignore the range anyway). Whether glDrawRangeElements converts the indices, I don’t know, but it will definitely be slower than having the data optimized from the start. I would not rely on it; ATI won’t do it either.
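For reference, the start/end values glDrawRangeElements() expects are just the minimum and maximum index over the batch. A minimal sketch (all names hypothetical) that also rebases a batch to 16-bit when its range spans at most 65536 vertices; the rebased indices only work if the vertex attribute pointers are offset by start * stride, which is an assumption about how the VBO is set up:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inclusive [start, end] range of the indices in one batch. */
typedef struct {
    uint32_t start, end;
} IndexRange;

IndexRange index_range(const uint32_t *idx, size_t count)
{
    assert(count > 0);
    IndexRange r = { idx[0], idx[0] };
    for (size_t i = 1; i < count; ++i) {
        if (idx[i] < r.start) r.start = idx[i];
        if (idx[i] > r.end)   r.end   = idx[i];
    }
    return r;
}

/* If the batch spans at most 65536 vertices, write its indices relative to
 * r.start so they fit in 16 bits. Returns 1 on success, 0 otherwise. */
int rebase_to_u16(const uint32_t *idx, size_t count, IndexRange r,
                  uint16_t *out)
{
    if (r.end - r.start > UINT16_MAX)
        return 0;
    for (size_t i = 0; i < count; ++i)
        out[i] = (uint16_t)(idx[i] - r.start);
    return 1;
}
```

The 32-bit path would be glDrawRangeElements(GL_TRIANGLES, r.start, r.end, count, GL_UNSIGNED_INT, 0); after a successful rebase, the 16-bit copy would be drawn with GL_UNSIGNED_SHORT and a range of 0 to r.end - r.start.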

When I did that, I found a driver bug on the GeForce 7 (only) that crashed the whole OS when using indexed data with the indices in a VBO and more than 2^12 indices. I don’t know whether that has been fixed yet, but it was a good reason for me to convert the data to unsigned short.

Hope that helps you,
Jan.

Mars_999 · December 2, 2007, 7:25pm

So the consensus is: use 16-bit indices if possible, but it may not make a huge difference unless the bottleneck is vertex processing.

Mars_999 · December 3, 2007, 1:13am

Well, I have it working with unsigned short now and see no delta in FPS. I guess now I at least know I’m not being wasteful!