Polygon Armies - fast ways to render hundreds of models


I was wondering what the best approach to drawing hundreds of instances of the same (animated) model might be?

My initial naive implementation was based around my simple model manager. For each instance of the model it would send the same instruction list to OpenGL: binding the texture, setting up the vertex/UV/normal array pointers and then calling glDrawElements, etc.

Obviously not the most suitable method (I assumed) but it achieved relatively good performance.

My next step was to cut out all the unnecessary state changes, as nothing actually changed (bar the model matrix) between each render of the model (since it’s the same mesh, same texture, etc.). I figured that should show some performance benefit, but was surprised to discover it didn’t. The only other difference was that for this test I used glTranslate to arrange each instance of the model into a grid.

So now I’m left wondering how to get better performance. It must be possible, as I’ve seen plenty of other games do similar things and get better performance on my system. I was wondering about VBOs, but since the buffer would have to be both read and written, I’m not sure whether they would actually help?

So I need a method of displaying many meshes (all the same, same texture, etc.), where the mesh changes each frame (via interpolation, no bones), as quickly and efficiently as possible. The meshes in question have about 400 polygons, a single skin texture and keyframed vertices which I interpolate (it’s actually the old MD2 format, if that helps).
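For readers following along, the per-frame interpolation described above can be sketched roughly as below. This is only an illustration, not the poster's actual code: the function name lerp_keyframes and the packed x, y, z float layout are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: blends two keyframes into an output vertex array.
 * Assumed layout: three floats (x, y, z) per vertex, tightly packed.
 * t = 0 yields frame_a, t = 1 yields frame_b. */
void lerp_keyframes(const float *frame_a, const float *frame_b,
                    float t, float *out, size_t vertex_count)
{
    size_t i;
    size_t n = vertex_count * 3; /* total number of floats */

    assert(t >= 0.0f && t <= 1.0f);
    for (i = 0; i < n; ++i)
        out[i] = frame_a[i] + t * (frame_b[i] - frame_a[i]);
}
```

The output array is what would then be handed to the vertex array pointer (or written into a buffer) before drawing.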

I’d appreciate any ideas on this.


The second case you described should give you sufficiently good performance.

As far as I remember, MD2 requires you to make many calls per model to display its whole mesh: as a small optimization it uses triangle fans or triangle strips, depending on which is more suitable in each case.

Assuming you converted the MD2 data to a format that requires just one call per model, this should be OK.

You could also use display lists or VBOs, but VBOs should not really speed you up, since with traditional array pointers you send your arrays just once and then reuse them for all models.

Anyway - what game has such “hundreds” of animated models at the same time?

[This message has been edited by MickeyMouse (edited 12-17-2003).]

If reducing state changes doesn’t affect the performance, I would think you’re CPU, transformation or fillrate bound…

How do you send the vertices? By glBegin()/glEnd() calls?
Does it become faster if you turn off the animation?
Do you really see all of the hundreds of models on the screen at the same time?
What if you make the window smaller, does that increase the framerate?


It should be faster, but doesn’t appear to be. The MD2 model I’m using is a single mesh, but not stripped; I’m just sending it as triangles, mainly because I’ll be moving away from the format later on.

So VBOs are unlikely to help then ;(

Have a look at Rome: Total War. I’m pretty sure they use impostors too, but from the screenshots, movies and the TV show in the UK that uses their engine, it looks like they do have hundreds of models on screen at once too. Of course I’m not aiming for anything as complex as that at the moment.


I did wonder about fillrate, although I’d be surprised. I’ve only tested on my GF2, but I’m in the middle of moving today, so I’ll test my Radeon 9800 later and see if there is a change between the two methods.

No idea how I send the vertices, actually. I’m intercepting GL calls to the DLL (long story).

For testing purposes I have a 10x10 grid of the models, so all 100 are present on screen (taking up about 1/4 of the screen area, but with lots of overdraw, naturally).

I’ll test disabling the animation, but I think that was quite fast.

Thanks for the replies.

VBOs should help here as well. You shouldn’t read from them anyhow: you can read from system memory, interpolate, and just store the result in a VBO (probably with the STREAM_DRAW usage flag). This gives the driver the possibility to put the buffer in AGP memory and let the card pull the data when needed instead of waiting for the CPU to push it. Depending on the driver implementation, you can also get a different AGP memory address to write to each time (this is if you specify write-only), so a VBO that has not been rendered yet can still be in AGP memory while you fill it again (since the driver actually gives you a different one), which increases parallelism in the program.

Noisecirme :
Use vertex programs and VBOs to render all your data.
This is the only way you’ll achieve maximum performance for morphing meshes.


Profile your app to figure out where you’re limited. Then fix that.

You may be CPU limited. Fix the hotspot. If you’re CPU limited in the driver, don’t call the driver as often, or make sure you get on one of the fast paths.

You may be vertex transform limited. If so, there’s not much you can do except use LOD.

You may be state setup limited. If so, then sorting states helps. If you have spare CPU and memory bandwidth, using that to cluster nearby models into one big vertex buffer, positioning/orienting them in software, and then submitting them as a chunk would help.
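As a rough sketch of the clustering idea above: each instance's vertices are positioned in software and appended to one shared batch, with indices rebased so the whole batch can go out in a single draw call. The Vec3 and Batch types and the batch_append function are hypothetical names for illustration, and only translation is shown; orienting each model would additionally need a rotation applied per vertex.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

/* Hypothetical batch: caller-owned storage plus fill counters. */
typedef struct {
    Vec3 *verts;
    unsigned short *indices;
    size_t vert_count, index_count;
    size_t vert_capacity, index_capacity;
} Batch;

/* Appends one translated copy of a mesh to the batch.
 * Returns 1 on success, 0 if the batch has no room left. */
int batch_append(Batch *b,
                 const Vec3 *mesh_verts, size_t mesh_vert_count,
                 const unsigned short *mesh_indices, size_t mesh_index_count,
                 Vec3 position)
{
    size_t i;
    unsigned short base;

    if (b->vert_count + mesh_vert_count > b->vert_capacity ||
        b->index_count + mesh_index_count > b->index_capacity)
        return 0;

    /* 16-bit indices can only address 65536 vertices per batch. */
    assert(b->vert_count + mesh_vert_count <= 65536);
    base = (unsigned short)b->vert_count;

    /* Position the instance in software (translation only). */
    for (i = 0; i < mesh_vert_count; ++i) {
        Vec3 v = mesh_verts[i];
        v.x += position.x;
        v.y += position.y;
        v.z += position.z;
        b->verts[b->vert_count + i] = v;
    }

    /* Rebase indices so they point at this instance's vertices. */
    for (i = 0; i < mesh_index_count; ++i)
        b->indices[b->index_count + i] =
            (unsigned short)(base + mesh_indices[i]);

    b->vert_count += mesh_vert_count;
    b->index_count += mesh_index_count;
    return 1;
}
```

Once every nearby instance has been appended, the combined vertex and index arrays can be submitted together instead of once per model.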

You may be fillrate limited. Try drawing into a much smaller window, and see if the frame rate improves.

You may be vertex submission limited. Submit as many vertices as possible in a single call (DrawRangeElements(), ideally). Make sure vertices live in efficient memory (NV_VAR or ARB_VBO memory).

You may be vertex data type conversion limited. Submit vertices using hardware-native data types, which means GL_SHORT or GL_FLOAT on most hardware, and GL_UNSIGNED_BYTE for color data.

When the submission of vertices is a bottleneck, then do not submit so many vertices!
OK, this may sound stupid, but it isn’t.
The first, very simple optimization is to modify only the nearby meshes every frame, and to modify the others every n-th frame.
This is reasonable when you already have a good framerate. You can also prioritize the animations, so an idle character which is only standing around does not need as much animation detail as a moving or fighting character (in my case it was a moving or talking/trading character).
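One possible way to schedule such reduced-rate updates is sketched below. The distance bands (20 and 60 units), the intervals and the function names are made-up values for illustration only; the instance id is added to the frame number so that distant models do not all refresh on the same frame.

```c
#include <assert.h>

/* Hypothetical distance bands: nearby meshes are re-interpolated every
 * frame, mid-range ones every 2nd frame, far ones every 4th frame. */
unsigned update_interval(float distance)
{
    assert(distance >= 0.0f);
    if (distance < 20.0f)
        return 1;
    if (distance < 60.0f)
        return 2;
    return 4;
}

/* Returns nonzero if this instance should be re-interpolated this frame.
 * Offsetting by instance_id staggers the updates across frames. */
int needs_update(unsigned frame_number, unsigned instance_id, float distance)
{
    unsigned interval = update_interval(distance);
    return (frame_number + instance_id) % interval == 0;
}
```

The same function could take an animation priority instead of, or in addition to, the distance, so that idle characters fall into the slower bands.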
But when you can use vertex programs, you can do the following trick: instead of submitting one vertex set for frame n, you submit the vertex sets for frame n and for, e.g., frame n+5. During display you interpolate between these frames with a vertex program, and because they are already in video memory, you don’t need to submit anything for 4 frames!
If your army does not move synchronously (= all characters, the same animation, with the same timing), this will improve your speed very much.
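The CPU-side bookkeeping for this trick might look something like the sketch below: given the current frame and the spacing between uploaded vertex sets (5 in the post above), it works out which two sets to blend, the blend weight to hand to the vertex program, and whether a new upload is due. BlendState and blend_state are hypothetical names; the per-vertex blend on the GPU is the usual a + weight * (b - a).

```c
#include <assert.h>

/* Hypothetical per-instance blend bookkeeping. */
typedef struct {
    unsigned key_from;  /* frame index of the earlier uploaded vertex set */
    unsigned key_to;    /* frame index of the later uploaded vertex set */
    float weight;       /* 0 = key_from, approaching 1 = key_to */
    int upload_needed;  /* nonzero only on frames that start a new pair */
} BlendState;

/* stride is the spacing between uploaded vertex sets, in frames. */
BlendState blend_state(unsigned frame, unsigned stride)
{
    BlendState s;
    unsigned step;

    assert(stride > 0);
    step = frame % stride;
    s.key_from = frame - step;
    s.key_to = s.key_from + stride;
    s.weight = (float)step / (float)stride;
    s.upload_needed = (step == 0);
    return s;
}
```

With a stride of 5, an upload is flagged on frames 0, 5, 10 and so on, and the four frames in between only change the weight. Giving each character a different frame offset would spread those uploads out over time.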