Speed of dynamic triangle soup

Let’s say I have 20,000 triangles that get updated 50 times per second. What is the fastest way to render these triangles 5 times? I want to render them using 5 different materials, etc.

Would it be faster to generate a display list on the first pass and then render from that display list 4 more times, or should I use vertex arrays 5 times?

If you use VAR, I’d say go for vertex arrays. Other than that, just try out both versions and benchmark which one is faster.

If they are dynamic, and even if you will be rendering them 5 times, perhaps constructing display lists won’t be such a good idea.

Constructing a display list each frame is a no-no!


Why is it a no-no? If I render the same contents 5 times using different materials etc., the driver could use on-board graphics memory to store the triangle data, so I would logically benefit from using a display list. I would not have to transfer the same contents 5 times from CPU to graphics memory.

Actually, building the display list in the driver certainly ain’t free…

Display list compilation can be exceedingly slow. It’s definitely not designed to be used per frame, even though you are reusing it 5 times per frame. Display lists are designed to produce the fastest possible rendering performance, while not caring much about compilation performance.

As a point of reference, I have an app that generates display lists for models with on the order of 300,000 triangles. The display list compilation time for this is around .75 seconds. Adjusted down to 20,000 polygons, that would yield a max of about 20fps. Of course, my hardware is likely different from your own, or your target hardware, but you get the idea.
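The scaling behind that estimate can be written out explicitly. This is only a sketch in C, and it assumes compile time grows linearly with triangle count, which real drivers may not do. The function names are made up for illustration.

```c
#include <assert.h>

/* Estimate display list compile time for `tris` triangles by scaling a
   measured reference linearly. Linear scaling is an assumption. */
double estimate_compile_seconds(double ref_tris, double ref_seconds, double tris)
{
    assert(ref_tris > 0.0);
    return ref_seconds * (tris / ref_tris);
}

/* Upper bound on the frame rate if one compile happens every frame
   and nothing else costs any time. */
double max_fps_if_compiled_per_frame(double compile_seconds)
{
    assert(compile_seconds > 0.0);
    return 1.0 / compile_seconds;
}
```

With the figures above, estimate_compile_seconds(300000, 0.75, 20000) comes out at about 0.05 seconds, so the ceiling is about 20 fps, well short of the 50 updates per second asked for.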

If you don’t want to use VAR, use LockArraysEXT(). That’s what it’s for – submitting the same geometry more than once.

However, some implementations of LockArraysEXT() cut off any optimization at a fairly low number of verts (1000? 4000?) so you might want to slice it in a bunch of little arrays, iterating each material for each slice. Benchmark to be sure.
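The slice-then-material ordering suggested above could look roughly like this. It is a sketch in C with no GL calls in it: draw_slice_fn, draw_sliced and count_vertices are hypothetical names, and the callback stands in for whatever lock, state change and draw call a real renderer would issue.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback: draw vertices [first, first + count) with one material. */
typedef void (*draw_slice_fn)(size_t first, size_t count, int material, void *user);

/* Walk the soup slice by slice and draw every material for a slice before
   moving on to the next one. Returns the number of draw calls issued.
   slice_verts must be a multiple of 3 so that no triangle is split. */
size_t draw_sliced(size_t total_verts, size_t slice_verts, int materials,
                   draw_slice_fn draw, void *user)
{
    size_t calls = 0;
    assert(slice_verts > 0 && slice_verts % 3 == 0);
    for (size_t first = 0; first < total_verts; first += slice_verts) {
        size_t count = total_verts - first;
        if (count > slice_verts)
            count = slice_verts;
        /* A lock of [first, first + count) would start here. */
        for (int m = 0; m < materials; ++m) {
            draw(first, count, m, user);
            ++calls;
        }
        /* The matching unlock would go here. */
    }
    return calls;
}

/* Example callback for trying the loop out: adds up the vertices it is asked to draw. */
void count_vertices(size_t first, size_t count, int material, void *user)
{
    (void)first;
    (void)material;
    *(size_t *)user += count;
}
```

For a 60,000-vertex soup, slices of 999 vertices (the nearest multiple of 3 below 1000) and 5 materials, this issues 61 x 5 = 305 draw calls. Whether that beats 5 big calls is exactly what the benchmark has to show.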


great response…

If I have about 10,000 vertices, will this not be accelerated by glLockArraysEXT?

I do not get any speedup when the size is this big. Could you possibly tell me what the maximum limit is?

Will I benefit from dividing the triangle soup into smaller chunks of, let’s say, 1000 vertices, then doing lockArrays and drawing each subchunk several times, instead of drawing the complete soup multiple times? The data will not change between each draw.

So instead of drawing 5x10000, I will draw 5x10x1000.

>>Will I benefit from dividing the triangle soup into smaller chunks of, let’s say, 1000 vertices, then doing lockArrays and drawing each subchunk several times<<

Display lists and vertex arrays need to be kept under a certain limit.
- E.g., from memory, in my testing on my TNT2, a DL of 65 KB ran at about half the speed of one of 63 KB.
- Also check the draw range elements extension (or something like that). You can do a glGet…() to get the maximum recommended number of vertices/indices that you can pass in a vertex array call. Going above this limit will result in speed loss (sometimes quite a bit).
IIRC the TNT2’s limit was 4096 for both.
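Given such a limit, sizing the chunks for a triangle soup is simple arithmetic. A small sketch in C follows, with the limit passed in as a plain number (how it is queried is left out). The function names are made up, and the sketch assumes a chunk exactly at the limit is still fine.

```c
#include <assert.h>
#include <stddef.h>

/* Largest vertex count that stays within `limit` without splitting a triangle. */
size_t chunk_verts_for_limit(size_t limit)
{
    assert(limit >= 3);
    return limit - (limit % 3);
}

/* Number of chunks needed to cover total_verts vertices (last one may be short). */
size_t chunk_count(size_t total_verts, size_t chunk_verts)
{
    assert(chunk_verts > 0);
    return (total_verts + chunk_verts - 1) / chunk_verts;
}
```

With a limit of 4096, chunk_verts_for_limit gives 4095 vertices (1365 whole triangles), and a soup of 20,000 triangles (60,000 vertices) needs 15 chunks, the last one holding 2670 vertices.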

I am a bit confused…

I don’t get any speedup from glLockArraysEXT when I use it with my glDrawArrays call repeated 5 times.

The test scenario

1. Set up vertex buffers. Only use float vertices and matrix weights. Use test data with 4095 vertices.

2. Lock buffers.

3. Use glDrawArrays to render the same buffer data 5 times.

4. Unlock buffers.

Now why don’t I get any speedup using locked buffers? I mean, glDrawArrays only needs to transfer the vertices once…

Matt? Any comments…

DrawArrays! So there are no shared vertices.
Try DrawElements.
Also try the Quake3 format, IIRC 4fV 2fT 2fT 4ubC. Even though NVIDIA said other formats will work (at the start this was the only one that did), in my testing a year or so ago I still found the Quake3 format gives the best results.
Also check the oldish performance PDFs on the NVIDIA website, something along those lines.
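For reference, here is one way to read "4fV 2fT 2fT 4ubC" as an interleaved vertex layout in C. This is an interpretation, not something taken from NVIDIA’s documents: the struct and field names are made up, and the comments give the assumed meaning of each group.

```c
#include <assert.h>
#include <stddef.h>

/* One reading of "4fV 2fT 2fT 4ubC"; all names here are hypothetical. */
typedef struct {
    float         position[4];   /* 4fV:  4 floats, vertex position */
    float         texcoord0[2];  /* 2fT:  2 floats, first texture coordinate set */
    float         texcoord1[2];  /* 2fT:  2 floats, second texture coordinate set */
    unsigned char color[4];      /* 4ubC: 4 unsigned bytes, colour */
} InterleavedVertex;

/* Byte stride between consecutive vertices in an interleaved array. */
size_t interleaved_vertex_stride(void)
{
    return sizeof(InterleavedVertex);
}
```

With 4-byte floats this gives a stride of 36 bytes, with the texture coordinates at byte offsets 16 and 24 and the colour at offset 32.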

The problem is that I have no time to create shared vertices. The buffers cannot be indexed…

Won’t glLockArrays accelerate multiple DrawArrays calls issued one after another within a locked scope?

LockArrays will help on non-T&L cards, and maybe help a little bit on some T&L implementations (such as 3dlabs?) but it doesn’t help you on GeForce hardware, AFAIK.

Also, are you sure that you’re vertex throughput bound? If you are, then perhaps making the vertices smaller (shorts instead of floats) will make it go faster. If you’re not vertex throughput bound, then the size of your window will have a great impact on speed.
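To put rough numbers on the vertex size point, here is a back-of-the-envelope sketch in C. It assumes position-only vertices (3 components each), that every vertex is resent on every pass, and that the soup is drawn once per update. None of that is confirmed in the thread, and the function names are made up.

```c
#include <assert.h>

/* Vertices submitted per second for a non-indexed triangle soup. */
double soup_verts_per_second(double triangles, double passes, double updates_per_second)
{
    return triangles * 3.0 * passes * updates_per_second;
}

/* Bytes per second pushed to the card if every vertex is resent on every pass. */
double soup_bytes_per_second(double verts_per_second, double bytes_per_vertex)
{
    assert(bytes_per_vertex > 0.0);
    return verts_per_second * bytes_per_vertex;
}
```

For 20,000 triangles, 5 passes and 50 updates per second that is 15 million vertices per second: 180 MB/s with three 4-byte floats per vertex, 90 MB/s with three 2-byte shorts. If the frame rate does not move when the vertices shrink, vertex throughput was probably not the bottleneck.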

Anyway, for the best vertex throughput, you have to use GL_NV_vertex_array_range on nVIDIA hardware. Them’s just the ropes.

LockArrays is still better than not using it. (It lets you get the same benefits as DRE.) The only reason to use LockArrays over DRE is if you plan to render multiple passes with exactly the same arrays, i.e., multiple DE’s within a single lock.

  • Matt

As you can see in my explanation, I am using it for multiple passes within the same lock. So why don’t I get any speedup? Is it because I use vertex weights?

You said DrawArrays – we completely ignore LockArrays for DrawArrays.

  • Matt

Why?

If I send the same array data multiple times, surely you must be able to use AGP memory or similar to accelerate the same data?

I believe it is vital in cases where you are unable to use indexed geometry but want to use multipass, or send the same data multiple times, to accelerate DrawArrays!

Then what can I do if I want to resend the same buffers multiple times, but using different model matrices, to get higher speed? I am not able to use indexed geometry…