Vertex Array optimisation.

gaby · December 18, 2000, 10:44am

I have noticed that the most efficient way to get the best performance on all hardware is to use display lists. Is that true?
If so: I have built an efficient LOD / Multi Resolution Mesh (MRM) system. It is efficient because it reduces the triangle count with little computation. But I have seen that small programs using display lists display a lot more triangles than my engine. That makes sense, because they avoid the bus bottleneck and use T&L efficiently.
My hope is to reach the same triangle count with my MRM algorithm. My MRM needs to send separate triangles to OpenGL.
My question is: how can I send separate triangles with a faster method than setting up vertex arrays (glVertexPointer(), glTexCoordPointer()…) and building an index array to pass to glDrawElements()?
I understand that when a display list is built, OpenGL uses on-board memory to store the vertex data. How can I build that kind of hardware vertex array and still draw per triangle?

Gabriel RABHI / Z-OXYDE / France

system · December 18, 2000, 7:30pm

glVertexPointer() and friends, followed by
glDrawElements(), is the fastest way to send
data to the card. However, if you use the
NV_vertex_array_range extension, you can
allocate AGP memory, or memory on the card,
which you can then reference using glXxxPointer().
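
For the "separate triangles" case, here is a minimal sketch of how the index array could be rebuilt each frame and handed to glDrawElements(). The MrmTriangle record, its active flag and build_index_array() are hypothetical names for illustration, not anything from the original engine; the GL calls are shown only in a comment because they need a current context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical MRM triangle record: three indices into the shared
   vertex arrays, plus a flag the LOD pass would set each frame. */
typedef struct {
    unsigned short v[3];
    int active;
} MrmTriangle;

/* Packs the indices of all active triangles into `out` and returns the
   number of indices written. `out` must hold 3 * tri_count entries. */
size_t build_index_array(const MrmTriangle *tris, size_t tri_count,
                         unsigned short *out)
{
    size_t n = 0;
    assert(tris != NULL && out != NULL);
    for (size_t i = 0; i < tri_count; ++i) {
        if (!tris[i].active)
            continue;               /* triangle collapsed at this LOD */
        out[n++] = tris[i].v[0];
        out[n++] = tris[i].v[1];
        out[n++] = tris[i].v[2];
    }
    return n;
}

/* With a GL context current, the draw would then look like:
     glEnableClientState(GL_VERTEX_ARRAY);
     glVertexPointer(3, GL_FLOAT, 0, positions);
     glDrawElements(GL_TRIANGLES, (GLsizei)n, GL_UNSIGNED_SHORT, out);
*/
```

Only the small index array changes from frame to frame; the vertex data itself can stay wherever it was allocated.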

A word of warning, though: because this memory
is not cacheable, it’s very important that you
fill it sequentially so the write combiners
can take on the task that the cache no longer
does.
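
A rough sketch of what "fill it sequentially" could look like in practice. It assumes dst points at memory obtained through the extension (wglAllocateMemoryNV() on Windows); here any float buffer will do, and Vec3 and fill_sequential() are made-up names. The idea is that the destination is written in strictly ascending order and never read back, while the scattered reads stay in ordinary cached memory.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

/* Copies the vertices selected by `order` into `dst`, writing `dst` in
   strictly ascending address order. `dst` is treated as write-only:
   it is never read and no location is written twice. */
void fill_sequential(float *dst, const Vec3 *src,
                     const unsigned short *order, size_t count)
{
    assert(dst != NULL && src != NULL && order != NULL);
    for (size_t i = 0; i < count; ++i) {
        const Vec3 v = src[order[i]];   /* random read from cached memory */
        *dst++ = v.x;                   /* writes advance one float at a time */
        *dst++ = v.y;
        *dst++ = v.z;
    }
}
```

Jumping around in dst, or reading values back from it, is the access pattern this is meant to avoid.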

gaby · December 19, 2000, 12:44am

1- I don’t understand what the caching problem is. What do I have to do?
2- Is this the most efficient way to store vertex data directly in the card’s memory? Will it be optimized like a display list?
My question is simple: in a display list, I think the driver builds one compact block of information per vertex that contains all of its components, which is more cache efficient. When we use separate on-board arrays, the coordinates, texture coordinates, normals and colors of each vertex are stored in different places, aren’t they?
So this is not the fastest way to optimize data access, is it? And what about compiled arrays (what are they)?

Thanks in advance for your explanations !

Gaby

system · December 19, 2000, 6:02pm

NV_vertex_array_range will not build a
display list. However, chances are that
display lists use NV_vertex_array_range
(or the underlying primitive) in their
implementation.

If you want to store data coherently as one
stream, then you can use the stride argument
for glXxxPointer(). However, the GPU cache in
HT&L cards like the GeForce2 is sufficiently
N-way to allow you to store the different
streams in different locations with no
performance impact (at least that’s what I’ve
been led to believe).
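
To illustrate the stride approach, a small sketch of an interleaved layout, assuming a position/normal/texcoord vertex made only of floats. The Vertex struct and attrib_pointer() helper are hypothetical; the glXxxPointer() calls appear in a comment since they need a current context.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interleaved vertex: all components of one vertex sit
   next to each other in memory. */
typedef struct {
    float position[3];
    float normal[3];
    float texcoord[2];
} Vertex;

/* Returns the address of the attribute located `offset` bytes into the
   first vertex of the array. */
const void *attrib_pointer(const Vertex *base, size_t offset)
{
    assert(base != NULL && offset < sizeof(Vertex));
    return (const unsigned char *)base + offset;
}

/* Every array uses the same stride, sizeof(Vertex), and differs only
   in its starting offset:
     glVertexPointer(3, GL_FLOAT, sizeof(Vertex),
                     attrib_pointer(v, offsetof(Vertex, position)));
     glNormalPointer(GL_FLOAT, sizeof(Vertex),
                     attrib_pointer(v, offsetof(Vertex, normal)));
     glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex),
                     attrib_pointer(v, offsetof(Vertex, texcoord)));
*/
```

A stride of 0 means "tightly packed", which is what you would pass when each component lives in its own separate array instead.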

As far as writing data sequentially to the
memory allocated through NV_vertex_array_range,
do a web search on “pentium write combiner”
and/or “non-cacheable memory” to find some
reference data; it’s a pretty big subject.