VAR performance - need some help

I have a problem. I've used the NV_fence and NV_vertex_array_range extensions in the same way as the NV OGL SDK does (basic VAR demo), but performance is only a little better than with standard OGL vertex arrays. I've used wglAllocateMemoryNV for an object with 20000 triangles and I only get 320 fps on a GF3 and 220 fps on a GF2Ti. Is that OK?

It seems that I've checked everything, but I think that performance could be greater.

Can anyone say something on that?

6.4 megatris/s on a GeForce3. What's wrong with this? How much do you want? (You can browse this forum for such topics; there are posts on how to get the most out of VAR…)

Don’t be upset about getting “only” 6.4 MTris/sec if you’re already running at 320 fps.

Instead, try throwing more and more polygons at the card, until the framerate drops at least below 100 fps, and then measure your triangle rate again. If it’s still low, THEN you can start looking for ways to improve it.

– Tom

Yeah, how many fps do you get when you draw nothing?

Just a thought, but make sure you’re not using VAR memory to store the indices of your triangles; this kills performance. Keep the indices in cached system memory, i.e. m_pIndices = new unsigned short[2000] or whatever. To reiterate: don’t use memory provided by wglAllocateMemoryNV for indices; use it for every array except the indices.
Sorry if you know this already.

Hi all!

To see the difference you have to submit more triangles. Try it again with 200,000 triangles.


aliasman, you should try a scene whose fillrate requirement is close to zero, for example tons of tiny cylinders with an extreme number of vertices per cylinder, because the GF3 is far more fillrate limited than anything else. Also get your framerate down to 50 or less, because even with triple buffering you’ll eventually hit an FPS limit, and that gives you wrong results.
Another thing I noticed is that the fewer vertices can be culled, the less improvement you see from using AGP memory. Try video memory (1.0); this should in general be far faster than AGP and system memory. On my slow 600 MHz Athlon I got 23 million polygons on my GF3 with VAR, and 20 million on my Radeon 8500 with VAO.


You really don’t want to be using video memory instead of AGP in “real life” because in video memory, your polygon data will fight with your texture data for bandwidth, whereas putting geometry in AGP and texture data in video memory makes them share nicely.

Also, triangle rate IS dependent on the size of your vertex data. If you’re sending four floats for position, three floats for normal, a longword for vertex color, and four floats for two sets of texture coordinates, your budget is 48 bytes per vertex. If you send three SHORTs for position and nothing else, your budget is 6 bytes per vertex, and memory throughput won’t get in the way as much :)
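
To make the byte counts concrete, here is a small sketch of the two layouts described above. The struct and function names are invented for this example, and the static_asserts assume a typical platform with 4-byte floats and no unusual padding.

```cpp
#include <cstddef>
#include <cstdint>

// "Fat" layout: 16 + 12 + 4 + 16 = 48 bytes per vertex.
struct FatVertex {
    float position[4];       // x, y, z, w
    float normal[3];
    std::uint32_t color;     // packed RGBA "longword"
    float texCoord[2][2];    // two sets of (s, t)
};

// Compact layout: three 16-bit shorts, 6 bytes per vertex.
struct CompactVertex {
    std::int16_t position[3];
};

static_assert(sizeof(FatVertex) == 48, "expected 48-byte fat vertex");
static_assert(sizeof(CompactVertex) == 6, "expected 6-byte compact vertex");

// Memory traffic the GPU must pull for a given vertex rate, in bytes per second.
constexpr double bytesPerSecond(std::size_t bytesPerVertex, double verticesPerSecond) {
    return static_cast<double>(bytesPerVertex) * verticesPerSecond;
}
```

At the same vertex rate the fat layout needs eight times the bandwidth of the compact one, which is the effect the post is pointing at.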

Last, to make sure you’re not fill OR SETUP limited, try glCullFace( GL_FRONT_AND_BACK );

Culling front and back doesn’t ensure that you are not setup limited. It only ensures that you are not limited by anything past setup.

– Matt

Originally posted by jwatte:
If you send three SHORTs for position and nothing else, your budget is 6 bytes per vertex, and memory throughput won’t get in the way as much :)

Then it truly is a small world, jwatte!
(approx 32768 units in radius)

P.S. I know you were joking


I’m not joking. Most meshes will render fine with 16 bit resolution. You make up the positioning by sending the modelview transform matrix as floats.
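
One way this could look in practice, as a sketch rather than the poster's actual code: quantize each mesh against its own bounding box and keep a per-axis scale and offset in floats. QuantizedMesh, quantize and dequantize are hypothetical names.

```cpp
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

struct QuantizedMesh {
    std::vector<std::array<std::int16_t, 3>> positions;
    std::array<float, 3> scale{{1.0f, 1.0f, 1.0f}};   // object units per short step
    std::array<float, 3> offset{{0.0f, 0.0f, 0.0f}};  // bounding-box centre
};

// Maps each axis of the bounding box onto [-32767, 32767].
QuantizedMesh quantize(const std::vector<std::array<float, 3>>& pts) {
    QuantizedMesh q;
    if (pts.empty()) return q;
    std::array<float, 3> lo = pts[0], hi = pts[0];
    for (const auto& p : pts)
        for (int a = 0; a < 3; ++a) {
            lo[a] = std::min(lo[a], p[a]);
            hi[a] = std::max(hi[a], p[a]);
        }
    for (int a = 0; a < 3; ++a) {
        const float half = 0.5f * (hi[a] - lo[a]);
        q.offset[a] = 0.5f * (hi[a] + lo[a]);
        q.scale[a] = half > 0.0f ? half / 32767.0f : 1.0f;  // avoid divide by zero
    }
    q.positions.reserve(pts.size());
    for (const auto& p : pts) {
        std::array<std::int16_t, 3> s{};
        for (int a = 0; a < 3; ++a) {
            long v = std::lround((p[a] - q.offset[a]) / q.scale[a]);
            v = std::max(-32767L, std::min(32767L, v));  // guard rounding overshoot
            s[a] = static_cast<std::int16_t>(v);
        }
        q.positions.push_back(s);
    }
    return q;
}

// What the modelview transform would do on the GPU: short * scale + offset.
std::array<float, 3> dequantize(const QuantizedMesh& q, std::size_t i) {
    std::array<float, 3> out{};
    for (int a = 0; a < 3; ++a)
        out[a] = q.positions[i][a] * q.scale[a] + q.offset[a];
    return out;
}
```

At draw time the same mapping can be folded into the modelview matrix, for example with glTranslatef(offset) followed by glScalef(scale) before submitting the short positions with glVertexPointer(3, GL_SHORT, ...). The worst-case position error is half a step, i.e. the bounding-box half-extent divided by 65534 on each axis.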

Really, if you want benchmark-style numbers, you have to be REALLY conscious about how you push that data.

And Matt’s right, culling is of course done in set-up, so CullFace() is not a good way to figure out whether you’re setup bound or not. Perhaps if you can add one instruction to your vertex program and there’s no change in speed, you’re set-up limited. Doesn’t seem likely to happen for most real shaders, though :)

I see what you mean…mmm, this is food for thought.