Measuring Vertex Cache Size

I’m working on a program that measures various aspects of your OpenGL implementation’s performance. One thing we’re trying to measure is the size of the vertex cache. Currently I’m trying to do this by iteratively rendering indexed vertex arrays, where the independent variable is the number of vertices referenced by the indices. So first we try using indices of value 0 only. Then indices with values in [0, 1]. Then [0, 2]. And so on. The dependent variable in the experiment is the vertex processing rate. The notion is that once we begin to exceed the size of the cache, you’ll see a noticeable performance drop-off.

Unfortunately, it looks like performance is constant, independent of the number of vertices referenced (even up to 1024). We’ve tried more expensive per-vertex processing states, such as 8 spot lights and specular materials, in the hopes of making a vertex cache more useful, but that didn’t change anything. It looks like my Radeon 9800 may optimize out glLoadIdentity() on the modelview matrix and do a full multiplication for matrices loaded with glLoadMatrixf(identitymatrix), but I’m still not getting a nice performance plateau followed by a drop-off.
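To make the expected plateau and drop-off concrete, here is a minimal sketch that counts misses for an idealized FIFO cache on a cyclic index pattern. The FIFO replacement policy, the MAX_CACHE bound, and the function names fill_cyclic_indices and count_fifo_misses are assumptions for illustration; real hardware may behave differently.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CACHE 64  /* upper bound on the simulated cache size (assumption) */

/* Fill `indices` with `count` entries cycling through 0 .. n_verts-1. */
void fill_cyclic_indices(unsigned *indices, size_t count, unsigned n_verts)
{
    assert(n_verts > 0);
    for (size_t i = 0; i < count; ++i)
        indices[i] = (unsigned)(i % n_verts);
}

/* Count misses for an idealized FIFO cache holding `cache_size` vertices.
   This models the hypothesis being tested, not any particular GPU. */
size_t count_fifo_misses(const unsigned *indices, size_t count, size_t cache_size)
{
    unsigned cache[MAX_CACHE];
    size_t used = 0, next = 0, misses = 0;

    assert(cache_size > 0 && cache_size <= MAX_CACHE);
    for (size_t i = 0; i < count; ++i) {
        int hit = 0;
        for (size_t j = 0; j < used; ++j) {
            if (cache[j] == indices[i]) { hit = 1; break; }
        }
        if (!hit) {
            cache[next] = indices[i];        /* overwrite the oldest slot */
            next = (next + 1) % cache_size;
            if (used < cache_size) ++used;
            ++misses;
        }
    }
    return misses;
}
```

Under this model, cycling through exactly as many vertices as the cache holds costs only the warm-up misses, while cycling through one more vertex than that misses on every index. So if the hardware cache were FIFO and actually in use, the drop-off should be a cliff rather than a gentle slope.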

Does anyone have any ideas as to how I could go about measuring this?


There is no guarantee that glLoadMatrixf(identitymatrix) won’t be optimized.

Use some other matrix.
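As a rough sketch of "some other matrix", the helper below fills a column-major 4x4 array with a non-uniform scale plus a translation, which is the layout glLoadMatrixf expects. The names make_scale_translate and is_identity are made up for this example, and whether a given driver special-cases any particular matrix is not something this code can tell you.

```c
#include <assert.h>

/* Build a column-major 4x4 matrix: scale (sx, sy, sz), then translate (tx, ty, tz).
   In OpenGL's column-major layout the translation lives in elements 12, 13, 14. */
void make_scale_translate(float m[16],
                          float sx, float sy, float sz,
                          float tx, float ty, float tz)
{
    for (int i = 0; i < 16; ++i)
        m[i] = 0.0f;
    m[0]  = sx;
    m[5]  = sy;
    m[10] = sz;
    m[12] = tx;
    m[13] = ty;
    m[14] = tz;
    m[15] = 1.0f;
}

/* Hypothetical check: the kind of exact identity test a driver could perform. */
int is_identity(const float m[16])
{
    for (int i = 0; i < 16; ++i) {
        float expected = (i % 5 == 0) ? 1.0f : 0.0f;  /* diagonal at 0, 5, 10, 15 */
        if (m[i] != expected)
            return 0;
    }
    return 1;
}
```

The resulting array would be passed to glLoadMatrixf(m) with the modelview matrix selected. Keeping the scale and translation small should leave the geometry inside the view volume so the benchmark still processes every vertex.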

Step 1: assume cache size X vertices
Step 2: render stuff that uses the cache perfectly
Step 3: collect FPS into database

and keep doing that assuming cache size is X vertices, where X is 1, 2, 3, 4, 5, …
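Once the per-X results are collected, picking out the drop-off can be automated. The sketch below assumes rates[i] holds the measured rate for an assumed cache size of i + 1, and that a drop of more than drop_fraction below the best rate seen so far counts as "falling off the plateau". The function name find_cache_knee and the threshold idea are illustrative assumptions, not part of any benchmark tool.

```c
#include <assert.h>
#include <stddef.h>

/* rates[i] is the measured rate (e.g. vertices/sec) for assumed cache size i + 1.
   Returns the largest assumed size still on the plateau, or 0 if no drop is found. */
size_t find_cache_knee(const double *rates, size_t n, double drop_fraction)
{
    double best = 0.0;

    assert(drop_fraction > 0.0 && drop_fraction < 1.0);
    for (size_t i = 0; i < n; ++i) {
        if (rates[i] > best)
            best = rates[i];
        /* Index i corresponds to size i + 1, so the last good size is i. */
        if (i > 0 && rates[i] < best * (1.0 - drop_fraction))
            return i;
    }
    return 0;
}
```

A return value of 0 corresponds to the flat curve described earlier in the thread: no assumed size performs measurably worse than another, which suggests the cache is either not being exercised or not the bottleneck.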

Make the window small or cull away all the polygons by rendering them facing away from you.
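For the culling approach, a small sketch: with OpenGL's default state (front face counter-clockwise, back faces culled once GL_CULL_FACE is enabled), a triangle whose projected winding is clockwise gets discarded after its vertices have been transformed. The helpers signed_area2 and reverse_winding are hypothetical names for checking and flipping the winding on the CPU side.

```c
#include <assert.h>
#include <stddef.h>

/* Twice the signed area of a 2D triangle; positive means counter-clockwise. */
float signed_area2(float x0, float y0, float x1, float y1, float x2, float y2)
{
    return (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0);
}

/* Flip the winding of every triangle in a GL_TRIANGLES index list
   by swapping the second and third index of each triangle. */
void reverse_winding(unsigned *indices, size_t tri_count)
{
    assert(indices != NULL || tri_count == 0);
    for (size_t t = 0; t < tri_count; ++t) {
        unsigned tmp = indices[3 * t + 1];
        indices[3 * t + 1] = indices[3 * t + 2];
        indices[3 * t + 2] = tmp;
    }
}
```

Reordering indices also changes the order in which the cache sees them, so for this benchmark it may be cleaner to leave the index buffer alone and flip the culling state instead, for example with glFrontFace(GL_CW) or glCullFace(GL_FRONT).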

You could also throw a non-identity texture matrix into the test.

That’s exactly what I’ve been doing, but the measured results have been totally bizarre. On all the NVIDIA cards I’ve tried (QuadroFX 1000, GF3, GF4MX), it doesn’t seem to matter how many vertices you hit… On my Radeon, the only difference I’ve been able to see is whether I use glLoadMatrix as opposed to glLoadIdentity. It doesn’t matter (in this particular case) what is in the glLoadMatrix call. I agree that drivers could optimize away identity matrices no matter how they’re specified, though.

Is there any vendor documentation on exactly how vertex caches work?

Found these two lines in NVIDIA’s tristrip library:
//GeForce1 and 2 cache size

//GeForce3 cache size
#define CACHESIZE_GEFORCE3 24

Originally posted by namespace:
Found these two lines in NVIDIA’s tristrip library:
//GeForce1 and 2 cache size

//GeForce3 cache size
#define CACHESIZE_GEFORCE3 24

That’s interesting and sort of makes sense, since the cards of the late 90s had a vertex cache size of 4, and some had 8.
My guess is that the FX generation has 32.

aegis, you might want to look at one of the DX tools called MeshViewer (or something like that).
It comes with the DX SDK, and you can have it run some tests for you and then optimize the mesh.

You mean what the cache scheme is? What the cache line size is?
I don’t think any company documents this.

Are you locking your vertex arrays? I seem to remember that the vertex cache doesn’t get used unless the array is locked (and indexed).

You could also try VBOs.