Vertex Array Range performance blues

I am trying to get NVIDIA’s vertex array range (VAR) extension to work, but I don’t see much of a performance change compared to traditional glDrawElements usage. I don’t know what I am doing wrong here.

I am using the XP 40.72 WHQL drivers (OpenGL version 1.4.0) and have a GeForce4 Ti 4200 running on a P4 with 256 MB RAM.

Briefly, what I do is:

  1. At startup:

i) Allocate memory and setup with:
varmemory = wglAllocateMemoryNV(size, 0.0f, 0.0f, 0.5f); // AGP memory; size,0,0,1 (video memory) had almost no effect

ii) One time setup:
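That is, the standard VAR setup calls (a sketch; the glVertexArrayRangeNV entry point is assumed to have been fetched with wglGetProcAddress):

```c
/* Hand the allocated block to the driver and enable the vertex array range.
   Both come from the GL_NV_vertex_array_range extension. */
glVertexArrayRangeNV(size, varmemory);
glEnableClientState(GL_VERTEX_ARRAY_RANGE_NV);
```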

  2. For each frame:

i) Construct dynamic geometry in standard memory (i.e., console characters). In this case, each “character” is simply 2 triangles built from 4 vertices in 0,1,2,2,1,3 index order (like a quad).
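For reference, the per-character index pattern can be generated like this (a hypothetical helper; `base` is the first of the character’s four vertices):

```c
/* Emit six indices for one character quad whose vertices start at 'base',
   following the 0,1,2,2,1,3 pattern described above. */
static void emit_char_indices(unsigned int *out, unsigned int base)
{
    static const unsigned int pattern[6] = { 0, 1, 2, 2, 1, 3 };
    for (int i = 0; i < 6; ++i)
        out[i] = base + pattern[i];
}
```

So character n writes its indices at out + 6*n with base = 4*n.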

ii) Copy the block of data from i) to an unused portion of the “varmemory” block allocated in step 1) above (in a sequential manner).
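The sequential copy amounts to a bump allocator over the VAR block (a sketch; the names are mine, not the actual code):

```c
#include <string.h>

static size_t var_used = 0;   /* next free byte within the VAR block */

/* Bump-allocate 'bytes' out of the VAR block and copy freshly built
   geometry into it; wraps to the start when the block is exhausted.
   (Real code must fence/sync before reusing memory the GPU may still
   be reading, e.g. with GL_NV_fence.) */
static void *var_copy(void *varmemory, size_t var_size,
                      const void *src, size_t bytes)
{
    if (var_used + bytes > var_size)
        var_used = 0;                 /* wrap; needs a fence in practice */
    void *dst = (char *)varmemory + var_used;
    memcpy(dst, src, bytes);
    var_used += bytes;
    return dst;
}
```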

iii) Set up the pointers:

The data is all 4-byte aligned and the stride is 36 bytes.
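A 36-byte stride is consistent with, say, 3 position floats, 2 texture coordinates, and 4 color floats per vertex (my guess at the layout; `dst` is the address returned by the copy in step ii):

```c
typedef struct {
    float pos[3];   /* 12 bytes */
    float uv[2];    /*  8 bytes */
    float rgba[4];  /* 16 bytes -> 36-byte stride in total */
} Vertex;

const Vertex *vtx = (const Vertex *)dst;  /* points into varmemory */
glVertexPointer  (3, GL_FLOAT, sizeof(Vertex), vtx->pos);
glTexCoordPointer(2, GL_FLOAT, sizeof(Vertex), vtx->uv);
glColorPointer   (4, GL_FLOAT, sizeof(Vertex), vtx->rgba);
```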

iv) Draw the geometry:
(Blending is enabled, 1 texture only, “elems” is of type int and is in standard memory. I am not using GL_INDEX_ARRAY.)
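Presumably something like this (the count name is mine; six indices per character as above):

```c
/* elems holds int indices in system memory, as noted above. */
glDrawElements(GL_TRIANGLES, numchars * 6, GL_UNSIGNED_INT, elems);

/* Note: GeForce hardware indexes 16 bits natively, so GL_UNSIGNED_SHORT
   indices (with at most 65536 vertices in the range) may be faster than
   GL_UNSIGNED_INT. */
```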

I am not mixing any other rendering calls with the VAR extension, so GL_VERTEX_ARRAY_RANGE_WITHOUT_FLUSH_NV has no effect.

The results are the same whether windowed or full screen: about 100 fps with 10K triangles. (That is only 1 million triangles per second, which is not great in either case.) I would assume there should be a larger difference with the VAR extension if I were doing things correctly.

PS: I am working on the console first - this is in ortho mode, but I did not think that would affect performance.

Any help would be greatly appreciated.