Immediate Mode vs Vertex Arrays vs VBO

I’ve bumped into a little performance oddity (?) when testing transfer rates on my GF3 (56.72 drivers): immediate mode comes out ahead of vertex arrays in a situation that should be T&L limited… Any explanations?

“Immediate mode” loop looks like:

   
   glBegin(GL_TRIANGLE_STRIP);
   for i:=0 to indices.Count-1 do begin
      k:=indices.List[i];
      glNormal3fv(@normalsList[k]);
      glVertex3fv(@verticesList[k]);
   end;
   glEnd;
 

“Vertex array” code looks like:

   
   glEnableClientState(GL_VERTEX_ARRAY);
   glEnableClientState(GL_NORMAL_ARRAY);

   glVertexPointer(3, GL_FLOAT, 0, vertices.List);
   glNormalPointer(GL_FLOAT, 0, normals.List);

   glDrawElements(GL_TRIANGLE_STRIP, indices.Count, GL_UNSIGNED_INT, indices.List);
 

Exactly the same vertices, normals and indices are used in both cases. 130k triangles are sent to the driver, all of them very small on screen (I tested by scaling everything down to 1 pixel, and the framerate was similar); about 30-40% of the triangles aren’t rendered due to culling.
Vertex lighting is active, with a single light (but turning it off doesn’t change the shape of the results).

“Immediate Mode”: 122.5 FPS
“Vertex Array”: 115.6 FPS

(Wrapping “immediate mode” in a display list gains about 0.3 FPS)

When using a STATIC_DRAW VBO, initialized once, I get 123 FPS, but when respecifying the data every frame (to be in the same configuration as the other two approaches), the framerate plummets to 30 FPS (STREAM_DRAW) or 38 FPS (STATIC_DRAW), and lower still for other usage modes.
(The data in the lists wasn’t accessed or altered at all during the tests, so I’m not reading back from AGP or video memory.)
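
To put the framerates on a common scale, they can be converted to submitted triangles per second. The small console program below is only a sketch of that arithmetic, assuming the 130k figure is the per-frame triangle count; the program and procedure names are made up for illustration. It works out to roughly 15.9 million triangles/s for immediate mode, 15.0 million for vertex arrays, 16.0 million for the VBO set once, and 3.9 to 4.9 million when the VBO is respecified every frame.

```pascal
program TriangleThroughput;
{$APPTYPE CONSOLE}

const
  // Triangle count taken from the post; assumed to be per frame.
  TrianglesPerFrame = 130000;

procedure Report(const title: string; fps: Double);
begin
  // Submitted triangles per second = triangles per frame * frames per second.
  WriteLn(title, ': ', Round(TrianglesPerFrame * fps), ' triangles/s');
end;

begin
  Report('Immediate mode', 122.5);
  Report('Vertex array', 115.6);
  Report('VBO, STATIC_DRAW, set once', 123.0);
  Report('VBO, STREAM_DRAW, per frame', 30.0);
  Report('VBO, STATIC_DRAW, per frame', 38.0);
end.
```

These are submitted triangles, so the culled 30-40% are included in the counts.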

Any idea what could be the limiting factor?
In the final use case, those vertices and normals will change every frame.

The problem with VBO and respecifying the data every frame sounds like a problem with parallelism.

It could be that when you overwrite a VBO after the rendering call, you force the GPU to finish rendering at least up to the point where the data is no longer needed. With normal vertex arrays or immediate mode, the data is copied to another buffer before rendering, so you can overwrite the old data without syncing with the GPU.
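
If that is what is going on, one thing that might be worth trying is to tell the driver the old contents are disposable before refilling the buffer, by calling glBufferDataARB with a nil pointer first. The following is only a sketch under assumptions: it presumes a GL header unit that exposes the ARB_vertex_buffer_object entry points and a current GL context, and StreamIntoVBO, vbo, data and sizeInBytes are hypothetical names. Whether the GF3 driver actually avoids the stall this way is a guess, not something measured here.

```pascal
// Sketch only: assumes a GL header unit that exposes the
// ARB_vertex_buffer_object entry points and a current GL context.
// StreamIntoVBO, vbo, data and sizeInBytes are hypothetical names.
procedure StreamIntoVBO(vbo: GLuint; data: Pointer; sizeInBytes: Integer);
begin
  glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
  // Passing nil tells the driver the old contents are no longer needed,
  // so it is free to hand back fresh storage instead of waiting for the GPU.
  glBufferDataARB(GL_ARRAY_BUFFER_ARB, sizeInBytes, nil, GL_STREAM_DRAW_ARB);
  // Upload this frame's data into the (possibly new) storage.
  glBufferSubDataARB(GL_ARRAY_BUFFER_ARB, 0, sizeInBytes, data);
end;
```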

At least I think this sounds like a logical explanation. Can anyone with more insight into driver internals confirm/deny this?

>At least I think this sounds like a logical explanation.

Indeed, and I’ve just tried it, using two sets of VBOs so that each set is specified in one frame and then used in the next… this increased the framerate from 38 FPS to 39.5 FPS… so there is something else holding VBOs (and regular vertex arrays) back. :confused:
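
For anyone wanting to repeat that test, a "specify in one frame, use in the next" setup might look roughly like the sketch below. It is not the exact test code: it assumes the ARB_vertex_buffer_object entry points, a current GL context, client states already enabled, no element array buffer bound, and buffer names created once with glGenBuffersARB. All identifiers (vertexVBO, normalVBO, frameParity, DrawFrame) are hypothetical.

```pascal
// Sketch only; see the assumptions listed above. All names are hypothetical.
var
  vertexVBO, normalVBO: array [0..1] of GLuint; // created once with glGenBuffersARB
  frameParity: Integer = 0;

procedure DrawFrame(vertexData, normalData: Pointer; sizeInBytes: Integer;
  indexData: Pointer; indexCount: Integer);
var
  fillSet, drawSet: Integer;
begin
  fillSet := frameParity;      // written this frame, drawn next frame
  drawSet := 1 - frameParity;  // written last frame, drawn now

  // Respecify the set that the GPU is not about to read from.
  glBindBufferARB(GL_ARRAY_BUFFER_ARB, vertexVBO[fillSet]);
  glBufferDataARB(GL_ARRAY_BUFFER_ARB, sizeInBytes, vertexData, GL_STREAM_DRAW_ARB);
  glBindBufferARB(GL_ARRAY_BUFFER_ARB, normalVBO[fillSet]);
  glBufferDataARB(GL_ARRAY_BUFFER_ARB, sizeInBytes, normalData, GL_STREAM_DRAW_ARB);

  // Draw from the set filled on the previous frame.
  // With a buffer bound, the pointer argument is an offset, so nil means 0.
  glBindBufferARB(GL_ARRAY_BUFFER_ARB, vertexVBO[drawSet]);
  glVertexPointer(3, GL_FLOAT, 0, nil);
  glBindBufferARB(GL_ARRAY_BUFFER_ARB, normalVBO[drawSet]);
  glNormalPointer(GL_FLOAT, 0, nil);

  // Indices are still read from client memory here.
  glDrawElements(GL_TRIANGLE_STRIP, indexCount, GL_UNSIGNED_INT, indexData);

  frameParity := 1 - frameParity; // swap roles for the next frame
end;
```

Note that this draws geometry one frame late, and the very first call draws from a set that has not been filled yet, so both sets would need to be filled once up front.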