VBO slower than immediate mode when Lighting is ON

EG1 · September 12, 2003, 5:26am

I’ve encountered what looks to me like a performance oddity: with vertex lighting ON, immediate mode rendering is slightly faster than VBO or BuildList; with lighting OFF, the situation reverses.

By “immediate” I mean specifying triangles with glBegin/glNormal/glVertex/glEnd, while “BuildList” uses a display list built from the same “immediate” calls, and “VBO” uses straight indexed vertex arrays stored in VBO buffers.

Here are the figures I get (GF3, Det 45.23, AXP 1800+) for 28000 triangles in a tristrip (all visible, none gets culled), only one omni light is in the scene:

Lighting ON:

immediate mode : 220 FPS
buildlist/VBO : 200 FPS

Lighting OFF: (aka glDisable(GL_LIGHTING))

immediate mode : 340 FPS
buildlist : 520 FPS
VBO : 510 FPS

The triangle rate with lighting OFF looks not too bad, but with lighting ON the VBO/BuildList performance is somewhat depressing… Any idea why VBO would perform slower than immediate mode calls when the only difference is lighting being ON?
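For reference, the frame rates above can be converted to triangle throughput by multiplying by the 28000 triangles per frame. A minimal sketch in C (the function name is made up for illustration, it is not part of the benchmark):

```c
#include <assert.h>

/* Triangles drawn per second for a scene rendered at a given frame rate.
   Hypothetical helper, named for illustration only. */
double triangles_per_second(long triangles_per_frame, double fps)
{
    assert(triangles_per_frame >= 0 && fps >= 0.0);
    return (double)triangles_per_frame * fps;
}
```

With the reported figures this gives roughly 6.16 M triangles/s (immediate) versus 5.6 M (buildlist/VBO) with lighting ON, and 9.52 M (immediate), 14.56 M (buildlist) and 14.28 M (VBO) with lighting OFF.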

TheSillyJester · September 12, 2003, 7:15am

Try on an ATI board or with Detonator 44.03.

al_bob · September 12, 2003, 7:31am

How are you creating your VBOs? How are you managing them? How do you do your rendering?

Show us some code, or at least, give us more details than fps from some unknown program.

EG1 · September 15, 2003, 3:38am

The VBO code is the vanilla one, initialized with

glGenBuffersARB(1, @vboVerticesBuffer);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vboVerticesBuffer);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, vertices.Count*SizeOf(TAffineVector), vertices.List, GL_STATIC_DRAW_ARB);

glGenBuffersARB(1, @vboNormalsBuffer);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vboNormalsBuffer);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, normals.Count*SizeOf(TAffineVector), normals.List, GL_STATIC_DRAW_ARB);

glGenBuffersARB(1, @vboIndicesBuffer);
glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, vboIndicesBuffer);
glBufferDataARB(GL_ELEMENT_ARRAY_BUFFER_ARB, indices.Count*SizeOf(Integer), indices.List, GL_STATIC_DRAW_ARB);

and executed with

glBindBufferARB(GL_ARRAY_BUFFER_ARB, vboVerticesBuffer);
glVertexPointer(3, GL_FLOAT, 0, nil);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vboNormalsBuffer);
glNormalPointer(GL_FLOAT, 0, nil);
glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, vboIndicesBuffer);

glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_NORMAL_ARRAY);

glDrawElements(GL_TRIANGLE_STRIP, indices.Count, GL_UNSIGNED_INT, nil);

and finally the immediate mode code is

glBegin(GL_TRIANGLE_STRIP);
for i:=0 to indices.Count-1 do begin
  k:=indices[i];
  glNormal3fv(@normals[k]);
  glVertex3fv(@vertices[k]);
end;
glEnd;

Note: “classic” vertex arrays and VBO have exactly the same performance as soon as there is a light ON, i.e. slower than immediate (despite the fact that “immediate” makes thousands of calls).

knackered · September 15, 2003, 3:48am

Is that Visual Basic?
If it is, then maybe there’s something going wrong because of the way the pointer parameter is abused in VBO…? Just a thought… I don’t know jack about Visual Basic, but I gather it doesn’t use pointers, so maybe its interface with a C DLL gets messed up in this extension.

EG1 · September 15, 2003, 6:12am

>Is that Visual Basic?

That is Delphi code; it interfaces with OpenGL in exactly the same fashion as your C code, with pointers etc., though from experience I guess my ‘for’ loop is compiled more efficiently than your C equivalent.

Btw, on the “classic” vertex array performance, I’ve an addendum: as long as no VBO call of any kind has been made, performance is similar to “immediate” (and even slightly faster). Once VBOs have been used, the performance of “classic” vertex arrays matches that of the VBOs when lighting is ON (i.e. slower, even though the VBOs have been disposed of).
With lighting OFF, classic vertex arrays are faster than immediate, but not as fast as VBO (as can be expected).

knackered · September 15, 2003, 6:34am

jesus, why mess with perfection.

system · September 15, 2003, 7:43am

Could it be your lighting?
Nvidia wants you to NOT use 2 sided lighting, otherwise you hit a software path.
I think that’s it and the rest should be entirely hw accelerated.

The next suspect is the driver. I think the newer ones are better tuned for the FX cards at the cost of older ones.

Korval · September 15, 2003, 9:17am

This is just a guess, but perhaps nVidia’s hardware doesn’t like integer indices? Try using shorts and see what happens.

EG1 · September 16, 2003, 12:17am

TwoSidedLighting isn’t used, and face culling on/off has no impact on performance (as expected, all triangles being visible).

Shorts gave me no performance delta, exactly the same performance figures (the actual meshes can end up with more than 64k vertices in a chunk, so short indices wouldn’t have been a convenient solution anyway).
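To make the 64k limit concrete: a 16-bit index (GL_UNSIGNED_SHORT) can address vertices 0 to 65535, so a chunk with more than 65536 vertices needs 32-bit indices (GL_UNSIGNED_INT). A minimal sketch in C of that choice; the helper names are hypothetical and not taken from the code above:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bytes per index needed to address vertex_count vertices:
   2 (matches GL_UNSIGNED_SHORT) if the highest index fits in 16 bits,
   otherwise 4 (matches GL_UNSIGNED_INT). */
size_t index_size_for(size_t vertex_count)
{
    assert(vertex_count > 0);
    /* The highest index is vertex_count - 1, so up to 65536 vertices fit. */
    return (vertex_count <= (size_t)UINT16_MAX + 1) ? sizeof(uint16_t)
                                                    : sizeof(uint32_t);
}

/* Size in bytes of an index buffer holding index_count indices. */
size_t index_buffer_bytes(size_t index_count, size_t vertex_count)
{
    return index_count * index_size_for(vertex_count);
}
```

Short indices halve the index buffer size where they fit, but as reported above they made no measurable difference in this benchmark.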

I’ve made another test (not willingly at first, but the results were interesting): I fired up a proggy that ate 100% of CPU time (a math calculations thing), then started the bench. Immediate mode performance dropped to about 170 FPS each time, while VBOs went down to 50 FPS… meaning that VBOs are transformed/lit on the CPU side???
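A rough way to read those numbers is as the fraction of the unloaded frame rate each path keeps once the CPU is busy. The sketch below, in C, assumes the loaded runs were made with lighting ON, so the baselines are the 220 FPS and 200 FPS figures from the first post (the post does not say so explicitly); the function name is hypothetical:

```c
#include <assert.h>

/* Fraction of the unloaded frame rate that survives when another process
   saturates the CPU. Hypothetical helper, named for illustration only. */
double retained_fraction(double loaded_fps, double idle_fps)
{
    assert(idle_fps > 0.0);
    return loaded_fps / idle_fps;
}
```

Under that assumption, immediate mode keeps about 77% of its frame rate (170 / 220) while the VBO path keeps 25% (50 / 200). That is consistent with the VBO path depending more heavily on the CPU, though on its own it does not prove where the transform and lighting work is done.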

Well, gotta wait for next driver release and hope for an improvement…