ATI: Problems with VBO

[This topic was originally posted on rage3d, but I didn’t get any reply, so I’m afraid I have to ask it here again.]


I’m running into problems with VBO on a Radeon Mobility 9000 and Cat 3.4 (modded to work with M9). I switched from VAO to VBO in my App, but the performance is not what I expected. So I decided to go for a simpler test case:

I’m getting the expected performance with VBO in most cases, but I’ve got one case where performance is disastrous.

If I use a vertex program that reads generic attributes from a VBO (set up with glVertexAttribPointerARB), I get frame rates that clearly indicate something is going badly wrong. When I switch to a VP that uses conventional attributes (like vertex.position) and set up the arrays with glVertexPointer and glColorPointer, the performance is fine.

Here’s an excerpt of the vertex program and the code that does not work right:

PARAM mvp[4] = { state.matrix.mvp };
MOV result.color,vertex.attrib[3];
DP4 result.position.x, mvp[0], vertex.attrib[0];
DP4 result.position.y, mvp[1], vertex.attrib[0];
DP4 result.position.z, mvp[2], vertex.attrib[0];
DP4 result.position.w, mvp[3], vertex.attrib[0];

glBindBufferARB( GL_ARRAY_BUFFER_ARB, vboHandle);

glVertexAttribPointerARB(0, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex), (char*)0);
glVertexAttribPointerARB(3, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex), (char*)12);

glDrawElements(GL_TRIANGLES, n, GL_UNSIGNED_SHORT, &indices[0] );

This code works right:
PARAM mvp[4] = { state.matrix.mvp };
MOV result.color,vertex.color;
DP4 result.position.x, mvp[0], vertex.position;
DP4 result.position.y, mvp[1], vertex.position;
DP4 result.position.z, mvp[2], vertex.position;
DP4 result.position.w, mvp[3], vertex.position;

glBindBufferARB( GL_ARRAY_BUFFER_ARB, vboHandle);
glVertexPointer( 3,GL_FLOAT, sizeof(Vertex), (char*)0 );
glColorPointer( 3,GL_FLOAT, sizeof(Vertex), (char*)12 );

glDrawElements(GL_TRIANGLES, n, GL_UNSIGNED_SHORT, &indices[0] );
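
The byte offsets 0 and 12 in the pointer calls above depend on how Vertex is laid out, which the post does not show. A minimal sketch, assuming a hypothetical Vertex made of three position floats followed by three color floats (the struct and the helper names are illustrative, not taken from the test app), shows how the offsets and stride could be derived with offsetof instead of being hard-coded:

```c
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical layout: the original post does not show its Vertex struct. */
typedef struct Vertex {
    float position[3];  /* read as vertex.attrib[0] or vertex.position */
    float color[3];     /* read as vertex.attrib[3] or vertex.color    */
} Vertex;

/* Byte offset of the position data inside one Vertex. */
static size_t vertex_position_offset(void)
{
    return offsetof(Vertex, position);
}

/* Byte offset of the color data; 12 when a float is 4 bytes wide. */
static size_t vertex_color_offset(void)
{
    return offsetof(Vertex, color);
}

/* Stride between consecutive vertices, as passed to the gl*Pointer calls. */
static size_t vertex_stride(void)
{
    return sizeof(Vertex);
}
```

With a buffer bound, the color pointer argument would then be (char*)0 + vertex_color_offset() rather than the literal (char*)12.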

For testing purposes I’ve hacked a small test program that you could (if you wanted to) download on

The console (which may be hidden behind the app window) prints the current fps and the mode used for rendering. With the space bar you can switch between standard vertex array, VBO and VAO. With ‘a’ you switch between using glVertexAttrib and standard arrays (glVertexPointer) (Keys must be pressed in the app window - not in the console).

My numbers are:
~8 fps for standard arrays and conventional attributes
~42 fps for VBO and conventional attributes
~40 fps for VAO and conventional attributes

~6 fps for standard arrays and generic attributes
~0.2 fps (!) for VBO and generic attributes
~40 fps for VAO and generic attributes
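
To put the numbers above in proportion, here is a small sketch that turns two frame rates into a slowdown factor and a frame time. The function names are made up for illustration; the inputs are the approximate figures quoted above.

```c
/* How many times faster fast_fps is than slow_fps. */
static double speed_ratio(double fast_fps, double slow_fps)
{
    return fast_fps / slow_fps;
}

/* Time spent on one frame, in milliseconds, at the given frame rate. */
static double frame_time_ms(double fps)
{
    return 1000.0 / fps;
}
```

For the VBO path, speed_ratio(42.0, 0.2) comes out at roughly 210, and frame_time_ms(0.2) at roughly 5000 ms per frame, while the VAO path is about the same speed with either kind of attribute.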

Any help is welcome!


Unfortunately I have to confirm this. I tried different ways to specify tangent, bitangent and normals (all contained in GL_STATIC_DRAW VBOs) for use with a vertex program:

  1. B,T,N in generic attribute arrays 9,10,11
    -> not working with VBO at all (null pointer exceptions)
  2. N via glNormalPointer(), B and T in attribute arrays 6,7
    -> very slow, transformed models don't get lit properly
  3. N via glNormalPointer(), B, T as texcoords 1 and 2
    -> faster, transformed models still have lighting problems
  4. N,B,T as texcoords 1,2,3
    -> fast, everything ok

The vertex program was properly adapted for each setup; in fact only the 3 ATTRIB lines at the top of it changed.
It almost looks like the vertex.normal attribute has some problems?! Btw, I'm using the ARB_position_invariant option. Drivers are Catalyst 3.4.

Sounds like a driver bug to me. I think you should send a sample app + bug description to

What happens if you send everything via attribute arrays?

I didn't test it. But the original posting sounds like he's sending everything as generic attributes. Please try his test app! It really shows the slowdown!

A little OT, but why do you bind buffer 0 after you render the object?

Yes, when I use generic attributes in the vertex program I'm using only glVertexAttribPointerARB to pass the arrays. Conversely, when using conventional attributes in the VP, the standard glVertexPointer etc. are used to set up the arrays.

[If I understand the spec correctly mixing conventional and generic attributes generates undefined results. My understanding is that this might also be the case if you use e.g. vertexpointer and use vertex.attrib[0] though I wouldn’t bet too much money on that.]

Thanks skynet for verifying my results. What card did you run it on? I was afraid it was due to the modded driver (I’d prefer to have current official drivers, but ATI is discriminating against notebook users for pretty unclear reasons … uuuups, sorry for this rant)

Originally posted by Ostsol:
A little OT, but why do you bind buffer 0 after you render the object?

That “unbinds” the buffer. I do that too, only I use GL_NONE (even though that’s still 0).

Originally posted by NitroGL:
That “unbinds” the buffer. I do that too, only I use GL_NONE (even though that’s still 0).


Just because I wanted to simulate the usage in a bigger app I decided to set and clear the state completely between calls to DrawElements, thus I’m disabling VBO (by binding 0). Admittedly it doesn’t make much sense in this case, but the performance is pretty much the same if you bind and unbind only when switching modes and not for each call.

[Edit: Guess I’m a bit too slow with my answers. Thanks NitroGL!]

[This message has been edited by stefan (edited 05-21-2003).]

I use the official 3.4 catalyst drivers on a R9700 non-pro card.

Btw, as far as I understand the spec, it's allowed to mix conventional arrays and attribute arrays, as long as you don't specify or touch aliasing attributes via both ways at the same time. I tried it out to be sure: the vertex program fails, for instance, if you access vertex.attrib[0] and vertex.position inside one program.

skynet, I believe you’re right. The spec mentions in issue (5):
“Applications that mix conventional and generic vertex attributes in a single rendering pass should be careful to avoid using attributes that may alias”
That’s pretty clear…
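
Which attributes "may alias" can be written down as a lookup. The sketch below follows the aliasing table as I read it in the ARB_vertex_program specification; the function name is made up for illustration, and the table should be checked against the spec before relying on it.

```c
#include <stddef.h>  /* NULL */

/* Returns the conventional attribute that generic attribute `index` may
 * alias under ARB_vertex_program, or NULL if it aliases none.
 * Hypothetical helper; table transcribed from my reading of the spec. */
static const char *aliased_conventional_attrib(unsigned index)
{
    static const char *const names[16] = {
        "vertex.position",        /* 0 */
        "vertex.weight",          /* 1 */
        "vertex.normal",          /* 2 */
        "vertex.color",           /* 3: primary color */
        "vertex.color.secondary", /* 4 */
        "vertex.fogcoord",        /* 5 */
        NULL,                     /* 6: no conventional alias */
        NULL,                     /* 7: no conventional alias */
        "vertex.texcoord[0]",     /* 8 */
        "vertex.texcoord[1]",     /* 9 */
        "vertex.texcoord[2]",     /* 10 */
        "vertex.texcoord[3]",     /* 11 */
        "vertex.texcoord[4]",     /* 12 */
        "vertex.texcoord[5]",     /* 13 */
        "vertex.texcoord[6]",     /* 14 */
        "vertex.texcoord[7]"      /* 15 */
    };
    return index < 16 ? names[index] : NULL;
}
```

Read this way, attributes 0 and 3 from the first post pair with vertex.position and vertex.color, attributes 6 and 7 from skynet's second setup pair with nothing, and attributes 9, 10 and 11 pair with texcoords 1, 2 and 3.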

Originally posted by NitroGL:
That “unbinds” the buffer. I do that too, only I use GL_NONE (even though that’s still 0).

GL_NONE only works because it happens to be 0; the spec clearly says that 0 is the name of the non-existent buffer object, not that “NONE” is. So, while they are practically the same, using 0 instead of GL_NONE is the technically correct approach.

I must say I’m impressed: In less than a day I got a reply from ATI devrel.
They confirmed the bug and let me know that it’s already fixed (and “should” be included in the next public catalyst).

Binding buffer 0 with glBindBufferARB means the array pointers refer to system memory again, which means you can use ordinary vertex arrays.