VBOs and ATI

When I disabled VBOs on my ATI 2400 HD Pro test card I got about a 1000% increase in framerate, even when drawing a repeating mesh.

Note to self: never use VBOs on ATI hardware.

Also, glGenerateMipmapEXT() does not seem to work on ATI cards. However, their latest drivers seem to have fixed a lot of GLSL problems, so I am calling off the attack on AMD headquarters.

Just out of curiosity, what are you using instead of VBOs? Display lists?
What vertex format did you use to fill the VBOs?

I think it's time for me to have another look at my ATI test cards!

Vertex arrays.
The terrain patch I was testing with was just a vertex position buffer, with an interchangeable attribute buffer. Making the attribute buffer a vertex array instead of a VBO gave about a 10% increase. Replacing the position buffer with a vertex array resulted in a massive gain in speed.

The rendering routine was like this:
Set buffers/arrays
change attribute buffer/array
change attribute buffer/array
change attribute buffer/array
change attribute buffer/array
unset buffers/arrays
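The steps above can be sketched roughly as follows with client-side vertex arrays (the names `positions`, `attribs`, `patchIndices`, `ATTRIB_INDEX`, and the attribute format are invented for illustration, not taken from the actual engine):

```c
/* Hypothetical sketch of the routine described above: one shared
 * position array, with only the attribute pointer swapped per draw. */
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, positions);       /* shared position array */
glEnableVertexAttribArray(ATTRIB_INDEX);

for (int i = 0; i < numPatches; i++) {
    /* only the attribute source changes between draw calls */
    glVertexAttribPointer(ATTRIB_INDEX, 2, GL_SHORT, GL_FALSE, 0, attribs[i]);
    glDrawElements(GL_TRIANGLES, indexCount, GL_UNSIGNED_SHORT, patchIndices);
}

glDisableVertexAttribArray(ATTRIB_INDEX);
glDisableClientState(GL_VERTEX_ARRAY);
```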

The fact that the position array was so much faster was surprising to me, because it is being rendered over and over, something that VBOs are very good for.

What is the format of your position array VBO? Maybe the format isn't natively supported by the card, and the driver has to copy it back to system memory, convert it and send it back to the GPU each time you use it; that would explain the bad performance.

In which case the driver would be retarded. Then again, it’s ATI drivers we’re talking about here…

When you say "change attribute buffer", Leadwerks, do you mean you have a bunch of static attribute buffers and just change the active one? Or do you dynamically change the buffer?

ATI cards require all elements to be 32-bit aligned and to have a size that is a multiple of 32 bits (e.g. properly aligned 4 ubytes are fine, 3 ubytes are not). Otherwise the driver must convert the data during drawing, which is very costly when VBOs are used.

In which case the driver would be retarded. Then again, it’s ATI drivers we’re talking about here…

At the time the driver allocates the memory (glBufferData), it has no idea what kind of data will be stored in it. glDrawElements() is the first opportunity for the driver to check what kind of data streams it has to fetch. And in the case of non-hw-supported data formats, the only thing it can do is copy back to sysmem, do a conversion on the fly, and then copy back (into some temporary buffer). Nvidia suffers from the same problems. Just make sure to use common, hw-supported data formats and everything will be fine with VBOs. Just another tip: to actually make sure that the VBO is allocated in VRAM, use GL_STATIC_DRAW as the usage hint.
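For reference, a minimal sketch of allocating a VBO with that usage hint (the names `vbo`, `vertexData`, and `dataSize` are invented for illustration):

```c
/* Allocate a static VBO; GL_STATIC_DRAW tells the driver the data will
 * be written once and drawn many times, so it is free to keep it in VRAM. */
GLuint vbo;
glGenBuffers(1, &vbo);
glBindBuffer(GL_ARRAY_BUFFER, vbo);
glBufferData(GL_ARRAY_BUFFER, dataSize, vertexData, GL_STATIC_DRAW);
```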

Oh wouldn’t it be lovely to be able to check these sort of things. Like a glGetWarning() for really obvious performance-issues or something? :stuck_out_tongue:

It is. :cool: NVIDIA offers that with PerfKit.

NVIDIA cards don’t slow down when there’s no 32-bit alignment?

They probably do as well.
I always use 4-byte-aligned data anyway, and most of the time everything as floats, which never hits this 3ub snag or other unsupported formats.
If you want colors as unsigned bytes, use 4ub.
The only other thing that is slightly OK is normals as signed shorts, but that has issues with representing some values exactly.

He said the buffers were static, in which case it is retarded of the driver to not keep the converted buffer. But that’s just my humble opinion :slight_smile:

When I was hit by this alignment issue a few years ago, Nvidia did not have that limitation. From the threads that appear from time to time on this forum, it seems that this has not changed.

Interpretation of the VBO depends on the setup of the arrays at the time the draw call is issued. This can change between calls even if the content of the buffer does not. You might even render from one buffer with several different settings, and only some of those settings might be incompatible while others fit within hw limitations.

Unless one specific combination of setup and VBO is easily identifiable by the driver (as is the case with display lists), there is IMHO no effective way to handle such caching.

The position array is 12 bytes, so that is not an issue.

The only time I have run into the 32-bit alignment problem was when I was using RGB byte values for a color array. I simply switched to using RGBA values.

I DO use shorts for the attribute array, and there is an odd number of vertices in each patch, but using a vertex array instead of a VBO makes only a small difference with the attribute array. Switching these values to floats would mean a gigantic increase in memory usage.

Shorts are supported; however, there must be 2 or 4 of them in one element (vertex attribute).

I do not know what the current state of things is, but on some older hardware, mixing arrays from system memory with arrays from video memory (VBOs) in a single call caused performance issues. The hardware required all vertex data to be in the same type of memory, so the driver had to copy the data as necessary.

Affirmative. That holds true today. Don’t mix attributes in standard vertex arrays with attributes in VBO arrays or you get a performance penalty.

It could easily do some “thinking” when it has to convert a buffer, and figure out that perhaps it would be smart to keep the converted buffer as well as the original. That is, “hey I’ve had to convert the exact same buffer X times in the last N frames, perhaps I should keep the converted one as well”.

Anyway, given the current state of things, I can’t really blame them for not working their ass off to optimize the bazillion different combinations that OpenGL allows for.

Vertex attribute buffer + position buffer: 12 FPS
Vertex attribute array + position buffer: 15 FPS
Vertex attribute array + position array: 100 FPS

The attribute array changes for each render (a single call), but the position array remains the same for dozens of draw calls.