Many VBOs vs. single VB for dynamic meshes


I am writing a planet renderer that uses around 800 regular grids to store different terrain patches at various LODs. As you move around, the patches get recycled and populated with new data. Right now, I am using one large VBO and updating individual patches with glBufferSubData. I have read conflicting reports about whether it would be better to use 800 separate, albeit much smaller, VBOs and update each with glBufferData. On the one hand, I have heard that glBufferSubData causes synchronization issues; on the other hand, switching between many smaller VBOs also causes performance problems.
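For reference, updating one patch inside a shared VBO is just an offset computation plus one glBufferSubData call. A minimal sketch of that arithmetic, assuming (illustratively — the thread doesn't give the grid size) that each patch is a 33x33 vertex grid with 48 bytes per vertex:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layout: 800 patches packed back to back in one big VBO,
 * each patch a 33x33 vertex grid with 48 bytes per vertex. */
enum { PATCH_VERTS = 33 * 33, VERTEX_BYTES = 48 };

/* Byte offset of patch i inside the shared VBO -- this is the value
 * you would pass as the offset argument to glBufferSubData. */
size_t patch_offset(size_t i) {
    return i * (size_t)PATCH_VERTS * VERTEX_BYTES;
}

size_t patch_bytes(void) {
    return (size_t)PATCH_VERTS * VERTEX_BYTES;
}

/* In the renderer this becomes (sketch, not compiled here):
 *   glBindBuffer(GL_ARRAY_BUFFER, bigVbo);
 *   glBufferSubData(GL_ARRAY_BUFFER, patch_offset(i), patch_bytes(), newData);
 */
```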

What’s the best thing to do here?


800 VBOs sounds like a lot. This page explains the tradeoffs.

Ok thanks, I will just leave it in one large VBO then. What I haven’t done yet is interleave the vertex attributes. I have Normal, Tangent and 2 additional attributes (slope, elevation) for each vertex. What kind of performance improvement can I expect from interleaving? Has anyone run experiments on that?

Yes, but it was some time ago. On NVIDIA’s G70 hardware, interleaving was faster. But note that if you are using glBufferSubData, you are forced to update all interleaved attributes together. If just one attribute changes, interleaving might not be the best choice.
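For what it’s worth, an interleaved layout for those attributes could look like the struct below. The exact split is my assumption (position included, one pad float to reach the 12 floats / 48 bytes mentioned later in the thread), not something the thread specifies:

```c
#include <assert.h>
#include <stddef.h>

/* One possible interleaved vertex layout for the attributes mentioned
 * above. The split is illustrative: 3+3+3+1+1 floats plus one pad
 * float = 12 floats = 48 bytes. */
typedef struct {
    float position[3];
    float normal[3];
    float tangent[3];
    float slope;
    float elevation;
    float _pad;   /* brings the vertex to 12 floats / 48 bytes */
} Vertex;
```

The `offsetof` values are what you would pass as the offset argument of each glVertexAttribPointer call, with `sizeof(Vertex)` as the common stride.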


All attributes change for every patch, because it’s a completely new piece of terrain, so I will give interleaving a try. Browsing the forum, though, I noticed that some people say the interleaved data should be aligned to 32 bytes. Did I get that right? Is there a reason behind it? Somebody on gamedev said that he saw no improvement unless the data was aligned to 32 bytes.

I have 12 float attributes per vertex which adds up to 48 bytes. That means I would have to waste 16 bytes per vertex! Probably not worth it then.


I currently have huge problems with dynamic VBOs on Radeon cards. I do a lot of GUI rendering, and for this I have a vertex buffer pool from which the rendering code allocates. This pool consists of a double-buffered 4 MiB VBO; after each frame I swap VBOs and reset the offset to 0 (it is increased after each allocation). Everything is 32-byte aligned.
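A sketch of that pool, with the GL buffer creation and uploads omitted and all names being illustrative rather than the poster’s actual code:

```c
#include <assert.h>
#include <stddef.h>

/* Double-buffered linear pool as described above: two 4 MiB VBOs, a
 * bump offset kept 32-byte aligned, swapped and reset every frame. */
#define POOL_BYTES ((size_t)4 * 1024 * 1024)

typedef struct {
    unsigned vbo[2];   /* the two GL buffer names (not created here) */
    int      current;  /* which VBO this frame writes into */
    size_t   offset;   /* next free byte in the current VBO */
} VertexPool;

/* Linear allocation: returns the byte offset to upload into, or
 * (size_t)-1 if the pool is exhausted for this frame. */
size_t pool_alloc(VertexPool *p, size_t bytes) {
    size_t aligned = (bytes + 31) & ~(size_t)31;   /* 32-byte align */
    if (p->offset + aligned > POOL_BYTES) return (size_t)-1;
    size_t at = p->offset;
    p->offset += aligned;
    return at;
}

/* End of frame: flip to the other VBO and start over at offset 0. */
void pool_swap(VertexPool *p) {
    p->current ^= 1;
    p->offset = 0;
}
```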

I’ve found that glBufferSubData is extremely slow on ATI hardware; it seems that every time you update even a small part, the whole buffer gets updated (it takes around 50 ms, proportional to the size of the VBO). However, I’ve found that glMapBufferRange is fine, so I map a range and do a memcpy.
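That map-and-copy update can be sketched like this; a plain memory block stands in for the driver’s mapped pointer so the copy itself is testable, and the GL calls (including the GL_MAP_INVALIDATE_RANGE_BIT flag choice, which is my assumption) appear only in the comment:

```c
#include <assert.h>
#include <string.h>
#include <stddef.h>

/* In GL code the destination pointer would come from:
 *   void *dst = glMapBufferRange(GL_ARRAY_BUFFER, offset, bytes,
 *                                GL_MAP_WRITE_BIT |
 *                                GL_MAP_INVALIDATE_RANGE_BIT);
 * followed by glUnmapBuffer(GL_ARRAY_BUFFER) after the memcpy.
 * Here mapped_base is just a caller-supplied memory block. */
void update_range(void *mapped_base, size_t offset,
                  const void *src, size_t bytes) {
    memcpy((char *)mapped_base + offset, src, bytes);
}
```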

I still have a performance overhead of 5–7 ms per frame compared to regular vertex arrays. The actual overhead isn’t in glBufferSubData/glMapBufferRange itself, but in the glDrawElements calls that use this VBO.

And NVIDIA doesn’t seem to have a problem with this at all.

EDIT: I finally found the cause… I was using int16 for vertex positions and texcoord.

This applies to ATI cards, and IIRC it was true for the HD 2x00 and older cards; it hasn’t been mentioned since. The reasoning was that their pre-VS cache uses 32-byte lines, and the hardware is probably not sophisticated enough to fetch across lines.
I haven’t benchmarked the optimum vertex size on ATI cards, but on NVIDIA ones it’s really never a problem IME (all possible vertex layouts execute equally fast; interleaved ones are just 5%–10% faster on GF 7x00 and older cards).

So you tested this on G80 and up, and there’s no difference anymore?


Yes, but it’s possible the benchmark was too synthetic. I was mainly trying to find out whether different vertex data formats (ushort, short, float, half, etc.) with different component counts (1–4) would make any difference with static and streaming VBOs. The conclusion was that regardless of interleaved vertex size and formats/components, performance is predictably identical, and thus those formats/components are really supported by the vertex-attribute-fetching hardware (and not converted on the fly by drivers).

Just a side note: ATI seems to be very sensitive about alignment. Make sure that vertex data is at least 4-byte aligned (i.e. don’t put float data at non-4-byte-aligned addresses/offsets).
I recently fixed such a bug in our app, where we mix ushort index data and vertex data in one VBO; some vertex data ended up at 2-byte-aligned addresses. The rendering became horribly slow and everything looked broken :)
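The fix amounts to rounding the start of the vertex block up to a 4-byte boundary after the index data; a small sketch of that computation:

```c
#include <assert.h>
#include <stddef.h>

/* If ushort index data (2 bytes each) precedes float vertex data in
 * the same VBO, an odd index count leaves the vertex block on a
 * 2-byte boundary. Round the vertex start offset up to 4 bytes. */
size_t vertex_data_offset(size_t index_count) {
    size_t index_end = index_count * sizeof(unsigned short);
    return (index_end + 3) & ~(size_t)3;
}
```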

If I use one VBO, how do I find out the maximum VBO size in GPU memory I can create?

You create a big VBO and then call glGetError(); if the allocation failed, you get GL_OUT_OF_MEMORY.

So there is no straightforward way of knowing how much GPU memory for VBOs I have available?

How much memory you have available will vary with how much memory your GPU has and how much you’re already using for other things: the primary framebuffer, FBOs, textures, shaders, PBOs/TBOs and other buffer objects, whatever other GL programs are running at the time, and possibly even what your window manager is doing.

Here’s a thread going on determining the amount of total GPU memory: link