Many VBOs vs. a single VBO for dynamic meshes

Jochen · August 14, 2009, 1:12pm

Hi,

I am writing a planet renderer that uses around 800 regular grids to store terrain patches at various levels of detail (LOD). As you move around, the patches get recycled and filled with new data. Right now, I am using one large VBO and updating individual patches with glBufferSubData. I have read conflicting reports about whether it would be better to use 800 separate, albeit much smaller, VBOs and update each one with glBufferData. On the one hand, I have heard that glBufferSubData causes synchronization issues; on the other hand, switching between many small VBOs also causes performance problems.
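A minimal sketch of the bookkeeping for the single-VBO approach, assuming every patch has the same vertex count and vertex size (the 33x33 grid, the 48-byte vertex, and the names `patch_range` and `total_vbo_bytes` are hypothetical). Each patch owns a fixed slice of the shared buffer, so recycling a patch only means computing that slice and handing it to glBufferSubData:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sizes: 800 patches, each a 33x33 grid of 48-byte vertices. */
#define NUM_PATCHES  800u
#define PATCH_VERTS  (33u * 33u)
#define VERTEX_BYTES 48u

typedef struct {
    size_t offset; /* byte offset of the patch inside the shared VBO */
    size_t size;   /* bytes occupied by the patch */
} ByteRange;

/* Slice of the shared VBO owned by patch `index`. Recycling the patch would
   then be glBufferSubData(GL_ARRAY_BUFFER, r.offset, r.size, vertices). */
static ByteRange patch_range(size_t index)
{
    assert(index < NUM_PATCHES);
    ByteRange r;
    r.size = (size_t)PATCH_VERTS * VERTEX_BYTES;
    r.offset = index * r.size;
    return r;
}

/* Size to pass to glBufferData once, when the shared VBO is created. */
static size_t total_vbo_bytes(void)
{
    return (size_t)NUM_PATCHES * PATCH_VERTS * VERTEX_BYTES;
}
```

When drawing, patch i's vertices start at vertex i * PATCH_VERTS, so on GL 3.2 or with ARB_draw_elements_base_vertex all patches could share one index buffer via glDrawElementsBaseVertex.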

What’s the best thing to do here?

Cheers
Jochen

system · August 14, 2009, 2:53pm

800 VBOs sounds like a lot. See this page for details: http://www.opengl.org/wiki/VBO_-_more

Jochen · August 14, 2009, 3:12pm

OK, thanks, I will just keep everything in one large VBO then. What I haven't done yet is interleave the vertex attributes. I have a normal, a tangent, and two additional attributes (slope, elevation) for each vertex. What kind of performance improvement should I expect from interleaving? Has anyone run experiments on this?
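For reference, an interleaved layout is usually written as a struct whose size becomes the stride and whose member offsets (via offsetof) become the pointer arguments of glVertexAttribPointer. A minimal sketch, assuming a hypothetical vertex of position (3 floats), normal (3), tangent (4, with handedness in w, an assumption), slope, and elevation, which happens to add up to 12 floats:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interleaved terrain vertex: 12 floats = 48 bytes. */
typedef struct {
    float position[3];
    float normal[3];
    float tangent[4];  /* assumption: xyz plus a handedness sign in w */
    float slope;
    float elevation;
} TerrainVertex;

/* Only floats, so no padding is expected; fail the build if there is any. */
_Static_assert(sizeof(TerrainVertex) == 12 * sizeof(float),
               "unexpected padding in TerrainVertex");

/* Stride and per-attribute byte offsets. With the VBO bound, each attribute
   would be set up along the lines of
   glVertexAttribPointer(loc, 3, GL_FLOAT, GL_FALSE, VERTEX_STRIDE,
                         (const void *)ATTR_OFFSET_NORMAL); */
enum {
    VERTEX_STRIDE         = sizeof(TerrainVertex),
    ATTR_OFFSET_POSITION  = offsetof(TerrainVertex, position),
    ATTR_OFFSET_NORMAL    = offsetof(TerrainVertex, normal),
    ATTR_OFFSET_TANGENT   = offsetof(TerrainVertex, tangent),
    ATTR_OFFSET_SLOPE     = offsetof(TerrainVertex, slope),
    ATTR_OFFSET_ELEVATION = offsetof(TerrainVertex, elevation)
};
```

Because the whole vertex is one struct, a patch can be filled as a plain TerrainVertex array and uploaded with a single call.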

CatDog · August 14, 2009, 3:20pm

Yes, but it was some time ago. On NVIDIA's G70 hardware, interleaving was faster. Note, though, that if you are using glBufferSubData, you are forced to update all interleaved attributes together. If only one attribute changes, interleaving might not be the best choice.
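To illustrate the trade-off: with a non-interleaved (planar) layout, each attribute of a patch occupies its own contiguous run of bytes, so a single attribute can be replaced with one glBufferSubData call. A minimal sketch of the offset arithmetic, assuming a hypothetical per-patch layout where the attribute arrays are stored back to back (the attribute list and helper names are illustrative):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical planar layout: inside each patch block, all positions come
   first, then all normals, and so on. Values are bytes per vertex. */
static const size_t attr_bytes[] = {
    3 * sizeof(float), /* 0: position  */
    3 * sizeof(float), /* 1: normal    */
    4 * sizeof(float), /* 2: tangent   */
    sizeof(float),     /* 3: slope     */
    sizeof(float),     /* 4: elevation */
};
#define NUM_ATTRS (sizeof attr_bytes / sizeof attr_bytes[0])

/* Sum of all attribute sizes, i.e. bytes per vertex. */
static size_t planar_vertex_bytes(void)
{
    size_t total = 0;
    for (size_t a = 0; a < NUM_ATTRS; ++a)
        total += attr_bytes[a];
    return total;
}

/* Byte offset of attribute `attr` of patch `patch`, where every patch holds
   `verts` vertices. Replacing just that attribute would be one call like
   glBufferSubData(GL_ARRAY_BUFFER, offset, attr_bytes[attr] * verts, data). */
static size_t planar_attr_offset(size_t patch, size_t attr, size_t verts)
{
    assert(attr < NUM_ATTRS);
    size_t offset = patch * planar_vertex_bytes() * verts;
    for (size_t a = 0; a < attr; ++a)
        offset += attr_bytes[a] * verts;
    return offset;
}
```

With an interleaved layout, a patch's slope values are instead spread through the whole patch at the vertex stride, so updating them in one call means re-uploading the whole patch.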

CatDog

Jochen · August 14, 2009, 9:19pm

All attributes change for every patch, because each one is a completely new piece of terrain, so I will give interleaving a try. Browsing the forum, though, I noticed that some people say interleaved vertex data should be aligned to 32 bytes. Did I get this right? Is there a reason behind it? Somebody on GameDev said he saw no improvement unless the vertices were aligned to 32 bytes.

I have 12 float components per vertex, which adds up to 48 bytes. That means I would have to waste 16 bytes per vertex to pad it to 64! Probably not worth it, then.
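The padding arithmetic as a small sketch (the helper names are hypothetical): rounding a stride up to the next multiple of 32 turns 48 bytes into 64, so every vertex grows by a third.

```c
#include <assert.h>
#include <stddef.h>

/* Round `n` up to the next multiple of `align`; `align` must be a power of two. */
static size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Extra bytes per vertex needed to pad `stride` to a multiple of `align`. */
static size_t padding_waste(size_t stride, size_t align)
{
    return round_up(stride, align) - stride;
}
```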

Jochen

Sunray · August 15, 2009, 12:15am

I currently have huge problems with dynamic VBOs on Radeon cards. I do a lot of GUI rendering, and for this I have a vertex buffer pool that the rendering code allocates from. The pool consists of a double-buffered 4 MiB VBO; after each frame I swap VBOs and reset the offset (which is advanced after each allocation) to 0. Everything is 32-byte aligned.
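A minimal CPU-side sketch of such a pool, assuming a bump allocator that rounds each allocation's start up to 32 bytes and resets when the frame ends (`VertexPool`, `pool_alloc`, and `pool_end_frame` are hypothetical names; the GL calls appear only in comments):

```c
#include <assert.h>
#include <stddef.h>

#define POOL_BYTES (4u * 1024u * 1024u) /* 4 MiB per buffer */
#define POOL_ALIGN 32u

/* Bookkeeping for a double-buffered vertex pool. In a GL program, vbo[i]
   would be buffer names from glGenBuffers, each sized with
   glBufferData(GL_ARRAY_BUFFER, POOL_BYTES, NULL, GL_STREAM_DRAW). */
typedef struct {
    unsigned vbo[2]; /* hypothetical GL buffer names */
    int current;     /* index of the buffer being filled this frame */
    size_t offset;   /* next free byte in the current buffer */
} VertexPool;

/* Reserve `size` bytes at a 32-byte-aligned offset; returns the offset, or
   (size_t)-1 if the buffer is full. The caller would then map
   [offset, offset + size) with glMapBufferRange and memcpy its vertices. */
static size_t pool_alloc(VertexPool *p, size_t size)
{
    assert(p != NULL);
    size_t start = (p->offset + POOL_ALIGN - 1) & ~(size_t)(POOL_ALIGN - 1);
    if (start > POOL_BYTES || size > POOL_BYTES - start)
        return (size_t)-1;
    p->offset = start + size;
    return start;
}

/* Called once per frame: switch to the other buffer and start over. */
static void pool_end_frame(VertexPool *p)
{
    p->current ^= 1;
    p->offset = 0;
}
```

Alternating buffers lets the CPU fill one buffer while the GPU may still be reading the other one from the previous frame.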

I've found that glBufferSubData is extremely slow on ATI hardware: it seems that every time you update just a small part, the whole buffer gets updated (it takes around 50 ms, proportional to the size of the VBO). However, glMapBufferRange works fine, so I map a range and do a memcpy.

I still have a performance overhead of 5-7 ms per frame compared to regular vertex arrays. The overhead isn't in glBufferSubData/glMapBufferRange themselves but in the glDrawElements calls that use this VBO.

And NVIDIA doesn’t seem to have a problem with this at all.

EDIT: I finally found the cause… I was using int16 for vertex positions and texture coordinates.

Ilian_Dinev · August 15, 2009, 6:24am

The 32-byte alignment advice is for ATI cards, and IIRC it was true for the HD 2x00 and older cards; it hasn't been mentioned for newer ones. The reasoning was that their pre-vertex-shader cache has 32-byte lines, and the hardware was probably not sophisticated enough to fetch across lines.
I haven't benchmarked the optimal vertex size on ATI cards, but on NVIDIA ones it has never been a problem in my experience: all vertex layouts run equally fast, except that interleaved ones are 5-10% faster on GeForce 7x00 and older cards.
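To make the 32-byte-line argument concrete, here is a small model (the line size is the figure quoted above; the helper names are hypothetical, and no real hardware behavior is simulated). It counts how many lines a vertex's bytes span and whether a vertex starts in the middle of a line:

```c
#include <assert.h>
#include <stddef.h>

/* Number of `line`-byte cache lines touched by vertex `i` when vertices are
   `stride` bytes apart. Assumes the buffer starts on a line boundary. */
static size_t lines_touched(size_t i, size_t stride, size_t line)
{
    assert(stride > 0 && line > 0);
    size_t first = i * stride;        /* first byte of the vertex */
    size_t last = first + stride - 1; /* last byte of the vertex */
    return last / line - first / line + 1;
}

/* Nonzero if vertex `i` begins in the middle of a line. */
static int starts_mid_line(size_t i, size_t stride, size_t line)
{
    assert(line > 0);
    return (i * stride) % line != 0;
}
```

Under this model, a 20-byte stride makes some vertices straddle two lines, and with a 48-byte stride every other vertex starts mid-line; padding to 64 bytes puts every vertex on a line boundary, although each one still spans two lines. Whether either pattern costs anything is the hardware-specific question being discussed here.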

CatDog · August 15, 2009, 9:47am

So you tested this on G80 and later, and there's no difference anymore?

CatDog

Ilian_Dinev · August 15, 2009, 11:09am

Yes, but it's possible the benchmark was too synthetic. I was mainly trying to find out whether different vertex data formats (ushort, short, float, half, etc.) with different component counts (1-4) would make any difference with static and streaming VBOs. The conclusion was that, regardless of interleaved vertex size and formats/components, performance was predictably identical, so those formats/components really are supported by the vertex-attribute fetching hardware (and not converted on the fly by the driver).

skynet · August 16, 2009, 3:52am

Just a side note: ATI seems to be very sensitive about alignment. Make sure that vertex data is at least 4-byte aligned (i.e. don't put float data at addresses/offsets that are not multiples of 4).
I recently fixed such a bug in our app, where we mix ushort index data and vertex data in one VBO. Some vertex data ended up at 2-byte-aligned addresses. Rendering became horribly slow and everything looked broken.
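A sketch of how that kind of bug can be avoided, assuming ushort indices are packed at the start of the buffer with float vertex data after them (the helper names are hypothetical): round the vertex data's start offset up to a multiple of 4, and check that the stride and every attribute offset stay 4-byte aligned.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset where vertex data may start after `index_count` 16-bit
   indices, rounded up to the next multiple of 4. */
static size_t vertex_data_offset(size_t index_count)
{
    size_t index_bytes = index_count * sizeof(uint16_t);
    return (index_bytes + 3u) & ~(size_t)3u;
}

/* Nonzero if the base offset, the stride and every attribute offset
   (relative to the start of a vertex) are multiples of 4. */
static int layout_is_4byte_aligned(size_t base, size_t stride,
                                   const size_t *attr_offsets, size_t count)
{
    assert(attr_offsets != NULL || count == 0);
    if (base % 4 != 0 || stride % 4 != 0)
        return 0;
    for (size_t i = 0; i < count; ++i)
        if ((base + attr_offsets[i]) % 4 != 0)
            return 0;
    return 1;
}
```

With an odd number of indices, the raw end of the index data lands on a 2-byte boundary (3 indices end at byte 6), which is exactly the situation described above; rounding up to byte 8 keeps the floats aligned.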

sammie381 · August 16, 2009, 10:47am

If I use one VBO, how do I find out the maximum VBO size I can create in GPU memory?

system · August 16, 2009, 11:42am

Create a big VBO and then call glGetError() to check for GL_OUT_OF_MEMORY.

sammie381 · August 16, 2009, 11:48am

So there is no straightforward way of knowing how much GPU memory for VBOs I have available?

Dark_Photon · August 17, 2009, 4:52am

How much memory you have will vary with how much memory your GPU has and how much you're using for other things (the primary framebuffer, FBOs, textures, shaders, PBOs/TBOs/other buffer objects, etc.), what other GL programs are running at the time, and possibly even what your window manager is doing.

Here's an ongoing thread on determining the total amount of GPU memory: link