VBO Performance Strategy

Originally posted by Korval:
I seem to recall that ATi’s drivers at the time of HW2’s release had some issues with the game. Did this have something to do with their VBO implementation at the time, and did ATi correct the problem?

The initial VBO support from ATI had some issues with dynamic buffers, but ATI fixed them before we shipped. When we shipped, we weren’t aware of any issues with their drivers at the time. Still, at first, a couple of users claimed they needed to run Homeworld2 with the -noVBO command line parameter to disable VBOs in order to play the game, but ATI seems to have identified and fixed most of those cases now.

I have looked into the Star Wars K.O.T.O.R. executable and found some extension names.
VBO is among them, but VAR and fence are there too…

[BTW, the first game for both Xbox and PC which doesn’t use DX Graphics?]

We use VBO if it’s available and well supported (doing a graphics driver version look-up); else we use VAR/fence if available; else we use vertex arrays. We previously used display lists, but there are too many problems with them, and they use more memory than vertex arrays.
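A minimal sketch of that kind of path selection (the extension check and path names here are illustrative, not the actual Homeworld2 code, and it assumes the VBO/VAR entry points get fetched separately once a path is chosen):

#include <GL/gl.h>
#include <cstring>

enum VertexPath { PATH_VBO, PATH_VAR_FENCE, PATH_VERTEX_ARRAYS };

// True if 'name' appears as a complete token in the GL extension string.
static bool HasExtension(const char* name)
{
    const char* ext = reinterpret_cast<const char*>(glGetString(GL_EXTENSIONS));
    if (!ext)
        return false;
    std::size_t len = std::strlen(name);
    for (const char* p = std::strstr(ext, name); p; p = std::strstr(p + 1, name))
    {
        // Guard against substring matches of longer extension names.
        if ((p == ext || p[-1] == ' ') && (p[len] == ' ' || p[len] == '\0'))
            return true;
    }
    return false;
}

VertexPath ChooseVertexPath(bool driverVersionOK)
{
    if (HasExtension("GL_ARB_vertex_buffer_object") && driverVersionOK)
        return PATH_VBO;                 // preferred path
    if (HasExtension("GL_NV_vertex_array_range") && HasExtension("GL_NV_fence"))
        return PATH_VAR_FENCE;           // NVIDIA-specific fallback
    return PATH_VERTEX_ARRAYS;           // plain vertex arrays always work
}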

Regarding LOD, if you do progressive meshes with sliding windows for LOD, then you need to keep all the verts in a single buffer – and you can’t even do progressive meshes at all using display lists.
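A rough sketch of the sliding-window idea (the vertex/index layout here is an assumption for illustration, not Homeworld2’s actual format, and it assumes the ARB_vertex_buffer_object entry points are loaded): all detail levels share one VBO, and each LOD draws a prefix of the vertices with its own slice of a shared index buffer.

#include <GL/gl.h>

// Each LOD window uses the first 'vertexCount' vertices of the shared buffer
// and a contiguous range of the shared index buffer.
struct LodWindow
{
    GLuint vertexCount;
    GLuint firstIndex;
    GLuint indexCount;
};

void DrawLod(GLuint vbo, GLuint ibo, const LodWindow& lod)
{
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, 0);

    glBindBufferARB(GL_ELEMENT_ARRAY_BUFFER_ARB, ibo);
    glDrawRangeElements(GL_TRIANGLES,
                        0, lod.vertexCount - 1,   // range of vertices actually referenced
                        lod.indexCount, GL_UNSIGNED_SHORT,
                        (const GLvoid*)(lod.firstIndex * sizeof(GLushort)));
}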

I’ve found that the OpenGL support for the tier 1 hardware vendors is great, for the tier 1 integrated vendor it’s great, too (although the performance is … integrated), but the OpenGL support on the bottom 20% of the market is so bad that we can’t run on those chips. (They crash within 3 seconds, typically, and I’ve never gotten a reply from any of the e-mails I’ve sent to Taiwan headquarters or US sales about the problems)

That’s one of the unfortunate advantages of the DirectX driver methodology. By taking on so much of the code work themselves, Microsoft makes writing relatively bug-free DX drivers easy, as long as the DX component in question is relatively bug-free. OpenGL, especially at the higher-level extensions (glslang, VBO, ARB_fp/vp), makes writing implementations orders of magnitude harder than it used to be.

Originally posted by forgottenaccount:
The initial VBO support from ATI had some issues with dynamic buffers, but ATI fixed them before we shipped. When we shipped, we weren’t aware of any issues with their drivers at the time. Still, at first, a couple of users claimed they needed to run Homeworld2 with the -noVBO command line parameter to disable VBOs in order to play the game, but ATI seems to have identified and fixed most of those cases now.

Right now I have installed the new ATI drivers (Cat 4.1) over my Cat 3.9. (Because since this driver was released, our hotline has been getting strange crash reports on ATI hardware…)

And suddenly, ALL dynamic accesses to VBOs are screwed up!
It looks like the ATI drivers don’t care whether a currently rendered object is mapped or not.
The bugs look like the VBO memory is updated DURING rendering. (As an ex-driver-developer I know what this error looks like…)
Static geometry is fine. There are also strange performance drops, every 5th frame or so.

My VBO implementation works perfectly with Cat 3.9 and all Detonators/ForceWares… so I don’t think my implementation is wrong.

Until Cat 3.9 I really thought ATI’s driver quality had improved over the last years… but now: how can such a bug pass the beta test?!

I have also noticed that problem; Catalyst 4.1 is screwing up my dynamic stencil shadows (implemented using VBO) from time to time. I have mailed ATI’s devrel already, but they don’t seem to “believe” it, since I’m not able to send a test app proving it. (Can you?) Also, downloading from a VBO is screwed up from time to time. ATI has known about this for a rather long time now, but they didn’t do anything about it in the last few driver versions :-(. I think they were too busy implementing those extraordinarily useful “Smartshader Effects” :wink:

Originally posted by Korval:
OpenGL, especially at the higher-level extensions (glslang, VBO, ARB_fp/vp), makes writing implementations orders of magnitude harder than it used to be.

OK, but eventually your drivers should improve and become less buggy.
But what company are we talking about? Volari is the only one that supports the new ext.

I’ve noticed a strange problem with Cat 3.10 and 4.1
When I use VBO for rendering one of my objects multiple times, it only renders the first time.

Did anyone else experience this?

V-man: I have no problems with static geometry at all.

I did some tests and found that the glMapBufferARB function causes most of the problems. When I replace all glMapBufferARB calls with corresponding glBufferDataARB calls, everything looks OK, but the performance is still questionable (about half the speed of my non-VBO path).
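For anyone comparing the two update styles, the swap looks roughly like this (a sketch; it assumes the ARB_vertex_buffer_object entry points are already loaded, and the names are placeholders):

#include <GL/gl.h>
#include <cstring>

// Two ways to refill a dynamic VBO each frame.
void UpdateDynamicVbo(GLuint vbo, const void* vertices, GLsizeiptrARB vertexBytes, bool useMap)
{
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);

    if (useMap)
    {
        // Path A: map the buffer and write into driver memory directly.
        void* dst = glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB);
        if (dst)
        {
            std::memcpy(dst, vertices, vertexBytes);
            glUnmapBufferARB(GL_ARRAY_BUFFER_ARB);
        }
    }
    else
    {
        // Path B: respecify the whole buffer. The driver takes its own copy of
        // the data, so it can defer the real upload until the buffer is free.
        glBufferDataARB(GL_ARRAY_BUFFER_ARB, vertexBytes, vertices, GL_DYNAMIC_DRAW_ARB);
    }
}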

AdrianD,
What problem did you see with glMapBuffer?

I have static geometry. Once I upload, I never change it.

To be more clear, my algo looks like this.

RenderObjectX();
RenderObjectX();
RenderObjectX();
RenderObjectX();
SwapBuffers();

Also, I tried using the generic attrib functions and I still get the same problem with 4.1.

I have a single VBO for vertex, normal, texcoord, tangent, binormal. It looks like it can’t access normal and all the rest. As if the offsets are invalid.
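For reference, the layout I mean is roughly this (a sketch; the per-array sizes and the static usage hint are illustrative, not my exact values):

#include <GL/gl.h>

// One buffer holds five tightly packed arrays, each starting at its own byte offset.
void FillMeshVbo(GLuint vbo, GLsizei vertexCount,
                 const GLfloat* positions, const GLfloat* normals,
                 const GLfloat* texcoords, const GLfloat* tangents,
                 const GLfloat* binormals)
{
    const GLsizeiptrARB posBytes  = vertexCount * 3 * sizeof(GLfloat);
    const GLsizeiptrARB normBytes = vertexCount * 3 * sizeof(GLfloat);
    const GLsizeiptrARB texBytes  = vertexCount * 2 * sizeof(GLfloat);
    const GLsizeiptrARB tanBytes  = vertexCount * 3 * sizeof(GLfloat);
    const GLsizeiptrARB binBytes  = vertexCount * 3 * sizeof(GLfloat);

    const GLintptrARB normOffset = posBytes;
    const GLintptrARB texOffset  = normOffset + normBytes;
    const GLintptrARB tanOffset  = texOffset + texBytes;
    const GLintptrARB binOffset  = tanOffset + tanBytes;

    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB,
                    posBytes + normBytes + texBytes + tanBytes + binBytes,
                    0, GL_STATIC_DRAW_ARB);
    glBufferSubDataARB(GL_ARRAY_BUFFER_ARB, 0,          posBytes,  positions);
    glBufferSubDataARB(GL_ARRAY_BUFFER_ARB, normOffset, normBytes, normals);
    glBufferSubDataARB(GL_ARRAY_BUFFER_ARB, texOffset,  texBytes,  texcoords);
    glBufferSubDataARB(GL_ARRAY_BUFFER_ARB, tanOffset,  tanBytes,  tangents);
    glBufferSubDataARB(GL_ARRAY_BUFFER_ARB, binOffset,  binBytes,  binormals);
}

The attribute pointers are then set against those byte offsets into the bound buffer.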

Did anyone see that one at all?

I’ve used VBO on Catalyst 4.1, using static, dynamic, and streaming draw, and I get none of the artifacts you’re describing. I put all the data in one buffer for static geometry, and position/normal in one (streaming) buffer and texture/color in another (static) buffer for soft-skinned geometry.

I saw no performance difference between STATIC_DRAW and DYNAMIC_DRAW, though. I’m pretty sure I’m CPU limited at that point on a Radeon 9700, but I’d expect DYNAMIC_DRAW to reduce available memory bandwidth for the CPU (coming out of AGP), whereas STATIC_DRAW might come out of VRAM and thus not load the memory bus. Oh, well, not too much to worry about; it seems to run fine. It also works fine on NVIDIA with series 5x.x drivers.
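A minimal sketch of that split for soft-skinned geometry (the function and the hint choices are just how I’d set it up, not a definitive recipe):

#include <GL/gl.h>

// Positions/normals are rewritten by the CPU skinner every frame, so they go in
// a STREAM_DRAW buffer; texcoords/colors never change, so they live in a
// separate STATIC_DRAW buffer.
void CreateSkinnedBuffers(GLuint buffers[2],
                          GLsizeiptrARB posNormalBytes,
                          const void* texColorData, GLsizeiptrARB texColorBytes)
{
    glGenBuffersARB(2, buffers);

    glBindBufferARB(GL_ARRAY_BUFFER_ARB, buffers[0]);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, posNormalBytes, 0, GL_STREAM_DRAW_ARB);

    glBindBufferARB(GL_ARRAY_BUFFER_ARB, buffers[1]);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, texColorBytes, texColorData, GL_STATIC_DRAW_ARB);
}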

The problem with glMapBuffer is that the driver does not check whether the buffer is currently in use (being rendered from) while I am locking and updating it. (According to the spec the driver should handle that, or give me another valid piece of memory, i.e. by making a copy.)
Because of that, I can sometimes (depending on scene size/polygon count) see that some meshes are rendered with the vertices of the previous frame and some with the current ones. (Even in a multipass algorithm where I first upload all geometry and then draw it: the pass for the first light still uses the last frame’s vertices and the second light draws the correct vertices.)
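One common workaround (just a sketch of the idea, and only a guess at whether it helps on these drivers) is to “orphan” the buffer before mapping it, so the driver can hand back fresh storage instead of the memory the GPU may still be reading; vbo, vertexBytes and FillVertices are placeholders here:

// Respecify the store with a NULL pointer first; a conforming driver can then
// detach the old storage (still owned by pending draws) and give the map call
// a brand-new block of memory to fill.
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, vertexBytes, 0, GL_STREAM_DRAW_ARB);

void* dst = glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB);
if (dst)
{
    FillVertices(dst);   // write this frame's vertices
    glUnmapBufferARB(GL_ARRAY_BUFFER_ARB);
}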

I do not have any problems with generic attributes, but I’ve found that you can’t use just any generic attribute you want.
If you want to mix generic attributes with standard attribute bindings, you have to make sure that you don’t use the generic attributes which are mapped onto standard attributes (0…5 and the texture bindings).
I.e. you can’t use texcoord[0] and generic attribute #8, because they are mapped to the same data and the generic attribute overrides the texcoord.
In my app I bind my normal to the standard normal array, and the tangent and binormal are generic attributes 10 & 11. This works without any problems.
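A minimal sketch of that binding (attribute indices 10 and 11 as described; the stride and offset names are placeholders):

// Normal goes through the conventional binding...
glBindBufferARB(GL_ARRAY_BUFFER_ARB, meshVbo);
glEnableClientState(GL_NORMAL_ARRAY);
glNormalPointer(GL_FLOAT, stride, (const GLvoid*)normalOffset);

// ...while tangent and binormal use generic attributes 10 and 11, which under
// the aliasing scheme correspond to texcoord 2 and 3 - unused in this setup,
// so they never collide with a conventional array.
glEnableVertexAttribArrayARB(10);
glVertexAttribPointerARB(10, 3, GL_FLOAT, GL_FALSE, stride, (const GLvoid*)tangentOffset);
glEnableVertexAttribArrayARB(11);
glVertexAttribPointerARB(11, 3, GL_FLOAT, GL_FALSE, stride, (const GLvoid*)binormalOffset);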

When I talk about a performance loss, I also mean compared to the previous driver version.
In some polygon-intensive demos I don’t get my optimized average of 230 FPS (only around 130 FPS).

But I have also experienced some speed improvements in other parts of the driver, e.g. my vertex-program-based extreme-crowd-rendering example
(up to 10000 moving, animated objects - 20 objects are rendered at once using batching) is up to 50% faster and much more stable than before. (It seems that state changes for vertex programs are faster now…)

Originally posted by AdrianD:
I do not have any problems with generic attributes, but I’ve found that you can’t use just any generic attribute you want.
If you want to mix generic attributes with standard attribute bindings, you have to make sure that you don’t use the generic attributes which are mapped onto standard attributes

At least that wasn’t a surprise. If I remember the specs correctly, it even states that the vertex program should be refused if it refers to an attrib as both standard and generic. In the end I guess the only advantage of generic attributes is that it’s less confusing than passing some tertiary color as normals (that, and you have more of them).

Talking about VBO: is anyone else getting horrible performance when using anything but floats in a vertex buffer? I could accept that it “doesn’t like” vertex positions as unsigned bytes, but when it also starts to crawl when passing colors from a VB as unsigned bytes it gets a little annoying. I can understand that floats are better, bigger, strong… eh, have better precision, and the hardware is probably doing everything with floats anyway… but why should I have a 48 MB vertex buffer if I only need 12 MB? And the extra conversion really shouldn’t turn 1200 FPS into 40 FPS, especially since even a standard vertex array was running at 1200 FPS (with bytes and floats alike).

Just when I thought the worst issues would be strange bugs like not allocating video memory below a certain size or (more understandably) above a certain size (even though there would be more than 10 times as much free).

Maybe someday soon they will decide how VBOs should behave and what limitations they should have. That would be less troublesome than new VBO issues with every new driver.

Jared,

how are you storing your data in the VBO?
Maybe you can put the color in another VBO. Separating the float from the ubyte may help … but I could be wrong. Are you saying using floats works OK, or didn’t you try that yet?

Maybe we should have a generic compression for everything instead of just textures.

In my case
vertex is attrib 0
normal is attrib 2
tex is attrib 8
tangent is attrib 9
binormal is attrib 10

I’m not mixing generic and conventional, for sure. I did a small GLUT test and it wasn’t working either.
It would be nice if someone could send me their source code or exe. I would like to see something that works.
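For reference, the binding described above looks roughly like this (a sketch only; the stride and offset names are placeholders):

// All arrays as generic attributes, using the indices listed above.
glBindBufferARB(GL_ARRAY_BUFFER_ARB, meshVbo);
glEnableVertexAttribArrayARB(0);   // vertex
glVertexAttribPointerARB(0, 3, GL_FLOAT, GL_FALSE, stride, (const GLvoid*)0);
glEnableVertexAttribArrayARB(2);   // normal
glVertexAttribPointerARB(2, 3, GL_FLOAT, GL_FALSE, stride, (const GLvoid*)normalOffset);
glEnableVertexAttribArrayARB(8);   // texcoord
glVertexAttribPointerARB(8, 2, GL_FLOAT, GL_FALSE, stride, (const GLvoid*)texOffset);
glEnableVertexAttribArrayARB(9);   // tangent
glVertexAttribPointerARB(9, 3, GL_FLOAT, GL_FALSE, stride, (const GLvoid*)tangentOffset);
glEnableVertexAttribArrayARB(10);  // binormal
glVertexAttribPointerARB(10, 3, GL_FLOAT, GL_FALSE, stride, (const GLvoid*)binormalOffset);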

Originally posted by V-man:
Jared,

how are you storing your data in the VBO?
Maybe you can put the color in another VBO. Separating the float from the ubyte may help … but I could be wrong. Are you saying using floats works OK, or didn’t you try that yet?

I tried pretty much every combination I could think of, but the frustrating result: as soon as I use anything but floats it kills performance. Vertices and colors were already separated, in two buffers, in the same buffer with an offset, interleaved, etc. Especially that moving it from VA to VBO makes such a difference is weird, when the data itself isn’t changed. Maybe I should try it with 4 unsigned bytes, just in case it’s some kind of alignment problem (though in that case ints should work at least).
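For what it’s worth, the 4-byte variant would look something like this (just a sketch of the idea; colorVbo, vertexCount and rgbaColors are placeholders):

// Pad each color to 4 unsigned bytes (RGBA) so every element starts on a
// 4-byte boundary, instead of 3-byte RGB colors that break alignment.
glBindBufferARB(GL_ARRAY_BUFFER_ARB, colorVbo);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, vertexCount * 4, rgbaColors, GL_STATIC_DRAW_ARB);

glEnableClientState(GL_COLOR_ARRAY);
glColorPointer(4, GL_UNSIGNED_BYTE, 0, 0);   // 4 components, tightly packed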


Originally posted by Jared:
Talking about VBO: is anyone else getting horrible performance when using anything but floats in a vertex buffer? I could accept that it “doesn’t like” vertex positions as unsigned bytes, but when it also starts to crawl when passing colors from a VB as unsigned bytes it gets a little annoying. I can understand that floats are better, bigger, strong… eh, have better precision, and the hardware is probably doing everything with floats anyway… but why should I have a 48 MB vertex buffer if I only need 12 MB? And the extra conversion really shouldn’t turn 1200 FPS into 40 FPS, especially since even a standard vertex array was running at 1200 FPS (with bytes and floats alike).

That’s somewhat normal, and the significant framerate drop is expected. Here’s why it happens.

If the hardware can’t handle a certain vertex format, the driver must convert the data into something the hardware can handle. But it can’t do this at upload time; it has to wait and do it for each render call. Since the data may be in AGP or video memory, this provokes very slow read requests, further slowing down the process. Reading from AGP is pretty slow (uncached), but reading from video memory is excruciatingly slow, as it goes over the PCI bus (AGP is one-way: to the card. Data from the card to the CPU has to go across PCI).

Now, a more important question is, what hardware are you using? My 9500 can handle unsigned bytes just fine for colors (and all other vertex attributes). Lower-end cards (say, a GeForce 2 or less) may not be able to natively handle bytes for colors, thus provoking a conversion. The same might go for low-end Radeon cards too.

Originally posted by Korval:
Now, a more important question is, what hardware are you using? My 9500 can handle unsigned bytes just fine for colors (and all other vertex attributes). Lower-end cards (say, a GeForce 2 or less) may not be able to natively handle bytes for colors, thus provoking a conversion. The same might go for low-end Radeon cards too.

A Radeon 9800, currently with Cat 4.1. Maybe it’s time for a little driver safari. Or a creative method for vertex compression.

Sorry sorry sorry!
I forgot to mention that this is not VBO related, because I already tested the standard VA path and I get identical results.

However, the exact same problem appears for both VBO and VA when using generic attributes.

For example, in a simple GLUT program, whether I have this for the normals or comment it out, I get the same result:

glEnableVertexAttribArray(2);
glVertexAttribPointer(2, 3, GL_FLOAT, GL_FALSE, VertexSize, VBO_NormalAddress);

I have no idea what’s going on. I’m going to do more testing I guess.

OK, I did a bit more testing and this is what I found.

Using the fixed pipe with generic vertex attribs doesn’t work. Is this the way it is supposed to be?
When doing that, and then enabling PP, it ****s up everything.

Just using PP with generic and also using generic in VP and FP works.
Using PP with generic, but using conventional with VP and FP screws it up.

This is bad. There should be a simple clear cut document that explains these pitfalls.

The fixed function pipe is not guaranteed to work right with generic vertex attributes. The standard says “there might be aliasing, or there might not”.

This ALSO means that if you specify data using VertexPointer, NormalPointer, etc., then you have to read it using the named bindings in the vertex program, not the generic aliases.
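A small sketch of the safe pairing (the vertex program here is illustrative, not from the thread; normalOffset is a placeholder):

// Data specified through the conventional entry points...
glEnableClientState(GL_VERTEX_ARRAY);
glVertexPointer(3, GL_FLOAT, 0, 0);
glEnableClientState(GL_NORMAL_ARRAY);
glNormalPointer(GL_FLOAT, 0, (const GLvoid*)normalOffset);

// ...is read through the named bindings in the vertex program,
// NOT through vertex.attrib[0] / vertex.attrib[2]:
const char* vp =
    "!!ARBvp1.0\n"
    "ATTRIB pos  = vertex.position;\n"
    "ATTRIB norm = vertex.normal;\n"
    "PARAM  mvp[4] = { state.matrix.mvp };\n"
    "DP4 result.position.x, mvp[0], pos;\n"
    "DP4 result.position.y, mvp[1], pos;\n"
    "DP4 result.position.z, mvp[2], pos;\n"
    "DP4 result.position.w, mvp[3], pos;\n"
    "MOV result.color, norm;\n"   // visualize the normal as the output color
    "END\n";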