Weird VBO perf

I recently reinstalled win2k on my system and the performance of my app dropped dramatically when using VBO; it’s almost half the speed it was.
I made a simple test application to try a few VB formats under optimal conditions. Here are some results:

Just verts: 83M tris/sec
V+T: 37M tris/sec
V+N: 33M tris/sec

The latter two scores are decisively worse than what my app was doing before the reinstall.
It isn’t a general performance degradation, Tom Nuydens’ VBO demo actually improved from 44M to 53M tris/sec.
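
For reference, the three formats above can be put side by side as byte sizes. This is only a sketch: the post does not state the component types, so 32-bit floats and 2-component texcoords are assumed, and the struct and function names are hypothetical. The bandwidth helper further assumes roughly one new vertex fetched per triangle (long strips or a well-shared indexed mesh), which may not match the test application.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layouts; component types are an assumption (32-bit floats). */
typedef struct { float pos[3]; } VertexV;                    /* just verts */
typedef struct { float pos[3]; float uv[2]; } VertexVT;      /* V+T */
typedef struct { float pos[3]; float normal[3]; } VertexVN;  /* V+N */

/* Vertex data read per second in MB/s (10^6 bytes), given throughput in
   millions of triangles per second and the vertex size in bytes.
   Assumes one new vertex per triangle. */
static unsigned long vertex_mb_per_sec(unsigned long mtris_per_sec, size_t stride)
{
    assert(stride > 0);
    return mtris_per_sec * (unsigned long)stride;
}
```

With the figures quoted above this gives 83 x 12 = 996 MB/s for plain vertices, 37 x 20 = 740 MB/s for V+T and 33 x 24 = 792 MB/s for V+N. Under these assumptions the larger formats move less data per second, not just fewer triangles, so vertex size alone would not seem to account for the whole drop.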

The system is an Athlon XP 2.6, nforce2, GF 5800 ultra, Detonator 44.03.
The VBOs are STATIC_DRAW and they do still improve performance considerably. Without VBO there isn’t any noticeable difference from before the reinstall.

Any ideas why I might be getting these results?

Did you install the nForce unified driver for your motherboard?

Wild guess, but did you remember to disable VSync after your reinstall?

– Tom

All the drivers are installed fine and vsync is certainly not it.
I’ve been banging my head against the wall for a week over this and I really just can’t think of anything.
It seems to me that it’s exclusively a matter of VB formats. I think I’ll try playing around with some vertex programs and see if I can find anything out from them.

It isn’t a general performance degradation, Tom Nuydens’ VBO demo actually improved from 44M to 53M tris/sec.

This tells me that your re-install gave you updated NVIDIA drivers. Your app was likely misusing the VBO extension before, and therefore runs more slowly now that their implementation has become more stabilized.

I know it doesn’t make much sense but I was using the same drivers. I had also used some earlier driver that didn’t expose the extension string but worked anyhow, the performance with the two was identical.
Before the reinstall the performance would vary between two considerably different levels, both much higher than now though.
It’s still strange how different the performance is between just verts and verts+texcoords (which is too slow). Has anyone had similar results?

The only difference in configuration might be the win2k service packs, this version is SP3, I don’t know what it was before. But…

I still want to try to track it down to something more specific.

I’ve found the vertex format to have a huge impact on the speed as well. For my current app I’m using 32-byte vertices and it effectively runs twice as slow as my older VBO demo (the one you tried), which uses 16-byte vertices. Both apps perform far worse than what I can reach with VAR.

To be specific: the card is a Quadro FX 1000. The VBO demo on my site does just under 30 MTris/sec, my current app with the larger vertices does about 14 MTris/sec, the VBO demo modified to use VAR reaches 85 MTris/sec. For comparison, I can squeeze upwards of 40 MTris/sec out of my Radeon 9000 with VBO, and the vertex format makes little difference.

– Tom
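
A quick way to check whether numbers like these fit a fixed transfer rate is to scale a known result by the ratio of vertex sizes. The sketch below assumes throughput is limited purely by bytes of vertex data per second, which is a hypothesis and not something the measurements prove; the function name is hypothetical.

```c
#include <assert.h>

/* Predict triangle throughput for a new vertex size, assuming the rate in
   bytes per second stays the same as in a known measurement. */
static double predicted_mtris(double known_mtris, unsigned known_bytes,
                              unsigned new_bytes)
{
    assert(new_bytes > 0);
    return known_mtris * (double)known_bytes / (double)new_bytes;
}
```

Taking roughly 30 MTris/sec at 16 bytes per vertex, `predicted_mtris(30.0, 16, 32)` gives 15 MTris/sec for 32-byte vertices, close to the measured 14. That is at least consistent with the VBO path being bound by vertex fetch bandwidth on this card.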

Hmm… This VBO seems rather sporadic. I read that it was supposed to be as fast as VAR, but I dropped support for VAR long ago so I can’t really compare.
I don’t really want to waste time on this but polygon throughput is extremely important in my current work so I think I’ll have to investigate. I’ll post whatever useful results I get.

Tom, did you compare VBO with VAO on your radeon? To my surprise I found that my app is 30% faster with VAO (M9 Cat 3.5). This is really surprising since both extensions are quite similar (in contrast to VAR vs. VBO).

Originally posted by stefan:
Tom, did you compare VBO with VAO on your radeon? To my surprise I found that my app is 30% faster with VAO (M9 Cat 3.5). This is really surprising since both extensions are quite similar (in contrast to VAR vs. VBO).

Hmm. NVIDIA says the same, yet some people measure otherwise.
It would be nice if vendors made what they say actually true…

Csiki

I fished out an old VAR demo and that does 60M tris with lighting, which is about what I was getting before the reinstall… It really seems like the problem is with VBO, but how could it have been faster previously? Surely it doesn’t have external dependencies.

I don’t think eliminating workload with vertex programs would be much use after all; the performance seems almost proportional to the vertex size. With 16-byte verts I get performance just above Tom’s demo. Previously I would outperform it considerably with 24-byte verts. Tom’s demo is now faster and my apps are much slower…

Tom, those are ugly results for VBO on your Quadro. Do you use video memory with VAR? What’s it like with AGP?

I’m hoping this situation will improve with future drivers.

I don’t see any difference between VBO and VAO on my ATI card. Their VBO implementation still needs work as well, though – it can’t handle large amounts of data. I haven’t figured out the exact number, but it definitely can’t handle 100 MB of vertex data on my machine (regardless of how many VBOs I spread the data across). We could debate about whether or not this is a valid restriction, but in any case it should generate GL_OUT_OF_MEMORY and/or return a null pointer when you allocate and map the buffers. Neither is the case – it just crashes and burns on the first glDrawElements() call.

As for NVIDIA, I can’t remember if I used AGP or video memory for my comparisons between VAR and VBO, but I probably tried both and picked the fastest one. I’ll check tonight if there’s actually any big difference between the two.

– Tom