VBOs and ATI

What if you render using the VBO position buffer only?

So if one buffer is used by several different setups, you will have it in memory several times. Additionally, you need a heuristic to determine when to release those caches. That heuristic can still interfere in unexpected ways with a different program that has the same issue but a different usage pattern. Such a program might then have spikes of low framerate in seemingly random situations.

Leaving it slow so the developer notices it and updates their program is easier and more reliable.

Here’s a test with meshes, which are a bit more conventional rendering:

1000 oildrum meshes, 256 polys each.

VBOs: 21 FPS
Vertex arrays: 19 FPS

So it may be that the way I was rendering terrain was a unique situation.

And in case you haven’t tried it, what if you use the same usage hint for both the position and attribute buffers? Since someone mentioned it was a killer to have the buffers in different memory, perhaps that will help.
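For what it’s worth, a minimal sketch of that (assuming GL_STATIC_DRAW for both buffers and a GLEW-style loader; names and sizes are placeholders) could look like this:

```c
#include <GL/glew.h>

/* Hypothetical sketch: create the position VBO and the extra attribute VBO
 * with the same usage hint, so the driver has no reason to place them in
 * different kinds of memory. Sizes and pointers are placeholders. */
static void create_buffers(GLuint *pos_vbo, GLuint *attr_vbo,
                           const void *positions, GLsizeiptr pos_bytes,
                           const void *attribs,   GLsizeiptr attr_bytes)
{
    glGenBuffers(1, pos_vbo);
    glBindBuffer(GL_ARRAY_BUFFER, *pos_vbo);
    glBufferData(GL_ARRAY_BUFFER, pos_bytes, positions, GL_STATIC_DRAW);

    glGenBuffers(1, attr_vbo);
    glBindBuffer(GL_ARRAY_BUFFER, *attr_vbo);
    /* Same hint as the positions (GL_STATIC_DRAW here), not e.g. GL_DYNAMIC_DRAW. */
    glBufferData(GL_ARRAY_BUFFER, attr_bytes, attribs, GL_STATIC_DRAW);

    glBindBuffer(GL_ARRAY_BUFFER, 0);
}
```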

Imho that won’t be so different from what it is now, where you get constant low framerates in seemingly random situations (hw/driver combinations). Given that you have a bazillion different ways of issuing geometry in OpenGL, I’d prefer it if the driver was a bit smart, since, after all, it’s written by someone who knows the hardware.

Why should the driver be “a little bit smart” just because the one who writes the OpenGL app is “a little less smart”?

It doesn’t make sense to optimize for the case where the developer does stupid/wrong things.

Leadwerks: Do you have your indices in a VBO too (preferably as unsigned shorts)? Because with VBOs you should get much better speed compared to conventional arrays. At least with position data only.
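For reference, a minimal index-VBO setup with unsigned short indices (function and buffer names are placeholders; this assumes a GLEW-style loader) might look roughly like this:

```c
#include <GL/glew.h>

/* Sketch only: positions in one VBO, indices as unsigned shorts in a
 * GL_ELEMENT_ARRAY_BUFFER, drawn with glDrawElements. */
static void draw_indexed(GLuint vertex_vbo, GLuint index_vbo, GLsizei index_count)
{
    glBindBuffer(GL_ARRAY_BUFFER, vertex_vbo);
    glEnableClientState(GL_VERTEX_ARRAY);
    glVertexPointer(3, GL_FLOAT, 0, (const void *)0);

    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, index_vbo);
    /* GL_UNSIGNED_SHORT keeps the index data at 2 bytes per index. */
    glDrawElements(GL_TRIANGLES, index_count, GL_UNSIGNED_SHORT, (const void *)0);

    glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0);
    glBindBuffer(GL_ARRAY_BUFFER, 0);
    glDisableClientState(GL_VERTEX_ARRAY);
}
```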

Little hint: Your first post starts with a rant about how bad ATI is, but you haven’t really given THAT much information about how you set up the pipeline. In my experience ATI is just as good as nVidia with VBOs, so I would assume your use case is just inefficient / rare. But with the given information I can’t really tell.

Jan.

My post does not start with a rant about how bad ATI is. I said that their latest drivers fixed a lot of problems they had before. I am very happy with ATI right now.

It seems that with conventional mesh rendering, VBOs offer a slight advantage on ATI hardware, and the way I am rendering terrain works better with vertex arrays. I am not complaining, I am just fine-tuning performance for ATI cards.

I think I know what might be going on. When you use an unsupported format for one of the arrays, the driver switches to a fallback. That fallback is likely written to be compatible with the OpenGL specification while being simple to implement (even if it is slow). One likely implementation matching what you see is that the fallback simply allocates a buffer big enough to hold the required number of properly aligned vertices in a supported format, and then one big loop fetches, for each vertex, the data from the original arrays and stores it in the new buffer. This would mean that as long as at least one array is in an unsupported format, everything gets copied into that new buffer (and you pay for the readback of the positions from the VBO).
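Purely as an illustration of that theory (this is not actual driver code, just a sketch of the kind of per-vertex copy loop I am describing):

```c
/* Hypothetical sketch of a "convert everything" fallback: if any vertex
 * stream is in an unsupported format, copy every vertex from its source
 * array into one freshly allocated, aligned staging buffer before drawing.
 * Positions stored in a VBO would have to be read back first, which is
 * what makes this path expensive. A real fallback would also convert the
 * unsupported formats while copying. */
#include <stdlib.h>
#include <string.h>

typedef struct {
    const unsigned char *data;   /* source array (system memory or read-back VBO) */
    size_t               stride; /* bytes between consecutive vertices            */
    size_t               size;   /* bytes of data per vertex for this stream      */
} VertexStream;

static void *build_fallback_buffer(const VertexStream *streams,
                                   size_t stream_count,
                                   size_t vertex_count,
                                   size_t aligned_vertex_size)
{
    unsigned char *staging = malloc(vertex_count * aligned_vertex_size);
    if (!staging)
        return NULL;

    for (size_t v = 0; v < vertex_count; ++v) {
        unsigned char *dst = staging + v * aligned_vertex_size;
        for (size_t s = 0; s < stream_count; ++s) {
            /* One fetch per vertex per stream: this runs even for streams
             * that were already in a fast format (e.g. positions in a VBO),
             * so the whole draw pays the penalty. */
            memcpy(dst, streams[s].data + v * streams[s].stride, streams[s].size);
            dst += streams[s].size;
        }
    }
    return staging;
}
```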

The logical conclusion would be that using a supported format might not only restore the speed of the VBO path, it might also increase speed (or decrease CPU consumption) for the standard memory arrays. Of course I might be wrong in my assumption, but you should really check what happens when you use an aligned number of shorts for the additional streams.
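For example, assuming the extra stream is a set of texture coordinates stored as shorts (names here are made up), padding each entry to four shorts keeps every vertex on a 4-byte boundary:

```c
#include <GL/gl.h>

/* Hypothetical example: instead of 3 shorts per texcoord (6 bytes,
 * unaligned), pad each entry to 4 shorts (8 bytes) so every vertex
 * starts on a 4-byte boundary. */
typedef struct {
    GLshort s, t, r, pad;   /* 'pad' keeps the stride at 8 bytes */
} TexCoordShort4;

static void set_texcoord_pointer(const TexCoordShort4 *coords)
{
    glEnableClientState(GL_TEXTURE_COORD_ARRAY);
    /* Pass the padded stride explicitly; the fourth component is never read
     * because only three components are specified. */
    glTexCoordPointer(3, GL_SHORT, sizeof(TexCoordShort4), coords);
}
```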

While in this case the situation might appear random at first, it is consistent. If you render the mesh with a specific setup, it will be slow (as documented by ATI), as opposed to it being slow only when you look at it for fewer than three frames and after you have not seen it for a minute.

What I prefer is having a nice paper, written by the same people, describing what you should and should not do to get the best performance. That preference might be caused by the fact that I was bitten by the Nvidia driver reoptimizing GLSL shaders when vec4 uniforms changed.

In which case it shouldn’t be slow because you only render 3 frames. And anyway, if it always does that, it’s consistent too so :wink:

Having three frames rendered at something like 5 FPS is not nice. You are right, in that sense it is consistent; however, determining that it is caused by the driver hiding the use of an unsupported vertex format might not be easy.

Do let me know how I should know what is the stupid/wrong way to render various things on ATI/NVIDIA/Intel (taking any driver-specific issues into account). Aside from some general hand-wavy docs, I haven’t found anything that could really help me in that area.

For instance, where is it mentioned that if I want to upload a dynamic texture each frame, it’s MUCH faster to do it “the old way” than with a PBO, on my 7800 GT / Windows XP / ForceWare from around a year ago? If the texture is 256x256, that is (and only then).
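By “the old way” I mean a plain per-frame glTexSubImage2D upload from client memory, roughly like this (texture object, format and pixel pointer are assumed to be set up elsewhere):

```c
#include <GL/gl.h>

/* Re-upload a 256x256 RGBA texture every frame straight from client
 * memory, no PBO involved. */
static void upload_dynamic_texture(GLuint tex, const void *pixels)
{
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 256, 256,
                    GL_BGRA, GL_UNSIGNED_BYTE, pixels);
}
```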

Well, if the rest of the frames were rendered at 10 ms each (i.e. 100 FPS), the average would still be 43 FPS. Sure, it’d probably look slightly jerky, but it wouldn’t be a constant 12 FPS (as in the OP’s case).

True. Then again, if it’s that easy to do now, why does the OP still have a problem? :wink:

Yeah I know he hasn’t shared too many details, I’m just saying that currently it is SO easy to fall into a trap and get bogged down, especially on some platform you don’t have direct access to.

I guess what I’m really trying to say is that I can’t wait for OpenGL 3 to be released :slight_smile:

It might be only jerky, or it might be a serious problem, depending on when it happens. For example, in my problem with the GLSL compiler, rendering stalled for more than one second when the setup of dynamic lights changed to a new configuration (many shaders with many uniforms; I was able to warm them up on ATI, but not on Nvidia). This might be bearable from time to time when you are just looking at the scene; the problem was that most changes of dynamic lighting happened during gunfights, when such a stall is not acceptable.
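By “warm-up” I mean something along these lines (a sketch only, not my actual code; it assumes a GLEW-style loader and that you set representative uniform values yourself): bind each program once at load time and issue a trivial draw, so any deferred recompilation happens before gameplay starts.

```c
#include <GL/glew.h>

/* Bind each shader program with typical uniform values and issue a tiny
 * dummy draw so the driver finishes any deferred optimization/recompile
 * during loading instead of mid-game. */
static void warm_up_programs(const GLuint *programs, int count)
{
    for (int i = 0; i < count; ++i) {
        glUseProgram(programs[i]);
        /* Set uniforms to values typical of the in-game light setup here. */
        glBegin(GL_TRIANGLES);           /* degenerate triangle, nothing visible */
        glVertex3f(0.0f, 0.0f, 0.0f);
        glVertex3f(0.0f, 0.0f, 0.0f);
        glVertex3f(0.0f, 0.0f, 0.0f);
        glEnd();
    }
    glUseProgram(0);
}
```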

True. Then again, if it’s that easy to do now, why does the OP still have a problem? :wink:

Maybe because he has not changed the format to a supported one yet.

Yeah I know he hasn’t shared too many details, I’m just saying that currently it is SO easy to fall into a trap and get bogged down, especially on some platform you don’t have direct access to.

I know. It happened to me more than once. Sometimes for very stupid reasons :o

Yeah I can see how that can be annoying. Though I would say that in that case, the driver was retarded as well. :slight_smile:

If it had been a bit smart, it would have first determined whether there was any point in optimizing (i.e. whether the uniforms are static enough), and if so, compiled the optimized version in the background, replacing it when done.

But I get your point :slight_smile:

Yeah, well still waiting on the Linux version that was promised back in Nov '07.

It’s a shame because the NVPerfSDK for Linux released back in Sept. '06 was really nice.