Yay! ARB_vertex_buffer_object supported in ATI's Catalyst 3.4 drivers!

Originally posted by Korval:
I have to ask: why are you computing the L&H vectors on the CPU? Isn’t that what ARB_vertex_program is for?

Yes, vertex programs can compute them, but then they have to be computed every time you render the model. If you can render the model multiple times while the L&H vectors don’t change, you’re better off computing them once on the CPU and reusing the values.

You will say that the L&H vectors change very often, since they depend on the camera, the light, and the objects, all of which can move, and you’d be right. But even in that case it is still interesting to store the vectors on the CPU side, especially for multipass rendering.
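
To make that concrete, here is a rough sketch of the CPU-side precompute (plain C; the array layout and function names are just made up for illustration): while the light, the eye, and the object stay put, you fill one extra per-vertex array with L and H once, and every later pass just reads it.

[code]
/* Sketch: compute per-vertex L and H once on the CPU and reuse them for
   every pass, instead of recomputing them in a vertex program each pass.
   Assumes object-space light/eye positions and tightly packed float[3] data. */
#include <math.h>

static void normalize3(float v[3])
{
    float len = (float)sqrt(v[0]*v[0] + v[1]*v[1] + v[2]*v[2]);
    if (len > 0.0f) { v[0] /= len; v[1] /= len; v[2] /= len; }
}

/* positions: numVerts * 3 floats; LH out: numVerts * 6 floats (L, then H). */
void compute_LH(const float *positions, int numVerts,
                const float lightPos[3], const float eyePos[3], float *LH)
{
    int i, j;
    for (i = 0; i < numVerts; ++i) {
        const float *p = positions + 3 * i;
        float L[3], V[3], H[3];
        for (j = 0; j < 3; ++j) {
            L[j] = lightPos[j] - p[j];   /* vertex-to-light */
            V[j] = eyePos[j]   - p[j];   /* vertex-to-eye   */
        }
        normalize3(L);
        normalize3(V);
        for (j = 0; j < 3; ++j) H[j] = L[j] + V[j];  /* half-angle vector */
        normalize3(H);
        for (j = 0; j < 3; ++j) {
            LH[6 * i + j]     = L[j];
            LH[6 * i + 3 + j] = H[j];
        }
    }
}
[/code]

Every subsequent pass just sources this array (through texture coordinate arrays, for instance), so the math is paid once per light/eye/object change rather than once per pass.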

Just a quick note on VBO sizes. We have made changes in the driver that should improve cases with massive numbers of VBO’s in the next couple driver releases.

I still, however, do not advocate microscopic VBO’s, as they will likely be somewhat inefficient. As someone mentioned above, memory management takes resources and typically has allocation granularities. I doubt anyone writes their app to dynamically allocate every chunk of memory individually; typically, block allocations are used where applicable.

-Evan

Will anything be done about the 32 MB limit that one runs into? Even with VBO I still cannot create a single vertex array of more than 32 MB or several arrays whose total is greater than 32 MB. Even with such a limit, shouldn’t any data that cannot fit be pushed over into AGP memory? As it is, this does not appear to be the case…

Originally posted by ehart:
[b]
Just a quick note on VBO sizes. We have made changes in the driver that should improve cases with massive numbers of VBO’s in the next couple driver releases.

I still, however, do not advocate microscopic VBO’s, as they will likely be somewhat inefficient. As someone mentioned above, memory management takes resources and typically has allocation granularities. I doubt anyone writes their app to dynamically allocate every chunk of memory individually; typically, block allocations are used where applicable.

-Evan[/b]

That’s good to hear, Evan. Of course, I agree with the advice on microscopic VBOs. The gl*Pointer() validation overhead becomes dominant below some threshold; this is true even if you have a single VBO with lots of microscopic arrays in it. You should really coalesce microscopic arrays, or render in immediate mode.
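
Something like this is what I mean by coalescing (just a sketch with made-up names, and assuming the ARB entry points have already been fetched): lay the small arrays back to back in one buffer so the bind and gl*Pointer() validation is paid once rather than per tiny array.

[code]
/* Sketch: pack many small position arrays back-to-back in a single VBO. */
#define BUFFER_OFFSET(bytes) ((const GLvoid *)(bytes))

GLuint vbo;
GLintptrARB offset = 0;
GLsizeiptrARB totalBytes = 0;
int i;

for (i = 0; i < numMeshes; ++i)
    totalBytes += meshes[i].numVerts * 3 * sizeof(GLfloat);

glGenBuffersARB(1, &vbo);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
glBufferDataARB(GL_ARRAY_BUFFER_ARB, totalBytes, NULL, GL_STATIC_DRAW_ARB);

/* Upload each small array at its own offset and remember where it starts. */
for (i = 0; i < numMeshes; ++i) {
    GLsizeiptrARB bytes = meshes[i].numVerts * 3 * sizeof(GLfloat);
    glBufferSubDataARB(GL_ARRAY_BUFFER_ARB, offset, bytes, meshes[i].positions);
    meshes[i].firstVert = (GLint)(offset / (3 * sizeof(GLfloat)));
    offset += bytes;
}

/* Draw: one bind, one glVertexPointer(), then one glDrawArrays() per mesh. */
glEnableClientState(GL_VERTEX_ARRAY);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, vbo);
glVertexPointer(3, GL_FLOAT, 0, BUFFER_OFFSET(0));
for (i = 0; i < numMeshes; ++i)
    glDrawArrays(GL_TRIANGLES, meshes[i].firstVert, meshes[i].numVerts);
[/code]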

Thanks -
Cass

Originally posted by velco:
[b] No new GNU/Linux drivers though … again

~velco[/b]

There are new Linux drivers for ATi cards here: http://www.schneider-digital.de/html/download_ati.html

Unfortunately they don’t support VBO, but they are much better than the previous release.

This is true for D3D, but not OpenGL. Draw calls are expensive in D3D because they are done in kernel mode. GL calls are in user mode, and therefore relatively lightweight.

I had always assumed that the paper was discussing a hardware problem, not a software/API one. Granted, until I read that paper, I had assumed that a glDrawElements call, without indices in AGP/video memory, would only do a quick index copy to AGP, add a few tokens to the card’s command stream, and return. That, of course, didn’t account for the problems with D3D’s calls. I figured that, for reasons that would require intimate hardware knowledge, there needed to be some explicit synchronization event or something of that nature.

Hmmm… this changes much…

Specifically, it is an implementation detail that is invisible to the user, and would therefore never be specified.

What I would like to have is consistent performance. Regular vertex arrays do give consistent performance… consistently slow.

I would much rather see the driver throw an error or something than have it page VBO’s out to system memory. Why? Because a silently paged-out VBO does me little good.

One of the primary purposes behind extensions like VBO is to prevent the system-to-AGP/video memory copy that takes place with regular vertex arrays. Now you’re basically saying that VBO may or may not prevent that copy. It all depends.

It would be very nice if there were an explicit way to let the driver know not to page VBO’s out to system memory.

In any case, you want to give the driver the opportunity to lay these things out in memory the best possible way.

The best possible way to lay things out is to put all static VBOs into video memory and all non-static ones into AGP. Putting either into system memory does precious little for performance.
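
And right now the only handle we have on placement is the usage hint passed to glBufferDataARB, which is just that, a hint. Roughly this sketch (the sizes and data pointers are placeholders):

[code]
/* Sketch: the usage hint is currently the only way to suggest where a
   VBO should live; the driver is still free to ignore it or page it out. */
GLuint staticVBO, streamVBO;

glGenBuffersARB(1, &staticVBO);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, staticVBO);
/* Specified once, drawn many times: video memory is the obvious home. */
glBufferDataARB(GL_ARRAY_BUFFER_ARB, staticBytes, staticData, GL_STATIC_DRAW_ARB);

glGenBuffersARB(1, &streamVBO);
glBindBufferARB(GL_ARRAY_BUFFER_ARB, streamVBO);
/* Respecified every frame: AGP memory is the sensible place for this one. */
glBufferDataARB(GL_ARRAY_BUFFER_ARB, streamBytes, NULL, GL_STREAM_DRAW_ARB);
[/code]

An explicit “fail rather than fall back to system memory” flag would have to come on top of that.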

If you can render the model multiple times while the L&H vectors don’t change, you’re better off computing them once on the CPU and reusing the values.

I don’t know about that. By computing them on the GPU:

  1. you save the bandwidth of sending them. That’s 6 fewer floats, or 24 fewer bytes, per vertex. This bandwidth could go to more texture fetches instead (see the sketch after this list).

  2. you get more consistent performance. The worst case of the CPU approach is (likely) worse than the worst case of the GPU approach. Obviously, the best-case CPU is better than the best-case GPU (for vertex T&L, not transfer). Indeed, the worst-case GPU is the same as the best-case GPU. So, while you may be getting less performance, you’re at least getting consistent performance per frame, which is often better than sometimes-good/sometimes-bad performance.

  3. you don’t have to create dynamic or streaming VBO’s. They can all be purely static data. And therein lies the possibility for greater vertex throughput (or, at least, more vertex bandwidth).

  4. you get more time on the CPU for those sorts of tasks.

  5. GPU’s get faster more quickly than CPU’s do. As such, relying on shader performance now makes things easier in the future.
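
Roughly what I have in mind on the C side (just a sketch; the parameter indices, buffer name, and interleaved layout are made up, and I’m assuming a suitable vertex program has already been loaded as lightingVP):

[code]
/* Sketch: purely static geometry (position + normal, interleaved) in the VBO;
   L and H are derived in an ARB vertex program, so only two constants
   change per frame instead of 24 extra bytes per vertex. */
glBindBufferARB(GL_ARRAY_BUFFER_ARB, staticGeometryVBO);
glEnableClientState(GL_VERTEX_ARRAY);
glEnableClientState(GL_NORMAL_ARRAY);
glVertexPointer(3, GL_FLOAT, 6 * sizeof(GLfloat), (const GLvoid *)0);
glNormalPointer(GL_FLOAT, 6 * sizeof(GLfloat), (const GLvoid *)(3 * sizeof(GLfloat)));

glEnable(GL_VERTEX_PROGRAM_ARB);
glBindProgramARB(GL_VERTEX_PROGRAM_ARB, lightingVP);

/* Per-frame constants the program uses to compute L and H per vertex. */
glProgramEnvParameter4fARB(GL_VERTEX_PROGRAM_ARB, 0,
                           lightPos[0], lightPos[1], lightPos[2], 1.0f);
glProgramEnvParameter4fARB(GL_VERTEX_PROGRAM_ARB, 1,
                           eyePos[0], eyePos[1], eyePos[2], 1.0f);

glDrawArrays(GL_TRIANGLES, 0, numVerts);
[/code]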

Will anything be done about the 32 MB limit that one runs into? Even with VBO I still cannot create a single vertex array of more than 32 MB or several arrays whose total is greater than 32 MB.

That I find completely unacceptable. I can live with only being able to allocate, at most, a few thousand VBOs, but not being able to allocate more than 32 MB of memory in total? No, that is just unacceptable and must be rectified.

[This message has been edited by Korval (edited 05-20-2003).]

Originally posted by Korval:
That I find completely unacceptable. I can live with only being able to allocate, at most, a few thousand VBOs, but not being able to allocate more than 32 MB of memory in total? No, that is just unacceptable and must be rectified.

I just tried allocating a 32MB VBO, and it works just fine. 64MB crashed the GL ICD, but that’s because it ran out of memory (my app uses a LOT of texture memory).

Edit:
That’s with the latest 3.4 drivers BTW.

[This message has been edited by NitroGL (edited 05-20-2003).]

Originally posted by NitroGL:
I just tried allocating a 32MB VBO, and it works just fine. 64MB crashed the GL ICD, but that’s because it ran out of memory (my app uses a LOT of texture memory).

First of all, it shouldn’t crash your ICD; it should report GL_OUT_OF_MEMORY.

Second, that indicates a big difference between ATI’s implementation and NVIDIA’s. For laughs, I tried allocating VBOs in a loop to see how far it would go, and it kept going without GL errors until Windows told me it was going to resize the swap file to make room.
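
The loop was nothing fancy, roughly this (chunk size picked arbitrarily):

[code]
/* Sketch: keep creating 8 MB buffers until the driver reports a GL error.
   Here it never did -- Windows just grew the swap file instead. */
enum { CHUNK = 8 * 1024 * 1024 };
GLuint id;
int count = 0;

for (;;) {
    glGenBuffersARB(1, &id);
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, id);
    glBufferDataARB(GL_ARRAY_BUFFER_ARB, CHUNK, NULL, GL_STATIC_DRAW_ARB);
    if (glGetError() != GL_NO_ERROR)  /* e.g. GL_OUT_OF_MEMORY */
        break;
    ++count;
}
/* count now holds how many chunks were accepted before the first error. */
[/code]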

– Tom

Originally posted by Tom Nuydens:
First of all, it shouldn’t crash your ICD; it should report GL_OUT_OF_MEMORY.

It does, but my program doesn’t stop on out-of-memory errors. It crashes when I try to get a pointer to the memory (my program = dumb imp).
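
A less dumb imp would do roughly this (just a sketch; GL_WRITE_ONLY_ARB picked for the example):

[code]
/* Sketch: check for GL_OUT_OF_MEMORY after allocation, and never write
   through the mapped pointer unless glMapBufferARB returned non-NULL. */
glBufferDataARB(GL_ARRAY_BUFFER_ARB, size, NULL, GL_STATIC_DRAW_ARB);
if (glGetError() == GL_OUT_OF_MEMORY) {
    /* allocation failed: free the buffer, shrink the request, or bail */
} else {
    GLvoid *ptr = glMapBufferARB(GL_ARRAY_BUFFER_ARB, GL_WRITE_ONLY_ARB);
    if (ptr != NULL) {
        /* ... fill the buffer ... */
        glUnmapBufferARB(GL_ARRAY_BUFFER_ARB);
    }
}
[/code]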

Originally posted by NitroGL:
It does, but my program doesn’t stop on out-of-memory errors. It crashes when I try to get a pointer to the memory (my program = dumb imp).

Ah, good news for the rest of us

I still think it’s illogical not to let VBOs spill over into system memory, though. Did ATI_vertex_array_object exhibit the same behaviour? Does it work if you allocate those 64 MB in several small VBOs instead of a single large one?

– Tom

[This message has been edited by Tom Nuydens (edited 05-21-2003).]

Originally posted by Korval:
GPU’s get faster faster than CPU’s

Yup, you’re right. I’m not saying that putting the load on the CPU is always a good thing; I’m just explaining why it is not such a bad idea in certain cases.

Just a quick follow-up to my earlier post. I believe the size cap issue is also resolved for future drivers.

-Evan

Originally posted by ehart:
[b]
Just a quick follow-up to my earlier post. I believe the size cap issue is also resolved for future drivers.

-Evan[/b]

Ah… that’s great news, too! Thanks!

I believe the size cap issue is also resolved for future drivers.

Didn’t you guys just release a driver last week? Considering that it “is resolved” currently, how is it not in current drivers?

Just a guess: the drivers that were released last week got out of the dev dept several weeks ago; they had to get through the whole QA and WHQL thingy before being released to the public. The issues mentioned here were not fixed at that time.

WHQL takes time unfortunately.

Edit: kehziah beat me to it.

[This message has been edited by Humus (edited 05-21-2003).]

Didn’t you guys just release a driver last week? Considering that it “is resolved” currently, how is it not in current drivers?

I really hope we don’t have to wait another TWO months until new drivers get out. I don’t need WHQL very much, since it doesn’t check for OpenGL (extension) compatibility anyway. Releasing non-WHQL drivers (even beta ones, like those infamous “leaked” Detonators from nVidia) would help both us and ATI: they get faster feedback and we get faster fixes!

Originally posted by ehart:

I still, however, do not advocate microscopic VBO’s, as they will likely be somewhat inefficient.

How small is microscopic?

Pete