Gaps between DX10 and OpenGL 3.2

a sign that an API is well documented.

API documentation dictates behavior, not performance. Performance changes based on hardware vendors and other things; it cannot be enforced.

If you ever try to build an app that can guarantee no performance glitches due to texture uploads, shader changes, state changes, etc., you will feel the pain of this “black art”.

The differences in performance that are being discussed here are fairly small. Unless you’re pushing the hardware to its limits (and nowadays, that’s a lot of hardware to be pushing), it’s generally not going to make a visual difference to the user.

API documentation dictates behavior, not performance. Performance changes based on hardware vendors and other things; it cannot be enforced.

Strictly speaking this is true, but in practice the story is much, much different.

Firstly, an API is supposed to have the developer naturally use the “fast” path for the hardware. This is complicated by the fact that GL is supported by several hardware vendors (for consumer hardware alone there are at least three vendors supporting GL3: nVidia, ATI and S3). However, for a fair number of extensions, performance expectations are given: for example, a texel fetch from a texture buffer object is supposed to be faster than a texel fetch from a texture.
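
To make the texture buffer object example concrete, here is a minimal sketch. It assumes GL 3.1 (or ARB_texture_buffer_object) with GLEW or a similar loader providing the entry points; the names are placeholders and error checking is omitted.

```c
#include <GL/glew.h>   /* assumption: GLEW (or any GL 3.x loader) supplies the entry points */

/* Expose a flat array of floats to shaders through a buffer texture.
 * The expectation mentioned above is that a texelFetch from a
 * samplerBuffer is a cheaper path than a fetch from a regular texture. */
static GLuint make_float_buffer_texture(const float *data, GLsizeiptr num_floats)
{
    GLuint tbo, tbo_tex;

    glGenBuffers(1, &tbo);
    glBindBuffer(GL_TEXTURE_BUFFER, tbo);
    glBufferData(GL_TEXTURE_BUFFER, num_floats * sizeof(float), data, GL_STATIC_DRAW);

    glGenTextures(1, &tbo_tex);
    glBindTexture(GL_TEXTURE_BUFFER, tbo_tex);
    glTexBuffer(GL_TEXTURE_BUFFER, GL_R32F, tbo);   /* one 32-bit float per texel */

    /* tbo must stay alive for as long as tbo_tex is used */
    return tbo_tex;
}

/* In GLSL the data is then read with an integer index:
 *   uniform samplerBuffer values;
 *   float v = texelFetch(values, index).r;
 */
```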

Also, considering that the IHVs contribute heavily to the GL3 spec, it is not unreasonable for the spec to include performance expectations or usage hints. By providing usage hints, software developers can get an idea of what they should do to hit the “fast path”, and hardware vendors can potentially optimize their drivers for the expected usage patterns.

Do you know why streaming with buffer objects is such a minefield? Usage hints.

The definition of the usage hints tells you what to do to achieve maximum streaming performance: use one of the “stream” hints, and map the buffer to upload your data. Such a simple thing, and yet it may or may not get you proper streaming performance. It all depends on the implementation.
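
For reference, the “simple thing” being described looks roughly like this (a sketch only, with placeholder names; whether it actually hits the fast path is exactly the implementation-dependent part being complained about):

```c
#include <GL/glew.h>   /* assumption: GLEW or a similar GL 3.x loader */
#include <string.h>    /* memcpy */

/* Per-frame streaming into a buffer created with a "stream" hint.
 * Orphan the old storage, then map for write; error checking omitted. */
static void stream_vertex_data(GLuint buf, const void *src, GLsizeiptr bytes)
{
    glBindBuffer(GL_ARRAY_BUFFER, buf);

    /* Orphan: request fresh storage so the driver need not stall on
     * draws that still reference the old contents. */
    glBufferData(GL_ARRAY_BUFFER, bytes, NULL, GL_STREAM_DRAW);

    void *dst = glMapBufferRange(GL_ARRAY_BUFFER, 0, bytes,
                                 GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_BUFFER_BIT);
    memcpy(dst, src, bytes);
    glUnmapBuffer(GL_ARRAY_BUFFER);

    /* ... draw calls sourcing from buf follow ... */
}
```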

Usage hints are a bad idea. Different implementations will implement the hints in different ways, leading to the same problem the hints were trying to solve: you have to test on every piece of hardware to see whether you’re getting the best possible performance.

Err, I was misunderstood. By “usage hint” I did not mean an API entry point or additional arguments describing how an object will be used; I meant:

Usage hint: state in the API how a set of functions is most likely to be used.

On the other hand, my point that streaming with buffer objects is a minefield because of the usage hints in creating and mapping buffer objects comes down to a communication fault: the spec does not state clearly enough how those usage hints are expected to be used, so driver implementers and software developers need to guess. The hints for buffer object creation seem to be pretty well spelled out, but glMapBufferRange just gives some properties of the expected behaviour of mapping. What would be useful for both developers and implementers is an expected usage pattern: for example, how to do streaming well with the API. By stating how the API expects streaming to be done, implementers and developers can both see what is expected. If anything, this is an example of where the spec, or a companion document, would help.

The spec does not state clearly enough how those usage hints are expected to be used, so driver implementers and software developers need to guess.

Oh no: the spec is very clear about what each of the 3x3 combinations of usage hints means for how the user should use the buffer. The only one that could be considered slightly unclear is DYNAMIC, and that’s due to the question of when something deserves to be STREAM vs. DYNAMIC or STATIC vs. DYNAMIC.
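
For anyone following along, the 3x3 grid being referred to is (paraphrasing the spec’s definitions):

```c
/* Frequency of access (first word of the hint):
 *   STREAM  - data store specified once, used at most a few times
 *   STATIC  - data store specified once, used many times
 *   DYNAMIC - data store respecified repeatedly, used many times
 *
 * Nature of access (second word of the hint):
 *   DRAW - the application writes the data, the GL reads it
 *   READ - the GL writes the data, the application reads it back
 *   COPY - the GL writes the data, the GL reads it
 *
 * giving the nine tokens:
 *   GL_STREAM_DRAW,  GL_STREAM_READ,  GL_STREAM_COPY,
 *   GL_STATIC_DRAW,  GL_STATIC_READ,  GL_STATIC_COPY,
 *   GL_DYNAMIC_DRAW, GL_DYNAMIC_READ, GL_DYNAMIC_COPY
 */
```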

glMapBufferRange just gives some properties of the expected behaviour of mapping

The only thing that is unclear is whether implementations will properly utilize these values. For example, do you need to use DrawRangeElements for the invalidate range flag to work? Will the implementation even bother with invalidate range, or will it just block until no part of the buffer is still in use?
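
Concretely, the flag in question is used like this (a sketch with placeholder names; whether the driver does anything clever with it is exactly the open question):

```c
#include <GL/glew.h>   /* assumption: GLEW or a similar GL 3.x loader */
#include <string.h>    /* memcpy */

/* Write new data into a sub-range of a buffer that may still be in use,
 * declaring the previous contents of just that range invalid so the
 * driver does not have to preserve them (or, ideally, wait on them). */
static void update_subrange(GLuint buf, GLintptr offset,
                            const void *src, GLsizeiptr bytes)
{
    glBindBuffer(GL_ARRAY_BUFFER, buf);

    void *dst = glMapBufferRange(GL_ARRAY_BUFFER, offset, bytes,
                                 GL_MAP_WRITE_BIT | GL_MAP_INVALIDATE_RANGE_BIT);
    memcpy(dst, src, bytes);
    glUnmapBuffer(GL_ARRAY_BUFFER);
}
```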

What would be useful for both developers and implementers is an expected usage pattern: for example, how to do streaming well with the API. By stating how the API expects streaming to be done, implementers and developers can both see what is expected. If anything, this is an example of where the spec, or a companion document, would help.

Driver developers will implement whatever ID does for streaming. What any such companion performance hint guide says is irrelevant next to making the next ID game run fast.

Oh no: the spec is very clear about what each of the 3x3 combinations of usage hints means for how the user should use the buffer. The only one that could be considered slightly unclear is DYNAMIC, and that’s due to the question of when something deserves to be STREAM vs. DYNAMIC or STATIC vs. DYNAMIC.

Um, look at what I wrote:


The hints for buffer object creation seem to be pretty well spelled out, but glMapBufferRange just gives some properties of the expected behaviour of mapping.

Also,

The only thing that is unclear is whether implementations will properly utilize these values. For example, do you need to use DrawRangeElements for the invalidate range flag to work? Will the implementation even bother with invalidate range, or will it just block until no part of the buffer is still in use?

Well, if the driver does not utilize these values, that is naughty of the driver; it is supposed to use them, right?

Driver developers will implement whatever ID does for streaming. What any such companion performance hint guide says is irrelevant next to making the next ID game run fast.

Ouch. So, whatever ID does, the driver writers follow? I find that kind of too cynical to believe, especially since ID games are all, even their upcoming RAGE, GL 2.1, not 3.x. Following the same logic, does Apple then optimize their GL for Blizzard games?

“even their upcoming RAGE GL 2.1”

And even that is not for sure anymore.

Well, if the driver does not utilize these values, that is naughty of the driver; it is supposed to use them, right?

The driver is supposed to make the program as a whole fast. It is quite possible that collating the data necessary to make the invalidate range flag work causes each draw call to be slower.

I find that kind of too cynical to believe, especially since ID games are all

You were not an OpenGL programmer during the days of compiled vertex arrays. For many drivers and driver revisions, there was only one compiled format that gave good performance: the one Quake 2 (or 3?) used. All others gave horrible performance.

While that particular nightmare is long over, OpenGL driver optimizations are still focused on what ID does.

Apple then optimize their GL for Blizzard games?

I’m sure Blizzard games spend as little time in AppleGL as possible.