Official feedback on OpenGL 4.4 thread

The 4.4 updates to the man pages are live now, again thanks to Graham Sellers.

There appears to be some weirdness with JavaScript showing up at the beginning of the page, at least for me in Chromium 27, but the actual content beneath it looks OK. We'll work on the weirdness; it's probably something to do with updated DocBook stylesheets.

Looking at the links into the D3D10 docs that Alfhonse provided, one sees that the usage enumerations and flags for mapping in D3D10 are far fewer than what is found in OpenGL. The main difference, I guess, is that in D3D10 (and 11) MS-Windows does the memory management, whereas in GL it is the IHV that does this job. This is a guess.

Now my 2 cents on the memory thing. Essentially, the idea is that GL is supposed to manage memory for the developer. This is why we have all these hints, and why drivers do weird things based on the hints, the application's behavior, or both. In all honesty, I think that sucks.

On the other hand, let's say we want an API so that one can manage the memory more directly oneself, as in console development. The issue then is that different boxes have very different memory architectures, the most obvious split being UMA vs. discrete memory. And even within those categories there are subtleties about caching behavior and so on.

And now I propose something quite heretical. I propose that each IHV make a buffer object extension that exposes these subtleties and lets the developer make the choices, with the driver simply obeying. Not only would each IHV need to make such an extension, they would also need extensions (or variants) for the different memory configurations of their hardware (for example, AMD APUs vs. AMD discrete cards). A GL application would then be written as follows: check which of the memory extensions it supports is available and use that one; otherwise fall back to the current buffer object interface. Going further, texture data should be more easily manipulated too, so that the texture swizzle format can be queried (or even specified) and one can stream texture data more directly.
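
To illustrate the pattern, here is a rough sketch of how an application might be written under this proposal. The extension name GL_XXX_explicit_buffer_memory and the commented-out placement call are entirely made up for illustration; only the fallback path is real GL.

    #include <string.h>
    #include <GL/glcorearb.h>

    /* Check for a (hypothetical) vendor memory extension. */
    static int has_extension(const char *name)
    {
        GLint i, n = 0;
        glGetIntegerv(GL_NUM_EXTENSIONS, &n);
        for (i = 0; i < n; ++i)
            if (strcmp((const char *)glGetStringi(GL_EXTENSIONS, i), name) == 0)
                return 1;
        return 0;
    }

    static void allocate_vertex_buffer(GLuint buf, GLsizeiptr size, const void *data)
    {
        glBindBuffer(GL_ARRAY_BUFFER, buf);

        if (has_extension("GL_XXX_explicit_buffer_memory")) {
            /* Hypothetical path: the application picks the memory pool and
             * caching behavior itself, and the driver just obeys, e.g.
             * glBufferMemoryPlacementXXX(GL_ARRAY_BUFFER, size, data,
             *                            GL_MEMORY_POOL_CPU_CACHED_XXX);
             * (made-up call, shown only to convey the idea) */
        } else {
            /* Portable fallback: today's hint-based interface. */
            glBufferData(GL_ARRAY_BUFFER, size, data, GL_DYNAMIC_DRAW);
        }
    }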

But still, my suggestion is not all that great either. From the developer's point of view, it means more code.

one sees that the usage enumerations and flags for mapping in D3D10 are far fewer than what is found in OpenGL

D3D provides 4 usage bits and 2 mapping type bits. ARB_buffer_storage provides 2 mapping type bits and 4 usage bits. Now yes, OpenGL does offer more valid combinations. But OpenGL is also exposing new functionality that D3D doesn’t: PERSISTENT and COHERENT allow you to map a buffer while it is in use. Which is always expressly forbidden in D3D. If you take those away, then the valid combinations have approximately equal descriptive power compared to D3D’s stuff.
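
To illustrate, this is roughly what that new functionality looks like in practice (a minimal sketch assuming a GL 4.4 context and a function loader already in place; the buffer size and binding point are arbitrary):

    #include <GL/glcorearb.h>

    /* Create a persistently and coherently mapped buffer (GL 4.4 /
     * ARB_buffer_storage) and return the mapped pointer. */
    static void *create_persistent_buffer(GLuint *buf_out, GLsizeiptr size)
    {
        const GLbitfield flags = GL_MAP_WRITE_BIT
                               | GL_MAP_PERSISTENT_BIT  /* may stay mapped while the GPU uses it */
                               | GL_MAP_COHERENT_BIT;   /* CPU writes visible without explicit flushes */

        glGenBuffers(1, buf_out);
        glBindBuffer(GL_ARRAY_BUFFER, *buf_out);

        /* Immutable storage: usage is declared up front via flags, not hints. */
        glBufferStorage(GL_ARRAY_BUFFER, size, NULL, flags);

        /* Map once and keep the pointer for the buffer's lifetime. The
         * application must still fence to avoid overwriting data the GPU
         * has not finished reading. */
        return glMapBufferRange(GL_ARRAY_BUFFER, 0, size, flags);
    }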

Unless you were talking about the old usage fields.

I propose that each IHV makes a buffer object extension that exposes these subtleties and allows a developer to make choices and the driver just obeys.

Let’s ignore the obvious flaws in this idea (the large array of different memory types resulting in an explosion of extensions to cover them all, the fact that such information might be considered proprietary and therefore held secret, etc). Let’s just take this at face value.

What you seem to forget is this: there aren’t that many consoles. And with them, they don’t offer that many different choices. The 360 uses a completely unified memory architecture, so it offers no choices at all (maybe cached vs. uncached memory accessing, but that’s about it). The PS3 has a split memory architecture, so there are two choices. If one way is too slow, you try the other way and it works. And it works on every PS3 that has existed and ever will exist.

Every generation of PC hardware would have its own extension. Within generations, there would be different extensions too. An HD5xxx with dedicated GPU memory would use a different architecture from an embedded HD5xxx chip. An HD7xxx chip would have to use a different memory architecture from the 5xxx chip, even though they share the same general GPU paradigm.

Given all of this myriad of choices… what is the chance that the vast majority of game developers will always and consistently pick the right one for their usage pattern? For every piece of hardware? Do you expect most game developers to sit down and find the optimal memory arrangement for each one of their usage patterns, for every piece of hardware that exists?

Lastly, I would remind you that D3D works just fine under this paradigm. Is there some performance being lost? Possibly. But it’s a sacrifice, and it isn’t terribly much of one at the end of the day. Especially considering the alternative…

What you seem to forget is this: there aren’t that many consoles. And with them, they don’t offer that many different choices. The 360 uses a completely unified memory architecture, so it offers no choices at all (maybe cached vs. uncached memory accessing, but that’s about it). The PS3 has a split memory architecture, so there are two choices. If one way is too slow, you try the other way and it works. And it works on every PS3 that has existed and ever will exist.

Sigh, let me make it more clear: I am talking about the next-generation consoles that will be in consumers' hands this fall and winter, the PS4 and Xbox One. Each has a unified memory architecture, but with differences in how that memory is handled. The Xbox One (I think) has a huge cache, whereas the PS4 has really fast main memory, GDDR5. However, depending on how memory is allocated, it can behave differently with respect to caching, e.g. whether writes by the CPU are immediately observed by the GPU, and so on. On the PC there are now boxes with more unified memory magicks going on (namely AMD's unified memory jazz that it has made a big deal about, which is what is driving the PS4 and, I suspect, also the Xbox One).

Going over to mobile, effective memory management is really important. Currently, through GL, it is basically "hope for the best". This usually translates to keeping data static and not streaming from the CPU at all. So even though memory is unified, the lack of an API prevents intelligent use of that fact. Moreover, good usage is going to be sensitive to the details of the memory architecture: the nature of the caches, etc.

What you seem to forget is this: there aren't that many consoles. And with them, they don't offer that many different choices. The 360 uses a completely unified memory architecture, so it offers no choices at all (maybe cached vs. uncached memory accessing, but that's about it). The PS3 has a split memory architecture, so there are two choices. If one way is too slow, you try the other way and it works. And it works on every PS3 that has existed and ever will exist.

Every generation of PC hardware would have its own extension. Within generations, there would be different extensions too. An HD5xxx with dedicated GPU memory would use a different architecture from an embedded HD5xxx chip. An HD7xxx chip would have to use a different memory architecture from the 5xxx chip, even though they share the same general GPU paradigm.

Given all of this myriad of choices… what is the chance that the vast majority of game developers will always and consistently pick the right one for their usage pattern? For every piece of hardware? Do you expect most game developers to sit down and find the optimal memory arrangement for each one of their usage patterns, for every piece of hardware that exists?

There are a few obvious bits, and I freely admit I wrote too soon. Firstly, it need not be that each IHV makes its own extension; rather, each memory architecture would have its own extension. Though that could quickly collapse back into a generic interface, which is what we already have. What is wanted is the ability to specify how and where the memory is allocated, how it is cached, and so on. Right now all we have is the ability to provide hints and hope that an application's behavior is recognized by a GL implementation. That is not engineering; that is crossing one's fingers and hoping for the best.

In an ideal world, a manual memory management extension suite would expose the issues: unified vs. non-unified (the latter basically means all communication goes through the PCI bus), whereas the former has further joys: the nature of the GPU and CPU caches (or, for that matter, whether the cache is somehow shared), and on and on. It won't be pretty to do, but if one wants control, then one needs to know what one is controlling.

It is absolutely true that a given IHV would then have different memory management extensions for different hardware, essentially one per memory architecture. Yes, this sucks. D3D has an edge here because (I believe) Microsoft wrote the memory handler, not the IHVs; thus the same flags give the exact same behavior for gizmos with the same memory architecture across IHVs.

Right now, though, the current system is bad. We now have two different ways to specify the intent of using a buffer object, the new way and the old way. However, all we are specifying is intent, and in return we are given guarantees. We still have the buffer object Ouija board, and we have it because the requirement that the same API work on all hardware means we can never specify exactly what the implementation is to do.
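
For concreteness, here are the two ways side by side (a minimal sketch assuming a GL 4.4 context; sizes and binding points are arbitrary):

    #include <GL/glcorearb.h>

    /* Old way (GL 1.5): a usage hint the driver is free to interpret
     * however it likes. */
    static void allocate_with_hint(GLuint buf, GLsizeiptr size)
    {
        glBindBuffer(GL_ARRAY_BUFFER, buf);
        glBufferData(GL_ARRAY_BUFFER, size, NULL, GL_DYNAMIC_DRAW);
    }

    /* New way (GL 4.4 / ARB_buffer_storage): immutable storage plus flags.
     * The flags bind the application (it gets errors for uses it did not
     * declare), but they still say nothing about where the memory lives or
     * how it is cached. */
    static void allocate_with_flags(GLuint buf, GLsizeiptr size)
    {
        glBindBuffer(GL_ARRAY_BUFFER, buf);
        glBufferStorage(GL_ARRAY_BUFFER, size, NULL,
                        GL_MAP_WRITE_BIT | GL_DYNAMIC_STORAGE_BIT);
    }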

I am not advocating that an application must use such an extension suite, but I am advocating giving the developer an option. An improper memory management configuration can quite easily eat a massive hole through bandwidth and performance.

As for mobile, where this really is a huge deal: I believe we will see GL4 (not just GLES3 and GL3) in the mobile space soon (at least on the next-generation Tegra, and more if NVIDIA licenses its Tegra GPU magicks to other SoC folks). Memory management is a big deal, and I think it will get worse quite quickly. As an example, the sparse texture jazz is to some degree about an application performing limited manual memory management. This will only grow; the want for the GPU and CPU to use each other's data is going to become a bigger and bigger issue, and GL right now is still trying to get by with the client-server model (which is fine for non-unified memory situations) but loses so much capability in unified-memory land.
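
To make the sparse texture example concrete, this is roughly the limited manual management that ARB_sparse_texture already allows (a sketch; real code must query and honor the implementation's virtual page sizes):

    #include <GL/glcorearb.h>
    #include <GL/glext.h>   /* ARB_sparse_texture enums and entry point */

    /* Allocate a sparse 2D texture: virtual storage is reserved, but no
     * physical memory is committed yet. */
    static GLuint create_sparse_texture(GLsizei width, GLsizei height)
    {
        GLuint tex;
        glGenTextures(1, &tex);
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_SPARSE_ARB, GL_TRUE);
        glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA8, width, height);
        return tex;
    }

    /* The application decides which region is physically backed; passing
     * GL_FALSE later releases it again. The region should be aligned to the
     * implementation's virtual page size. */
    static void commit_region(GLuint tex, GLint x, GLint y, GLsizei w, GLsizei h)
    {
        glBindTexture(GL_TEXTURE_2D, tex);
        glTexPageCommitmentARB(GL_TEXTURE_2D, 0,   /* target, mip level */
                               x, y, 0, w, h, 1,   /* region, depth = 1 */
                               GL_TRUE);           /* commit physical pages */
    }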

It might be that my idea is nuts, but I kindly suggest that when we discuss this we try to brainstorm ideas on how to make this memory management issue better rather than just shooting down others'. This is a dead serious issue: the hardware has capabilities not at all exposed by the API, and these capabilities are a really big deal.

Is it possible to clarify the wording in 4.4 core spec, section 8.21, page 251:

For texture types that do not have certain dimensions, this command treats those dimensions as having a size of 1. For example, to clear a portion of a two-dimensional texture, use zoffset equal to zero and depth equal to one.

format and type specify the format and type of the source data and are interpreted as they are for TexImage3D, as described in section 8.4.4. Textures with a base internal format of DEPTH_COMPONENT, STENCIL_INDEX, DEPTH_STENCIL require depth component, stencil, or depth/stencil component data respectively. Textures with other base internal formats require RGBA formats. Textures with integer internal formats (see table 8.12) require integer data.

These paragraphs seem to conflict: the first implies that you can pass GL_TEXTURE_2D for <type>, but the second states that <type> is interpreted as it is for TexImage3D, which doesn't support GL_TEXTURE_2D.
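
For concreteness, here is roughly the call the first quoted paragraph describes (a sketch; the clear color and region are arbitrary):

    #include <GL/glcorearb.h>

    /* Clear a region of a 2D texture with glClearTexSubImage (GL 4.4),
     * treating the missing third dimension as size 1. */
    static void clear_region_red(GLuint tex2d, GLint x, GLint y, GLsizei w, GLsizei h)
    {
        const GLfloat red[4] = { 1.0f, 0.0f, 0.0f, 1.0f };

        glClearTexSubImage(tex2d, 0,           /* texture object, mip level */
                           x, y, 0,            /* xoffset, yoffset, zoffset = 0 */
                           w, h, 1,            /* width, height, depth = 1 */
                           GL_RGBA, GL_FLOAT,  /* format/type of the clear data */
                           red);
    }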

What is the valid behavior here?

OpenGL is one of the most used libraries.
So please, design your API better.

Stroustrup talks about what I mean in this video (at 20 min 39 sec);
see channel9 : GoingNative-2012 : Keynote-Bjarne-Stroustrup-Cpp11-Style