What you seem to forget is this: there aren’t that many consoles. And between them, they don’t offer that many different choices. The 360 uses a completely unified memory architecture, so it offers no choices at all (maybe cached vs. uncached memory access, but that’s about it). The PS3 has a split memory architecture, so there are two choices. If one way is too slow, you try the other way and it works. And it works on every PS3 that has existed and ever will exist.
Sighs, let me make it clearer: I am talking about the next-generation consoles that will be in consumers’ hands this fall and winter: the PS4 and Xbox One. There you will find that each has a unified memory architecture, but with differences in how that memory is handled. The Xbox One (I think) has a huge cache, whereas the PS4 has really fast main memory, GDDR5. However, depending on how memory is allocated, it can behave differently with respect to caching, i.e. whether writes by the CPU are immediately observed by the GPU, and so on. On the PC there are now boxes with more unified memory magicks going on (namely AMD’s unified memory jazz that it has made a big deal about, which is what is driving the PS4 and, I suspect, also the Xbox One).
Going over to mobile, effective memory management is really important. Currently, through GL, it is basically “hope for the best”. This usually translates to keeping data static and not streaming it from the CPU at all. So, even though the memory is unified, the lack of an API prevents intelligent use of that feature. Moreover, good usage is going to be sensitive to the details of the memory architecture: the nature of the caches, and so on.
Every generation of PC hardware would have its own extension. Within generations, there would be different extensions too. A discrete HD5xxx with dedicated GPU memory would use a different architecture from an embedded HD5xxx chip. An HD7xxx chip would have to use a different memory architecture from the 5xxx chip, even though they share the same general GPU paradigm.
Given this myriad of choices… what is the chance that the vast majority of game developers will always and consistently pick the right one for their usage pattern? For every piece of hardware? Do you expect most game developers to sit down and find the optimal memory arrangement for each of their usage patterns, on every piece of hardware that exists?
There are a few obvious bits, and I freely admit I wrote too soon. Firstly, it need not be that each IHV makes its own extension; rather, each memory architecture would have its own extension. This, though, can quickly degenerate into a generic interface, which is what we already have. What is wanted is the ability to specify how and where memory is allocated and how it is cached. Right now all we have is the ability to provide hints and hope that an application’s behavior is recognized by a GL implementation. That is not engineering; that is crossing one’s fingers and hoping for the best. In an ideal world, a manual memory management extension suite would expose the issues: unified vs. non-unified (the latter basically means all communication goes through the PCI bus), whereas the former has further joys: the nature of the caching of GPU and CPU (or, for that matter, whether the cache is somehow shared), and on and on. It won’t be pretty to do, but if one wants control, then one needs to know what one is controlling.
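To make the shape of such a suite concrete, here is a purely hypothetical sketch in C. None of the tokens or the entry point below exist in any GL (hence the _XXX suffixes), and size/data are placeholder variables; the point is only to show the kind of explicit placement and caching control being asked for.

    /* PURELY HYPOTHETICAL: no such extension exists. Tokens and the
       entry point are invented to illustrate explicit memory control. */
    GLuint bo;
    glGenBuffers(1, &bo);               /* real GL */
    glBindBuffer(GL_ARRAY_BUFFER, bo);  /* real GL */

    /* Hypothetical: state exactly where the storage lives and how each
       side caches it, instead of providing a usage hint. */
    glBufferMemoryAllocXXX(GL_ARRAY_BUFFER, size, data,
                           GL_MEMORY_GPU_LOCAL_XXX |           /* placement      */
                           GL_MEMORY_CPU_WRITE_COMBINED_XXX |  /* CPU-side cache */
                           GL_MEMORY_GPU_CACHED_XXX);          /* GPU-side cache */

    /* Hypothetical query token: what does this box actually offer? */
    GLint unified = GL_FALSE;
    glGetIntegerv(GL_MEMORY_ARCHITECTURE_UNIFIED_XXX, &unified);

An application would query what the architecture offers first and pick flags accordingly; a wrong combination would be an error, not a silently slow path.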
It is absolutely true that a given IHV would then have different memory management extensions for different hardware, essentially one per memory architecture. Yes, this sucks. D3D has an edge here because (I believe) Microsoft wrote the memory handler, not the IHVs; thus the same flags give the exact same behavior for gizmos with the same memory architecture across IHVs.
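For comparison, here is what that looks like in D3D11 (using the C COM macros; device is assumed to be an existing ID3D11Device*). The semantics of Usage and CPUAccessFlags are pinned down by Microsoft’s specification rather than left to each IHV, so this combination means the same thing on every device:

    /* D3D11: flag semantics are defined by the spec, not by each IHV.
       A dynamic, CPU-writable vertex buffer: */
    D3D11_BUFFER_DESC desc = {0};
    desc.ByteWidth      = 64 * 1024;                 /* example size        */
    desc.Usage          = D3D11_USAGE_DYNAMIC;       /* GPU read, CPU write */
    desc.BindFlags      = D3D11_BIND_VERTEX_BUFFER;
    desc.CPUAccessFlags = D3D11_CPU_ACCESS_WRITE;    /* Map() for writing   */

    ID3D11Buffer *buffer = NULL;
    HRESULT hr = ID3D11Device_CreateBuffer(device, &desc, NULL, &buffer);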
Right now, though, the current system is bad. We now have two different ways to specify the intent of using a buffer object, the new way and the old way. However, all we are specifying is intent, and in return we are given no real guarantees. We still have the buffer object ouija board, and we have it because the requirement that the same API work on all hardware means that we can never specify exactly what is to be done.
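To spell out the complaint: assuming the “old way and new way” here are the classic glBufferData usage hints and the flags of the (then brand-new) ARB_buffer_storage extension, the contrast looks like this (size/data are placeholders). In both cases the application only describes intent; the actual placement of the storage remains the implementation’s secret.

    /* Old way: a usage hint. The implementation may place the storage
       anywhere and may even migrate it later based on observed use. */
    glBufferData(GL_ARRAY_BUFFER, size, data, GL_DYNAMIC_DRAW);

    /* New way (ARB_buffer_storage): immutable storage plus flags that
       state what the application may do with it... */
    glBufferStorage(GL_ARRAY_BUFFER, size, data,
                    GL_MAP_WRITE_BIT | GL_MAP_PERSISTENT_BIT |
                    GL_MAP_COHERENT_BIT);

    /* ...but there is still no way to say "GPU-local memory, please" or
       "give me write-combined CPU pages"; that choice is never ours. */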
I am not advocating that an application must use such an extension suite, but I am advocating giving a developer the option. Improper memory management configuration can quite easily eat a massive hole through bandwidth and performance.
As for mobile, where this really is a huge deal: I believe we will see GL4 (not just GLES3 and GL3) in the mobile space soon (at least on the next-generation Tegra, and more if NVIDIA licenses its Tegra GPU magicks to other SoC folks). Memory management is a big deal, and I think it will get worse quite quickly. As an example, the sparse texture jazz is, to some degree, about an application performing limited manual memory management; see the sketch below. This will get worse; the want for the GPU and CPU to use each other’s data is going to become a bigger and bigger issue, and GL right now is still trying to get by with the client-server model (which is fine for non-unified memory situations) but loses so much capability in unified-memory land.
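A rough sketch of that limited manual management, assuming ARB_sparse_texture on a GL 4.x context. The 512x512 region is for illustration only; in practice region sizes must be multiples of the page size queried via GL_VIRTUAL_PAGE_SIZE_X_ARB / GL_VIRTUAL_PAGE_SIZE_Y_ARB, and error handling is omitted.

    /* Reserve virtual storage for a large sparse texture; no physical
       memory is committed yet. */
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_SPARSE_ARB, GL_TRUE);
    glTexStorage2D(GL_TEXTURE_2D, 1, GL_RGBA8, 16384, 16384);

    /* Commit physical memory for just one region of level 0... */
    glTexPageCommitmentARB(GL_TEXTURE_2D, 0,  /* level                */
                           0, 0, 0,           /* x, y, z offset       */
                           512, 512, 1,       /* width, height, depth */
                           GL_TRUE);          /* commit               */

    /* ...and release it again when that region is no longer needed. */
    glTexPageCommitmentARB(GL_TEXTURE_2D, 0, 0, 0, 0, 512, 512, 1, GL_FALSE);

The application decides which pages are backed by real memory and when; that is exactly the kind of control GL otherwise refuses to give.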
It might be that my idea is nuts, but I kindly suggest that when we discuss this, we try to brainstorm ideas on how to make this memory management issue better rather than just shooting down others. This is a dead serious issue: the hardware has capabilities that are not at all exposed by the API, and those capabilities are a really big deal.