Memory management

We have some vendor extensions, but we need an ARB memory management framework. I want to be able to query my framebuffer objects to work out how much VRAM they take. In fact, it would be nice to give GL an object handle (VBO ID, buffer object ID, etc.) and have GL tell me how much VRAM is used for its storage.

OK, so how do you define “VRAM”? As things like Llano, Sandy Bridge, and the like start coming online, you’re going to see more and more decent GPUs that share memory with the CPU. So how exactly do you decide how much “VRAM” a texture takes up that isn’t stored in VRAM?

Also, any query you make about VRAM usage is immediately out of date with a multitasking OS.

VRAM is shared by all apps, the window server, etc. Your objects could be evicted at any moment.

Yes, there are all sorts of reasons why queries of GPU mem consumption/availability are not perfect/right/true/divine/accurate for at least N milliseconds, etc…

But let’s back up a step and answer the root question being asked here. How do folks reasonably prevent their apps and games from stuttering and slogging like crap (or worse, being killed outright by the OS) when they are about to run out of memory? Or (more proactively) how do they adjust how their game functions up-front so that the game performs well in the face of varying amounts of GPU memory?

The whole Fusion/Sandy Bridge/IGP point doesn’t hold water. If all of a sudden your GPU memory pool is all of CPU memory, then it can just return how much CPU memory is available.

The whole multiapp thing doesn’t either. It’s basically assumed when you are running a GPU-intensive game that you are not running 2 such games at the same time. If you do, then yes, it might just slog royally.

So please, let’s address the root question, not just hand-wave GL app memory allocation as too hard or irrelevant.

It should be possible to return the size of a GL memory object and where the object resides, be it VRAM, GTT*, or both. This feature looks like something that can be added to a driver very easily, since all drivers know the sizes of all objects anyway.

The fact that a GL memory object can be evicted at any moment doesn’t matter. We are only interested in the memory stats at a given time, so I don’t see a reason this functionality cannot be turned into a new GL extension.

  • GTT is system RAM directly accessible to a GPU, being part of the GPU address space. GPUs have always been able to use GTT (or CPU RAM, as you say). Old hardware like the Radeon 9550 had 64 MB VRAM and 64 MB GTT. Today’s hardware usually has about 512 MB of GTT in addition to VRAM.

Here’s a question.

Since you feed all the data to OpenGL, and since you create all of these objects, you must at the very least know how much room you asked for.

Obviously, pure state objects (FBOs, VAOs) likely don’t take up GPU memory (and if they do, it’s negligible). And for tiny objects (programs, etc), this is pretty meaningless; it’s unlikely that your shader data will be taking up that much room.

If you call glBufferData(… 64000 …), you asked for 64,000 bytes of data. Under what conditions would you expect the hypothetical “query size of object” function to return a number substantially different from 64,000? Sure, it’ll be bigger, but it will also be entirely proportionate to 64,000. The absolute largest discrepancy you might get is rounding to the nearest 16KB, but more likely it rounds to the nearest page size (4KB).
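
For what it’s worth, the size you asked for is already visible through GL; a minimal sketch, assuming a current context and loaded GL headers:

    GLuint vbo;
    glGenBuffers(1, &vbo);
    glBindBuffer(GL_ARRAY_BUFFER, vbo);
    glBufferData(GL_ARRAY_BUFFER, 64000, NULL, GL_STATIC_DRAW);

    GLint size = 0;
    glGetBufferParameteriv(GL_ARRAY_BUFFER, GL_BUFFER_SIZE, &size);
    /* 'size' is the 64,000 bytes you asked for; any page or alignment
       padding the driver adds on top is not visible here. */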

So this feature would be completely useless for buffer objects, since you already know how much buffer object memory you’re using.

Something similar happens for textures (and renderbuffers). You know how big of a texture you asked for. You know how many mipmaps you asked for. Therefore, you already have a reasonable approximation of how much memory they take.

The only case where this isn’t true for textures is if OpenGL decided to give you a different internal format from the one you asked for. Like you asked for GL_RGBA4 and it gives you GL_RGBA8 for whatever reason. This is certainly a possibility.

But, if I recall correctly, this is something you can test for. You can query exactly how many bits each texture takes up, by using GetTexLevelParameter with GL_TEXTURE_RED/GREEN/BLUE/ALPHA_SIZE.

Sure, there will be a discrepancy, likely along page boundaries just like for buffer objects. But this gives you a pretty reasonable estimate on the size that everything takes, does it not? So why is this request necessary to estimate the usage size of GPU memory?
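
Putting those queries together, a rough per-level estimate looks something like this; a minimal sketch that assumes a 2D texture bound to GL_TEXTURE_2D and ignores any driver padding or tiling:

    static size_t estimate_level_bytes(GLint level)
    {
        GLint w = 0, h = 0, r = 0, g = 0, b = 0, a = 0;
        glGetTexLevelParameteriv(GL_TEXTURE_2D, level, GL_TEXTURE_WIDTH,  &w);
        glGetTexLevelParameteriv(GL_TEXTURE_2D, level, GL_TEXTURE_HEIGHT, &h);
        glGetTexLevelParameteriv(GL_TEXTURE_2D, level, GL_TEXTURE_RED_SIZE,   &r);
        glGetTexLevelParameteriv(GL_TEXTURE_2D, level, GL_TEXTURE_GREEN_SIZE, &g);
        glGetTexLevelParameteriv(GL_TEXTURE_2D, level, GL_TEXTURE_BLUE_SIZE,  &b);
        glGetTexLevelParameteriv(GL_TEXTURE_2D, level, GL_TEXTURE_ALPHA_SIZE, &a);

        size_t bits = (size_t)w * (size_t)h * (size_t)(r + g + b + a);
        return (bits + 7) / 8;   /* bytes per level, no alignment accounted for */
    }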

More important than that, usage doesn’t really mean anything unless you also know the capacity. Which is not something this proposal asks for. And if capacity includes “GTT memory”, then it wouldn’t be very helpful. Because you can blow past VRAM in terms of the size of data you’re actually using, but still be below the overall capacity you have available to you.

At the very least, if you’re going to propose a memory management solution, make the proposal comprehensive, not something as poorly thought out as “I want to ask for the size of objects, even though I already know their size and can’t really use that information anyway.”

The fact that a GL memory object can be evicted at any moment doesn’t matter. We are only interested in the memory stats at a given time, so I don’t see a reason this functionality cannot be turned into a new GL extension.

It does, to varying degrees. If the function has to lock the GPU thread in order to properly execute, it could have highly unfortunate effects on performance. At the very least, it isn’t something you want to do per-frame.

Also, what would happen if you call it for each object that you’ve created? If the system is rebalancing what’s in VRAM at the time (possibly due to rendering that’s still going on), then by the time you’re done, your idea of what is taking up memory and what isn’t is completely wrong.

You are right about buffer objects, but you are not right about textures and renderbuffers. The alignment rules are much more complicated there, and they differ for each generation of GPUs. With some texture layouts, the alignment is 32x1 (i.e. the image is made up of blocks of this size), or 32x16, or even 256x8 (an extreme case for some hardware with one-channel 8-bit textures and a specific tiling configuration).

For POT textures, you can usually estimate the size precisely if they are not too small. For NPOT textures, you can only guess the alignment rules. And it’s not that simple: there are additional rules based on the type of texture. On some hardware, if NPOT textures are mipmapped, each mipmap is aligned to POT in memory. It doesn’t end there. Another example is cubemaps on current ATI GPUs: if the cubemaps are mipmapped, all mipmaps except the first one are aligned to 8 faces in memory. Now who would have guessed that? Estimating the size of a texture in memory is a non-trivial job.
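
To illustrate how much the tiling alone can move the numbers, here is a toy padded-size calculation; the tile sizes are purely illustrative and not the real values for any particular GPU:

    static size_t padded_bytes(size_t w, size_t h, size_t bpp,
                               size_t tile_w, size_t tile_h)
    {
        /* round each dimension up to a multiple of the tile size */
        size_t pw = (w + tile_w - 1) / tile_w * tile_w;
        size_t ph = (h + tile_h - 1) / tile_h * tile_h;
        return pw * ph * bpp;
    }

    /* The same 100x100 RGBA8 texture (40,000 bytes of payload):
       padded_bytes(100, 100, 4,  32, 16) -> 128 * 112 * 4 =  57,344 bytes
       padded_bytes(100, 100, 4, 256,  8) -> 256 * 104 * 4 = 106,496 bytes */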

Of course, a comprehensive proposal would be more useful, but I am just saying what is possible and what is not. Querying the size of a texture in memory is totally feasible. Querying the current memory domain (VRAM or GTT) is not so obvious; see below. However, if we had to introduce the concepts of VRAM and GTT to OpenGL, it would break the current abstraction (i.e. renderbuffer, texture, and buffer domains, like in ATI_meminfo), but it’s not as if the current abstraction in OpenGL is of much use nowadays.

Sizes of buffers/textures are constant over their lifetime, so no locking is needed; the query is not racy. Querying the domain (VRAM or GTT) is indeed racy, as it changes over time. However, if you watch the domains in real time, meaning you query them every frame, it doesn’t matter to your eye whether you see a change in the current frame or the following one. Either you get a stale answer, or you get the right answer and the domain changes right after your query; the result is equally approximate. Since knowing the domain is not all that important for your application to work, and it might change at pretty much any time, there is no need for locking either. You wouldn’t see a difference in practice anyway.

What we cannot be sure about is whether OpenGL drivers know where their buffers and textures reside. This information might be abstracted away from the userspace 3D driver and only the in-kernel GPU memory manager might know that.

Just would like to put my 2 cents here too: I think Alfonse’s argument is total pants, as you need to know how much total VRAM is available to begin with, but the “fun” does not stop there, as segmentation is an issue too… At any rate, NVIDIA and (I think) AMD do provide an extension to query the amount of memory “available”; the NVIDIA one is GL_NVX_gpu_memory_info.
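
As a rough sketch of what that query looks like (after checking the extension string; the token values below are copied from the GL_NVX_gpu_memory_info spec, which reports everything in kilobytes):

    #define GPU_MEMORY_INFO_DEDICATED_VIDMEM_NVX          0x9047
    #define GPU_MEMORY_INFO_TOTAL_AVAILABLE_MEMORY_NVX    0x9048
    #define GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX  0x9049

    GLint total_kb = 0, avail_kb = 0;
    glGetIntegerv(GPU_MEMORY_INFO_TOTAL_AVAILABLE_MEMORY_NVX, &total_kb);
    glGetIntegerv(GPU_MEMORY_INFO_CURRENT_AVAILABLE_VIDMEM_NVX, &avail_kb);
    /* avail_kb is a snapshot; it can be stale the moment it is returned. */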

Additional issues: hardware video decode, which is bound to take some memory… and one can mix hardware video decode with GL (there is GL_NV_vdpau_interop), and bits over in EGL/GLES land allow for decoding to a hardware-managed resource that can be shared with GL. Along the same lines, mixing OpenCL with OpenGL (or, for that matter, CUDA with GL). Additionally, if an application is windowed, then you can bet the OS and window manager are taking some resources too. The simple point being, the driver is likely able to give a more accurate (though potentially coarse) answer to the issue… after all, one does not want the exact number of bytes available, but rather some rough numbers to decide what detail level of resources to use. The idea of asking a user “How much VRAM does your dedicated card have?” and attempting to guess how much one is using (directly and indirectly) is absurd.

The only point of defense for Alfonse, and this is paper-thin: the GL API was originally designed so that this kind of issue would be hidden, but even that mentality was dropped a long, long time ago: glAreTexturesResident and glPrioritizeTextures (though both disappeared from GL3 core).

I could have sworn there was an AMD/ATI one, but I cannot remember/find it… was I wrong?

You are right about buffer objects, but you are not right about textures and renderbuffers. […] Estimating the size of a texture in memory is a non-trivial job.

How off would your estimates be in these cases? Would they be off by more than, say, 5%? What’s the worst-case?

Another question that is quickly raised by this is: what do you do if one platform is bigger than another?

If you’re using padded estimates (my suggestion), and you’ve built a particular budget of data sizes, the estimates will be platform-neutral. That is, they’d be sufficiently padded that they would fit on all platforms equally.

However, if you’re doing as you suggest, which is getting 100% accurate, platform specific numbers, how do you deal with determining a particular budget for your data size? Is it just a guess-and-check method, where you test everything on various different platforms and see if it fits? Or is it some kind of dynamic solution, where you avoid doing something if you go past a certain memory footprint?
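
To be concrete about the padded-estimate approach: a minimal sketch, where the overhead factor and page size are arbitrary placeholders rather than measured values:

    static size_t budget_estimate(size_t raw_bytes)
    {
        /* assume ~12.5% driver overhead, then round up to a 4 KB page */
        size_t padded = raw_bytes + raw_bytes / 8;
        return (padded + 4095) & ~(size_t)4095;
    }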

http://www.opengl.org/registry/specs/ATI/meminfo.txt
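
For reference, the ATI_meminfo queries from that spec look roughly like this (token values copied from the spec; each query returns four values, in kilobytes):

    #define VBO_FREE_MEMORY_ATI           0x87FB
    #define TEXTURE_FREE_MEMORY_ATI       0x87FC
    #define RENDERBUFFER_FREE_MEMORY_ATI  0x87FD

    GLint tex_free[4] = {0};
    glGetIntegerv(TEXTURE_FREE_MEMORY_ATI, tex_free);
    /* tex_free[0]: total free memory in the pool (KB)
       tex_free[1]: largest free block (KB)
       tex_free[2]/[3]: the same two numbers for auxiliary (GTT-like) memory */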

If you mostly use power-of-two textures, the estimates should be nearly equal to what is really allocated by a driver. NPOT and RECT textures are the problem. Their size might be off by more than 50% for small textures and more than 5% for large ones, with varying numbers in between. For a driver, it’s usually beneficial to sacrifice some memory in exchange for better performance due to a cache-friendly tiling scheme with a not-so-nice padding. And anything cache-friendly saves bandwidth. I am nearly sure all drivers do this, because performance is what matters the most.

I don’t understand the question. There is always GTT in case your data doesn’t fit in VRAM.

I don’t understand the question. There is always GTT in case your data doesn’t fit in VRAM.

The ultimate purpose of all of this is, as Dark Photon said, “How do folks reasonably prevent their apps and games from stuttering and slogging like crap (or worse, being killed outright by the OS) when they are about to run out of memory?”

If we assume that running out of VRAM is where the “stuttering and slogging like crap” comes from, then it is important to ask how you intend to use this information to correct the problem. Unless you have a dynamic solution, platform-specific texture sizes are of little value, because they’re platform specific.

I don’t think reading textures or vertex buffers from RAM causes any “stuttering and slogging like crap”. It just consumes PCIe bandwidth and fluently decreases framerate provided a GPU cannot hide latencies, no big deal. You’d be surprised how often GTT is used in practice. It is sometimes faster not to copy data which changes every draw operation to VRAM at all.

The rendering may stutter if a driver decides, let’s say, to optimize shaders at runtime or do some other crazy stuff.

I don’t think reading textures or vertex buffers from RAM causes any “stuttering and slogging like crap”. It just consumes PCIe bandwidth and fluently decreases framerate provided a GPU cannot hide latencies, no big deal. You’d be surprised how often GTT is used in practice. It is sometimes faster not to copy data which changes every draw operation to VRAM at all.

I’m not sure what “fluently decreases framerate” means, but that sounds suspiciously like dropping frames. Which is something that you as a developer of a high performance application do not want to have happen.

However, if what you have said is entirely true, if you can effectively allocate and use GPU objects without significant performance issues up to the point where OpenGL forcibly prevents you from allocating more… what is the point of any of this? Why do you care, for example, how much padding the GPU adds to a texture? How is that going to affect your program in any way?

In short, if you don’t think fitting all of your currently used assets into VRAM is important for performance, then why do you need to know how big they are?

I care about performance as well as you. Knowing whether my textures are in RAM or VRAM and how large they really are is important for performance analysis on one hardware-driver combo. It’s purely for debugging purposes. Using RAM for texturing doesn’t stutter, it’s just a little bit slower.

It’s purely for debugging purposes.

That’s really the question.

Theoretically, your artists are working with a particular budget of texture and vertex count. If you’re optimizing rendering, and you find that this budget doesn’t work on one platform vs. another, isn’t it too late to fix things then? Lowering the budget would require the artists to make some hard choices.

Wouldn’t it make more sense to set more conservative budgets from the beginning, so as to ensure that you simply will never run into this problem to begin with?

It’s generally easier to increase a budget than to shrink it.

I originally posted the suggestion because I found my engine was unreliable when tracking memory usage. Tracking the allocation of every asset and framebuffer is non-trivial and prone to error. For example, when creating a texture it’s fairly easy to estimate VRAM for a POT texture during initialisation. But what happens at runtime if mipmaps are generated? This won’t be tracked. Also, various texture compression schemes are non-trivial to get accurate size information for. Other tricky ones would be multisample textures and renderbuffers with depth or stencil of various types.
What I thought I’d like is the ability to pass GL the object ID and get GL to look up the VRAM allocation for it. Yes, my request is rudimentary, but I’d like something simple to aid in the debugging process, so that my engine has a clean and simple way of querying every object to get memory statistics.

I forgot to also mention that a texture may be requested as RGB but internally the driver may actually create an RGBA texture (NVIDIA does this). This means that my attempts to track memory usage are doomed to failure and could be 25% out in the worst case. That’s quite some margin of error!

For example, when creating a texture it’s fairly easy to estimate VRAM for a POT texture during initialisation. But what happens at runtime if mipmaps are generated? This won’t be tracked.

By the OpenGL specification, what happens is exactly what would happen if you had uploaded those mipmaps yourself. Since you are the one who called glGenerateMipmap, and the spec is very clear on which mipmaps get generated by this call, there’s no ambiguity.
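
So the mip chain is straightforward to budget for; a minimal sketch assuming tight packing and no driver padding (a full chain comes to roughly 4/3 of the base level):

    static size_t mip_chain_bytes(size_t w, size_t h, size_t bytes_per_texel)
    {
        size_t total = 0;
        for (;;) {
            total += w * h * bytes_per_texel;
            if (w == 1 && h == 1)
                break;
            if (w > 1) w /= 2;
            if (h > 1) h /= 2;
        }
        return total;
    }

    /* e.g. mip_chain_bytes(1024, 1024, 4) -> 5,592,404 bytes,
       versus 4,194,304 bytes for the base level alone (~1.33x) */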

Other tricky ones would be multisample textures and renderbuffers with depth or stencil of various types.

While multisample buffers do live inside that bubble of “implementations do whatever” that multisampling lives under in OpenGL, the sizes of depth and depth/stencil formats are not exactly hidden from you. If you get a DEPTH24 buffer, there are pretty much two options: 24bpp or 32bpp. And 32 is more likely for alignment reasons. If you get DEPTH24_STENCIL8, there isn’t even a question.
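
A conservative estimate for such a renderbuffer, assuming DEPTH24 ends up stored as 32 bits per sample (the likelier case noted above):

    /* width * height * samples * 4 bytes; treat samples == 0 as 1 */
    static size_t depth_renderbuffer_bytes(size_t w, size_t h, size_t samples)
    {
        if (samples == 0)
            samples = 1;
        return w * h * samples * 4;
    }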

I forgot to also mention that a texture may be requested as RGB but internally the driver may actually create an RGBA texture (NVIDIA does this). This means that my attempts to track memory usage are doomed to failure and could be 25% out in the worst case. That’s quite some margin of error!

You can query what the actual internal format of the texture is.
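
A minimal sketch of that query, for level 0 of the currently bound 2D texture:

    GLint fmt = 0;
    glGetTexLevelParameteriv(GL_TEXTURE_2D, 0, GL_TEXTURE_INTERNAL_FORMAT, &fmt);
    if (fmt == GL_RGBA8) {
        /* budget 4 bytes per texel even though GL_RGB8 was requested */
    }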

Furthermore, let’s say that NVIDIA does this and ATI does not for some particular RGB/RGBA format. What exactly do you plan to do about it if you detect it?

That’s what I don’t understand about this “debugging” thing. You’re talking about platform-dependent “bugs.” The only “fix” is to establish a more conservative budget. And a more conservative budget is something you can establish without knowing exact numbers. Indeed, if you ever run into these “bugs,” correcting them is going to highly annoy your artists. If they’ve been working within a budget you gave them, they’re not going to be happy about having to do a lot of work because you gave them a bad budget.

No hardware supports GL_RGB8 to my knowledge. The same for DEPTH_COMPONENT24 (which is D24X8). Some (or all?) DX9 hardware does not even support GL_RGB16, GL_RGB16F, and GL_RGB32F.