pBuffers vs renderbuffers (FBO) - memory usage

Would I be right in thinking that pbuffers & renderbuffers both exist in ONLY vidmem?

What I would like to do is utilise video memory, but WITHOUT affecting system ram. Is this possible?

An ideal solution would be volatile textures (which have no system ram counterpart), but this is not possible right now - all textures reside in both system & video memory. Are you listening ARB?

However, it occurred to me that pBuffers & FBO renderbuffers (according to the spec) exist in pbuffer/framebuffer memory, and pbuffers at least ARE volatile, in that they can be lost on a mode change. This implies that no copy is kept in system memory.

Is my thinking above correct? Or am I way off the mark?

What I’m doing is basically a colour-managed image editing app which supports dual monitors. There will be an original image in memory, a series of adjustments, and a final image. The final image data then needs to be converted into each monitor’s colour space before being displayed. Using GDI this is simple - just convert on the fly when blitting to the screen. However, it would be ideal if I could use OpenGL to store the converted image data for each monitor - but without impacting system ram. I could then also move the colour management to fragment shaders at a later stage.
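To illustrate the kind of per-pixel work involved (before moving it to a fragment shader), here’s a minimal C sketch of pushing pixels through a 3x3 monitor matrix. The function names and the matrix-based approach are illustrative assumptions only - real colour management would go through ICC profiles or LUTs:

```c
#include <stddef.h>

/* Hypothetical sketch: convert 8-bit RGB pixels from the working
 * space into a monitor's colour space via a 3x3 matrix. */
static unsigned char clamp8(double v)
{
    if (v < 0.0)   return 0;
    if (v > 255.0) return 255;
    return (unsigned char)(v + 0.5);
}

void convert_to_monitor(const unsigned char *src, unsigned char *dst,
                        size_t pixels, const double m[3][3])
{
    for (size_t i = 0; i < pixels; ++i) {
        double r = src[i * 3 + 0];
        double g = src[i * 3 + 1];
        double b = src[i * 3 + 2];
        dst[i * 3 + 0] = clamp8(m[0][0] * r + m[0][1] * g + m[0][2] * b);
        dst[i * 3 + 1] = clamp8(m[1][0] * r + m[1][1] * g + m[1][2] * b);
        dst[i * 3 + 2] = clamp8(m[2][0] * r + m[2][1] * g + m[2][2] * b);
    }
}
```

The same matrix multiply maps almost line-for-line onto a fragment shader, which is what makes the GPU route attractive here.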

Any thoughts or comments would be much appreciated.

Regards
Mark

You can never really be sure it does not affect system ram, and OpenGL provides no apparent way to guarantee this (though I don’t know about OpenGL 3.0). It may be the case that it doesn’t when using renderbuffers, and possibly pbuffers, though probably not textures.
Either way, it’s always best to assume it does and work around that.
You could always write a test app.

OSes these days have VRAM virtualized. Therefore, the drivers have to have somewhere to page off the pBuffer or renderbuffer when there is enough VRAM pressure to force a pageoff. Drivers may implement this in different ways: one may pre-allocate the backing store, another may defer that allocation until a pageoff is forced. Depending upon the design of the driver, objects may be pre-allocated if the backing store is going to be mapped into the task’s address space. If there is no case where the object would benefit from, or be capable of, being directly accessed by the task, then its backing store allocation may be deferred until pageoff. By having the backing store, one also potentially gets the fallback case of allocating the object in GART when VRAM is too full to allocate anything else. So, with modern PCs and OSes, there is almost no possibility for objects to ONLY exist in VRAM. Many clients may be using the GPU besides your GL app, so the driver has to be able to page off your objects to make room for the objects needed by Quake 4, modo, a window manager or whatever else.

We’re listening.

OSes these days have VRAM virtualized.

Um, which OSes? Certainly not WinXP. So relying on VRAM virtualization is likely not practical for the near future (3 years or so).

We’re listening.

Apple providing an extension doesn’t quite amount to the ARB listening. However, last information on GL 3 suggested that they would offer a parameter for various GL objects that would allow you to select whether the object has a backing store or not.

However, last information on GL 3 suggested that they would offer a parameter for various GL objects that would allow you to select whether the object has a backing store or not.

I really hope so. I’ve been treading water, so to speak, with a commercial app - delaying it in the hope we’d have GL3 specs (and sample drivers) by now. It’d be very simple to specify GL3 for the optimum user experience, as opposed to GL2.1, or GL2 + extension, or GL1.x + this, that and the other. It’s very difficult for a consumer to understand.

I agree about virtualising video ram too - Vista is yet to gain critical mass, and even if it does you can switch off dwm (at least in fullscreen mode).

My real desire for video-memory-only objects, though, is that there are many machines out there running Vista with 1GB of RAM - and Vista takes over 600MB of it. These machines often have 512MB of video RAM, which is difficult to make use of without the pagefile getting involved.

arekkusu… you work for apple then?

you can switch off dwm

I doubt that this will turn off VRAM virtualization. That’s a driver-level thing, and I don’t think they force IHVs to write several versions of their drivers.

OS X has had it since pretty much day one. Microsoft now has it as well in Vista.

I don’t see how they are going to keep a backup in RAM for FBOs without affecting performance. I’m pretty certain there is no backup in RAM; if the FBO is lost, the driver can recreate it, but the pixels would either be garbage or some specific values.

Probably the only thing they keep in RAM is the fact that “some FBO exists with such-and-such attachments”, which probably consumes 100 bytes of RAM per FBO.

Well, my ‘guess’ is that any textures that are attached ARE stored in RAM (at least, RAM is allocated to hold them in case the driver needs to do some swapping), but renderbuffers (colour buffers) are not. This would tie in with pbuffers, which are volatile in nature.

I think it boils down to the conceptual difference between framebuffer memory & texture memory. Textures are swappable, framebuffers aren’t. Unless you virtualise VRAM…

However, all this is pure speculation on my part, and I may be well off the mark here.

As an aside, up above I mentioned turning off dwm. By doing this you ease pressure on VRAM - it’s far less likely that Vista will force the graphics driver to swap out MY image data for the OS’s. And it’s not like I expect someone to be playing Quake while running my app, which is a photo management/editing package.

Mark

Just to clarify what I WANT to do. For example, part of my app allows the user to browse photo thumbnails, up to 252*168 in size. One of these takes up ~165KB - multiply that by a thousand and it’s a lot of memory, and system RAM will be precious. By using a volatile texture these thumbs would only impact VRAM, and take negligible CPU resources to display. Now, we know this isn’t possible (yet?), but if renderbuffers live only in VRAM, I can DrawPixels into the backbuffer etc. The CPU will be doing a lot of other work, so anything I can offload will result in a significant performance increase overall.
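For what it’s worth, the arithmetic above checks out, assuming 4 bytes per pixel (RGBA):

```c
/* Rough memory footprint of the thumbnails described above,
 * assuming 4 bytes per pixel (RGBA). */
unsigned long thumb_bytes(unsigned w, unsigned h, unsigned bytes_per_pixel)
{
    return (unsigned long)w * h * bytes_per_pixel;
}

/* thumb_bytes(252, 168, 4) == 169344 bytes, i.e. ~165KB.
 * A thousand of them: ~161MB - a big chunk of a 1GB machine's
 * system RAM, but a comfortable fit in 512MB of VRAM. */
```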

Yes.

No, but we have to start somewhere. Remember the traditional path to get features promoted to the core?

On an OS with properly virtualized VRAM:

FBO attachments are just textures. They have to be virtualized like any other resource. Renderbuffers can be implemented as just another texture, albeit perhaps with different storage layout depending on the POT/NPOT stride/twiddle/mipmap capabilities of the renderer.

Surfaces in general (window, pbuffer, renderbuffer) need to be virtualized. If you dirty a surface and another process wants to use all available VRAM during its timeslice, you need to evict, then restore when the first process returns. If your performance tanks due to surface paging, your system’s VRAM is overcommitted for the (set of) application(s) running on it.

“Purgeable” resources give an app the option to declare: “don’t page off my dirty surface, I’ll recreate it as needed.”

arekkusu… Thanks for that. I only wish there was an ARB version - whether or not VRAM is virtualised (Vista vs XP), it’s the perfect solution. So I’ll say again… are you listening ARB? :)

One more question: when you do page out VRAM, does it go into pre-allocated RAM, or does the OS allocate on demand? I’d guess that Vista would be similar.

Sorry for asking seemingly windows specific questions, but they’re not really - I’m just after some insight into driver memory management wrt fbo renderbuffers/pbuffers.

Many thanks
Mark

It all depends on the type of object as to whether it is paged off to preallocated RAM. Mark, you say that you do not expect somebody to be playing a game while running your app, but the driver cannot be sure of this. What happens when the computer is put to sleep? Some things are turned off, and quite possibly one of those things is your video card. So the driver will need to page off EVERYTHING so that when you open the laptop, everything pops back up the way it was. It pages everything back on and voila.

For the app you are talking about, why not have a preallocated number of textures, enough to fill the screen once or twice over, then TexSubImage into those as new images come into view that are not already in a texture?

The game Enemy Territory: Quake Wars uses this same concept for its whole MegaTexture system. Textures are allocated up front, and as a new tile comes into use, a texture from the free list is given to that tile. The tile’s data is then TexSubImage’d into the corresponding texture. As long as the video card is not facing too much VRAM pressure, paging will not occur. Your system memory usage is then also bounded by the number of textures that are preallocated. Just be sure to stay on the fast path for updating these textures so that any kind of orphaning does not come into play.
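The free-list scheme described above can be sketched as follows. This is just the bookkeeping; the actual GL calls (glGenTextures up front, glTexSubImage2D when a tile is assigned) are left as comments, and all the names here are made up for illustration:

```c
#define POOL_SIZE 64

typedef struct {
    unsigned id;      /* would be a GL texture name from glGenTextures */
    int      in_use;
} TexSlot;

typedef struct {
    TexSlot slots[POOL_SIZE];
    int     free_list[POOL_SIZE];
    int     free_count;
} TexPool;

/* Allocate every texture up front, so system memory usage is bounded. */
void pool_init(TexPool *p)
{
    for (int i = 0; i < POOL_SIZE; ++i) {
        p->slots[i].id = (unsigned)(i + 1);  /* stand-in for glGenTextures */
        p->slots[i].in_use = 0;
        p->free_list[i] = i;
    }
    p->free_count = POOL_SIZE;
}

/* Hand a free slot to a new tile; returns -1 if the pool is exhausted.
 * The caller would glTexSubImage2D the tile into slots[i].id. */
int pool_acquire(TexPool *p)
{
    if (p->free_count == 0) return -1;
    int i = p->free_list[--p->free_count];
    p->slots[i].in_use = 1;
    return i;
}

/* Return a slot to the free list when its tile scrolls out of view. */
void pool_release(TexPool *p, int i)
{
    p->slots[i].in_use = 0;
    p->free_list[p->free_count++] = i;
}
```

Since textures are reused rather than created and destroyed, the driver never has a reason to grow (or churn) the backing store beyond the initial allocation.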

The one downside to the Apple extension is that it requires a flush every time you want to unpurge any object. This is again due to the nature of virtualized vram and nothing being definite until the kernel actually processes the commands.


I thought virtualization meant that not all of the mipmap chain needs to be in VRAM in order to sample from some mipmap level. So I don’t see the relevance of backing up an FBO to RAM.
If every frame you render to your FBO and then unbind it, and the driver has to copy the texture and its mipmaps back to RAM, that sounds like a performance loss.

I suppose it is the same issue with dynamic VBOs.
glMapBuffer is not likely giving you an address in VRAM.
When you unmap the buffer, the driver will copy to VRAM later on.

I thought virtualization meant that not all of the mipmap chain needs to be in VRAM in order to sample from some mipmap level. So, I don’t see the relevance with backing up a FBO to RAM.

The reason you need a backing store for images and VBOs is that video memory is (on pre-Vista Windows OSes) volatile. That is, if your application no longer has input focus, you are given no guarantee that your stuff is still there.

Virtualizing VRAM means that you are guaranteed that your stuff will be there. So there’s no need to keep a copy around in main memory.

Virtualizing VRAM means that you are guaranteed that your stuff will be there. So there’s no need to keep a copy around in main memory.

Question aside: is there any hardware yet that offers truly virtualized VRAM? Hasn’t this been a promise of DX10? I don’t see it realized yet :(

Is there any hardware yet that offers truly virtualized vram?

What do you mean by “truly virtualized”? The kind of virtualization under discussion is an OS/Driver-level thing; it’s got nothing to do with the hardware.

I meant the hardware solution: the on-chip VRAM is just seen as a (small) cache, while the application sees a virtual VRAM of, say, 16GB. The hardware transparently manages the fine-grained upload of missing pages (which have a small size, like 4-64KB or so), either from system RAM or disk.

It would make software-side “virtual texturing” like Id’s MegaTexture (and its successor system in Rage) much easier to implement, if not superfluous.