Texture Memory Usage

Hey there,

I was wondering if anyone knows how texture memory is calculated when generating a texture. I have a limited amount of memory available and not all of it can go to textures. I figured that when a texture was created it would just be the number of pixels multiplied by the bytes per pixel. Does it make sense that a 1024 x 1024 RGBA texture is not (1024x1024x4) bytes in memory?

The reason for all this is that I am generating images from a much larger image on disk. Two threads are running: one chips the image into RAM (just a big matrix) and the other renders my chips into textures. To show smooth movement, my textures are larger than the display, so while OpenGL shifts my vertices my image chipper is making the next image in RAM. When OpenGL reaches a limit, the image is turned into a texture and we move on.

That’s the scenario, in case anyone wants to suggest something other than textures for this image movement. It seemed much better than drawing the image each time.


I was wondering if anyone knows how texture memory is calculated when generating a texture?

There is no way to be sure. It’s all implementation-defined. You can guesstimate it, but that’s the best you can do.

In general, it’s:

#pixels * bpp * 1.33 if mipmapped
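As a quick sketch of that rule of thumb, here's a small estimator (the function name `estimate_texture_bytes` is just illustrative; this sums the actual mip chain, which is where the ~1.33x factor comes from):

```c
#include <stddef.h>

/* Rough texture memory estimate: pixels * bytes-per-pixel for the base
 * level, plus every mip level down to 1x1 if mipmapped. The mip chain
 * converges to roughly 1.33x the base level's size. */
size_t estimate_texture_bytes(size_t w, size_t h, size_t bpp, int mipmapped)
{
    size_t total = w * h * bpp;
    while (mipmapped && (w > 1 || h > 1)) {
        w = w > 1 ? w / 2 : 1;
        h = h > 1 ? h / 2 : 1;
        total += w * h * bpp;
    }
    return total;
}
```

For a 1024x1024 RGBA8 texture this gives 4 MB without mips and about 5.33 MB with them — but again, the driver may round up or add padding, so treat it as a floor, not an exact figure.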

What Korval said. Also…

Re your mention of “limited amount of memory available” and “RGBA8” (which is HUGE!: 4 bytes/texel!), sounds like you’re in need of some serious space savings.

How does 0.5 or 1 byte/texel sound? If I were you I’d explore using DXT compressed formats (aka S3TC). Read up on DXT1, DXT3, DXT5, LATC1, and LATC2. If you need more than 0%/100% alpha, then look at DXT5; else look at DXT1.

If you end up using one, you can query the application buffer size required to hold it (which should be pretty close to what it consumes in GPU memory) from the driver after you create it:

glGetTexLevelParameteriv( bind_target, level, GL_TEXTURE_COMPRESSED_IMAGE_SIZE, &size );

However, you’re gonna basically know the answer after you read up on these formats (DXT1/LATC1=0.5 byte/texel, DXT3/DXT5/LATC2 = 1 byte/texel). This is exact for resolutions that are multiples of 4, and a slight underestimate for those that aren’t.
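That math can be written down directly (hedged sketch; `dxt_level_bytes` is just an illustrative name):

```c
#include <stddef.h>

/* DXT/S3TC formats store 4x4 texel blocks: 8 bytes per block for
 * DXT1/LATC1, 16 bytes per block for DXT3/DXT5/LATC2. Dimensions that
 * aren't multiples of 4 round up to whole blocks, hence the slight
 * overshoot versus the simple bytes-per-texel figure. */
size_t dxt_level_bytes(size_t width, size_t height, size_t bytes_per_block)
{
    size_t blocks_x = (width  + 3) / 4;
    size_t blocks_y = (height + 3) / 4;
    return blocks_x * blocks_y * bytes_per_block;
}
```

A 1024x1024 DXT1 level comes out to 512 KB (0.5 byte/texel), versus 4 MB for RGBA8.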

If you don’t use a compressed format and don’t really need the alpha, at least consider RGB5 (2 bytes/texel – aka RGB565) for 50% space savings, or RGB8 (3 bytes/texel) for 25% savings on NVidia GPUs more recent than NV40/GeForce 6800 (earlier cards actually used RGBA8 internally for RGB8 textures – see this link – a little old though very useful for estimating GPU/driver texture memory consumption for GeForce 6 and earlier cards – wish they’d update it).

Some starter links:

Wow, now that is interesting. I will certainly read up on those formats; perhaps compression is an option. My little application is running over many images on disk, so some are layered on top of each other (and combining them all into one texture is not an option ; )

Not sure if RGBA is different than RGBA8, but we’re just using RGBA. Very simple images, it’s the other stuff that’s complex! : )

A major factor would be how fast an image can be turned into a texture. Right now there is a little stutter when moving, and it happens right when our image (in RAM) is turned into a texture. It would seem to me that compressing an image and then creating the texture, even if the texture is less memory, would take longer.

Thanks for the ideas and I’d gladly welcome more. Applications are more familiar than this OpenGL stuff : )


There’s a stutter because your image data needs to be transferred over the AGP, PCI, or PCI Express bus when downloading the image data from client side to server side. When using compressed textures there’s a performance penalty caused by the actual compression algorithm. You could turn your image data into compressed textures, read the compressed image data back into system RAM, and save it out, so that the next time you run the application you can upload the already-compressed data directly. That will increase performance because less data needs to be transferred over the bus.


Or use an offline compressor to compress the images on disk: it is slow, but provides very good quality. This one is good: http://developer.nvidia.com/object/texture_tools.html

If you specify GL_RGBA for the internal format parameter to glTexImage?D (3rd param), I think you get the driver default which I believe is GL_RGBA8 – which is 8-bits for each channel, 4 bytes/texel total.

A major factor would be how fast an image can be turned into a texture.

Unless you’re dynamically generating the texel values, you wouldn’t do this at run-time. You’d do this in a tool, and then you’d read the compressed texture off disk and upload it to the GPU directly in compressed format. Then compression speed becomes less relevant because it’s not happening at run-time. The DDS format is one common format used to store compressed textures on disk.

For off-line compression, Simon Brown’s Squish works great. If you need even more speed, check out Real-time DXT Compression. If you need even more quality than RGB DXT can support, check out Real-time YCoCg-DXT Compression.

Right now there is a little stutter when moving, and it happens right when our image (in RAM) is turned into a texture.

Yep, as -NiCo- said, this is the driver shuffling the texture from driver CPU memory into GPU memory for the first time. If you pre-compress your textures off-line to DXT1, making your textures 8X smaller, that’s 8X less data for the driver to push over the PCIx bus, so much less likely to break frame.

In practice to avoid this kind of frame breakage due to behind-the-scenes driver texture shuffling when areas with lots of texture come into view the first time, you sometimes need to pre-render with your textures before they come into view to force the driver to get off its duff and push your textures over to the GPU. It’s not enough to hand the texels to OpenGL. You must force the driver to render with it to force the GPU upload.

Another factor that could be contributing to your skippage is if you’re handing the data to the driver at run-time in a format which is non-optimal for the GPU. For instance, if you hand an NVidia driver RGB8/RGBA8 data in an RGB/RGBA external format, it’s gonna have to swizzle it to BGR/BGRA format which reduces your effective GPU upload bandwidth.
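If your data is already RGBA8 in RAM, one option is to do that swizzle yourself before the upload. A hedged sketch (the function name is illustrative; the GL calls referenced in the comment are the standard ones for this path):

```c
#include <stddef.h>
#include <stdint.h>

/* Swizzle tightly-packed RGBA8 pixels to BGRA8 in place, so the driver
 * doesn't have to do it during the upload. The buffer would then be
 * handed to glTexImage2D/glTexSubImage2D with external format GL_BGRA
 * and type GL_UNSIGNED_INT_8_8_8_8_REV instead of GL_RGBA +
 * GL_UNSIGNED_BYTE. */
void rgba_to_bgra(uint8_t *pixels, size_t pixel_count)
{
    for (size_t i = 0; i < pixel_count; ++i) {
        uint8_t r = pixels[4 * i + 0];
        pixels[4 * i + 0] = pixels[4 * i + 2];  /* B into byte 0 */
        pixels[4 * i + 2] = r;                  /* R into byte 2 */
    }
}
```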

It would seem to me that compressing an image and then creating the texture, even if the texture is less memory, would take longer.

Oh yeah, you don’t want to be DXT compressing an image at run-time if you can help it, especially if the image could have been pre-compressed and stored on disk. Pre-compress them. Besides eliminating run-time compression overhead, with pre-compressed textures you eat less bandwidth pulling them off the disk and pushing them across the PCIx bus, and you get better compression quality because you can spend more time on it when it’s off-line.

Oh man, this sounds harder to optimize than I thought!

Good call on the CPU to GPU transfer, that makes sense. I was thinking it was a simple exchange, but moving all those layers at once probably just takes some crunching to finish.

Won’t be able to compress anything beforehand; it all has to be at run time. That has been the cause of many woes before, but I’m dealing ; ) Is there a compression scheme that is less optimal but not such a runtime hit? Even if it’s not 4 bytes/texel to 1 byte/texel, any gain might be worth trying.

glTexImage2D is my call. I’m using a type for each pixel that is not quite an RGBA type. It’s a little structure that holds 4 byte values (unsigned chars), though technically it’s just a chunk of memory that “works” when I put a chunk into the texture call. Could converting to OpenGL’s type really take that much time/processing? I do need the alpha, at least for all but the bottom layer.

Thanks for all the tips, guys. (Got any more? ; )


Packing to RGB565 is cheap and easy. Conceptually, take the high 5 bits of red, high 6 bits of green, high 5 bits of blue, bit-shift them all together into a single unsigned short, and store. Then upload to OpenGL using a GL_RGB5 internal format and GL_RGB/GL_UNSIGNED_SHORT_5_6_5 external format/type. No alpha, but you get RGB in 2 bytes.
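That bit-shifting looks something like this (a minimal sketch; the function name is illustrative):

```c
#include <stdint.h>

/* Pack 8-bit R,G,B into a single 16-bit 5:6:5 value by keeping the high
 * bits of each channel. Pairs with internal format GL_RGB5 and external
 * format/type GL_RGB / GL_UNSIGNED_SHORT_5_6_5. */
uint16_t pack_rgb565(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint16_t)(((r >> 3) << 11) |  /* 5 bits of red, bits 15..11 */
                      ((g >> 2) << 5)  |  /* 6 bits of green, bits 10..5 */
                       (b >> 3));         /* 5 bits of blue, bits 4..0 */
}
```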

Allegedly there’s also an RGB5_A1 and an RGBA4, which are pretty much the same but steal bits from the color to give to an alpha channel. So that’s a simple encode too. And 2 bytes/texel.
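The encodes for those two look nearly identical (again a hedged sketch with illustrative names):

```c
#include <stdint.h>

/* Pack to 4:4:4:4 -- pairs with internal format GL_RGBA4 and external
 * format/type GL_RGBA / GL_UNSIGNED_SHORT_4_4_4_4. */
uint16_t pack_rgba4(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return (uint16_t)(((r >> 4) << 12) | ((g >> 4) << 8) |
                      ((b >> 4) << 4)  |  (a >> 4));
}

/* Pack to 5:5:5:1 -- pairs with GL_RGB5_A1 and
 * GL_RGBA / GL_UNSIGNED_SHORT_5_5_5_1. Alpha is a single on/off bit
 * (here: the high bit of the 8-bit alpha). */
uint16_t pack_rgb5a1(uint8_t r, uint8_t g, uint8_t b, uint8_t a)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 3) << 6) |
                      ((b >> 3) << 1)  |  (a >> 7));
}
```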

Short of that, consider DXT1 or DXT5. Don’t re-invent the wheel; just pick up van Waveren and Castano’s work here and here for starters, and use their code. Based on their stats (see Results sections), you should be able to compress a 1024x1024 texture in ~0.5-1.0ms on the CPU for DXT1 or ~0.7-1.5ms for DXT5 (add 33% to that for MIPmaps). If that’s not fast enough, you can allegedly get a pure compression speed-up on the GPU, but probably not worth it as you pay a hefty fine in PCIx upload time (several ms per megatexel).

Could converting to OpenGL’s type really take that much time/processing? I do need the alpha, at least for all but the bottom layer.

Well RGB565 is basically the same form as your RGB8 base layer data: discrete texels. Just throw away a few bits of RGB precision, chuck the alpha you’re not using, and you’re there.

Ditto for RGB5_A1 and RGBA4, except don’t chuck the alpha, so those are simple encodes too. If your app miraculously didn’t need any more color/alpha precision than that, these might be sufficient for your top layers.

However, DXT-compressed textures are a different beast requiring more encoding work. Unlike the above formats where texels are encoded and stored separately, with DXT they aren’t: 4x4 blocks of texels are. For the RGB side for instance, the compressor has to split your image into 4x4 blocks, and for each block, come up with the two best colors to represent the entire block by. Then what’s stored in your texture are those two best colors, along with a 2-bit value (0%/33%/66%/100%) for each texel in the block which describes where that texel is (approximately) along a line between those two points. Alpha in DXT5 is handled similarly, but with 3-bit interpolants.
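To make that block layout concrete, here is a decoder sketch for the RGB side of one DXT1 block (the c0 <= c1 punch-through-alpha mode is omitted for brevity, so treat this as the opaque-mode case only):

```c
#include <stdint.h>

/* Expand a 5:6:5 endpoint color to three 8-bit channels. */
static void expand565(uint16_t c, uint8_t rgb[3])
{
    rgb[0] = (uint8_t)(((c >> 11) & 0x1F) * 255 / 31);
    rgb[1] = (uint8_t)(((c >>  5) & 0x3F) * 255 / 63);
    rgb[2] = (uint8_t)(( c        & 0x1F) * 255 / 31);
}

/* Decode one 8-byte DXT1 block into 16 RGB texels. The block stores two
 * endpoint colors (c0, c1 as little-endian 5:6:5), then 16 two-bit
 * indices selecting c0, c1, or one of two colors interpolated at 1/3
 * and 2/3 along the line between them. */
void decode_dxt1_block(const uint8_t block[8], uint8_t out[16][3])
{
    uint16_t c0 = (uint16_t)(block[0] | (block[1] << 8));
    uint16_t c1 = (uint16_t)(block[2] | (block[3] << 8));
    uint8_t palette[4][3];
    expand565(c0, palette[0]);
    expand565(c1, palette[1]);
    for (int ch = 0; ch < 3; ++ch) {
        palette[2][ch] = (uint8_t)((2 * palette[0][ch] + palette[1][ch]) / 3);
        palette[3][ch] = (uint8_t)((palette[0][ch] + 2 * palette[1][ch]) / 3);
    }
    /* The last 4 bytes hold 16 two-bit indices, texel 0 in the low bits. */
    for (int i = 0; i < 16; ++i) {
        int idx = (block[4 + i / 4] >> (2 * (i % 4))) & 0x3;
        for (int ch = 0; ch < 3; ++ch)
            out[i][ch] = palette[idx][ch];
    }
}
```

The compressor's job is the inverse: picking the c0/c1 pair and per-texel indices that minimize the error over the whole 4x4 block, which is where the encoding time goes.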

…by the way, I’m assuming in all this that none of your texture layers are monochrome. If so, that simplifies things dramatically.

Thanks again everyone. I’m looking into using the RGBA4 or RGB5_A1 formats to speed up texture transfer to the GPU and to reduce texture memory. I think the color will survive : )

My question is how to store my image data. Currently I’m using 4 unsigned bytes so I pass along RGBA and GL_UNSIGNED_BYTE to my texture call. If I were to use this new format, would I still use RGBA and then GL_UNSIGNED_SHORT? I’m thinking so but want to make sure. I’d take the highest order bits for each color when I pack my short, right?

And another thing to clarify with you all. I could change my alpha values using RGBA4, but alpha is either on or off using RGB5_A1, is that correct? Some images need to have a variable alpha, others do not, so I guess I’d want as much color precision as I can get if alpha is not necessary.

Thanks again, let me know when you can.


Here’s some info on what combinations of internal format/type/format give you hardware accelerated transfers to (nvidia) GPUs.

internal           type                  format


Info on how to pack the pixel data is explained in section 3.6.4 of the OpenGL spec.

Thanks! The RGBA4 is working pretty well, and it certainly sped things up!

In regards to the packing of pixel data, how are the pixels unpacked? Worded differently, if OpenGL has the highest four bits of a color, how does it fill the lower 4? With zeros? A mirror? With ones?

The reason I ask is because I’d like full alpha (255) but am not sure how OpenGL interprets my value of 0x000F. I want the alpha component to be displayed as 0xFF but am worried it is displayed as 0xF0. I suppose any other way to guarantee full alpha, without using RGB, would be good too. I want it to be adjustable, but also want to cover the full range.



Dunno where in the spec it says (or if it does), but you can easily run an empirical test. Fill a texture with 0xFFFF, turn off ALPHA_TEST, LIGHTING, etc., and blast it onto the screen with REPLACE texture mode. Use xmag or similar to capture the pixels and see what they are. My guess is it replicates the high 4 bits into the low, so you should see an RGB color of FF FF FF.

Section 3.6.4

Conversion to floating-point

This step applies only to groups of components. It is not performed on indices.
Each element in a group is converted to a floating-point value according to the appropriate formula in table 2.9 (section 2.14). For packed pixel types, each element in the group is converted by computing c / (2^N − 1), where c is the unsigned integer value of the bitfield containing the element and N is the number of bits in the bitfield.
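Applying that formula to a 4-bit alpha field answers the question above: the spec divides by 2^N − 1, not 2^N, so 0xF maps to exactly 1.0 — i.e. full 255, not 0xF0. A small arithmetic sketch (the function name is illustrative):

```c
#include <stdint.h>

/* Per the spec, a packed N-bit field c becomes c / (2^N - 1) in floating
 * point. Mapping that back to 8 bits: a 4-bit 0xF becomes 15/15 = 1.0,
 * i.e. 255. The integer form below is equivalent to replicating the
 * nibble into the low bits: (c4 << 4) | c4. */
uint8_t expand_4bit_to_8bit(uint8_t c4)
{
    return (uint8_t)(c4 * 255 / 15);
}
```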

Oh man, I think I still need some help understanding this.

A problem I ran into is thus: There is an image that I clear to an obnoxious color (orange) and when I fill with RGBA values and textures, it looks fine. When I create my image with unsigned_short_4_4_4_4 values and display using RGBA4, some pixels are clear and I can easily see that terrible background color!!!

So the question is, did I do that or might OpenGL? It’s possible I did not fill a value correctly and just completely skipped that pixel, so it stayed the background color. Is there a case where OpenGL would lose a value due to the texturing conversion? It does not make sense that it would happen; it seems that any pixel of data will be turned into a pixel of texture, especially if my alpha stays high at 255 on my side.

Thoughts? And thank you very much again for all the replies, it is much appreciated : )



Problems with pixel store alignment and packing perhaps? That can bite sometimes.
Check OpenGL pitfall “7. Watch Your Pixel Store Alignment” here :

Long story :
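The pitfall in numbers, as a hedged sketch (the helper name is illustrative): OpenGL assumes each source row starts on a GL_UNPACK_ALIGNMENT boundary, which defaults to 4 bytes.

```c
#include <stddef.h>

/* Bytes OpenGL assumes per source row: the raw row size rounded up to
 * the current GL_UNPACK_ALIGNMENT (default 4). If your rows are tightly
 * packed and the raw size isn't a multiple of the alignment, either call
 * glPixelStorei(GL_UNPACK_ALIGNMENT, 1) before the upload or pad your
 * rows to match -- otherwise the driver reads past each row's end and
 * the texture shears or shows stray pixels. */
size_t unpack_row_bytes(size_t width, size_t bytes_per_pixel, size_t alignment)
{
    size_t raw = width * bytes_per_pixel;
    return (raw + alignment - 1) / alignment * alignment;
}
```

Note that a 2-byte type like GL_UNSIGNED_SHORT_4_4_4_4 only misaligns on odd widths, so this mostly bites with 3-byte RGB data or odd-width images.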