Texture Compression

Anyone have any clue as to why my call to

   glCompressedTexImage2DARB(target, 0, format, 
            pTex->width, pTex->height, 0, pTex->size, pTex->pixels);

might be failing? I know the border has to be set to 0, and it is…but I’m getting an error return of GL_INVALID_OPERATION.

I’m trying to compress the image using GL_COMPRESSED_RGBA_S3TC_DXT5_EXT, if that helps or means anything to anyone.

The target is GL_TEXTURE_2D.

Here’s what I do.
One thing to watch out for is mipmaps, because the smallest block is 4x4 pixels (not 1x1).

int size = (w * h);	/* DXT5 averages 1 byte per texel */
if ( bt.compression_used )
	if ( size < 16 ) size = 16;	/* a level is never smaller than one 16-byte block */
fread( (pixels+total_size), sizeof( GLubyte ), size, file );
glCompressedTexImage2DARB( GL_TEXTURE_2D, i, GL_COMPRESSED_RGBA_S3TC_DXT5_EXT, w, h, 0, size, (pixels+total_size) );
total_size += size;

To clarify, you still have 2x2 and 1x1 mipmaps, but each of them occupies one full block. The block still holds 4x4 pixels, of which only the upper-left 2x2 or 1x1 pixels are used. Also make sure the size parameter (imageSize) is correct.
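The block rule above can be sketched as a small size calculation. This is only an illustration: the function and constant names are made up for this example, and it assumes the usual S3TC layout of 8 bytes per 4x4 block for DXT1 and 16 bytes for DXT3/DXT5.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes per 4x4 block: 8 for DXT1, 16 for DXT3/DXT5. */
enum { DXT1_BLOCK_BYTES = 8, DXT5_BLOCK_BYTES = 16 };

/* Size in bytes of one S3TC mip level (hypothetical helper name).
   Width and height are rounded up to whole 4x4 blocks, so the 2x2 and
   1x1 levels still occupy one full block. */
size_t s3tc_level_size(int width, int height, size_t block_bytes)
{
    assert(width > 0 && height > 0);
    size_t blocks_x = (size_t)(width + 3) / 4;
    size_t blocks_y = (size_t)(height + 3) / 4;
    return blocks_x * blocks_y * block_bytes;
}

/* Total bytes for a full mip chain from width x height down to 1x1. */
size_t s3tc_chain_size(int width, int height, size_t block_bytes)
{
    assert(width > 0 && height > 0);
    size_t total = 0;
    for (;;) {
        total += s3tc_level_size(width, height, block_bytes);
        if (width == 1 && height == 1)
            break;
        if (width > 1)  width /= 2;
        if (height > 1) height /= 2;
    }
    return total;
}
```

The value returned by s3tc_level_size is what would be passed as the size argument of glCompressedTexImage2DARB for that level. Unlike a plain w * h, it also covers non-square levels such as 8x2, which take two blocks.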

Are you trying to upload pre-compressed data, and it fails? Then one of your parameters is not right.

Or are you trying to make the driver compress data that you have, which is uncompressed? If so, then you should use TexImage with the COMPRESSED internal format and NULL data, then use TexSubImage() with RGB external format, and the driver will compress for you (although it’ll be slow and the image quality will be poor).

The image data is uncompressed; so, I’ve got uncompressed image data in system memory (pTex->pixels) and I’m trying to upload it to the board compressed. The call to glCompressed… should compress the image data on upload, should it not?

Sorry, I’ve not worked with compressed textures before… It may very well be that I have no idea what I’m talking about.

EDIT: jwatte -
I tried what you suggested, here’s the call


Before the call I bind the texture that I’m trying to compress…
Anyway, this call returns GL_INVALID_ENUM.
???, any ideas?

glCompressedTexImage2D is for uploading precompressed texture data. If you want the GL to compress it, use glTexImage2D with the internalFormat set to a supported compressed format. Make sure to use a format supported by the driver:

glGetIntegerv(GL_NUM_COMPRESSED_TEXTURE_FORMATS, &formatCount);
glGetIntegerv(GL_COMPRESSED_TEXTURE_FORMATS, formatArray);
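Once the list is retrieved, checking it is a plain array scan. The sketch below is only an illustration: compressed_format_supported is a made-up helper name, GLint is assumed to be a plain int, and the enum value is defined locally only so the example compiles without GL headers.

```c
#include <assert.h>
#include <stddef.h>

/* Value from the EXT_texture_compression_s3tc extension; defined here only
   so the sketch builds without glext.h. */
#ifndef GL_COMPRESSED_RGBA_S3TC_DXT5_EXT
#define GL_COMPRESSED_RGBA_S3TC_DXT5_EXT 0x83F3
#endif

/* Returns 1 if `wanted` is in the list reported by the driver, else 0.
   `formats` and `count` are expected to come from the two glGetIntegerv
   queries (GL_COMPRESSED_TEXTURE_FORMATS and
   GL_NUM_COMPRESSED_TEXTURE_FORMATS). */
int compressed_format_supported(const int *formats, int count, int wanted)
{
    assert(count >= 0);
    for (int i = 0; i < count; ++i) {
        if (formats[i] == wanted)
            return 1;
    }
    return 0;
}
```

With a current GL context, formatArray would be allocated with formatCount entries and filled by the second query before calling compressed_format_supported(formatArray, formatCount, GL_COMPRESSED_RGBA_S3TC_DXT5_EXT).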

I get really tired of seeing apps that assume if GL_ARB_texture_compression is supported that they can use GL_COMPRESSED_RGBA_S3TC_DXT1_EXT. :frowning: It is completely valid for a driver to advertise GL_ARB_texture_compression (or GL_VERSION >= 1.3) but not support any compressed formats. The spec was written to specifically allow that.

If you don’t want to bother determining which compressed formats are supported, you can use GL_COMPRESSED_RGBA (or one of the other generic compressed formats) and the driver will pick one for you. You can query the texture (glGetTexLevelParameteriv with GL_TEXTURE_INTERNAL_FORMAT) to find out what the actual format is. This is useful if you’re going to read back the compressed texture with glGetCompressedTexImage.

Okay, don’t take this the wrong way, but this is basic OpenGL. The internalFormat (the 3rd parameter to glTexImage2D) is the format you want the texture to be on the card. The format / type (the 7th and 8th parameters) describe the format of the texture data you are passing in. You want the data to be GL_COMPRESSED_RGBA_S3TC_DXT5_EXT, but you have it as an array of unsigned bytes representing RGBA texels.

glTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGBA_S3TC_DXT5_EXT, width, height, 0, GL_RGBA, GL_UNSIGNED_BYTE, ptr_to_uncompressed_texture);

idr, thanks.
that helped.


I’d like to use glCopyTexSubImage2D to copy data to a texture. I’ve a pool of >50 textures and wonder if it’s possible to let the GPU compress the data and then write it to the texture? (to save VRAM)

Can I instantiate my textures like this:

and use something like that:
glCopyTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 0, 0, m_poolTexWidth, m_poolTexHeight);

to get a compressed texture? Will the GPU compress it, or is it passed to the driver/CPU?

Is there a better way to have a huge amount of textures, but save VRAM?

Really? Was there a need to unearth a thread 6 years old? You couldn’t just create a new thread?

Will the GPU compress it, or is it passed to the driver/CPU?

Implementation dependent, but I’m guessing that it’ll be done on the CPU.

Is there a better way to have a huge amount of textures, but save VRAM?

I’m not sure what you mean. CopyTexSubImage copies pixel data from the framebuffer to the given texture. Both the framebuffer and the texture are in VRAM, so I don’t know how this is saving you anything.

It’s true that I had never used GL_COMPRESSED_RGBA_S3TC_DXT5_EXT before reading this thread … :slight_smile:

Like a lot of other things … :frowning:

Does something like this exist?


Something that can easily handle a 3D texture made of 2D IPBBPBB… compressed picture slices (where the Z dimension is time) and allows compressed data to be shared directly between RAM and VRAM?

What is the “best but standardised internal format” for video display and for sharing between the GPU and the CPU?


Does something like this exist?



Something that can easily handle a 3D texture made of 2D IPBBPBB… compressed picture slices (where the Z dimension is time) and allows compressed data to be shared directly between RAM and VRAM?


S3TC, and all the variations thereof, are formats designed for a specific purpose: fast texture access. The decompression algorithm is both braindead-simple and very localized. You can easily decompress any 4x4 block of the texture, and doing so only requires exactly 64 or 128 bits. It requires only some table accesses and integer math to decompress the images. It is regular, fast, and easy to implement in hardware.
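To make the "table accesses and integer math" point concrete, here is a rough sketch of decoding a single 64-bit DXT1 block. The type and function names are invented for this example, and the RGB565 expansion by bit replication is an assumption: hardware may round the interpolated colours slightly differently.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } Rgba8;

/* Expand packed RGB565 to 8 bits per channel by replicating the top bits. */
static Rgba8 rgb565_to_rgba8(uint16_t c)
{
    unsigned r = (c >> 11) & 0x1Fu, g = (c >> 5) & 0x3Fu, b = c & 0x1Fu;
    Rgba8 out;
    out.r = (uint8_t)((r << 3) | (r >> 2));
    out.g = (uint8_t)((g << 2) | (g >> 4));
    out.b = (uint8_t)((b << 3) | (b >> 2));
    out.a = 255;
    return out;
}

/* Weighted average (a*wa + b*wb) / (wa + wb) per channel, integer math only. */
static Rgba8 mix_rgb(Rgba8 a, unsigned wa, Rgba8 b, unsigned wb)
{
    Rgba8 out;
    out.r = (uint8_t)((a.r * wa + b.r * wb) / (wa + wb));
    out.g = (uint8_t)((a.g * wa + b.g * wb) / (wa + wb));
    out.b = (uint8_t)((a.b * wa + b.b * wb) / (wa + wb));
    out.a = 255;
    return out;
}

/* Decode one 8-byte DXT1 block into 16 texels in row-major order.
   Layout: two little-endian RGB565 colours, then sixteen 2-bit indices. */
void dxt1_decode_block(const uint8_t block[8], Rgba8 out[16])
{
    assert(block != 0 && out != 0);
    uint16_t c0 = (uint16_t)(block[0] | (block[1] << 8));
    uint16_t c1 = (uint16_t)(block[2] | (block[3] << 8));
    uint32_t bits = (uint32_t)block[4] | ((uint32_t)block[5] << 8) |
                    ((uint32_t)block[6] << 16) | ((uint32_t)block[7] << 24);

    Rgba8 pal[4];
    pal[0] = rgb565_to_rgba8(c0);
    pal[1] = rgb565_to_rgba8(c1);
    if (c0 > c1) {
        /* Four-colour mode: two colours interpolated at 1/3 and 2/3. */
        pal[2] = mix_rgb(pal[0], 2, pal[1], 1);
        pal[3] = mix_rgb(pal[0], 1, pal[1], 2);
    } else {
        /* Three-colour mode: midpoint plus transparent black. */
        pal[2] = mix_rgb(pal[0], 1, pal[1], 1);
        pal[3].r = pal[3].g = pal[3].b = pal[3].a = 0;
    }

    /* Each texel is a 2-bit lookup into the 4-entry palette. */
    for (int i = 0; i < 16; ++i)
        out[i] = pal[(bits >> (2 * i)) & 3u];
}
```

Nothing outside the 8 bytes of the block is read, which is the locality property described above: any 4x4 region can be decoded on its own.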

JPEG is not. It, like MPEG and such, is designed for a different purpose. You cannot easily decompress a section of a JPEG image; you pretty much have to do the whole thing. JPEG requires high-end math to decompress.

Of course, JPEG is a better image compression format in terms of overall quality. But hardware for doing texture accesses from JPEG compressed images would be very complex, expensive, and slow. The formats used for compressed textures are those that are designed to be implemented in hardware, not things designed for the convenience of the user.

Yann, sometimes it’s really insightful to read through glext.h (the latest version). In this case, for example, you should Ctrl+F for “compressed” there ;).
glext.h contains all extensions, and so shows the complete functionality that could be available at the moment.

First, thanks for your replies!!! :slight_smile:

My policy is to:
a) search the forum for existing threads
b) if there are no search results, open a new thread

Is it considered bad manners to add something to an older thread? As long as the topic fits, IMHO it’s better to group things that belong together into a single thread… anyhow… :whistle:

I understand that both the framebuffer and the texture consume VRAM. The problem is that the textures consume too much!

Background: I’m working on a video effect plug-in which gets called 50 times per second and is given a handle to a texture (which is already on the GPU). Video frames are stored in the texture. My effect should delay the video, so I copy the current texture to a buffer (in VRAM) and read out a past frame to be processed now.
The question is how to maximize the number of frames in the buffer without copying them to RAM (as copying to RAM performs very badly).
My thought was to compress those textures in the buffer. Not sure whether that makes sense or not.

Maybe there is no solution to this specific problem?

3D graphics changes so rapidly that bringing up such an old thread is usually seen as bad practice: people unfamiliar with the thread will start reading at the beginning and wonder why it is full of outdated information before they realize that the thread is very old.

Other forums on other subjects may handle this differently, but here it is better to create a new thread if the threads you found are older than, say, 6 months.

If you find a topic that is very old but still comes very close to your problem, link to it in your post, so that people see there has been a discussion about it before, while making it clear that everything it contains might be very outdated.


The question is how to maximize the number of frames in the buffer without copying them to RAM (as copying to RAM performs very badly)

No, the question is why are they textures to begin with? If your intent is to process these frames in some way on the CPU, then they should just be RGB data stored in main memory. It’s a waste of time to upload the image data after decompression, only to download it, modify it and then re-upload it.

If you’re trying to use a shader to process the image, then uploading it to a texture is the right way to go. Otherwise, don’t do it. Just leave it in main memory until you are ready to draw it.


I’m running some tests on something like DXT1, but adapted to the YCbCr color space instead of RGB (i.e. with 16-bit yuv844 colors instead of rgb565)
=> the compression ratio is still 8:1, but the quality seems better / more realistic
(I haven’t tested yuv655 or the other possible 16-bit YUV formats, but I expect to find a better 16-bit YUV format soon)

I think this is mainly because a grey gradient is much more visible to the eye than the color shift we get when interpolating between C0 and C1 to produce C2 and C3.

The Y parts of the C0 and C1 colors are really easy to find
=> they are the minimum and maximum values of the current 4x4 block in the Y plane of the YUV picture generated by libavcodec or v4l, for example
(whether the format is 4:2:2, 4:2:0 or 4:2:1 doesn’t matter for the Y plane, because the Y plane is always the same; only the Cb and Cr planes are subsampled differently between these formats)
==> with MMX instructions, this is really very fast
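The min/max search described here could look roughly like the following scalar version (no MMX). The function name and parameters are hypothetical, and it assumes an 8-bit Y plane whose width and height are multiples of 4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Find the minimum and maximum luma of the 4x4 block at block coordinates
   (bx, by) in an 8-bit Y plane with `stride` bytes per row. These two values
   would become the Y parts of the C0 and C1 endpoints. */
void block_luma_range(const uint8_t *y_plane, size_t stride,
                      size_t bx, size_t by,
                      uint8_t *y_min, uint8_t *y_max)
{
    assert(y_plane != NULL && y_min != NULL && y_max != NULL);
    const uint8_t *row = y_plane + (by * 4) * stride + bx * 4;
    uint8_t lo = 255, hi = 0;

    for (int j = 0; j < 4; ++j, row += stride) {
        for (int i = 0; i < 4; ++i) {
            uint8_t v = row[i];
            if (v < lo) lo = v;
            if (v > hi) hi = v;
        }
    }
    *y_min = lo;
    *y_max = hi;
}
```

A SIMD version would do the same comparisons on several bytes at once; the scalar loop is only meant to show what is being computed.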

My current problem is finding the best CbCr line along which I can interpolate the UV parts of C0 and C1 to get C2 and C3
=> it’s like interpolating a rainbow :frowning:
==> but I think it may be possible by interpolating Cb and Cr alternately instead of trying to interpolate a rainbow :slight_smile:
(and this doesn’t add any extra bytes to this new DXT video texture format …)

Once I’ve solved this problem for “YUV DXTed” I frames (i.e. frames that don’t depend on the next or previous frames), I plan to start handling P and B frames to “really” compress the video texture :slight_smile:
(my goal is a compression ratio somewhere between 12:1 and 25:1 with this “DXT-like video texture format”, to really reduce the memory needed on netbooks/PDAs, which have very little memory)

Alphonse, I want to handle a lot of textures/frames in the same shader … otherwise it’s too easy, because I already have that :slight_smile:
(i.e. two 3D textures built from successive 2D frames from AV1 and AV2, where the r dimension of the 3D textures is an index into time, i.e. a timestamp)
=> I want to handle roughly 50 frames in this shader, so that the shader can directly mix two videos over about one second before I have to reload the AV1 and AV2 “frame texture packs” (i.e. GOPs generated by libavcodec or v4l) into the GPU
==> this adds a latency equal to the number of pictures in the Group Of Pictures, but multiplies the possibilities for streaming / compression / decompression / special effects in the shader …
(with a GOP of 4, I find it is not too perceptible, but with a GOP of 8 or more it starts to be really perceptible … and GOPs in video files are generally much longer than 8 :frowning: )
===> from the OpenGL point of view, a GOP of numerous MPEG/JPEG/AVI/V4L video frames is not really the same thing to handle as one or two small, independent RGB frames (but only for now, I think …)
====> but the fact that this can give superior quality with much less RAM/VRAM usage and CPU load gives me plenty of good reasons to continue my research on it :slight_smile: :slight_smile:

Personally, I find the RGB color space very bad for handling video pictures … the YUV/YCbCr color space is much better suited to video textures
(and with luminance/chrominance built in, we can easily adjust the video stream exactly as we would adjust color/intensity with the knobs on a TV …)

And I want a memory location that I can modify with the CPU but that the GPU can access directly
(to bypass the CPU->GPU memory transfer if possible … I think AGP memory or something like it is probably best suited to this)
=> but grouping multiple consecutive RAM => VRAM transfers into a single one also seems to me a very efficient way to cache this


Best depends on your application. Is your data SDR or HDR? Are you requiring use of GL or are other GPU APIs OK? What format are you coming from? What are your performance constraints? Is quality, memory, or speed more important?

If you’re requiring use of GL, best for space/bandwidth is of course the GPU-supported compressed texture formats such as DXT1 and DXT5 (ringing in at a mere 0.5 and 1.0 byte/texel, respectively). You can store std RGB color space in these, or store alternate color spaces in DXT5 such as YCoCg for better quality.
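As an illustration of the alternate color space idea, here is a small sketch of an RGB to YCoCg conversion. Note the assumption: it uses the lossless integer variant usually called YCoCg-R, chosen because it round-trips exactly; the post does not say which YCoCg variant is meant, and the DXT5 compression applied afterwards is still lossy. All names are made up for this example.

```c
#include <assert.h>

typedef struct { int r, g, b; } Rgb;
typedef struct { int y, co, cg; } YCoCg;

/* floor(v / 2) for negative values too, without relying on >> of negatives. */
static int floor_half(int v)
{
    return (v >= 0) ? v / 2 : -((-v + 1) / 2);
}

/* Forward YCoCg-R transform for 8-bit inputs.
   Y stays in 0..255; Co and Cg range over -255..255. */
YCoCg rgb_to_ycocg_r(Rgb in)
{
    assert(in.r >= 0 && in.r <= 255);
    assert(in.g >= 0 && in.g <= 255);
    assert(in.b >= 0 && in.b <= 255);
    YCoCg out;
    out.co = in.r - in.b;
    int t = in.b + floor_half(out.co);
    out.cg = in.g - t;
    out.y = t + floor_half(out.cg);
    return out;
}

/* Inverse transform: undoes the steps above in reverse order. */
Rgb ycocg_r_to_rgb(YCoCg in)
{
    Rgb out;
    int t = in.y - floor_half(in.cg);
    out.g = in.cg + t;
    out.b = t - floor_half(in.co);
    out.r = out.b + in.co;
    return out;
}
```

For storage in a DXT5 texture, the three components would still have to be scaled and biased into unsigned 8-bit channels, with Y typically placed in the alpha channel because it gets its own interpolation there.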

However, if you’re coming from already compressed MPEG or MPEG-like video (especially something like H.264) and aren’t insisting on GL, you’ll likely get much better perf using a library like NVidia’s VDPAU, or XvMC, to feed video to the GPU. MythTV, for instance, uses these for GPU-assisted video playback, when enabled and available.

Thanks Dark Photon :slight_smile:

I’ve now looked at the specs for VdpVideoSurfaceGetBitsYCbCr, VdpVideoSurfacePutBitsYCbCr and other functions such as VdpOutputSurfacePutBitsYCbCr; this seems to be about what I want :slight_smile:

Where can I find a complete but simple, working sample/tutorial that uses this in C/C++?

Because the pseudo-code looks cool, but I don’t really know how to compile it with gcc or g++ :slight_smile:
=> I want to read a video file frame by frame (the file format can be .avi, .mpg, .mov or /dev/video, for example) and output each image in a “compressed but standardised and user-friendly internal format” into a queue in memory (so that I can easily and quickly decompress one picture, modify it and rewrite it in a compressed format, all “on the fly”).

One thread fills a frame queue as it reads a video file, and multiple other threads can read this frame queue (and/or mix multiple input queues and output the mix into another picture queue, or display it directly on numerous and various video-texture-mapped 3D OpenGL shapes).

For now, this is only for SD resolutions (CIF, QCIF, 4CIF and the like) on 1 to 32 bpp surfaces (B&W to RGBA8, with the YCbCr format in between), but I’d also like to support 9CIF/16CIF or HDR versions such as 1920x1080 or more, in multiple views and in float or double formats too :slight_smile:

I want to display/stream, “not too slowly”, something like four or five audio/video streams/files (or a lot more, like dozens, if possible) on a small machine such as an eeepc, an iPhone or a PDA …

Right now I can only handle two or three small video streams on my eeepc, and with a lot of difficulty (I have to deliberately drop some frames to get something that works), and CPU/RAM consumption is really too high
(and my PDA doesn’t seem to like it when I test displaying multiple video streams on it; this probably works on the iPhone, but I haven’t found the time to work on that implementation :frowning: )

On the other hand, I can already handle more than a dozen small video streams in parallel on various CoreDuo platforms (such as recent PCs, iMac or Mac Mini) with V4L(2) and/or libavcodec, so I find it’s not so bad :slight_smile:
(on the iMac platform, for example, I can already fill the HD screen with a lot of SD avi/mpeg/raw streams and resize/zoom/scroll/rotate/mix/… each video stream independently in real time … but I don’t have /dev/video support for the webcam on MacOS, because this seems to be a “Linux only” feature)

And I dream of this working “very well and fast with HD content” on a very small computer farm (with two or three CoreDuo platforms, for example), from the client/server and network point of view too :slight_smile:


I’m not responsible for uploading the texture to the GPU. It’s done by the host application; my plug-in just gets a handle to a TEXTURE_2D and that’s it. And you are right, the plug-in is mainly a shader, which for example blends the last 50 frames (50 textures). And no, I do not want to download the textures to RAM to process them on the CPU, not at all! But I’m looking for a clever way to increase the maximum number of frames (textures) stored on the graphics card. And while I was looking for a solution I found a post about texture compression, but I have no idea whether I can draw the “current” texture (the one I got the handle to) to the framebuffer, and then copy it to a compressed texture on the GPU.

@Jan: I got the point, and agree. I’ll change my policy :wink: