The maximum compression ratio you will get is 4:1, as far as I know, at least for RGB(A) data. For monochrome textures I am not absolutely sure, but better than 6:1 is definitely not possible.
S3TC-compressed textures will actually render faster than uncompressed textures, due to lower bandwidth requirements and hardware-accelerated decompression.
Uploading S3TC-compressed data is possible without problems. I have never tried downloading the compressed data, but I am pretty sure that is also possible.
So only your first point is not possible. Maybe you could tell us why that is a “requirement” for you. I don’t think it is possible to implement such a thing yourself through shaders, but others might know more about that.
You are right, so that might actually be a solution for him. But if I remember correctly, DXT1 has such a strong quality loss that I would only consider it for low-quality modes (comparable with 16-bit textures). But of course that depends on his needs.
The only difference between DXT1 and DXT3/5 is that DXT1 only supports a 1-bit alpha channel. Otherwise, the colors are compressed in exactly the same way as for DXT3/5. So if you have a texture for which a 1-bit alpha is reasonable (or that has no alpha at all), DXT1 is not just perfectly acceptable, but should be exactly what you use.
As for the quality loss, modern off-line compressors (i.e., not uploading an uncompressed texture for GL to compress for you) for DXT formats produce quite reasonable results. You wouldn’t want to use it for bright, colorful things, but they work pretty well for realistic images. Also, the larger the texture, the lower the apparent quality loss.
I have tested the “ARB_texture_compression” and “EXT_texture_compression_s3tc” extensions, but I found the following problem: when I use “glCopyTexSubImage2D()” to copy a buffer, CPU usage rises noticeably. The internal format of the buffer is RGBA. Does the driver use the CPU to do texture compression? I need realtime texture compression like decompression, so is it possible?
However, older Nvidia hardware (GeForce 3 and, to some degree, GeForce 4) decompresses DXT1 textures at worse quality than DXT3/5.
At this point, that’s hardware so old that it doesn’t bear worrying about. Certainly not to the point of doubling my compressed texture sizes to deal with it.
when I use “glCopyTexSubImage2D()” to copy a buffer, CPU usage rises noticeably,
Why would you possibly expect otherwise?
First, any copy where one of the source and target is uncompressed and the other is compressed (or they are compressed with different formats) will need to either compress or decompress the image data. That requires the CPU. There’s no getting around it, because no IHV is going to bother extracting the DXT decompression logic from the texture unit and making it available elsewhere.
Second, even if the formats are both the same, DXT is a block-based format. You can’t simply do a server-side memory copy of the data unless both the source and destination rectangles are aligned to the 4×4 block grid.
Third, quite simply, nobody copies compressed textures. It’s an unexercised piece of code, and therefore unoptimized. Most people simply have no need to copy bits of compressed textures around.
I need realtime texture compression like decompression, so is it possible?
That sentence makes no sense; compression is not like decompression.
And you are getting realtime texture compression. Drivers will compress your textures as fast as you could expect them to. And since no IHV is going to bother to put any form of texture compression into their hardware, you’re getting software compression.
On NVIDIA’s website there is a piece of code (I think from id Software) that does DXT compression using MMX/SSE, which is said to be the fastest you can get. id Software is supposed to be using it in Enemy Territory: Quake Wars to dynamically compress their megatexture data before uploading it to the GPU.
About GeForce 3/4-class hardware: that was the hardware I worked with when I found DXT1 to be extremely low-quality. Glad to hear that’s better now.
Yeah, the trouble is I believe it’s a net loss to use the GPU, due to the PCIe x16 upload bandwidth, when you’re using the GPU just for texture compression (unless you’re generating textures on the GPU, of course). Not to mention requiring a GeForce 8 card.
If speed is critical, I suggest you just use this or this on the CPU and be done with it (as Jan suggested). If quality is more important, if you don’t want to code anything, or definitely if you’re doing this off-line in a tool (as you typically should, unless your engine is generating the textures procedurally), just use squish.
…maybe once we have on-board Fusion/Larrabee, using GPUs for DXT compression will be more practical.
Do you mean that until now texture compression has not been supported by hardware? My application renders a scene into a pbuffer, but for some reason it needs to display the rendered image in a window. Because the pbuffer’s RC and the window’s RC are not created in the same thread, “wglShareLists()” doesn’t work, so I must download the image from the pbuffer and upload it to the window’s RC. Because image quality is not critical for the window display, I am looking for a hardware-accelerated texture compression/decompression extension to reduce the download/upload bandwidth. Now I’m trying to implement a YUV420 algorithm on the GPU using Cg, but it only gives a 3/8 compression ratio; I hope the ratio can be better (1/4, or even 1/8), so how can I do that?
It’s still not supported, in the sense of being built into OpenGL drivers and available for use transparently (AFAIK, uploading uncompressed textures and asking the GL driver to compress them on the fly still isn’t fast). What I mean is that until the most recent generation of GPUs (for NV, the GeForce 8), GPUs didn’t even support features such as the integer operations needed to make DXT compression on the GPU possible (again, AFAIK). And to do this DXT compression on the GPU, you need additional code beyond your graphics API.
My application renders a scene into a pbuffer, but for some reason it needs to display the rendered image in a window. Because the pbuffer’s RC and the window’s RC are not created in the same thread, “wglShareLists()” doesn’t work, so I must download the image from the pbuffer and upload it to the window’s RC. Because image quality is not critical for the window display, I am looking for a hardware-accelerated texture compression/decompression extension to reduce the download/upload bandwidth.
Since this is on the same machine, it sounds like you may want to restructure things so that you can use server-side copies such as glCopyTexSubImage2D and glBlitFramebuffer. Then PCIe bus bandwidth shouldn’t be an issue.
I should qualify this. The GPU timings look sweet, but when you factor in 1–2 GB/s PCIe upload for your uncompressed texture, it takes a big bite out of your gain. E.g., uploading a 2048×2048 RGBA8 texture at 2 GB/s eats ~8 ms out of your frame time. Add compress time and you’re up to 12 ms. If you’re running at 60 Hz, that’s 75% of your frame time spent right there on one texture. So you stage it over multiple frames to avoid breaking frame…
OTOH, if you throw this on an idle CPU core, you probably don’t care that it takes 2.5× longer than the GPU route (~30 ms). It’s not tying up your GPU and cutting into your rendering time.