Texture Compression

While I was looking for a solution I found a post about texture compression, but I have no idea whether I can draw the “current” texture (the one I got the handle to) to the framebuffer and then copy it to a compressed texture on the GPU.

You can do that with OpenGL. But you’d be wrong to believe that it would all be handled on the GPU.

Compressing a texture with S3TC (or most other compression formats) is non-trivial, and as far as I know drivers don’t do it with shaders. So when you tell OpenGL to copy from the framebuffer to a compressed texture, it will likely read the pixels back to main memory, run its CPU compression routine, and re-upload the result to the texture. Since that is probably not what you want, I would advise against it.
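For reference, the copy you describe looks something like this (the call and the S3TC internal format are standard OpenGL + EXT_texture_compression_s3tc; whether the driver compresses on the GPU or falls back to the CPU is entirely up to the implementation):

```c
#include <GL/gl.h>
#include <GL/glext.h>   /* GL_COMPRESSED_RGB_S3TC_DXT1_EXT */

/* Copy the current read buffer into a DXT1-compressed texture.
 * Legal OpenGL, but the driver may read the pixels back, run its CPU
 * compressor and re-upload the result, which is the slow path described above. */
GLuint copy_framebuffer_to_dxt1(GLsizei width, GLsizei height)
{
    GLuint tex;
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_2D, tex);
    glCopyTexImage2D(GL_TEXTURE_2D, 0, GL_COMPRESSED_RGB_S3TC_DXT1_EXT,
                     0, 0, width, height, 0);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);
    return tex;
}
```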

I admit that texture compression is certainly a task that shaders cannot handle today (for the moment …)

But I’m sure this is certainly not the case for the decompression side :slight_smile:
=> at heart, it is just a form of color indexing … with a very, very small colormap of only 4 colors :slight_smile:

On the compression side, I don’t think this would really add a large number of new transistors to future GPUs, because we only have to find the maximum and minimum red, green and blue components of 16 colors and make some comparisons …
=> something like the “good old” MMX registers can do this very efficiently …
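A plain C sketch of that min/max scan (no MMX/SSE, just 16 comparisons per channel; a real DXT1 encoder would then quantise the two endpoints to RGB565 and pick a 2-bit index per texel):

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b; } rgb8_t;

/* Find the per-channel minimum and maximum of the 16 texels of a 4x4 block. */
void block_endpoints(const rgb8_t block[16], rgb8_t *lo, rgb8_t *hi)
{
    *lo = *hi = block[0];
    for (int i = 1; i < 16; ++i) {
        if (block[i].r < lo->r) lo->r = block[i].r;
        if (block[i].g < lo->g) lo->g = block[i].g;
        if (block[i].b < lo->b) lo->b = block[i].b;
        if (block[i].r > hi->r) hi->r = block[i].r;
        if (block[i].g > hi->g) hi->g = block[i].g;
        if (block[i].b > hi->b) hi->b = block[i].b;
    }
}
```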

So, hearing in 2010 that DXT compression/decompression is a really difficult task to do in hardware seems to me like a big mistake …

And the fact that the DXTn schemes are really low-complexity algorithms compared to JPEG, MPEG or MJPEG (which are “relatively old” video formats that have been implemented in hardware for a very long time) gives me a lot of confidence in thinking this :slight_smile:

Please don’t lose time saying “no, it’s not possible”; we prefer to hear “it’s certainly possible, BUT it is really very hard to implement” :slight_smile:
=> the shortest path is often the best …
(but OK, sometimes it is the longest to traverse … before the bridge is built for a lot of other people who could not cross without it :slight_smile: )

So in fact the problem is only to find a very fast (i.e. real-time, or very nearly) compressor that can produce DXT output
=> I don’t know why, but I think there are a lot of people in the world who could help make this happen :slight_smile:

If a shader can individually access 16 texels of the same texture, I think it can do the compression … and write the result to an output that we can read back to RAM after running the shader over a block of 4x4 texels …
(but on the other hand, a CPU implementation could certainly be faster, because it doesn’t have to pass data in and out of the shader’s memory space)
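As a rough sketch of the idea (only a sketch, not working code): a GLSL 1.30 fragment shader run once per 4x4 block can texelFetch the 16 texels and find the endpoints; the result goes to colour attachments that are read back afterwards, since uniforms cannot be written from inside a shader. The per-texel index selection would follow the same pattern:

```c
/* GLSL 1.30 fragment shader stored as a C string: one fragment per 4x4 block,
 * fetch the 16 texels, keep the per-channel min and max, and write the two
 * endpoints to two colour attachments (MRT). */
static const char *block_endpoints_fs =
    "#version 130\n"
    "uniform sampler2D srcTex;\n"
    "out vec4 minOut;   // colour attachment 0\n"
    "out vec4 maxOut;   // colour attachment 1\n"
    "void main() {\n"
    "    ivec2 base = ivec2(gl_FragCoord.xy) * 4;  // one fragment per block\n"
    "    vec3 cMin = vec3(1.0);\n"
    "    vec3 cMax = vec3(0.0);\n"
    "    for (int y = 0; y < 4; ++y) {\n"
    "        for (int x = 0; x < 4; ++x) {\n"
    "            vec3 c = texelFetch(srcTex, base + ivec2(x, y), 0).rgb;\n"
    "            cMin = min(cMin, c);\n"
    "            cMax = max(cMax, c);\n"
    "        }\n"
    "    }\n"
    "    minOut = vec4(cMin, 1.0);\n"
    "    maxOut = vec4(cMax, 1.0);\n"
    "}\n";
```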

I am beginning to think that my dream/delirium of “DXTed YCbCr” textures could become reality in the near future :slight_smile:
(and it doesn’t have the problem of interpolating three different colors in the 4x4 block …)

To put it simply, the idea is to make something like DXT but working on 8-bit values (not RGB565), and operating independently on the Y, Cb and Cr planes
=> this can be “easily” decoded in a fragment shader, and I don’t think the encoder is too hard to write …
(but I already see a very big problem with this: we lose the linear interpolation between 4x4 blocks, so we end up with something like an “interpolated mosaic” :frowning: )
(but on the other hand, this is already a known limitation of DXT compression and it doesn’t seem too problematic :slight_smile: )

@+
Yannoo

http://developer.nvidia.com/object/real-time-ycocg-dxt-compression.html

Thank you for all your answers! You helped me a lot!

Thanks Mark DS,

Your link is really good and I have found a lot of code/samples/new ideas in it

This reinforces my feeling that the RGB colorspace isn’t the best one for compressing/decompressing pictures and video :slight_smile:

This seems very nice, but I don’t want to lose the 4:2:0 subsampling along the way :frowning:

But I think this is not too hard to add to the YCoCg scheme, because it seems to be exactly what I already do to handle the Y, Cb and Cr planes in my shader, i.e. just some scaling/offsetting of the texcoords to address the right plane.

And I see a YCoCg to RGB conversion formula on the linked page, so I don’t think it’s really too hard to derive a formula for a direct YCbCr to YCoCg conversion
(so the 4:2:0 subsampling is not necessarily lost …)
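For reference, the usual (non-lossless) YCoCg transform written out in C; a direct YCbCr → YCoCg matrix can be obtained by composing the standard YCbCr → RGB matrix (BT.601 or BT.709, full or limited range, depending on the source) with the forward transform below:

```c
typedef struct { float x, y, z; } vec3f;

/* RGB -> YCoCg (result: .x = Y, .y = Co, .z = Cg) */
vec3f rgb_to_ycocg(vec3f rgb)
{
    vec3f c;
    c.x =  0.25f * rgb.x + 0.5f * rgb.y + 0.25f * rgb.z;  /* Y  */
    c.y =  0.5f  * rgb.x                - 0.5f  * rgb.z;  /* Co */
    c.z = -0.25f * rgb.x + 0.5f * rgb.y - 0.25f * rgb.z;  /* Cg */
    return c;
}

/* YCoCg -> RGB, the exact inverse of the transform above */
vec3f ycocg_to_rgb(vec3f c)
{
    vec3f rgb;
    rgb.x = c.x + c.y - c.z;  /* R = Y + Co - Cg */
    rgb.y = c.x       + c.z;  /* G = Y + Cg      */
    rgb.z = c.x - c.y - c.z;  /* B = Y - Co - Cg */
    return rgb;
}
```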

Note that I don’t like working with only a diagonal of a 3D colorspace; I prefer the possibility of working with a “true but small/reduced” colorspace, not just a “gradient color line” :slight_smile:

With one “diagonal/interpolation” per component, this can form a sort of “curved triangle” if we think of colors as 3D vectors (x,y,z / r,g,b / y,u,v / y,cb,cr and y,co,cg are all 3D vectors) whose axes are not necessarily linear and/or perpendicular …
(the DXT compression scheme can only handle a line segment in the 3D RGB colorspace)

In any case, if the RGB colorspace doesn’t seem to be used much in the video domain, it’s certainly not for nothing :slight_smile:

But this still only handles intra-picture compression … :frowning:
=> it’s now time to think about the inter-picture algorithm in order to really get a good compression ratio :slight_smile:
(interlacing techniques, subpictures/mosaics and bi-directional pictures can help a lot here)

==> I already have something that is beginning to work and that uses the standard PAL/SECAM interlacing scheme (i.e. 50 Hz fields for 25 fps) to handle two successive YCbCr video pictures for the price of only one :slight_smile:
(I now have my GOP, of only two pictures it’s true, but it’s already the beginning of the implementation of my “GOP dream/delirium”)

===> this already gives 4:1 compression with no visual artifacts compared to plain successive RGB frames (2:1 from YCbCr 4:2:0 at 12 bits/pixel versus RGB24, times 2:1 from packing two interlaced fields into one frame), plus a temporal interpolation between the two frames if we want more/fewer fps (whereas DXT1 is only 8:1, with a lot of visual artifacts and no inter-picture features …)

@+
Yannoo

Hi,

I am beginning to have something that works and reaches the same 8:1 compression ratio as DXT1, but with a quality that seems really better to me on photographic, static and animated pictures.
(my implementation is still too slow to handle video in real time at 25/50 fps, but I think I can solve this problem fairly quickly, because my code doesn’t use MMX/SSE instructions for the moment).

It is something like a monochromatic version of DXT1, but it handles a packed YCbCr 4:2:2 format with 4x4 blocks made of two YCbCr 6:5:5 colors (the minimum and maximum Y, Cb and Cr values, where the Y, Cb and Cr components are totally independent), coupled with a 4x4 1-bit/pixel array generated by a “Floyd-Steinberg like” error-diffusion algorithm for the Y part, and two 2x2 2-bit arrays for the Cb and Cr parts.
(i.e. 8 bytes for 16 pixels)
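Written as a C struct (only a sketch, the exact field order and packing are illustrative), one block looks like this:

```c
#include <stdint.h>

/* One 4x4 block of the format described above: 8 bytes for 16 pixels. */
typedef struct {
    uint16_t min_ycbcr;  /* minimum Y, Cb, Cr packed as 6:5:5           */
    uint16_t max_ycbcr;  /* maximum Y, Cb, Cr packed as 6:5:5           */
    uint16_t y_bits;     /* 4x4 x 1 bit: error-diffused Y selector      */
    uint8_t  cb_bits;    /* 2x2 x 2 bits: Cb selector per 2x2 cell      */
    uint8_t  cr_bits;    /* 2x2 x 2 bits: Cr selector per 2x2 cell      */
} ycbcr_block_t;         /* 2 + 2 + 2 + 1 + 1 = 8 bytes                 */
```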

For example, I can handle without any problem:

one or more pixels that are only shades of blue
one or more pixels that are only shades of green
one or more pixels that are only shades of red
one or more black pixels
one or more white pixels
and a mix of all of this, “independently” for each pixel in the 4x4 block of course :slight_smile:

Whereas DXT1 compression can only handle 4 colors that lie on “a line between two colors” …

So now I am beginning to play with my “GOP of 8 DXTed YCbCr 4:2:2 frames that takes only the size of one RGB24 picture” :slight_smile:
(and with PAL/SECAM interlacing, I think I can easily extend this to double the number of frames in my compressed GOP with a very small visual difference)

But on the other hand, I have lost the high quality when zooming/resizing the display window and/or when I project the video texture onto various animated 3D shapes :frowning:

But I am thinking of going back to a multiplanar format, so that the texture-unit hardware can do the bilinear interpolation without any penalty, as before
(but OK, only on the reduced minimum/maximum YCbCr values, not on the arrays of bits)
=> how can I handle different interpolation schemes within the same texture (without using multiple texture units)???

@+
Yannoo

I think I have found 3 new DXT-style compression algorithms :slight_smile:

They work on a block of 16 pixels (4x4), like DXT1/2/3/4/5

But everything is computed in the YCbCr color domain, not in RGB …

And the input/output is already in a planar, 4:2:0 pre-subsampled format :slight_smile:
=> this gives some mipmap levels for free with this DXTed version …

My algorithm uses zigzag + dithering + error-diffusion methods to convert the Y, Cb and Cr planes independently from 8 bits down to 1 or 2 bits
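Roughly, the 8-bit → 1-bit step for the Y plane of one block boils down to something like this (a simplified sketch with a plain serpentine scan and 1D error diffusion, not my exact filter):

```c
#include <stdint.h>

/* Quantise the 16 Y samples of a 4x4 block to 1 bit each: pick the nearer of
 * the two endpoints and push the residual error onto the next texel visited. */
uint16_t quantize_y_block(const uint8_t y[4][4], uint8_t y_min, uint8_t y_max)
{
    uint16_t bits = 0;
    int err = 0;
    for (int row = 0; row < 4; ++row) {
        for (int i = 0; i < 4; ++i) {
            int col = (row & 1) ? 3 - i : i;          /* serpentine (zigzag) scan */
            int want = y[row][col] + err;
            int use_max = (want - y_min) > (y_max - want);
            int chosen = use_max ? y_max : y_min;
            err = want - chosen;                      /* diffuse the residual */
            if (use_max)
                bits |= (uint16_t)1 << (row * 4 + col);
        }
    }
    return bits;   /* 16 selector bits, one per texel */
}
```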

minimum YCbCr components => 2 bytes (6:5:5 format)
maximum YCbCr components => 2 bytes (6:5:5 format)

YYYY
YYYY
YYYY
YYYY 1bit/pixel => 2 bytes

CbCb
CbCb 1 bit/pixel => 4 bits

CrCr
CrCr 1 bit/pixel => 4 bits

This gives only 7 bytes for each block of 16 pixels
(2 colors of 2 bytes + 2 Y bytes + 1 Cb/Cr byte)
=> the compression ratio compared to plain RGB24 (48 bytes per block) is close to 7:1
==> so DXT7 seems to me a good name for a 7:1 compression algorithm :slight_smile:
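Decoding one texel from such a 7-byte block is then trivial (a sketch, assuming the 6:5:5 endpoints store Y in the top 6 bits and the Cb/Cr selector bits share one byte):

```c
#include <stdint.h>

typedef struct {
    uint16_t min_ycbcr;  /* minimum Y, Cb, Cr packed as 6:5:5           */
    uint16_t max_ycbcr;  /* maximum Y, Cb, Cr packed as 6:5:5           */
    uint16_t y_bits;     /* 4x4 x 1 bit                                 */
    uint8_t  cbcr_bits;  /* 2x2 x 1 bit for Cb, then 2x2 x 1 bit for Cr */
} dxt7_block_t;          /* 7 bytes of payload per 4x4 block            */

/* Reconstruct the Y, Cb, Cr of texel (x, y) inside the block. */
void dxt7_texel(const dxt7_block_t *b, int x, int y,
                uint8_t *Y, uint8_t *Cb, uint8_t *Cr)
{
    /* expand the 6:5:5 endpoints back to 8-bit components */
    uint8_t y_lo  = (uint8_t)(((b->min_ycbcr >> 10) & 0x3F) << 2);
    uint8_t cb_lo = (uint8_t)(((b->min_ycbcr >>  5) & 0x1F) << 3);
    uint8_t cr_lo = (uint8_t)(( b->min_ycbcr        & 0x1F) << 3);
    uint8_t y_hi  = (uint8_t)(((b->max_ycbcr >> 10) & 0x3F) << 2);
    uint8_t cb_hi = (uint8_t)(((b->max_ycbcr >>  5) & 0x1F) << 3);
    uint8_t cr_hi = (uint8_t)(( b->max_ycbcr        & 0x1F) << 3);

    int y_sel  = (b->y_bits >> (y * 4 + x)) & 1;
    int cell   = (y / 2) * 2 + (x / 2);        /* which 2x2 chroma cell */
    int cb_sel = (b->cbcr_bits >> cell) & 1;
    int cr_sel = (b->cbcr_bits >> (4 + cell)) & 1;

    *Y  = y_sel  ? y_hi  : y_lo;
    *Cb = cb_sel ? cb_hi : cb_lo;
    *Cr = cr_sel ? cr_hi : cr_lo;
}
```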

Another way is to use 2 bits per Y sample to get a better intensity scale => DXT8 (8 bytes for 16 pixels)
(exactly like DXT1, but better because it is really the intensity that is interpolated, not a line between two colors)

We can also add one or two more bits/planes per pixel to handle alpha/transparent pixels => DXT9 (9 bytes for 16 pixels)

And/or use more than 1 bit per pixel for the Cb/Cr planes for better color quality.

If we combine all of this, it makes 12 bytes for 16 pixels, for really great quality :slight_smile:
But then the compression ratio is a rather poor “only” 4:1 … :frowning:

And this is only for real-time intra-picture compression …
=> inter-picture compression in GOPs is certainly coming soon :slight_smile:

I’m really happy, because my good old EeePC 701 can now work with DXTed, mipmapped YCbCr 4:2:0 HD textures :slight_smile:
(not in real time for the moment, but I think some MMX/SSE assembly and/or vertex/fragment shader optimisations can easily solve this timing problem …)

@+
Yannoo