Hi,
I have done a little benchmarking of my local wavelet compression/decompression algorithm.
The local wavelet transform runs at about 60 MegaPixels/second (900 x 675 = 607500 pixels in 0.01 seconds on a P4 3 GHz)
(this can still be optimised a lot, since it's a pure C, integer-only implementation: no asm/MMX/SSE inside)
The local wavelet untransform has about the same performance of 60 MegaPixels/second
=> so, in this state, my local wavelet transform already works in realtime on an 800x600 window, and "not too bad" on a 1280x1024 window
(note that the local wavelet transform embeds 8 levels of mipmaps for free …)
The (un)zigzag, (un)quantization and (un)RLE funcs seem to perform even better (but the clock() function that I used as the base of my benchmark cannot handle very short timings)
=> as with memcpy, the value returned by clock() after a call to the [Un]Zigzag(), [Un]Quantization() or [Un]RLE() functions is the same as before the call
And effectively, the thing that kills all performance is … the ZLIB compression
(it's one to two orders of magnitude too slow for a realtime video refresh at 50 fps or more, and the latency is really too big)
For example, on a grey 900 x 675 picture = 607500 bytes, it takes 0.47 s to compress 371531 bytes down to 186197, so the whole compression runs at only 1.239 MegaPixels/s
(the "zigzag => quantization => Haar 8x8 transform => threshold => zeroize small values => rezigzag => RLE" pipeline has already reduced the 607500 bytes to 371531 bytes (61%) in 0.02 seconds, and the subsequent ZLIB compression gives 186197 bytes (about 50% of that) in 0.47 seconds, for a final compressed size of about 30% (186197 bytes) of the original 900 x 675 grey picture of 607500 bytes)
=> about 95% of the time of my local wavelet compression (0.47 s out of 0.49 s) is spent on the ZLIB compression !!!
I have tested different settings for ZLIB (Z_BEST_COMPRESSION … Z_BEST_SPEED), but it's the same thing: the ZLIB compression is "really too slow" (for my needs, of course …)
On the decompression side, ZLIB seems very fast: it takes only 0.02 seconds to decompress the 186197 bytes back to 371531
(unrle + Haar 8x8 untransform + unzigzag cannot be benchmarked separately, because they are too integrated into the decompression routine and because of my clock() problem too, but they also seem really very fast)
=> I know that Huffman-style compression schemes aren't symmetrical, but I'm really surprised by this very big asymmetry …
Please note that without the ZLIB/RLE compression/decompression, all computations are already done on a small cache of 8x8 bytes plus some 8x8-byte tables for the (un)zigzag, (un)quantization and thresholding
So this is why I have now begun to work on an algorithm that has a fixed compression ratio for each transformed 8x8 Haar block
From what I have found so far, it's something like DXT, but with a different number of bits per level for the 8 levels/mipmaps of a transformed 8x8 block:
8 4 3 3 2 2 2 2
4 4 3 3 2 2 2 2
3 3 3 3 2 2 2 2
3 3 3 3 2 2 2 2
2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2
the first value of the 8x8 transform uses 8 bits (it's like "the average intensity over the entire 8x8 patch")
the second level of the 8x8 transform uses 4 bits (they are something like "the average intensity derivatives")
the third and fourth levels use 3 bits (I can't easily explain in a few words what they are)
the four last levels use 2 bits (they are something like "details on the 8x8 patch")
- 2x4 bits for the begin and end intensity (can be used by the first level for a "bigger/more precise" colorspace, and always used for the other levels)
=> this uses 160 bits (20 bytes) to store 64 pixels … so a fixed compression of about 30% (31.25%)
I have also thought about schemes that use fewer bits per level (75 to 98 bits for the 8x8 patch), but they use only one bit on each of the last four levels, and this seems "really too limited" to me