Hi,
I have done a little benchmarking of my local wavelet compression/decompression algorithm.
The local wavelet transform runs at about 60 MegaPixels/second (900 x 675 = 607500 pixels in 0.01 seconds on a P4 3 GHz)
(this can still be optimised a lot, since it's a pure C, integer-only implementation: no asm/MMX/SSE inside)
The local wavelet untransform has about the same performance of 60 MegaPixels/second
=> so, in this state, my local wavelet transform already works in realtime on an 800x600 window, and "not too bad" on a 1280x1024 window
(note that the local wavelet transform embeds 8 levels of mipmaps for free …)
The (un)zigzag, (un)quantization and (un)RLE funcs seem to perform even better (but the clock() function that I used as the base of my benchmark cannot handle very short timings)
=> as with memcpy, the value returned by clock() after a call to the [Un]Zigzag(), [Un]Quantization() or [Un]RLE() functions is the same as before the call
And effectively, the thing that kills all performance is … the ZLIB compression
(it's one to two orders of magnitude too slow for a realtime video refresh at 50 fps or more, and the latency is really too big)
For example, on a grey 900 x 675 picture = 607500 bytes, it takes 0.47 s to compress 371531 bytes down to 186197, so the whole compression runs at only 1.239 MegaPixels/s
(the "zigzag => quantization => Haar 8x8 transform => threshold => zeroize small values => rezigzag => RLE" pipeline has already reduced the 607500 bytes to 371531 bytes (61%) in 0.02 seconds, and the subsequent ZLIB compression gives 186197 bytes (about 50% of that) in 0.47 seconds, for a final compressed size of about 30% (186197 bytes) of the original 900 x 675 grey picture of 607500 bytes)
=> about 95% of the time of my local wavelet compression (0.47 s out of 0.49 s) is spent on the ZLIB compression !!!
I have tested different settings for ZLIB (Z_BEST_COMPRESSION … Z_BEST_SPEED), but it's the same thing: the ZLIB compression is "really too slow" (for my needs, of course …)
On the decompression side, ZLIB seems very fast: it takes only 0.02 seconds to decompress the 186197 bytes back to 371531
(unrle + Haar 8x8 untransform + unzigzag cannot be benchmarked separately, because they are too integrated into the decompression routine and because of my clock() problem too, but they also seem really very fast)
=> I know that Huffman-style compression schemes aren't symmetrical, but I'm really surprised by this very big asymmetry …
Please note that without the ZLIB/RLE compression/decompression, all computations are already done on a small cache of 8x8 bytes plus some 8x8-byte tables for the (un)zigzag, (un)quantization and thresholding
So this is why I have now begun to work on an algorithm that has a fixed compression ratio for each transformed 8x8 Haar block
From what I have found so far, it's something like DXT, but with a different number of bits per level for the 8 levels/mipmaps of a transformed 8x8 block:
8 4 3 3 2 2 2 2
4 4 3 3 2 2 2 2
3 3 3 3 2 2 2 2
3 3 3 3 2 2 2 2
2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2
2 2 2 2 2 2 2 2
the first value of the 8x8 transform uses 8 bits (it's like "the average intensity over the entire 8x8 patch")
the second level of the 8x8 transform uses 4 bits (they are something like "the average intensity derivatives")
the third and fourth levels use 3 bits (I can't easily explain in a few words what they are)
the four last levels use 2 bits (they are something like "details on the 8x8 patch")
- 2x4 bits for the begin and end intensity (can be used by the first level for a "bigger/more precise" colorspace, and always used for the other levels)
=> this uses 160 bits (20 bytes) to store 64 pixels … so a fixed compression of about 30% (31.25%)
I have also thought about schemes that use fewer bits per level (75 to 98 bits for the 8x8 patch), but they use only one bit on each of the last four levels, and this seems "really too limited" to me