Hi all,
I would like to start a (short) discussion about CPU-based DXT compression. DXT is probably the most common texture compression method on desktops. It is widely supported and provides a pretty good compression factor (8:1 for DXT1, relative to 32-bit RGBA). Textures can be pre-compressed and stored in DXT format, or compressed on the fly at run time. The first approach requires more storage space, since DXT’s compression factor is far behind that of JPEG, ECW, MrSID or similar formats. The second approach is very computationally expensive. The compression (or, to be more precise, transcoding, since we have to decode from some image storage format into a texture compression format) can be done on the CPU or on the GPU. GPU-based DXT compression (apart from decoding the image storage format) is far superior, since the process can be parallelized very efficiently.
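To make the compression factor concrete, here is a small sketch of the size arithmetic. It relies only on the fixed DXT1 layout (8 bytes per 4x4 texel block); the function names are made up for this post and do not come from any of the libraries discussed.

```cpp
#include <cassert>
#include <cstdint>

// DXT1 stores every 4x4 texel block in 8 bytes:
// two RGB565 endpoint colours (2 x 2 bytes) plus sixteen 2-bit indices (4 bytes).
constexpr std::uint64_t kDxt1BlockBytes = 8;

// Compressed size in bytes. Partial blocks at the right/bottom edge
// are padded to a full 4x4 block.
constexpr std::uint64_t dxt1_size_bytes(std::uint64_t width, std::uint64_t height) {
    return ((width + 3) / 4) * ((height + 3) / 4) * kDxt1BlockBytes;
}

// Uncompressed size for a given texel size (4 bytes for RGBA8, 3 for RGB8).
constexpr std::uint64_t raw_size_bytes(std::uint64_t width, std::uint64_t height,
                                       std::uint64_t bytes_per_texel) {
    return width * height * bytes_per_texel;
}

// Compression factor of DXT1 relative to the uncompressed layout.
inline double dxt1_compression_factor(std::uint64_t width, std::uint64_t height,
                                      std::uint64_t bytes_per_texel) {
    assert(width > 0 && height > 0 && bytes_per_texel > 0);
    return static_cast<double>(raw_size_bytes(width, height, bytes_per_texel)) /
           static_cast<double>(dxt1_size_bytes(width, height));
}
```

For a 4096x4096 image this gives 8 MiB of DXT1 data against 64 MiB of RGBA8, i.e. a factor of 8; measured against 24-bit RGB the factor is 6.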
However, using a CUDA compressor (for example) in combination with OpenGL is not a very wise decision. Context switching and synchronization degrade performance significantly. So, we have finally come to the topic of this thread…
There are several popular CPU-based DXT compressors. Many of you have probably heard of the Squish, Crunch and stb_dxt libraries. They expose parameters for trading quality against performance. I wanted to compare their performance with the in-driver OpenGL DXT implementation. My findings are very interesting: I had no idea how widely the performance varies. (I also found bugs in the implementations, which is surprising considering that they are open source and widely used. But let us discuss that later, if anyone is interested.)
My first idea was to make a chart that graphically depicts the difference in performance. But the execution times differ by several orders of magnitude. This could be our first subtopic for discussion. Regardless of the effect on image quality (the differences are imperceptible in most cases), a 400x (40000%) slowdown cannot be justified.
Here are the results (CPU time) of DXT1 compression of a 4096x4096-texel orthophoto image:
1104.5 [ms] – NV OpenGL driver
714.3 [ms] – STB (STB_DXT_NORMAL)
819.4 [ms] – STB (STB_DXT_HIGHQUAL)
1244.2 [ms] – Squish (squish::kDxt1 | squish::kColourRangeFit | squish::kColourMetricUniform)
1242.9 [ms] – Squish (squish::kDxt1 | squish::kColourRangeFit | squish::kColourMetricPerceptual)
292566.7 [ms] – Squish (squish::kDxt1 | squish::kColourClusterFit | squish::kColourMetricUniform)
291219.6 [ms] – Squish (squish::kDxt1 | squish::kColourClusterFit | squish::kColourMetricPerceptual)
297179.2 [ms] – Squish (squish::kDxt1 | squish::kColourIterativeClusterFit | squish::kColourMetricUniform)
295666.9 [ms] – Squish (squish::kDxt1 | squish::kColourIterativeClusterFit | squish::kColourMetricPerceptual)
10812.7 [ms] – Crunch (m_dxt_quality = cCRNDXTQualitySuperFast)
10814.9 [ms] – Crunch (m_dxt_quality = cCRNDXTQualitySuperFast, cCRNCompFlagPerceptual = true)
17317.8 [ms] – Crunch (m_dxt_quality = cCRNDXTQualityFast)
17132.2 [ms] – Crunch (m_dxt_quality = cCRNDXTQualityFast, cCRNCompFlagPerceptual = true)
35206.8 [ms] – Crunch (m_dxt_quality = cCRNDXTQualityNormal)
35798.2 [ms] – Crunch (m_dxt_quality = cCRNDXTQualityNormal, cCRNCompFlagPerceptual = true)
122889.5 [ms] – Crunch (m_dxt_quality = cCRNDXTQualityBetter)
83308.3 [ms] – Crunch (m_dxt_quality = cCRNDXTQualityBetter, cCRNCompFlagPerceptual = true)
276621.2 [ms] – Crunch (m_dxt_quality = cCRNDXTQualityUber)
192210.9 [ms] – Crunch (m_dxt_quality = cCRNDXTQualityUber, cCRNCompFlagPerceptual = true)
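To put a number on the spread in the list above, the ratio between two timings can be computed directly, along with the same ratio in powers of ten. This is only a sketch of the arithmetic; the helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// How many times slower `time_ms` is than `baseline_ms`.
inline double slowdown(double time_ms, double baseline_ms) {
    assert(baseline_ms > 0.0);
    return time_ms / baseline_ms;
}

// The same ratio expressed in orders of magnitude (powers of ten).
inline double orders_of_magnitude(double time_ms, double baseline_ms) {
    return std::log10(slowdown(time_ms, baseline_ms));
}
```

With the figures above, the slowest Squish run (297179.2 ms) is roughly 416 times slower than STB_DXT_NORMAL (714.3 ms), which is about 2.6 orders of magnitude, and roughly 269 times slower than the NV OpenGL driver (1104.5 ms).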
The compression time depends on the content of the image. Here are the results of DXT1 compression of an 8192x4096-texel world map:
1250.2 [ms] – NV OpenGL driver
650.2 [ms] – STB (STB_DXT_NORMAL)
717.3 [ms] – STB (STB_DXT_HIGHQUAL)
1198.4 [ms] – Squish (squish::kDxt1 | squish::kColourRangeFit | squish::kColourMetricUniform)
1198.1 [ms] – Squish (squish::kDxt1 | squish::kColourRangeFit | squish::kColourMetricPerceptual)
129952.8 [ms] – Squish (squish::kDxt1 | squish::kColourClusterFit | squish::kColourMetricUniform)
130010.8 [ms] – Squish (squish::kDxt1 | squish::kColourClusterFit | squish::kColourMetricPerceptual)
129752.7 [ms] – Squish (squish::kDxt1 | squish::kColourIterativeClusterFit | squish::kColourMetricUniform)
129910.4 [ms] – Squish (squish::kDxt1 | squish::kColourIterativeClusterFit | squish::kColourMetricPerceptual)
7219.9 [ms] – Crunch (m_dxt_quality = cCRNDXTQualitySuperFast)
7248.6 [ms] – Crunch (m_dxt_quality = cCRNDXTQualitySuperFast, cCRNCompFlagPerceptual = true)
11090.0 [ms] – Crunch (m_dxt_quality = cCRNDXTQualityFast)
11102.2 [ms] – Crunch (m_dxt_quality = cCRNDXTQualityFast, cCRNCompFlagPerceptual = true)
21382.1 [ms] – Crunch (m_dxt_quality = cCRNDXTQualityNormal)
21804.8 [ms] – Crunch (m_dxt_quality = cCRNDXTQualityNormal, cCRNCompFlagPerceptual = true)
63659.5 [ms] – Crunch (m_dxt_quality = cCRNDXTQualityBetter)
47218.5 [ms] – Crunch (m_dxt_quality = cCRNDXTQualityBetter, cCRNCompFlagPerceptual = true)
142282.2 [ms] – Crunch (m_dxt_quality = cCRNDXTQualityUber)
109012.4 [ms] – Crunch (m_dxt_quality = cCRNDXTQualityUber, cCRNCompFlagPerceptual = true)
Although the world map has twice as many texels, the compression time is significantly lower for every library (only the NV driver is slightly slower). The reason is the huge area of uniform color (oceans and seas).
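One way to check how much of an image consists of such flat areas is to count the 4x4 blocks whose 16 texels are all identical. The sketch below assumes tightly packed RGBA8 input with dimensions that are multiples of 4; the function names are hypothetical, and it says nothing about how any of the libraries actually treat such blocks internally.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// True if all 16 texels of the 4x4 block at block coordinates (bx, by)
// have exactly the same RGBA value. `width` is the image width in texels.
inline bool is_uniform_block(const std::uint8_t* rgba, std::size_t width,
                             std::size_t bx, std::size_t by) {
    const std::uint8_t* first = rgba + ((by * 4) * width + bx * 4) * 4;
    for (std::size_t y = 0; y < 4; ++y) {
        for (std::size_t x = 0; x < 4; ++x) {
            const std::uint8_t* texel = rgba + ((by * 4 + y) * width + bx * 4 + x) * 4;
            if (std::memcmp(first, texel, 4) != 0) return false;
        }
    }
    return true;
}

// Fraction (0..1) of 4x4 blocks in the image that are a single solid colour.
inline double uniform_block_fraction(const std::vector<std::uint8_t>& rgba,
                                     std::size_t width, std::size_t height) {
    assert(width > 0 && height > 0 && width % 4 == 0 && height % 4 == 0);
    assert(rgba.size() == width * height * 4);
    const std::size_t blocks_x = width / 4;
    const std::size_t blocks_y = height / 4;
    std::size_t uniform = 0;
    for (std::size_t by = 0; by < blocks_y; ++by)
        for (std::size_t bx = 0; bx < blocks_x; ++bx)
            if (is_uniform_block(rgba.data(), width, bx, by)) ++uniform;
    return static_cast<double>(uniform) / static_cast<double>(blocks_x * blocks_y);
}
```

Running this on both test images would show whether the world map really has a much larger share of solid-colour blocks than the orthophoto.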
What are your experiences with DXT compression libraries? Does anyone have different results? What do you use in your engines? Where did I make a mistake? Those numbers look absurdly large. Spending 5 minutes on something that can be done in 0.5 seconds cannot be justified by any quality improvement.