Block Compression Pass

Normal path:

  1. Render scene into uncompressed framebuffer.
  2. Sample from that for post fx, etc.

Proposed path (with BC pass):

  1. Render scene into uncompressed framebuffer.
  2. Compress framebuffer into BCn texture format.
  3. Sample from compressed texture.

The tradeoff is the cost of compression (plus its artifacts) against the memory bandwidth saved when sampling the texture. So this only makes sense in situations where the texture is sampled a lot.
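To put rough numbers on the bandwidth side of that tradeoff, here is a small sketch in C that compares the storage of an uncompressed RGBA8 image with a block-compressed one. The function names are made up for illustration. The only facts assumed are that BCn formats use 4x4-texel blocks, with 8 bytes per block for BC1 and 16 bytes per block for BC7 (BPTC).

```c
#include <assert.h>
#include <stddef.h>

/* Bytes for an uncompressed RGBA8 image: 4 bytes per texel. */
size_t rgba8_bytes(size_t width, size_t height)
{
    assert(width > 0 && height > 0);
    return width * height * 4;
}

/* Bytes for a block-compressed image built from 4x4-texel blocks.
   block_bytes is 8 for BC1 and 16 for BC7 (BPTC).
   Partial blocks at the right and bottom edges count as whole blocks. */
size_t bcn_bytes(size_t width, size_t height, size_t block_bytes)
{
    assert(width > 0 && height > 0);
    size_t blocks_x = (width + 3) / 4;   /* round up to whole blocks */
    size_t blocks_y = (height + 3) / 4;
    return blocks_x * blocks_y * block_bytes;
}
```

For a 1920x1080 target this gives 8,294,400 bytes for RGBA8 against 1,036,800 bytes for BC1 (8:1) and 2,073,600 bytes for BC7 (4:1). How much of that turns into an actual speedup depends on the texture cache and on how often each texel is fetched, so treat the ratio as an upper bound on the saving per full read.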

Areas where this “BC pass” could be applied:

  - Deferred shading
  - Blurs, SSAO, etc.


The ideal solution would be for the GPU to accelerate gl(Copy)Tex(Sub)Image2D calls that target a BCn internal format. That way, it could even accelerate existing code without any changes.

A test case compressing a 1080p RGBA8 framebuffer to BC1 with glCopyTexImage2D cost 18 ms on a GeForce 680. I have no clue how to determine whether that compression runs on the CPU or the GPU, but I’m fairly certain it’s the CPU, because C libraries were at some point mentioned in conjunction with BCn compression.

I wrote to both AMD and NV but got no reply, so I’m posting this here so that it isn’t forgotten and doesn’t die in a spam filter.

BPTC compression is very expensive; it is not really designed for real-time compression. Other compression methods might be useful for what you propose, but even then you could already do this with a compute shader and ARB_copy_image, which allows you to copy raw data from an uncompressed texture to a compressed one. The algorithm would look like the following:

  1. Render scene into uncompressed framebuffer
  2. Run a compute shader that performs the compression and writes to an uncompressed texture that is copy-compatible with the desired compressed format
  3. Copy uncompressed image to compressed image
  4. Sample from compressed texture
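The sizing behind steps 2 and 3 can be sketched as follows. As far as I understand ARB_copy_image, one texel of the uncompressed staging texture corresponds to one 4x4 block of the compressed texture, and the two formats must have the same number of bits per texel/block. The struct and function below are hypothetical helpers, not part of any GL API, and the format names in the comments (GL_RG32UI, GL_RGBA32UI) are only examples of formats with the matching bit size.

```c
#include <assert.h>

/* Hypothetical description of the staging texture the compute shader
   writes to: one integer texel per 4x4 compressed block. */
typedef struct {
    int width;           /* staging width in texels  = blocks across */
    int height;          /* staging height in texels = blocks down   */
    int bits_per_texel;  /* 64  -> e.g. GL_RG32UI   (8-byte blocks, BC1)
                            128 -> e.g. GL_RGBA32UI (16-byte blocks, BC7) */
} StagingLayout;

/* Compute the staging texture size for a width x height render target.
   block_bytes is 8 for BC1 and 16 for BC7 (BPTC). */
StagingLayout staging_layout(int width, int height, int block_bytes)
{
    assert(width > 0 && height > 0);
    assert(block_bytes == 8 || block_bytes == 16);

    StagingLayout s;
    s.width = (width + 3) / 4;    /* round up to whole blocks */
    s.height = (height + 3) / 4;
    s.bits_per_texel = block_bytes * 8;
    return s;
}
```

For a 1920x1080 target and BC1 this yields a 480x270 staging texture with 64 bits per texel. The compute shader would then write one packed block per staging texel, and the copy in step 3 moves those bits into the compressed texture unchanged.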

Though I would still stress that at least the BPTC formats are very expensive to compress.
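For contrast, the per-block work for a simpler format such as BC1 can be quite small. The following is a minimal CPU-side sketch in C of a bounding-box (min/max) BC1 encoder, roughly the work a compute shader invocation would do for one 4x4 block. It is an illustration under simplifying assumptions and not a tuned or high-quality encoder: it ignores alpha, picks the endpoints from the per-channel minimum and maximum, and brute-forces the nearest palette entry. All function names are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Pack 8-bit RGB into RGB565 by dropping the low bits. */
static uint16_t pack565(int r, int g, int b)
{
    return (uint16_t)(((r >> 3) << 11) | ((g >> 2) << 5) | (b >> 3));
}

/* Expand RGB565 back to 8-bit channels by bit replication. */
static void unpack565(uint16_t c, int rgb[3])
{
    int r = (c >> 11) & 31, g = (c >> 5) & 63, b = c & 31;
    rgb[0] = (r << 3) | (r >> 2);
    rgb[1] = (g << 2) | (g >> 4);
    rgb[2] = (b << 3) | (b >> 2);
}

/* Encode one 4x4 block of RGBA8 texels (row-major, 64 bytes, alpha ignored)
   into the 8 bytes of a BC1 block. */
void bc1_encode_block(const uint8_t rgba[64], uint8_t out[8])
{
    assert(rgba && out);

    /* Per-channel bounding box of the block's colours. */
    int lo[3] = {255, 255, 255}, hi[3] = {0, 0, 0};
    for (int i = 0; i < 16; ++i)
        for (int c = 0; c < 3; ++c) {
            int v = rgba[i * 4 + c];
            if (v < lo[c]) lo[c] = v;
            if (v > hi[c]) hi[c] = v;
        }

    /* Endpoints: c0 from the maximum, c1 from the minimum, so c0 >= c1. */
    uint16_t c0 = pack565(hi[0], hi[1], hi[2]);
    uint16_t c1 = pack565(lo[0], lo[1], lo[2]);

    /* Four-colour palette (valid when c0 > c1): the two endpoints plus
       two interpolated colours at 1/3 and 2/3. */
    int pal[4][3];
    unpack565(c0, pal[0]);
    unpack565(c1, pal[1]);
    for (int c = 0; c < 3; ++c) {
        pal[2][c] = (2 * pal[0][c] + pal[1][c]) / 3;
        pal[3][c] = (pal[0][c] + 2 * pal[1][c]) / 3;
    }
    /* If c0 == c1 the decoder switches to three-colour mode, so only
       index 0 is used in that case. */
    int ncolors = (c0 > c1) ? 4 : 1;

    /* Pick the nearest palette entry for each texel; 2 bits per texel,
       texel 0 in the lowest bits. */
    uint32_t indices = 0;
    for (int i = 0; i < 16; ++i) {
        int best = 0;
        long best_d = -1;
        for (int k = 0; k < ncolors; ++k) {
            long d = 0;
            for (int c = 0; c < 3; ++c) {
                int e = rgba[i * 4 + c] - pal[k][c];
                d += (long)e * e;
            }
            if (best_d < 0 || d < best_d) { best_d = d; best = k; }
        }
        indices |= (uint32_t)best << (2 * i);
    }

    /* Little-endian layout: c0, c1, then the 32 index bits. */
    out[0] = (uint8_t)(c0 & 0xFF); out[1] = (uint8_t)(c0 >> 8);
    out[2] = (uint8_t)(c1 & 0xFF); out[3] = (uint8_t)(c1 >> 8);
    for (int b = 0; b < 4; ++b) out[4 + b] = (uint8_t)(indices >> (8 * b));
}
```

A BPTC (BC7) encoder has to choose between several modes and partition layouts per block on top of this kind of endpoint and index search, which is where the extra cost comes from.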