Reading the Depth Buffer - Why so slow?


I’m writing a program that uses glReadPixels() to get values from the depth buffer (GL_DEPTH_COMPONENT). I also use it to read from the frame buffer (GL_RGBA).

What I notice is that reading 32 bits per pixel from the depth buffer is FAR slower than reading the same amount of data from the frame buffer.

Why is this and is there a way around it?

For those who are interested, I used a GeForce2 GTS with a 512x256 window at 32bpp for colour and 32bpp for depth.
Reading the entire depth buffer took 88M clock cycles. Reading the same amount of data from the colour buffer took just 7.6M. That's roughly 12x faster!

If you are reading the depth values as floats, it will be slow.
On your card, and maybe on others too, read the values as unsigned char or int instead. I use char. As far as I know, it is supposed to be faster.

It's pretty easy to explain. Here is what happens when reading the depth buffer:
Depth values are read from the depth buffer. Each component is converted to floating point such that the minimum depth value maps to 0.0 and the maximum value maps to 1.0. Each component is then multiplied by GL_DEPTH_SCALE, added to GL_DEPTH_BIAS, and finally clamped to the range [0,1].
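To make the per-pixel cost concrete, here is a rough C sketch of the steps quoted above. It is only an illustration of the arithmetic, not what any particular driver does internally. The function name depth_to_float and its parameters are made up for this example; scale and bias stand in for GL_DEPTH_SCALE and GL_DEPTH_BIAS.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a GL call): mimics the per-pixel work described
   above when a depth value of `bits` bits is read back as a float. */
float depth_to_float(uint32_t raw, unsigned bits, float scale, float bias)
{
    assert(bits >= 1 && bits <= 32);

    /* Largest value representable in `bits` bits, e.g. 65535 for 16 bits. */
    double max_raw = (double)((((uint64_t)1) << bits) - 1);

    /* Step 1: map the minimum depth to 0.0 and the maximum depth to 1.0. */
    double d = (double)raw / max_raw;

    /* Step 2: apply the scale and bias (stand-ins for GL_DEPTH_SCALE and
       GL_DEPTH_BIAS). */
    d = d * (double)scale + (double)bias;

    /* Step 3: clamp to [0, 1]. */
    if (d < 0.0) d = 0.0;
    if (d > 1.0) d = 1.0;

    return (float)d;
}
```

With the default scale of 1.0 and bias of 0.0, depth_to_float(0xFFFFFF, 24, 1.0f, 0.0f) gives 1.0. The point is that every pixel goes through an int-to-float conversion, a divide, a multiply-add and two compares if nothing is optimised away.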

It's obvious that this will take a lot more time than just reading the color values.

Hope it helps.

The depth buffer is either 16-bit or 24-bit. If it’s 16-bit, you should be using UNSIGNED_SHORT. If it’s 24-bit, either UNSIGNED_SHORT or UNSIGNED_INT is OK, but neither is particularly good, because both require pixel format conversions.
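As an illustration of why a 24-bit buffer needs a conversion for either type, here is a small C sketch that rescales a depth value from one bit width to another. This is an assumption about what the conversion amounts to arithmetically (normalise, then scale to the new range), not a description of what the driver really does. rescale_depth is a made-up name.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rescales `value` from an `in_bits`-wide depth range
   to an `out_bits`-wide range, rounding to the nearest integer. */
uint32_t rescale_depth(uint32_t value, unsigned in_bits, unsigned out_bits)
{
    assert(in_bits >= 1 && in_bits <= 32);
    assert(out_bits >= 1 && out_bits <= 32);

    uint64_t in_max  = (((uint64_t)1) << in_bits) - 1;   /* e.g. 0xFFFFFF */
    uint64_t out_max = (((uint64_t)1) << out_bits) - 1;  /* e.g. 0xFFFF   */
    assert(value <= in_max);

    /* value / in_max * out_max, done in 64-bit integers with rounding. */
    return (uint32_t)(((uint64_t)value * out_max + in_max / 2) / in_max);
}
```

For example, rescale_depth(0xFFFFFF, 24, 16) gives 0xFFFF and rescale_depth(0xFFFFFF, 24, 32) gives 0xFFFFFFFF. When in_bits equals out_bits the value comes back unchanged, which is the case where no conversion is needed.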

  • Matt


Thanks Chris, okapota and Matt for your help. Your advice is spot-on. The pixel format conversion to Floats was killing my program.

If you are interested, I timed some glReadPixels() operations with different pixel formats…

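Type \ depth bpp   |   16  |   24  |   32  |
-------------------|-------|-------|-------|
GL_UNSIGNED_SHORT  |  2.9M |  2.9M |  2.9M |
GL_UNSIGNED_INT    |  3.6M |  3.9M |  3.9M |
GL_FLOAT           | 88.0M | 88.4M | 88.7M |

M = millions of clock cycles. Columns are the bits per pixel of the depth buffer.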

…So it seems that SHORTs are faster than INTs in general, and FLOATs are awful. The bpp of the depth buffer has little effect. I guess this applies to reads of the colour buffer too…
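To put the timings in per-pixel terms, here is a small C sketch of the arithmetic. It assumes the timings were taken on the same 512x256 window mentioned in the first post, which the timing post does not state explicitly. The function names are made up for this example.

```c
#include <assert.h>

/* Hypothetical helper: average clock cycles spent per pixel for one read. */
double cycles_per_pixel(double total_cycles, unsigned width, unsigned height)
{
    assert(width > 0 && height > 0);
    return total_cycles / ((double)width * (double)height);
}

/* Hypothetical helper: how many times slower one read is than another. */
double slowdown(double slow_cycles, double fast_cycles)
{
    assert(fast_cycles > 0.0);
    return slow_cycles / fast_cycles;
}
```

Under that assumption, cycles_per_pixel(88.0e6, 512, 256) comes to about 671 cycles per pixel for GL_FLOAT, against about 22 for GL_UNSIGNED_SHORT, and slowdown(88.0e6, 2.9e6) is about 30.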

If you want to read the framebuffer, you have to think about the component format as well. I’m not sure about it, but I think GL_BGRA is the native, and therefore the fastest, format in Win32. Using another format will force the driver to change the internal component order.
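For what it's worth, changing the component order means something like the following for every pixel. This is just a C sketch of the swap between BGRA and RGBA byte order, assuming 8 bits per component and tightly packed pixels. swap_red_blue is a made-up name, and a real driver may well do this differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: swaps the first and third byte of every 4-byte pixel
   in place, turning BGRA into RGBA or the other way round. */
void swap_red_blue(uint8_t *pixels, size_t pixel_count)
{
    assert(pixels != NULL || pixel_count == 0);

    for (size_t i = 0; i < pixel_count; ++i) {
        uint8_t *p = pixels + 4 * i;  /* start of pixel i */
        uint8_t tmp = p[0];
        p[0] = p[2];                  /* green (p[1]) and alpha (p[3]) stay put */
        p[2] = tmp;
    }
}
```

Reading in the native order, if that guess about GL_BGRA is right, would skip this pass over the data entirely.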