glTexImage2D vs glReadPixels - why so slow?

Take a look at these two fragments:

glReadPixels(0, 0, 2048, 2048, GL_RGB, GL_FLOAT, outArr);
glTexImage2D(GL_TEXTURE_RECTANGLE_ARB, 0, GL_RGB32F_ARB, 2048, 2048, 0, GL_RGB, GL_FLOAT, inArr);

The former reads back an RGB 32-bit-per-channel floating-point pixel buffer surface, and the latter creates a texture of the same format. The readback takes about 80ms (~600MB/s), but the upload takes about 480ms (~100MB/s). (2048 × 2048 × 3 channels × 4 bytes ≈ 48MB per transfer, so 48MB / 80ms ≈ 600MB/s versus 48MB / 480ms ≈ 100MB/s.) :confused:

Granted, there is a little extra overhead in allocating texture memory. I can account for that by timing glTexSubImage2D on an already-created texture, which knocks about 10ms off. The remaining overhead is still insane.

NVIDIA 81.95 drivers on Windows with a GeForce 6600GT 256MB on PCI Express x16. I get similar results on a 7800GTX. Any idea where I’m going wrong? (Or does the problem lie outside the code I’ve pasted here?)

I guess the driver has to convert GL_RGB32F to the real internal format GL_RGBA32F (GL_RGB32F isn’t hardware accelerated)… Try uploading native RGBA texture data instead of RGB only.
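
A minimal sketch of that suggestion (here inArrRGBA is a hypothetical 2048×2048×4 float array, i.e. the RGB source padded out with an alpha channel):

glTexImage2D(GL_TEXTURE_RECTANGLE_ARB, 0, GL_RGBA32F_ARB, 2048, 2048, 0,
             GL_RGBA, GL_FLOAT, inArrRGBA);  // RGBA source matches the RGBA internal format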

What is the glReadPixels source format: a floating-point or a fixed-point buffer? In the latter case, the data read back over the bus is only a quarter of what you upload later on.

It’s about 20ms faster with GL_RGBA32F_ARB, but still a lot slower than readback.

The image being read back is a GL_RGB32F_ARB pixel buffer surface (96bpp), so the amount of data being transferred should be the same.

Try GL_BGR or GL_BGRA instead of GL_RGB. Also try PBOs for both readback and upload.
Using PBOs you can upload at up to 1.8GB/sec and read back at ~550-600MB/sec on an AGP motherboard. I didn’t test on PCI Express.
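
For illustration, an upload through a PBO might look roughly like this (untested sketch, assuming GL_ARB_pixel_buffer_object and RGBA float data as above; readback works the same way through GL_PIXEL_PACK_BUFFER_ARB plus glReadPixels):

GLuint pbo;
const size_t size = 2048 * 2048 * 4 * sizeof(float);   // RGBA32F payload
glGenBuffersARB(1, &pbo);
glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, pbo);
glBufferDataARB(GL_PIXEL_UNPACK_BUFFER_ARB, size, NULL, GL_STREAM_DRAW_ARB);
float* dst = (float*)glMapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, GL_WRITE_ONLY_ARB);
memcpy(dst, inArrRGBA, size);                           // fill with RGBA float data
glUnmapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB);
// With an unpack PBO bound, the pointer argument is a byte offset into the buffer:
glTexSubImage2D(GL_TEXTURE_RECTANGLE_ARB, 0, 0, 0, 2048, 2048, GL_RGBA, GL_FLOAT, 0);
glBindBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, 0);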

Read http://developer.nvidia.com/object/fast_texture_transfers.html

yooyo

I’ve experimented quite a bit with PBOs across different formats. For the most common formats they’re definitely faster than plain glTexSubImage, but for the majority of formats they’re slower. When a conversion is required, I’ve found that using my own multithreaded, optimized code (on multicore machines) to convert the data type and fill the PBO gives a significant speedup over letting the driver do the conversion.
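
A rough sketch of that map-and-convert idea (pthreads and a uint8-to-float conversion chosen purely for illustration; this is not the poster’s actual code):

#include <pthread.h>
#include <stddef.h>

typedef struct { const unsigned char *src; float *dst; size_t count; } Slice;

/* Each worker converts its slice of the source straight into the mapped PBO. */
static void *convert_slice(void *arg) {
    Slice *s = (Slice *)arg;
    for (size_t i = 0; i < s->count; ++i)
        s->dst[i] = s->src[i] * (1.0f / 255.0f);
    return NULL;
}

/* Map the bound unpack PBO, split the conversion across nthreads (<= 16), unmap. */
static void fill_pbo_parallel(const unsigned char *src, size_t total, int nthreads) {
    float *dst = (float *)glMapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB, GL_WRITE_ONLY_ARB);
    pthread_t tid[16];
    Slice slice[16];
    size_t chunk = total / nthreads;
    for (int t = 0; t < nthreads; ++t) {
        slice[t].src = src + t * chunk;
        slice[t].dst = dst + t * chunk;
        slice[t].count = (t == nthreads - 1) ? total - t * chunk : chunk;
        pthread_create(&tid[t], NULL, convert_slice, &slice[t]);
    }
    for (int t = 0; t < nthreads; ++t)
        pthread_join(tid[t], NULL);
    glUnmapBufferARB(GL_PIXEL_UNPACK_BUFFER_ARB);
}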

RGBA is the preferred source format for float textures on NVIDIA, and BGRA is the preferred source format for uint8 textures on NVIDIA. uint16 source data is pretty bad: most NVIDIA cards convert it to uint8 internally, and that conversion kills performance.

RGBA is the preferred format for both float and uint8 on ATI. Most ATI cards handle uint16 textures pretty well.

Don’t convert RGBA to RGB. Don’t convert types. Prefer glTexSubImage to glTexImage.
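
For example, an 8-bit upload on NVIDIA along those lines (sketch; pixels is a hypothetical width×height BGRA byte buffer):

glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                GL_BGRA, GL_UNSIGNED_BYTE, pixels);  // matches the preferred BGRA/uint8 path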

Ah darn it, I deserve a good slap here. I was timing the wrong thing and using GL_RGB16F_ARB (for some lovely extra conversion overhead) in one place to boot. Ahem. :smiley:

I’m now back to optimal uploads, pretty much the same as readback. Let’s pretend this thread didn’t happen. :wink: