Strange fp pbuffer copy performance drop

Hi all,

I’m seeing a huge performance drop (~40x) when copying data from a pbuffer to a texture with a couple of floating-point internal formats.

Because I’m trying to keep the code portable, I’m using glCopyTexSubImage2D rather than render-to-texture. That said, I also tried render-to-texture on Windows and got the same results. I also tried different driver versions on both Linux and Windows, on a GF6800-GT.

First I tried an fp RGBA texture rectangle with a pbuffer that has RGBA NV float components. This works fine.

Because I’m writing a realtime GPGPU application, any performance gain is welcome. I only needed one fp channel, so to reduce memory bandwidth I used an fp red texture rectangle with a pbuffer that has a single red NV float component. Again this works fine, and I got a noticeable speedup from the reduced bandwidth.
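
The saving from dropping to one channel can be estimated directly: a full-surface copy moves width × height × channels × bytes-per-channel bytes. A minimal sketch in C, assuming the copy covers the whole pbuffer and ignoring any driver-side padding; the function name and the 512×512 size in the note below are illustrative, not from the post.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes moved by one full-surface copy (e.g. one glCopyTexSubImage2D call
   over the whole pbuffer). Assumes tightly packed texels: any padding or
   alignment the driver adds is ignored. */
size_t copy_bytes(size_t width, size_t height,
                  size_t channels, size_t bytes_per_channel)
{
    assert(channels >= 1 && channels <= 4);
    return width * height * channels * bytes_per_channel;
}
```

For a hypothetical 512×512 surface with 32-bit floats, `copy_bytes(512, 512, 4, 4)` gives 4194304 bytes (4 MiB) for RGBA and `copy_bytes(512, 512, 1, 4)` gives 1048576 bytes (1 MiB) for a single red channel, so at best a 4x reduction; the real speedup depends on how bandwidth-bound the copy is.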

Eventually the application needed texture filtering and mipmaps, so I switched to ATI fp textures. I created an fp RGBA texture and a pbuffer with RGBA ATI float components. Again this works fine.
Note: this also works with a pbuffer that has RGBA NV float components.

Then I tried to reduce the bandwidth again: I created an fp intensity texture and a pbuffer with RGBA ATI float components. If I understand the spec correctly, the copy should take the intensity from the red channel. This is where the huge performance drop appears.
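
For reference, the GL spec defines the copy as reading the framebuffer as RGBA and then converting to the internal format; for INTENSITY, the R component becomes I and G, B, A are dropped. The sketch below is a CPU model of that data flow only, not what any driver actually runs; `rgba_to_intensity` is a hypothetical helper. One plausible explanation for the drop, which this thread does not confirm, is that the driver performs this per-texel repacking in software for float formats.

```c
#include <assert.h>
#include <stddef.h>

/* CPU model of the RGBA -> INTENSITY conversion applied by
   glCopyTexImage2D/glCopyTexSubImage2D: each source pixel is read as
   R, G, B, A and only R is kept as the intensity value. */
void rgba_to_intensity(const float *rgba, float *intensity, size_t texels)
{
    assert(rgba != NULL && intensity != NULL);
    for (size_t i = 0; i < texels; ++i) {
        intensity[i] = rgba[4 * i + 0]; /* keep R, discard G, B, A */
    }
}
```

With two source pixels `{0.5, 1, 2, 3}` and `{-1.25, 4, 5, 6}`, the output is `{0.5, -1.25}`: the destination holds a quarter of the source data, which is where the expected bandwidth saving comes from.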

Also, in the NVIDIA OpenGL Texture Formats document, I noticed that NVIDIA hardware performs a precision substitution from ati_float_rgb16 to ati_float_rgba16. So I expected an fp RGB texture with a pbuffer with RGBA ATI float components to copy as fast as an fp RGBA texture. But I still get the same huge performance drop.
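
The reasoning here is a lookup: map the requested format to the one actually stored, then compare that with the pbuffer layout. A minimal sketch, assuming hypothetical enum labels in place of the real GL constants; the table holds only the single substitution mentioned in the post, and any other entries would have to be taken from NVIDIA's document.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical labels for requested internal formats. Real code would use
   the GL enums from glext.h; these names are illustrative only. */
typedef enum {
    FMT_RGBA_FLOAT16_ATI,
    FMT_RGB_FLOAT16_ATI,
    FMT_INTENSITY_FLOAT16_ATI,
    FMT_COUNT
} TexFormat;

typedef struct {
    TexFormat requested; /* what the application asks for */
    TexFormat stored;    /* what the hardware actually stores */
} Substitution;

/* Only the rgb16 -> rgba16 substitution from the post is listed. */
static const Substitution kSubstitutions[] = {
    { FMT_RGB_FLOAT16_ATI, FMT_RGBA_FLOAT16_ATI },
};

/* Returns the stored format for `requested`, assuming no substitution
   when the format is not in the table. */
TexFormat stored_format(TexFormat requested)
{
    assert((int)requested >= 0 && requested < FMT_COUNT);
    for (size_t i = 0; i < sizeof kSubstitutions / sizeof kSubstitutions[0]; ++i) {
        if (kSubstitutions[i].requested == requested)
            return kSubstitutions[i].stored;
    }
    return requested;
}
```

Under this model, `stored_format(FMT_RGB_FLOAT16_ATI)` yields `FMT_RGBA_FLOAT16_ATI`, the same layout as the RGBA pbuffer, which is why the same copy speed was expected; the measured result not matching that expectation is the open question.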

In short:
How do I set up the pbuffer so that copying a single fp channel into an ATI float texture is fast?



Hello Nico,

Are you sure you set up your pbuffers strictly as non-render-texture pbuffers?
I’m doing something very similar to what you describe, and I saw a huge performance drop when I called glCopyTex(Sub)Image2D from a render-texture pbuffer. (I was using that pbuffer both as a texture and for generating other textures.)
Once I used separate pbuffers for render-to-texture and for generating textures, the copy was much faster, for both fp and non-fp pixel formats.


Thanks for the reply, VCarnage.

I’m using the pbuffer files from the NVIDIA SDK 8.0. The results are the same whether or not I specify the “texture” string. I double-checked by reading the files themselves and by printing the pbuffer attributes for both attempts.
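
WGL-style attribute lists are zero-terminated arrays of (attribute, value) pairs, so dumping them and checking for a render-texture binding takes only a short loop. A sketch under stated assumptions: the `ATTR_*` IDs are placeholders, NOT the real values (real code would use constants such as WGL_BIND_TO_TEXTURE_RGBA_ARB from wglext.h, or the GLX equivalents), and `dump_and_check_render_texture` is a hypothetical helper.

```c
#include <assert.h>
#include <stdio.h>

/* Placeholder attribute IDs for illustration only; these are not the
   real WGL/GLX values. */
enum {
    ATTR_COLOR_BITS       = 1,
    ATTR_FLOAT_COMPONENTS = 2,
    ATTR_BIND_TO_TEXTURE  = 3
};

/* Prints every (attribute, value) pair of a zero-terminated attribute
   list. Returns 1 if the render-texture binding attribute is present with
   a nonzero value, 0 otherwise. */
int dump_and_check_render_texture(const int *attribs)
{
    int render_texture = 0;
    assert(attribs != NULL);
    for (const int *p = attribs; p[0] != 0; p += 2) {
        printf("attrib %d = %d\n", p[0], p[1]);
        if (p[0] == ATTR_BIND_TO_TEXTURE && p[1] != 0)
            render_texture = 1;
    }
    return render_texture;
}
```

Running this over the lists actually passed when creating the pixel format and the pbuffer is one way to confirm that the copy-source pbuffer has no render-texture binding, which is the distinction VCarnage points out.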

Also, allocating a pbuffer with RGBA8 components together with an intensity texture works fine. So I’m still wondering why it should be different with float components. It seems to be taking a software path.