Performance problem with glTexSubImage2D on a recent Nvidia card

I have a small video-texture application that basically consists of:

  • creating the texture once with glTexImage2D:
    glTexImage2D( GL_TEXTURE_2D, /* level */ 0, /* internalformat */ GL_RGBA, 2048, 2048, /* border */ 0, /* format */ GL_RGBA, GL_UNSIGNED_BYTE, NULL );

  • filling the texture every frame with glTexSubImage2D:
    glTexSubImage2D( GL_TEXTURE_2D, /* level */ 0, /* xoffset */ 0, /* yoffset */ 0, 2048, 1556, /* format */ GL_RGBA, GL_UNSIGNED_BYTE, MyVideoTexture );

This is on a Quadro FX 3400 (PCI Express) card, which is supposed to be very fast.

But I only get 8 fps on this card, while a similar ATI card (PCI Express) gets 40 fps.
Worse, an older Nvidia card (AGP) gets 20 fps (OK, it's in another machine, but that machine is older!).

I just changed the code to use GL_TEXTURE_RECTANGLE instead of GL_TEXTURE_2D, and now things are really fast, but I hate this extension because it forces me to change all my texture coordinates. I found that the extension GL_ARB_texture_non_power_of_two is available on this machine, but I'm not sure how to activate it. Is it automatic?

I also tried different internal formats (4, GL_BGRA, etc.), with no luck.

I also tried several driver versions (61.77, 61.82, 67.20, 70.41), with no luck.

I am sending a bug report to Nvidia, but does anyone have a hint on how to get a decent frame rate out of this on Nvidia hardware?

GL_ARB_texture_non_power_of_two is just an indicator: if it is listed, GL_TEXTURE_2D textures no longer need power-of-two dimensions. There is nothing to activate.

On Mac OS X there is an extension called GL_APPLE_client_storage that lets you keep your textures in system memory instead of copying the client-side copy to video memory every time, but there is no equivalent on Windows (AFAIK).

You could also look at GL_EXT_texture_object and call glPrioritizeTexturesEXT with a priority of 0.0.

Use NV_pixel_data_range with fences, or use EXT_pixel_buffer_object. Also, use the GL_BGR or GL_BGRA pixel format. I'm really wondering why your app performs so poorly; NV hardware is usually MUCH faster at pixel transfers than ATI hardware.

On my machine (P4 2.8 GHz, FX5900, AGP 8x) I get ~1.8 GB/s with glTexSubImage2D.

Visit the GLBench page for more info.


Please read the PBO spec. At the end of the spec there are a few examples of how to use this extension for streaming textures.