Slow initial texture upload with PBO

Hello Everybody,

I am building a library to enable simple and fast asynchronous texture uploads.

I am using the usual texture + PBO process with a single OpenGL context: map the PBO on the render thread, fill the PBO on a background thread, then back on the render thread unmap the PBO, bind it, and upload its contents to a texture. This all works functionally as expected, but I am posting here because of some “predictably irrational” performance I’ve been noticing.
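For reference, here is a minimal C sketch of that pattern. The names (`pbo`, `tex`, `pixels`, `size`, `w`, `h`) are illustrative, error checking is omitted, and the thread handoff/synchronization is only indicated by comments:

```c
/* Render thread: create the PBO and map it. */
GLuint pbo;
glGenBuffers(1, &pbo);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
glBufferData(GL_PIXEL_UNPACK_BUFFER, size, NULL, GL_STREAM_DRAW);
void *ptr = glMapBuffer(GL_PIXEL_UNPACK_BUFFER, GL_WRITE_ONLY);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
/* The mapped pointer stays valid until glUnmapBuffer,
   even with the buffer unbound. */

/* Background thread: fill the mapping. No GL calls here,
   since the context belongs to the render thread. */
memcpy(ptr, pixels, size);

/* Render thread again (after the fill is signaled done):
   unmap, bind, and kick off the upload from the PBO. */
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
glUnmapBuffer(GL_PIXEL_UNPACK_BUFFER);
glBindTexture(GL_TEXTURE_2D, tex);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h,
                GL_BGRA, GL_UNSIGNED_BYTE,
                (const void *)0); /* byte offset into the PBO */
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
```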

Everything I read about the topic suggests that calling glTexImage2D or glTexSubImage2D with a PBO bound allows the call to return immediately. However, there seem to be large differences in how long glTexImage2D (or glTexSubImage2D) takes to execute, depending on whether or not both the PBO and the texture have been used in this process previously.

What I noticed is that if either the PBO or the texture is freshly allocated at the start of the process, the call to glTexImage2D takes somewhere between 0.5 and 7 milliseconds to return. In contrast, if both the PBO and the texture are pre-existing (i.e., both pulled from pools), the same glTexImage2D call completes in a few hundredths of a millisecond!

The difference is quite dramatic. In our test application, it is the difference between having one texture ready per frame and having a hundred of them ready in a single frame.

This behavior is 100% reproducible on both a Quadro FX 4800 and a lower-end GeForce 9600M GT. And it is also true regardless of whether I use glTexImage2D or glTexSubImage2D. I thought perhaps driver swizzling was responsible, but even after switching everything over to use BGRA, the issue remains. We are also using only power-of-two sized textures.

Can somebody please explain what may be going on here? Or offer any advice on how to deal with it? Perhaps I am doing something naive?

Many thanks in advance!

MMU pages on both the CPU and GPU side are being allocated and mapped, for both the texture and the buffer object.
Also, the texture probably has an empty copy in system RAM, which gets zero-initialized for OS security reasons.

I hit the same performance bug in the NV driver some time ago. I went crazy over it, and was surprised that nobody else had reported it earlier.
My observation was that glTexImage2D(..., NULL) is fast (< 1 ms), but the driver actually delays all the real initialization until later. So the first call to glTexSubImage2D() takes ages. It takes ages no matter what kind of memory you supply (PBO or not), what size the transfer is (2K or just one pixel), and no matter what internal texture format. It just takes long.
I’ve made a workaround that helps a bit. After I create a texture, I bind it to an FBO and call glClear(GL_COLOR_BUFFER_BIT), and then I load the texture data. In the end this was faster than loading the data directly. It does not make much sense, I know. Texture initialization is about 1 ms then, which is feasible for me.
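That workaround might look something like the following sketch, using the EXT framebuffer entry points of that era. The helper name and parameters are hypothetical, and the texture is assumed to already have storage requested via glTexImage2D:

```c
/* Hypothetical helper: nudge the driver into committing GPU storage
   for 'tex' by attaching it to a throwaway FBO and clearing it once. */
void clearTextureViaFBO(GLuint tex, int w, int h)
{
    GLuint fbo;
    glGenFramebuffersEXT(1, &fbo);
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, fbo);
    glFramebufferTexture2DEXT(GL_FRAMEBUFFER_EXT, GL_COLOR_ATTACHMENT0_EXT,
                              GL_TEXTURE_2D, tex, 0);
    glViewport(0, 0, w, h);
    glClearColor(0.0f, 0.0f, 0.0f, 0.0f);
    glClear(GL_COLOR_BUFFER_BIT); /* forces real allocation */
    glBindFramebufferEXT(GL_FRAMEBUFFER_EXT, 0);
    glDeleteFramebuffersEXT(1, &fbo);
}
```

After this, the subsequent glTexSubImage2D should hit already-allocated storage.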

Also be sure you handle mipmapping correctly. There is a huge slowdown on the NVIDIA driver when loading data “in the wrong order”.

Could you explain what “the right order” is? Thanks!


For example, I noticed that glGenerateMipmapEXT sometimes reads the texture back from the GPU to system memory, reallocates the GPU texture memory, and uploads it again.

All this is NV specific.

Thanks very much for the info!

mfort: do you happen to know when the “right” time is to call the mipmap generation function, so that we don’t pay the penalty you describe?

First, make sure texture loading is reasonably fast without mipmapping.
I’ve got the best performance across all tested hardware by preallocating all the texture levels in advance. Something like this:

  // first-time initialization
  for (all levels) {
    glTexImage2D(..., level, ..., NULL);
  }
  fillBlackUsingFBOAndClear(); // this actually allocates the GPU memory

  // some time later
  glTexSubImage2D() with PBO  // always fast
  glGenerateMipmap()          // always fast
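A more concrete C sketch of that preallocation pattern, assuming an RGBA8 256×256 texture with a full mipmap chain ('tex' and 'pbo' created earlier, and fillBlackUsingFBOAndClear() standing in for the FBO-clear trick):

```c
/* First-time initialization: request every mip level so the driver
   can allocate the whole chain up front (RGBA8 assumed). */
glBindTexture(GL_TEXTURE_2D, tex);
int lw = 256, lh = 256;
for (int level = 0; lw > 0 || lh > 0; ++level, lw >>= 1, lh >>= 1) {
    glTexImage2D(GL_TEXTURE_2D, level, GL_RGBA8,
                 lw ? lw : 1, lh ? lh : 1, 0,
                 GL_BGRA, GL_UNSIGNED_BYTE, NULL);
}
fillBlackUsingFBOAndClear(); /* forces actual GPU allocation */

/* Some time later, per upload: */
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pbo);
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, 256, 256,
                GL_BGRA, GL_UNSIGNED_BYTE, (const void *)0); /* fast */
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
glGenerateMipmapEXT(GL_TEXTURE_2D);                          /* fast */
```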

It would be nice if someone from NV told us the best way to load a texture without a penalty on the first load.

This problem is probably marginal for most people, as most graphics engines preload their textures at startup and therefore never see these sudden performance spikes.

Very interesting. Thanks again for the help!

So ARB was listening!
See ARB_texture_storage
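With ARB_texture_storage (core in OpenGL 4.2), the whole mip chain is allocated immutably in one call, which sidesteps the lazy-initialization problem this thread is about. A minimal sketch:

```c
GLuint tex;
glGenTextures(1, &tex);
glBindTexture(GL_TEXTURE_2D, tex);
/* Allocate all 9 mip levels of a 256x256 RGBA8 texture at once.
   The storage is immutable, so the driver can commit it immediately
   instead of deferring allocation to the first upload. */
glTexStorage2D(GL_TEXTURE_2D, 9, GL_RGBA8, 256, 256);
/* Pixel data is then supplied with glTexSubImage2D as usual;
   glTexImage2D is no longer allowed on this texture. */
```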
Very nice.

I wouldn’t say “listening” so much as “was going to do it anyway”, since they couldn’t have put that spec together in a week’s time.

Sure, I was just kidding. But it was a nice coincidence. This problem with creating textures has existed in OpenGL since day one, and it was a problem mostly for driver developers, as it was hard to optimize. Now we finally have a clean way to create textures.