Copying from PBO to Texture stalls the CPU with 16-bit RGB

I’ve noticed a weird edge case where glTexSubImage2D stalls the CPU for 5 seconds, but only when:

  • its internal format is 16-bit unsigned short (GL_RGB16)
  • glTexStorage2D is not called beforehand

The stall doesn’t happen with 8-bit byte or 32-bit float textures, even if glTexStorage2D isn’t called.

I’m able to reproduce it with the small demo code below. I understand texture uploads should run across multiple frames with fences, but I’ve merged it all into this one function for simplicity (there’s a sketch of the fenced version after the listing):

// This works great
//  GLenum pixelType = GL_UNSIGNED_BYTE;
//  GLenum sizedFormat = GL_RGB8;
//  GLuint bytesPerPixel = 3;

// This also works great
//  GLenum pixelType = GL_FLOAT;
//  GLenum sizedFormat = GL_RGB32F;
//  GLuint bytesPerPixel = 12;

// This format causes a stall on glTexSubImage2D
GLenum pixelType = GL_UNSIGNED_SHORT;
GLenum sizedFormat = GL_RGB16;
GLuint bytesPerPixel = 6;

GLuint textureSize = 4096;
GLuint textureSizeBytes = textureSize * textureSize * bytesPerPixel;


// Create a texture
GLuint texHandle;
glGenTextures(1, &texHandle);
glBindTexture(GL_TEXTURE_2D, texHandle);
glTexImage2D(GL_TEXTURE_2D, 0, sizedFormat, textureSize, textureSize, 0, GL_RGB, pixelType, NULL);
/* This call fixes the stall: */ glTexStorage2D(GL_TEXTURE_2D, 1, sizedFormat, textureSize, textureSize);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_NEAREST);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_S, GL_CLAMP_TO_EDGE);
glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_WRAP_T, GL_CLAMP_TO_EDGE);
glBindTexture(GL_TEXTURE_2D, 0);


// Create and map a PBO
GLuint pboHandle;
glGenBuffers(1, &pboHandle);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pboHandle);
glBufferStorage(GL_PIXEL_UNPACK_BUFFER, textureSizeBytes, NULL, GL_MAP_PERSISTENT_BIT | GL_MAP_WRITE_BIT);
void* pboPtr = glMapBufferRange(GL_PIXEL_UNPACK_BUFFER, 0, textureSizeBytes, GL_MAP_PERSISTENT_BIT | GL_MAP_WRITE_BIT | GL_MAP_FLUSH_EXPLICIT_BIT);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);


// Write to PBO
// ...


// Flush the PBO
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pboHandle);
glFlushMappedBufferRange(GL_PIXEL_UNPACK_BUFFER, 0, textureSizeBytes);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);


// Copy from PBO to texture
glBindTexture(GL_TEXTURE_2D, texHandle);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pboHandle);
/* This function stalls */ glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, textureSize, textureSize, GL_RGB, pixelType, NULL);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, 0);
glBindTexture(GL_TEXTURE_2D, 0);
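
For completeness, the fenced multi-frame version I mentioned would look something like this (a rough sketch; the names are illustrative, and real code would ring-buffer two or more PBOs):

// After issuing the copy above, insert a fence so we know when the GPU is done reading the PBO
GLsync uploadFence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);

// ... on a later frame, before writing new pixel data into the mapped PBO ...
// (timeout 0 = just poll; if the fence isn't signaled yet, check again next frame)
GLenum waitResult = glClientWaitSync(uploadFence, GL_SYNC_FLUSH_COMMANDS_BIT, 0);
if (waitResult == GL_ALREADY_SIGNALED || waitResult == GL_CONDITION_SATISFIED)
{
    glDeleteSync(uploadFence);
    // The GPU has consumed the PBO; it's now safe to write the next upload into pboPtr
}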

I’m happy with the glTexStorage2D fix, but I’m curious what’s going on under the covers to cause this stall.

What GPU+drivers?

You make it less likely you’ll get any feedback by not posting GL code. This is GL wrappage, for some language and library.

For instance, what exactly are these, in OpenGL:

  • SizedInternalFormat.Rgb8
  • SizedInternalFormat.Rgb32f
  • SizedInternalFormat.Rgb16

I would guess:

  • GL_RGB8
  • GL_RGB32F
  • GL_RGB16UI (or GL_RGB16I or GL_RGB16)

FWIW, on NVIDIA no 3-component internal texture format is natively supported, so you’re going to end up with expensive CPU-side texel conversions in the driver. You can check this on your driver with glGetInternalformativ().
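
For instance, something like this (a sketch; GL_INTERNALFORMAT_PREFERRED requires GL 4.3 or ARB_internalformat_query2):

// Ask the driver whether GL_RGB16 is supported, and what it would actually store for it
GLint supported = GL_FALSE;
glGetInternalformativ(GL_TEXTURE_2D, GL_RGB16, GL_INTERNALFORMAT_SUPPORTED, 1, &supported);

GLint preferred = 0;
glGetInternalformativ(GL_TEXTURE_2D, GL_RGB16, GL_INTERNALFORMAT_PREFERRED, 1, &preferred);
// On drivers without native 3-component storage, expect a 4-component format
// (e.g. GL_RGBA16) back in 'preferred' rather than GL_RGB16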

However, that alone wouldn’t explain the behavior you’re seeing (on NVIDIA drivers, at least), given that none of the three is a native internal texture format.

One thing to keep in mind is that with the glTexStorage..() APIs, you give the driver everything it needs to allocate the texture storage up front. Whereas with the old gl..TexImage..() APIs, the driver can’t know the full shape of the texture from any single call; it has to piece together potentially multiple of these calls to get the full picture. So it has to defer the allocation, often until first use.

The net of this: I don’t know why you see the stall with the latter format and not the former two. But glTexStorage2D() is the better API to use for allocating the texture storage.

I’ve tested this on a GTX 1070 and an RTX 3070.

Good to know about the GL wrapper; I’ll change it to raw GL calls now.

That’s right: GL_RGB8, GL_RGB32F and GL_RGB16.

Thank you, that’s interesting. I would’ve thought the three parameters in glTexImage2D would give OpenGL everything it needs to allocate the texture:

  • GL_RGB internal format
  • GL_RGB pixel format
  • GL_UNSIGNED_SHORT pixel type

Possibly not, because it doesn’t include the number of levels, which glTexStorage2D does?

Yes, exactly. Each glTexImage2D() call only allocates the data for one MIP level of the texture. You can allocate 2, 3, … all the way up to the maximum possible number of MIP levels, each with its own glTexImage2D() call.

glTexStorage2D() says the heck with all that: just let the user tell us how many levels to allocate, and then we’re done.

On this wiki page, you can see how you’d implement glTexStorage2D() with multiple calls to glTexImage2D():
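
It boils down to something like this (my paraphrase of the wiki’s example, for the 2D case; levels, internalFormat, width, and height stand in for the glTexStorage2D arguments):

// Equivalent of glTexStorage2D(GL_TEXTURE_2D, levels, internalFormat, width, height):
// allocate every MIP level explicitly, halving the dimensions each time
for (GLint level = 0; level < levels; ++level)
{
    glTexImage2D(GL_TEXTURE_2D, level, internalFormat, width, height, 0,
                 GL_RGB, GL_UNSIGNED_SHORT, NULL);  // format/type don't matter with NULL data
    width  = (width  > 1) ? width  / 2 : 1;
    height = (height > 1) ? height / 2 : 1;
}
// One difference remains: unlike glTexStorage2D, this storage is not immutable,
// so the driver still can't assume the allocation is final.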