Time taken to copy data from PBO to Texture (on the GPU side) is longer than expected

Hi all,

I’ve set up an asynchronous transfer from CPU memory → PBO memory → texture and it’s working great. The CPU time is near zero and it uses fences and a pool of PBOs and textures to ensure one texture is never being updated and rendered at the same time.

I use glTexStorage2D and glBufferStorage to ensure the GPU allocates RAM upfront.

The upload process happens over multiple frames and is as follows:

  • Map a PBO with the persistent and flush-explicit bits set (GL_MAP_PERSISTENT_BIT and GL_MAP_FLUSH_EXPLICIT_BIT)
  • Write to the mapped pointer on another thread
  • Flush the PBO with glFlushMappedBufferRange
  • Insert a fence
  • When the fence has signaled, copy from PBO to texture with glTexSubImage2D
  • Insert a fence
  • When the fence has signaled, render the texture to the screen
  • Insert a fence (when this fence signals, the PBO and texture are added back to the pool)
  • Repeat the process with another PBO and texture from the pool
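
The steps above can be read as a small per-slot state machine. Here is a minimal C++ sketch of that lifecycle with the GL calls left out. The names Stage, Slot, acquire and advance are hypothetical, and the writeDone and fenceSignaled flags stand in for the worker thread finishing and for a non-blocking fence poll (for example glClientWaitSync with a zero timeout). It is only an illustration of the ordering, not the actual implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical lifecycle of one PBO/texture pair taken from the pool.
enum class Stage {
    Free,         // in the pool, available for a new upload
    Writing,      // worker thread is filling the mapped PBO
    FlushFence,   // PBO flushed, waiting on the fence before the copy
    CopyFence,    // PBO-to-texture copy issued, waiting before drawing
    RenderFence   // texture drawn, waiting before returning to the pool
};

struct Slot {
    Stage stage = Stage::Free;
};

// Takes the first free slot and hands it to the writer.
// Returns the slot index, or -1 if every slot is still in flight.
inline int acquire(std::vector<Slot>& pool) {
    for (std::size_t i = 0; i < pool.size(); ++i) {
        if (pool[i].stage == Stage::Free) {
            pool[i].stage = Stage::Writing;
            return static_cast<int>(i);
        }
    }
    return -1;
}

// Moves a slot forward by at most one stage per call (one call per frame).
// The comments mark where the real GL work would be issued.
inline Stage advance(Slot& slot, bool writeDone, bool fenceSignaled) {
    assert(slot.stage != Stage::Free && "acquire the slot before advancing it");
    switch (slot.stage) {
    case Stage::Free:
        break;
    case Stage::Writing:
        // glFlushMappedBufferRange + fence would go here
        if (writeDone) slot.stage = Stage::FlushFence;
        break;
    case Stage::FlushFence:
        // glTexSubImage2D from the PBO + fence would go here
        if (fenceSignaled) slot.stage = Stage::CopyFence;
        break;
    case Stage::CopyFence:
        // draw with the texture + fence would go here
        if (fenceSignaled) slot.stage = Stage::RenderFence;
        break;
    case Stage::RenderFence:
        // PBO and texture go back to the pool
        if (fenceSignaled) slot.stage = Stage::Free;
        break;
    }
    return slot.stage;
}
```

Because a slot advances by at most one stage per call, a slot needs several frames to get from Writing back to Free, which is why the pool holds more than one PBO/texture pair.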

This runs at about 600 FPS, which is less than I expected. I used glQueryCounter to measure how long each step takes, and found that the copy from the PBO to the texture with glTexSubImage2D takes 1.1 - 1.5 seconds on the GPU and about 0.05ms on the CPU.

The textures are 1920x1080px and use the RGBA8 internal format and BGRA pixel format. I tried smaller textures, and the copy time scales in direct proportion to the number of pixels in the texture.

Is 1.5 seconds standard for an 8 megabyte texture? This is on an NVIDIA 3070 card, and I was told the BGRA pixel format would involve a DMA transfer, so I’m wondering if there’s some kind of data transformation happening under the covers.

Update (can’t edit the above post):

It should say 1.1 - 1.5 milliseconds.

The specific code in question is:

Gl.QueryCounter(queries[0], QueryCounterTarget.Timestamp);

tex.Bind();
pbo.Bind();
Gl.TexSubImage2D(TextureTarget.Texture2D, 0, 0, 0, 1920, 1080, PixelFormat.Bgra, PixelType.UnsignedByte, null);
pbo.Unbind();
tex.Unbind();

Gl.QueryCounter(queries[1], QueryCounterTarget.Timestamp);

Then later on:

Gl.GetQueryObject(queries[0], QueryObjectParameterName.Result, out startGPU);
Gl.GetQueryObject(queries[1], QueryObjectParameterName.Result, out endGPU);

Console.WriteLine(endGPU - startGPU); // timestamp difference, in nanoseconds

The GPU time is always 1.1-1.5ms whether I actually draw with this texture or not.
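
For anyone reproducing the measurement: timestamp query results are reported in nanoseconds, so the printed difference has to be scaled to get milliseconds. A minimal C++ sketch of that conversion follows; the helper name elapsedMilliseconds is hypothetical and not part of any GL binding.

```cpp
#include <cassert>
#include <cstdint>

// Converts two GL_TIMESTAMP query results (nanoseconds) into elapsed milliseconds.
inline double elapsedMilliseconds(std::uint64_t startNs, std::uint64_t endNs) {
    assert(endNs >= startNs && "end timestamp must not precede start timestamp");
    return static_cast<double>(endNs - startNs) / 1.0e6;  // 1 ms = 1,000,000 ns
}
```

It may also be worth checking GL_QUERY_RESULT_AVAILABLE before reading the result, since asking for the result of a query that has not finished can stall the CPU until the GPU catches up.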

OK, so… what’s the problem? At 1.5 milliseconds for an 8 MB texture, that’s a transfer rate of approximately 5.3 gigabytes per second.

I’d say that’s not bad.
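
The arithmetic behind that estimate can be checked with a short C++ sketch. The helper names textureBytes and gigabytesPerSecond are hypothetical, and the sketch assumes 4 bytes per pixel (RGBA8) and decimal gigabytes (10^9 bytes).

```cpp
#include <cassert>
#include <cstdint>

// Size in bytes of a tightly packed 2D image.
inline std::uint64_t textureBytes(std::uint32_t width, std::uint32_t height,
                                  std::uint32_t bytesPerPixel) {
    return static_cast<std::uint64_t>(width) * height * bytesPerPixel;
}

// Effective transfer rate in decimal GB/s for a copy of `bytes` taking `milliseconds`.
inline double gigabytesPerSecond(std::uint64_t bytes, double milliseconds) {
    assert(milliseconds > 0.0 && "duration must be positive");
    const double seconds = milliseconds * 1.0e-3;
    return static_cast<double>(bytes) / seconds / 1.0e9;
}
```

A 1920x1080 RGBA8 texture is 8,294,400 bytes, so gigabytesPerSecond(textureBytes(1920, 1080, 4), 1.5) works out to roughly 5.5 GB/s, and the same copy at 1.1 ms to roughly 7.5 GB/s. The 5.3 GB/s figure corresponds to rounding the size down to 8 MB.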

It’s not the worst but it’s the slowest part of my rendering pipeline.

The entire 3D scene renders in under 1ms, so having the UI take longer than that feels like something’s broken.

I read through GTC2012-Texture-Transfers and I’m looking into multiple GL contexts now, so I can copy from the PBO to the textures on another thread/context.

This will hopefully also get rid of the "Pixel-path performance warning: Pixel transfer is synchronized with 3D rendering" warnings.

Before jumping into multiple contexts - something still isn’t right.

The example here, "OpenGL Pixel Buffer Object (PBO)", uses 1 PBO to stream data to a 2048x2048 RGBA8 texture, and it takes 0.04ms on the GPU:

// start to copy from PBO to texture object
t1.start();

// bind the texture and PBO
glBindTexture(GL_TEXTURE_2D, textureId);
glBindBuffer(GL_PIXEL_UNPACK_BUFFER, pboIds[index]);

// copy pixels from PBO to texture object
// Use offset instead of pointer.
glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, IMAGE_WIDTH, IMAGE_HEIGHT, PIXEL_FORMAT, GL_UNSIGNED_BYTE, 0);

// measure the time copying data from PBO to texture object
t1.stop();
copyTime = t1.getElapsedTimeInMilliSec();

The only difference I could see is it uses orphaning, so I tried that but it didn’t make a difference. I also tried with a 2048x2048 texture rather than 1920x1080, but there was no difference.

There are times when it executes in 0.04ms, but it fluctuates wildly up to 1.5ms. The linked example doesn’t seem to fluctuate at all.