glBufferData vs glBufferSubData Performance


Will I take a significant performance hit when first writing into a buffer object if I call glBufferData with a NULL data pointer, followed by glBufferSubData to fill the entire buffer, instead of just calling glBufferData with the entire buffer? I’m sure there is at least some overhead given the extra GL call but is it enough overhead that I should avoid it? Do you expect this to vary between ATI and NVIDIA?
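
For reference, the two upload paths being compared can be sketched roughly as below. This is only an illustration: the small "stand-in" layer mimics the documented signatures of glBufferData and glBufferSubData so the sketch runs without a GL context, and the names `upload_one_call`, `upload_two_calls`, `stub_store` and `stub_size` are made up for this example. In a real program a buffer object is assumed to be bound to GL_ARRAY_BUFFER already.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* --- Stand-in GL layer (NOT a real driver) ------------------------------
 * Assumption: these stubs only copy the documented signatures so the sketch
 * compiles and runs without a GL context. In a real program, delete this
 * section and include your GL headers/loader instead. */
typedef unsigned int GLenum;
typedef ptrdiff_t GLsizeiptr;
typedef ptrdiff_t GLintptr;
#define GL_ARRAY_BUFFER 0x8892
#define GL_STATIC_DRAW  0x88E4

static unsigned char *stub_store = NULL; /* pretend data store of the bound buffer */
static GLsizeiptr stub_size = 0;

static void glBufferData(GLenum target, GLsizeiptr size, const void *data, GLenum usage)
{
    (void)target; (void)usage;
    free(stub_store);                    /* re-specifying discards the old store */
    stub_store = (unsigned char *)malloc(size > 0 ? (size_t)size : 1);
    assert(stub_store != NULL);
    stub_size = size;
    if (data != NULL && size > 0)
        memcpy(stub_store, data, (size_t)size); /* NULL leaves contents undefined */
}

static void glBufferSubData(GLenum target, GLintptr offset, GLsizeiptr size, const void *data)
{
    (void)target;
    assert(offset >= 0 && size >= 0 && offset + size <= stub_size);
    if (size > 0)
        memcpy(stub_store + offset, data, (size_t)size);
}
/* --- End of stand-in layer --------------------------------------------- */

/* Path 1: allocate and fill the buffer in a single call. */
static void upload_one_call(const void *data, GLsizeiptr size)
{
    glBufferData(GL_ARRAY_BUFFER, size, data, GL_STATIC_DRAW);
}

/* Path 2: allocate with a NULL pointer, then fill the entire range. */
static void upload_two_calls(const void *data, GLsizeiptr size)
{
    glBufferData(GL_ARRAY_BUFFER, size, NULL, GL_STATIC_DRAW);
    glBufferSubData(GL_ARRAY_BUFFER, 0, size, data);
}
```

Both paths leave the buffer with the same contents; the sketch says nothing about how a particular driver schedules the allocation or the copy.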


I doubt there is a performance hit.

NVIDIA mentions that there could be one if you call glBufferSubDataARB() while the GPU is still working on that range of the buffer. But in your case, I think you are fine.

The best way to find out is probably to measure it on different graphics cards.
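
A minimal sketch of such a measurement, assuming C11's timespec_get is available. The helper `seconds_per_call` and the placeholder `example_workload` are hypothetical names for this example. For a real GL test, the workload would perform the upload and then call glFinish(), so that work the driver has only queued is included in the timing.

```c
#include <time.h>

/* Hypothetical callback type: one iteration of the work being measured. */
typedef void (*timed_fn)(void *ctx);

/* Returns the average wall-clock seconds per call of fn(ctx).
 * Wall-clock time is used for portability; it can be disturbed by clock
 * adjustments, so treat single runs with some suspicion. */
static double seconds_per_call(timed_fn fn, void *ctx, int iterations)
{
    struct timespec t0, t1;
    if (iterations <= 0)
        return 0.0;
    timespec_get(&t0, TIME_UTC);
    for (int i = 0; i < iterations; ++i)
        fn(ctx);
    timespec_get(&t1, TIME_UTC);
    double elapsed = (double)(t1.tv_sec - t0.tv_sec)
                   + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
    return elapsed / iterations;
}

/* Placeholder workload: only counts calls. In a real test this is where the
 * buffer upload followed by glFinish() would go. */
static void example_workload(void *ctx)
{
    int *counter = (int *)ctx;
    ++*counter;
}
```

Running each upload variant through the same helper, on each card and driver of interest, gives numbers that can be compared directly.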

ref: page 12
This function gives you a way to replace a range of data into an existing buffer.
Note that in order to avoid conflicts, we may have to wait for the GPU if ever the
GPU is working with this area. As a consequence there could be a loss of


If you use BufferSubData, there is a synchronization so that you don’t overwrite a buffer that is still in use. However, if this is the first time you write to the buffer, no synchronization is required, so your two API calls should not be that bad.


Pierre B.
AMD Fellow.

Thanks for the responses. I moved forward with glBufferData followed by glBufferSubData because it is slightly cleaner (less housekeeping) for what I am doing. If I do any performance tests, I will be sure to post the results.

Thanks again,

On a similar note, do you think that the same approach can be used with glTexImage2D and glTexSubImage2D? That is, is allocating a texture with glTexImage2D and then writing to it later with glTexSubImage2D reasonably close in performance to allocating and writing with just one call to glTexImage2D?


This depends on the driver. I can tell you my experience with NVIDIA.

  • glTexImage2D(…, NULL) takes almost no time. On the other hand, it probably does almost nothing.
  • The first call to glTexSubImage2D(…, data) takes ages. It seems the texture memory is actually allocated somewhere under the hood at that point. Even if you update only 1x1 pixel, it takes long.
  • The cost of subsequent calls to glTexSubImage2D(…) depends on the number of pixels you transfer.

To answer your question: in terms of time spent, this holds on NVIDIA:
glTexImage2D(data) = glTexImage2D(NULL) + glTexSubImage2D(data)
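
In code, the two sides of that relation look roughly like this. It is only a sketch: the stand-in layer mimics the documented signatures of glTexImage2D and glTexSubImage2D for tightly packed RGBA8 data so it runs without a GL context, and `texture_one_call`, `texture_two_calls`, `stub_texels`, `stub_w` and `stub_h` are invented names. It shows that both paths end with the same texels; the timing behaviour described above is an observation about one driver and is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* --- Stand-in GL layer (NOT a real driver; RGBA8 only) ------------------
 * Assumption: in a real program, delete this section, include your GL
 * headers, and bind a texture to GL_TEXTURE_2D before calling the paths. */
typedef unsigned int GLenum;
typedef int GLint;
typedef int GLsizei;
#define GL_TEXTURE_2D    0x0DE1
#define GL_RGBA          0x1908
#define GL_RGBA8         0x8058
#define GL_UNSIGNED_BYTE 0x1401

static unsigned char *stub_texels = NULL; /* pretend storage for level 0 */
static GLsizei stub_w = 0, stub_h = 0;

static void glTexImage2D(GLenum target, GLint level, GLint internalformat,
                         GLsizei width, GLsizei height, GLint border,
                         GLenum format, GLenum type, const void *pixels)
{
    (void)target; (void)level; (void)internalformat;
    (void)border; (void)format; (void)type;
    size_t bytes = (size_t)width * (size_t)height * 4;
    free(stub_texels);
    stub_texels = (unsigned char *)malloc(bytes ? bytes : 1);
    assert(stub_texels != NULL);
    stub_w = width;
    stub_h = height;
    if (pixels != NULL && bytes > 0)
        memcpy(stub_texels, pixels, bytes); /* NULL leaves contents undefined */
}

static void glTexSubImage2D(GLenum target, GLint level, GLint xoffset, GLint yoffset,
                            GLsizei width, GLsizei height,
                            GLenum format, GLenum type, const void *pixels)
{
    (void)target; (void)level; (void)format; (void)type;
    assert(xoffset >= 0 && yoffset >= 0);
    assert(xoffset + width <= stub_w && yoffset + height <= stub_h);
    const unsigned char *src = (const unsigned char *)pixels;
    for (GLsizei row = 0; row < height; ++row) /* copy one row of the rectangle */
        memcpy(stub_texels + ((size_t)(yoffset + row) * (size_t)stub_w + (size_t)xoffset) * 4,
               src + (size_t)row * (size_t)width * 4,
               (size_t)width * 4);
}
/* --- End of stand-in layer --------------------------------------------- */

/* Left-hand side: allocate and upload in one call. */
static void texture_one_call(const void *rgba, GLsizei w, GLsizei h)
{
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, w, h, 0, GL_RGBA, GL_UNSIGNED_BYTE, rgba);
}

/* Right-hand side: allocate with NULL, then upload the full rectangle. */
static void texture_two_calls(const void *rgba, GLsizei w, GLsizei h)
{
    glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, w, h, 0, GL_RGBA, GL_UNSIGNED_BYTE, NULL);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, w, h, GL_RGBA, GL_UNSIGNED_BYTE, rgba);
}
```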


An OpenGL texture can be in one of two states:

  • complete
  • incomplete

Typically, an implementation will wait until a texture is complete before allocating the underlying video memory, so there is a one-time cost when the texture transitions from incomplete to complete.

Once that is done, any upload to the texture involves a copy/transfer of the data. If the amount of data is large, the transfer dominates the time spent; if it is small, you mostly see the driver overhead.

If a texture is in use, your TexSubImage call will need to synchronize with the rendering operations in flight; if the texture has never been used, the sync can be skipped.