Cost of glBufferSubData

Let’s say I have a single stream buffer. I want to stream 36 vertices and 36 texture coordinates, putting the vertices into one vertex attribute and the texture coordinates into another. I have two options. The first is to call glBufferSubData to fill the buffer with the vertex data, call glVertexAttribPointer, then call glBufferSubData again to fill it with the texture coordinates, and call glVertexAttribPointer once more. The second is to build a single array containing the vertex data followed by the texture coordinate data, upload it with one glBufferSubData call, call glVertexAttribPointer for the vertex data, and then, since everything is already in the buffer, call glVertexAttribPointer for the texture coordinates with an offset equal to the size of the vertex data. So essentially, the same amount of data but a different number of calls.

I find the latter a bit more difficult in some cases because I have to pack two different data groups into one buffer, though in theory it should perform better.

My question isn’t limited to that specific situation. In general, when streaming data, what is the cost of calling glBufferSubData more times with smaller buffers than fewer times with a large buffer?

You have a third option:

// offsetof requires <cstddef> (or <stddef.h> in C)
struct myVertex {
    float position[3];
    float texcoord[2];
};

myVertex data[36];

// fill data with interleaved positions and texcoords

glBufferSubData (GL_ARRAY_BUFFER, 0, sizeof (data), data);
glVertexAttribPointer (0, 3, GL_FLOAT, GL_FALSE, sizeof (myVertex), (const void *) offsetof (myVertex, position));
glVertexAttribPointer (1, 2, GL_FLOAT, GL_FALSE, sizeof (myVertex), (const void *) offsetof (myVertex, texcoord));

In general this will be the preferred way to fill it as strided/interleaved vertices will draw faster owing to better cache locality (special cases do exist but unless you know that you need one, you should optimize for the general case).

what is the cost of calling glBufferSubData more times with smaller buffers than fewer times with a large buffer?

GPUs are like hard disks and networks in this regard. To simplify, each update has two costs: one is the cost of the data transfer based on the size of data transferred, the second is a fixed overhead per-update (there are further complexities such as synchronization, but we’ll ignore them for the purposes of this explanation). So if x is the total size of the data, y is the fixed overhead and n is the number of updates, the total cost is x + ny. It should therefore be obvious that for a given value of x, you should be looking to keep n as low as possible.

In other words, a single large update (or as few as possible large updates) should always be considered preferable to multiple small updates.