glBufferData just one time at start


I am an AI programmer trying to improve my knowledge of graphics programming in general.
I am writing my own game engine. I have a lot of it working already, but I am still trying to figure out the best way of drawing a lot of sprites.

An idea has been in my head for quite a long time: instead of updating the vertex buffer just before drawing each object, preallocate the positions and texture coordinates as soon as the texture is created.
My engine is very data-driven, so I can serialize a lot of classes and data.
I was thinking of creating the texture, the vertex buffer, and the index buffer at startup, calling glBufferData directly with the proper positions and texCoords, and then, when an object needs to be rendered, just passing the transformation matrix to the shader via a uniform.

Does that approach make sense?

The thing is that so far, calling glBufferSubData for each renderable gives me fairly poor performance (around 60 fps drawing 3000 sprites, which seems very, very slow to me).

Thanks in advance!

The way you seem to be doing this is: you have a single buffer, sized for one sprite, and you make 3000 glBufferSubData calls per frame. Alternatively, each sprite may have its own buffer, so you now make 3000 glBindBuffer and 3000 glBufferSubData calls per frame.

Both of these are going to be slow.

The alternative you mention is also going to be slow, because now it’s 3000 matrix uploads to the GPU per frame.

The right way is to have one buffer, sized for 3000 sprites. Make one glBufferSubData call per frame. Make one glDrawArrays/glDrawElements call per frame.
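A minimal CPU-side sketch of that batching, assuming hypothetical `Sprite`/`Vertex` types (the GL calls in the trailing comments show where the single per-frame upload and draw would go; none of these names are from the thread):

```cpp
#include <vector>

// One vertex: position (x, y) plus texture coordinates (u, v).
struct Vertex { float x, y, u, v; };

// Illustrative per-sprite data; a real engine's sprite type will differ.
struct Sprite { float x, y, w, h; };

// Append the four corners of one sprite's quad to the batch.
void appendQuad(std::vector<Vertex>& batch, const Sprite& s) {
    batch.push_back({s.x,       s.y,       0.0f, 0.0f});
    batch.push_back({s.x + s.w, s.y,       1.0f, 0.0f});
    batch.push_back({s.x + s.w, s.y + s.h, 1.0f, 1.0f});
    batch.push_back({s.x,       s.y + s.h, 0.0f, 1.0f});
}

// Build one CPU-side array holding every sprite's vertices.
std::vector<Vertex> buildBatch(const std::vector<Sprite>& sprites) {
    std::vector<Vertex> batch;
    batch.reserve(sprites.size() * 4);
    for (const Sprite& s : sprites) appendQuad(batch, s);
    return batch;
}
// Then, once per frame:
//   glBufferSubData(GL_ARRAY_BUFFER, 0,
//                   batch.size() * sizeof(Vertex), batch.data());
//   glDrawElements(GL_TRIANGLES, spriteCount * 6, GL_UNSIGNED_SHORT, 0);
//   (using a shared, pre-built index buffer: 6 indices per quad)
```

The index buffer never changes (every quad uses the same 0-1-2, 2-3-0 pattern offset by 4 per sprite), so it can be filled once at startup.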

If you can’t make one glDraw* call per frame (perhaps because your sprites have different textures), so be it: use the parameters to your glDraw* call to select subranges of the buffer to draw, but at least get those buffer updates under control.

Don’t unbind or disable state after each draw call because it will mess with any state change filtering or batching that your driver might otherwise be able to do.

Even this may not increase performance: with 3000 sprites, depending on how large they are, fillrate can be your primary bottleneck. If so, that’s OK; at least you’ll know you’re bottlenecked for the right reason.

Hi mhagain!

Thanks for your reply. I was thinking about that possibility, but one more question arises for me.
Right now my shader expects a modelviewMatrix per sprite (also a ProjectionMatrix, but that is global for all of the sprites), so it is configured as a uniform. So, do you recommend including the modelviewMatrix as part of the vertex attributes?
Because I see only 2 options here:
1.- Passing the matrix per vertex (so the matrix would be duplicated 4 times, which seems a bit too much of a waste).
2.- Performing the matrix operation on the CPU on each vertex before sending the vertices to the GPU.

What would be the right approach?

Thanks! I am enjoying graphics programming a lot!

Approach 3 would be to use instancing; this would allow you to have one matrix per sprite but have it available as vertex attribs. Be aware, though, that in terms of performance there’s not a huge gain from the memory saving; GPU programming is often counter-intuitive like this, and sometimes “wasting memory” can actually lead to higher performance (which is one reason why I hate the word “waste” in this context; if it gives you something in return for it, surely it’s “use”, not “waste”?).
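A sketch of the CPU side of instancing: pack one matrix per sprite into a contiguous buffer, upload it once per frame, and let the divisor advance the attribute once per instance. Everything here (the `Mat4` alias, `translation` helper, attribute location) is an illustrative assumption; the GL setup appears in comments since it needs a live context:

```cpp
#include <vector>
#include <array>

using Mat4 = std::array<float, 16>;  // column-major 4x4, as GL expects

// Illustrative helper: build a translation matrix for one sprite.
Mat4 translation(float x, float y) {
    Mat4 m{};                             // zero-initialised
    m[0] = m[5] = m[10] = m[15] = 1.0f;   // identity diagonal
    m[12] = x; m[13] = y;                 // translation column
    return m;
}

// One matrix per sprite, packed contiguously for the instance buffer.
std::vector<float> packInstanceMatrices(const std::vector<Mat4>& mats) {
    std::vector<float> packed;
    packed.reserve(mats.size() * 16);
    for (const Mat4& m : mats)
        packed.insert(packed.end(), m.begin(), m.end());
    return packed;
}
// A mat4 attribute occupies 4 consecutive slots, each with divisor 1:
//   for (int i = 0; i < 4; ++i) {
//       glEnableVertexAttribArray(loc + i);
//       glVertexAttribPointer(loc + i, 4, GL_FLOAT, GL_FALSE,
//                             sizeof(Mat4), (void*)(i * 4 * sizeof(float)));
//       glVertexAttribDivisor(loc + i, 1);  // advance once per instance
//   }
//   glDrawElementsInstanced(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, 0, mats.size());
```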

mhagain gave you a third option (instancing). A fourth option is to send each sprite as a point and use a geometry shader to convert each point to a pair of triangles.

Option 1 is feasible, but you might want to condense the matrix. For sprites, you often only need translation, rotation and uniform scale, in which case you only need 4 values, as the matrix will always have the form

u -v x
v  u y
0  0 1

This may well be more efficient than either instancing or a geometry shader.
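The 4-value transform above can be sketched as follows (names are illustrative): u and v encode rotation and uniform scale together, and applying the transform is exactly multiplication by the 3x3 matrix shown:

```cpp
#include <cmath>

struct Vec2 { float x, y; };

// Condensed sprite transform: u = s*cos(a), v = s*sin(a),
// plus translation (x, y). Four floats replace a full matrix.
struct SpriteXform { float u, v, x, y; };

SpriteXform makeXform(float angle, float scale, float tx, float ty) {
    return { scale * std::cos(angle), scale * std::sin(angle), tx, ty };
}

// Equivalent to multiplying (px, py, 1) by the 3x3 matrix above.
Vec2 apply(const SpriteXform& t, Vec2 p) {
    return { t.u * p.x - t.v * p.y + t.x,
             t.v * p.x + t.u * p.y + t.y };
}
```

The same two multiply-add lines work unchanged in a vertex shader, with the four values delivered as a single vec4 attribute.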

If you’re particularly concerned about memory, you can store an array of matrices (one per sprite) as a uniform variable or (if limits on uniform storage are an issue) a texture, then index the array using either gl_VertexID or an integer vertex attribute. But the array/texture lookup has a cost relative to receiving the data directly as a vertex attribute.

Also, if you’re splitting draw calls because of different textures, consider using either an array texture or an array of samplers (with the texture ID passed similarly to the transformation) so that you can coalesce the draw calls. This would also avoid needing to group sprites by texture. In turn, that would allow you to sort by depth, so you can render either back to front (eliminating the need for a depth buffer) or front to back (maximising the effect of early depth tests).

Hi guys I got news,

Since my level is made of tiles, I just constructed a class called Mesh that collects all the vertices + indices for the tiles that share the same texture.
So now I am using just 1 draw call, and the performance has increased a lot.

I will try your suggestion about the matrix.
I still have a lot of questions, but I will continue investigating a bit more.

So thanks for your advice, it was very helpful!

FWIW, you can draw a tile map as a single large quad (pair of triangles), with the fragment shader dividing the pixel coordinates by the tile size to obtain the tile indices (quotient) and the offset within the tile (remainder).

Even if you don’t go that far, a grid of tiles warrants a different approach to sprites.
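The quotient/remainder lookup described above, sketched as plain integer arithmetic (names are illustrative; in the fragment shader the same arithmetic would run on the interpolated map coordinate, with the tile index used to sample an array texture or a tile-index texture):

```cpp
// Given a pixel coordinate inside the big quad, the tile index is the
// quotient and the offset within the tile is the remainder.
struct TileHit { int tileX, tileY, offX, offY; };

TileHit lookupTile(int px, int py, int tileSize) {
    return { px / tileSize, py / tileSize,
             px % tileSize, py % tileSize };
}
```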

Thanks GClements, I might try that just out of curiosity and to improve my skills with shaders!

Oh, btw, regarding the model matrix: I realized that I need to rotate the sprite vertices on the CPU anyway, because I need to calculate the AABB. So I will probably not even try to pass the matrix as a vertex attribute because of that. :frowning:

Another option is to use transform feedback mode to capture the transformed vertices. But the likely pipeline stall from copying the data back to CPU memory may well make this approach slower overall than transforming the vertices on the CPU.

Actually, I have been thinking of a more elegant solution.

I might generate the AABB of the vertices without any rotation, then rotate the AABB (so it will no longer be an AABB, but an oriented box), and then calculate a new AABB over the rotated box.

I think that will work, and I can still experiment with sending the matrix in the vertex array.
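That rotate-then-rebound idea can be sketched as follows (a minimal sketch with illustrative names; it rotates the box's four corners about its centre and takes the bounds of the result):

```cpp
#include <cmath>
#include <algorithm>

struct AABB { float minX, minY, maxX, maxY; };

// Rotate the four corners of an unrotated AABB by `angle` (radians)
// around its centre, then take the bounds of the rotated corners.
AABB rotatedBounds(const AABB& box, float angle) {
    float cx = 0.5f * (box.minX + box.maxX);
    float cy = 0.5f * (box.minY + box.maxY);
    float c = std::cos(angle), s = std::sin(angle);
    float xs[4] = {box.minX, box.maxX, box.maxX, box.minX};
    float ys[4] = {box.minY, box.minY, box.maxY, box.maxY};
    AABB out = {1e30f, 1e30f, -1e30f, -1e30f};
    for (int i = 0; i < 4; ++i) {
        float dx = xs[i] - cx, dy = ys[i] - cy;   // corner relative to centre
        float rx = cx + c * dx - s * dy;          // rotated corner
        float ry = cy + s * dx + c * dy;
        out.minX = std::min(out.minX, rx); out.maxX = std::max(out.maxX, rx);
        out.minY = std::min(out.minY, ry); out.maxY = std::max(out.maxY, ry);
    }
    return out;
}
```

Note this rotates four corners either way, so for a quad sprite it is the same amount of work as just rotating the four vertices directly.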


Rotating an AABB and deriving a new one is going to be considerably more work than transforming 4 verts for a sprite, although as a trade-off it would let you keep the vertex data static.

On the other hand, if you’re sending a matrix to the GPU per sprite, that’s 16 floats, whereas a full set of positions for a sprite is just 12 floats.

Then, depending on what you’re using your bounding box for, you might find that a bounding sphere is more efficient.

So there’s no easy answer to this; all I can say is that I’d encourage you to experiment with a few different approaches, but once you get a good enough level of performance, call it done.

If the vertices are vec3’s rather than vec4’s, then the matrix would only be 3x4 (mat4x3, as GLSL terminology has the rows and columns flipped). I suspect that it may be even simpler than that (“sprite” generally implies a plane perpendicular to the Z axis, which would imply 7 values: a mat2x2 and a vec3).