Hi,
I am putting together a sprite rendering engine using C++, SDL2 and OpenGL 3.3. I want to present the outline of my drawing algorithm to get feedback, in case I am doing something incredibly stupid.
Each instance in the engine has a Draw event. So far, all this does is draw an untransformed textured quad (2 triangles). Since this is a rendering engine, all instance sprite info could potentially change every frame (position, angle, scale etc), so all instance data has to be processed/changed in the VBO each frame. Each Draw event leaves some basic info in a “draw command buffer” (a simple byte array).
Then a “draw key” is generated for each instance. The info contained in this key, in order, is: Instance depth, texture ID, index in the “draw command buffer”. These keys are then sorted.
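To make the key layout concrete, here is a minimal sketch of one way such a key could be packed into a single 64-bit integer so that a plain integer sort orders by depth first, then texture ID, then command-buffer index. The field widths (24/16/24 bits), the unsigned depth, and all names (DrawKey, makeDrawKey, sortDrawKeys and so on) are assumptions for illustration, not the engine's actual layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical field widths; together they fill one 64-bit key.
constexpr int kDepthBits   = 24;
constexpr int kTextureBits = 16;
constexpr int kIndexBits   = 24;

using DrawKey = std::uint64_t;

// Depth goes in the most significant bits so it dominates the sort,
// followed by the texture ID, then the draw command buffer index.
inline DrawKey makeDrawKey(std::uint32_t depth, std::uint32_t texture,
                           std::uint32_t cmdIndex) {
    assert(depth    < (1u << kDepthBits));
    assert(texture  < (1u << kTextureBits));
    assert(cmdIndex < (1u << kIndexBits));
    return (DrawKey(depth)   << (kTextureBits + kIndexBits)) |
           (DrawKey(texture) << kIndexBits) |
            DrawKey(cmdIndex);
}

inline std::uint32_t keyDepth(DrawKey k) {
    return std::uint32_t(k >> (kTextureBits + kIndexBits));
}

inline std::uint32_t keyTexture(DrawKey k) {
    return std::uint32_t((k >> kIndexBits) & ((1u << kTextureBits) - 1));
}

inline std::uint32_t keyCommandIndex(DrawKey k) {
    return std::uint32_t(k & ((1u << kIndexBits) - 1));
}

// Ascending integer sort: lowest depth first (an assumption about draw order).
inline void sortDrawKeys(std::vector<DrawKey>& keys) {
    std::sort(keys.begin(), keys.end());
}
```

With this layout, quads at the same depth that share a texture end up adjacent after sorting, which is what lets them go into one batch.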
So far, each quad is 64 bytes (that will increase in the future). Each draw batch can eat up a maximum of 65536 bytes (an arbitrary limit that I may change). A VBO is created with three times that size, essentially 3 subsections. An index buffer is also created, using unsigned shorts as indices. There is a “section” variable, denoting which one of the 3 VBO subsections we are in.
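The numbers above work out as follows; this is a small sketch of the arithmetic plus one way the static index buffer contents could be generated. It assumes each quad is stored as 4 unique vertices of 16 bytes each and uses the index pattern 0,1,2 / 2,3,0; neither detail is stated above, and the constant and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kQuadBytes       = 64;     // per-quad size from the post
constexpr std::size_t kSectionBytes    = 65536;  // per-batch limit from the post
constexpr std::size_t kSections        = 3;
constexpr std::size_t kVboBytes        = kSectionBytes * kSections;   // 196608
constexpr std::size_t kQuadsPerSection = kSectionBytes / kQuadBytes;  // 1024
constexpr std::size_t kVertsPerQuad    = 4;  // assumption: 4 vertices of 16 bytes
constexpr std::size_t kIndicesPerQuad  = 6;  // two triangles

// 1024 quads * 4 vertices = 4096 vertices, well inside the unsigned short range.
static_assert(kQuadsPerSection * kVertsPerQuad <= 65536,
              "vertex indices must fit in an unsigned short");

// Builds indices for quadCount quads; the result could be uploaded once
// as the element array buffer and reused for every batch.
std::vector<std::uint16_t> buildQuadIndices(std::size_t quadCount) {
    assert(quadCount * kVertsPerQuad <= 65536);
    std::vector<std::uint16_t> indices;
    indices.reserve(quadCount * kIndicesPerQuad);
    for (std::size_t q = 0; q < quadCount; ++q) {
        const std::size_t base = q * kVertsPerQuad;
        // Assumed vertex order: triangles (0,1,2) and (2,3,0).
        const std::size_t pattern[kIndicesPerQuad] = {0, 1, 2, 2, 3, 0};
        for (std::size_t offset : pattern) {
            indices.push_back(static_cast<std::uint16_t>(base + offset));
        }
    }
    return indices;
}
```

So one subsection holds at most 1024 quads, and a full batch draws 6144 indices.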
So what happens is the following:
- Map the VBO using glMapBufferRange() with GL_MAP_WRITE_BIT | GL_MAP_UNSYNCHRONIZED_BIT | GL_MAP_INVALIDATE_RANGE_BIT flags, get the correct VBO subsection mapping.
- For each (now sorted) draw key, find its index in the draw command buffer to get quad info.
- If adding that quad to the VBO exceeds the subsection size, or if a different texture than the last one is used, “close” that batch (ie draw it using glDrawElements). Increment section counter (go to the next VBO subsection), and set new texture.
- If we are beyond the end of the VBO, orphan it using buffer respec, set section back to zero. Map that VBO subsection (same flags as above), also setting vertex attribute pointers accordingly.
- Write quad info in the VBO.
- Do that until the end, do a last draw if needed.
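The splitting logic in the steps above can be sketched on the CPU alone, which makes it easy to count how many batches and orphans a given frame produces. This is a simplified model rather than the actual renderer: the GL calls appear only as comments, the input is just the texture ID of each quad in sorted order, and Batch, BatchPlan and planBatches are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Batch {
    std::uint32_t texture;    // texture bound for this batch
    std::size_t   section;    // VBO subsection the batch was written to
    std::size_t   quadCount;  // quads drawn by this batch
};

struct BatchPlan {
    std::vector<Batch> batches;
    std::size_t orphanCount;  // how many times the VBO would be orphaned
};

// sortedTextures: texture ID of each quad, in sorted draw-key order.
BatchPlan planBatches(const std::vector<std::uint32_t>& sortedTextures,
                      std::size_t quadsPerSection, std::size_t sectionCount) {
    assert(quadsPerSection > 0 && sectionCount > 0);
    BatchPlan plan{{}, 0};
    if (sortedTextures.empty()) return plan;

    std::size_t section = 0;
    Batch current{sortedTextures[0], section, 0};
    for (std::uint32_t tex : sortedTextures) {
        const bool full           = current.quadCount == quadsPerSection;
        const bool textureChanged = tex != current.texture;
        if (full || textureChanged) {
            // Real code: unmap, then glDrawElements for the closed batch.
            plan.batches.push_back(current);
            ++section;
            if (section == sectionCount) {
                // Real code: orphan the VBO (buffer respecification).
                section = 0;
                ++plan.orphanCount;
            }
            // Real code: map the new subsection, bind the new texture.
            current = Batch{tex, section, 0};
        }
        // Real code: write this quad's vertices into the mapped range.
        ++current.quadCount;
    }
    plan.batches.push_back(current);  // the final draw
    return plan;
}
```

Feeding a frame's sorted texture sequence through something like this shows how the batch count scales. With 500 depth values and 2 texture pages, the texture can change up to 999 times in the sorted order, so a 30K-sprite frame could end up with on the order of 1000 batches, each with its own map, unmap and draw call.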
The algorithm works. I am benchmarking draw speed by calling glFinish() before each timestamp, one taken before drawing and one after. Switching textures is obviously a pain, but there are ways to mitigate this (bindless textures etc). Also, having an arbitrary depth range can produce lots of batches!
My issue is that rendering 30K sprites with 500 different depth values and only 2 different texture pages in this way eats up half the frame time (and that is with a good GPU: I am using my laptop’s Nvidia RTX 2060). Is this to be expected? Am I being very stupid with something, or is it simply the cost of changing sprite info every frame? I am trying to put things into perspective here; I don’t know if I am close to “the norm” or very far away. Maybe some other user can draw 100K sprites being updated each frame and I am way behind. I really don’t know where I stand in this, hence this post.
I am trying to use a more “legacy” way of drawing sprites in order to be more compatible, and I have also read that instancing is not a very good solution for rendering simple geometry like sprites.
Let me know your thoughts on this. Do I have a good basic idea in my hands, or should I redesign my drawing algorithm somehow?