Batch instanced and indexed draw calls

Hi! I have some issues with batching my sprites.

I created two VBOs: one that stores the UVs (2 floats per vertex) followed by the positions (2 floats per instance), and another one that stores the sizes (2 floats per instance). Positions and UVs are expected to change a lot because the sprites are moving.

I also have one VBO that holds an indexed rectangle (8 floats).

These are my vertex attribute pointers for, respectively, the rectangle shape, the UVs, the positions, and the sizes:

    glVertexAttribPointer(vertexLoc, 2, GL_FLOAT, GL_FALSE, 0, 0); //rect

    glVertexAttribPointer(uvLoc, 2, GL_FLOAT, GL_FALSE, 0, 0); //uv

    glVertexAttribPointer(posLoc, 2, GL_FLOAT, GL_FALSE, 0, (void*)(uv_offset*sizeof(GLfloat)));// pos_x pos_y

    glVertexAttribPointer(sizeLoc, 2, GL_FLOAT, GL_FALSE, 0, 0); // size_x size_y

Then I also call glVertexAttribDivisor with these divisors:

  1. 0 for UV because I use it for each vertex
  2. 1 for pos and size

What I want to do is to draw all the sprites that have the same texture in one draw call (indexed and instanced).

Instanced because I want to draw the rectangle several times and then move and scale the rectangle in the shader.

So it looks like this:

    put data on GPU
    define pointers
    for each batch of sprites sharing a texture:
        bind texture
        indexed and instanced draw of all sprites that have that texture

I use this function to draw (batches[i].numVertices/numVerticesPerInstance is the number of sprites that have texture i):

    glDrawElementsInstanced(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, nullptr, batches[i].numVertices/numVerticesPerInstance);

Should I use another draw function?

With 1 sprite I get a rectangle of the correct size, but the position doesn’t work (it stays at (0,0)) and the UVs are strange. I cannot post 2 images, but the result is symmetric about the bottom-left to top-right diagonal: the top-left part seems to be stretched horizontally (the first column of pixels on the left is stretched) and the other part is stretched vertically.

With 2 sprites that share the same texture I get the same result for both, even if I change the size of the second one. Even gl_InstanceID doesn’t work in the shader (I tried to change the position with it).

I am sure that the arrays contain what I want in the right order, and the last parameter of the draw function is 2 in that case, so the problem may be with glVertexAttribPointer, glVertexAttribDivisor or the draw function. Thank you for your help!

Are you trying to give every sprite its own distinct UVs? You can’t do that with instancing. If you have M instances and N vertices per instance, you get M*N vertices in total. Each attribute array needs the data for either N vertices (if the divisor is zero) or for M instances (more precisely, M/D instances where D is the divisor). Any given attribute has the same value either for all vertices within an instance or for the same vertex in all instances.
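To make that rule concrete, here is a minimal CPU-side sketch of which array element an attribute reads for a given vertex of a given instance. `attributeElement` is a hypothetical helper written only for illustration (it is not an OpenGL function), and it ignores base vertex and base instance.

```cpp
#include <cstddef>

// Index of the array element that a vertex attribute reads for one vertex
// of one instance (base vertex and base instance are ignored here).
//   divisor == 0: per-vertex attribute, indexed by the vertex index
//   divisor  > 0: per-instance attribute, advances every `divisor` instances
std::size_t attributeElement(unsigned divisor, std::size_t vertexIndex, std::size_t instance)
{
    return divisor == 0 ? vertexIndex : instance / divisor;
}
```

With a divisor of 0, vertex 3 of instance 5 reads element 3, exactly like vertex 3 of instance 0, which is why per-vertex UVs cannot differ from one sprite to the next.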

So for instanced sprites, the “rectangle” (per-vertex data) would typically have both positions and UVs, while the per-instance data would have position offset, UV offset and size. And possibly also UV size, if this isn’t either constant (i.e. all sprites are the same size in texture space) or the same as the size (if all sprites use the same scale factor).


Oh OK, thank you. I thought that with 0 as the divisor it would behave as usual, like without instancing… So does having the UVs and the positions in the same VBO break the positions? And if I change the UVs (8 floats) to UV offsets (2 floats) sent per instance, i.e. if I keep the same kind of VBO (uv_offset + pos), will that solve not only the UV issue, as you said, but also the position one?
I’ll upload the image of the UV issue for people who may come across this post:


Thank you again.

You can. But consider that you can render sprites for multiple textures in one draw call by using either Bindless Texture or Texture Arrays.

(You could do so with texture atlases too, but that comes with some big disadvantages.)


Well, a divisor of 0 does behave as usual. But without instancing, you’re only drawing 4 vertices, so 4 positions and 4 UVs. If you then render multiple instances, those values are re-used for every instance. Per-instance values change from one instance to the next but are the same for all vertices within an instance.

How the data is distributed across VBOs doesn’t matter; you could put everything in one VBO or every attribute in a different VBO.

A typical approach to rendering sprites with instancing is to have a single per-vertex (divisor=0) attribute which is the unit square: [(0,0), (0,1), (1,0), (1,1)] and make everything else per-instance (divisor=1).
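As a rough illustration of that layout, the sketch below computes on the CPU what a vertex shader would compute for one corner of one sprite. The struct and function names (`SpriteInstance`, `expandCorner`, and so on) are made up for this example, and the per-instance fields are an assumption based on the attributes discussed in this thread.

```cpp
#include <array>

struct Vec2 { float x, y; };

// Per-vertex data (divisor 0): the unit square, shared by every sprite.
constexpr std::array<Vec2, 4> kUnitSquare{{{0.f, 0.f}, {0.f, 1.f}, {1.f, 0.f}, {1.f, 1.f}}};

// Per-instance data (divisor 1): one entry per sprite.
struct SpriteInstance {
    Vec2 pos;      // position of the sprite's (0,0) corner
    Vec2 size;     // width and height
    Vec2 uvOffset; // texture coordinate of the (0,0) corner
    Vec2 uvScale;  // extent of the sprite in texture space
};

struct Vertex { Vec2 position, uv; };

// What the vertex shader would compute for one corner of one sprite.
Vertex expandCorner(const SpriteInstance& s, int corner)
{
    const Vec2 c = kUnitSquare[corner];
    return { { s.pos.x + c.x * s.size.x, s.pos.y + c.y * s.size.y },
             { s.uvOffset.x + c.x * s.uvScale.x, s.uvOffset.y + c.y * s.uvScale.y } };
}
```

The same unit-square corner drives both the position and the UV, so the only per-vertex attribute is the corner itself.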

But note that instancing isn’t particularly efficient with such small instances. Unless the implementation batches multiple instances into a single invocation group (warp, wavefront), vertex processing will only use 4 cores out of (typically) 32 or 64. So you might be better off using non-instanced rendering, and using dependent fetches (UBO, SSBO or buffer texture, indexed using gl_VertexID/4) to avoid duplicating per-sprite data.


@Dark_Photon Well, I wanted to use texture atlases and batch my sprites by atlas in case I have several atlases, but I’ll look at bindless textures once I’ve done this, thank you.

@GClements This is just for training purposes, but I’ll keep in mind what you said :slight_smile:.
I’ve done what you told me to do and it almost worked.
The rectangle and the UVs work well (2x float[8]).
Then I have 2 VBOs for per-instance data:

  • one that contains uv_offset and vertices_offset (vboOffset)

  • another one with uv_scale and vertices_scale (vboScale)

I redefined my pointers like this:

    glBindBuffer(GL_ARRAY_BUFFER, _vboOffset);

    // UV offset: 2 floats per instance (i.e. per 4 vertices)
    glEnableVertexAttribArray(uvOffsetLoc);
    glVertexAttribPointer(uvOffsetLoc, 2, GL_FLOAT, GL_FALSE, 2 * sizeof(GLfloat), 0);
    glVertexAttribDivisor(uvOffsetLoc, 1); // advance once per instance

    // Position: 2 floats per instance, stored after the UV offsets
    glEnableVertexAttribArray(posLoc);
    glVertexAttribPointer(posLoc, 2, GL_FLOAT, GL_FALSE, 2 * sizeof(GLfloat), (void*)(uv_offset * sizeof(GLfloat)));
    glVertexAttribDivisor(posLoc, 1); // advance once per instance

    glBindBuffer(GL_ARRAY_BUFFER, _vboScale);

    // UV scale: 2 floats per instance
    glEnableVertexAttribArray(uvScaleLoc);
    glVertexAttribPointer(uvScaleLoc, 2, GL_FLOAT, GL_FALSE, 2 * sizeof(GLfloat), 0);
    glVertexAttribDivisor(uvScaleLoc, 1); // advance once per instance

    // Size: 2 floats per instance, stored after the UV scales
    glEnableVertexAttribArray(sizeLoc);
    glVertexAttribPointer(sizeLoc, 2, GL_FLOAT, GL_FALSE, 2 * sizeof(GLfloat), (void*)(uv_offset * sizeof(GLfloat)));
    glVertexAttribDivisor(sizeLoc, 1); // advance once per instance

The UV offset and UV scale (the attributes with no offset in glVertexAttribPointer) are correctly sent to the shader, but the position and size (the ones with an offset) are just equal to 0.
Also, what is the best practice: storing the data like this {vx1, vy1, nx1, ny1, vx2, vy2, nx2, ny2, …} or like this {vx1, vy1, vx2, vy2, …, nx1, ny1, nx2, ny2, …}? Thank you!

Is uv_offset double the number of sprites? Did you use the correct offset when you copied the data to the buffer?

It depends. The first one (interleaved) may be more efficient in terms of sending attribute data to the vertex shader. The latter (one contiguous block per attribute) may be more efficient for updating the data from the CPU if you only need to update one of the attributes.
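For reference, the two layouts differ only in the stride and the starting offset that would be passed to glVertexAttribPointer. The sketch below works those out for attributes of 2 floats each; `AttribLayout`, `interleaved` and `planar` are hypothetical names used only for this illustration.

```cpp
#include <cstddef>

// Where element `i` of one attribute lives, as glVertexAttribPointer sees it.
struct AttribLayout {
    std::size_t stride; // bytes between consecutive elements
    std::size_t offset; // byte offset of element 0
    std::size_t byteOffsetOf(std::size_t i) const { return offset + i * stride; }
};

constexpr std::size_t kVec2Bytes = 2 * sizeof(float);

// Interleaved: {v0, n0, v1, n1, ...}; attribute `a` out of `attribCount` vec2 attributes.
AttribLayout interleaved(std::size_t a, std::size_t attribCount)
{
    return { attribCount * kVec2Bytes, a * kVec2Bytes };
}

// Planar: {v0, v1, ..., n0, n1, ...}; each attribute is one contiguous block.
AttribLayout planar(std::size_t a, std::size_t elementCount)
{
    return { kVec2Bytes, a * elementCount * kVec2Bytes };
}
```

In the planar case, updating a single attribute touches one contiguous byte range, so it can be done with a single glBufferSubData call.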


Is uv_offset double the number of sprites? Did you use the correct offset when you copied the data to the buffer?

Yes.

This is how I filled the VBOs; uvOffset_and_pos and uvScales_and_sizes contain the correct data according to debug mode:

    glBindBuffer(GL_ARRAY_BUFFER, _vboOffset);
    glBufferData(GL_ARRAY_BUFFER, sizeof(uvOffset_and_pos.data()), nullptr, GL_DYNAMIC_DRAW);
    glBufferSubData(GL_ARRAY_BUFFER, 0, sizeof(uvOffset_and_pos.data()), uvOffset_and_pos.data());

    glBindBuffer(GL_ARRAY_BUFFER, _vboScale);
    glBufferData(GL_ARRAY_BUFFER, sizeof(uvScales_and_sizes.data()), nullptr, GL_STATIC_DRAW);
    glBufferSubData(GL_ARRAY_BUFFER, 0, sizeof(uvScales_and_sizes.data()), uvScales_and_sizes.data());

It depends. The first one may be more efficient in terms of sending attribute data to the vertex shader. The latter may be more efficient for updating the data from the CPU if you only need to update one of the attributes.

So if the positions are updated at tick rate and the sprite UV offsets are updated at a constant but lower rate (about 10 times less often than the positions), the 2nd version is better, right?

Using sizeof like this won’t work; you’ll get the size of the pointer which .data() returns. You need to multiply .size() by the size of the element type.
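A small sketch of that computation, assuming the data lives in a std::vector (the thread does not show the container type, so that is a guess); `byteSize` is a hypothetical helper name:

```cpp
#include <cstddef>
#include <vector>

// Number of bytes occupied by the elements of a std::vector, i.e. the size
// to pass to glBufferData / glBufferSubData.
template <typename T>
std::size_t byteSize(const std::vector<T>& v)
{
    return v.size() * sizeof(T);
}
```

The upload would then look something like `glBufferData(GL_ARRAY_BUFFER, byteSize(uvOffset_and_pos), uvOffset_and_pos.data(), GL_DYNAMIC_DRAW);`, whereas `sizeof(uvOffset_and_pos.data())` is only the size of one pointer (typically 4 or 8 bytes).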

Yes.


Using sizeof like this won’t work; you’ll get the size of the pointer which .data() returns. You need to multiply .size() by the size of the element type.

Oops I didn’t see that one, thank you.

I also had to set up the pointers inside the for loop, before each draw (otherwise only the sprites with the first texture are drawn), like this:

    int ic = 0; // index of the first instance of the current batch
    for (size_t i = 0; i < batches.size(); i++)
    {
        const GLsizei instanceCount = batches[i].numVertices / numVerticesPerInstance;

        /* Per-instance data: re-point every attribute at the first instance of this batch. */

        glBindBuffer(GL_ARRAY_BUFFER, _vboOffset);

        // UV offset: 2 floats per instance
        glEnableVertexAttribArray(uvOffsetLoc);
        glVertexAttribPointer(uvOffsetLoc, 2, GL_FLOAT, GL_FALSE, 2 * sizeof(GLfloat), (void*)(2 * sizeof(GLfloat) * ic));
        glVertexAttribDivisor(uvOffsetLoc, 1); // advance once per instance

        // Position: 2 floats per instance, stored after the UV offsets
        glEnableVertexAttribArray(posLoc);
        glVertexAttribPointer(posLoc, 2, GL_FLOAT, GL_FALSE, 2 * sizeof(GLfloat), (void*)(uv_offset * sizeof(GLfloat) + 2 * sizeof(GLfloat) * ic));
        glVertexAttribDivisor(posLoc, 1); // advance once per instance

        glBindBuffer(GL_ARRAY_BUFFER, _vboScale);

        // UV scale: 2 floats per instance
        glEnableVertexAttribArray(uvScaleLoc);
        glVertexAttribPointer(uvScaleLoc, 2, GL_FLOAT, GL_FALSE, 2 * sizeof(GLfloat), (void*)(2 * sizeof(GLfloat) * ic));
        glVertexAttribDivisor(uvScaleLoc, 1); // advance once per instance

        // Size: 2 floats per instance, stored after the UV scales
        glEnableVertexAttribArray(sizeLoc);
        glVertexAttribPointer(sizeLoc, 2, GL_FLOAT, GL_FALSE, 2 * sizeof(GLfloat), (void*)(uv_offset * sizeof(GLfloat) + 2 * sizeof(GLfloat) * ic));
        glVertexAttribDivisor(sizeLoc, 1); // advance once per instance

        glBindTexture(GL_TEXTURE_2D, batches[i].texture);
        glDrawElementsInstanced(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, nullptr, instanceCount);

        ic += instanceCount;
    }

Is there a draw function that can do the following:

  • First draw call → the pointer advances 3 times, for example

  • Second draw call → we keep advancing the same pointer, i.e. we start at the 4th “iteration”

In other words, each draw call would use the same pointer, which keeps its state and is “advanced” after each draw call.
If so, I could move the pointer setup out of the for loop.

I’m not sure what you were doing wrong, but you should be able to draw all of the sprites with a single glDrawElementsInstanced call.


I haven’t implemented bindless texturing yet so I have to do as many draw calls as there are textures (or texture atlases). I bet it won’t be a problem if I do bindless texturing because there is only one draw call but this is not the case for the moment.

I see. In that case, you can draw a contiguous range of instances using glDrawElementsInstancedBaseInstance (requires OpenGL 4.2 or ARB_base_instance). You can draw multiple contiguous ranges using glMultiDrawElementsIndirect (4.3 or ARB_multi_draw_indirect).

All of these functions allow specification of a base instance, avoiding the need to call glVertexAttribPointer with an offset.
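As a sketch of how the per-texture batches from earlier in the thread could map onto base instances: a running sum of the batch sizes gives each batch its starting instance. `DrawRange` and `makeDrawRanges` are hypothetical names, and the sketch assumes the per-instance data is stored batch after batch.

```cpp
#include <vector>

// One instanced draw: `instanceCount` sprites starting at `baseInstance`
// in the per-instance arrays.
struct DrawRange {
    unsigned instanceCount;
    unsigned baseInstance;
};

// Turns per-batch sprite counts into contiguous ranges (a running sum),
// assuming the per-instance data is stored batch after batch.
std::vector<DrawRange> makeDrawRanges(const std::vector<unsigned>& spritesPerBatch)
{
    std::vector<DrawRange> ranges;
    ranges.reserve(spritesPerBatch.size());
    unsigned base = 0;
    for (unsigned count : spritesPerBatch) {
        ranges.push_back({count, base});
        base += count;
    }
    return ranges;
}

// Each range would then be drawn with something like:
//   glBindTexture(GL_TEXTURE_2D, texture);
//   glDrawElementsInstancedBaseInstance(GL_TRIANGLES, 6, GL_UNSIGNED_SHORT, nullptr,
//                                       range.instanceCount, range.baseInstance);
```

With this, the attribute pointers can be set once, outside the loop, and only the texture binding and the draw call remain per batch.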

If your textures/atlases have the same dimensions, format and sampling parameters (filter and wrap modes), you can combine them into a 2D array texture to avoid the need to split draw calls. Array textures require OpenGL 3.0 or EXT_texture_array. Bindless textures require the ARB_bindless_texture extension; they aren’t yet in any core version.


Thank you ! That was exactly the function I was searching for !
Now I’ll try to remove instancing and make array textures :slight_smile:.

Thank you very much for your precious help !

Hi,

I’m coming back to you guys because I have a problem again, this time with the implementation without instancing.

The thing is, I don’t want to store every rectangle in the VBO because I think it wastes memory, so I only send an SSBO with the offsets and scales for both vertices and UVs. That way I have 8 * instances_count floats (plus the ones for the primitive) on the GPU instead of 16 * instances_count floats.

Then in the shader I access them through gl_VertexID/4 because I’m using indexing. I also have VBOs that contain the rectangle primitive (8 floats) and the corresponding UVs (also for the primitive).

I haven’t implemented array textures yet…

    Bind VBO primitive and UV
    Bind SSBO
    Send uniform projectionMatrix
    Send uniform uint uv_offset   // for VBO offset
    Send uniform instanceCount    // else gl_VertexID is cyclic, so I use (8*instanceCount + gl_VertexID)/4

    for all batches: bind the corresponding texture and call glDrawElements

As you can see, I “fake” instancing with (8*instanceCount + gl_VertexID)/4, but if there are several sprites with the same texture, only the first one is drawn.

Basically, if I have only one texture and several sprites, I’ll have to call glDrawElements multiple times.

So you might be better off using non-instanced rendering, and using dependent fetches (UBO, SSBO or buffer texture, indexed using gl_VertexID/4 ) to avoid duplicating per-sprite data.

So am I forced to use as many rectangles as I need, but keep uv_offset and uv_scale in the SSBO, to achieve what you suggested, @GClements?

Just draw as many rectangles as you need with one draw call. With “fake instancing”, you don’t need any vertex attributes; you can obtain all of the data from arrays (UBO, SSBO or texture), with gl_VertexID/4 holding the rectangle index and gl_VertexID%4 holding the index of the vertex within the rectangle.
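One possible way to set this up on the CPU side is sketched below: an index buffer that covers every rectangle, plus the same /4 and %4 decomposition the vertex shader would apply to gl_VertexID. The function names are hypothetical, and 32-bit indices are used here (rather than the GL_UNSIGNED_SHORT indices from earlier in the thread) so that the sprite count is not limited by the index type.

```cpp
#include <cstdint>
#include <vector>

// Index buffer for `quadCount` rectangles of 4 vertices each, two triangles
// per rectangle: rectangle q uses vertex IDs 4*q .. 4*q + 3.
std::vector<std::uint32_t> makeQuadIndices(std::uint32_t quadCount)
{
    static const std::uint32_t pattern[6] = {0, 1, 2, 2, 1, 3};
    std::vector<std::uint32_t> indices;
    indices.reserve(quadCount * 6u);
    for (std::uint32_t q = 0; q < quadCount; ++q)
        for (std::uint32_t p : pattern)
            indices.push_back(q * 4u + p);
    return indices;
}

// The decomposition the vertex shader would perform on gl_VertexID.
struct QuadVertex { std::uint32_t sprite, corner; };

QuadVertex decompose(std::uint32_t vertexId)
{
    return { vertexId / 4u, vertexId % 4u };
}
```

With such an index buffer bound, all the sprites of a batch would go out in one call along the lines of `glDrawElements(GL_TRIANGLES, 6 * quadCount, GL_UNSIGNED_INT, nullptr)`, since in an indexed draw gl_VertexID takes the value read from the index buffer.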

Also: if space is short, don’t use floats. You can almost certainly get away with 16-bit (or even 8-bit) values.
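As an illustration of the 16-bit option, values known to lie in [0, 1] (UV coordinates, for instance) could be stored as unsigned normalized 16-bit integers. This is only a sketch; `packUnorm16` and `unpackUnorm16` are names chosen here, and the [0, 1] range is an assumption about the data.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Quantizes a value in [0, 1] (e.g. a UV coordinate) to an unsigned
// normalized 16-bit integer; values outside the range are clamped.
std::uint16_t packUnorm16(float v)
{
    const float clamped = std::min(1.0f, std::max(0.0f, v));
    return static_cast<std::uint16_t>(std::lround(clamped * 65535.0f));
}

// The inverse mapping: integer value divided by the largest representable value.
float unpackUnorm16(std::uint16_t u)
{
    return static_cast<float>(u) / 65535.0f;
}
```

This halves the storage per value compared to a 32-bit float, at the cost of a quantization step of 1/65535. In a shader, the GLSL built-in `unpackUnorm2x16` performs the equivalent conversion for two such values packed into one uint.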

OK, that’s what I thought. Thank you again!