glDrawArrays faster than glDrawElements

I’m rendering a number of quads (each as 2 triangles). The quads don’t line up with each other, so the only vertices the triangles share are the 2 on the diagonal edge inside each quad.

The naive approach, a VBO drawn with glDrawArrays, gives me 23XX fps, whereas using an index buffer with glDrawElements gives me 22XX fps. I was expecting glDrawElements to be slightly faster, not slightly slower, because with indexing each quad needs only 4 unique vertices (plus 6 small indices), whereas without it each quad needs 6 full vertices, with no reuse of previous vertices.

I realize that this is quite a fringe application (drawing quads that don’t share any edges with each other), but it’s still a surprising outcome to me. Could anyone please explain it, or point me to what I could be doing differently?

Code snippets:
//routine to assemble the VBO:

#ifdef USE_DRAW_ARRAYS
	int numVerticies = numQuads*6; //for DrawArrays: 6 vertices per quad, nothing shared
#else
	int numVerticies = numQuads*4; //for DrawElements: 4 unique vertices per quad
#endif

	std::vector<uint16_t> indices; //GL_UNSIGNED_SHORT: at most 65536 vertices (16384 quads) per buffer
	indices.reserve(numQuads*6); //6 indices per quad (2 triangles x 3)

	GLfloat* vertexBufferPositions = new GLfloat[numVerticies*2]; //2 coordinate values (x,y) per vertex
	GLfloat* vertexBufferUV = new GLfloat[numVerticies*2]; //2 texture coordinates (u,v) per vertex
	for(unsigned int i=0; i<numQuads; i++)
	{
#ifdef USE_DRAW_ARRAYS
		//VBO for DrawArrays: 6 vertices * 2 values = 12 entries per quad
		unsigned int vertexIndex = i*12;
		unsigned int uvIndex = i*12;

		//first triangle of quad
		//top left vertex
		vertexBufferPositions[vertexIndex] = quadList[i].topLeftX;
		vertexBufferPositions[vertexIndex+1] = quadList[i].topLeftY;
		vertexBufferUV[uvIndex] = quadList[i].textureTopLeftX;
		vertexBufferUV[uvIndex+1] = quadList[i].textureTopLeftY;
		//bottom left vertex
		vertexBufferPositions[vertexIndex+2] = quadList[i].bottomLeftX;
		vertexBufferPositions[vertexIndex+2+1] = quadList[i].bottomLeftY;
		vertexBufferUV[uvIndex+2] = quadList[i].textureBottomLeftX;
		vertexBufferUV[uvIndex+2+1] = quadList[i].textureBottomLeftY;

		//bottom right vertex
		vertexBufferPositions[vertexIndex+4] = quadList[i].bottomRightX;
		vertexBufferPositions[vertexIndex+4+1] = quadList[i].bottomRightY;
		vertexBufferUV[uvIndex+4] = quadList[i].textureBottomRightX;
		vertexBufferUV[uvIndex+4+1] = quadList[i].textureBottomRightY;

		////////second triangle of quad

		//top left vertex (duplicated, DrawArrays cannot reuse it)
		vertexBufferPositions[vertexIndex+6] = quadList[i].topLeftX;
		vertexBufferPositions[vertexIndex+6+1] = quadList[i].topLeftY;
		vertexBufferUV[uvIndex+6] = quadList[i].textureTopLeftX;
		vertexBufferUV[uvIndex+6+1] = quadList[i].textureTopLeftY;

		//bottom right vertex (duplicated)
		vertexBufferPositions[vertexIndex+8] = quadList[i].bottomRightX;
		vertexBufferPositions[vertexIndex+8+1] = quadList[i].bottomRightY;
		vertexBufferUV[uvIndex+8] = quadList[i].textureBottomRightX;
		vertexBufferUV[uvIndex+8+1] = quadList[i].textureBottomRightY;

		//top right vertex
		vertexBufferPositions[vertexIndex+10] = quadList[i].topRightX;
		vertexBufferPositions[vertexIndex+10+1] = quadList[i].topRightY;
		vertexBufferUV[uvIndex+10] = quadList[i].textureTopRightX;
		vertexBufferUV[uvIndex+10+1] = quadList[i].textureTopRightY;
#else
		//VBO and IBO for DrawElements
		unsigned int vertexIndex = i*8; //4*2 -> x,y per vertex
		unsigned int uvIndex = i*8; //4*2 -> u,v per vertex
		unsigned int indexOffset = i*4; //index of this quad's first vertex

		////first triangle of quad

		//bottom left vertex (index 0)
		vertexBufferPositions[vertexIndex] = quadList[i].bottomLeftX;
		vertexBufferPositions[vertexIndex+1] = quadList[i].bottomLeftY;
		vertexBufferUV[uvIndex] = quadList[i].textureBottomLeftX;
		vertexBufferUV[uvIndex+1] = quadList[i].textureBottomLeftY;
		indices.push_back(0+indexOffset);

		//top left vertex (index 1)
		vertexBufferPositions[vertexIndex+2] = quadList[i].topLeftX;
		vertexBufferPositions[vertexIndex+2+1] = quadList[i].topLeftY;
		vertexBufferUV[uvIndex+2] = quadList[i].textureTopLeftX;
		vertexBufferUV[uvIndex+2+1] = quadList[i].textureTopLeftY;
		indices.push_back(1+indexOffset);

		//bottom right vertex (index 2)
		vertexBufferPositions[vertexIndex+4] = quadList[i].bottomRightX;
		vertexBufferPositions[vertexIndex+4+1] = quadList[i].bottomRightY;
		vertexBufferUV[uvIndex+4] = quadList[i].textureBottomRightX;
		vertexBufferUV[uvIndex+4+1] = quadList[i].textureBottomRightY;
		indices.push_back(2+indexOffset);

		////second triangle of quad

		//bottom right vertex
		indices.push_back(2+indexOffset); //since we already have this vertex, we only store the index

		//top left vertex
		indices.push_back(1+indexOffset); //since we already have this vertex, we only store the index

		//top right vertex (index 3)
		vertexBufferPositions[vertexIndex+6] = quadList[i].topRightX;
		vertexBufferPositions[vertexIndex+6+1] = quadList[i].topRightY;
		vertexBufferUV[uvIndex+6] = quadList[i].textureTopRightX;
		vertexBufferUV[uvIndex+6+1] = quadList[i].textureTopRightY;
		indices.push_back(3+indexOffset);
#endif
	}
	glBindVertexArray(m_vertexArrayID); // Bind our Vertex Array Object so we can use it

	glGenBuffers(1, &m_vertexbuffer);
	glBindBuffer(GL_ARRAY_BUFFER, m_vertexbuffer);
	glBufferData(GL_ARRAY_BUFFER, numVerticies*2*sizeof(GLfloat), vertexBufferPositions, GL_DYNAMIC_DRAW); // Give our vertices to OpenGL.

	glGenBuffers(1, &m_uvBuffer);
	glBindBuffer(GL_ARRAY_BUFFER, m_uvBuffer);
	glBufferData(GL_ARRAY_BUFFER, numVerticies*2*sizeof(GLfloat), vertexBufferUV, GL_DYNAMIC_DRAW); // Give our uv's to OpenGL.

	m_numVerticies = numVerticies; // vertex count for glDrawArrays
	m_numIndices = static_cast<GLsizei>(indices.size()); // index count for glDrawElements

	glGenBuffers(1, &m_elementBuffer); // Generate a buffer for the indices
	glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, m_elementBuffer);
	glBufferData(GL_ELEMENT_ARRAY_BUFFER, m_numIndices*sizeof(uint16_t), indices.data(), GL_DYNAMIC_DRAW); // Give our indices to OpenGL.

	// glBufferData copies the data, so the client-side arrays can be freed now
	delete[] vertexBufferPositions;
	delete[] vertexBufferUV;

//routine for rendering

//bind the texture
	glBindVertexArray(m_vertexArrayID); // Bind our Vertex Array Object so we can use it

	glEnableVertexAttribArray(0);
	glBindBuffer(GL_ARRAY_BUFFER, m_vertexbuffer); // This will talk about our 'vertexbuffer' buffer
	glVertexAttribPointer(
		0,					// attribute 0. No particular reason for 0, but must match the layout in the shader.
		2,					// size: x,y
		GL_FLOAT,			// type
		GL_FALSE,			// normalized?
		0,					// stride, can be 0 for tightly packed array, or user specified: 2*sizeof(GLfloat)
		(void*)0			// array buffer offset
	);

	glEnableVertexAttribArray(1);
	glBindBuffer(GL_ARRAY_BUFFER, m_uvBuffer); // This will talk about our 'uvBuffer' buffer
	glVertexAttribPointer(
		1,					// attribute 1. No particular reason for 1, but must match the layout in the shader.
		2,					// size: u,v
		GL_FLOAT,			// type
		GL_FALSE,			// normalized?
		0,					// stride, can be 0 for tightly packed array, or user specified: 2*sizeof(GLfloat)
		(void*)0			// array buffer offset
	);

	glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, m_elementBuffer);

	//and more setup stuff for the shader..

	// Draw the thing
	//glDrawArrays(GL_TRIANGLES, 0, m_numVerticies); // Draw m_numVerticies vertices, starting at vertex 0
	glDrawElements(GL_TRIANGLES, m_numIndices, GL_UNSIGNED_SHORT, (void*)0); // last parameter is the offset into the element array buffer

	glBindBuffer(GL_ARRAY_BUFFER, 0);
	glBindVertexArray(0); // Unbind our Vertex Array Object
	//unbind the texture
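For reference, the DrawElements variant needs six indices per quad even though it stores only four vertices. A minimal sketch of generating that index list in one place, assuming the vertex order used in the loop above (bottom left = 0, top left = 1, bottom right = 2, top right = 3). `buildQuadIndices` is a hypothetical helper, not part of the original code; it also enforces the 16-bit limit implied by GL_UNSIGNED_SHORT.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not from the original code): builds the index list for
// numQuads independent quads, each stored as 4 consecutive vertices in the order
// bottom left (0), top left (1), bottom right (2), top right (3).
std::vector<std::uint16_t> buildQuadIndices(unsigned int numQuads)
{
	// GL_UNSIGNED_SHORT indices can address at most 65536 vertices,
	// i.e. 16384 quads of 4 vertices each.
	if (numQuads > 16384u)
		throw std::length_error("too many quads for 16-bit indices");

	// Two triangles per quad, sharing the bottom-right/top-left diagonal.
	static const std::uint16_t pattern[6] = {
		0, 1, 2,  // first triangle: bottom left, top left, bottom right
		2, 1, 3   // second triangle: bottom right, top left, top right
	};

	std::vector<std::uint16_t> indices;
	indices.reserve(static_cast<std::size_t>(numQuads) * 6);
	for (unsigned int i = 0; i < numQuads; i++)
		for (std::uint16_t offset : pattern)
			indices.push_back(static_cast<std::uint16_t>(i * 4u + offset));
	return indices;
}
```

The returned vector can be uploaded with glBufferData on GL_ELEMENT_ARRAY_BUFFER as in the code above, with its size() used as the count passed to glDrawElements.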

First, I’d suggest not resetting the VAO each time. Your drawing routine should be at most

	//and more setup stuff for the shader..
	glDrawElements(GL_TRIANGLES, m_numIndices, GL_UNSIGNED_SHORT, (void*)0);

Setting the VAO state only needs to be done in the initialisation function. That’s the point of VAOs. Some of the “more setup stuff for the shader” may also be redundant; e.g. default-block uniforms are stored in the program object and don’t need to be set each time if they don’t change.

Other than that, when the FPS figure is in the thousands, rendering time is likely to be dominated by fixed overheads. Even a small additional overhead will outweigh any per-vertex or per-primitive costs. Try increasing the number of quads 10x or 100x and compare glDrawArrays versus glDrawElements.
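FPS is not linear in cost, so converting the readings to frame time makes the size of the difference clearer. A small sketch, assuming 2300 and 2200 fps purely as stand-ins for the 23XX/22XX readings in the question (the exact digits weren't given):

```cpp
// Convert a frames-per-second reading into milliseconds per frame.
double frameTimeMs(double fps)
{
	return 1000.0 / fps;
}

// Extra time per frame, in microseconds, that variant B costs relative to variant A.
double extraMicrosecondsPerFrame(double fpsA, double fpsB)
{
	return (frameTimeMs(fpsB) - frameTimeMs(fpsA)) * 1000.0;
}
```

With the illustrative values, 1000/2300 ≈ 0.4348 ms and 1000/2200 ≈ 0.4545 ms per frame, so extraMicrosecondsPerFrame(2300.0, 2200.0) comes out at roughly 20 microseconds. A difference that small says little about per-vertex cost while fixed per-frame and per-draw overheads dominate.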

Ultimately, there’s no point in benchmarking trivial test cases. Benchmarks are only meaningful on “real” code or something which is very close to it. E.g. are you actually going to be drawing the same quads with the same positions and texture coordinates each frame? If you’re eventually going to be changing the vertex positions, then the cost of updating 1.5x as many vertices has to be taken into account (and again, if you only have a dozen quads, fixed overheads are likely to outweigh per-vertex overheads).

Thank you for your response, GClements.

To give you a bit more background: the quads contain text from a font texture. They will be drawn in the same position each frame and thus won’t change unless the text changes, which is very rare. Also, the drawing snippet doesn’t show the entire render loop, only the rendering of one of the strings (they are not yet batched together).

To your point about VAO states: I had no luck with removing the

glBindBuffer(GL_ARRAY_BUFFER, ..

calls from the render loop, even when I moved them to the VAO assembly function. Could you please elaborate on how one could get away without setting these each frame?
Your comment did help me figure out that I don’t need all of the functions currently in my cleanup code, though.

To your point about benchmarking: I’ve just now tried increasing the number of quads (strings) to be rendered by a factor of 1000. I’m now at roughly 1600fps with the same percentage of drop compared to glDrawArrays(…). I could try more, but a factor of 1000x is already a lot and more than the typical use case for this code.

I’m guessing that the overhead of creating and using indices outweighs the benefit when the vertex count is only reduced to 2/3 (4 vertices instead of 6 per quad), unlike in a typical mesh, where vertices are shared far more often.
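A back-of-the-envelope comparison of the buffer data each variant supplies per quad, assuming the layout in the question (two floats of position plus two floats of UV per vertex, 16-bit indices). This counts buffer bytes only; it does not model vertex-shader invocations, caching, or driver overhead.

```cpp
#include <cstddef>
#include <cstdint>

static_assert(sizeof(float) == 4, "this arithmetic assumes 4-byte floats (GLfloat is 32 bits)");

// Per-vertex data in the question: x,y position + u,v texture coordinate.
constexpr std::size_t kBytesPerVertex = 4 * sizeof(float); // 16 bytes

// glDrawArrays: 6 unshared vertices per quad, no index buffer.
constexpr std::size_t drawArraysBytesPerQuad()
{
	return 6 * kBytesPerVertex; // 96 bytes
}

// glDrawElements: 4 unique vertices plus 6 GL_UNSIGNED_SHORT indices per quad.
constexpr std::size_t drawElementsBytesPerQuad()
{
	return 4 * kBytesPerVertex + 6 * sizeof(std::uint16_t); // 64 + 12 = 76 bytes
}
```

That is 76 versus 96 bytes per quad, a saving of about 21%, noticeably less than the one-third reduction in vertex count, because the indices themselves take up space.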

The init code needs the calls to


The draw code shouldn’t have any of those except for glBindVertexArray. All state relating to attribute arrays (plus the element buffer binding) is stored in a VAO. Once you’ve set the VAO state, there’s no need to change it. The draw code just binds the VAO (then optionally unbinds it when it’s done). Just be sure not to change any of it; there shouldn’t be any calls to glDisableVertexAttribArray or
glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, 0) anywhere. Unbinding GL_ARRAY_BUFFER doesn’t matter, as the current binding isn’t stored in the VAO (each glVertexAttribPointer call stores the buffer which was bound at the time of the call).

I’ve tried numerous combinations of moving these function calls out of the render loop and keeping them only in the init code, without luck: my objects don’t show up on the screen except for one frame. Maybe this method would work if there were nothing else to render in the scene, but for me it doesn’t work like that. And I’m not alone; for instance, these two popular tutorials also have these gl calls in their render loop:

I think they are vital.

In any case, thanks for pointing out that my cleanup was overkill for my render loop.

So the only open question from my point of view is why the overhead of indexing eats up the benefit of saving a third of the vertices. If someone could explain that, that’d be great!

They aren’t. The whole point of VAOs is so that you can switch between different “meshes” with a single glBindVertexArray call. If the number of attributes is large (as of OpenGL 4.6, implementations are required to support at least 16 attributes), setting up the attribute arrays from scratch is a significant overhead.

Even without VAOs, there’s no actual reason for the tutorial to set up and reset the state each frame, as it’s only drawing one mesh.

Songho gives very good information. However, it’s a very old site. The page you linked is about VBOs and was most probably written before VAOs existed.

You can for example base your code on this tutorial.