Low FPS in OpenGL when rendering relatively big meshes

I am facing an issue where my FPS drops drastically (from 1200 fps to 80 fps) when I introduce the Sponza mesh into my scene. Moreover, my CPU and GPU usage are both low: around 20% (CPU) and 10% (GPU). The Sponza mesh that I am using is the Crytek version and has 145,193 vertices and 262,217 triangles. I think I am properly using a VAO and VBOs to upload the data to the GPU when the mesh is loaded during initialization. Here is a piece of code from my mesh class that runs once when the mesh is loaded:

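void Mesh::setupMesh() {
	glGenVertexArrays(1, &VAO);
	glGenBuffers(1, &VBO);
	glGenBuffers(1, &EBO);
	// Bind the VAO and buffers before uploading data and describing attributes.
	glBindVertexArray(VAO);
	glBindBuffer(GL_ARRAY_BUFFER, VBO);
	glBufferData(GL_ARRAY_BUFFER, vertices.size() * sizeof(Vertex), &vertices[0], GL_STATIC_DRAW);
	glBindBuffer(GL_ELEMENT_ARRAY_BUFFER, EBO);
	glBufferData(GL_ELEMENT_ARRAY_BUFFER, indices.size() * sizeof(unsigned int), &indices[0], GL_STATIC_DRAW);
	glEnableVertexAttribArray(0);
	glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex), (void*)0);
	if (hasNormals) {
		glEnableVertexAttribArray(1);
		glVertexAttribPointer(1, 3, GL_FLOAT, GL_FALSE, sizeof(Vertex), (void*)offsetof(Vertex, normals));
	}
	if (hasTexCoords) {
		glEnableVertexAttribArray(2);
		glVertexAttribPointer(2, 2, GL_FLOAT, GL_FALSE, sizeof(Vertex), (void*)offsetof(Vertex, texCoords));
	}
}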

And here is the part of my rendering code that is run every frame.

	for (; it != entities.end(); it++) {
		Entity* ent = (*it);

		// Skip entities that have nothing to draw.
		if (ent->mesh == nullptr) {
			continue;
		}

		Mesh* currentMesh = ent->mesh;

		Shader* currentShader = nullptr;
		if (overrideMaterial) {
			currentShader = overrideMaterial->getShader();
		}

		for (int i = 0; i < currentMesh->subMeshes.size(); i++) {

			SubMesh currentSubMesh = currentMesh->subMeshes[i];
			Shader* submeshShader = currentSubMesh.material->getShader();

			if (!currentShader && !overrideMaterial) {
				currentShader = submeshShader;
			}

			// compare() returns non-zero when the names differ.
			if (!overrideMaterial && submeshShader->shaderName.compare(currentShader->shaderName)) {
				currentShader = submeshShader;
			}

			currentShader->setMat4("modelMatrix", ent->getTransform()->getTransformationMatrix());

			unsigned int diffuseNr = 0;
			unsigned int specularNr = 0;

			for (int j = 0; j < currentSubMesh.material->textures.size(); j++) {
				Texture* currentTexture = currentSubMesh.material->textures[j];
				string name, number;
				if (currentTexture->type == TextureType::DIFFUSE) {
					name = "texture_diffuse";
					number = std::to_string(diffuseNr++);
				}
				else if (currentTexture->type == TextureType::SPECULAR) {
					name = "texture_specular";
					number = std::to_string(specularNr++);
				}

				currentTexture->bind(GL_TEXTURE0 + j);
				currentShader->setInt("material." + name + "[" + number + "]", j);
			}

			currentShader->setInt("material.specularCount", specularNr);
			currentShader->setInt("material.diffuseCount", diffuseNr);
			currentShader->setInt("shadowMap", 10);
			glDrawElementsBaseVertex(GL_TRIANGLES, currentSubMesh.indexCount, GL_UNSIGNED_INT, (void*)(sizeof(unsigned int) * currentSubMesh.baseIndex), currentSubMesh.baseVertex);
		}
	}

I am storing the entire Sponza mesh in a single VBO and using glDrawElementsBaseVertex to draw the different parts of the mesh that use different textures. My total draw call count according to NVIDIA Nsight is 400.

I know I can optimize this scene further by adding CPU-level culling, but I still feel 80 fps is quite low for such a small scene, given that I am only doing a single pass with no shadows or reflections. My question is: is this performance expected, or is some part of my code unoptimized and causing this issue?

I have a GTX 1060 and i7 7700k


Why is the draw call count so high? Also, how many times are you changing the shader?

So I opened up the model in Blender and noticed that there are 400 meshes within it and 25 materials. So my code is essentially issuing one draw call per mesh. Also, I am not switching the shader between those calls, as I have one shader that is used for all of them. I think one thing I can do is batch all of the meshes that use the same material into a single submesh and draw them together, but I am not sure how to do that.


Why not simply regroup the meshes that share a material at load time?

You also said that performance drops when you introduce the Sponza scene. But compared to what? If it’s compared to drawing a single triangle, the comparison is not fair. If it’s compared to another scene with a similar number of triangles and materials, then give more details.

Sorry, I probably should have mentioned that in the question itself. The 1200 fps that I was getting was while rendering the Crysis nanosuit model (12,910 vertices and 19,070 triangles). However, in comparison, that model only has 5 submeshes with 1 material each. Also, yes, regrouping meshes at load time sounds ideal; however, I don’t know how I can combine vertices and indices in such a way that they can be drawn in a single call. AFAIK, glDrawElementsBaseVertex adds the base vertex value to each value of the index array, doesn’t it? In that case, how would it work when the vertex data and the index data are not contiguous? Is there some link or article you could point me to that can help me with this?

When loading, add the base vertex index for each mesh to its indices so you can just use glDrawElements or glMultiDrawElements. Alternatively, you can use glMultiDrawElementsBaseVertex to perform the equivalent of calling glDrawElementsBaseVertex in a loop.
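The first option can be sketched roughly as follows. This is only an illustration under assumptions: `Vertex`, `MeshPart`, `MergedMesh` and `mergeParts` are hypothetical names, not types from the code posted above, and each part is assumed to store indices relative to its own vertices.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical minimal types for illustration only.
struct Vertex {
    float position[3];
};

struct MeshPart {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices; // relative to this part's own vertices
};

struct MergedMesh {
    std::vector<Vertex> vertices;
    std::vector<std::uint32_t> indices; // absolute indices into `vertices`
};

// Appends every part to one vertex array and one index array. The part's
// base vertex is added to each index at load time, so no baseVertex
// argument is needed when drawing.
MergedMesh mergeParts(const std::vector<MeshPart>& parts) {
    MergedMesh merged;
    for (const MeshPart& part : parts) {
        const std::uint32_t baseVertex =
            static_cast<std::uint32_t>(merged.vertices.size());
        merged.vertices.insert(merged.vertices.end(),
                               part.vertices.begin(), part.vertices.end());
        for (std::uint32_t index : part.indices) {
            assert(index < part.vertices.size()); // must refer to this part
            merged.indices.push_back(baseVertex + index);
        }
    }
    return merged;
}
```

After uploading the merged arrays once, any contiguous run of indices can be drawn with plain glDrawElements, because the indices no longer depend on a per-part base vertex.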

In any case, the first thing is to try rendering the entire mesh with a single draw call (with a single material) to see how much of the time is down to the per vertex/triangle/fragment overheads, and how much is down to the per draw call overheads. That gives you an idea of the maximum speed-up you might get from optimising draw calls, and provides a baseline for comparing different approaches. If a single draw call with a single material doesn’t provide much improvement over one draw call per sub-mesh, there isn’t much point in trying to optimise the material handling.

1200 fps is 15x faster than 80 fps. 262217 triangles versus 19070 triangles is 13.75x as many. So it’s possible that most of the performance difference is simply due to the number of triangles. Modern hardware doesn’t necessarily have much draw call overhead if you aren’t making significant state changes between calls.
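To make that arithmetic concrete, here is a small sketch that converts the frame rates to frame times and predicts the frame rate under the simplifying assumption that cost scales linearly with triangle count (which real scenes only approximate). The function names are made up for this example.

```cpp
#include <cassert>
#include <cmath>

// Converts frames per second to milliseconds per frame.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Predicts the frame rate for a bigger mesh, ASSUMING frame time is
// proportional to triangle count and nothing else changes.
double predictedFps(double referenceFps, double referenceTriangles,
                    double newTriangles) {
    assert(referenceTriangles > 0.0 && newTriangles > 0.0);
    return referenceFps * referenceTriangles / newTriangles;
}
```

With the numbers from this thread, frameTimeMs(1200) is about 0.83 ms and frameTimeMs(80) is 12.5 ms, while predictedFps(1200, 19070, 262217) comes out at roughly 87 fps, which is close to the observed 80 fps.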

I meant reducing the number of draw calls, not doing everything in a single call, to try to answer the question I quoted in my previous post. And this is a C issue. You have something like this:


So simply do something like this:


So that all indices for the same material are contiguous. This will reduce your draw calls from 400 to 25. But as GClements said, you might or might not see an improvement. So try to figure out whether these draw calls are the thing to solve here.
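As a rough illustration of this kind of regrouping (a sketch under assumptions, not the code originally posted here): `SubMeshRange`, `MaterialBatch` and `batchByMaterial` are hypothetical names, the indices are assumed to be absolute (base vertex already added), and the ranges are assumed to cover the whole index array.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical description of one sub-mesh inside a shared index array.
struct SubMeshRange {
    std::uint32_t firstIndex; // offset into the shared index array
    std::uint32_t indexCount;
    int materialId;
};

// One draw call's worth of indices, all using the same material.
struct MaterialBatch {
    int materialId;
    std::uint32_t firstIndex;
    std::uint32_t indexCount;
};

// Reorders `indices` so that all ranges sharing a material are contiguous,
// and returns one batch per material. Indices not covered by any range
// are dropped.
std::vector<MaterialBatch> batchByMaterial(std::vector<std::uint32_t>& indices,
                                           std::vector<SubMeshRange> ranges) {
    std::stable_sort(ranges.begin(), ranges.end(),
                     [](const SubMeshRange& a, const SubMeshRange& b) {
                         return a.materialId < b.materialId;
                     });

    std::vector<std::uint32_t> reordered;
    reordered.reserve(indices.size());
    std::vector<MaterialBatch> batches;

    for (const SubMeshRange& r : ranges) {
        assert(r.firstIndex + r.indexCount <= indices.size());
        if (batches.empty() || batches.back().materialId != r.materialId) {
            // Start a new batch where the reordered array currently ends.
            batches.push_back({r.materialId,
                               static_cast<std::uint32_t>(reordered.size()), 0});
        }
        reordered.insert(reordered.end(),
                         indices.begin() + r.firstIndex,
                         indices.begin() + r.firstIndex + r.indexCount);
        batches.back().indexCount += r.indexCount;
    }

    indices = std::move(reordered);
    return batches;
}
```

Each returned batch would then map to one material bind followed by one glDrawElements call, using sizeof(unsigned int) * firstIndex as the byte offset and indexCount as the count.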

Sorry for the late response here. @Silence I tried your advice here. I tried to batch together meshes based on their material index and I do see a significant performance boost (370 fps). So I think although I might not get back to 1200 fps this was definitely a bottleneck. I also did some minor optimizations like reducing the shader program switches but their impact was kind of minimal.

So from 80 fps to 370. This is good to read. However, this doesn’t mean that the draw calls alone were the bottleneck. It could be many of the things you’re doing around these calls.

To get back to the original framerate, you would need the same kind of scene. Hunting for anything that slows down your program as a whole will give you further improvements, but it’s hard to tell how much you could expect.

Unless you are doing a lot of shader switches, their cost is generally not noticeable, even though switching shaders is one of the most costly operations you can ask of your graphics card in OpenGL.