Insane lags when drawing cubes

hello guys,
I’m trying to create a game with a huge number of cubes (millions, eventually), so I read about instancing and did what the tutorial said.
But when I add 100,000 cubes to the scene I get insane lag (I can’t even move the mouse or exit).

Here is my source:

void Cube::Add(std::vector<glm::vec3> position)
{
	glGenVertexArrays(1, &instanceVBO);
	glBufferData(GL_ARRAY_BUFFER, sizeof(glm::vec3) * position.size(), &position[0], GL_STATIC_DRAW);
	glBindBuffer(GL_ARRAY_BUFFER, 0);

	glGenVertexArrays(1, &VAO);
	glGenBuffers(1, &VBO);
	glBindBuffer(GL_ARRAY_BUFFER, VBO);
	glBufferData(GL_ARRAY_BUFFER, vertices.size() * sizeof(glm::vec3), &vertices[0], GL_STREAM_DRAW);

	glBindTexture(GL_TEXTURE_2D, Texture);
	glUniform1i(TextureID, 0);

	glVertexAttribPointer(0, 3, GL_FLOAT, GL_FALSE, 0, (void*)0);
	glBindBuffer(GL_ARRAY_BUFFER, uvBuffer);
	glVertexAttribPointer(1, 2, GL_FLOAT, GL_FALSE, 0, (void*)0);

	glBindBuffer(GL_ARRAY_BUFFER, instanceVBO);
	glVertexAttribPointer(2, 3, GL_FLOAT, GL_FALSE, 0, (void*)2);
	glBindBuffer(GL_ARRAY_BUFFER, 0);
	glVertexAttribDivisor(2, 1);

	size = position.size();
}

void Cube::Draw(GLuint& programID)
{
	// Use Shaders
	glUseProgram(programID);

	// Bind Texture to Cube
	glBindTexture(GL_TEXTURE_2D, Texture);

	glDrawArraysInstanced(GL_TRIANGLES, 0, vertices.size(), size);
}

Cube vertex shader:

#version 330 core

layout (location = 0) in vec3 position;
layout (location = 1) in vec2 vertexUV;
layout (location = 2) in vec3 offset;

out vec2 UV;

uniform mat4 MVP;

void main()
{
	gl_Position = MVP * vec4(position + offset, 1);

	UV = vertexUV;
}

Camera Update:

void Camera::Update(glm::vec3 const& scale,
					glm::vec3 const& position)
{
	glm::mat4 Model = glm::mat4();
	Model = glm::scale(Model, scale);
	Model = glm::translate(Model, position);
	glm::mat4 mvp = ProjectionMatrix * ViewMatrix * Model;

	glUniformMatrix4fv(MatrixID, 1, GL_FALSE, &mvp[0][0]);
}


int main()
{
	Functions funcs;

	Window window(WIDTH, HEIGHT, "Minecraft Mechanics");

	// Creating Shader
	GLuint programID = funcs.LoadShaders("src/shaders/cube.vert", "src/shaders/cube.frag");

	// Creating Cube
	Cube cube(programID);
	std::vector<glm::vec3> pos;
	int cx = 0, cz = 0;
	for(unsigned int x = 0; x < 100; x++)
	{
		for(unsigned int z = 0; z < 100; z++)
		{
			pos.push_back(glm::vec3(cx, 0, cz));
			cz += CUBEDIST;
		}

		cx += CUBEDIST;
		cz = 0;
	}

	cube.Add(pos);

	Camera camera(glm::vec3(0, 10, 0), 90.f, 0.001f);
	camera.Setup(*window.getWindow(), programID);

	glClearColor(0.53f, 0.81f, 0.98f, 1.0f);

	double lastTime = glfwGetTime();
	int nbFrames = 0;

	while (!glfwWindowShouldClose(window.getWindow()))
	{
		// ----- FPS ------ //
		double currentTime = glfwGetTime();
		nbFrames++;

		if (currentTime - lastTime >= 1.0)
		{
			printf("%f ms/frame\n", 1000.0 / double(nbFrames));
			nbFrames = 0;
			lastTime += 1.0;
		}

		// Cube
		for(unsigned int i = 0; i < cube.getSize(); i++)
		{
			camera.Update(cube.getScale(), pos[i]);
			cube.Draw(programID);
		}
	}

	return 0;
}

I have also attached the profiler output.

For a couple of days I’ve been trying to fix this.
What am I doing wrong?
Thanks for the help!

Actually… 100,000 cubes in a scene probably would cause lag, depending on your graphics card’s abilities.

Try starting with smaller amounts, like 1,000, and keep ramping it up until you find where the lag starts. You are probably doing everything correctly, because, again, 100,000 cubes can be expected to cause lag.

(here is a link to my youtube channel on opengl programming tutorials, by the way…)

What you’ve done wrong is make the classic OO mistake of having each object maintain its own GL state and GL objects, and be responsible for drawing itself. This approach just doesn’t scale, leads to 100,000 draw calls, and will bring most hardware to its knees.

Instead of this you need to start batching objects, so that you can handle multiple objects per draw call and start getting performance back. Modern hardware can easily handle the object counts you have; this is a design problem.

One approach is that instead of drawing each object as it passes (i.e. in an object::draw call) you instead add it to a list of drawables. At some later time you take that list, construct a big batch out of it, then draw it in as few draw calls as possible.

I see what mhagain is saying; I had that problem early on when learning OpenGL too, once I started to put classes into the picture. The CPU becomes the bottleneck, or more to the point, the CPU-to-GPU communication is your bottleneck.


At this stage I also have to mention the dreaded unbinding.

Simplified, what goes on inside a GL driver when you issue a draw call looks something like this:

if (StateHasChanged)
{
	// this part is really really expensive
}
// ...then the actual drawing happens

If you have a draw loop with no unbinding, something like:

for (LotsOfObjects)
{
	Bind ();
	Draw ();
}

Then your driver may be able to intelligently optimize for cases where state doesn’t actually change, and you can get lots of draw calls done fairly quickly.

On the other hand, look at it with unbinding:

for (LotsOfObjects)
{
	Bind ();
	Draw ();
	Unbind ();
}

Now you’re going to hit the validation check every single iteration through the loop. You’ve taken what could have been a quick and simple driver optimization and completely destroyed it.

If you have a problem that unbinding solves, then don’t unbind - go back and fix the real source of the problem. If you don’t have a problem that unbinding solves, then why the f%#*! are you unbinding? Either way - don’t unbind.

And to make it simpler and clearer for the OP: