Help With Instanced Drawing

Hello guys,

I am trying to render a few thousand (eventually a few million) cubes using glDrawArraysInstanced(). This is the result I’m currently getting vs. what happens when I draw each cube individually (screenshots attached):

Here is the code used to draw:

Terrain::Terrain(GLuint id, GLuint uniformHandle) {

	programID = id;
	mvpHandle = uniformHandle;

	ModelMatrixHandle = glGetUniformLocation(programID, "ModelMatrix");

	for (int i = 0; i < 100; i++) {
		for (int j = 0; j < 2; j++) {
			for (int k = 0; k < 25; k++) {
				cubes.push_back(Block(programID, i, j, k, DIRT));
				cubepositions.push_back(glm::vec3(i, j, k));
			}
		}
	}

	glGenBuffers(1, &cubepositionbuffer);
	glBindBuffer(GL_ARRAY_BUFFER, cubepositionbuffer);
	glBufferData(GL_ARRAY_BUFFER, cubepositions.size() * sizeof(glm::vec3), &cubepositions[0], GL_STREAM_DRAW);
}


void Terrain::render(glm::mat4 View, glm::mat4 Projection) {
	for (int i = 0; i < cubes.size(); i++) {
		glUniformMatrix4fv(ModelMatrixHandle, 1, GL_FALSE, &cubes[i].getModelMatrix()[0][0]);
	}

	prepareBatch();

	glBindBuffer(GL_ARRAY_BUFFER, cubepositionbuffer);
	glEnableVertexAttribArray(3);
	glVertexAttribPointer(3, 3, GL_FLOAT, GL_FALSE, 0, (void*)0);
	glVertexAttribDivisor(3, 1);

	glDrawArraysInstanced(GL_TRIANGLES, 0, Block::getVerticesSize(), cubepositions.size());
}
where prepareBatch() loads the vertex and UV buffers for a cube and calls glVertexAttribPointer()/glEnableVertexAttribArray().

I am trying to create the translation/model matrix inside the GLSL vertex shader from a vec3 of coordinates, rather than sending the shader a full mat4 model matrix:

#version 330 core

// Input vertex data, different for all executions of this shader.
layout(location = 0) in vec3 vertexPosition_modelspace;
layout(location = 1) in vec2 vertexUV;
layout(location = 2) in vec3 vertexNormals;
layout(location = 3) in vec3 vertexPosition_worldSpace;

// Output data ; will be interpolated for each fragment.
out vec2 UV;

// Values that stay constant for the whole mesh.
uniform mat4 MVP;

uniform mat4 ViewMatrix;
uniform mat4 ProjectionMatrix;
uniform mat4 ModelMatrix;

void main(){

	//mat4 model = mat4	(1.0,   0,   0, vertexPosition_worldSpace.x,
	//					 0,   1.0,   0, vertexPosition_worldSpace.y,
	//					 0,     0, 1.0, vertexPosition_worldSpace.z,
	//					 0, 	0,   0, 			1.0);

	gl_Position = ProjectionMatrix * ViewMatrix * ModelMatrix * vec4(vertexPosition_modelspace, 1);
	//gl_Position = ProjectionMatrix * ViewMatrix * model * vec4(vertexPosition_modelspace, 1);


	// UV of the vertex. No special space for this one.
	UV = vertexUV;
}

My fragment shader is simply sampling a texture:

#version 330 core

// Interpolated values from the vertex shaders
in vec2 UV;

// Output data
out vec3 color;

// Values that stay constant for the whole mesh.
uniform sampler2D myTextureSampler;

void main(){

	// Output color = color of the texture at the specified UV
	color = texture( myTextureSampler, UV ).rgb;
}

If anyone could help me get a foothold with this instance rendering I’d greatly appreciate it!

This is wrong. Matrix constructors require their parameters in column-major order, so the translation should be at the end, e.g.

	mat4 model = mat4	(1.0,   0,   0, 0,
				   0, 1.0,   0, 0,
				   0,   0, 1.0, 0,
				   vertexPosition_worldSpace.x, vertexPosition_worldSpace.y, vertexPosition_worldSpace.z, 1.0);

or more succinctly:

	mat4 model = mat4	(1.0,   0,   0, 0,
				   0, 1.0,   0, 0,
				   0,   0, 1.0, 0,
				   vertexPosition_worldSpace, 1.0);

(Arguments to vector and matrix constructors can include vectors and matrices, in which case the result is as if all of the elements were inserted as individual arguments).

But if the matrix is guaranteed to be a translation, you may as well just use:

	gl_Position = ProjectionMatrix * ViewMatrix * vec4(vertexPosition_worldSpace + vertexPosition_modelspace, 1);

Hey, thanks a bunch!

Yeah, pretty simple fix; I just didn’t understand the GLSL matrix constructor syntax.

Do you have any other recommendations for optimizing further ?

I plan on implementing a VAO for each glDrawArraysInstanced() call, but this now seems to be GPU-limited. Using indexed draws may help some; I’m implementing that now. I’m currently getting about 71.4 ms frames (14 fps) for 70,000 cubes, and would ideally like to render about 200,000 visible cubes if possible.

Besides using a VAO to eliminate the extra calls that set up attribute state, and using indexed draws (to lower buffer memory usage?), I’m not sure what else (if anything) can be done to expedite rendering.

I would assume that merging all blocks with similar textures into larger rectangular prisms/geometries would be the best way to go (I assume some GPU form of drawArrays is still used inside glDrawArraysInstanced, and cutting that down would help significantly).

Should each geometry have its own VAO? (Unrelated, but trying to implement separate VAOs for each object has destroyed my project a few times, so I gave up and did instanced drawing.)

Is creating separate faces and texture-wrapping each geometric face the best way to draw a repeating texture over a geometry?

Also, do you know what this phenomenon is:


The dark lines defining block edges fade away for a certain width of the screen. I’m not sure whether that’s avoidable; it might go away when/if basic light shading is implemented.

Multiply the model, view and projection matrices outside the shader, so you have a combined model-view-projection matrix (if you need eye-space coordinates for lighting, pass in the model-view and model-view-projection matrices).
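To illustrate the idea, here is a sketch (not the poster’s code) of a column-major 4×4 multiply matching OpenGL’s memory layout; with GLM you would simply write `Projection * View * Model`:

```cpp
#include <array>
#include <cassert>

// Column-major 4x4 matrix, matching OpenGL/GLM memory layout.
using Mat4 = std::array<float, 16>;

// Returns a * b. Done once per draw call on the CPU, so the vertex
// shader only needs a single mat4 * vec4 per vertex.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 c{};
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            for (int k = 0; k < 4; ++k)
                c[col * 4 + row] += a[k * 4 + row] * b[col * 4 + k];
    return c;
}
```

The combined result of `multiply(Projection, multiply(View, Model))` would then be uploaded with one glUniformMatrix4fv() per draw call.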

If you’re limited by fill rate, instancing won’t make any difference (nor will anything in the vertex shader).

If you have a lot of overdraw, rendering in approximately front-to-back order will help. Typically, the depth test is performed first, and if that fails (for all fragments in a group) no further action is taken; the fragment shader isn’t run, which also means no texture fetches.

[QUOTE]Also, do you know what this phenomenon is:

The dark lines defining block edges fade away for a certain width of the screen. Not sure if avoidable or not.[/QUOTE]
What texture filters are you using? If using mipmaps, how are you generating the mipmap levels? It’s hard to tell from the image, but that’s roughly what I’d expect if the mipmap levels were generated by sampling the base texture (and missing the dark lines) rather than averaging.




	for (unsigned int level = 0; level < mipMapCount && (width || height); ++level) {
		unsigned int size = ((width + 3) / 4) * ((height + 3) / 4) * blockSize;
		glCompressedTexImage2D(GL_TEXTURE_2D, level, format, width, height,
			0, size, buffer + offset);

		offset += size;
		width /= 2;
		height /= 2;

		// Deal with non-power-of-two textures.
		if (width < 1) width = 1;
		if (height < 1) height = 1;
	}


I haven’t spent much time learning about filtering yet, but I assume this is trilinear filtering. The images actually don’t use any filtering whatsoever, because I forgot to add it to my DDS file loader as well. I am using mipmaps generated from the base texture in paint.NET.

Your explanation makes sense; to fix it I just need to switch to a better filtering type. Here is a picture with filtering (I need to make an imgur account or something, sorry!).

Regarding fill rate: the menu you see over the image actually covers the entire screen, but as far as I’m aware only the opaque section of the menu would cause overdraw. Is re-buffering the vertex order (for front-to-back rendering) every frame, or every few frames, worth it?

The reason I switched to matrix multiplication in the shader is that when I was making each draw call individually my CPU was getting crushed, but due to your advice I assume the CPU does the matrix calculations more efficiently.

I think my next task is chunk rendering. How would I go about repeating block-face textures over each face? Would I need to keep all the vertices on the faces of each chunk, eliminating only the inner vertices, so that the UV coordinates still have something square to map to?

Also, one last thing in this thread: if I don’t disable GL_BLEND when rendering my cubes, no cubes render. Is the only explanation that there must not be alpha data in my texture, so all the texture alpha is treated as transparent? (blend func = GL_SRC_ALPHA, GL_ONE_MINUS_SRC_ALPHA)

So you’re loading all of the mipmap levels from file. The issue is with how you generate those levels; there isn’t anything you can do in the OpenGL code to fix it.

[QUOTE=spectral;1289427]Your explanation makes sense, to fix just switch to a better filtering type. Here is a picture with filtering (I need to make an imgur acc. or something, sorry!)[/QUOTE]
That seems to have the same issue as before; the middle section appears to be the same colour as the bright green parts of the base level, whereas it should tend towards the average. Mipmap levels are normally generated by averaging the corresponding 2x2 (or 2x1 or 1x2) block of pixels from the level above.
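For reference, generating a level by averaging is straightforward. Here is a sketch for an uncompressed RGB8 image (assumes even width and height; `downsample` is a hypothetical helper, not part of any DDS tool):

```cpp
#include <vector>
#include <cstdint>
#include <cassert>

// Halve an RGB8 image by averaging each 2x2 block of pixels.
// src is w*h*3 bytes, row-major; assumes w and h are even.
std::vector<uint8_t> downsample(const std::vector<uint8_t>& src, int w, int h) {
    int nw = w / 2, nh = h / 2;
    std::vector<uint8_t> dst(nw * nh * 3);
    for (int y = 0; y < nh; ++y)
        for (int x = 0; x < nw; ++x)
            for (int c = 0; c < 3; ++c) {
                // Sum the 2x2 source block for this channel, then average.
                int sum = src[((2*y)   * w + 2*x  ) * 3 + c]
                        + src[((2*y)   * w + 2*x+1) * 3 + c]
                        + src[((2*y+1) * w + 2*x  ) * 3 + c]
                        + src[((2*y+1) * w + 2*x+1) * 3 + c];
                dst[(y * nw + x) * 3 + c] = static_cast<uint8_t>(sum / 4);
            }
    return dst;
}
```

Applied repeatedly from the base level down to 1×1, this is exactly the "average the 2×2 block" rule described above; sampling (taking one pixel of the four) is what produces the artefact in the screenshot.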

The order only needs to be approximate. For a grid of cubes, I’d sort the axes according to the magnitude of the eye-space Z component (so the outermost loop is rendering planes which are roughly normal to the view direction), with the order of each axis determined by the sign of the eye-space Z (so rendering from near to far). I wouldn’t bother with the case where the nearest plane/row/cell is in the middle of the cube/plane/row. That leaves 3!*2^3 = 48 possible orderings, and the order typically won’t change from one frame to the next.
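A sketch of deriving that ordering (the names here are assumptions; `dx, dy, dz` would be the eye-space Z components of the three world axes):

```cpp
#include <array>
#include <cmath>
#include <algorithm>
#include <cassert>

struct AxisOrder {
    std::array<int, 3> axis;  // loop nesting: axis[0] is the outermost loop
    std::array<int, 3> step;  // +1 or -1: direction to iterate that axis
};

// dx, dy, dz: eye-space Z component of each world axis.
AxisOrder traversalOrder(float dx, float dy, float dz) {
    std::array<float, 3> d = {dx, dy, dz};
    AxisOrder o;
    o.axis = {0, 1, 2};
    // Outermost loop walks the axis most nearly normal to the view plane.
    std::sort(o.axis.begin(), o.axis.end(),
              [&](int a, int b) { return std::fabs(d[a]) > std::fabs(d[b]); });
    // Iterate each axis from near to far (sign convention assumed here).
    for (int i = 0; i < 3; ++i)
        o.step[i] = d[o.axis[i]] >= 0 ? -1 : 1;
    return o;
}
```

Since there are only 48 possible orderings and the result rarely changes between frames, this can be recomputed each frame for negligible cost, with no per-cube sorting at all.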

Optimising overdraw becomes more important as the fragment shader gets more complex, both in terms of computation within the shader and the memory bandwidth for texture fetches. You can get a reasonable idea of the significance of texture bandwidth by applying a level-of-detail bias to the textures. If you’re limited by texture bandwidth, a fairly small bias will result in a significant change to rendering speed.

If the matrices are uniforms, then performing the multiplication on the CPU means that you’re only doing it once per draw call; doing it in the vertex shader does it for each vertex.

The first optimisation is to discard interior faces. The other obvious optimisation is to coalesce adjacent squares into larger rectangles. A non-optimal solution is fairly straightforward and cheap: for the innermost loop, coalesce strips of adjacent squares with the same texture into a single rectangle. A more effective approach would work in both dimensions to produce MxN rectangles, but finding an optimal or near-optimal solution is a hard problem. If you’re optimising overdraw, then you can choose to include some interior faces to get fewer larger rectangles.
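The innermost-loop version might look like this (a sketch with a hypothetical `Run` struct; `-1` marks an empty cell with no face):

```cpp
#include <vector>
#include <cassert>

struct Run {
    int texture;  // texture ID shared by the whole run
    int start;    // index of the first cell in the run
    int length;   // number of coalesced cells
    bool operator==(const Run& r) const {
        return texture == r.texture && start == r.start && length == r.length;
    }
};

// Coalesce a row of per-cell texture IDs into runs of identical,
// contiguous cells; each run becomes a single rectangle.
std::vector<Run> coalesceRow(const std::vector<int>& row) {
    std::vector<Run> runs;
    for (int i = 0; i < static_cast<int>(row.size()); ++i) {
        if (row[i] < 0) continue;  // empty cell: emit no face
        if (!runs.empty() && runs.back().texture == row[i] &&
            runs.back().start + runs.back().length == i) {
            ++runs.back().length;  // extend the current run
        } else {
            runs.push_back({row[i], i, 1});  // start a new run
        }
    }
    return runs;
}
```

Each run then maps to one quad with its UVs scaled by `length`, relying on GL_REPEAT wrapping for the tiled texture.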

Textures without an alpha channel have an alpha of constant one, not zero (green and blue are constant zero if those channels are absent).

But that doesn’t matter because your fragment shader ignores the texture’s alpha component:

	color = texture( myTextureSampler, UV ).rgb;

The output variable [var]color[/var] is a vec3. If you’re planning on using blending, you need to change that to a vec4 so that you can specify the alpha component. But if you want to use blending, then you’ll need to either render from back to front, or render from front to back with an alpha channel in the colour buffer. And you won’t be able to rely upon early depth tests to optimise overdraw. Nor can you discard interior faces, as they may be visible through the exterior faces.
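For instance, the fragment shader would become something like this (assuming the texture’s alpha channel actually carries meaningful data):

```glsl
#version 330 core

in vec2 UV;

out vec4 color;   // vec4 so the alpha component reaches the blend stage

uniform sampler2D myTextureSampler;

void main(){
	color = texture( myTextureSampler, UV );   // keep .a rather than taking only .rgb
}
```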