Efficient Technique for Rendering Thousands of Tiny Quads

I’ve started working on a 2D tile rendering system. So far it draws 8x8 tiles on the screen. At 640x480 resolution, there can be up to 4941 tiles on screen at a time, and the number quickly increases at higher resolutions. I’m looking for an efficient way to render them all.
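As a rough sketch of where a figure like 4941 can come from: the post doesn't say how it was derived, so this assumes the view can scroll by less than a tile, which exposes one extra partially visible tile per axis. The function names are hypothetical.

```c
#include <assert.h>

/* Upper bound on tiles that can be at least partly visible along one axis.
   Assumption: sub-tile scrolling can expose one extra partial tile beyond
   the ceil(extent / tile) tiles that exactly cover the axis. */
int max_tiles_along_axis(int extent_px, int tile_px)
{
    assert(extent_px > 0 && tile_px > 0);
    return (extent_px + tile_px - 1) / tile_px + 1;
}

/* Worst-case number of tiles on screen for a given resolution. */
int max_visible_tiles(int width_px, int height_px, int tile_px)
{
    return max_tiles_along_axis(width_px, tile_px) *
           max_tiles_along_axis(height_px, tile_px);
}
```

For 640x480 with 8-pixel tiles this gives 81 x 61 = 4941, and the same formula can be used to size MAX_TILES for other resolutions.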

Additionally, each tile will be a solid color. On the application side, I have a vec4[MAX_COLORS] storing each color and I send an int colorIndex to each instance.

I’m using instanced rendering right now, but several things about my approach make me wonder if it’s inefficient. I’m storing the colors in a uniform array in the vertex shader, which I realize isn’t the best way. I’m just getting the hang of OpenGL, so I’m also not sure whether instanced rendering is the right method here. Here’s the relevant code:

Constant data, structs:

static float colors[MAX_COLORS][4];

static float tileRect[] = {
	0.0f, 0.0f,
	0.0f, TILE_SIZE,
	TILE_SIZE, 0.0f,
	TILE_SIZE, TILE_SIZE,
};

static struct {
	vec2 positions[MAX_TILES];
	int colorIndex[MAX_TILES];
} tiles;

static struct {
	GLuint vertexShader;
	GLuint fragShader;
	GLuint programID;
	GLuint vao;
} tileShader;

I bind the rectangle, positions, and color index as vertex attributes:

glVertexArrayVertexBuffer(tileShader.vao, 0, buffers.rect, 0, 2 * sizeof(float));
glVertexArrayVertexBuffer(tileShader.vao, 1, buffers.position, 0, 2 * sizeof(float));
glVertexArrayVertexBuffer(tileShader.vao, 2, buffers.colIdxs, 0, 1 * sizeof(int));

glVertexArrayAttribFormat(tileShader.vao, 0, 2, GL_FLOAT, GL_FALSE, 0);
glVertexArrayAttribFormat(tileShader.vao, 1, 2, GL_FLOAT, GL_FALSE, 0);
glVertexArrayAttribIFormat(tileShader.vao, 2, 1, GL_INT, 0);


glVertexArrayBindingDivisor(tileShader.vao, 0, 0);
glVertexArrayBindingDivisor(tileShader.vao, 1, 1);
glVertexArrayBindingDivisor(tileShader.vao, 2, 1);

Store the colors and transform as uniforms:

GLint cl = glGetUniformLocation(tileShader.programID, "colors");
glUniform4fv(cl, MAX_COLORS, &colors[0][0]);
GLint mv = glGetUniformLocation(tileShader.programID, "modelview");
glUniformMatrix4fv(mv, 1, GL_FALSE, worldTransform);

The render loop:

while (!glfwWindowShouldClose(window)) {
	glDrawArraysInstanced(GL_TRIANGLE_STRIP, 0, 4, MAX_TILES);
}

Vertex shader: (OpenGL 4.5)

#version 450 core

uniform vec4 colors[2];
uniform mat4 modelview;

layout(location = 0) in vec2 iPos;
layout (location = 1) in vec2 iOff;
layout (location = 2) in int iColIdx;

out vec4 vCol;

void main()
{
    gl_Position = modelview * vec4(iPos + iOff, 0.0f, 1.0f);

    vCol = colors[iColIdx];
}

Other approaches I’ve considered:
Drawing the tiles:

  • Draw the static tiles as point sprites
  • Just send the tile’s color to each instance.
  • Perhaps there’s a way to connect all the static quads into one large quad, and just render that?
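For the second idea (sending the color itself to each instance), one possible sketch is to pack each color into four bytes on the CPU so that the per-instance data stays the same size as an int index. The `Rgba8` type and both function names are hypothetical and not part of the code above. If the attribute were then declared with `glVertexArrayAttribFormat(vao, 2, 4, GL_UNSIGNED_BYTE, GL_TRUE, 0)`, the shader input could be a `vec4` in the range [0, 1] instead of an index.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-instance color: four bytes instead of a color index. */
typedef struct {
    uint8_t r, g, b, a;
} Rgba8;

/* Convert one float channel to a byte, clamping input outside [0, 1]. */
uint8_t channel_to_byte(float v)
{
    if (!(v > 0.0f)) return 0;   /* negative values, zero, and NaN */
    if (v >= 1.0f) return 255;
    return (uint8_t)(v * 255.0f + 0.5f);   /* round to nearest */
}

/* Pack a float[4] RGBA color, like one row of colors[][], into four bytes. */
Rgba8 rgba8_from_floats(const float color[4])
{
    assert(color != 0);
    Rgba8 out = {
        channel_to_byte(color[0]),
        channel_to_byte(color[1]),
        channel_to_byte(color[2]),
        channel_to_byte(color[3]),
    };
    return out;
}
```

This trades the indirection through the colors array for slightly less flexibility: changing a palette entry would mean rewriting every instance that uses it.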

For storing all the colors, I’m thinking either a 1D texture array or an SSBO.

So my question is, for upwards of 20,000 tiles (at higher resolutions), is this method sufficient, or is there a better way to handle this?

This number of triangles is so low that it will probably not matter on today’s hardware, especially if your triangles don’t overlap. What matters far more than the number of triangles is the number of fragments that are actually drawn: drawing millions of tiny triangles will be much faster than drawing a couple of thousand screen-filling triangles.
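To put rough numbers on the fragment argument, here is a small arithmetic sketch. The figures are illustrative assumptions: 4800 is the 80 x 60 grid of 8x8 tiles that exactly covers 640x480, and 2000 layers stands in for "a couple of thousand" quads that each cover the whole screen. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fragments produced by non-overlapping square tiles of tile_px pixels per side. */
uint64_t tile_fragments(uint64_t tile_count, uint64_t tile_px)
{
    assert(tile_px > 0);
    return tile_count * tile_px * tile_px;
}

/* Fragments produced by `layers` pieces of geometry that each cover the screen. */
uint64_t fullscreen_fragments(uint64_t layers, uint64_t width_px, uint64_t height_px)
{
    return layers * width_px * height_px;
}
```

With these inputs, `tile_fragments(4800, 8)` is 307,200 (each pixel shaded once), while `fullscreen_fragments(2000, 640, 480)` is 614,400,000, which is 2000 times as much fragment work for less than a quarter of the quads.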
For the situation you have described, a simple VBO with the position and color data should be more than adequate. I would guess that even immediate mode would still give you a good enough frame rate.
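A minimal sketch of what filling such a VBO could look like on the CPU side, assuming two triangles per tile and an interleaved layout of position plus a four-byte color. `TileVertex`, `emit_tile`, and `build_tile_batch` are hypothetical names, and the layout is only one of several reasonable choices.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical interleaved vertex: position in pixels plus an RGBA8 color. */
typedef struct {
    float x, y;
    uint8_t rgba[4];
} TileVertex;

enum { VERTS_PER_TILE = 6 };   /* two triangles per quad */

/* Write one tile quad with its corner at (x, y); returns vertices written. */
size_t emit_tile(TileVertex *out, float x, float y, float size,
                 const uint8_t rgba[4])
{
    /* Unit-square corners for the triangles (0,1,2) and (2,1,3) of a
       four-vertex strip ordered (0,0) (0,1) (1,0) (1,1). */
    static const float corner[VERTS_PER_TILE][2] = {
        {0, 0}, {0, 1}, {1, 0},
        {1, 0}, {0, 1}, {1, 1},
    };
    assert(out != NULL && rgba != NULL);
    for (size_t i = 0; i < VERTS_PER_TILE; ++i) {
        out[i].x = x + corner[i][0] * size;
        out[i].y = y + corner[i][1] * size;
        for (size_t c = 0; c < 4; ++c)
            out[i].rgba[c] = rgba[c];
    }
    return VERTS_PER_TILE;
}

/* Fill `out` (capacity cols * rows * 6) with a grid of tiles.
   `colors` holds cols * rows RGBA8 entries in row-major order. */
size_t build_tile_batch(TileVertex *out, int cols, int rows, float size,
                        const uint8_t (*colors)[4])
{
    size_t n = 0;
    assert(cols >= 0 && rows >= 0);
    for (int row = 0; row < rows; ++row)
        for (int col = 0; col < cols; ++col)
            n += emit_tile(out + n, col * size, row * size, size,
                           colors[(size_t)row * (size_t)cols + (size_t)col]);
    return n;
}
```

The resulting array could then be uploaded once (or whenever tiles change) and drawn with a single `glDrawArrays(GL_TRIANGLES, 0, n)` call, so the per-frame cost on the CPU is one draw call regardless of tile count.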

I’ll second the above. Even immediate mode may well be fine, as long as you use a single glBegin/glEnd pair rather than thousands of them, and it is often an optimal path for dynamic data (if that’s what you have).