Best Practices for Instanced Chunked Heightmap Rendering

Hi all, I’m experimenting with basic heightmap rendering, where the terrain is split into 100x100 chunks composed of triangles (no strips yet).

I currently have one separate VBO for each chunk, and each chunk is rendered using glDrawArrays. Instead, I would like to render the entire frustum-culled visible terrain with one draw call, i.e. upload all terrain data to one VBO and use glMultiDrawArrays to render the chunks that are in the current view frustum.

I have used glMultiDrawArrays before and am confident using it, but I would like to optimise this terrain rendering further. Since the X and Z positions are the same in every chunk - only the altitude varies per-vertex - I’d like to define the X and Z positions only once and re-use them for each chunk.

I created a VBO with just the X and Z position of each triangle (not using a strip yet), then uploaded the altitudes of each vertex in each chunk to a separate large instance buffer.

I can now render all chunks at once with glMultiDrawArraysIndirect; however, the same instance data is being used for each call. Each terrain chunk is positioned using gl_DrawID.


The base heightmap VBO has 6144 vertices and uses 2 ushorts for its X and Z position:

Gl.VertexAttribIPointer(0, 2, VertexAttribType.UnsignedShort, 4, IntPtr.Zero);
Gl.VertexAttribDivisor(0, 0); // Advance once per vertex

The instance buffer can store 8 million integers. Each chunk then stores 6144 integers in this buffer (1 int per vertex):

Gl.VertexAttribIPointer(1, 1, VertexAttribType.Int, 4, IntPtr.Zero);
Gl.VertexAttribDivisor(1, 0); // Advance once per vertex

One indirect command is used for each chunk, where offset is the position that the chunk’s instance data is stored:

    count = 6144,
    instanceCount = 1,
    first = 0,
    baseInstance = (uint)offset
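
For reference, each glMultiDrawArraysIndirect command is four tightly packed 32-bit unsigned integers in the order count, instanceCount, first, baseInstance. Below is a minimal Python sketch of that memory layout only, mirroring the fields listed above. The function name `pack_draw_arrays_commands` is hypothetical, and a little-endian host is assumed.

```python
import struct

def pack_draw_arrays_commands(chunk_offsets, vertices_per_chunk=6144):
    """Pack one 16-byte DrawArraysIndirectCommand per visible chunk.

    Field order: count, instanceCount, first, baseInstance.
    Little-endian byte order is an assumption about the host.
    """
    buf = bytearray()
    for offset in chunk_offsets:
        buf += struct.pack("<4I", vertices_per_chunk, 1, 0, offset)
    return bytes(buf)

commands = pack_draw_arrays_commands([0, 6144])
print(len(commands))                        # 32
print(struct.unpack("<4I", commands[16:]))  # (6144, 1, 0, 6144)
```

With tightly packed commands like these, a stride of 0 (or 16) can be passed to glMultiDrawArraysIndirect.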

As a test I also rendered one indirect command with multiple instances, but this produced the same result:

    count = 6144,
    instanceCount = visibleChunkAmount,
    first = 0,
    baseInstance = 0


  • Am I using glMultiDrawArraysIndirect incorrectly, or is it simply not possible to have per-vertex instance data? Or not possible to increment baseInstance per command?
  • The baseInstance value (0, 6144, 800000, etc.) in the indirect command doesn't affect anything, whether I'm rendering 1 instance or visibleChunkAmount instances. Shouldn’t this offset the instance data that’s used for each internal draw call?
  • Is there another glDraw* method that’s more appropriate for this?

As a last resort I could upload the terrain altitude data to an SSBO and sample that in the vertex shader, but I’m suspicious that would be slower. I’d rather use VBOs / instancing correctly.

No. Instanced rendering essentially invokes the vertex shader once for each cell in a 2D table. Attribute data is either per-row (instanced) or per-column (not instanced). You cannot have per-cell attribute data. You can fabricate it using dependent fetches based upon gl_VertexID, gl_InstanceID and/or gl_DrawID, but this would be expected to have a performance cost compared to using attributes.
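
To make the table picture concrete, here is a small CPU-side model in Python (not GL code) of which buffer element one attribute reads for each (instance, vertex) cell. It assumes the usual rule: with divisor 0 the element is first + vertex, and with a non-zero divisor it is baseInstance + instance / divisor. The function name `fetched_elements` is made up for this sketch.

```python
def fetched_elements(count, instance_count, first, base_instance, divisor):
    """Return, per instance (row) and vertex (column), the element index read."""
    table = []
    for instance in range(instance_count):
        row = []
        for vertex in range(count):
            if divisor == 0:
                # Per-vertex attribute: baseInstance is never consulted.
                row.append(first + vertex)
            else:
                # Instanced attribute: one element shared by the whole row.
                row.append(base_instance + instance // divisor)
        table.append(row)
    return table

# Divisor 0: changing baseInstance changes nothing.
a = fetched_elements(4, 2, first=0, base_instance=0, divisor=0)
b = fetched_elements(4, 2, first=0, base_instance=6144, divisor=0)
print(a == b)  # True
print(a)       # [[0, 1, 2, 3], [0, 1, 2, 3]]

# Divisor 1: baseInstance applies, but every vertex of an instance reads the same element.
print(fetched_elements(4, 2, first=0, base_instance=6144, divisor=1))
```

Under this model, an attribute set up with divisor 0 would be expected to ignore baseInstance entirely, which would match the behaviour described in the question.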

For a heightmap, you don’t even need to store X and Y values; you can synthesise them from the vertex indices, e.g.

    int col = gl_VertexID % columns;
    int row = gl_VertexID / columns;
    vec2 xy = vec2(col, row) * spacing;

Chunks would add a bit of complexity, but not much.
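
As a rough idea of what the chunk case could look like, here is a Python model of the same arithmetic with a per-chunk origin added. The `chunk_origin` parameter is an assumption for illustration; in a shader it might come from a per-draw lookup.

```python
def grid_position(vertex_id, columns, spacing, chunk_origin=(0.0, 0.0)):
    """Synthesise a 2D grid position from a vertex index, offset by a chunk origin."""
    col = vertex_id % columns
    row = vertex_id // columns
    return (chunk_origin[0] + col * spacing, chunk_origin[1] + row * spacing)

print(grid_position(0, 33, 2.0))                # (0.0, 0.0)
print(grid_position(34, 33, 2.0, (64.0, 0.0)))  # (66.0, 2.0)
```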

You absolute legend, thank you!

It was a bit trickier synthesizing them in the shader because each quad in the heightmap is composed of 6 indices. Works beautifully though!

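void main()
{
    const float GRID_SIZE = 2.0;
    const float VERTICES_PER_TILE = 6.0;
    const float TILES_PER_RUN = 32.0;
    // 32 x 32 tiles x 6 vertices = 6144 vertices per chunk
    const float VERTICES_PER_CHUNK = TILES_PER_RUN * TILES_PER_RUN * VERTICES_PER_TILE;

    // Get the vertex index relative to this chunk's data in the large VBO
    float index = mod(float(gl_VertexID), VERTICES_PER_CHUNK);

    // Z increments every 6 vertices, and repeats every 32 tiles
    float zPos = mod(floor(index / VERTICES_PER_TILE), TILES_PER_RUN);

    // X increments every 6 x 32 = 192 vertices
    float xPos = floor(index / (TILES_PER_RUN * VERTICES_PER_TILE));

    // Pick this vertex's corner within the tile (two triangles, six vertices)
    int triangleID = int(mod(index, VERTICES_PER_TILE));

    // Assumed axis assignment: vertices 1, 3 and 4 sit one tile further along X
    if (triangleID == 1 || triangleID == 3 || triangleID == 4)
        xPos += 1.0;

    // Assumed axis assignment: vertices 2, 4 and 5 sit one tile further along Z
    if (triangleID == 2 || triangleID == 4 || triangleID == 5)
        zPos += 1.0;

    // xPos and zPos now hold the vertex's tile-grid coordinates within the chunk
}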

I position each chunk by uploading their world positions to an SSBO, then read from it with gl_DrawID.
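
For anyone wanting to sanity-check the index arithmetic on the CPU, here is a Python mirror of the shader logic. It assumes corners 1, 3 and 4 step one tile along X and corners 2, 4 and 5 step one tile along Z; swap the two if your winding differs. The function name `tile_corner` is made up for this sketch.

```python
VERTICES_PER_TILE = 6
TILES_PER_RUN = 32
VERTICES_PER_CHUNK = VERTICES_PER_TILE * TILES_PER_RUN * TILES_PER_RUN  # 6144

def tile_corner(vertex_id):
    """Map a vertex index to integer (x, z) grid coordinates within its chunk."""
    index = vertex_id % VERTICES_PER_CHUNK
    z = (index // VERTICES_PER_TILE) % TILES_PER_RUN
    x = index // (TILES_PER_RUN * VERTICES_PER_TILE)
    corner = index % VERTICES_PER_TILE
    if corner in (1, 3, 4):   # assumed: far edge along X
        x += 1
    if corner in (2, 4, 5):   # assumed: far edge along Z
        z += 1
    return x, z

print(tile_corner(0))         # (0, 0)
print(tile_corner(6143))      # (31, 32)
print(tile_corner(6144 + 4))  # (1, 1), the index wraps into the next chunk
```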

It might be preferable to use glMultiDrawElements, which requires an index array but doesn’t require shared vertices to be duplicated. The advantage is that the shared vertices will be cached, rather than executing the vertex shader (up to) six times for each vertex.
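
As a sketch of what the index array could look like for one 32 x 32 tile chunk, here is a Python generator for the indices. The corner order (two triangles per tile) and the function name `grid_indices` are assumptions for illustration; adjust the order to match your winding.

```python
def grid_indices(tiles_per_side):
    """Build triangle-list indices for a square grid with shared corner vertices."""
    verts_per_side = tiles_per_side + 1
    indices = []
    for x in range(tiles_per_side):
        for z in range(tiles_per_side):
            v00 = x * verts_per_side + z        # near corner
            v10 = (x + 1) * verts_per_side + z  # one step along X
            v01 = v00 + 1                       # one step along Z
            v11 = v10 + 1                       # one step along both
            indices += [v00, v10, v01, v10, v11, v01]
    return indices

indices = grid_indices(32)
print(len(indices))       # 6144 indices, same triangle count as before
print(len(set(indices)))  # 1089 unique vertices (33 x 33)
```

So the same 6144-entry triangle list references only 1089 distinct vertices. In an indexed draw gl_VertexID takes the index value, so the col/row synthesis shown earlier should carry over with columns = 33.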