glDrawArraysInstancedEXT

Leadwerks · August 15, 2007, 2:25pm

I was about to implement GPU instancing using glDrawArraysInstancedEXT(). It looks simple but I thought I would do a quick search before I started and see what other people had to say about it. There were no results on this or the GLSL forum. What has been people’s experience with this feature? What kind of speed gains can I expect versus batched VBO rendering? Any demos available that demonstrate a comparison?

Seth_Hoffert · August 15, 2007, 4:43pm

I’m using instanced rendering in a game that I’ve been working on, and I’ve had a very successful experience - it’s quite elegant.

Before, I was applying transformations using the OpenGL matrix functions, but now I do it all in shaders and it’s quite fast.

Here’s a snippet of the vertex shader that I ultimately settled on:

#version 120
#extension GL_ARB_draw_buffers : require
#extension GL_EXT_gpu_shader4 : require
#extension GL_EXT_bindable_uniform : require

varying vec3 LightDir[3];
varying vec3 EyeDir;

attribute vec3 Tangent;

uniform float texid[520];
uniform vec4 loc[520];
uniform float rot[520];
uniform vec4 colors[520];

void main()
{
    // Form the translation & rotation matrix
    float c = cos(rot[gl_InstanceID]);
    float s = sin(rot[gl_InstanceID]);

    mat4 tmat = mat4(
        c, s, 0.0, 0.0,
        -s, c, 0.0, 0.0,
        0.0, 0.0, 1.0, 0.0,
        loc[gl_InstanceID].xyz, 1.0
    );
...
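
For anyone checking the math by hand: GLSL's mat4 constructor takes its scalars column by column, so the four lines written in the shader are the four columns of tmat, not its rows. The net effect is a rotation about Z followed by a translation by loc. Below is a minimal CPU-side sketch of the same matrix in C. The helper names build_instance_matrix and transform_point are made up for illustration, and the caller is assumed to pass c = cos(rot) and s = sin(rot) just as the shader computes them.

```c
/* CPU-side mirror of the shader's tmat (illustrative sketch only).
   m is column-major, the same layout OpenGL uses for matrices. */
void build_instance_matrix(float c, float s, const float loc[3], float m[16])
{
    m[0]  = c;      m[1]  = s;      m[2]  = 0.0f;   m[3]  = 0.0f; /* column 0 */
    m[4]  = -s;     m[5]  = c;      m[6]  = 0.0f;   m[7]  = 0.0f; /* column 1 */
    m[8]  = 0.0f;   m[9]  = 0.0f;   m[10] = 1.0f;   m[11] = 0.0f; /* column 2 */
    m[12] = loc[0]; m[13] = loc[1]; m[14] = loc[2]; m[15] = 1.0f; /* column 3 */
}

/* out = m * (v, 1): rotate v about the Z axis, then translate it. */
void transform_point(const float m[16], const float v[3], float out[3])
{
    for (int row = 0; row < 3; ++row) {
        out[row] = m[0 + row] * v[0] + m[4 + row] * v[1]
                 + m[8 + row] * v[2] + m[12 + row];
    }
}
```

With c = 0 and s = 1 (a quarter turn) and loc = (10, 20, 30), the point (1, 0, 0) should land at (10, 21, 30).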

I haven’t tried batched VBO rendering, but I can say that this ran faster than applying transformations on the CPU for each instance of the object.

-Seth

Leadwerks · August 15, 2007, 6:32pm

Okay, I got it working. I haven’t uploaded the matrix data to the shader, but just using a simple offset for each instance, I am getting a 38% speed increase with 2000 instances.

Leadwerks · August 15, 2007, 8:23pm

When I upload matrices each frame, the performance increase with batching is about 30%. When I just use a static array, the increase is 35%. I think it’s worth the flexibility just to upload every matrix each frame, instead of trying to juggle copying bits of data here and there.
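
One way the "upload every matrix each frame" approach could look on the CPU side is sketched below in C. This is not the code used for the numbers above. The Entity struct, the pack_matrices helper, and the MAX_BATCH value are all assumptions made for illustration, and the GL calls appear only in a comment because they need a live context.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical batch limit: it would have to match the size of the
   mat4 uniform array declared in the shader. */
#define MAX_BATCH 64

/* Hypothetical scene record; a real engine would carry more state. */
typedef struct {
    float matrix[16];   /* column-major, OpenGL layout */
    int   mesh_id;
} Entity;

/* Copy up to MAX_BATCH matrices, starting at entity `first`, into one
   contiguous float array. Returns how many matrices were packed. */
size_t pack_matrices(const Entity *entities, size_t total, size_t first,
                     float out[MAX_BATCH * 16])
{
    size_t n = 0;
    if (first < total) {
        n = total - first;
        if (n > MAX_BATCH) n = MAX_BATCH;
    }
    for (size_t i = 0; i < n; ++i) {
        memcpy(out + i * 16, entities[first + i].matrix, sizeof(float) * 16);
    }
    return n;
}

/* Per frame, for each batch (placeholders: matrix_location, vertex_count):
     n = pack_matrices(entities, total, first, buf);
     glUniformMatrix4fv(matrix_location, (GLsizei)n, GL_FALSE, buf);
     glDrawArraysInstancedEXT(GL_TRIANGLES, 0, vertex_count, (GLsizei)n);
*/
```

Repacking everything each frame keeps the CPU side to one loop and one upload per batch, which is the flexibility trade-off described above.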

What’s nice about this extension is it works so easily with existing functionality, and you don’t have to change much to use a fallback for non-compatible cards. So it’s a free 30% speed boost on the 8 series, with very little work involved.
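
A rough sketch of the fallback check in C, assuming the extension list is read with glGetString(GL_EXTENSIONS) and the extension is advertised as GL_EXT_draw_instanced. The helper name has_extension is made up for illustration. It matches whole space-separated tokens, since a plain strstr() would also match a longer extension name that merely starts with the same text.

```c
#include <string.h>

/* Return 1 if `name` appears as a whole space-separated token in `list`,
   0 otherwise. */
int has_extension(const char *list, const char *name)
{
    size_t len = strlen(name);
    const char *p = list;

    if (len == 0) return 0;
    while ((p = strstr(p, name)) != NULL) {
        int starts = (p == list) || (p[-1] == ' ');
        int ends   = (p[len] == ' ') || (p[len] == '\0');
        if (starts && ends) return 1;
        p += len;   /* keep scanning past this partial match */
    }
    return 0;
}

/* With a live context (mode, vertex_count, instance_count are placeholders):
     const char *ext = (const char *)glGetString(GL_EXTENSIONS);
     if (has_extension(ext, "GL_EXT_draw_instanced")) {
         glDrawArraysInstancedEXT(mode, 0, vertex_count, instance_count);
     } else {
         // fallback: set each instance's transform, then call
         // glDrawArrays(mode, 0, vertex_count) once per instance
     }
*/
```

The vertex data and the draw parameters stay the same on both paths, so only the per-instance transform setup differs.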