glDrawElementsInstanced poor performance?

paradoxresolved · January 29, 2016, 11:23am

Hello all,

I am using glDrawElementsInstanced to draw 720 objects, each having 12 polys, on the screen, for a total of 8640 polys. I’m using full-lighting with diffuse, ambient, specular and normal mapping. I’m getting a frame rate of 22. If I turn off lighting, it jumps to 24. My intuition tells me that my graphics card should be able to handle this a lot better.

My card is an NVIDIA GeForce GT 525M, and I’ve got almost nothing on the CPU side of the loop.

As part of my experimenting, I tried drawing 30 objects (FPS in the 220s), then 40 objects (FPS 170) … a 50-point drop for just ten low-poly objects!

At a glance, are these numbers in the ballpark, or should I look deeper into this? I’ve pored through my code looking for issues, but can’t find any. I’m happy to post code if these numbers are indeed low for this card. Otherwise, I don’t want to flood you with code if this is as good as it gets.

[Note: I wanted to post a picture, but the forum keeps telling me that my gif and jpg files are invalid image files. (Anyone else have this issue?) I even tried uploading from an image hosting site. The link to it is here: http://postimg.org/image/3k2ife24d/. It’s the image with the black background and rows of plants. I can’t do much about the other images; they’re ads.]

Thanks!

Edit: I measure frame rate by taking a timestamp at the beginning of each loop and comparing it to the timestamp of the previous loop, then invert the result. I’ve heard that how you measure frame rate can be controversial.

Alfonse_Reinheart · January 29, 2016, 4:02pm

I’ve heard that how you measure frame rate can be controversial.

No, what you should have heard is that measuring framerate at all is pointless. You had the right measurement: the frame time. Then you inverted it, losing useful information.

Take your 220-to-170 fps change. That seems like a lot. But 220 fps is really about 4.5 ms per frame, while 170 fps is about 5.9 ms. That’s a change of only about 1.3 ms. Percentage-wise, it seems like a lot. But in absolute numbers, it’s very little.

Note that 1.3/4.5 is about 0.29, which is close to the 33% increase in your workload (going from 30 to 40 objects). So it’s what you’d expect.

I am using glDrawElementsInstanced to draw 720 objects, each having 12 polys

Instancing usually isn’t worthwhile for such small models. Not to mention small numbers of objects.

I’m using full-lighting with diffuse, ambient, specular and normal mapping.

What exactly do you mean by these terms? Are you doing these computations per-vertex or per-fragment?

paradoxresolved · January 29, 2016, 4:52pm

So, it appears to be behaving normally? Isn’t it generally a good idea to use as few CPU draw calls as possible? This raises the question: does the decision to use instancing depend on the number of draw calls for a single object type, or on all calls for all objects? For example, if I have 1000 different types of objects in the scene, each having one or more instances, isn’t it better to instance every call?

The issue with frame rate is frustrating. I agree that frame time is more informative, but the general populace seems obsessed with frame rate, usually with 30 or more as the benchmark. Even if I work with frame time as a measure of efficiency, in the end I feel obliged to report the efficiency as frame rate. In either case, I’m simply trying to avoid unnecessary steps in order to make my program more efficient (and, more importantly, to learn as I go). It seems bizarre that such a simple scene should be so hard for the graphics card to render. This computer can handle games that are far more graphically demanding than the simple scene I made.

As for the lighting, the work is split between the vertex and fragment shaders, as shown here:

Vertex Shader


#version 330 core
layout (location = 0) in vec3 position;
layout (location = 1) in vec3 normal;
layout (location = 2) in vec2 tex_coords;
layout (location = 3) in vec3 tangent;
layout (location = 4) in vec3 bitangent;
layout (location = 5) in mat4 instanceMatrix;




#define MAX_NUMBER_OF_SHADER_LIGHTS 6
uniform vec4 LightPosition[MAX_NUMBER_OF_SHADER_LIGHTS];
uniform int LightEnabled[MAX_NUMBER_OF_SHADER_LIGHTS];


uniform mat4 mProjection;
uniform mat4 mView;


out vec2 UV;


out vec3 position_cs;
out vec3 eyedirection_cs;
out vec3 eyedirection_ts_normal;
out vec3 lightdirection_cs[MAX_NUMBER_OF_SHADER_LIGHTS];
//out vec3 lightdirection_ts[MAX_NUMBER_OF_SHADER_LIGHTS];
out vec3 lightdirection_ts_normal[MAX_NUMBER_OF_SHADER_LIGHTS];


void main(void)
{
    mat4 mModelView = mView * instanceMatrix;
    ////////////////////////////////////////////////////////////////////
    position_cs = (mModelView * vec4(position,1.0)).xyz;


    /*export*/ eyedirection_cs = -position_cs;


    // model to camera = ModelView
    vec3 normal_cs = mat3(mModelView) * normal;
    vec3 tangent_cs = mat3(mModelView) * tangent;
    vec3 bitangent_cs = mat3(mModelView) * bitangent;


    mat3 TBN = transpose(mat3(tangent_cs,bitangent_cs,normal_cs));


    /*export*/ vec3 eyedirection_ts =  TBN * eyedirection_cs;
    /*export*/ eyedirection_ts_normal = normalize(eyedirection_ts);


    ////////////////////////////////////////////////////////////////////


    for( int i = 0 ; i < MAX_NUMBER_OF_SHADER_LIGHTS ; i++ )
    {
        if(LightEnabled[i] > 0)
        {
            /////////////////////////////////////////////////////////////////////////////////////
            vec3 lightposition_cs = LightPosition[i].xyz;
            /*export*/ lightdirection_cs[i] = normalize(lightposition_cs + eyedirection_cs);
    
            /*export*/ vec3 lightdirection_ts = TBN * lightdirection_cs[i];
            /*export*/ lightdirection_ts_normal[i] = normalize(lightdirection_ts);


            /////////////////////////////////////////////////////////////////////////////////////
        }
    }


    UV = tex_coords;
    gl_Position = mProjection * vec4(position_cs,1);
}

Fragment Shader


#version 330 core
#define MAX_NUMBER_OF_SHADER_LIGHTS 6


uniform vec4 LightPosition[MAX_NUMBER_OF_SHADER_LIGHTS];
uniform vec4 LightColor[MAX_NUMBER_OF_SHADER_LIGHTS];
uniform int LightEnabled[MAX_NUMBER_OF_SHADER_LIGHTS];
uniform float Ambient_Intensity;


uniform sampler2D tex;        // texture bound in slot 0
uniform sampler2D spec;        // specular bound in slot 1
uniform sampler2D norm;        // normal bound in slot 2


in vec2 UV;


in vec3 position_cs;
in vec3 eyedirection_cs;
in vec3 eyedirection_ts_normal;
in vec3 lightdirection_cs[MAX_NUMBER_OF_SHADER_LIGHTS];
in vec3 lightdirection_ts_normal[MAX_NUMBER_OF_SHADER_LIGHTS];


out vec4 color;


void main (void)
{    
    vec4 MaterialDiffuseColor = texture( tex, UV.xy );
    if(MaterialDiffuseColor.a == 0)
        discard;


//    color = vec4(MaterialDiffuseColor.xyz,1.0f);
//    return;

    color = vec4(0.0f,0.0f,0.0f,1.0f);


    vec4 MaterialAmbientColor = vec4(MaterialDiffuseColor.xyz * Ambient_Intensity,1.0f);
    vec4 MaterialSpecularColor = texture( spec, UV.xy );


    vec3 TextureNormal_ts = normalize(texture( norm, UV ).rgb*2.0 - 1.0);


    for( int i = 0 ; i < 1 ; i++ )
    {
        if(LightEnabled[i] > 0)
        {
            float distance = length( vec3(LightPosition[i]) - position_cs );


            // Cosine of the angle between the normal and the light direction,
            float cosTheta;
            if(gl_FrontFacing)
                cosTheta = clamp( dot( TextureNormal_ts,lightdirection_ts_normal[i] ), 0,1 );
            else
                cosTheta = clamp( dot( -TextureNormal_ts,lightdirection_ts_normal[i] ), 0,1 );


            // Direction in which the triangle reflects the light
            vec3 R = reflect(-lightdirection_ts_normal[i],TextureNormal_ts);
            float cosAlpha = clamp( dot( eyedirection_ts_normal,R ), 0,1 );


            color += MaterialAmbientColor + (LightColor[i] / (distance * distance)) * (MaterialDiffuseColor * cosTheta + MaterialSpecularColor * pow(cosAlpha,16));
        }
    }
}

Edit: BTW, I’m using instanced matrices. In your experience, does switching to using position vectors and quaternions make enough difference in performance to warrant the switch?
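
To put a number on that question, here is a rough sketch comparing the per-instance data size of the two layouts. `InstancePosQuat`, its `uniformScale` field and `bytesForInstances` are hypothetical and only for illustration; the sketch assumes 4-byte floats, and it says nothing about whether the difference is measurable at 720 instances.

```cpp
#include <cstddef>

// Per-instance data as used now: one 4x4 float matrix, which the vertex
// shader receives as `instanceMatrix` (a mat4 occupies four attribute slots).
struct InstanceMatrix {
    float m[16];
};

// Hypothetical alternative: translation plus a unit quaternion for rotation.
// A uniform scale is stored in the fourth float next to the position.
struct InstancePosQuat {
    float position[3];
    float uniformScale;
    float rotation[4];  // quaternion as x, y, z, w
};

// Total bytes of per-instance data for `count` instances.
constexpr std::size_t bytesForInstances(std::size_t perInstance,
                                        std::size_t count) {
    return perInstance * count;
}

// Assumes sizeof(float) == 4, which holds on common desktop platforms.
static_assert(sizeof(InstanceMatrix) == 64, "mat4 of floats is 64 bytes");
static_assert(sizeof(InstancePosQuat) == 32, "position + scale + quat is 32 bytes");
```

For 720 instances that is `bytesForInstances(sizeof(InstanceMatrix), 720)` = 46080 bytes versus 23040 bytes for the position/quaternion layout. The vertex shader would then have to rotate each vertex by the quaternion itself, so the saving in attribute data is traded for extra arithmetic per vertex.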