shader optimization

I’ve been using GLSL for a few months now and I know a few things as far as optimizations go: MADD instructions are fast, sampling is slow, and division should be avoided. I’d love to get some input on the subject. If anyone could give me any pointers or refer me somewhere else, that would be brilliant.

I made this geometry instancing vertex shader that works great (so much better than going through a loop). Is there anything I should change or optimize? I’m guessing the texel fetches are the slowest part.

#version 150 core

in vec3 vertex;
in vec3 normal;
in vec2 texCoord;
in vec3 tangent; 
in vec3 binormal;

uniform samplerBuffer tboSampler;  // contains model matrix for each instance
uniform samplerBuffer colorTboSampler; //contains separate colors for each instance
uniform mat4 viewMatrix;
uniform mat4 projectionMatrix;
uniform float far; //  1/farclipplane

out vec4 eyeSpaceVert;
out vec3 eyeSpaceTangent;
out vec3 eyeSpaceBinormal;
out vec3 eyeSpaceNormal;
out vec2 tuv; //texcoord
out vec4 instanceColor; //separate color per model
out float depth; //linear depth

// Fetch the four columns of this instance's model matrix from the TBO
mat4 getModelMatrix()
{
 mat4 tbo;
 for (int i = 0; i < 4; i++)
  tbo[i] = texelFetch(tboSampler, gl_InstanceID * 4 + i);
 return tbo;
}

// Inverse-transpose, needed because the model matrix has non-uniform scale
mat3 getNormalMatrix(mat4 mvMatrix)
{
 return mat3(transpose(inverse(mvMatrix)));
}

void main(void)
{
  tuv = texCoord;
  mat4 mvMatrix = viewMatrix * getModelMatrix();
  mat3 normalMatrix = getNormalMatrix(mvMatrix);
  eyeSpaceVert = mvMatrix * vec4(vertex,1.0);
  eyeSpaceTangent  = normalMatrix * tangent;
  eyeSpaceBinormal = normalMatrix * binormal;
  eyeSpaceNormal   = normalMatrix * normal;

  instanceColor = texelFetch(colorTboSampler,gl_InstanceID);
  depth = -eyeSpaceVert.z * far;
  gl_Position = projectionMatrix * eyeSpaceVert;
}

If your modelview matrix doesn’t have any scaling, I bet you can hand code a function that will outperform the library inverse function.

Actually, since you only need the 3x3 rotation matrix for the normals, I am pretty sure you can just use the modelview matrix to transform normals if there is no scaling.
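To make the two suggestions above concrete, here is a minimal sketch of both variants. The `mvMatrix` uniform and the helper names `normalMatrixRigid` and `normalMatrixGeneral` are hypothetical and only for illustration. It assumes the modelview matrix is affine (bottom row 0, 0, 0, 1), in which case inverting just the upper-left 3x3 gives the same normal matrix as inverting the full 4x4.

```glsl
#version 150 core

in vec3 vertex;
in vec3 normal;

uniform mat4 mvMatrix;          // hypothetical: modelview supplied directly
uniform mat4 projectionMatrix;

out vec3 eyeSpaceNormal;

// No scaling: the upper-left 3x3 is a pure rotation, so its
// inverse-transpose is the matrix itself and no inverse is needed.
mat3 normalMatrixRigid(mat4 mv)
{
    return mat3(mv);
}

// Non-uniform scaling: an inverse is still required, but only of the
// 3x3 part, which is cheaper than inverting the whole 4x4 matrix.
mat3 normalMatrixGeneral(mat4 mv)
{
    return transpose(inverse(mat3(mv)));
}

void main(void)
{
    // Swap in normalMatrixGeneral() when the model matrix contains scale.
    eyeSpaceNormal = normalMatrixRigid(mvMatrix) * normal;
    gl_Position = projectionMatrix * (mvMatrix * vec4(vertex, 1.0));
}
```

With uniform scaling only, `mat3(mv)` should also work as long as the result is normalized afterwards.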


  1. Consider using Instanced Arrays. This will remove your sampling instructions.

  2. Prepare normal matrices ahead. You can use Transform Feedback to generate instanced array of them.

  3. (minor) Your 3 eye-space vectors can be obtained by a single 3x3 matrix multiplication instead of 3 matrix-by-vector multiplications.

  4. (revolutionary) Consider using a single quaternion (vec4) to represent the tangent space instead of normal, tangent and bitangent (3 vec3). This way you’ll need one normalization in the fragment shader instead of 3, and you’ll reduce the memory load and the number of attribute slots used.
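Point 3 can be sketched roughly as follows. The `mvMatrix` and `normalMatrix` uniforms and the `eyeSpaceTBN` output are hypothetical names, and the normal matrix is assumed to be computed elsewhere. Arithmetically a mat3 * mat3 product is the same work as three mat3 * vec3 products, so treat this as a tidier way to write it rather than a guaranteed speedup.

```glsl
#version 150 core

in vec3 vertex;
in vec3 normal;
in vec3 tangent;
in vec3 binormal;

uniform mat4 mvMatrix;          // hypothetical: modelview matrix
uniform mat3 normalMatrix;      // hypothetical: precomputed normal matrix
uniform mat4 projectionMatrix;

out mat3 eyeSpaceTBN;           // columns: tangent, binormal, normal

void main(void)
{
    // Pack the three basis vectors as matrix columns and transform them
    // with one matrix-matrix product.
    eyeSpaceTBN = normalMatrix * mat3(tangent, binormal, normal);
    gl_Position = projectionMatrix * (mvMatrix * vec4(vertex, 1.0));
}
```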

Thanks for the great feedback, guys. I’m using non-uniform scaling, so I can’t just use the modelview matrix for my normals.

I tried instanced arrays and was a bit surprised to find that using glVertexAttribDivisorARB and then just reading the values as attributes was actually somewhat slower. I had to split my model matrix into 4 separate buffers to be able to read it as attribs. I did some quick tests using fps (I know I should use timers, but as I said it was a quick test) and got the following:

batch of 125000 low poly models:
36fps using texture buffers
27fps using Instanced Arrays

batch of 25000 higher poly models:
13fps using texture buffers
10fps using Instanced Arrays
I don’t really think I need to run any further tests. It looks like in this instance (no pun intended) texture buffers are faster.
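For comparison, the shader side of the instanced-array variant might look like the sketch below. The attribute names `instanceModelMatrix` and `instanceColorAttr` are hypothetical. A `mat4` attribute occupies four consecutive attribute locations, one per column. As far as I know, all four columns can be sourced from a single buffer (stride of 64 bytes, column offsets 0, 16, 32 and 48), with the divisor set to 1 on each of the four locations, so splitting the matrix into four buffers should not be required.

```glsl
#version 150 core

in vec3 vertex;
in mat4 instanceModelMatrix;    // hypothetical: per-instance, uses 4 attribute locations
in vec4 instanceColorAttr;      // hypothetical: per-instance color

uniform mat4 viewMatrix;
uniform mat4 projectionMatrix;

out vec4 instanceColor;

void main(void)
{
    instanceColor = instanceColorAttr;
    // Matrix-vector products only; no texel fetches in this variant.
    vec4 eyeSpaceVert = viewMatrix * (instanceModelMatrix * vec4(vertex, 1.0));
    gl_Position = projectionMatrix * eyeSpaceVert;
}
```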

Sorry, I’m a little lost here. I’ve never used Transform Feedback.

As for the single 3x3 matrix multiplication, I had already done this but left it the other way in my example because I wasn’t really sure if it made a difference. Now I know, thanks again.

Wow, the quaternion idea makes a lot of sense. So I’m guessing you can pretty much replace any 3x3 orthogonal matrix with a single quaternion. I know this works with bones too, but could you use the same idea for other matrices like your view matrix? I know what a quaternion is and I know how to use it, but honestly I don’t understand the underlying math. I can easily convert my TBN matrix into a quaternion and send it into my shader, but what would be the most efficient way to transform my normal map using it? Should I convert the quaternion into a matrix and then do a mat3 * vec3? I’m guessing there’s a quicker way, but I’m a graphic designer so I never made it to algebraic theory :stuck_out_tongue:

Given the following code:

void applyNormalMap(in mat3 tbnMatrix,out vec3 normal)
{
 vec3 nmap=(texture2D(normalSampler,texCoord)).xyz;
 nmap = nmap * 2.0 - 1.0;  // unpack [0,1] to [-1,1]
 normal = normalize(tbnMatrix * nmap);
}

How would I incorporate a quaternion? I came up with the following using borrowed code:

vec3 qtransform( vec4 q, vec3 v )
{
 return normalize(v + 2.0*cross(cross(v, q.xyz) + q.w*v, q.xyz));
}

void applyNormalMap(in vec4 tbnQuat,out vec3 normal)
{
 vec3 nmap=(texture2D(normalSampler,texCoord)).xyz;
 nmap = nmap * 2.0 - 1.0;  // unpack [0,1] to [-1,1]
 normal = qtransform(tbnQuat,nmap);
}

Finally, can I normalize a quaternion just like I would a vector?

  1. Your tests clearly show that Instanced Arrays are slow. However, I expect them to be faster eventually with some new driver release.

  2. Since you are not going to use vertex attributes for your matrix array, you can use regular texture processing for that purpose.

  3. I’m not sure about it, actually. It can even be done by GLSL compiler. But it’s easy to do on your side, anyway.

  4. 3x3 orthonormal matrix = rotation transform = quaternion + handedness bit.
    I’m using quaternions as a complete replacement for matrices:

Yes, you can send a quaternion alone to the shader (instead of a 3x3 matrix) if the matrix handedness is known in advance. For TBN matrices generated from the texture coordinates, the handedness may vary. I’m storing the vertex basis handedness in the ‘position.w’ component.
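One way the quaternion plus handedness idea could look in a fragment shader is sketched below. This is not taken from the KRI code. The inputs `tbnQuat` and `handedness`, the helper `rotateByQuat`, and the choice to flip the bitangent (y) axis are all assumptions for illustration. The idea is that a left-handed TBN matrix M can be written as a rotation R times diag(1, -1, 1), so flipping one component of the tangent-space vector before rotating by R gives the same result as multiplying by M. The helper uses the common q v q* form; swapping the cross-product argument order rotates by the inverse instead.

```glsl
#version 150 core

uniform sampler2D normalSampler;

in vec2 tuv;
in vec4 tbnQuat;        // hypothetical: TBN rotation passed from the vertex shader
in float handedness;    // hypothetical: +1.0 or -1.0, e.g. taken from position.w

out vec4 fragColor;

// Rotate v by the unit quaternion q (xyz = vector part, w = scalar part).
vec3 rotateByQuat(vec4 q, vec3 v)
{
    return v + 2.0 * cross(q.xyz, cross(q.xyz, v) + q.w * v);
}

void main(void)
{
    vec3 nmap = texture(normalSampler, tuv).xyz * 2.0 - 1.0;
    // Assumption: the bitangent axis was flipped when building the quaternion
    // for left-handed bases, so undo that flip here.
    nmap.y *= handedness;
    // Interpolation denormalizes the quaternion, so renormalize once.
    vec3 n = normalize(rotateByQuat(normalize(tbnQuat), nmap));
    fragColor = vec4(n * 0.5 + 0.5, 1.0);   // visualize the normal
}
```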

Your code is correct, assuming tbnQuat represents TBN -> desired space transform.

For further reference you can use the code of the KRI project (see link), especially the exporter part, where the TBN is converted to a quaternion + handedness.

And yes, luckily, quaternions are normalized like regular 4-component vectors. It’s one of the few quaternion operations that are hardware supported in GLSL at this moment…
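On the earlier question of converting the quaternion to a matrix: that is usually only worth it when many vectors go through the same rotation. A minimal sketch of the standard conversion follows, assuming the q v q* rotation convention with xyz as the vector part and w as the scalar part. The names `tbnQuat`, `tbn` and `quatToMat3` are hypothetical.

```glsl
#version 150 core

in vec3 vertex;
in vec4 tbnQuat;                // hypothetical: per-vertex TBN quaternion

uniform mat4 mvMatrix;          // hypothetical: modelview matrix
uniform mat4 projectionMatrix;

out mat3 tbn;

// Build the rotation matrix equivalent to quaternion q.
// GLSL matrix constructors are column-major, so each line below is a column.
mat3 quatToMat3(vec4 q)
{
    q = normalize(q);   // quaternions normalize like any vec4
    float x = q.x, y = q.y, z = q.z, w = q.w;
    return mat3(
        1.0 - 2.0*(y*y + z*z), 2.0*(x*y + w*z),       2.0*(x*z - w*y),
        2.0*(x*y - w*z),       1.0 - 2.0*(x*x + z*z), 2.0*(y*z + w*x),
        2.0*(x*z + w*y),       2.0*(y*z - w*x),       1.0 - 2.0*(x*x + y*y));
}

void main(void)
{
    tbn = quatToMat3(tbnQuat);
    gl_Position = projectionMatrix * (mvMatrix * vec4(vertex, 1.0));
}
```

For a single vector, rotating directly with the cross-product form is typically cheaper than building the matrix first.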