I asked the same question on Stack Exchange but got no answers there. (I can't include links?)

I'm trying to implement animation in my engine.

I’m at the first stage: rendering the default pose of skinned meshes.

It works as expected but is very slow.

With the calculation below, the shader takes **6ms** to run.

```
struct MeshUniform {
    mat4 transform;
    mat4 normalMatrix;
    vec4 baseColorFactor;
    vec4 roughnessMetallicNormal;
    vec4 hasColorMetallicNormalTexture;
    mat4[30] jointMatrices;
};

layout (std430, binding = 4) buffer meshUniformSSBO { MeshUniform[] meshUniforms; };

layout (location = 0) in vec3 position;
layout (location = 1) in vec3 normal;
layout (location = 2) in vec2 uv;
layout (location = 3) in vec4 tangent;
layout (location = 4) in uvec4 joints;
layout (location = 5) in vec4 weights;
layout (location = 6) in int drawId;

void main() {
    MeshUniform meshUniform = meshUniforms[drawId];
    mat4 model = meshUniform.transform;
    mat4 skinMat =
        meshUniform.jointMatrices[joints[0]] * weights[0] +
        meshUniform.jointMatrices[joints[1]] * weights[1] +
        meshUniform.jointMatrices[joints[2]] * weights[2] +
        meshUniform.jointMatrices[joints[3]] * weights[3];
    vec4 positionVec4 = skinMat * model * vec4(position, 1.0);
    ...
}
```

If I remove the skinMat calculation and multiplication, the same shader takes less than **1ms**.

```
// mat4 skinMat =
// meshUniform.jointMatrices[joints[0]] * weights[0] +
// meshUniform.jointMatrices[joints[1]] * weights[1] +
// meshUniform.jointMatrices[joints[2]] * weights[2] +
// meshUniform.jointMatrices[joints[3]] * weights[3];
// vec4 positionVec4 = skinMat * model * vec4(position, 1.0);
vec4 positionVec4 = model * vec4(position, 1.0);
```

- The scene has 87706 vertices, according to Blender's statistics.
- I’m using glMultiDrawElementsIndirect with a single VAO.
- Joint matrices for non-skinned meshes are identity matrices.
- MeshUniform lives in a persistently mapped, coherent SSBO and is only updated when needed.
- I’m using the same calculation for shadow maps, so it takes another 6ms.
- The GPU is a 1080 Ti.

I tried adding a “jointCount” field to the MeshUniform struct and doing the joint calculation only if jointCount > 0, but it still took 6ms.

Is this to be expected, and what can I do to improve it?
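A minimal sketch of that jointCount variant, for anyone who wants to reproduce it. The position of `jointCount` at the end of the struct, the `#version` line, and the `viewProjection` uniform are assumptions added so the shader is complete; they are not taken from the real shader, which has more inputs and outputs.

```glsl
#version 430 core

struct MeshUniform {
    mat4 transform;
    mat4 normalMatrix;
    vec4 baseColorFactor;
    vec4 roughnessMetallicNormal;
    vec4 hasColorMetallicNormalTexture;
    mat4[30] jointMatrices;
    int jointCount; // assumed field; the CPU-side struct must pad to a multiple of 16 bytes under std430
};

layout (std430, binding = 4) buffer meshUniformSSBO { MeshUniform[] meshUniforms; };

layout (location = 0) in vec3 position;
layout (location = 4) in uvec4 joints;
layout (location = 5) in vec4 weights;
layout (location = 6) in int drawId;

uniform mat4 viewProjection; // hypothetical name, stands in for the elided part of main()

void main() {
    MeshUniform meshUniform = meshUniforms[drawId];
    vec4 positionVec4 = meshUniform.transform * vec4(position, 1.0);

    // Skip the blend entirely for meshes without a skeleton.
    if (meshUniform.jointCount > 0) {
        mat4 skinMat =
            meshUniform.jointMatrices[joints[0]] * weights[0] +
            meshUniform.jointMatrices[joints[1]] * weights[1] +
            meshUniform.jointMatrices[joints[2]] * weights[2] +
            meshUniform.jointMatrices[joints[3]] * weights[3];
        positionVec4 = skinMat * positionVec4;
    }

    gl_Position = viewProjection * positionVec4;
}
```

Since jointCount is the same for every vertex in a draw, the unchanged 6ms timing hints that the matrix arithmetic itself may not be the bottleneck.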

## With DMGregory’s suggestions on Stack Exchange

I tried:

- Multiplying the joint matrices by the position vector, then summing the results.
- Pre-multiplying the model matrix with the joint matrices on the CPU.

It looks like this now:

```
vec4 positionVec4 = vec4(position, 1.0);
vec4 sum =
    meshUniform.jointMatrices[joints[0]] * weights[0] * positionVec4 +
    meshUniform.jointMatrices[joints[1]] * weights[1] * positionVec4 +
    meshUniform.jointMatrices[joints[2]] * weights[2] * positionVec4 +
    meshUniform.jointMatrices[joints[3]] * weights[3] * positionVec4;
positionVec4 = sum;
```

**It’s still taking 5-6ms to run.**

Someone on the LWJGL forums posted a question similar to mine in 2012. (Again, I can't include links.)

In his last message he said:

> using a constant as the array index while accessing boneMatrixes brings performance up

Sure enough, if I remove the *joints* array lookup from the code above, like this:

```
vec4 positionVec4 = vec4(position, 1.0);
vec4 sum =
    meshUniform.jointMatrices[0] * weights[0] * positionVec4 +
    meshUniform.jointMatrices[1] * weights[1] * positionVec4 +
    meshUniform.jointMatrices[2] * weights[2] * positionVec4 +
    meshUniform.jointMatrices[3] * weights[3] * positionVec4;
positionVec4 = sum;
```

**It renders in 1ms.** But of course the resulting image is not correct.

Maybe this will give some ideas to people more experienced with OpenGL.
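One experiment that follows from the constant-index observation is sketched below: read the joint matrices straight from the SSBO instead of first copying the whole MeshUniform (30 mat4 plus the other fields, about 2 KB) into a local variable. It assumes that the local copy combined with dynamic indexing is what pushes the compiler onto a slow path, which is a guess and has not been measured. The `#version` line and the `viewProjection` uniform are placeholders that are not in the original shader, and the model matrix is assumed to be pre-multiplied into the joint matrices on the CPU as described above.

```glsl
#version 430 core

struct MeshUniform {
    mat4 transform;
    mat4 normalMatrix;
    vec4 baseColorFactor;
    vec4 roughnessMetallicNormal;
    vec4 hasColorMetallicNormalTexture;
    mat4[30] jointMatrices;
};

layout (std430, binding = 4) buffer meshUniformSSBO { MeshUniform[] meshUniforms; };

layout (location = 0) in vec3 position;
layout (location = 4) in uvec4 joints;
layout (location = 5) in vec4 weights;
layout (location = 6) in int drawId;

uniform mat4 viewProjection; // hypothetical name, not from the original shader

void main() {
    vec4 positionVec4 = vec4(position, 1.0);

    // No local MeshUniform copy: each lookup addresses the buffer directly,
    // so only the four matrices this vertex needs are fetched.
    // Each matrix transforms the position first, then the vec4 result is weighted.
    vec4 sum =
        weights[0] * (meshUniforms[drawId].jointMatrices[joints[0]] * positionVec4) +
        weights[1] * (meshUniforms[drawId].jointMatrices[joints[1]] * positionVec4) +
        weights[2] * (meshUniforms[drawId].jointMatrices[joints[2]] * positionVec4) +
        weights[3] * (meshUniforms[drawId].jointMatrices[joints[3]] * positionVec4);

    gl_Position = viewProjection * sum;
}
```

If this brings the time close to the 1ms constant-index case, the struct copy would be the likely culprit; if not, the dynamic index itself is worth looking at next.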