# New challenge: inverse transpose matrix under GLSL 120

Hi Guys!,

I’m having trouble working out how to obtain transpose(inverse(mat3 mat)) under GLSL 120.
I need it because I’m using GPU skinning to render organic meshes.

So the most important code is:

``````
//Here we do all the magic to get a mat4 that puts the vertex in the correct place (local position)
mat4 matTransform = obtainMatTransformCurrentAnimation( ... );

vec4 vLocalPos = matTransform * gl_Vertex;

//Now, we multiply it with the gl_ModelViewProjectionMatrix to get the final position
gl_Position = gl_ModelViewProjectionMatrix * vLocalPos;

//So here is the problem, to calculate the normals you need the transpose(inverse(matTransform))
mat3 normalMat = transpose( ddeInverse( mat3( matTransform ) ) );
normalMat = gl_NormalMatrix * normalMat;

//Now we made the normal Multiplication to the normal varying variable
pNormal = normalMat * gl_Normal;

//Here are the ddeInverse and ddeDeterminant Methods.
float ddeDeterminant( mat3 A )
{
//Calculate the determinant

float determinant = +A[0][0] * ( A[1][1] * A[2][2] - A[2][1] * A[1][2] )
-A[0][1] * ( A[1][0] * A[2][2] - A[1][2] * A[2][0] )
+A[0][2] * ( A[1][0] * A[2][1] - A[1][1] * A[2][0] );

return determinant;
}

mat3 ddeInverse(const in mat3 Matrix)
{
// Calculate the determinant...
float D = ddeDeterminant(Matrix);

// Singular matrix, problem...
if(D == 0.0)
return mat3(0.0);

// Calculate the transpose...
mat3 MatrixT = transpose(Matrix);

// Calculate needed determinants...
// Each 2x2 minor is a difference of products, not a sum
float D00 = MatrixT[1][1] * MatrixT[2][2] - MatrixT[2][1] * MatrixT[1][2];
float D10 = MatrixT[0][1] * MatrixT[2][2] - MatrixT[2][1] * MatrixT[0][2];
float D20 = MatrixT[0][1] * MatrixT[1][2] - MatrixT[1][1] * MatrixT[0][2];
float D01 = MatrixT[1][0] * MatrixT[2][2] - MatrixT[2][0] * MatrixT[1][2];
float D11 = MatrixT[0][0] * MatrixT[2][2] - MatrixT[2][0] * MatrixT[0][2];
float D21 = MatrixT[0][0] * MatrixT[1][2] - MatrixT[1][0] * MatrixT[0][2];
float D02 = MatrixT[1][0] * MatrixT[2][1] - MatrixT[2][0] * MatrixT[1][1];
float D12 = MatrixT[0][0] * MatrixT[2][1] - MatrixT[2][0] * MatrixT[0][1];
float D22 = MatrixT[0][0] * MatrixT[1][1] - MatrixT[1][0] * MatrixT[0][1];

// Assemble matrix of cofactors...
mat3 MatrixAdjugate;
MatrixAdjugate[0] = vec3( D00, -D01,  D02);
MatrixAdjugate[1] = vec3(-D10,  D11, -D12);
MatrixAdjugate[2] = vec3( D20, -D21,  D22);

// Calculate the inverse...
return (1.0 / D) * MatrixAdjugate;
}
``````

I’m limited to GLSL 120 because the shader is running on a 7800GTX… so any advice?

Yep. Taking a step back from the problem, first question whether you need the inverse transpose at all. Suggestion:

Only allow rigid transformations for your joint transforms (both joint orientation and animation transforms, and thus joint final transforms as well), and then inverse = transpose and there’s no need to mess with this inverse transpose stuff. Just use the rotation component of your skinning transform to skin your normals.

Note: rigid transform = rotates and translates only (no scales or shears)
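To make the "inverse = transpose" point concrete, here is a minimal CPU-side sketch in C++ (not shader code). The `Mat3` type and all helper names are hypothetical, introduced only for illustration. It checks that a pure rotation times its own transpose gives the identity, which is why transpose(inverse(R)) collapses back to R for rigid transforms.

```cpp
#include <cmath>

// Hypothetical minimal 3x3 matrix type for illustration; m[row][col].
struct Mat3 { float m[3][3]; };

Mat3 transpose(const Mat3& a) {
    Mat3 t{};
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            t.m[c][r] = a.m[r][c];
    return t;
}

Mat3 multiply(const Mat3& a, const Mat3& b) {
    Mat3 p{};  // zero-initialized accumulator
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            for (int k = 0; k < 3; ++k)
                p.m[r][c] += a.m[r][k] * b.m[k][c];
    return p;
}

// Rotation by `radians` about the Z axis (the 3x3 part of a rigid transform).
Mat3 rotationZ(float radians) {
    const float c = std::cos(radians), s = std::sin(radians);
    return Mat3{{{c, -s, 0.0f}, {s, c, 0.0f}, {0.0f, 0.0f, 1.0f}}};
}

// True when a * transpose(a) is the identity within eps,
// i.e. when transpose(a) really is the inverse of a.
bool transposeIsInverse(const Mat3& a, float eps = 1e-5f) {
    const Mat3 p = multiply(a, transpose(a));
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c) {
            const float expected = (r == c) ? 1.0f : 0.0f;
            if (std::fabs(p.m[r][c] - expected) > eps) return false;
        }
    return true;
}
```

Calling `transposeIsInverse(rotationZ(0.7f))` should return true, while a matrix that contains a scale fails the check. That is the case where the shortcut no longer applies as-is.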

Note that even the blended inverse transpose transformed normal is not precisely the correct normal, but it’s typically regarded as good enough.

Also note that even if you allow uniform scale, IIRC inverse transpose is still overkill. You can optimize that.
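One way to read the uniform-scale remark: for M = s * R the inverse transpose is (1/s) * R, so it differs from M only by a scalar, and that scalar disappears once the normal is renormalized. The C++ sketch below (CPU side, with hypothetical `Vec3`/`Mat3` helpers that are assumptions, not part of the original code) compares both paths.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };
struct Mat3 { float m[3][3]; };  // m[row][col]

Vec3 mul(const Mat3& a, const Vec3& v) {
    return { a.m[0][0] * v.x + a.m[0][1] * v.y + a.m[0][2] * v.z,
             a.m[1][0] * v.x + a.m[1][1] * v.y + a.m[1][2] * v.z,
             a.m[2][0] * v.x + a.m[2][1] * v.y + a.m[2][2] * v.z };
}

Vec3 normalize(const Vec3& v) {
    const float len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return { v.x / len, v.y / len, v.z / len };
}

// General inverse transpose via cofactors: transpose(inverse(A)) = cofactor(A) / det(A).
Mat3 inverseTranspose(const Mat3& a) {
    const float (*m)[3] = a.m;
    Mat3 c;
    c.m[0][0] =   m[1][1] * m[2][2] - m[1][2] * m[2][1];
    c.m[0][1] = -(m[1][0] * m[2][2] - m[1][2] * m[2][0]);
    c.m[0][2] =   m[1][0] * m[2][1] - m[1][1] * m[2][0];
    c.m[1][0] = -(m[0][1] * m[2][2] - m[0][2] * m[2][1]);
    c.m[1][1] =   m[0][0] * m[2][2] - m[0][2] * m[2][0];
    c.m[1][2] = -(m[0][0] * m[2][1] - m[0][1] * m[2][0]);
    c.m[2][0] =   m[0][1] * m[1][2] - m[0][2] * m[1][1];
    c.m[2][1] = -(m[0][0] * m[1][2] - m[0][2] * m[1][0]);
    c.m[2][2] =   m[0][0] * m[1][1] - m[0][1] * m[1][0];
    // Expand the determinant along the first row using the cofactors.
    const float det = m[0][0] * c.m[0][0] + m[0][1] * c.m[0][1] + m[0][2] * c.m[0][2];
    assert(det != 0.0f);  // singular matrices have no inverse
    for (int r = 0; r < 3; ++r)
        for (int k = 0; k < 3; ++k)
            c.m[r][k] /= det;
    return c;
}

// Uniform scale times a rotation about Z.
Mat3 scaledRotationZ(float scale, float radians) {
    const float c = scale * std::cos(radians), s = scale * std::sin(radians);
    return Mat3{{{c, -s, 0.0f}, {s, c, 0.0f}, {0.0f, 0.0f, scale}}};
}

bool sameDirection(const Vec3& a, const Vec3& b, float eps = 1e-4f) {
    return std::fabs(a.x - b.x) < eps && std::fabs(a.y - b.y) < eps &&
           std::fabs(a.z - b.z) < eps;
}
```

With `M = scaledRotationZ(3.0f, 0.5f)`, `normalize(mul(inverseTranspose(M), n))` and `normalize(mul(M, n))` should agree for any unit normal `n`. With a non-uniform scale the two directions diverge, and that is the case where the full inverse transpose is actually needed.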

If you use linear blend skinning, keep in mind that blending matrices is really garbage (thus joint collapse, rigid transforms becoming deorthonormalized, etc.). Consider quaternion-based methods.
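A small C++ sketch of the "deorthonormalized" problem, using hypothetical helper names chosen just for this illustration: blending two perfectly rigid rotations with equal weights gives a matrix whose determinant and column lengths are no longer 1, so the blended transform shrinks the mesh around the joint.

```cpp
#include <cassert>
#include <cmath>

struct Mat3 { float m[3][3]; };  // m[row][col]

// Rotation by `radians` about the Z axis.
Mat3 rotationZ(float radians) {
    const float c = std::cos(radians), s = std::sin(radians);
    return Mat3{{{c, -s, 0.0f}, {s, c, 0.0f}, {0.0f, 0.0f, 1.0f}}};
}

// Linear blend of two matrices, as linear blend skinning does per vertex:
// (1 - t) * a + t * b.
Mat3 blend(const Mat3& a, const Mat3& b, float t) {
    assert(t >= 0.0f && t <= 1.0f);
    Mat3 out{};
    for (int r = 0; r < 3; ++r)
        for (int c = 0; c < 3; ++c)
            out.m[r][c] = (1.0f - t) * a.m[r][c] + t * b.m[r][c];
    return out;
}

float determinant(const Mat3& a) {
    const float (*m)[3] = a.m;
    return m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
         - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
         + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]);
}

// Length of column c; 1 for every column of a rotation matrix.
float columnLength(const Mat3& a, int c) {
    return std::sqrt(a.m[0][c] * a.m[0][c] + a.m[1][c] * a.m[1][c] +
                     a.m[2][c] * a.m[2][c]);
}
```

Blending the identity with a 90 degree rotation at t = 0.5 gives a determinant of about 0.5 and X/Y column lengths of about 0.707. With a 180 degree rotation the determinant reaches 0, which is the classic joint collapse.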

You could even use quaternions instead of normals too.

Right now I do the GPU skinning with a static boneMatrix[61], and skin like this:

``````
//Weight Vector
vec4 weight    = gl_MultiTexCoord1;

//Matrix Bone Index
vec4 matIndex  = gl_MultiTexCoord2;
int  matIndex0 = int(matIndex.x);
int  matIndex1 = int(matIndex.y);
int  matIndex2 = int(matIndex.z);
int  matIndex3 = int(matIndex.w);

//Matrix Transformation
mat4 matTransform  = boneMatrix[matIndex0] * weight.x;
matTransform += boneMatrix[matIndex1] * weight.y;
matTransform += boneMatrix[matIndex2] * weight.z;

//Local Position Computation
float finalWeight   = 1.0f - ( weight.x + weight.y + weight.z );
matTransform += boneMatrix[matIndex3] * finalWeight;
``````

So per mesh… I can use 61 bones max.

Just FYI, the above should work and takes 2 16-byte input attributes (32 bytes total). If you care, you can pass in the same info using only 1/4 of the space (8 bytes) by passing in a uvec2 into a single vtx attrib. In .x store 4 8-bit packed joint/bone indices. In .y store 4 8-bit packed weights (0…255). Just make sure you populate it on the C++ side using glVertexAttribIPointer, not glVertexAttribPointer.

Actually, you said GLSL 1.2, which doesn’t have bit shifts or uint/uvec. Not sure if you can run a higher version on a GeForce 7 or not by forcing it, but if not guess you can’t do the above on that GPU.
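For reference, the CPU-side packing described above could be sketched in C++ roughly as follows. The struct and function names are hypothetical, and the rounding scheme is an assumption: weights are simply rounded to the nearest of 256 levels, so the four quantized weights are not guaranteed to sum to exactly 255.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical 8-byte per-vertex skinning record:
// .indices holds 4 x 8-bit joint indices, .weights holds 4 x 8-bit weights (0..255).
struct PackedSkin {
    std::uint32_t indices;
    std::uint32_t weights;
};

std::uint32_t packBytes(std::uint8_t b0, std::uint8_t b1, std::uint8_t b2, std::uint8_t b3) {
    return  static_cast<std::uint32_t>(b0)
         | (static_cast<std::uint32_t>(b1) << 8)
         | (static_cast<std::uint32_t>(b2) << 16)
         | (static_cast<std::uint32_t>(b3) << 24);
}

// Extracts byte `slot` (0..3); a shader with integer support would do the same shift and mask.
std::uint8_t unpackByte(std::uint32_t packed, int slot) {
    return static_cast<std::uint8_t>((packed >> (8 * slot)) & 0xFFu);
}

// Maps a weight in [0, 1] to the nearest value in 0..255, clamping out-of-range input.
std::uint8_t quantizeWeight(float w) {
    long q = std::lround(w * 255.0f);
    if (q < 0) q = 0;
    if (q > 255) q = 255;
    return static_cast<std::uint8_t>(q);
}

PackedSkin packSkin(const int joint[4], const float weight[4]) {
    std::uint8_t j[4], w[4];
    for (int i = 0; i < 4; ++i) {
        assert(joint[i] >= 0 && joint[i] <= 255);  // must fit in 8 bits
        j[i] = static_cast<std::uint8_t>(joint[i]);
        w[i] = quantizeWeight(weight[i]);
    }
    return { packBytes(j[0], j[1], j[2], j[3]), packBytes(w[0], w[1], w[2], w[3]) };
}
```

An array of `PackedSkin` would then be uploaded as a single integer attribute, which is where the glVertexAttribIPointer requirement mentioned above comes in. With a 61-bone limit, 8 bits per index is more than enough.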

``````
//Matrix Transformation
mat4 matTransform  = boneMatrix[matIndex0] * weight.x ...
matTransform += boneMatrix[matIndex1] * weight.y;
...
``````

Ok, so standard linear blend skinning (LBS) so the usual caveats apply.

Re your inverse transpose question presumably directed toward skinning your vertex normal, if your joint/bone matrices are rigid transforms, just use mat3( matTransform ) to transform the normal.

Hi there!,

Thanks, Dark Photon, for the advice, but the engine must run on Linux / Mac / Windows, and right now it is not a good idea to do some kind of “hacking” to improve the upload of the joints and weights to the GPU. It seems I can do some kind of optimization under Linux, but I cannot make it work under Windows, so… U_U no luck at the moment.

I’m trying to test what you said in the previous post, but neither mat3( matTransform ) nor transpose( mat3( matTransform ) ) seems to work.

``````
mat3 normalMat = gl_NormalMatrix * mat3(matTransform);

``````

The implementation is based on the MD5 one, so no scales or shears are applied, and I really cannot understand why it is not working properly…

Here is the code from CPU side to fill the VBO.

``````
void ddeCL::ddeMeshBuild( ddeMesh *tempMesh )
{
/* Local variables */
Vertex                    *tempVert            = NULL;
Weight                   *tempWeight        = NULL;
Joint                       *tempJoint           = NULL;
ddeFloat                 *currVBOPos        = NULL;
ddeFloat                 *VBOCPU             = NULL;
ddeMath::vec3         tan1   [tempMesh->numVerts];
ddeMath::vec3         tan2   [tempMesh->numVerts];
vector<ddeMath::vec4> WeightBuffer;
vector<ddeMath::vec4> BoneIndexBuffer;

if( tempMesh->isSkinned == false )
{
printf( "Processing Mesh... %s\n", tempMesh->meshName );
//We create a temporary array in order to copy all the meshes from the file to the VBOCPU Array,
//We don't need 64 bytes  of information for each vertex, but this way we will made a perfect pre-fetching
//and no cache miss on GPU, so that's why we send dummy information
VBOCPU  = ( ddeFloat* )malloc( ( tempMesh->numVerts * 16 )  * sizeof( float ) );

for ( ddeInt j = 0, vboCpuIterator = 0; j < tempMesh->numVerts; j++, vboCpuIterator += 16 )
{
//Temp Variables for boneweight and index matrix
ddeMath::vec4 boneIndices = ddeMath::vec4(0);
ddeMath::vec4 boneWeights = ddeMath::vec4(0);

tempVert = &tempMesh->verts[j];

tempVert->pos.x = tempVert->pos.y = tempVert->pos.z = 0.0;
tempVert->n.x   = tempVert->n.y   = tempVert->n.z   = 0.0;

for ( ddeInt k=0; k < tempVert->weightCount; k++ )
{
tempWeight =  &tempMesh->weights[ tempVert->weightIndex + k ];
tempJoint  =  &tempMesh->joints [ tempWeight->joint         ];
ddeMath::quat q;

ddeMath::vec3 result = tempJoint->quat * ddeMath::vec3( tempWeight->pos[0],
tempWeight->pos[1],
tempWeight->pos[2] );

tempVert->pos.x += ( tempJoint->pos[0] +  result.x ) * tempWeight->w;
tempVert->pos.y += ( tempJoint->pos[1] +  result.y ) * tempWeight->w;
tempVert->pos.z += ( tempJoint->pos[2] +  result.z ) * tempWeight->w;
boneIndices[k]   = ( float )tempWeight->joint;
boneWeights[k]   = tempWeight->w;
} // for (weights)

// Copy the 3 new vertices to the temporary VBO
currVBOPos    = &VBOCPU[vboCpuIterator];
currVBOPos[0] = tempVert->pos.x; //Vertex Pos0.
currVBOPos[1] = tempVert->pos.y; //Vertex Pos1.
currVBOPos[2] = tempVert->pos.z; //Vertex Pos2.
currVBOPos[3] = 0.0f;            //Vertex Pos3 (Dummy).

currVBOPos[8] = tempVert->tc.x;  //Texture Coord 0.
currVBOPos[9] = tempVert->tc.y;  //Texture Coord 1.

#ifdef DDEDEBUG
//Check for duplicates Verts in the VBO
ddeCheckCurrVertexVBO( tempVert->pos.x,
tempVert->pos.y,
tempVert->pos.z,
VBOCPU,
j,
tempMesh->meshName );
#endif

WeightBuffer.push_back   (boneWeights);
BoneIndexBuffer.push_back(boneIndices);
} // for (mesh vertices)

// For each normal, add contribution to normal from every face that vertex
// is part of and also made the first pass for the Tangent Creation
printf( "Calculating Normals...\n" );
for( ddeInt j = 0; j < tempMesh->numTris; j++ )
{
//Normal Stuff
//Retrieves Vertex from Index List
Vertex *v0 = &tempMesh->verts[ tempMesh->tris[j].v[0] ];
Vertex *v1 = &tempMesh->verts[ tempMesh->tris[j].v[1] ];
Vertex *v2 = &tempMesh->verts[ tempMesh->tris[j].v[2] ];

float Ax = v1->pos[0] - v0->pos[0];
float Ay = v1->pos[1] - v0->pos[1];
float Az = v1->pos[2] - v0->pos[2];

float Bx = v2->pos[0] - v0->pos[0];
float By = v2->pos[1] - v0->pos[1];
float Bz = v2->pos[2] - v0->pos[2];

float nx =   Ay * Bz - By * Az;
float ny = -(Ax * Bz - Bx * Az);
float nz =   Ax * By - Bx * Ay;

v0->n[0] += nx;
v0->n[1] += ny;
v0->n[2] += nz;

v1->n[0] += nx;
v1->n[1] += ny;
v1->n[2] += nz;

v2->n[0] += nx;
v2->n[1] += ny;
v2->n[2] += nz;
}

printf( "Calculating First Pass Tangent...\n" );
for( ddeInt j = 0; j < tempMesh->numTris; j++ )
{
//Tangent Stuff
//Retrieve Triangle from Index List
ddeLong i1 = tempMesh->tris[j].v[0];
ddeLong i2 = tempMesh->tris[j].v[1];
ddeLong i3 = tempMesh->tris[j].v[2];

//Fill the 3 Verts of the triangle
ddeMath::vec3 vert1 = ddeMath::vec3( tempMesh->verts[i1].pos[0],
tempMesh->verts[i1].pos[1],
tempMesh->verts[i1].pos[2] );

ddeMath::vec3 vert2 = ddeMath::vec3( tempMesh->verts[i2].pos[0],
tempMesh->verts[i2].pos[1],
tempMesh->verts[i2].pos[2] );

ddeMath::vec3 vert3 = ddeMath::vec3( tempMesh->verts[i3].pos[0],
tempMesh->verts[i3].pos[1],
tempMesh->verts[i3].pos[2] );

//Fill the 3 Texture Coords of each Vertex
ddeMath::vec2 tc1 = ddeMath::vec2( tempMesh->verts[i1].tc[0],
tempMesh->verts[i1].tc[1] );

ddeMath::vec2 tc2 = ddeMath::vec2( tempMesh->verts[i2].tc[0],
tempMesh->verts[i2].tc[1] );

ddeMath::vec2 tc3 = ddeMath::vec2( tempMesh->verts[i3].tc[0],
tempMesh->verts[i3].tc[1] );

//Calculate the Tangent
ddeFloat x1 = vert2.x - vert1.x;
ddeFloat x2 = vert3.x - vert1.x;
ddeFloat y1 = vert2.y - vert1.y;
ddeFloat y2 = vert3.y - vert1.y;
ddeFloat z1 = vert2.z - vert1.z;
ddeFloat z2 = vert3.z - vert1.z;

ddeFloat s1 = tc2.x - tc1.x;
ddeFloat s2 = tc3.x - tc1.x;
ddeFloat t1 = tc2.y - tc1.y;
ddeFloat t2 = tc3.y - tc1.y;

ddeFloat r = 1.0f / (s1 * t2 - s2 * t1);

ddeMath::vec3 sdir = ddeMath::vec3( ( t2 * x1 - t1 * x2 ) * r,
( t2 * y1 - t1 * y2 ) * r,
( t2 * z1 - t1 * z2 ) * r );

ddeMath::vec3 tdir = ddeMath::vec3( ( s1 * x2 - s2 * x1 ) * r,
( s1 * y2 - s2 * y1 ) * r,
( s1 * z2 - s2 * z1 ) * r );

//Add the result for each tangent (sdir accumulates into tan1, tdir into tan2)
tan1[i1] += sdir; tan2[i1] += tdir;
tan1[i2] += sdir; tan2[i2] += tdir;
tan1[i3] += sdir; tan2[i3] += tdir;
}

// Normalize each normal && Calculate Final Tangent and BiNormal
printf( "Filling Normal - Tangent - BiNormal...\n" );
for( ddeInt j = 0, vboCpuIterator = 0; j < tempMesh->numVerts; j++, vboCpuIterator += 16 )
{
Vertex *v = &tempMesh->verts[j];

float mag = (float)sqrt( float( v->n[0] * v->n[0] + v->n[1] * v->n[1] + v->n[2] * v->n[2] ) );

// Avoid Division By Zero
if ( mag > 0.0001f )
{
v->n[0] /= mag;
v->n[1] /= mag;
v->n[2] /= mag;
}

//Finish Tangent and Binormal Calculations
ddeMath::vec3 tmpNormal = ddeMath::vec3( v->n[0], v->n[1], v->n[2] );
ddeMath::vec3 tmpTan;

( ddeMath::dot( ddeMath::cross( tmpNormal, tan1[j] ), tan2[j] ) < 0.0f) ?  tmpTan = tan1[j]
: tmpTan = tan2[j];

ddeMath::orthonormalize( tmpNormal, tmpTan );

ddeMath::vec3 resultTan = tmpTan;

ddeMath::vec3 biNormal = ddeMath::cross( tmpTan, tmpNormal  );

// Now save Normals in the VBO Array
currVBOPos    = &VBOCPU[vboCpuIterator];
currVBOPos[4] = v->n[0]; //Normal 0.
currVBOPos[5] = v->n[1]; //Normal 1.
currVBOPos[6] = v->n[2]; //Normal 2.
currVBOPos[7] = 0.0f;    //Normal 3. (Dummy).

currVBOPos[10] = tmpTan.x;   //Tangent x.
currVBOPos[11] = tmpTan.y;   //Tangent y.
currVBOPos[12] = tmpTan.z;   //Tangent z.

currVBOPos[13] = biNormal.x; //Binormal x.
currVBOPos[14] = biNormal.y; //Binormal y.
currVBOPos[15] = biNormal.z; //Binormal z.
}

// Creates a new Vertex Buffer Object for Mesh, Normals and TexCoords && Populate it
glGenBuffersARB( 1, &( tempMesh->V_N_TC_VBO ) );
glBindBufferARB( GL_ARRAY_BUFFER_ARB, tempMesh->V_N_TC_VBO );
glBufferDataARB( GL_ARRAY_BUFFER_ARB, ( ( tempMesh->numVerts * 16 ) * sizeof( GLfloat ) ), VBOCPU,
GL_STATIC_DRAW_ARB                                         );

// Create a Buffer for the Bone Weights
glGenBuffersARB( 1, &( tempMesh->BWEIGHT_VBO ) );
glBindBufferARB( GL_ARRAY_BUFFER_ARB, tempMesh->BWEIGHT_VBO );
glBufferDataARB( GL_ARRAY_BUFFER_ARB, sizeof( ddeMath::vec4 ) * tempMesh->numVerts, &( WeightBuffer[0] ),
GL_STATIC_DRAW_ARB                                              );

// Create a Buffer for the Bone Indices
glGenBuffersARB( 1, &( tempMesh->BINDEX_VBO ) );
glBindBufferARB( GL_ARRAY_BUFFER_ARB, tempMesh->BINDEX_VBO );
glBufferDataARB( GL_ARRAY_BUFFER_ARB, sizeof( ddeMath::vec4 ) * tempMesh->numVerts, &( BoneIndexBuffer[0] ),
GL_STATIC_DRAW_ARB                                                    );

// Disables VBO Buffer
glBindBufferARB( GL_ARRAY_BUFFER_ARB, 0 );

// Also creates a Vertex Buffer Object for the Triangle List
glGenBuffersARB( 1, &( tempMesh->T_VBO )                  );
glBindBufferARB( GL_ELEMENT_ARRAY_BUFFER_ARB, tempMesh->T_VBO );

// At last we Populate the VBO
glBufferDataARB( GL_ELEMENT_ARRAY_BUFFER_ARB, ( ( tempMesh->numTris * 3 ) * sizeof( GLfloat ) ),
tempMesh->tris, GL_STATIC_DRAW_ARB                );

// Disables VBO Buffer
glBindBufferARB( GL_ELEMENT_ARRAY_BUFFER_ARB, 0 );
tempMesh->isSkinned = true;

//CheckError();

// Free the temporaries VBOs
free( VBOCPU );

//TESTING PURPOSE//
free( tempMesh->verts   );
free( tempMesh->tris    );
free( tempMesh->weights );

}// Only for non skinned meshes
//} // for (meshes)
}

``````

Well, I will keep looking for it!

Thx!

Have you validated your skinning transforms are otherwise valid through another means?

For instance, put the skin mesh aside (along with skinning and your existing code), and just render the graphical skeleton using 100% CPU code (aside from drawing the lines). For starters, just draw the bind pose (joint orientation transforms only) and draw the bones as simple lines. That’ll verify your joint orientation transforms look reasonable. Then mix your joint animation transforms into that to get the skeleton moving. That’ll verify that your joint animation transforms are probably reasonable. As a ++ solution (totally optional), if you want to see your joint-local space orientations, render a stick for the bones instead of just a line.

If you find a problem just doing this, then perhaps your problem, or one of them at least, is bad joint transforms (or maybe misunderstanding what transforms you were given) rather than issues with skinning in your shader.
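As a rough C++ sketch of that skeleton check: the code below walks a joint hierarchy on the CPU and emits one line segment per bone. It assumes parent-relative joint transforms stored as a unit quaternion plus a translation, with parents listed before their children. All type and function names are hypothetical, and if your file format already stores joints in model space you would skip the accumulation step.

```cpp
#include <cassert>
#include <vector>

struct Vec3 { float x, y, z; };
struct Quat { float w, x, y, z; };  // assumed to be unit length

Vec3 cross(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

// Rotates v by unit quaternion q: v' = v + w*t + u x t, with t = 2 * (u x v).
Vec3 rotate(const Quat& q, const Vec3& v) {
    const Vec3 u{ q.x, q.y, q.z };
    Vec3 t = cross(u, v);
    t = { 2.0f * t.x, 2.0f * t.y, 2.0f * t.z };
    const Vec3 ut = cross(u, t);
    return { v.x + q.w * t.x + ut.x, v.y + q.w * t.y + ut.y, v.z + q.w * t.z + ut.z };
}

// Hamilton product: applies b first, then a.
Quat mul(const Quat& a, const Quat& b) {
    return { a.w * b.w - a.x * b.x - a.y * b.y - a.z * b.z,
             a.w * b.x + a.x * b.w + a.y * b.z - a.z * b.y,
             a.w * b.y - a.x * b.z + a.y * b.w + a.z * b.x,
             a.w * b.z + a.x * b.y - a.y * b.x + a.z * b.w };
}

struct Joint { int parent; Quat localRot; Vec3 localPos; };  // parent < 0 marks a root
struct Line  { Vec3 from, to; };

// Accumulates parent-relative transforms into model space and returns one line per bone.
std::vector<Line> buildSkeletonLines(const std::vector<Joint>& joints) {
    std::vector<Quat> worldRot(joints.size());
    std::vector<Vec3> worldPos(joints.size());
    std::vector<Line> lines;
    for (std::size_t i = 0; i < joints.size(); ++i) {
        const Joint& j = joints[i];
        if (j.parent < 0) {
            worldRot[i] = j.localRot;
            worldPos[i] = j.localPos;
            continue;
        }
        assert(static_cast<std::size_t>(j.parent) < i);  // parents must come first
        const Vec3 offset = rotate(worldRot[j.parent], j.localPos);
        const Vec3& p = worldPos[j.parent];
        worldRot[i] = mul(worldRot[j.parent], j.localRot);
        worldPos[i] = { p.x + offset.x, p.y + offset.y, p.z + offset.z };
        lines.push_back({ p, worldPos[i] });
    }
    return lines;
}
```

The returned segments can be drawn with plain immediate-mode lines. If the stick figure already looks wrong in the bind pose, the shader is not the culprit.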

Also, more detail on “neither seems to work” would be helpful. Prior to posting this thread, what are you seeing or what aren’t you that leads you to believe there is a problem?