Faster Matrix Multiplies

for a matrix multiply, is it faster to push the gl matrix, load the gl identity, multiply by m1, multiply by m2, glGet the result, and pop the original gl matrix, or to hand-code an asm matrix multiplier?

i imagine the former will use available hardware T&L, if it’s present, but what’s the trade-off in the number of loads necessary?

for example, which would be faster, or are they both wrong:

inline Matrix MultiplyMatrix1( const Matrix& m1, const Matrix& m2 )
{
    Matrix Product;

    glPushMatrix( );
    glLoadIdentity( );
    glLoadMatrixd( &m1 );
    glLoadMatrixd( &m2 );
    glGetDoublev( GL_MODELVIEW_MATRIX, &Product );
    glPopMatrix( );

    return Product;
}

inline Matrix MultiplyMatrix2( const Matrix& m1, const Matrix& m2 )
{
    // insert a faster asm/unrolled equivalent of:
    Matrix Product;
    for( int i = 0; i < 4; i++ )
        for( int j = 0; j < 4; j++ )
        {
            Product[i][j] = 0.0;    // accumulator must start at zero
            for( int k = 0; k < 4; k++ )
                Product[i][j] += m1[i][k]*m2[k][j];
        }

    return Product;
}
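
e.g., a fully unrolled version might look something like this (an untested sketch, assuming Matrix indexes row-major doubles; MultiplyMatrixUnrolled is just a name i made up):

inline Matrix MultiplyMatrixUnrolled( const Matrix& m1, const Matrix& m2 )
{
    Matrix Product;

    // each output element is a dot product of a row of m1 with a column of m2
    for( int i = 0; i < 4; i++ )
    {
        Product[i][0] = m1[i][0]*m2[0][0] + m1[i][1]*m2[1][0] + m1[i][2]*m2[2][0] + m1[i][3]*m2[3][0];
        Product[i][1] = m1[i][0]*m2[0][1] + m1[i][1]*m2[1][1] + m1[i][2]*m2[2][1] + m1[i][3]*m2[3][1];
        Product[i][2] = m1[i][0]*m2[0][2] + m1[i][1]*m2[1][2] + m1[i][2]*m2[2][2] + m1[i][3]*m2[3][2];
        Product[i][3] = m1[i][0]*m2[0][3] + m1[i][1]*m2[1][3] + m1[i][2]*m2[2][3] + m1[i][3]*m2[3][3];
    }

    return Product;
}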

i’m ok at asm, and i’d have no problem with implementing it here, but then it’s not as easily ported…

any ideas?

thx for any replies (well, not ANY)

EDIT:
actually, wouldn’t MultiplyMatrix1( m1, m2 ) give the same result as MultiplyMatrix2( m2, m1 ), because ogl does matrices backwards?
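
thinking about it some more: GL reads the same memory column-major, i.e. as the transpose, and (A*B)^T = B^T*A^T, so yes, the operand order gets swapped. here’s a quick standalone check of that identity (using a hypothetical double[4][4] stand-in for Matrix):

#include <cassert>
#include <cstdio>

typedef double Mat[4][4];    // hypothetical stand-in for the Matrix type

void Mul( Mat p, const Mat a, const Mat b )
{
    for( int i = 0; i < 4; i++ )
        for( int j = 0; j < 4; j++ )
        {
            p[i][j] = 0.0;
            for( int k = 0; k < 4; k++ )
                p[i][j] += a[i][k]*b[k][j];
        }
}

void Transpose( Mat t, const Mat a )
{
    for( int i = 0; i < 4; i++ )
        for( int j = 0; j < 4; j++ )
            t[i][j] = a[j][i];
}

int main( )
{
    Mat a, b;
    for( int i = 0; i < 4; i++ )
        for( int j = 0; j < 4; j++ )
        {
            a[i][j] = i*4 + j + 1;
            b[i][j] = ( i + 1 )*( j + 2 );
        }

    Mat ab, abT, aT, bT, bTaT;
    Mul( ab, a, b );
    Transpose( abT, ab );
    Transpose( aT, a );
    Transpose( bT, b );
    Mul( bTaT, bT, aT );

    // (A*B)^T == B^T * A^T, so feeding row-major data to a
    // column-major API silently swaps the operand order
    for( int i = 0; i < 4; i++ )
        for( int j = 0; j < 4; j++ )
            assert( abT[i][j] == bTaT[i][j] );

    std::printf( "identity holds\n" );
    return 0;
}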


Yuck, don’t use OpenGL as your matrix multiplication library. MultiplyMatrix2 will always be faster. If it starts with “glGet”, you can assume it will be, at minimum, a little slow, and in the worst case extremely slow (for example, if the matrix is kept in the HW and not shadowed on the host).

In your example, you probably meant glMultMatrix, not glLoadMatrix. Also, glLoadIdentity followed by glMultMatrix should be replaced by glLoadMatrix, which will be somewhat faster.
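
If you really wanted the GL path anyway, it would look more like this (a sketch only; the casts assume Matrix is layout-compatible with GLdouble[16], and MultiplyMatrixGL is a made-up name):

inline Matrix MultiplyMatrixGL( const Matrix& m1, const Matrix& m2 )
{
    Matrix Product;

    glPushMatrix( );
    // load m1 directly instead of glLoadIdentity + glMultMatrix
    glLoadMatrixd( (const GLdouble*)&m1 );
    // multiply by m2 rather than overwriting with it
    glMultMatrixd( (const GLdouble*)&m2 );
    // the readback is the slow part, and it stalls the pipeline
    glGetDoublev( GL_MODELVIEW_MATRIX, (GLdouble*)&Product );
    glPopMatrix( );

    return Product;
}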

  • Matt

Do your own matrix multiplies in software. All hardware vendors STRONGLY discourage any per-frame use of glGet because it screws up pipelining. (It might not hit you, but it will hit someone somewhere.)

Also, I wouldn’t bother with asm to start with. It was necessary back when CPUs were slower and you were transforming all vertices yourself, but it might not be any more. Profile your app first. I’m doing all my maths in software, using not-really-optimized C++ with spurious temporaries all over the place, and all the maths code combined is less than 0.1% of execution time in the profiler.
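
If you want numbers before deciding, a throwaway harness along these lines will do (a sketch; Mat4 and Multiply are stand-ins for your own types):

#include <chrono>
#include <cstdio>

struct Mat4 { double m[4][4]; };    // hypothetical stand-in for Matrix

static Mat4 Multiply( const Mat4& a, const Mat4& b )
{
    Mat4 p;
    for( int i = 0; i < 4; i++ )
        for( int j = 0; j < 4; j++ )
        {
            double sum = 0.0;
            for( int k = 0; k < 4; k++ )
                sum += a.m[i][k] * b.m[k][j];
            p.m[i][j] = sum;
        }
    return p;
}

int main( )
{
    Mat4 a, b;
    for( int i = 0; i < 4; i++ )
        for( int j = 0; j < 4; j++ )
        {
            a.m[i][j] = 0.1 * ( i + j );
            b.m[i][j] = 0.1 * ( i - j );
        }

    const int N = 1000000;
    volatile double sink = 0.0;    // keeps the optimizer from removing the loop
    auto t0 = std::chrono::steady_clock::now( );
    for( int n = 0; n < N; n++ )
    {
        Mat4 p = Multiply( a, b );
        sink = sink + p.m[0][0];
    }
    auto t1 = std::chrono::steady_clock::now( );
    double ns = std::chrono::duration<double, std::nano>( t1 - t0 ).count( ) / N;
    std::printf( "%.1f ns per 4x4 multiply (sink=%f)\n", ns, static_cast<double>( sink ) );
    return 0;
}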

Originally posted by MikeC:
and all the maths code combined is less than 0.1% of execution time in the profiler.

So you have plenty of CPU left to improve your monsters’ AI!

Ahem, sorry, I am going crazy tracking one of my bugs…

Regards.

Eric

kewl, thx guys.

i figured the glGet stuff would be slow, what with downloading from server to client and all, but i was just curious how to get at the HW transform stuff.

thinking about it, i figured it out (CVA/extensions…)

thx again

-Succinct