for a matrix multiply, is it faster to push the gl matrix, load the gl identity, multiply by m1, multiply by m2, glGet the result, and pop the original gl matrix, or to hand code an asm matrix multiplier?
i imagine the former will use available hardware T&L, if it’s present, but what’s the trade-off on the number of loads necessary?
for example, which would be faster, or are they both wrong:
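inline Matrix MultiplyMatrix1( const Matrix& m1,const Matrix& m2 )
{
    Matrix Product;        // the casts below assume Matrix is exactly 16 contiguous doubles
    glPushMatrix();        // save the current matrix (assumes GL_MODELVIEW is the active matrix mode)
    glLoadMatrixd( (const GLdouble*)&m1 );  // current = m1, same as glLoadIdentity + glMultMatrixd
    glMultMatrixd( (const GLdouble*)&m2 );  // current = m1*m2; a second glLoadMatrixd would just overwrite m1
    glGetDoublev( GL_MODELVIEW_MATRIX,(GLdouble*)&Product );
    glPopMatrix();         // restore the original matrix
    return Product;
}

inline Matrix MultiplyMatrix2( const Matrix& m1,const Matrix& m2 )
{
    Matrix Product;        // assumed to start out all zeros, since the loop only accumulates
    // insert a faster asm/unrolled equivalent of:
    for( int i = 0; i < 4; i++ )
        for( int j = 0; j < 4; j++ )
            for( int k = 0; k < 4; k++ )
                Product[i][j] += m1[i][k]*m2[k][j];
    return Product;
}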
i’m ok at asm, and i’d have no problem w/ implementing it here, but it’s not as easily ported then…
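a portable middle ground might be plain c++ with the k loop unrolled by hand, leaving the rest to the compiler. this is only a sketch: the Matrix class isn’t shown here, so Mat4 and MultiplyUnrolled are made-up names for illustration, and it assumes row-major storage indexed as m[row][col]:

```cpp
// Hypothetical stand-in for the Matrix class: 4x4 doubles, indexed m[row][col].
struct Mat4 { double m[4][4]; };

// Portable multiply with the inner (k) loop unrolled by hand.
// Every element is assigned outright, so the result needs no zeroing first.
inline Mat4 MultiplyUnrolled( const Mat4& a,const Mat4& b )
{
    Mat4 p;
    for( int i = 0; i < 4; i++ )
        for( int j = 0; j < 4; j++ )
            p.m[i][j] = a.m[i][0]*b.m[0][j] + a.m[i][1]*b.m[1][j]
                      + a.m[i][2]*b.m[2][j] + a.m[i][3]*b.m[3][j];
    return p;
}
```

whether this beats hand-written asm or the gl round trip would have to be measured on the target machine.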
thx for any replies (well, not ANY)
actually, wouldn’t MultiplyMatrix1( m1,m2 ) be MultiplyMatrix2( m2,m1 ), because ogl does matrices backwards (it reads the 16 values column-major, so a [row][col] array gets treated as its transpose)?
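one way to check the ordering without touching gl at all: as far as i know, gl reads the 16 values column-major (row r, column c lives at index c*4 + r) and glMultMatrix post-multiplies (current = current * arg). the sketch below mimics that on flat arrays and sits next to the plain [i][j] loop for comparison. RowMajorMultiply and GlStyleMultiply are hypothetical helper names, and it assumes the matrix is 16 contiguous doubles stored as [row][col]:

```cpp
// Row-major multiply, same as the triple loop: p[i][j] = sum over k of a[i][k]*b[k][j],
// with element [i][j] stored at index i*4 + j.
inline void RowMajorMultiply( const double* a,const double* b,double* p )
{
    for( int i = 0; i < 4; i++ )
        for( int j = 0; j < 4; j++ )
        {
            double sum = 0.0;
            for( int k = 0; k < 4; k++ )
                sum += a[i*4 + k]*b[k*4 + j];
            p[i*4 + j] = sum;
        }
}

// Mimics glLoadMatrixd( a ); glMultMatrixd( b ); glGetDoublev( ...,p ):
// the same 16 values are treated as column-major (row r, column c at c*4 + r)
// and the product a * b is formed in that interpretation.
inline void GlStyleMultiply( const double* a,const double* b,double* p )
{
    for( int r = 0; r < 4; r++ )
        for( int c = 0; c < 4; c++ )
        {
            double sum = 0.0;
            for( int k = 0; k < 4; k++ )
                sum += a[k*4 + r]*b[c*4 + k];
            p[c*4 + r] = sum;
        }
}
```

under those assumptions GlStyleMultiply( a,b,p ) fills p with the same values as RowMajorMultiply( b,a,p ), which is the swapped-argument behaviour guessed at above.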
[This message has been edited by Succinct (edited 12-12-2000).]