I am rethinking what type of internal representation to choose for matrices.
I prefer the way dGraphics does it:
<cpp>
Object _11, _12, _13, _14;
Object _21, _22, _23, _24;
Object _31, _32, _33, _34;
Object _41, _42, _43, _44;
</cpp>
Is multiplication via a double loop faster?
<cpp>
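for (int row = 0; row < 4; ++row)
{
for (int col = 0; col < 4; ++col)
{ /* compute element (row, col) of the product here */ }
}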
</cpp>
I mean, is m[4][4] the better way to store the data?
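To make the comparison concrete, here is a minimal sketch of the m[4][4] layout with a loop-based multiply. The names Mat4, at and multiply are made up for illustration; at(2, 3) simply plays the role of the named member _23.

```cpp
#include <cassert>

// Hypothetical 4x4 matrix stored as a 2D array instead of 16 named members.
struct Mat4 {
    float m[4][4];

    // 1-based accessor: at(2, 3) addresses the same element as _23.
    float& at(int row, int col) {
        assert(row >= 1 && row <= 4 && col >= 1 && col <= 4);
        return m[row - 1][col - 1];
    }
};

// Loop-based multiply: r = a * b. The compiler may or may not unroll these loops.
Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r;
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < 4; ++k) {
                sum += a.m[i][k] * b.m[k][j];
            }
            r.m[i][j] = sum;
        }
    }
    return r;
}
```

Both layouts occupy the same 16 contiguous floats, so the choice mostly affects how the code is written; whether the loop version is slower is something to measure on the target compiler.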
I also have another question: how can I load SIMD-optimized vector classes dynamically? Is there a good way to choose at runtime which class to use (standard or SIMD-optimized)?
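One possible approach to the runtime-selection question is to pick a function pointer once at startup. This is only a sketch: AddFn, addScalar, addSse and selectAdd are hypothetical names, and the cpuHasSse flag is assumed to come from a CPUID check that is not shown here.

```cpp
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define HAS_SSE_BUILD 1
#else
#define HAS_SSE_BUILD 0
#endif

// Signature shared by every implementation of a 4-float add.
using AddFn = void (*)(const float* a, const float* b, float* out);

// Standard fallback that runs on any CPU.
void addScalar(const float* a, const float* b, float* out) {
    for (int i = 0; i < 4; ++i) {
        out[i] = a[i] + b[i];
    }
}

#if HAS_SSE_BUILD
// SSE version; unaligned loads and stores keep the sketch safe for any pointer.
void addSse(const float* a, const float* b, float* out) {
    _mm_storeu_ps(out, _mm_add_ps(_mm_loadu_ps(a), _mm_loadu_ps(b)));
}
#endif

// Assumption: cpuHasSse was determined once at startup (e.g. via CPUID).
AddFn selectAdd(bool cpuHasSse) {
#if HAS_SSE_BUILD
    if (cpuHasSse) {
        return addSse;
    }
#else
    (void)cpuHasSse;
#endif
    return addScalar;
}
```

Calling through a pointer costs an indirect call per operation, so it probably pays off only when the selected function works on a whole array rather than on a single vector.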
So just check the assembly to find out what’s faster…
(you can do that by going to menu ‘Project -> Settings -> C/C++ tab -> Listing Files -> Assembly with Source Code’)
Also, using for loops is slower, although the compiler might unroll them… (but I don’t think it does; I haven’t actually checked that)
[This message has been edited by richardve (edited 02-21-2002).]
As far as I know,
mov eax, DWORD PTR _b1$[ebp]
is as fast as
mov eax, DWORD PTR _b1$[ebp+4]
because the address is calculated in the CPU’s pipeline before the move is executed.
You can optimize the code by forcing your compiler to align matrices and vectors to a 32-byte boundary, because the cache stores memory in 32-byte blocks: if your vector straddled two blocks, the cache would load both 32-byte blocks even though you only need 16 bytes (4 floats).
So you should store your vectors in arrays to get more cache hits (with ISSE or 3DNow the CPU ends up waiting while the memory does the work).
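A minimal sketch of the alignment idea, assuming a C++11 compiler with alignas (older compilers need a vendor-specific attribute instead). AlignedVec4, AlignedMat4, isAligned and straddlesBlock are hypothetical names, and the 32-byte block size is the figure quoted above, not a universal constant.

```cpp
#include <cstddef>
#include <cstdint>

// 16 bytes aligned to 16: always fits inside a single 32-byte block.
struct alignas(16) AlignedVec4 { float v[4]; };

// 64 bytes aligned to 32: covers exactly two 32-byte blocks.
struct alignas(32) AlignedMat4 { float m[4][4]; };

// True if p sits on a multiple of boundary.
bool isAligned(const void* p, std::size_t boundary) {
    return reinterpret_cast<std::uintptr_t>(p) % boundary == 0;
}

// True if the bytes [p, p + size) cross a block boundary of blockSize bytes.
bool straddlesBlock(const void* p, std::size_t size, std::size_t blockSize) {
    const std::uintptr_t first = reinterpret_cast<std::uintptr_t>(p);
    const std::uintptr_t last = first + size - 1;
    return first / blockSize != last / blockSize;
}
```

With these types, every element of an AlignedVec4 array stays within one block, which is the situation the advice above is aiming for.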
you can also use the prefetch instructions, I’ve got 30% more performance in some code parts…
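As a rough illustration of software prefetching, the sketch below requests data a few elements ahead of the one being processed. PREFETCH_READ, kPrefetchDistance and sumWithPrefetch are hypothetical names, and the distance of 16 elements is an arbitrary starting point that would need tuning by measurement.

```cpp
#include <cstddef>

#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH_READ(p) __builtin_prefetch(p)
#elif defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#include <xmmintrin.h>
#define PREFETCH_READ(p) _mm_prefetch(reinterpret_cast<const char*>(p), _MM_HINT_T0)
#else
#define PREFETCH_READ(p) ((void)0)  // no prefetch available: do nothing
#endif

// Assumption: how many elements ahead to prefetch; tune per machine.
const std::size_t kPrefetchDistance = 16;

// Sums an array while hinting the cache about data needed a few iterations later.
float sumWithPrefetch(const float* data, std::size_t count) {
    float sum = 0.0f;
    for (std::size_t i = 0; i < count; ++i) {
        if (i + kPrefetchDistance < count) {
            PREFETCH_READ(data + i + kPrefetchDistance);
        }
        sum += data[i];
    }
    return sum;
}
```

The gain depends heavily on the access pattern and the CPU, so a figure like the 30% mentioned above should be checked by profiling the actual code.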
And don’t do stuff like:
for(…)
{
do something
use ISSE
}
Instead, collect the work first and process it in one go (try to fill the L1 cache size).
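The "collect work" advice could look roughly like the sketch below: gather the vectors into a contiguous array, then transform the whole batch in one tight loop. Vec4f, Mat4f, kBatchSize and transformBatch are hypothetical names, the loop body is plain scalar code standing in for the ISSE part, and the batch size is an assumed value meant to keep input and output within a small L1 cache.

```cpp
#include <cstddef>

struct Vec4f { float x, y, z, w; };
struct Mat4f { float m[4][4]; };

// Assumption: 256 vectors * 16 bytes * 2 arrays = 8 KB, chosen to fit a small L1 cache.
const std::size_t kBatchSize = 256;

// Transforms count column vectors: out[i] = mat * in[i].
// The loop does nothing but the math, so a SIMD version can replace its body.
void transformBatch(const Mat4f& mat, const Vec4f* in, Vec4f* out, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        const Vec4f v = in[i];
        out[i].x = mat.m[0][0] * v.x + mat.m[0][1] * v.y + mat.m[0][2] * v.z + mat.m[0][3] * v.w;
        out[i].y = mat.m[1][0] * v.x + mat.m[1][1] * v.y + mat.m[1][2] * v.z + mat.m[1][3] * v.w;
        out[i].z = mat.m[2][0] * v.x + mat.m[2][1] * v.y + mat.m[2][2] * v.z + mat.m[2][3] * v.w;
        out[i].w = mat.m[3][0] * v.x + mat.m[3][1] * v.y + mat.m[3][2] * v.z + mat.m[3][3] * v.w;
    }
}
```

The caller would fill an input array of up to kBatchSize vectors during its "do something" phase and then call transformBatch once per full batch.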