# Matrix * vector or vector * matrix

matrix * vector seems more natural because it can be converted to DP4 instructions (or three DP3 instructions).

The other one will require the driver to transpose the matrix itself to do the DP4 and DP3 instructions or do some funky inefficient math in the shader.

Right?

Yes, this is right.
Besides that, “matrix * vector” is the mathematically correct form if you use column vectors.

Actually, doing it the other way isn’t any less efficient. Instead of DP4, DP4, DP4, DP4 you get MUL, MAD, MAD, MAD.

Assuming that MAD (and DP4, for that matter) doesn’t get converted into 2 machine opcodes…

Originally posted by Humus:
Actually, doing it the other way isn’t any less efficient. Instead of DP4, DP4, DP4, DP4 you get MUL, MAD, MAD, MAD.
Don’t you mean

instead of DP4, DP4, DP4, DP4?

If it’s only MUL, MAD, MAD, MAD, then I guess I don’t know that trick. Can you post an example?

I think it should not matter whether you do matrix * vector or vector * matrix, as the driver can choose whichever suits it best. (That is, if they are equal in result. I would think vector * matrix is the most optimal, though, as matrices are most likely stored in column-major order and it would expand to four DP4s.)

I think Humus may be getting confused by multiplying a vector by the transpose of a matrix. In this case you use MUL,MAD,MAD,MAD.

Reverse-multiplying is the same thing as multiplying with the transpose.
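
The identity in question, v * M = transpose(M) * v, can be checked on the CPU. This is only a minimal sketch in plain Python, not shader code; the helper names `transpose`, `mat_vec` and `vec_mat` are made up for illustration, and matrices are assumed to be lists of rows.

```python
# CPU-side check, not shader code. Assumption: a matrix is a list of rows.

def transpose(M):
    """Swap rows and columns."""
    return [list(col) for col in zip(*M)]

def mat_vec(M, v):
    """M * v, treating v as a column vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def vec_mat(v, M):
    """v * M, treating v as a row vector."""
    return [sum(v[i] * M[i][j] for i in range(len(v)))
            for j in range(len(M[0]))]

M = [[1, 2, 3, 4],
     [5, 6, 7, 8],
     [9, 10, 11, 12],
     [13, 14, 15, 16]]
v = [1, 0, 2, 1]

print(vec_mat(v, M))             # [32, 36, 40, 44]
print(mat_vec(transpose(M), v))  # [32, 36, 40, 44]
print(mat_vec(M, v))             # [11, 27, 43, 59]
```

The first two results agree, while plain M * v differs unless M happens to be symmetric.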

Oops, I don’t know what I was thinking…

v * M != M * v
(The vector changes from a row vector to a column vector in the second operation.)

If column-major order:
v * M = DP4, DP4, DP4, DP4
M * v = MUL, MAD, MAD, MAD

If row-major order:
M * v = DP4, DP4, DP4, DP4

===========================
Column-major M * v:

MUL out, M[0], v.x
MAD out, M[1], v.y, out
MAD out, M[2], v.z, out
MAD out, M[3], v.w, out

as opposed to column-major v * M:
DP4 out.x, M[0], v
DP4 out.y, M[1], v
DP4 out.z, M[2], v
DP4 out.w, M[3], v
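
To see what the two instruction sequences above actually compute, here is a rough CPU-side emulation in plain Python. It is a sketch under stated assumptions, not shader code: M is assumed to be a list of four columns (so M[i] is column i, as in GLSL), and `dp4`, `mul` and `mad` are hypothetical stand-ins for the assembly opcodes.

```python
# CPU-side emulation of the two instruction sequences; not shader code.
# Assumption: M is a list of 4 columns (column-major), so M[i] is column i.

def dp4(a, b):
    """Stand-in for DP4: 4-component dot product, returns a scalar."""
    return sum(x * y for x, y in zip(a, b))

def mul(a, s):
    """Stand-in for MUL with a broadcast scalar: a * s per component."""
    return [x * s for x in a]

def mad(a, s, c):
    """Stand-in for MAD: a * s + c per component."""
    return [x * s + y for x, y in zip(a, c)]

def mat_times_vec(M, v):
    """M * v with column access: MUL, MAD, MAD, MAD."""
    out = mul(M[0], v[0])
    out = mad(M[1], v[1], out)
    out = mad(M[2], v[2], out)
    out = mad(M[3], v[3], out)
    return out

def vec_times_mat(M, v):
    """v * M with column access: DP4, DP4, DP4, DP4."""
    return [dp4(M[0], v), dp4(M[1], v), dp4(M[2], v), dp4(M[3], v)]

# A translation by (10, 20, 30), stored as four columns.
M = [[1, 0, 0, 0],
     [0, 1, 0, 0],
     [0, 0, 1, 0],
     [10, 20, 30, 1]]
v = [1, 2, 3, 1]

print(mat_times_vec(M, v))  # [11, 22, 33, 1]
print(vec_times_mat(M, v))  # [1, 2, 3, 141]
```

Both forms take four vector operations; they differ in the result (M * v versus transpose(M) * v), not in instruction count.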

sqrt[-1], to be clear…

We want to do v * M, but M is not transposed (as it should be)

```
// vertex * matrix in GLSL
MUL out, M[0], v.x
MAD out, M[1], v.y, out
MAD out, M[2], v.z, out
MAD out, M[3], v.w, out
```

We want to do v * M, and M is transposed, so it ends up being M^T * v

```
// matrix * vertex in GLSL
DP4 out.x, M[0], v
DP4 out.y, M[1], v
DP4 out.z, M[2], v
DP4 out.w, M[3], v
```

Was it the DP4 in the fragment shader that takes 2 instructions on the R300?

NOTE: My original question was aimed at doing this in the vertex shader

Originally posted by V-man:
Was it the DP4 in the fragment shader that takes 2 instructions on the R300?
No, it uses only one instruction slot, but it occupies the scalar pipe too. In a sense it runs at half the speed of DP3, since unlike DP3 it leaves no room to issue another scalar instruction in parallel with it.

V-Man,
That seems to be right assuming the matrices are stored in row major order.

Originally posted by sqrt[-1]:
V-Man,
That seems to be right assuming the matrices are stored in row major order.

In shaders, those are always row accesses, so I prefer to talk in terms of transposed or not transposed.

I think a vector of 4 can be defined as a row or column vector.

I also think GLSL uses column-major order (i.e. matrix[0] selects the first column of the matrix, or in your case the first row of the transposed matrix; I still think saying column is easier).
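
As a small illustration of what column-major storage means, here is a sketch in plain Python (CPU side, not shader code). It assumes a 4x4 matrix laid out as 16 consecutive values, column after column; the helper names `column` and `row` are made up for this example.

```python
# Assumption: a 4x4 matrix stored as 16 values in column-major order,
# i.e. the four components of column 0 first, then column 1, and so on.

def column(flat, i):
    """Column i: a contiguous slice, which is what matrix[i] selects in GLSL."""
    return flat[4 * i: 4 * i + 4]

def row(flat, j):
    """Row j: every fourth element, striding across the columns."""
    return flat[j::4]

# Translation by (10, 20, 30); the translation sits in the last column.
flat = [1, 0, 0, 0,
        0, 1, 0, 0,
        0, 0, 1, 0,
        10, 20, 30, 1]

print(column(flat, 3))  # [10, 20, 30, 1]
print(row(flat, 0))     # [1, 0, 0, 10]
```

Reading the same 16 values as rows instead gives the transposed matrix, which is why "first column" and "first row of the transposed matrix" describe the same data.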
