Math-friendly matrices?

Hi all,

I was reading this article about SIMD optimizations for 3D math, and I was thinking about the way OGL uses transposed (row-major ordered) matrices. Why not conform to the more widely used (and mathematically correct) column-major order? Why not do this in the upcoming OGL 3.0?

Matrices in OpenGL are stored and manipulated in column-major order (read the specification, section 2.11.2). In your application, you may store them in memory however you wish; there are extensions to deal with both cases.

Note: due to the way that SSE instructions work, it ends up being convenient to have the input matrix be in column-major order. If you’d rather keep your matrices in row-major order (the way OpenGL expects them), don’t panic - it turns out you can do it either way without a significant performance penalty. Keep reading.
This guy says GL expects them row-major?
GL has always expected column-major. It is also true that column-major is better for SSE code, e.g. when you want to compute matrix * vertex.

apeternier, you are confused because you are treating matrices stored in column-major order as if they were stored in row-major order, which makes them look transposed.

Now, row-major or column-major ordering is about how you store the data of a matrix, and OpenGL expects column-major ordered matrices. This means the real logical matrix IS the mathematically correct one; we simply need to store it column by column (column-major) in the one-dimensional array.
The confusion arises because a lot of people treat the one-dimensional array as if it were stored row by row (row-major).
This is a very fundamental mistake, yet some graphics packages just go ahead and read the one-dimensional array as if it were row-major.

OpenGL matrices are mathematically correct; OpenGL just expects you to treat the array as column-major.

What do you mean by OpenGL matrices being mathematically correct? Laws of Mathematics do not define memory layout of matrices in computer memory.

But they define the layout of matrix elements in a linear transformation.

So if you say you have a linear transformation matrix that does such and such a transformation, but what you really have is its transpose, then your expressions are “mathematically incorrect”, or at least do not obey the usual convention of matrix theory. You would need other “mathematically incorrect” operations to make it all work (vM to apply your transformation instead of Mv). apeternier thought OpenGL does not conform to “mathematically correct” matrices, but I explained that the reason he thinks so is that he is reading them as if they were row-major matrices, which has the effect of transposing the matrix.

v * M is no more “mathematically incorrect” than M * v. It’s a valid operation, and giving a meaning to these values is up to the person who created the matrix.


It is a matter of convention. Standard linear algebra defines a specific convention for these things. While it may be equivalent to the opposite convention, nobody in mathematics uses the opposite convention. If you were to write a serious mathematics paper that uses row-major matrices, you would probably not be published until you rewrote it to column-major.

It’s not the same as left-handed coordinates vs. right-handed coordinates. In that case, many different people use many different coordinate systems for many different reasons. The only people who use row-major matrices are graphics programmers, and that’s only because it’s the way the hardware works fastest.

Why would

mat4 A;
vec4 B;
vec4 C;

// pseudocode: rowN / columnN denote the Nth row / column of A
C.x = dot(A.row0,B);
C.y = dot(A.row1,B);
C.z = dot(A.row2,B);
C.w = dot(A.row3,B);

be faster than

mat4 A;
vec4 B;
vec4 C;

C = B.x*A.column0;
C+= B.y*A.column1;
C+= B.z*A.column2;
C+= B.w*A.column3;



It should be obvious.

In case 2, you can’t parallelize the process. Even if you can compute 4 independent shader opcodes per cycle, it will still take you 3 cycles to complete that matrix multiply, because each step depends on the results of the previous one. You do all the multiplies in one cycle, but you can’t add any of the results together until cycle 2. Cycle 2 does two vec4 adds, but those results aren’t available until cycle 3, where those two vec4s can be added together.

In case 1, each step is totally independent of the others. If you have 4 independent shader opcodes, they can all write to different parts of the same vector, so it only takes 1 cycle. If you only have 3 opcodes available, it takes 2 cycles, whereas case 2 still takes 3.

It depends on how it is implemented. Assuming the hardware doesn’t natively support a matrix by vector multiply and instead it does 4 dot products :

C.x = dot(A.row0,B);
C.y = dot(A.row1,B);
C.z = dot(A.row2,B);
C.w = dot(A.row3,B);

will take 4 cycles.

In the second case :

C = B.x*A.column0;
C += B.y*A.column1;
C += B.z*A.column2;
C += B.w*A.column3;

the first line is a MUL, the next 3 lines are MAD.
In total, 4 cycles.

The above is true for older GPUs. I don’t know the details of SM 4 hardware and haven’t delved into the newer CPU functions for a while.

Indeed V-man, that was my point exactly.

Being a GPGPU programmer myself, I was rather in a CUDA state of mind when thinking about this. IMHO, in architectures based on stream processors it’s best to keep ‘threads’ separate. So assume we have four threads running. In the second case, each thread accesses an element of A.column0, multiplies it with the value of B.x that is broadcast to all four threads, and accumulates its value into the element of C corresponding to its thread.

The first case would be harder to implement efficiently, since the vec4 accesses need to be coalesced (one value per thread), but for the dot product the sum across different threads needs to be calculated. I believe this to be true for G80+ anyway, where the architecture is scalar-based instead of (vec3+scalar)-based.


But many people in computer graphics use the opposite convention. Besides, row/column-major is about storage, while we’re talking about the logical layout of the matrix elements, i.e. whether you put the axes of the transformed coordinate system into the rows or the columns of the matrix.

The only people who use row-major matrices are graphics programmers, and that’s only because it’s the way the hardware works fastest.

Mv or vM makes no performance difference on any GPU I know of.

Row vs. column major matrices has nothing to do with mathematics. In all mathematics papers I know, they don’t care about how the matrix is stored in memory. A matrix is just a 2-dimensional array of numbers.

The only difference between row and column major is the order how the 2-dimensional matrix is stored in 1-dimensional memory. Not a single formula is changed.

Of course, changing the matrix order changes the implementation of the matrix operations. But that’s an implementation concern that has nothing to do with mathematics. The mathematical semantics stay exactly the same.

If one would argue there’s such a thing as row-major or column-major in mathematics, then clearly the way a matrix is normally used in math more closely matches how a 2D array in C/C++ is normally used, which is row-major. While there’s nothing preventing you from using any other convention, you normally access a 2D array in C/C++ as M[row][column]. This matches the M[row, column] notations in mathematics.

As for v * M vs. M * v, they are about equally common in the math material I’ve read. It just depends on if you prefer to view a vector as 1xN or Nx1 matrix.

For square matrices this is true. For non-square it matters. For instance, with a 4x3 matrix you can have either mul-mad-mad-mad or dp4-dp4-dp4, and for 3x4 you get either mul-mad-mad or dp3-dp3-dp3-dp3.

True, I only had square matrices in mind. Well, at least for the latest scalar architectures it shouldn’t matter anymore, since you can use the same instruction sequence, just with different source registers.

While v * M (or really, v^T * M) is a mathematically valid operation in isolation, it is mathematically incorrect if the matrix M is meant to be the mathematical entity “transformation matrix” and v is meant to be the mathematical entity “vector to be transformed”. These terms are well defined in mathematics, and the mathematically agreed form will always dominate your own choice of representation if you claim to be representing the mathematical concept of a “homogeneous transformation”. Transformation matrices are defined unambiguously by mathematics, specifically by the matrix theory and linear transformations branches of mathematics. Check the “Geometry” section of the rotation matrix link I gave from Wikipedia to see that M * v is the correct one. Contrary to your claim, it is not up to the person to give a meaning to matrix-vector multiplication as a linear transformation; it is well defined by mathematics, which is the theory behind computer graphics.

A little correction: mathematically, there is no such thing as “row-major matrix” or “column-major matrix”. There’s the logical entity “matrix”, and you need to decide on the order only when you need to store this two dimensional entity in one dimensional storage. You are supposed to be consistent about this storage order if you want the logical entity to be intact. In other words, the way you read and write the matrix into memory should not change anything about that logical matrix entity. What you mean by “row-major matrices” in that sentence is in fact either “matrices that were stored as column-major matrices but were read as row-major matrices”, or “transposed transformation matrices”. Which, I hope, stresses the source of all this confusion.

Transformation matrices were defined well before OpenGL, and OpenGL uses them as they are. The problem is ignoring the fact that OpenGL stores them in column-major order in one-dimensional memory, and treating that one-dimensional memory as if the matrix were stored in row-major order, effectively transposing the matrix. In my opinion, propagating this problem into higher levels of code by representing transformation matrices by their transposes is just bad programming practice, because it does not obey a well-known mathematical standard and it’s not what the OpenGL man pages mean when they mention transformation matrices. A well-written matrix class should know that its one-dimensional storage is meant to be in column-major order, and when the user asks for an element of the matrix it should not directly use the C [] operator, which would imply row-major order.

It’s more intuitive to read one dimensional memory into two dimensions in row-major order, and OpenGL did not choose to use this intuitive ordering because of efficiency, which is the source of all this confusion.

Edit: I just realized that there was a second page to this thread and most of my points were already made:)

True, both may be valid matrix-vector operations, but the latter is how transformations are applied to vectors in maths.