# Computer Graphics and Matrices

Hello!

Well, I searched for this on the internet, but I could not find a convincing answer to the question “Why are matrices used in computer graphics?”

I got some explanations saying it is an elegant way of expressing equations. But are there more important reasons why they are used? Say, maybe faster processing in hardware (I couldn’t think how that could be possible).

Thanks!

If you ask me, the most powerful feature that matrices give is concatenation, whereby several transformations can be combined into a single matrix.

Another advantage is the simplicity of representing any transformation, affine or non-affine.

Matrices provide a solid, general tool to represent and combine all common transformations.

You can do translation, rotation, scale, etc. operations without matrices as well. But as soon as you want a general representation for “any” transformation, matrices become a handy choice. The final nail is the capability to combine / concatenate transformations: this is a common operation in computer graphics, so it is beneficial to have a single representation and a single way to combine all sorts of transformations.
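To make the concatenation point concrete, here is a minimal Python sketch. The helper names (`translation`, `rotation_z`, `scale`, `mat_mul`, `mat_vec`) are made up for illustration, and it assumes the column-vector convention where the rightmost matrix is applied first. It folds a scale, a rotation and a translation into one matrix and compares that against applying the three steps one after another.

```python
import math

def translation(tx, ty, tz):
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]

def rotation_z(angle):
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s, 0.0, 0.0],
            [s, c, 0.0, 0.0],
            [0.0, 0.0, 1.0, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def scale(sx, sy, sz):
    return [[sx, 0.0, 0.0, 0.0],
            [0.0, sy, 0.0, 0.0],
            [0.0, 0.0, sz, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def mat_mul(a, b):
    """4x4 matrix product a * b."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """4x4 matrix times a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

T = translation(5.0, 0.0, 0.0)
R = rotation_z(math.pi / 2)
S = scale(2.0, 2.0, 2.0)

# One combined matrix: scale first, then rotate, then translate.
combined = mat_mul(T, mat_mul(R, S))

p = [1.0, 0.0, 0.0, 1.0]  # a point, so w = 1
step_by_step = mat_vec(T, mat_vec(R, mat_vec(S, p)))
at_once = mat_vec(combined, p)
```

Both `step_by_step` and `at_once` come out at roughly (5, 2, 0, 1), up to floating-point rounding. The difference is that `combined` only has to be built once, however many points go through it.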

Thanks for the replies! So, there is no reason why they could be implemented faster in hardware, right?

Thanks!

There are two different things happening here.

Thing number 1 is representing your transformations as matrices and concatenating multiple matrices to build your final transformation.

Thing number 2 is multiplying your final matrix by each vertex’s position in order to place it in the world and on your screen.

Thing number 1 is performed in software, thing number 2 in hardware (on most GPUs - there are still some older Intels that will do it in software).

Thing number 1 is only done once per-frame or per-object; at most you will have a few hundred of these per-frame.

Thing number 2 is done per vertex, and there will be tens of thousands, hundreds of thousands, or more of these per frame.

There is nothing to gain and a lot to lose by performing thing number 1 in hardware. There are comparatively so few of these operations that there would be no measurable performance difference, and by doing it in hardware you lose the ability to quickly and easily read back the result to the CPU (which is often needed).

Thing number 2 is faster in hardware and there are plenty of reasons for why this is so. GPUs are massively parallel, and need no special code or other intervention on your part in order to get this. This means that your GPU can do multiple matrix * position multiplies at the same time. GPUs are also set up for doing this kind of operation more efficiently; a multiplication of a 4-component position with a 4x4 matrix can be done with only 4 instructions, again without any special intervention needed on your part.

Matrix multiplications ARE faster in hardware.
The CPU (the CPU is hardware too) can use SSE instructions to multiply a row of a matrix, doing 4 multiplications at the same time (a matrix is 4 vectors of 4 floats). The Xbox 360 and PS3 also have native instruction sets for handling vectors on the CPU/PPU with very low latency.

As mhagain said, GPUs are very fast at handling a lot of data (doing the same transformation on a lot of vertices), but CPUs usually run at higher clock speeds, so if you only need to multiply two matrices together, it is a huge waste of time to do it on the GPU.

Another way to represent an orthonormal transformation of points is quaternion + position vector, but this way you can’t do scale or other non-orthogonal transformations.
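For comparison, here is a rough Python sketch of the quaternion + position form. The function names and the (w, x, y, z) component order are choices made for this example, not taken from any particular library. It rotates a point with a unit quaternion, adds the offset, and also shows that the rotation leaves lengths unchanged, which is why scale has no place in this representation.

```python
import math

def quat_from_axis_angle(axis, angle):
    """Unit quaternion (w, x, y, z) for a rotation of `angle` about `axis`."""
    x, y, z = axis
    n = math.sqrt(x * x + y * y + z * z)
    s = math.sin(angle / 2) / n
    return (math.cos(angle / 2), x * s, y * s, z * s)

def quat_mul(a, b):
    """Hamilton product of two quaternions."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (aw * bw - ax * bx - ay * by - az * bz,
            aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw)

def rotate(q, v):
    """Rotate the 3D vector v by the unit quaternion q (q * v * conjugate(q))."""
    w, x, y, z = q
    conj = (w, -x, -y, -z)
    _, rx, ry, rz = quat_mul(quat_mul(q, (0.0, *v)), conj)
    return (rx, ry, rz)

def rigid_transform(q, t, v):
    """Rotate v by q, then add the position offset t."""
    r = rotate(q, v)
    return tuple(r[i] + t[i] for i in range(3))

q = quat_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2)
moved = rigid_transform(q, (5.0, 0.0, 0.0), (1.0, 0.0, 0.0))

# A rotation never changes length, so this form cannot express scale.
length_before = math.sqrt(3.0 ** 2 + 4.0 ** 2)
length_after = math.sqrt(sum(c * c for c in rotate(q, (3.0, 4.0, 0.0))))
```

Here `moved` is approximately (5, 1, 0), and `length_before` and `length_after` agree up to rounding.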

>> Say maybe faster processing with hardware(i couldn’t think how that could be possible).

Before we begin, I want to make sure you understand something. Graphics programmers have been using matrices for transformations for at least 3 decades now. Graphics hardware has had matrix transformations built into it, in one way or another, for a decade. People who are paid a lot of money to make things faster have decided that this is the best way of doing things.

So before you start questioning whether people with Ph.D.s and 6–7 figure salaries can actually do their jobs, consider that you perhaps don’t know enough about the subject at hand.

OK, let’s start with a simple problem. You have some vertices and you want to transform them from model space to camera space. All of these vertices will be transformed by the same transformation sequence. So we have one arbitrarily big model that will go through some transformation.

Let’s look at the simplest case: translation. Your transformation will simply be some 3D translation operation.

What’s the per-vertex cost with matrices? Using standard 4D vectors and a standard 4x4 matrix, you have 16 floating-point multiplications and 12 floating-point additions. The per-vertex cost of using the regular math directly is 3 floating-point additions. Seems like a loss for the matrix side.

Let’s say we want to do a rotation. And let’s say that this rotation is axis-aligned. So it is a rotation about the X, Y or Z axis. The matrix cost is 16 multiplies and 12 additions. The regular math cost is 4 multiplies and 2 additions. Again, seems like a loss for matrices.

But how about when we do an arbitrary angle/axis rotation, rather than using a cardinal axis? The matrix version is still 16 multiplies, 12 additions. But the regular math version jumps to 9 multiplies and 6 additions. A loss for the matrices, but the gap is dwindling.

Now, doing just a rotation is useless; you almost always want some translation in there, yes? So now we do an angle/axis rotation followed by a translation. What’s the cost? Matrices are still 16 multiplies, 12 additions. But the math version is now 9 multiplies and 9 additions.
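The counts above are easy to check by instrumenting the arithmetic. The following Python sketch is only an illustration (the function names are made up): it performs a full 4x4 times 4D multiply and the "regular math" rotation plus translation on the same input, tallying multiplies and additions as it goes.

```python
def mat4_times_vec4(m, v):
    """Return (result, multiplies, additions) for a full 4x4 * vec4 product."""
    muls = adds = 0
    out = []
    for row in m:
        acc = row[0] * v[0]
        muls += 1
        for k in range(1, 4):
            acc += row[k] * v[k]
            muls += 1
            adds += 1
        out.append(acc)
    return out, muls, adds

def rotate_then_translate(r, t, p):
    """Return (result, multiplies, additions) for 3x3 rotation r, then offset t."""
    muls = adds = 0
    out = []
    for i in range(3):
        acc = r[i][0] * p[0]
        muls += 1
        for k in range(1, 3):
            acc += r[i][k] * p[k]
            muls += 1
            adds += 1
        acc += t[i]  # the translation costs one more addition per component
        adds += 1
        out.append(acc)
    return out, muls, adds

# A 90 degree rotation about Z plus a translation of (1, 2, 3), in both forms.
r = [[0, -1, 0], [1, 0, 0], [0, 0, 1]]
t = [1, 2, 3]
m = [[0, -1, 0, 1], [1, 0, 0, 2], [0, 0, 1, 3], [0, 0, 0, 1]]

print(mat4_times_vec4(m, [1, 0, 0, 1]))        # ([1, 3, 3, 1], 16, 12)
print(rotate_then_translate(r, t, [1, 0, 0]))  # ([1, 3, 3], 9, 9)
```

Both paths land on the same point; only the operation counts differ.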

Fair enough. The regular math wins…

Unless we stop playing around and do something for real.

Consider a hierarchical model of a human. Each transform is relative to its parent transform. Again and again, all the way up to the root. For the typical human figure used in various software, the fingertip transform has about 10 transforms between it and the root (pelvis, lower-spine, mid-spine, upper-spine, clavicle, upper-arm, lower-arm, wrist, finger-joint-1, finger-joint-2, finger-tip).

What’s the cost of doing 10 separate translations+rotations? Well, with regular math, you can’t concatenate transforms. So you have to do each one in turn. The overall cost is therefore 10x the cost of doing one transform (9 multiplies and 9 additions each). So it’s 90 multiplies and 90 additions.

Matrices? Because you can concatenate transforms, it’s just one matrix multiply: 16 multiplies and 12 additions.
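As a rough sketch of that hierarchy case in Python (the joint angles and offsets are invented, and `joint`, `mat_mul` and `mat_vec` are hypothetical helpers), the ten-joint chain can be folded into one matrix once per object. Each vertex then needs a single matrix multiply instead of ten transforms in a row. The step-by-step path here stands in for applying each transform in turn.

```python
import math

def joint(angle, offset):
    """Hypothetical joint transform: rotate about Z, then offset from the parent."""
    c, s = math.cos(angle), math.sin(angle)
    ox, oy, oz = offset
    return [[c, -s, 0.0, ox],
            [s, c, 0.0, oy],
            [0.0, 0.0, 1.0, oz],
            [0.0, 0.0, 0.0, 1.0]]

def mat_mul(a, b):
    """4x4 matrix product a * b."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """4x4 matrix times a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

# Ten made-up joints, root first, fingertip last.
chain = [joint(0.1 * (i + 1), (1.0, 0.5 * i, 0.0)) for i in range(10)]

# Per object, done once: fold the whole chain into one matrix.
tip_to_root = chain[0]
for m in chain[1:]:
    tip_to_root = mat_mul(tip_to_root, m)

def transform_step_by_step(v):
    """Per vertex, the slow way: walk from the fingertip up to the root."""
    for m in reversed(chain):
        v = mat_vec(m, v)
    return v

def transform_concatenated(v):
    """Per vertex, the matrix way: a single 4x4 * vec4 multiply."""
    return mat_vec(tip_to_root, v)

vertices = [[0.0, 0.0, 0.0, 1.0], [1.0, 2.0, 3.0, 1.0], [-0.5, 0.25, 4.0, 1.0]]
results = [(transform_step_by_step(v), transform_concatenated(v)) for v in vertices]
```

Each pair in `results` agrees up to floating-point rounding, so the per-vertex work really does shrink to one multiply once `tip_to_root` exists.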

Now, you may say that it’s not fair. After all, concatenating those transforms takes time, right? But we’re only looking at per-vertex cost. The cost per-object to do this concatenation, the various matrix multiplies to compute the current matrix, is irrelevant. If each object has a large number of vertices, the overall performance will be governed by the per-vertex cost, not the per-object cost.

If you’re just drawing boxes, then the per-object cost may matter. But if you’re drawing something real, then it almost certainly doesn’t. And if you’re drawing boxes… who cares? My embedded HD 3300 can churn out boxes by the thousand.

Not to mention the simple shader complexity issue. Doing a translation followed by a rotation with regular math requires different shader logic from a rotation followed by a translation. That means you need two different vertex shaders to do these two things. If you need scaling, you now need a third vertex shader. To get all possible orderings of a single scale, translation, and rotation, you need six shaders (3! = 6 orderings).

Do you really need your shaders to actually specifically encode the order of transformations? It’s much simpler to just pass matrix data, where the order of operations is encoded in the matrix (TR is not the same matrix as RT).
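A small Python illustration of that point, using made-up integer matrices so the arithmetic stays exact: the hypothetical `vertex_shader` below never changes, and the order of operations lives entirely in which matrix gets passed in. It assumes column vectors, so in `T * R` the rotation is applied first.

```python
def mat_mul(a, b):
    """4x4 matrix product a * b."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def mat_vec(m, v):
    """4x4 matrix times a 4-component column vector."""
    return [sum(m[i][k] * v[k] for k in range(4)) for i in range(4)]

# T moves +5 along X, R turns 90 degrees about Z.
T = [[1, 0, 0, 5], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
R = [[0, -1, 0, 0], [1, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]

def vertex_shader(matrix, position):
    """Stand-in for a vertex shader: it only ever does matrix * position."""
    return mat_vec(matrix, position)

p = [1, 0, 0, 1]
print(vertex_shader(mat_mul(T, R), p))  # rotate, then translate: [5, 1, 0, 1]
print(vertex_shader(mat_mul(R, T), p))  # translate, then rotate: [0, 6, 0, 1]
```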

And even all of that doesn’t change one simple fact: matrix multiplies are really fast.

See, a matrix-vector multiplication is a very simple operation. It is one vector multiply, followed by 3 vector multiply/add operations. It looks like this:

``````
# out = mat * vec; mat.x .. mat.w are the four columns of the matrix
MUL temp, mat.x, vec.xxxx;
MAD temp, mat.y, vec.yyyy, temp;
MAD temp, mat.z, vec.zzzz, temp;
MAD  out, mat.w, vec.wwww, temp;

``````

Each operation is dependent on the previous, but each channel of the values is independent of the last. Because of that, the shader compiler can boil this down into 4 independent sets of 4 multiplies and 3 adds. It can execute each of those in parallel (because that’s what GPUs do). Therefore, this will take no more than 4 cycles to complete (outside of any pipelining tricks and such).
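One way to read those four instructions, assuming `mat.x` through `mat.w` are the matrix columns and each instruction scales one column by the matching component of `vec`, is the Python sketch below. `mul`, `mad` and `splat` are stand-ins for the opcodes and the broadcast, not real API calls. Every list element is one channel, and no channel ever reads another channel’s value, which is what lets the hardware run the four of them side by side.

```python
def mul(a, b):
    """Component-wise multiply of two 4-vectors (one MUL)."""
    return [x * y for x, y in zip(a, b)]

def mad(a, b, c):
    """Component-wise a * b + c (one MAD)."""
    return [x * y + z for x, y, z in zip(a, b, c)]

def splat(s):
    """Broadcast one scalar to all four channels."""
    return [s, s, s, s]

def transform(columns, vec):
    """Matrix * vector as one MUL followed by three MADs."""
    temp = mul(columns[0], splat(vec[0]))
    temp = mad(columns[1], splat(vec[1]), temp)
    temp = mad(columns[2], splat(vec[2]), temp)
    return mad(columns[3], splat(vec[3]), temp)

def transform_by_rows(columns, vec):
    """Reference: the textbook row-times-vector dot products."""
    return [sum(columns[k][i] * vec[k] for k in range(4)) for i in range(4)]

# Columns of a matrix that translates by (5, 6, 7).
columns = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [5, 6, 7, 1]]
print(transform(columns, [1, 2, 3, 1]))  # [6, 8, 10, 1]
```

`transform_by_rows` gives the same answer; it is only there to show that the column-at-a-time form is the ordinary matrix-vector product in a different order.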

While this works just as well for regular math in theory, because regular math is… well, regular math, the compiler has to do more work to optimize it as well as this. So you need to spend time getting the transform operation written in exactly the form that the compiler will recognize and optimize. Compilers know what M*v means, and they’re good at optimizing that. Optimizing more arbitrary code is not as foolproof.

And let’s say you do get it perfectly correct. Let’s say you get the angle/axis + translation optimized perfectly, just like the matrix multiplication case. What do you save?

In many cases, nothing.

Doing a generalized angle/axis transformation is mathematically no different from doing a 3x3 matrix multiply against a 3D vector. You just don’t have that fourth component getting in there. So it’s 1 MUL and 2 MADs, but on 3D rather than 4D vectors. And you need one more opcode to do the addition for the translation, again on 3D coordinates. So it’s 1 MUL, 2 MADs, and one ADD, all 3D instead of 4D.

So, with each opcode, there is the chance to do something with that 4th component (that’s how shaders work on GPUs).

If you’re talking about NVIDIA hardware from the GeForce 8xxx series or above, or ATI hardware of the new Southern Islands family, then they can find something to do with the 4th component fairly often. They are really scalar hardware that can execute 4 separate opcodes on a single shader.

However, if you’re looking at any pre-Southern Islands ATI GPU, or any GeForce 7xxx or below, then you’re in trouble. These are vector hardware, so each component of each opcode pretty much has to be doing the same thing. So each operation is 4D, even if you don’t do anything with the fourth component. So unless you have some scalar operation somewhere later (that isn’t dependent on the result of this) which needs some MAD, MUL, or ADD work, that fourth component will go unused.

In short: 3D math costs you just as much as 4D math on that hardware. Even on the scalar hardware, if you don’t have any scalar work (or maybe just some vector input-to-output copying), the compiler’s scheduler is going to have a hard time finding a way to put those extra 4 scalar opcodes to good use.

Can you make regular math transformations faster than matrix math? Yes. Particularly for simple 2D transforms. Should you do this in the general case? Absolutely not.

This, basically.

This is something that people have been doing for close on 20 years. It works. It works well. It’s been proven to work and work well. It’s the way graphics hardware works. So don’t sweat the details; one’s own hunch about things cannot stand up to close on 20 years of experience and evidence of what actually works.

Also, matrices have been used for almost two centuries (not decades!) to describe geometric transformations in mathematics, and basically nobody has found anything better yet.

quaternions?

(Unit) quaternions can only represent rotations.

@Alfonse Reinheart

>> So before you start questioning whether people with Ph. D’s and 6–7 figure salaries can actually do their jobs, consider that you perhaps don’t know enough about the subject at hand.

I’m sorry, I never meant to question anyone! I’m just a noob and I wanted to understand the reason for the usage of matrices. Also, if I knew the subject well enough, why would I be asking here?

>> i couldn’t think how that could be possible

By posting that, all I meant was that I didn’t know how it’s done in hardware. It was a genuine question. I don’t know why it led you to think that I was offending anyone.
It was an excellent explanation though. Precisely what I was looking for. Thank you.