Camera matrix definition

I have been studying computer vision for a while and now I am approaching OpenGL for the first time. I am trying to develop a small application, but there is a mismatch between what I studied and what I see in the OpenGL definitions.

Until now, I have been dealing with a so-called “camera matrix”, usually denoted P (ref: Multiple View Geometry in Computer Vision, R. Hartley and A. Zisserman, Cambridge University Press, 2000; and many other books). This is a 3x4 matrix that performs the projection from 3D points to 2D projected points. Since describing a 3D point in homogeneous coordinates takes a 4-vector, and a 2D point takes a 3-vector, the P matrix is 3x4.
If X is the 3D point defined as X=(x,y,z,w) and x is its projection onto a plane defined as x=(x,y,w), then x=PX.
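As a concrete illustration of x=PX, here is a minimal sketch in plain Python; the P used here is an arbitrary example (focal length 2, principal point at the origin, identity pose), not taken from any real calibration:

```python
# Minimal illustration of x = P X with a 3x4 camera matrix P.
# P is an arbitrary example, not a real calibration.

def matvec(M, v):
    """Multiply a matrix (list of rows) by a column vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

f = 2.0
P = [
    [f, 0, 0, 0],
    [0, f, 0, 0],
    [0, 0, 1, 0],
]

X = [1.0, 2.0, 4.0, 1.0]         # 3D point in homogeneous coordinates
x = matvec(P, X)                 # homogeneous 2D point (x, y, w)
u, v = x[0] / x[2], x[1] / x[2]  # divide by w for image coordinates
print((u, v))                    # -> (0.5, 1.0)
```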

Now, as far as I have seen, in OpenGL there are only 4x4 matrices, defining rotations, scalings, translations and projections. What is the relationship between the camera matrix P that I'm used to using and the OpenGL matrices? And why is even the projection matrix a 4x4 matrix?

Thank you


I guess that even if not all matrices need to be 4x4, they are kept this way to be completely generic and invertible.

For the camera matrix it's fine to store it internally as 4x3 to save some space, since its last row is always (0, 0, 0, 1). The projection matrix, on the other hand, can have meaningful values in all 16 fields and has to be 4x4. To make it possible to multiply them and get a ModelViewProjection, you have to provide the camera and world matrices as 4x4 as well. Also, as ZbuffeR said, you can only invert square matrices, and internally the driver does many things with the matrices you provide.
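To make that composition concrete, here is a minimal sketch in plain Python; the model, view and projection matrices below are placeholders, not from a real scene:

```python
# Sketch: composing ModelViewProjection from 4x4 matrices.
# All three matrices must be 4x4 so the product is defined.

def matmul(A, B):
    """Multiply two 4x4 matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]

# A translation by (3, 0, 0), written row-major here.
model = [[1, 0, 0, 3],
         [0, 1, 0, 0],
         [0, 0, 1, 0],
         [0, 0, 0, 1]]

view = identity          # camera at the origin, for simplicity
projection = identity    # placeholder projection

mvp = matmul(projection, matmul(view, model))
print(mvp[0])  # first row -> [1.0, 0.0, 0.0, 3.0]
```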

And if that is not sufficient, I suggest you read paragraph 2.12, “Coordinates transformation”, of the OpenGL specification, which explains clearly and briefly the OpenGL philosophy of transforming coordinates: the modelview, projection and viewport transformation matrices…

I’ve already read all that documentation, but I think the real answer is that when you approach the problem from a purely mathematical point of view, you don’t have to consider the occlusion problem. Several 3D points can have their projection in the same 2D point, and this is perfectly fine as far as the math is concerned.
Obviously, when OpenGL needs to render the 3D objects, it needs to know what lies in front of what. What they did is treat even the projection as a 3D point, where the z component has no meaning for the projection itself, but is used to determine what must be displayed and what is occluded. And this is the use of the z-buffer.

Am I right?

If so, the only thing I have to do is slightly change my mathematical description by adding an extra row to my projection matrix. I have to check whether this has any implication for the way I recovered the camera matrix P.

Thanks a lot

The z coordinate after the viewport transformation of all fragments at a particular screen position is compared to remove hidden parts. From a mathematical viewpoint, I mean, as long as you don’t actually want to rasterize 3D objects, the z coordinate is useless in screen coordinates, and that is why you don’t take the z coordinate into account in your 2D coordinates.

But in practice, you don’t live in a wonderful world where everything works fine. When you rasterize a 3D object, you have to know which part of it is in front of another one… and you can only do that with the depth buffer system. That is why OpenGL keeps this z coordinate even in screen coordinates: to perform the depth comparison afterwards.
Now, all this has nothing to do with homogeneous coordinates. The 4x4 matrix trick for 3D transformations is just used to be able to represent all 3D transformations in one matrix, instead of maintaining separate matrices for rotation and translation, for instance.
In conclusion, you are right, but your 3x4 matrix model is impractical for the reasons stated above, and as you said, you need to port this model to the OpenGL one.
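A minimal sketch of that 4x4 trick, with an arbitrary rotation and translation chosen for illustration:

```python
# Sketch: one 4x4 homogeneous matrix holds both a rotation and a
# translation, so a single multiply applies them together.
import math

t = math.radians(90)
# 90-degree rotation about z combined with a translation by (5, 0, 0),
# written row-major with the translation in the last column.
M = [
    [math.cos(t), -math.sin(t), 0, 5],
    [math.sin(t),  math.cos(t), 0, 0],
    [0,            0,           1, 0],
    [0,            0,           0, 1],
]

p = [1.0, 0.0, 0.0, 1.0]  # point (1, 0, 0) in homogeneous form
q = [sum(m * x for m, x in zip(row, p)) for row in M]
print([round(c, 6) for c in q])  # -> [5.0, 1.0, 0.0, 1.0]
```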

If a translation is the identity matrix with m13, m14 & m15 representing the x, y & z translation, what do m4, m8 & m12 represent?

In one piece of code, an orbitEye point was set in m13, m14 & m15, and an orbitPoint in m4, m8 & m12, but that is all I have found regarding use of the 4th row.
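For reference, the 1-based m1..m16 naming above maps onto OpenGL's column-major storage like this (a sketch of the index arithmetic, nothing more):

```python
# Sketch: OpenGL stores a 4x4 matrix column-major in a flat array of
# 16 values. With 1-based names m1..m16, element m_i sits at
# row = (i-1) % 4 + 1, column = (i-1) // 4 + 1. So m13, m14, m15 are
# the x, y, z translation (the 4th column), while m4, m8, m12 are the
# first three entries of the *bottom row*, where projective terms live.

def row_col(i):
    """Map a 1-based flat index m_i to (row, column), both 1-based."""
    return (i - 1) % 4 + 1, (i - 1) // 4 + 1

print(row_col(13), row_col(14), row_col(15))  # -> (1, 4) (2, 4) (3, 4)
print(row_col(4), row_col(8), row_col(12))    # -> (4, 1) (4, 2) (4, 3)
```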

Thanks for any clarification.

AFAIK, the 4th row is useful for computing projections. Here you can find a projection example set up with the glFrustum function, and in that function the fourth row is used to divide all coordinates by -z. Remember that all homogeneous coordinates are then divided by their fourth component (w) to get back to 3D coordinates.
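Here is a sketch of the matrix glFrustum builds (per the OpenGL spec); the frustum bounds and the eye-space point are arbitrary examples. The fourth row (0, 0, -1, 0) is what makes w_clip equal to -z_eye, so the later divide-by-w is exactly the divide-by-(-z) mentioned above:

```python
# Sketch of the glFrustum matrix. The fourth row (0, 0, -1, 0)
# makes w_clip = -z_eye.

def frustum(l, r, b, t, n, f):
    return [
        [2*n/(r-l), 0,          (r+l)/(r-l),  0],
        [0,         2*n/(t-b),  (t+b)/(t-b),  0],
        [0,         0,         -(f+n)/(f-n), -2*f*n/(f-n)],
        [0,         0,         -1,            0],
    ]

M = frustum(-1.0, 1.0, -1.0, 1.0, 1.0, 10.0)
eye = [0.5, 0.5, -2.0, 1.0]   # arbitrary point in eye space
clip = [sum(m * x for m, x in zip(row, eye)) for row in M]
print(clip[3] == -eye[2])     # -> True: w_clip equals -z_eye
```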

About cameras (a direct application is billboards), try to see this helpful link…

The author helped me a lot …


I recently battled with this same issue. The way I ultimately solved it was pretty ad hoc, but it did get the job done.

First, the difference between the multiplication conventions in OpenGL and H&Z must be rectified by transposing the camera matrix, so it is really 4x3 rather than 3x4.

To create a 4x4 matrix (which OpenGL requires), I inserted a new column between columns 2 and 3. This means that row 1 in your solved matrix becomes column 1 in the OpenGL matrix, row 2 becomes column 2, and row 3 becomes column 4.

Now for the ad hoc part. The best results I could muster came when I set the 3rd column of the model-view-projection matrix to (0, 0, -0.05, 0). I also tried (0, 0, 1, 0), which was a total failure, and (0, 0, x, 0), where x is the (3,3) entry of the original camera matrix. This worked OK, but gave some clipping issues.
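A sketch of the steps described above in plain Python, with a placeholder 3x4 P and the ad hoc (0, 0, -0.05, 0) depth row from this post; a principled decomposition of P into intrinsics plus a glFrustum-style projection would be the more robust route:

```python
# Sketch: turn a 3x4 H&Z camera matrix P into a 4x4 OpenGL matrix by
# inserting a depth row between P's second and third rows, then
# flattening column-major (which performs the transpose step).
# The depth row (0, 0, -0.05, 0) is the ad hoc value from the post,
# not a principled choice; P is a placeholder.

P = [
    [2, 0, 0, 0],
    [0, 2, 0, 0],
    [0, 0, 1, 0],
]

depth_row = [0, 0, -0.05, 0]
M = [P[0], P[1], depth_row, P[2]]   # rows 1, 2, depth, 3

# Flatten column-major, as glLoadMatrix* expects.
gl_matrix = [M[r][c] for c in range(4) for r in range(4)]
print(gl_matrix[:4])  # first column -> [2, 0, 0, 0]
```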

Hope this helps!