# OpenGL Projection matrix to intrinsic matrix

Hi

I was wondering: how exactly do I convert a 4x4 Projection matrix used in OpenGL to a 4x3 intrinsic matrix (see code snippet)?

``````
|FMx	s	Px	0|
|0	FMy	Py	0|
|0	0	1	0|

``````

What is an “intrinsic matrix”? I Google’d the term but didn’t find anything.

In any case, you cannot convert a perspective projection matrix into a 4x3 matrix. You need all four rows to be able to transform into a 4D homogeneous coordinate system.

Do you mean this?
http://en.wikipedia.org/wiki/Camera_resectioning

http://www.opengl.org/discussion_boards/…2205#Post272205

@Ludde: Yes, that is what I meant

@Abdallah: the discussion is similar, but not the same. I could use the

`P = glFrustum(zNear*cx/fx, -zNear*(w-cx)/fx, zNear*cy/fy, -zNear*(h-cy)/fy, zNear, zFar)`

transformation, but I don’t know what the cx, cy, fx and fy values are. I only know the projection matrix, which I obtained by doing

``````
gl.glGetFloatv(gl.GL_PROJECTION_MATRIX, stack, 0);

``````

Well, he can convert, by for example, snipping away the bottom row, maybe he does not need it. By ‘intrinsic’ he probably means the good old C array float matrix[][];

Anyway, generally you shouldn’t query the GL state an application sets itself. Cache it somewhere.

I have more or less solved the issue by following the steps described on this website: http://www.songho.ca/opengl/gl_transform.html

My x coordinates are now OK; my y coordinates, however, remain wrong, with no apparent pattern to fix them. Does anyone know about any common issues with this?

These are my results:

Extracted results from OpenGL:
1 = 73.0, 363.0
2 = 117.0, 444.0
3 = 290.0, 362.0
4 = 464.0, 405.0

Calculated results
1: x = 70.26506249042694, y = 195.04359793445985
2: x = 114.41204999234157, y = 110.08719586891971
3: x = 291.0, y = 195.04359793445985
4: x = 467.58795000765843, y = 152.56539690168978

Yea, the common issue might be, that you’re providing GL with a row-major matrix, but need to provide a column-major one. Here’s some code for ya:

``````
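// Fills `matrix` with the finite perspective projection that glFrustum
// documents, indexed as matrix(row, column). GetL/GetR/GetB/GetT/GetN/GetF
// read the left, right, bottom, top, near and far planes from `frustum`.
template <class T>
inline void finite_frustum_perspective_projection_matrix(T const frustum[],
    Matrix<T, 4, 4>& matrix)
{
    // Near and far distances must both be positive.
    BOOST_ASSERT((GetN(frustum) > 0) && (GetF(frustum) > 0));

    matrix = 0;

    // x and y scale, plus the off-centre terms in the third column.
    matrix(0, 0) = 2 * GetN(frustum) / (GetR(frustum) - GetL(frustum));
    matrix(0, 2) = (GetR(frustum) + GetL(frustum)) /
        (GetR(frustum) - GetL(frustum));
    matrix(1, 1) = 2 * GetN(frustum) / (GetT(frustum) - GetB(frustum));
    matrix(1, 2) = (GetT(frustum) + GetB(frustum)) /
        (GetT(frustum) - GetB(frustum));

    // Depth mapping into the clip volume.
    matrix(2, 2) = (GetF(frustum) + GetN(frustum)) /
        (GetN(frustum) - GetF(frustum));
    matrix(2, 3) = 2 * GetF(frustum) * GetN(frustum) /
        (GetN(frustum) - GetF(frustum));

    // Copies -z_eye into clip w for the perspective divide.
    matrix(3, 2) = -1;
}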

``````

Try this with tvmet, but probably transpose, before you upload.

Also, check out http://library.nu/ .

I doubt it, since I’m not actually using row-major or column-major matrices for this, I’m using a Matrix framework that takes care of all matrix calculations for me. I’ll show you my code, so you can understand what’s going on (it’s in Java btw, I’m using JOGL):

``````
double[][] modelview = new double[][]{
{1,	0,	0,	-5},
{0,	1,	0,	-5},
{0,	0,	1,	-25},
{0,	0,	0,	1}};

Matrix v = new Matrix( modelview );

double[][] proj = new double[][]{
{3.732051,	0,		0,		0},
{0,		3.732051,	0,		0},
{0,		0,		-1.00001,	-0.400002},
{0,		0,		-1,		0}};

Matrix p = new Matrix( proj );

Matrix eye = v.times( vertex );

Matrix clip = p.times( eye );

clip.set(0, 0, clip.get(0,0) / clip.get(2, 0) );
clip.set(1, 0, clip.get(1,0) / clip.get(2, 0) );
clip.set(2, 0, clip.get(2,0) / clip.get(2, 0) );

double x = 0;
double y = 0;
double w = 582;
double h = 560;
double n = 0.2;
double f = 40000;

double[][] screen = new double[][]{
{ ( (w/2)*clip.get(0,0) ) + (x + (w/2) ) },
{ ( (h/2)*clip.get(1,0) ) + (y + (h/2) ) },
{ ( ( (f-n) /2)*clip.get(2,0) ) + ( (f+n) /2 ) }};
Matrix screenPoint = new Matrix( screen );

``````

You obviously initialize your matrices as row-major matrices, that’s about all I see. Don’t know what JOGL does internally. This looks suspicious to me, but maybe that’s just me:

``````
clip.set(2, 0, clip.get(2,0) / clip.get(2, 0) );

``````

I did some searching and it turns out you were right, Ugluk: I was normalising my clip matrix incorrectly, dividing by the z component instead of the w component. I have fixed that now:

``````
clip.set(2, 0, clip.get(2,0) / clip.get(3, 0) );
``````

I also discovered that I was using a row-major matrix as the modelview matrix and a column-major one as the projection matrix. I fixed those too.

This is how I extract the data, with the results

``````
float mv[] = new float[16];
gl.glGetFloatv(gl.GL_MODELVIEW_MATRIX, mv, 0);
// Result: [1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, -5.0, -5.0, -25.0, 1.0]

float pr[] = new float[16];
gl.glGetFloatv(gl.GL_PROJECTION_MATRIX, pr, 0);
// Result: [3.732051, 0.0, 0.0, 0.0, 0.0, 3.732051, 0.0, 0.0, 0.0, 0.0, -1.00001, -1.0, 0.0, 0.0, -0.400002, 0.0]

``````
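Since `glGetFloatv` fills the array column by column, the flat results above have to be rearranged before they can be passed to a constructor that expects `double[row][column]`. The following is a minimal sketch of that rearrangement. The class and method names are hypothetical, and it assumes the `Matrix` class in this thread indexes its 2D array as rows first, which the thread does not state explicitly.

```java
// Hypothetical helper, not part of JOGL.
final class GlMatrixLayout {

    /**
     * Converts a column-major float[16], as filled by glGetFloatv,
     * into a 4x4 array indexed as m[row][col].
     */
    static double[][] toRowCol(float[] gl) {
        if (gl.length != 16) {
            throw new IllegalArgumentException("expected 16 values");
        }
        double[][] m = new double[4][4];
        for (int col = 0; col < 4; col++) {
            for (int row = 0; row < 4; row++) {
                // Element (row, col) sits at offset col * 4 + row.
                m[row][col] = gl[col * 4 + row];
            }
        }
        return m;
    }
}
```

Applied to the `mv` array above, this puts the translation (-5, -5, -25) in the last column, and applied to `pr` it puts -1 in the bottom row, which is the layout needed when the matrix multiplies a column vector from the left.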

This is my current calculation

``````
double[][] modelviewRowMajor = new double[][]{
{1,	0,	0,	0},
{0,	1,	0,	0},
{0,	0,	1,	0},
{-5,	-5,	-25,	1}};
double[][] modelviewColumnMajor = new double[][]{
{1,	0,	0,	-5},
{0,	1,	0,	-5},
{0,	0,	1,	-25},
{0,	0,	0,	1}};
Matrix v = new Matrix( modelviewColumnMajor ); // or modelviewRowMajor

double[][] projectionRowMajor = new double[][]{
{3.732051,	0,		0,		0},
{0,		3.732051,	0,		0},
{0,		0,		-1.00001,	-1},
{0,		0,		-0.400002,	0}};
double[][] projectionColumnMajor = new double[][]{
{3.732051,	0,		0,		0},
{0,		3.732051,	0,		0},
{0,		0,		-1.00001,	-0.400002},
{0,		0,		-1,		0}};
Matrix p = new Matrix( projectionColumnMajor ); // or projectionRowMajor

Matrix eye = v.times( vertex );
Matrix clip = p.times( eye );

double[][] normalisedClipMatrix = new double[][]{
{clip.get(0,0) / clip.get(3, 0)},
{clip.get(1,0) / clip.get(3, 0)},
{clip.get(2,0) / clip.get(3, 0)},
{clip.get(3,0) / clip.get(3, 0)}};
Matrix normalised = new Matrix( normalisedClipMatrix );

double x = 0;
double y = 0;
double w = 582;
double h = 560;
double n = 0.2;
double f = 40000;

double[][] screen = new double[][]{
{ ( (w/2)*normalised.get(0,0) ) + (x + (w/2) ) },
{ ( (h/2)*normalised.get(1,0) ) + (y + (h/2) ) },
{ ( ( (f-n) /2)*normalised.get(2,0) ) + ( (f+n) /2 ) }};
Matrix screenPoint = new Matrix( screen );

``````
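For comparison, here is a minimal library-free sketch of the same chain of steps (object to eye, eye to clip, perspective divide, viewport transform), following the order that `gluProject` documents. The class and method names are hypothetical. Two assumptions are worth flagging: the matrices are indexed as `m[row][col]` and multiply column vectors, and the window depth is scaled with the default `glDepthRange(0, 1)` rather than with the near and far clip distances.

```java
// Hypothetical helper; no OpenGL context is needed to run it.
final class ProjectPoint {

    /** Multiplies a 4x4 matrix stored as m[row][col] by a 4-component vector. */
    static double[] mul(double[][] m, double[] v) {
        double[] r = new double[4];
        for (int row = 0; row < 4; row++) {
            for (int col = 0; col < 4; col++) {
                r[row] += m[row][col] * v[col];
            }
        }
        return r;
    }

    /**
     * Returns {winX, winY, winZ} for an object-space point {x, y, z}.
     * viewport is {x, y, width, height}; winY is measured from the
     * bottom edge of the window, as in OpenGL.
     */
    static double[] project(double[][] modelview, double[][] projection,
                            double[] viewport, double[] point) {
        double[] eye = mul(modelview, new double[] {point[0], point[1], point[2], 1.0});
        double[] clip = mul(projection, eye);

        // Perspective divide: every component is divided by clip w (index 3).
        double ndcX = clip[0] / clip[3];
        double ndcY = clip[1] / clip[3];
        double ndcZ = clip[2] / clip[3];

        // Viewport transform from [-1, 1] to pixels.
        double winX = viewport[0] + viewport[2] * (ndcX + 1.0) / 2.0;
        double winY = viewport[1] + viewport[3] * (ndcY + 1.0) / 2.0;
        // Depth, assuming the default glDepthRange(0, 1).
        double winZ = (ndcZ + 1.0) / 2.0;
        return new double[] {winX, winY, winZ};
    }
}
```

With the modelview and projection matrices from this thread and a 582 x 560 viewport at the origin, a point that ends up on the camera axis lands at the viewport centre (291, 280), which is a quick sanity check for the matrix layout.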

Because of this, my x-coordinates are slightly more accurate, but my y-coordinates are still very wrong.

Actual:
pos 1: 73.0, 363.0
pos 2: 117.0, 444.0
pos 3: 290.0, 362.0
pos 4: 464.0, 405.0

Calculated:
pos 1: x = 73.79463179999999, y = 196.40205759999998
pos 2: x = 117.23570544, y = 112.80411519999998
pos 3: x = 291.0, y = 196.40205759999998
pos 4: x = 464.76429456, y = 154.6030864
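
One pattern in these numbers may be worth checking, although the thread never states what the actual mistake was: each calculated y is within a few pixels of `h` minus the measured y (for pos 1, 560 - 196.40 = 363.60 against a measured 363.0). That is what one would expect if the measured positions use a top-left origin while the calculation uses OpenGL's bottom-left origin. A minimal sketch of the conversion, with a hypothetical helper name:

```java
// Hypothetical helper: converts between a bottom-left and a top-left
// window origin. The same formula works in both directions.
final class WindowYFlip {

    /**
     * @param y              window y measured from one horizontal edge
     * @param viewportHeight viewport height in pixels
     * @return the same position measured from the opposite edge
     */
    static double flip(double y, double viewportHeight) {
        return viewportHeight - y;
    }
}
```

Calling `WindowYFlip.flip(196.40205759999998, 560)` gives roughly 363.6, and the other three calculated values come out within about 3 pixels of the measured ones.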

You must have other problems somewhere. Keep trying and you’ll eventually see what’s wrong. I wouldn’t trust the library you’re using as much as you do.

Hi

I managed to find the mistake and correct it, and I now get correct results. Unfortunately, the result is no good, as I’m not using the correct method (it’s part of a bigger algorithm, just go with it). So basically I’m back to square one, and I’ll try to explain better what I want to do.

Here’s an image of the calculation I have to do:

In short: I’m trying to calculate the screen position of point P in 3D from the current camera position in the 3D world.

The intrinsic camera parameters are internal camera properties (different for every camera). The extrinsic camera properties are the camera translation (t) and rotation (R).

• the 3D position of point P is KNOWN

• the camera position and rotation (matrix D) are KNOWN

• what I need: the intrinsic camera matrix

The problem here is that OpenGL is not an actual camera, and its conventions are slightly different from those of an actual camera. So I have no idea how to do this. As far as I know, the values I need are “hidden” somewhere in the Projection matrix, but I have no idea how to correctly extract them.

ANY help is appreciated…
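
One way to read the intrinsic values out of the projection matrix is to combine the perspective divide with the viewport transform. For a perspective matrix whose bottom row is (0, 0, -1, 0), writing d = -z_eye for the distance in front of the camera gives u = fx * x/d + s * y/d + cx and v = fy * y/d + cy in window pixels. The sketch below is a hedged illustration of that derivation, not an established API: the class name is hypothetical, the viewport is assumed to start at (0, 0), and the result uses OpenGL's conventions (camera looking down -z, y up, origin at the bottom-left), so sign flips may be needed to match a computer-vision camera model.

```java
// Hypothetical helper, not part of JOGL.
final class IntrinsicsFromProjection {

    /**
     * @param p column-major 4x4 projection matrix, as filled by glGetFloatv
     * @param w viewport width in pixels
     * @param h viewport height in pixels
     * @return {fx, fy, s, cx, cy} in pixels, bottom-left window origin
     */
    static double[] intrinsics(float[] p, double w, double h) {
        double p00 = p[0];  // row 0, column 0
        double p01 = p[4];  // row 0, column 1 (zero for glFrustum matrices)
        double p02 = p[8];  // row 0, column 2
        double p11 = p[5];  // row 1, column 1
        double p12 = p[9];  // row 1, column 2

        double fx = p00 * w / 2.0;          // focal length in x
        double fy = p11 * h / 2.0;          // focal length in y
        double s  = p01 * w / 2.0;          // skew
        double cx = (1.0 - p02) * w / 2.0;  // principal point x
        double cy = (1.0 - p12) * h / 2.0;  // principal point y
        return new double[] {fx, fy, s, cx, cy};
    }
}
```

For the projection matrix quoted earlier in this thread and a 582 x 560 viewport, this yields fx of about 1086.03, fy of about 1044.97, zero skew, and a principal point at (291, 280).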

Based on your diagram, the “extrinsic” camera matrix (D) is the VIEWING transform (in OpenGL terms).

And the “intrinsic” camera matrix (K) is some kind of PROJECTION transform (probably perspective). It doesn’t match OpenGL’s perspective projection transform (see here, for instance), but from the form you cite, I suspect you’re just trying to omit the screen-space Z (depth) transform and only do X and Y.

In any case, hopefully this helps you map the concepts to what OpenGL calls them. If you read “Chapter 3: Viewing” in the OpenGL Programming Guide, this will all become even more clear (you can browse this on-line here).

Once you determine the PROJECTION matrix you want, you can multiply these VIEWING and PROJECTION matrices with your VIEWPORT transform and take your 3D points from world to screen coordinates. gluProject is one function that will do this for you, though it’s not hard to do it yourself.

gluPerspective is a convenience function to build a PROJECTION matrix that takes the camera parameters probably more in terms you’re used to thinking about: vertical field-of-view (FOV) and aspect ratio. Here’s some code that computes the projection matrix values from those inputs:
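
A minimal sketch of that computation, following the matrix that the `gluPerspective` documentation describes. This is an illustration with hypothetical names, not the listing from the original post, and it returns the matrix indexed as `m[row][col]` rather than in OpenGL's flat column-major layout.

```java
// Hypothetical helper, not part of JOGL or GLU.
final class PerspectiveMatrix {

    /**
     * @param fovyDegrees vertical field of view in degrees
     * @param aspect      viewport width divided by height
     * @param zNear       distance to the near plane (positive)
     * @param zFar        distance to the far plane (positive)
     * @return 4x4 projection matrix indexed as m[row][col]
     */
    static double[][] perspective(double fovyDegrees, double aspect,
                                  double zNear, double zFar) {
        // f is the cotangent of half the vertical field of view.
        double f = 1.0 / Math.tan(Math.toRadians(fovyDegrees) / 2.0);

        double[][] m = new double[4][4];
        m[0][0] = f / aspect;
        m[1][1] = f;
        m[2][2] = (zFar + zNear) / (zNear - zFar);
        m[2][3] = 2.0 * zFar * zNear / (zNear - zFar);
        m[3][2] = -1.0;  // copies -z_eye into clip w
        return m;
    }
}
```

As a cross-check, `perspective(30, 1, 0.2, 40000)` reproduces the values of the projection matrix quoted earlier in this thread (3.732051, -1.00001 and -0.400002) to the printed precision.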

You’re absolutely right, but the thing is: this calculation is part of a larger algorithm where the camera rotation and position are the input variables, so gluProject is absolutely no good to me.

I know I can extract the modelview matrix using glLoadMatrix(), but as I understand it, the modelview matrix is the view matrix multiplied with the model matrix, so I’m not sure if that’s any good to me

Ok, so from those inputs (R and t presumably), you can completely determine the VIEWING transform, aka matrix D.

I know I can extract the modelview matrix using glLoadMatrix(),…

I think you mean “glGetFloatv(GL_MODELVIEW_MATRIX, matrix)”.

…but as I understand it, the modelview matrix is the view matrix multiplied with the model matrix, so I’m not sure if that’s any good to me

Well, from the above, sounds like you know R and t, and thus can compute the VIEWING transform (matrix D). But if that assumption is incorrect…

…regardless of whether you or some other software is in control of the MODELVIEW matrix, typically the VIEWING matrix is pushed on first (e.g. via gluLookAt, or some app-specific functionality), and then various MODELING transforms are multiplied on top.

So just grab the MODELVIEW matrix contents right after the VIEWING transform is loaded but before any MODELING transforms are multiplied on along with it. If someone else is loading the VIEWING transform, get them to give it to you, or call you so you can get it right after they load it.

Thank you, Dark Photon, that’s very helpful.

Now a follow-up question: say, I have a scene with 20 objects and one camera. The viewing matrix is known, as are the positions of the objects in the scene.

If I understand it correctly, the model matrix is different for each object. Hence, if I take 4 objects at random after rendering, there is no way for me to know what the model matrices (and as a consequence also the modelview matrix) of these objects are, correct? Or is there some way of estimating this based on their positions in the 3D world?

Often, yes.

Hence, if I take 4 objects at random after rendering, there is no way for me to know what the model matrices (and as a consequence also the modelview matrix) of these objects are, correct? Or is there some way of estimating this based on their positions in the 3D world?

It sounds like someone else is tweaking the MODELVIEW stack, and you’re trying to figure out from watching it what’s going on (?)

Do you know the initial VIEWING matrix? And do you have access to query the MODELVIEW matrix between each object render?

If so then you can compute each object’s MODELING transform if desired. MODELVIEW = VIEWING * MODELING. So at any given time, you can compute the MODELING transform using VIEWING^-1 * MODELVIEW = MODELING.
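
The relation above can be sketched as follows. The names are hypothetical, and the inverse used here is only valid under the assumption that the VIEWING matrix is rigid (a rotation plus a translation, with no scale or shear), which holds for matrices built by `gluLookAt` but not in general.

```java
// Hypothetical helper; matrices are indexed as m[row][col].
final class ModelFromModelview {

    /** Multiplies two 4x4 matrices. */
    static double[][] mul(double[][] a, double[][] b) {
        double[][] r = new double[4][4];
        for (int i = 0; i < 4; i++) {
            for (int j = 0; j < 4; j++) {
                for (int k = 0; k < 4; k++) {
                    r[i][j] += a[i][k] * b[k][j];
                }
            }
        }
        return r;
    }

    /** Inverts a rigid transform [R t; 0 1] as [R^T, -R^T t; 0 1]. */
    static double[][] invertRigid(double[][] m) {
        double[][] inv = new double[4][4];
        for (int i = 0; i < 3; i++) {
            for (int j = 0; j < 3; j++) {
                inv[i][j] = m[j][i];             // transpose the rotation
                inv[i][3] -= m[j][i] * m[j][3];  // accumulate -(R^T t)
            }
        }
        inv[3][3] = 1.0;
        return inv;
    }

    /** MODELING = VIEWING^-1 * MODELVIEW. */
    static double[][] modeling(double[][] viewing, double[][] modelview) {
        return mul(invertRigid(viewing), modelview);
    }
}
```

If the VIEWING matrix may contain scale, a general 4x4 inverse would be needed instead of `invertRigid`.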

Not exactly, but you’re close. Like I said, this is part of a bigger algorithm. The ultimate goal is to observe a scene with a camera, identify several objects in the scene and use those objects to calculate the camera position in the scene. Basically, what’s happening is that I’m rendering a scene with OpenGL, after which I need to replicate this process manually in order to calculate the camera position.

Short version:

``````
1. render scene
2. make view (screenshot)
3. identify key objects
4. calculate camera position
   A. set camera position (educated guess - translation and rotation)
   B. calculate view (screen) positions of objects from guessed camera position
   C. calculate error between measured result (step 2) and calculated result (step B)
   D. adjust camera position according to measured error in step C
   E. repeat until result is found or error is below threshold

``````
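
Steps B and C of the outline can be sketched as a reprojection error: project each key object with the guessed viewing matrix and sum the squared pixel distances to the measured positions. This is a minimal illustration with hypothetical names. It assumes each key object is represented by a single known world-space point, in which case only the viewing matrix is needed and no per-object model matrix enters the calculation. How step D updates the guess is left open, since the thread does not specify an optimiser.

```java
// Hypothetical helper; matrices are indexed as m[row][col].
final class ReprojectionError {

    /** Multiplies a 4x4 matrix by a 4-component column vector. */
    static double[] mul(double[][] m, double[] v) {
        double[] r = new double[4];
        for (int row = 0; row < 4; row++) {
            for (int col = 0; col < 4; col++) {
                r[row] += m[row][col] * v[col];
            }
        }
        return r;
    }

    /** Step B: window-space {x, y} of a world-space point (bottom-left origin). */
    static double[] toScreen(double[][] viewing, double[][] projection,
                             double width, double height, double[] world) {
        double[] eye = mul(viewing, new double[] {world[0], world[1], world[2], 1.0});
        double[] clip = mul(projection, eye);
        double x = (clip[0] / clip[3] + 1.0) * width / 2.0;
        double y = (clip[1] / clip[3] + 1.0) * height / 2.0;
        return new double[] {x, y};
    }

    /** Step C: sum of squared pixel distances between calculated and measured points. */
    static double sumSquaredError(double[][] viewing, double[][] projection,
                                  double width, double height,
                                  double[][] worldPoints, double[][] measured) {
        double sum = 0.0;
        for (int i = 0; i < worldPoints.length; i++) {
            double[] s = toScreen(viewing, projection, width, height, worldPoints[i]);
            double dx = s[0] - measured[i][0];
            double dy = s[1] - measured[i][1];
            sum += dx * dx + dy * dy;
        }
        return sum;
    }
}
```

The error is zero when the guessed viewing matrix reproduces the measured positions exactly, and it grows as the guess moves away, so it can serve as the quantity that steps D and E try to drive below a threshold.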

This means I have the viewing matrix (the educated guess), but not the model matrices of the objects. In theory I could store these for each object, and if this proves to be the easiest solution I will probably do so. It’s just not a very logical way of doing it, which is why I’m trying to avoid it.