OpenGL for computer vision

I’m doing a small project on augmented reality and want to use OpenGL to render an object that is to be inserted into a movie sequence.
Now the question:
How is the camera matrix defined in OpenGL?
How do I convert the parameters of a camera matrix (as defined in computer vision) to OpenGL format?

Hi. I, too, am a vision research kinda guy.

What do you mean by converting a computer vision camera matrix? There is no “standard” computer vision matrix. For example, you might see projection matrices in vision written as the identity, with the understanding that points are projected into focal-length-multiplied coordinates.

There are a number of parameters that vision knows about that aren’t modeled in OpenGL, at least not explicitly. The optical centre in vision is just the axis alignment in OpenGL. Radial distortion isn’t modeled, tho’.

If you want some more thoughts on this, email me.


The correspondence between the transformation matrices
of computer graphics and computer vision

1. Coordinate Transformation in Computer Vision

Without considering lens distortion, for a pinhole camera model
a 3*4 projection matrix M usually appears in the computer vision
literature in the following form:

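$$
Z_e \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}
= M \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
= \begin{bmatrix} f/(dx/dy) & 0 & u_0 & 0 \\ 0 & f & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} & & & t_0 \\ & R & & t_1 \\ & & & t_2 \\ 0 & 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
$$

$$
= \begin{bmatrix} f/(dx/dy) & 0 & u_0 & 0 \\ 0 & f & v_0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} X_e \\ Y_e \\ Z_e \\ 1 \end{bmatrix}
\qquad (1)
$$

[Xw,Yw,Zw,1]^T : world coordinate
[Xe,Ye,Ze,1]^T : eye coordinate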

R : 3*3 rotation matrix
t0,t1,t2 : translation vector
(u0,v0) : optical center in the image
f : effective focal length of the pinhole camera model
dx,dy : the pixel size in the x and y direction

The extrinsic matrix corresponds directly to the MODELVIEW matrix in OpenGL.
To keep the discussion clean, we leave it out from here on and start from
eye coordinates.

From equation 1, we have:
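$$
\begin{cases}
Z_e u = f X_e/(dx/dy) + u_0 Z_e \\
Z_e v = f Y_e + v_0 Z_e
\end{cases}
\;\Rightarrow\;
\begin{cases}
u = \dfrac{f}{dx/dy} \cdot \dfrac{X_e}{Z_e} + u_0 \\
v = f \cdot \dfrac{Y_e}{Z_e} + v_0
\end{cases}
\qquad (2)
$$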
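
To make equation 2 concrete, here is a minimal sketch in plain Python. The function name `project_cv` and the sample numbers are made up for illustration, and f is assumed to be expressed in pixel units already.

```python
def project_cv(Xe, Ye, Ze, f, u0, v0, dx=1.0, dy=1.0):
    """Equation 2: project an eye-space point to pixel coordinates (u, v)."""
    if Ze == 0:
        raise ValueError("point lies in the camera plane (Ze == 0)")
    fx = f / (dx / dy)          # horizontal focal term, as written in equation 1
    u = fx * (Xe / Ze) + u0
    v = f * (Ye / Ze) + v0
    return u, v

# Hypothetical camera: f = 800 pixels, optical center at (320, 240), square pixels.
u, v = project_cv(0.5, -0.25, 2.0, f=800.0, u0=320.0, v0=240.0)
```

With square pixels (dx = dy) the two focal terms coincide, which is the special case discussed at the end of this post.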

2. Coordinate Transformation in Computer Graphics

In this section we consider how the transformation is performed
in OpenGL, the most popular graphics library. As mentioned above, we start
the transformation from eye coordinates.

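$$
\begin{bmatrix} X_c \\ Y_c \\ Z_c \\ W_c \end{bmatrix}
= P \begin{bmatrix} X_e \\ Y_e \\ Z_e \\ 1 \end{bmatrix}
\qquad (3)
$$

$$
\begin{aligned}
u &= (X_c/W_c + 1) \cdot \mathrm{width}/2 + x_0 \\
v &= (Y_c/W_c + 1) \cdot \mathrm{height}/2 + y_0
\end{aligned}
\qquad (4)
$$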

Equation 3 formulates the perspective projection; without loss of
generality, we set the W component of the eye coordinate to 1.

The viewport transform is expressed in equation 4, in which
(u,v) is the screen coordinate and (x0,y0,width,height) are the parameters
passed to the glViewport function call.

If we build our projection matrix with gluPerspective, which is very
convenient and the most frequently used approach, we can write out the
perspective matrix P in OpenGL like this:

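$$
P = \begin{bmatrix}
\dfrac{\operatorname{ctg}(\mathrm{fovy}/2)}{\mathrm{aspect}} & 0 & 0 & 0 \\
0 & \operatorname{ctg}(\mathrm{fovy}/2) & 0 & 0 \\
0 & 0 & \dfrac{\mathrm{zFar}+\mathrm{zNear}}{\mathrm{zNear}-\mathrm{zFar}} & \dfrac{2 \cdot \mathrm{zFar} \cdot \mathrm{zNear}}{\mathrm{zNear}-\mathrm{zFar}} \\
0 & 0 & -1 & 0
\end{bmatrix}
\qquad (5)
$$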

fovy, aspect, zFar and zNear are the parameters of gluPerspective.
aspect is most often specified as width/height, but it need not
be.
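
As a sanity check on matrix 5, the following sketch builds the same matrix in plain Python. The function name `glu_perspective_matrix` is hypothetical; it only mirrors the formula above and does not call OpenGL. Note that gluPerspective takes fovy in degrees, and that OpenGL stores matrices in column-major order, whereas this sketch uses a row-major list of rows for readability.

```python
import math

def glu_perspective_matrix(fovy_deg, aspect, z_near, z_far):
    """Return matrix 5 as a row-major list of four rows."""
    ctg = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)  # ctg(fovy/2)
    return [
        [ctg / aspect, 0.0, 0.0, 0.0],
        [0.0, ctg, 0.0, 0.0],
        [0.0, 0.0, (z_far + z_near) / (z_near - z_far),
                   2.0 * z_far * z_near / (z_near - z_far)],
        [0.0, 0.0, -1.0, 0.0],
    ]

# Sample values chosen so the entries are easy to check by hand.
P = glu_perspective_matrix(90.0, 1.0, 1.0, 3.0)
```

For fovy = 90 degrees and aspect = 1 the two upper-left entries are both ctg(45 degrees) = 1, and the depth row becomes (0, 0, -2, -3).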

Now, with all the equations ready, we can substitute equations 3 and 5
into equation 4 and rewrite the (u,v) coordinates:

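$$
\begin{cases}
u = \dfrac{\operatorname{ctg}(\mathrm{fovy}/2)}{\mathrm{aspect}} \cdot \left(-\dfrac{X_e}{Z_e}\right) \cdot \dfrac{\mathrm{width}}{2} + \dfrac{\mathrm{width}}{2} + x_0 \\
v = \operatorname{ctg}(\mathrm{fovy}/2) \cdot \left(-\dfrac{Y_e}{Z_e}\right) \cdot \dfrac{\mathrm{height}}{2} + \dfrac{\mathrm{height}}{2} + y_0
\end{cases}
\qquad (6)
$$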
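
The substitution can be checked numerically. The sketch below (plain Python, hypothetical function names and sample values) runs a point through equations 3 to 5 step by step and compares the result with the closed form of equation 6.

```python
import math

def gl_window_coords(Xe, Ye, Ze, fovy_deg, aspect, x0, y0, width, height):
    """Equations 3-5: eye coords -> clip coords -> window coords (u, v)."""
    ctg = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    Xc = (ctg / aspect) * Xe   # row 1 of P applied to the eye-space point
    Yc = ctg * Ye              # row 2 of P
    Wc = -Ze                   # row 4 of P
    u = (Xc / Wc + 1.0) * width / 2.0 + x0
    v = (Yc / Wc + 1.0) * height / 2.0 + y0
    return u, v

def equation6(Xe, Ye, Ze, fovy_deg, aspect, x0, y0, width, height):
    """Closed form of equation 6."""
    ctg = 1.0 / math.tan(math.radians(fovy_deg) / 2.0)
    u = (ctg / aspect) * (-Xe / Ze) * (width / 2.0) + width / 2.0 + x0
    v = ctg * (-Ye / Ze) * (height / 2.0) + height / 2.0 + y0
    return u, v

# A point in front of an OpenGL camera has negative Ze.
args = (0.5, -0.25, -2.0, 60.0, 4.0 / 3.0, 0, 0, 640, 480)
pipeline_uv = gl_window_coords(*args)
closed_uv = equation6(*args)
center_uv = gl_window_coords(0.0, 0.0, -1.0, 60.0, 4.0 / 3.0, 0, 0, 640, 480)
```

Both routes give the same (u,v), and a point on the optical axis lands at the center of the viewport.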

3. Wrap them up

With this analysis, things get clearer and clearer. What remains to be
done is the comparison of equations 2 and 6.

You may notice that there is a negative sign in equation 6. That’s
not surprising: the OpenGL eye coordinate system is right-handed, with the
camera looking down the negative Z axis (so visible points have Ze < 0),
while the screen coordinate system is left-handed.

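Neglecting the negative sign, we can write out the correspondence by
matching the constant terms and the coefficients of Xe/Ze and Ye/Ze:

$$
\begin{cases}
u_0 = \mathrm{width}/2 + x_0 \\
v_0 = \mathrm{height}/2 + y_0
\end{cases}
\qquad (7)
$$

$$
\begin{cases}
f/(dx/dy) = \operatorname{ctg}(\mathrm{fovy}/2) \cdot \mathrm{width}/(2 \cdot \mathrm{aspect}) & (8.1) \\
f = \operatorname{ctg}(\mathrm{fovy}/2) \cdot \mathrm{height}/2 & (8.2)
\end{cases}
\qquad (8)
$$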
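
Read in the other direction, the correspondence gives a recipe for choosing gluPerspective and glViewport arguments from calibrated intrinsics. The sketch below is only an illustration under stated assumptions: the function name `cv_to_gl` is made up, f is in pixel units, and the aspect formula comes from matching the Xe/Ze coefficients of equations 2 and 6 in the same way the Ye/Ze coefficients give the relation for f. The sign issue mentioned above is still neglected, and glViewport takes integer arguments, so a fractional x0 or y0 would need extra handling.

```python
import math

def cv_to_gl(f, u0, v0, width, height, dx=1.0, dy=1.0):
    """Map pinhole intrinsics to (fovy in degrees, aspect, x0, y0)."""
    # f = ctg(fovy/2) * height / 2  =>  fovy = 2 * atan(height / (2 f))
    fovy_deg = math.degrees(2.0 * math.atan(height / (2.0 * f)))
    # f/(dx/dy) = ctg(fovy/2) * width / (2 aspect), with ctg(fovy/2) = 2 f / height
    aspect = (width / height) * (dx / dy)
    # equation 7 solved for the viewport origin
    x0 = u0 - width / 2.0
    y0 = v0 - height / 2.0
    return fovy_deg, aspect, x0, y0

# Hypothetical 640x480 camera with f = 240 pixels and a centered optical axis.
fovy_deg, aspect, x0, y0 = cv_to_gl(f=240.0, u0=320.0, v0=240.0, width=640, height=480)
```

For this sample camera the result is fovy = 90 degrees, aspect = 4/3 and a viewport origin of (0, 0).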

For some special cases, we can simplify the correspondence.

1) x0 = y0 = 0:
This is also the default behavior of the OpenGL viewport transformation.
Equation 7 simplifies to:

$$
\begin{cases}
u_0 = \mathrm{width}/2 \\
v_0 = \mathrm{height}/2
\end{cases}
$$

You probably feel familiar with this case, because in computer vision (CV) we
often presume the optical center of the camera is at the center of the image.

2) dx = dy:
This means the image to be analyzed in CV is isotropic: the pixel sizes in
the horizontal and vertical directions are the same.

Then equation 8.1 becomes f = (ctg(fovy/2)*width)/(2*aspect). Comparing it
with 8.2, f = ctg(fovy/2)*height/2, we can derive aspect = width/height. Oh,
this is the most frequently used setting when using gluPerspective!

*(Because the posting system doesn’t seem to handle multiple blank spaces
well, I had to tweak the equations using dots. Hope it looks better :( )

[This message has been edited by inet (edited 08-23-2000).]


I’m sorry about the format of my last post, but I really don’t know how to make it better (though I have a version with a decent look and feel on my own computer).

If you’re interested in the content and confused by the format, you can just email me.

Any suggestions and comments about this essay are welcome.

This is exactly what I was looking for!
Thank you very much!
/Per Åstrand

[This message has been edited by pas (edited 08-23-2000).]

You can use UBB codes to format your source. I can’t type the codes in here, so go to edit/delete message to see them:

    class Acme {
    public:
        Acme(int _a) { a = _a; }  // store the constructor argument

        int a;
    };

y-e-r… you just described the classic pinhole camera model. It’s not the ONLY model used in computer vision, you know.

check out:

“An Efficient and Accurate Camera Calibration Technique for 3D Machine
Vision”, Roger Y. Tsai, Proceedings of IEEE Conference on Computer Vision
and Pattern Recognition, Miami Beach, FL, 1986, pages 364-374.


“A Versatile Camera Calibration Technique for High-Accuracy 3D Machine
Vision Metrology Using Off-the-Shelf TV Cameras and Lenses”, Roger Y. Tsai,
IEEE Journal of Robotics and Automation, Vol. RA-3, No. 4, August 1987,
pages 323-344.

Straight from Tsai’s calibration distribution:

1 - What is Tsai’s camera model?

Tsai’s camera model is based on the pin-hole model of 3D-2D perspective
projection with 1st order radial lens distortion. The model has 11
parameters: five internal (also called intrinsic or interior) parameters:

    f      - effective focal length of the pin-hole camera,
    kappa1 - 1st order radial lens distortion coefficient,
    Cx, Cy - coordinates of center of radial lens distortion -and-
             the piercing point of the camera coordinate frame's
             Z axis with the camera's sensor plane,
    sx     - scale factor to account for any uncertainty in the
             framegrabber's resampling of the horizontal scanline.

and six external (also called extrinsic or exterior) parameters:

    Rx, Ry, Rz - rotation angles for the transform between the
                 world and camera coordinate frames,
    Tx, Ty, Tz - translational components for the transform between the
                 world and camera coordinate frames.

In addition to the 11 variable camera parameters Tsai’s model has six fixed
intrinsic camera constants:

    Ncx - number of sensor elements in camera's x direction (in sels),
    Nfx - number of pixels in frame grabber's x direction (in pixels),
    dx  - X dimension of camera's sensor element (in mm/sel),
    dy  - Y dimension of camera's sensor element (in mm/sel),
    dpx - effective X dimension of pixel in frame grabber (in mm/pixel), and
    dpy - effective Y dimension of pixel in frame grabber (in mm/pixel).
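
For readers who want to see what “1st order radial lens distortion” amounts to, here is a minimal sketch. It assumes the form in which the correction is commonly written for Tsai’s model, where undistorted sensor coordinates are obtained by scaling the distorted ones by (1 + kappa1 * r^2) with r^2 = Xd^2 + Yd^2. The function name and the sample numbers are made up for illustration.

```python
def undistort_sensor(Xd, Yd, kappa1):
    """Map distorted sensor-plane coordinates to undistorted ones."""
    r2 = Xd * Xd + Yd * Yd          # squared radius from the distortion center
    factor = 1.0 + kappa1 * r2      # first-order radial term only
    return Xd * factor, Yd * factor

# Hypothetical values: a point at (1, 2) mm with kappa1 = 0.01 per mm^2.
Xu, Yu = undistort_sensor(1.0, 2.0, 0.01)
```

With kappa1 = 0 the function is the identity, which is the plain pinhole case that fixed-function OpenGL projection can represent.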

and another thing: some computer vision research uses the affine camera model.


[This message has been edited by john (edited 08-23-2000).]

Thanks for your comments.

Anyway, you “can” use a more complex
camera model, but the method of derivation is
the same.

Tsai’s camera model in essence considers
the 1st order distortion of the lens.
As for the affine camera, it only adds one
skewness parameter.


Sorry to disturb you, but I’m working on an augmented reality project applied to medical stuff. Now,
in all the literature I’ve found I always see the same model (pinhole), from which they derive all the intrinsic and extrinsic parameters. They derive the projection matrix and, by moving the object in the scene, they overlay the virtual object. But
what if I move the camera!??! I’ve never understood whether I can move the camera or not, as in all the works I found the camera is steady.


Intrinsic camera parameters do not change when moving the camera. They belong to the “inside” of the camera.

Extrinsic parameters change when moving the camera. They belong to the “outside” of the camera, i.e. world position and world rotation.

So when moving the camera, you have to update the extrinsic params.
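
In OpenGL terms, updating the extrinsic params means reloading the MODELVIEW matrix every frame. The sketch below shows one way this might look, under assumptions that are not from this thread: the tracker delivers R and t with Xcam = R * Xworld + t, the vision camera frame has x right, y down and z forward, and the function name is made up. OpenGL’s eye frame has y up and looks down negative z, so the y and z rows are negated. The result is a 16-element column-major list, the layout that glLoadMatrixd expects.

```python
def cv_extrinsics_to_gl_modelview(R, t):
    """Build a column-major MODELVIEW array from vision-style extrinsics."""
    flip = (1.0, -1.0, -1.0)  # negate the y and z rows (assumed frame conventions)
    rows = [[flip[i] * R[i][j] for j in range(3)] + [flip[i] * t[i]]
            for i in range(3)]
    rows.append([0.0, 0.0, 0.0, 1.0])
    # Flatten column by column, as OpenGL stores matrices.
    return [rows[r][c] for c in range(4) for r in range(4)]

# Hypothetical pose: no rotation, world origin 5 units in front of the camera.
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 5.0]
modelview = cv_extrinsics_to_gl_modelview(R, t)
```

The projection matrix built from the intrinsics stays the same from frame to frame; only this array needs to be recomputed whenever the tracker reports a new camera pose.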