simulate depth sensor

i have to simulate a depth sensor (RGB + depth).
as of the moment i have to use opengl2.

to do that i have to get the z buffer and the rgb-image.

as far as i found out i can get the z-buffer’s values with:

glReadPixels(0, 0, w, h, GL_DEPTH_COMPONENT, GL_FLOAT, depth_buffer);

my questions:

  1. is my method correct?
  2. is there a better way to get the z-buffer?
  3. as far as i know the z-buffer’s values are non-linear in [0, 1] with 0==clipNear and 1==clipFar.
    how can i transform the z-buffer’s values to a linear scale? (e.g. a value delta of 0.1 is the same distance independent of the corresponding values.)

thanks for any help.

/edit: fixed an error


To answer question 2: not if you’re limited to OpenGL 2 (and don’t have the ARB_framebuffer_object extension).

If the clip-space coordinates don’t have constant W, the depth-buffer values will be non-linear with respect to eye-space Z. For a perspective projection, clip-space W is typically equal to eye-space -Z.

The values in the depth buffer are equal to (Z/W+1)/2, where Z and W are the clip-space Z and W, linearly interpolated across the primitive.

If you know how both clip-space Z and W relate to eye-space Z (i.e. you know the projection matrix and you also know that eye-space W is always 1), then you can obtain eye-space Z from the depth values.

For a projection matrix of the form

[ ?  ?  ?  ? ]
[ ?  ?  ?  ? ]
[ 0  0  A  B ]
[ 0  0 -1  0 ]

and assuming Weye = 1, you have
Zclip = A*Zeye + B
Wclip = -Zeye

Conversion to NDC involves dividing clip-space X,Y,Z by W:
Zndc = Zclip/Wclip
= (A*Zeye+B)/(-Zeye)
= -(A+B/Zeye)

Finally, NDC Z (which lies in the range -1 to +1) is converted to depth (in the range 0 to 1) by
depth = (Zndc+1)/2
= (1-A-B/Zeye)/2

(Actually, it can be slightly more complex than that if the depth range is changed with glDepthRange()).

Solving for Zeye gives
depth = (1-A-B/Zeye)/2
2*depth = 1 - A - B/Zeye
B/Zeye = 1 - A - 2*depth
Zeye = B/(1 - A - 2*depth)
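Both directions of that relation can be sketched in plain C; the function names here are made up for illustration, and A and B are assumed to be the third-row entries of the projection matrix as above:

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: recover eye-space Z from a depth-buffer value,
 * given the A and B entries of the projection matrix, using
 * Zeye = B / (1 - A - 2*depth) as derived above. */
static double zeye_from_depth(double depth, double A, double B)
{
    return B / (1.0 - A - 2.0 * depth);
}

/* Forward direction, for checking: depth = (1 - A - B/Zeye) / 2 */
static double depth_from_zeye(double zeye, double A, double B)
{
    return (1.0 - A - B / zeye) / 2.0;
}
```

For a typical perspective matrix, depth 0 maps to Zeye = -zNear and depth 1 to Zeye = -zFar.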

If the matrix is a typical perspective matrix (generated with e.g. gluPerspective(), glFrustum(), or equivalent), then
A = (zNear+zFar)/(zNear-zFar)
B = 2*zNear*zFar/(zNear-zFar)

In which case, the equation for Zeye becomes
Zeye = -zFar*zNear/(depth*zNear + (1-depth)*zFar)
     = -zFar*zNear/(zFar + depth*(zNear - zFar))
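For the gluPerspective()/glFrustum() case the whole conversion collapses to a one-liner. A minimal sketch (the function name is made up, and the default glDepthRange(0, 1) is assumed):

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: eye-space Z from a depth value, assuming a
 * gluPerspective()/glFrustum()-style projection matrix and the
 * default glDepthRange(0, 1). */
static double zeye_perspective(double depth, double zNear, double zFar)
{
    return -(zFar * zNear) / (zFar + depth * (zNear - zFar));
}
```

At depth = 0 this gives -zNear and at depth = 1 it gives -zFar, matching the general B/(1-A-2*depth) form.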

Note that typically A and B are both negative, and Zeye is negative for anything that’s visible, with Zeye becoming increasingly negative as you get farther from the viewpoint (i.e. +Zeye is pointing out of the screen).

first thanks for your help.

with your equation for Zeye i was able to implement the depth cam and it worked well enough (since it was used at small distances and some error was acceptable).

now the depth data is used to generate a point cloud and is used for simultaneous localization and mapping.

when we tried this we noticed that flat surfaces appear curved (since Zeye is not the distance between camera and object).

how i understood it:
in cam coord

  cam<---  -Zeye --->|   ^
     |  \            |   |
     |      \        |  missing part for distance
     |    distance   |   |
     |           \   |   v
     |            -->object

  1. am i correct in assuming that Zeye is not equal to distance(cam,obj)?
  2. if that’s the case i need Xeye and Yeye to calculate the distance. how can i get them?
  3. is there a better way to calculate the distance between the camera and an object?

thanks again for any help.

As your diagram suggests, Zeye is the distance in front of the viewpoint, not the distance from the viewpoint. The distance from the viewpoint is the magnitude of the eye-space position vector (Xeye,Yeye,Zeye).

They can be determined from the window coordinates (i.e. the row and column indices within the data returned by glReadPixels).

Given window coordinates (Xwin,Ywin), where (0,0) is the lower-left corner of the window, (width,height) is the upper-right corner, and (0.5,0.5) is the centre of the lower-left pixel, the conversion from NDC to window coordinates is given by the viewport transformation, set with glViewport(Xv,Yv,Wv,Hv). The first time a context is bound to a window, the viewport is automatically set to cover the entire window, as if by glViewport(0,0,width,height); thereafter, the viewport must be set explicitly if the window is resized.

The conversion from NDC to window coordinates is

Xwin = (Xndc + 1)*(Wv/2) + Xv
Ywin = (Yndc + 1)*(Hv/2) + Yv

so the inverse conversion (from window coordinates to NDC) is

Xndc = (Xwin - Xv)*(2/Wv) - 1
Yndc = (Ywin - Yv)*(2/Hv) - 1
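One axis of that inverse viewport transform can be sketched as follows (hypothetical helper name; pixel (col, row) of the glReadPixels buffer is assumed to have its centre at window coordinates (col + 0.5, row + 0.5)):

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: one axis of the window -> NDC conversion.
 * win is the window coordinate, v0 the viewport origin (Xv or Yv),
 * size the viewport extent (Wv or Hv). */
static double win_to_ndc(double win, double v0, double size)
{
    return (win - v0) * (2.0 / size) - 1.0;
}
```

For a glViewport(0,0,640,480) viewport, Xwin = 0 maps to -1, Xwin = 320 to 0, and Xwin = 640 to +1.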

Conversion from clip coordinates to NDC involves division by W, i.e.

Xndc = Xclip/Wclip
Yndc = Yclip/Wclip
Zndc = Zclip/Wclip

so the inverse is

Xclip = Xndc * Wclip
Yclip = Yndc * Wclip
Zclip = Zndc * Wclip

For a perspective transformation, Wclip is proportional to Zeye, which you’ve already calculated. Typically, it’s equal to -Zeye. If you’ve managed to calculate Zeye correctly from Zndc, then you must already know Wclip.

Conversion from eye coordinates to clip coordinates is given by the projection matrix. In the general case, you would need to invert that matrix. But a projection matrix generated by gluPerspective() or glFrustum() always has the form

[Sx  0 Kx 0]
[ 0 Sy Ky 0]
[ ?  ?  ? ?]
[ ?  ?  ? ?]

so the conversion is

Xclip = Sx * Xeye + Kx * Zeye
Yclip = Sy * Yeye + Ky * Zeye

and the inverse is

Xeye = (Xclip - Kx * Zeye) / Sx
Yeye = (Yclip - Ky * Zeye) / Sy