How to reconstruct depth from a texture?

Hello everyone.

I hope you can help me understand this issue; I really have no clue how to do this. I am working on MonoGame/XNA.

I have browsed tons of articles on this, and it's still no clearer than before. I am not even completely sure what the W in the XYZW components does, nor do I know what clip space even is… it seems I am in way over my head, but that's never stopped me before. Could you help me understand what to do here?

I know there is more than one way to do this; if I could understand just one way to do it, I think that would set me on the right path.

Thanks in advance, and sorry if I am posting in the wrong place.

Clip coordinates are homogeneous (i.e. they have an extra W component). Homogeneous coordinates are converted to Euclidean coordinates by dividing by W: (x,y,z,w) -> (x/w,y/w,z/w). Homogeneous coordinates are used because they allow translation and perspective projection to be represented as linear (matrix) transformations: multiplying the W component by a given factor is equivalent to dividing all of the other components by that factor.
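In C, for instance, the divide looks like this (a minimal sketch; the vec3/vec4 struct names are just for illustration, not from any particular library):

typedef struct { float x, y, z, w; } vec4;  /* homogeneous point */
typedef struct { float x, y, z; } vec3;     /* Euclidean point   */

/* Perspective divide: homogeneous -> Euclidean. */
static vec3 homogeneous_to_euclidean(vec4 p)
{
    vec3 e = { p.x / p.w, p.y / p.w, p.z / p.w };
    return e;
}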

Clip coordinates are converted to normalised device coordinates (NDC) by dividing by W. Both represent the same coordinate system, but clip coordinates are homogeneous while NDC are Euclidean. The normalised X and Y coordinates are affine to window X and Y coordinates: conversion between the two (via the viewport transformation) involves only scaling and translation. Similarly, the normalised Z coordinate is affine to the depth values stored in the depth buffer and used for depth tests; translation between the two is based upon the depth range set by glDepthRange(); by default, a Z value of -1 in NDC maps to a depth value of 0 while a Z value of 1 maps to a depth value of 1.
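As a sketch of those two affine mappings in C (assuming a viewport at (x0, y0) with size w by h, and glDepthRange(dnear, dfar) semantics):

/* NDC (-1..1 per axis) -> window coordinates and depth value,
 * mirroring the fixed-function viewport transform and glDepthRange(). */
static void ndc_to_window(float xn, float yn, float zn,
                          float x0, float y0, float w, float h,
                          float dnear, float dfar,
                          float *xw, float *yw, float *depth)
{
    *xw    = x0 + (xn + 1.0f) * 0.5f * w;  /* scale and translate */
    *yw    = y0 + (yn + 1.0f) * 0.5f * h;
    *depth = (zn * (dfar - dnear) + (dfar + dnear)) * 0.5f;
}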

Note that conversion from homogeneous coordinates loses some information (4 values become 3), but that doesn’t actually matter. The point of homogeneous coordinates is that multiplying all components by the same factor doesn’t have any significance: you’re only interested in the ratios.
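For example, (2,4,6,2) and (1,2,3,1) describe the same point: both divide out to the Euclidean point (1,2,3).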

If you’re using a “typical” perspective transformation matrix which looks like


[?  ?  ?  ?]
[?  ?  ?  ?]
[0  0  A  B]
[0  0 -1  0]

then eye-space coordinates of [Xe,Ye,Ze,1] get transformed to clip-space coordinates [Xc,Yc,Zc,Wc] where

Zc = A*Ze+B
Wc = -Ze

and these get transformed to normalised device coordinates [Xn,Yn,Zn] where

Zn = Zc/Wc
   = (A*Ze+B)/(-Ze)
   = -A - B/Ze
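
To make A and B concrete: the standard OpenGL projection (as produced by glFrustum()/gluPerspective()) has A = -(far+near)/(far-near) and B = -2*far*near/(far-near). A quick sanity check in C, with example near/far values, that Zn comes out as -1 at the near plane and +1 at the far plane:

#include <stdio.h>

/* Zn = (A*Ze + B) / (-Ze) = -A - B/Ze */
static double eye_z_to_ndc(double Ze, double A, double B)
{
    return -A - B / Ze;
}

int main(void)
{
    const double near = 0.1, far = 100.0;  /* example clip planes */
    const double A = -(far + near) / (far - near);
    const double B = -2.0 * far * near / (far - near);

    /* Eye space looks down -Z, so the planes sit at Ze = -near and Ze = -far. */
    printf("near plane: Zn = %+f\n", eye_z_to_ndc(-near, A, B));  /* -1 */
    printf("far plane:  Zn = %+f\n", eye_z_to_ndc(-far,  A, B));  /* +1 */
    return 0;
}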

Finally, normalised Z gets converted to depth as

depth = (Zn*(Dfar-Dnear) + (Dfar+Dnear))/2

If Dnear=0 and Dfar=1 (the default), this becomes:

depth = (Zn+1)/2
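
Or, as one C function for the whole forward path (assuming the same A and B as above and the default depth range):

/* Eye-space Z -> depth buffer value, assuming glDepthRange(0, 1). */
static double eye_z_to_depth(double Ze, double A, double B)
{
    double Zn = -A - B / Ze;   /* eye space -> NDC          */
    return (Zn + 1.0) / 2.0;   /* NDC -> depth value [0,1]  */
}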

So, if you have a depth value and want to convert it back to eye-space Z:

Zn = 2*depth - 1
Ze = -B/(A + Zn)
   = -B/(A + 2*depth - 1)
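
Here is that reconstruction as a minimal C sketch, with a round trip to verify it (this assumes the OpenGL-style A and B from above; note that XNA/Direct3D projection matrices map Z to [0,1] in NDC rather than [-1,1], so there A, B and the 2*depth - 1 step come out differently):

#include <stdio.h>

/* Invert the pipeline: depth buffer value -> eye-space Z,
 * assuming the default depth range [0,1].                 */
static double depth_to_eye_z(double depth, double A, double B)
{
    double Zn = 2.0 * depth - 1.0;  /* depth -> NDC        */
    return -B / (A + Zn);           /* Ze = -B / (A + Zn)  */
}

int main(void)
{
    const double near = 0.1, far = 100.0;  /* example clip planes */
    const double A = -(far + near) / (far - near);
    const double B = -2.0 * far * near / (far - near);

    /* Round trip: eye Z -> depth -> eye Z should return the input. */
    double Ze    = -25.0;
    double Zn    = -A - B / Ze;
    double depth = (Zn + 1.0) / 2.0;
    printf("Ze = %g, depth = %g, reconstructed Ze = %g\n",
           Ze, depth, depth_to_eye_z(depth, A, B));
    return 0;
}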

Thank you for the response, my friend, and sorry for the late reply. In fact, I saw your explanation soon after you posted it, but I could not make heads or tails of anything you meant, so I had to do some reading, which at first left me only more confused. I could finally read your formulas after I learned the camera matrix and understood that A and B are derived from the near and far planes, but I didn't understand much more than that.

Then I watched a very lengthy set of lectures from the University of California on this subject here.

Then, after weeks and weeks of pondering and drawing it out until the idea became intuitive, and finally, as a last step, looking at the actual math, it's second nature to me now. Thanks for providing some explanation. In fact, I am now going to try to reconstruct position from storing linear depth, and I will try inverting the backbuffer.

Much appreciated!!