Recovering a 3D cube from its 2D projection.

Bulhakov · May 15, 2004, 10:52am

Can anyone help me with the algorithm or equations needed for my problem:

I start with a photo of a wooden cube on a flat surface. Using image processing, I can find the 2D coordinates of the cube’s 6 outline vertices (assume the 7th visible vertex cannot be recovered because of the texture on the cube). From this set of 6 points, and knowing the length of the cube’s side, I need to construct a 3D model of the scene in the photo (basically a mesh of the cube).

I understand I have to somehow find the position of the camera relative to the cube, but I am at a loss with the math.

I know I will come up with two possible results (as I can’t tell which of the two vertices not in the outline is closer to the camera), but that is not a problem.

Krisztian · May 18, 2004, 10:41pm

O.K., this is a great question, and if you do any other work with this, please let me know; I am interested. So far, you have the fact that the object has no left or right rotation, so basically not only is the 7th point covered by the texture but also the 7th absolute point |-7|. If there is a shift at the bottom, then it’s obvious that |-7| is at least true; otherwise it can be dropped as a texture error with good certainty. One idea: if it’s layered like Canoma, then object referencing can tell you whether it’s close to another solid object, allowing you to use a lighting algorithm to determine if it’s a left or right shift. If it’s in free space and a single object, you can use a textured lighting algorithm, which is as follows. The abs value -7 left or 7 right in reference to the distance of 2 points from the closest two points would lead to an absolute average in the up and down, or vertical, axis. The idea is that if that determines the answer to the problem, it’s the same as looking through a wireframe cube with quad view, where it’s always shifted one way or another, which is why most people stare at a cube for a long time trying to figure out where the opening is. So it’s a 50/50 answer, but it’s absolutely logically correct. Now you can guess why that’s good for photographs. Since you can always tell what’s relative left or right, you can pick which 50/50 side is the correct translation. However, the same problem will become noticeable when the software reaches another mirrored layer. Once again, layering is good since you can re-process after visual inspection if it’s used for very accurate work.

One possible problem I just realized. If you write this function, try to stay with layered logical response and, if not inspected, allow the process to go all the way to the end, giving you mirrored errors. The reason for this is that if you have a black background in a complete image with free-floating objects, then a separate function for regions can solve the logical error. But if the rest of the process finishes, and a logical cleanup can determine an average lighting value and correct the object rotations, then separate use of the correcting function will render the resulting objects of those functions useless, and the free-floating regions could be inverted in various directions, not giving a proper translation of the image. So both functions should exist, but leave the single-use function out of batch processing for that reason. That way the single function is open for use any way you want.

Bulhakov · May 19, 2004, 6:31am

Thank you for your long response. It might have been a lot more helpful if both my English and CG skills were a bit better.
The question of which vertex is visible, as I said before, is not a problem. I will have to perform the computations for several objects lying next to each other, and I can assume I know the approximate location of the camera (above the table, to the left of the objects) and that the backs of all objects lie on one plane.
My problem is with calculating the point coordinates in 3D, knowing only their x,y coordinates in 2D and how they should be related in space (as vertices of a cube with a known edge length; the camera focus distance is also known). My level of math allows me to understand what is needed to draw a mesh of a cube (I haven’t learned shading yet) and to do the rotations/translations through basic matrix transforms. However, I have an extreme problem with making the reverse calculations from the given data. I would be grateful for further help and advice.

Mega · May 20, 2004, 11:31pm

if you always have seven visible points - you’re lucky. the convex hull of these points consists of six segments connecting six points. these segments are images of cube edges (one can prove that). the seventh point is the image of the cube vertex closest to the viewer. now we can draw another three edges from the central point to peripheral points. there are two ways to draw these three edges - first, from the center (0) to peripheral points (1,3,5); second, from the center to points (2,4,6). one of these ways is incorrect. to detect the correct one, we must find three points of “ray convergence”, i.e. vanishing points (there is a theorem that a perspective projection can have up to three points of convergence). these points lie on the lines through the cube’s coordinate axes. the coordinate axes of the cube are those three edges drawn in one of those two ways.
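
A minimal sketch of this first step, assuming the seven detected vertices are given as (x, y) tuples. It computes the convex hull with Andrew's monotone chain (my choice of algorithm, not something named in this thread) and treats the single leftover point as the center. The function names are hypothetical, made up for illustration.

```python
def cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain. Returns the hull vertices in counter-clockwise
    order for x-right / y-up axes (it looks clockwise if image y points down)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would make a clockwise or straight turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def split_outline_and_center(points):
    """Assumes exactly seven (x, y) tuples: six outline vertices plus the
    nearest vertex. Returns (outline in hull order, center point)."""
    hull = convex_hull(points)
    inner = [p for p in points if p not in hull]
    if len(hull) != 6 or len(inner) != 1:
        raise ValueError("expected a six-point outline and one interior point")
    return hull, inner[0]
```

The hull order doubles as the 1…6 numbering around the outline, with the interior point as 0.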

ok - let’s go numeric. consider your convex hull points numbered 1…6 in order around the hull, and the center point numbered 0. as a first try, assume segments 01, 03, 05 are the cube axes. then the lines containing segments 12, 03 and 54 must intersect at the first convergence point; 34, 05 and 16 at the second; 23, 10 and 65 at the third. if that is not true, then our attempt was incorrect and the correct axes are 02, 04, 06.
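
The test above can be sketched as follows. Three lines written in homogeneous coordinates meet in one point (possibly at infinity, if they are parallel in the image) exactly when the determinant of their coefficients is zero, so the sketch sums the absolute determinants for each hypothesis and keeps the smaller one. Comparing the two sums instead of testing for exact zero is my assumption to cope with noisy pixel coordinates, and the names `pick_axes`, `HYPOTHESIS_A` and `HYPOTHESIS_B` are hypothetical.

```python
import math


def line_through(p, q):
    """Homogeneous line (a, b, c) through image points p and q, unit norm."""
    a = p[1] - q[1]
    b = q[0] - p[0]
    c = p[0] * q[1] - q[0] * p[1]
    n = math.sqrt(a * a + b * b + c * c)
    return (a / n, b / n, c / n)


def det3(r0, r1, r2):
    """Determinant of the 3x3 matrix with rows r0, r1, r2."""
    return (r0[0] * (r1[1] * r2[2] - r1[2] * r2[1])
            - r0[1] * (r1[0] * r2[2] - r1[2] * r2[0])
            + r0[2] * (r1[0] * r2[1] - r1[1] * r2[0]))


# Each triple lists three segments (as index pairs) whose lines should meet.
HYPOTHESIS_A = [((1, 2), (0, 3), (5, 4)),   # axes 01, 03, 05
                ((3, 4), (0, 5), (1, 6)),
                ((2, 3), (1, 0), (6, 5))]
HYPOTHESIS_B = [((2, 3), (0, 4), (6, 5)),   # axes 02, 04, 06
                ((4, 5), (0, 6), (2, 1)),
                ((3, 4), (2, 0), (1, 6))]


def concurrency_error(pts, triples):
    """Sum of |det| over the triples; zero when every triple is concurrent."""
    total = 0.0
    for triple in triples:
        lines = [line_through(pts[i], pts[j]) for i, j in triple]
        total += abs(det3(*lines))
    return total


def pick_axes(pts):
    """pts[0] is the center, pts[1..6] the outline in order around the hull.
    Returns the outline indices joined to the center: (1, 3, 5) or (2, 4, 6)."""
    err_a = concurrency_error(pts, HYPOTHESIS_A)
    err_b = concurrency_error(pts, HYPOTHESIS_B)
    return (1, 3, 5) if err_a <= err_b else (2, 4, 6)
```

If the perspective is very weak, the two sums may both be close to zero and the choice becomes unreliable, so it is worth looking at both values before trusting the result.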

then you must compute the length ratios of the converging edges. these ratios give you the angles between the cube facets and the viewing plane. then, by substituting the known cube vertices’ coordinates into the projection matrix equations and decomposing them, you form a set of solvable (hope so) linear equations. the solution of these equations will be the rows (or columns) of the projection matrix.
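
One possible reading of the linear-equations step, as a rough sketch and not necessarily the exact decomposition meant above. Each vertex with known cube coordinates (X, Y, Z) and image position (u, v) gives two equations that are linear in the entries of a 3x4 projection matrix. Fixing the last entry to 1 is my assumption to remove the overall scale (it breaks down if that entry is really zero), and the least-squares solve through the normal equations is chosen only to keep the example free of libraries. All function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def estimate_projection(world_pts, image_pts):
    """Least-squares 3x4 projection matrix with its last entry fixed to 1.
    Needs at least six vertex/image pairs (11 unknowns, 2 equations each)."""
    if len(world_pts) < 6 or len(world_pts) != len(image_pts):
        raise ValueError("need at least six matching point pairs")
    rows, rhs = [], []
    for (X, Y, Z), (u, v) in zip(world_pts, image_pts):
        # u * (p31 X + p32 Y + p33 Z + 1) = p11 X + p12 Y + p13 Z + p14
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z])
        rhs.append(u)
        # v * (p31 X + p32 Y + p33 Z + 1) = p21 X + p22 Y + p23 Z + p24
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z])
        rhs.append(v)
    n = 11
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * b for r, b in zip(rows, rhs)) for i in range(n)]
    p = solve_linear(ata, atb) + [1.0]
    return [p[0:4], p[4:8], p[8:12]]


def project(P, point):
    """Apply a 3x4 projection matrix to a 3D point; returns image (u, v)."""
    X, Y, Z = point
    h = [row[0] * X + row[1] * Y + row[2] * Z + row[3] for row in P]
    return (h[0] / h[2], h[1] / h[2])
```

With the nearest vertex taken as (0, 0, 0) and the three identified axes as (s, 0, 0), (0, s, 0) and (0, 0, s) for edge length s, the remaining outline vertices get coordinates such as (s, s, 0). Once the matrix is estimated, `project` can be used to check how well it reproduces the measured image points.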

anyway, after identifying the cube axes the task becomes simple.