No need for a fancy camera position; just leave it at (0, 0, 0).
Then estimate the field of view of the digital camera: take a picture of a planar grid with known measurements, parallel to the lens, at a precise distance. Then render a virtual grid with the same dimensions in OpenGL and blend it with the picture taken by the real camera. Adjust the FOV until the match is good enough.
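As a starting point before doing the visual match, you can also get a rough FOV directly from the measurements. A minimal sketch, assuming the grid exactly fills the frame vertically; gridHeightMeters and distanceMeters are placeholder values, not from the original post:

```cpp
#include <cmath>
#include <cstdio>

int main() {
    // Hypothetical measurements: grid height and its distance from the lens,
    // framed so the grid exactly spans the image vertically.
    const double gridHeightMeters = 0.50;
    const double distanceMeters   = 1.00;
    const double pi = 3.14159265358979323846;

    // Pinhole model: vertical FOV = 2 * atan((h / 2) / d)
    const double fovYRadians = 2.0 * std::atan((gridHeightMeters / 2.0) / distanceMeters);
    const double fovYDegrees = fovYRadians * 180.0 / pi;

    std::printf("estimated vertical FOV: %.2f degrees\n", fovYDegrees);
    return 0;
}
```

Use that value as the initial guess, then fine-tune it with the grid blending described above.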
Afterward, you “only” have to apply the rotation to the virtual camera and place the objects at the correct positions.
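A minimal GLM-based sketch of that setup, under the assumptions above (camera fixed at the origin, only the estimated FOV and a rotation applied); makeProjection/makeView are hypothetical helpers, and the yaw/pitch would come from however you track the real camera's orientation:

```cpp
#include <glm/glm.hpp>
#include <glm/gtc/matrix_transform.hpp>

glm::mat4 makeProjection(float fovYRadians, float aspect) {
    // Same vertical FOV as the real camera; near/far chosen for the scene.
    return glm::perspective(fovYRadians, aspect, 0.1f, 100.0f);
}

glm::mat4 makeView(float yaw, float pitch) {
    // Camera stays at (0, 0, 0); only its orientation changes.
    glm::mat4 camRotation = glm::rotate(glm::mat4(1.0f), yaw,   glm::vec3(0, 1, 0));
    camRotation           = glm::rotate(camRotation,     pitch, glm::vec3(1, 0, 0));
    // The view matrix is the inverse of the camera's world transform.
    return glm::inverse(camRotation);
}
```

Objects are then placed in world space at their measured real-world positions and rendered with these matrices.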
One important but more complex step is to “undistort” the camera picture, since straight lines are often captured as curves.
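One common way to do this (not the only one) is OpenCV. A hedged sketch: the intrinsic matrix and distortion coefficients would normally come from cv::calibrateCamera run on several checkerboard pictures; the numbers below are placeholders only.

```cpp
#include <opencv2/opencv.hpp>

cv::Mat undistortFrame(const cv::Mat& frame) {
    // 3x3 intrinsic matrix: fx, fy, cx, cy (placeholder values).
    cv::Mat cameraMatrix = (cv::Mat_<double>(3, 3) <<
        800.0,   0.0, 320.0,
          0.0, 800.0, 240.0,
          0.0,   0.0,   1.0);

    // Distortion coefficients in OpenCV order: k1, k2, p1, p2, k3 (placeholders).
    cv::Mat distCoeffs = (cv::Mat_<double>(1, 5) << -0.2, 0.05, 0.0, 0.0, 0.0);

    cv::Mat undistorted;
    cv::undistort(frame, undistorted, cameraMatrix, distCoeffs);
    return undistorted;
}
```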
There is quite a lot of literature on the subject; search for “intrinsic camera parameters”, “augmented reality”, etc.
Then there is a long list of features needed to attain perfect camera matching. Which ones do you need?
- manual projection/distortion match
- automatic projection/distortion match
- gl graphics can be hidden behind real objects (see the depth-mask sketch after this list)
- gl graphics can cast shadows on real objects
- gl graphics receive cast shadows from real objects
- color match between gl/real
- add realistic color noise + Bayer effects + blur on gl graphics
- add motion blur on gl
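For the “hidden behind real objects” item, a common approach (my sketch, not from the original list) is to render invisible proxy meshes of the real objects into the depth buffer only, so virtual geometry drawn afterwards is clipped where a real object is in front. drawRealObjectProxies() and drawVirtualObjects() are hypothetical helpers.

```cpp
#include <GL/gl.h>

void drawRealObjectProxies();   // proxy meshes roughly matching the real scene
void drawVirtualObjects();      // the augmented content

void renderOccludedScene() {
    glEnable(GL_DEPTH_TEST);

    // Pass 1: depth-only proxies of the real objects (color writes disabled).
    glColorMask(GL_FALSE, GL_FALSE, GL_FALSE, GL_FALSE);
    drawRealObjectProxies();

    // Pass 2: normal rendering of the virtual objects, depth-tested against
    // the proxies and composited over the camera image background.
    glColorMask(GL_TRUE, GL_TRUE, GL_TRUE, GL_TRUE);
    drawVirtualObjects();
}
```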