True Camera via OpenGL 4.5

Gentlemen, I want to share the new (at least to my knowledge) technique with you. :slight_smile:

As we know, rendering large scenes has always been tricky: if the near clipping plane is set too close, depth fighting on distant objects becomes noticeable; if zNear is pushed away, then nearby objects are partially clipped. There are ways to work around it, but they all come at some expense.

But now, with ARB_clip_control in the OpenGL 4.5 core, we can finally simulate a camera with truly realistic properties: no far clipping plane (infinite drawing distance) and a very small zNear. It lets us fit a simulated camera into any tiny hole in the scene and capture the whole world around it without artifacts:
In the picture the far mountains are ~60 km away, while the camera is sitting almost at ground level and has only a 1 mm distance to the near clipping plane. The demo project can be downloaded here.

So here is the way to make those drawing properties possible.
We need a floating-point depth buffer, so we need an FBO and can no longer render directly to the window (but who does nowadays, right?).
Then we need to change the mapping of depth values from the default [-1…1] to the required [0…1]:
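The code listing that originally sat here did not survive; presumably it showed the two pieces of state involved. A minimal sketch, assuming a GL 4.5 context and an existing framebuffer object `fbo` (the names `depthTex`, `width`, `height` are mine):

```c
/* Sketch only: assumes an OpenGL 4.5 context is current and `fbo`
   is a previously created framebuffer object. */
GLuint depthTex;
glGenTextures(1, &depthTex);
glBindTexture(GL_TEXTURE_2D, depthTex);
glTexStorage2D(GL_TEXTURE_2D, 1, GL_DEPTH_COMPONENT32F, width, height);

glBindFramebuffer(GL_FRAMEBUFFER, fbo);
glFramebufferTexture2D(GL_FRAMEBUFFER, GL_DEPTH_ATTACHMENT,
                       GL_TEXTURE_2D, depthTex, 0);

/* Pass NDC depth [0, 1] straight through to window space,
   instead of the default [-1, 1] -> [0, 1] remapping. */
glClipControl(GL_LOWER_LEFT, GL_ZERO_TO_ONE);
```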


By doing so we keep the Zndc coordinate from being clipped to the [-1, 1] range and then remapped to the [0, 1] range in window space. This is crucial, because scaling by 0.5 and adding 0.5 causes loss of precision for values in near-zero ranges; 1e-10 and 2e-10, for example, would both end up as the same value in the depth buffer. As we will see later on, those tiny values will form the majority of the depth buffer contents, so we need to preserve them.

Now the core of the idea: the projection matrix. It has to be constructed in some unusual way:

    | f/aspect  0      0      0    |
    |                              |
    |    0      f      0      0    |
P = |                              |
    |    0      0      0    zNear  |
    |                              |
    |    0      0     -1      0    |

f = cot(ViewAngleVertical/2)
aspect = Viewport.x / Viewport.y
zNear = distance to the near clipping plane

Such a projection matrix results in a reversed depth range: the further the object, the smaller the depth values of its fragments. For most objects in the scene, gl_FragCoord.z will look like XXXe-YYY. But as long as the number stays within the range a float can represent, we still have the full 23 bits of mantissa precision (before underflow). Therefore, depth testing has to be set up like this:
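The code listing here did not survive; for a reversed depth range the test state presumably looks along these lines:

```c
/* Reversed depth: larger values are closer, so clear to 0 (the "far
   plane" at infinity) and pass fragments that are GREATER. */
glClearDepth(0.0);
glDepthFunc(GL_GREATER);
glEnable(GL_DEPTH_TEST);
```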


It should be mentioned, though, that the precision changes with distance, so the depth values of adjacent surfaces may become equal beyond a certain distance. But this error grows linearly with distance, just as the level of detail of objects ideally should. In any case, the technique has a clear advantage over the conventional camera setup and can compete with the w-buffer technique of DirectX, serving as an OpenGL alternative to it.

By doing so we make our gl_FragDepth written to the depth buffer as it is, without any remapping performed. It is crucial, because default scaling by 0.5 and adding 0.5 causes loss of precision for any gl_FragDepth values of near-zero ranges

There is a mistake in your terminology here. gl_FragDepth is the fragment shader output that defines the fragment’s depth component. It doesn’t get scaled. The range of gl_FragDepth is [0, 1], and always has been.

The particular scaling/offset you’re trying to avoid with glClipControl happens in the transformation from clip-space to NDC space, and the later transformation from NDC to window space. These happen per-vertex, not per-fragment.

Changed to gl_FragCoord.z. Thank you, Alfonse!

That’s also technically still wrong, at least the first time you mention it. The post-clipping transform (and accompanying loss of precision) all happen before the generation of fragments.

So you would need to change the earlier paragraph to something along these lines:

By doing so we keep the vertex’s Z coordinate from being mapped to the [-1, 1] range, then back to the [0, 1] range in window space. This is crucial, because scaling by 0.5 and adding 0.5 causes loss of precision for any values of near-zero ranges; therefore, 1e-10 and 2e-10 will both result in the same value in depth buffer. As we will see later on, those tiny values will form the majority of the depth buffer contents, so we need to tolerate them.

The rest however is fine.

Hm, thanks, but I will change it a bit, because vertex’ Z is not mapped to [-1…1] range, actually…

That’s not quite how it works. From the ARB_clip_control specification:

    "Primitives are clipped to the clip volume. In clip coordinates,
    the view volume is defined by

        -w_c <= x_c <= w_c
        -w_c <= y_c <= w_c
          zm <= z_c <= w_c
     where zm is -w_c when the clip control depth mode is
     NEGATIVE_ONE_TO_ONE or zero when the mode is ZERO_TO_ONE."

So in clip-space, if you use ZERO_TO_ONE, then clipZ is clipped to the range [0, clipW]. So when you perform the transform to NDC space by dividing by clipW, you get an NDC_z value on the range [0, 1]. So by changing the clip control, you’re changing the NDC-space range for Z.

And here’s the transform from NDC space to window space, from the ARB_clip_control spec:

    "The vertex's window coordinates, (x_w y_w z_w)^T are given by:

        ( x_w )     ( p_x/2 x_d + o_x )
        ( y_w )  =  ( p_y/2 y_d + o_y )
        ( z_w )     (     s z_d + b   )

    where s is (f-n)/2 and b is (n+f)/2 when the clip control depth mode
    is NEGATIVE_ONE_TO_ONE; or s is (f-n) and b is n when the mode
    is ZERO_TO_ONE."

Notice that the s and b values when using ZERO_TO_ONE are designed to map only the [0, 1] range of NDC values into the depth range. Whereas when using NEGATIVE_ONE_TO_ONE, this maps [-1, 1] into the depth range.

So clip control very much prevents the NDC-space Z from being in the [-1, 1] range. The whole point of this method is that the range of Z values is always [0, X], no matter what space they are in. This is why you can’t just use shader logic to fix the problem: the window-space transform (which is hard-coded) would always remap from a [-1, 1] NDC depth range.

Alfonse, no matter how precisely we describe it, the algorithm will still be the same. The technique is already proven to work. :slight_smile:
BTW, if you like it, feel free to make your own publication on the web (just don’t forget to mention me :slight_smile: ). There is no math in my article - just a brief explanation. It would be interesting to see the capabilities of an FP depth buffer in this setup - how exactly does the precision error depend on zNear and distance?
And what do you think: does this technique have the potential to become the popular camera setup? Like before with gluPerspective, where both zNear and zFar had to be set?

I think you’re misunderstanding this conversation. I’m not claiming that your description of the results of the algorithm is in error. I’m claiming that your description of why it works is in error. Your “brief explanation” gives people the wrong idea of why it works.

If you don’t want to give an explanation, if you just want to say, “Do this, it makes your scene better”, that’s fine. But if you are going to explain why it works, then you need to get that explanation right.

And yours isn’t. It’s close; you can get the general gist of it. But you make statements that simply aren’t correct.

I always feel that it’s better to provide no information at all than to provide inaccurate information. For example, this site explains the issue in detail and correctly. It clearly states that OpenGL’s problem is NDC-space being [-1, 1], which puts the near plane at 1. Floats near 1 can’t use their exponential precision, which they could if the near values were at 0 instead. Which is what clip control does.

It has its advantages, though your method does conflate two orthogonal issues: ARB_clip_control-based depth buffers and an infinite far plane. You can use the same infinite-far-plane math with the standard OpenGL clip/NDC/window-space transforms; the results simply won’t look as good, due to the lack of precision. And conversely, you can use a finite far plane with ZERO_TO_ONE and floating-point depth buffers.

I can certainly see the ZERO_TO_ONE part becoming a de-facto standard (and given the obvious advantages and no disadvantages, I see no reason why Vulkan would even allow you to use the OpenGL way). However, there is a lot of legacy code, and many tutorials out there describe the old OpenGL way of handling things. So there’s going to be a lot of programmers “in the know” who will do the right thing, and there will be many who haven’t encountered the technique.

And then there will be those using hardware that could handle it, but doesn’t have up-to-date drivers anymore. Nothing much can be done about them.

Well, it would effectively mandate floating-point depth buffers (the whole scheme falls apart with fixed-point). Or logarithmic depth (which is basically floating point without the mantissa).

Maybe hardware is now at the point where that’s a non-issue; I don’t know.

D3D always worked with ZERO_TO_ONE, and it didn’t have any problems with fixed point depth buffers and computation. To be sure, it has no advantages compared to OpenGL’s [-1, 1] when using fixed-point. But it also has no disadvantages.

That’s probably why OpenGL used the [-1, 1] range to begin with. Early GL was designed with the expectation that depth computations and buffers would use fixed-point numbers. And fixed-point numbers have a linear distribution across a particular range, so one range is functionally no different from another. So they picked the range that was consistent with the X and Y ranges for NDC space.

It only becomes a problem when you’re using true floating-point values, since you lose the usefulness of the exponent at the near plane.