Tutorial Proofreading and Correcting.

Alfonse, I agree with your rationale for postponing the discussion of VAOs from chapter one until a bit later. (I knew I was taking a risk commenting before I finished reading everything you wrote. Sure enough, you had already covered the issue I raised.)

Dan, you provided a lot of information that really helps explain things. As it turns out, I do have an ATI graphics card in my computer. I have the ATI beta release driver that first supported OpenGL 4.0 installed. However, my OpenGL 4.0 core profile programs that didn't have any VAO executed without any errors being raised, but all I got was a blank window. Incredibly frustrating. My guess now is that other tutorial authors must have had NVIDIA graphics cards in their computers, so their demos worked on their machines even though they never explicitly created VAOs, or they didn't restrict their context to core functionality and basically got away with unwittingly writing faulty code.

Thanks for the helpful insights.

Strange, I have an NVIDIA (crappy GeForce 9300M) and an ATI (crappy Radeon HD 5450), but VAOs worked out of the box on both. It's a good thing to have two video cards to play with and compare.

Yeah, VAOs work fine on both, it’s just when you try to use the default VAO in a core profile where the difference occurs:

If you try the following code in a core profile:

glBindVertexArray(0);
glBindBuffer(GL_ARRAY_BUFFER, bufferID);
glVertexAttribPointer(index, size, type, normalized, stride, (const GLvoid*)offset);
glDrawArrays(GL_TRIANGLES, first, count);

then ATI throws a GL_INVALID_OPERATION error after the glVertexAttribPointer + glDrawArrays calls since no VAO is bound (as mentioned in appendix E of spec), but NVidia doesn’t throw this error (at least the last time I tested).
People developing on NVidia might write code/demos that use the default VAO in a core profile that works when testing on NVidia but when it comes to running on ATI it fails.

Alfonse’s tutorial already shows how to get it to work on both ATI + NVidia, by including:

glGenVertexArrays(1, &vao);
glBindVertexArray(vao);
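
Putting the two together, a core-profile-safe version of the earlier snippet would look something like this (the variable names are just placeholders, as before):

GLuint vao;
glGenVertexArrays(1, &vao);        // create a real VAO; the default VAO (0) is not legal in core
glBindVertexArray(vao);            // subsequent attribute state is recorded in this VAO

glBindBuffer(GL_ARRAY_BUFFER, bufferID);
glVertexAttribPointer(index, size, type, normalized, stride, (const GLvoid*)offset);
glEnableVertexAttribArray(index);

glDrawArrays(GL_TRIANGLES, first, count);   // no GL_INVALID_OPERATION on ATI now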

Dan is right, of course.

To set the record straight, I had stated previously that ATI didn’t throw any error even though I didn’t have any VAO bound. Turns out I screwed that check up. I called glGetError() only after I had released my rendering context at the very end of my test programs – since the OpenGL error status is sticky, I figured if there was no error at the end of the program, there was no error anywhere before. I didn’t realize that the OpenGL error status is part of the rendering context, too.

Anyway, I just moved my glGetError() call to before the point where I release my rendering context and sure enough, ATI had thrown the error all along just as the spec requires. So, I created a lot of the frustration I suffered for the past two weeks. If I had properly checked for errors, I would have quickly narrowed my problem down to my call to glVertexAttribPointer(). What was wrong still wouldn’t have been obvious to me, but at least I would have been looking narrowly at the right location.

I like to use this macro in my code:


#include <iostream>          // std::cerr
#include <boost/assert.hpp>  // BOOST_ASSERT
#include <GL/glu.h>          // gluErrorString (pulls in the GL types as well)

GLubyte const* gl_error_string(GLenum error)
{
  return gluErrorString(error);
}

#ifndef NDEBUG
// Wrap each GL call in GL_DEBUG(...); in debug builds the call is executed and
// glGetError() is checked immediately afterwards.
# define GL_DEBUG(x)\
  do {\
    (x);\
    GLenum error(glGetError());\
    if (GL_NO_ERROR != error)\
    {\
      std::cerr << gl_error_string(error) << std::endl;\
      BOOST_ASSERT(0);\
    }\
  } while(0)
#else
// In release builds the wrapped call still happens, just without the error check.
# define GL_DEBUG(x) (x)
#endif // NDEBUG
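
Usage is then just a matter of wrapping each GL call, something like:

GL_DEBUG(glBindBuffer(GL_ARRAY_BUFFER, bufferID));
GL_DEBUG(glVertexAttribPointer(index, size, type, normalized, stride, 0));
GL_DEBUG(glDrawArrays(GL_TRIANGLES, first, count));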

Why not use glIntercept? You don’t have to ugly up your code or anything, and you can simply use it periodically as needed.

Was scanning random bits of this. Good stuff. Great to have a tutorial geared toward GL4 to point folks to.

One thing I just ran across on the Depth Clamping page:

If you’re wondering what happens when you have depth clamping and a clip-space W <= 0 and you don’t do clipping, then… well, OpenGL doesn’t say. At least, it doesn’t say specifically. All it says is that clipping against the near and far planes stops and that fragment depth values generated outside of the expected depth range are clamped to that range. No, really, that’s all it says.

I don't think the wording is quite right here, and it's a bit confusing. Since I understand and have implemented homogeneous clipping and still found this confusing, I suspect it might confuse others. Might deserve a tweak.

For reference, homogeneous clipping is:

-w <= x,y,z <= w

The above verbiage is talking about the case where:

clip.w <= 0

Geometrically, what is this? Not obvious. Well, for perspective projections, clip.w = -eye.z (assuming eye.w = 1), so plugging in, we have:

eye.z >= 0

So this is (somewhat cryptically) talking about the class of all points at or "behind" the plane of the eye. Why is the verbiage singling out this case? Not obvious.

However, L/R/B/T clipping still occurs, so (for perspective projections only! which is what you're assuming here) the points behind the eye will be culled by the L/R/B/T clipping planes. So the handling of these points is specified. The main class of points we add here with depth clamping are those in the pyramid formed by the eye point and the near-plane corners, as well as those beyond the far plane but still within the L/R/B/T planes. And OpenGL is clear about that clipping behavior.

Now, what about orthographic projections? That's where the tutorial verbiage above doesn't make sense. For ortho, clip.w = 1, so there are no cases where clip.w <= 0. The verbiage is geared toward some derived property of the perspective projection, and it's confusing why it does that anyway.

I think it's probably simpler to just say that homogeneous clipping goes from:

-w <= x,y,z <= w

to just:

-w <= x,y <= w

when depth clamp is enabled, with Z values just being clamped after the x/y clip.
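
Or, in rough code terms (just a sketch of the point-wise idea; real clipping of course operates on primitives, not individual points):

// Sketch: which clip-space points survive clipping, with and without depth clamping.
bool insideClipVolume(float x, float y, float z, float w, bool depthClamp)
{
    bool xyInside = (-w <= x && x <= w) && (-w <= y && y <= w);
    if (depthClamp)
        return xyInside;                    // z is not clipped; its depth value is clamped later
    return xyInside && (-w <= z && z <= w);
}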

And if you continue to mention this special case, you might caveat that you're talking about points behind the eye with a perspective projection, rather than just cryptically saying clip-space W <= 0.

This brings up what might be another valuable, but rarely presented, chapter to add to your OpenGL tutorial: tips on how to debug OpenGL programs.

A couple of years ago I dabbled with OpenGL a bit and also experienced considerable trouble getting beyond a blank window, due to some error of mine from not really understanding how OpenGL worked. Once data enters the black box of the GL, it seems pretty hard to debug when things go wrong (especially when the resulting window is blank). Anyway, I'm sure you've learned a number of useful techniques over the years, such as using glIntercept, that help you debug your OpenGL programs. It could be very helpful for people learning OpenGL to read some discussion of debugging techniques.

… Homogeneous clipping is:

-w <= x,y,z <= w
A slightly different take on this occurred to me (one that doesn't require any geometric intuition). When clip.w < 0, there are no points that satisfy the above relation. This is true whether or not z is clipped, so these points are always clipped. When clip.w == 0, the perspective divide is ill-defined, so those points should be clipped as well.

So clip.w > 0 is really the only class of points that stand a shot at not being clipped. This is always the case in ortho (clip.w == 1), but depends on eye.z for perspective (clip.w = -eye.z).

I would present the perspective and clipping section differently, though it is debatable if it is better.
I would just repeat what the GL specification says on clipping:
-w <= x,y,z <= w, where gl_Position = vec4(x,y,z,w). From there, talk about projection matrices, and as an example give the typical frustum and ortho matrices. From there, one can talk about the "w" divide and another key feature: the effect of the "w" values on interpolation of the out variables of a vertex (or geometry) shader. A nice discussion, with an example, of linear vs. perspective-correct interpolation would, I think, be invaluable to newcomers. Along the same lines, a nice explanation of why the z used for depth is interpolated linearly, while vertex shader outs are (by default) interpolated perspective-correct.
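
For example, just showing the two interpolation rules side by side would make the difference concrete. A rough sketch (t is the screen-space interpolation parameter along an edge, a0/a1 the attribute values and w0/w1 the clip-space w values at the endpoints):

// Linear (screen-space) interpolation of an attribute between two vertices.
float lerpLinear(float a0, float a1, float t)
{
    return (1.0f - t) * a0 + t * a1;
}

// Perspective-correct interpolation: interpolate a/w and 1/w linearly, then divide.
float lerpPerspective(float a0, float w0, float a1, float w1, float t)
{
    float aOverW   = (1.0f - t) * (a0 / w0) + t * (a1 / w1);
    float oneOverW = (1.0f - t) * (1.0f / w0) + t * (1.0f / w1);
    return aOverW / oneOverW;
}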

On the discussion of frustum perspective matrices, the use of the ratio [near:far] might lead a newcomer to believe that the ratio of the two is what matters, whereas the real issue is how small zNear is… you do address this, but perhaps take the discussion from a different point of view: given zNear, zFar, and a z in that range, calculate delta_z (as a function of z, zNear and zFar), where delta_z is the smallest change in z that triggers a different value in a 24-bit depth buffer. That, I think, would be quite useful for all newcomers to see, and it also exposes them early to letting zFar = infinity (ahhh, the days when DEPTH_CLAMP was only NV_depth_clamp). That discussion also nicely leads to polygon offset for decals.

Lastly, for the Unix users, maybe make a unix makefile too, with the assumption that the Unix system has FreeImage, GIL and some version of GLUT. FreeGLUT has glutGetProcAddress so in theory all the code could be platform agnostic… in theory…

Don’t know if this is a bad or good idea: use #version 330 in the shaders.

All in all, I am glad to see a new OpenGL tutorial popup, especially one that is GL3+ centric. Keep it up and I look forward to reading it more!

P.S. A wish list of topics that have few, if any tutorials:
Uniform Buffer Objects
Framebuffer Object layered rendering (for example, drawing to all 6 faces of a cubemap in one draw call; this would also imply a geometry shader write-up…)
GL3’s instance drawing API
Transform Feedback

the effect of the "w" values on interpolation of the out variables of a vertex (or geometry) shader.

Originally, I intended to hold off the discussion of perspective-correct interpolation until the tutorials involving textures. The examples are usually more clear when you can see the effect of switching it on and off with textures. However, after thinking about what you said, I’ve decided to move it up to the first lighting tutorial. It’s the first place where there is a physically-based need to have perspective-correct interpolation of values.

I don’t think it’s good to discuss perspective-correct interpolation in the perspective projection tutorial. The perspective tutorial is pretty dense as is, and any examples would be fairly artificial.

On the discussion of frustum perspective matrices, the use of the ratio [near:far] might lead a newcomer to believe that the ratio of the two is what matters, whereas the real issue is how small zNear is…

But it is based on the ratio. 1:100 is no different than 10:1000, in terms of how close something must be (relative to the near/far planes) before you run out of precision. Increase both near and far by a factor of 10, and you increase that value by a factor of 10 as well. It is only when you keep the far plane fixed that the near plane’s absolute value is important.

I think that what needs more emphasis than I currently give it is that clipping is a real problem when moving the near/far planes around. Since I talk about the ratio before actually talking about clipping, it creates an issue with the ordering of things. The far plane determines the absolute maximum extent of what you can see, and it is the most likely to be constrained by absolute need. So the question becomes how big of a near plane you can choose and live with, given the far plane you have to live with.

I think I can move things around a bit to make this work better.

Lastly, for the Unix users, maybe make a unix makefile too, with the assumption that the Unix system has FreeImage, GIL and some version of GLUT. FreeGLUT has glutGetProcAddress so in theory all the code could be platform agnostic… in theory…

All of the libraries I use are cross platform. Premake4 is a cross-platform build system. The only things standing in my way of just having the whole thing be cross-platform are:

1: My extension loading code, which I auto-generate with some Lua scripts, is Windows-only. The main issue there is getting function pointers for core functions. The Windows loading code simply includes the usual "gl/gl.h" and copies the function pointers when needed; this works for all of the 1.1 entrypoints. But I don't know what GL version the various X-Windows implementations' "gl/gl.h" headers expose, and I don't know if glutGetProcAddress works for statically-linked entrypoints (there's a sketch of the kind of per-function loading I mean just below).

2: I don’t have a Linux install to test it on.

Admittedly, #2 is easier to solve than #1.
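
For reference, if glutGetProcAddress does turn out to work for everything, the generated code could boil down to something like this per function. This is purely illustrative, not what my scripts actually emit, and the typedef name is made up:

// Illustrative only: fetch one entrypoint through freeglut's loader.
typedef void (*PFN_MY_GLGENVERTEXARRAYS)(GLsizei n, GLuint *arrays);

PFN_MY_GLGENVERTEXARRAYS my_glGenVertexArrays = 0;

void load_entrypoints()
{
    my_glGenVertexArrays =
        (PFN_MY_GLGENVERTEXARRAYS)glutGetProcAddress("glGenVertexArrays");
}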

Don’t know if this is a bad or good idea: use #version 330 in the shaders.

I started these before GL 3.3, so I just kept using 3.2-centric code. I'll update the tutorials to 3.3 when I feel that ATI's and NVIDIA's 3.3 drivers are sufficiently stable. There are still one or two issues I've seen with ATI's 3.3 drivers that I want to see fixed before switching over to them. Maybe in a couple of months or so. At which point, I'll also add text to introduce useful 3.3 functionality like attribute index assignment in the shaders and so forth.

A wish list of topics that have few, if any tutorials:

I can post my current outline of future topics and tutorials, if you want. There’s a partial outline up on BitBucket (which, btw, has without question the worst wiki software in the history of wikis). My full outline is in the Mercurial repository (though it also gets caught up in the distribution), in the documents directory.

I find that one of the most difficult things about planning tutorials is ordering. This is also one of the problems with NeHe’s tutorials; everything’s all over the place. I want each of my tutorials to properly build on knowledge from the previous ones.

The other problem is just finding an example that doesn’t look too artificial, so that it inspires people to use it for the right purposes. Indeed, that’s something I’m really going to push when I get into texturing. So many texturing tutorials and documents always talk about it in terms of images.

Take transform feedback for example. It’s easy enough to explain what it does. But how do you explain it in such a way that the reader understands when to use it? The most obvious usage of it I can think of is as a performance optimization for multi-pass rendering. That’s easy enough to describe, but talking about it requires having some need to talk about multi-pass rendering. So how would that come up?

I still haven’t finished reading everything, but I’ve got enough comments that I don’t want to wait to post them all at once.


Chapter 3. OpenGL’s Moving Triangle page, Moving the Vertices section:

“The cosf and sinf functions compute the cosine and sine respectively. It isn’t important to know exactly how these functions work, but they effectively compute a circle of radius 2. By multiplying by 0.5f, it shrinks the circle down to a radius of 1.”

You need to replace the word radius with diameter.


Chapter 4. Objects at Rest, Perspective Correction page, Mathematical Perspective section:

“A perspective projection essentially shifts vertices towards the eye, based on the location of that particular vertex. Vertices farther in Z from the front of the projection are shifted less than those closer to the eye.”

I believe you mean to say that vertices farther in Z (…) are shifted more than those closer to the eye. Or maybe I am not interpreting what you mean by "the front of the projection" the same way. I'm assuming you just mean that vertices farther from the projection plane are shifted more than vertices nearer the projection plane; there's no need to distinguish which side of the projection plane is being referenced.


Figure 4.6 shows the eye at the origin, but has the following caption:
“The projection of the point P onto the projection plane, located at the origin. R is the projection and E is the eye point.”

The projection plane isn’t located at the origin in the figure, however.

Beneath that, there is the text:
“What we have are two similar right triangles; the triangle formed by E, R and the origin; …”

But E (the eye) is at the origin, so there are only two distinct points.


Equation 4.1 is given as:
R = P * (Ez / Pz)

But if P is on the projection plane, then Pz is zero and equation 4.1 is indeterminate. I believe you need equation 4.1 to be something more like:
R = P * (Ez / (Ez + Pz))


Same chapter and page, The Perspective Divide section:
“You might notice that the scaling can be expressed as a division operation (dividing by the reciprocal). And you may recall that the difference between clip space and normalized device coordinate space is a division by the W coordinate. So instead of doing the divide in the shader, we can simply set the W coordinate of each vertex correctly and let the hardware handle it.”

I’ve got a few problems with this. First sentence: division is multiplying by the reciprocal, not dividing by the reciprocal. Third sentence, re: setting the W coordinate of each vertex correctly. I don’t know what you mean by “correctly” or by “letting the hardware handle it.” With shaders, isn’t it the programmer’s responsibility to make the hardware handle it?

It’s here that I have some issues basically surrounding the use and explanation of the W coordinate. At this time I need to think more about this because it’s intertwined with lots of things, so I’ll try to come back to it later. In the meantime, I’ll just point out that you have defined W coordinates in all your geometry passed to OpenGL, but that you never use those W coordinates in example 4.2 (ManualPerspective Vertex Shader); it appears that in Example 4.4 (MatrixPerspective Vertex Shader) you assume and require all W coordinates have a value of one (i.e., that the input geometry be in Euclidean coordinates), though there is nothing to verify or force the programmer to comply.

As background information, the way homogeneous coordinates are converted to Euclidean coordinates is by dividing all coordinates of any point by its W coordinate. As long as W is not zero, then W will be one after the conversion. Presumably, to allow homogeneous coordinate geometry is the reason that OpenGL allows the programmer to pass W coordinates to the OpenGL pipeline. Otherwise, a value of one could have been assigned to W within the OpenGL pipeline. (The world we live in consists of Euclidean geometry, so at some point geometry in the OpenGL pipeline must be represented as Euclidean geometry if we hope to represent reality.)

Basically, the perspective I am coming from is to set the stage for eventually getting to an OpenGL 4.0 pipeline with all five shader stages in use. Using them to render rational geometry, such as rational Bezier curves and surfaces, and by extension NURBS curves and surfaces, requires that homogeneous coordinate geometry (i.e., W not all the same) be used. I know you're nowhere near that at this stage in the tutorials (and might choose never to go that far), but beginning with OpenGL 4.0 and the use of tessellation shaders, homogeneous coordinate geometry will be important, and descriptions of W and the transformations here ought not lead to contradictions or confusion later.

The projection plane isn’t located at the origin in the figure, however.

When I first started doing the writeup for the tutorial, I had the projection plane at the origin, for some reason. I got pretty far into the tutorial, until I couldn’t make the math work out. I thought I had fixed all of that, but apparently, I missed some.

First sentence: division is multiplying by the reciprocal, not dividing by the reciprocal.

No, I mean division by the reciprocal.

R = P * (Ez / Pz) is equivalent to R = P / (Pz / Ez). Assuming that Ez and Pz are non-zero.

I could probably make that clearer, though.

Third sentence, re: setting the W coordinate of each vertex correctly. I don’t know what you mean by “correctly” or by “letting the hardware handle it.”

From the perspective of simply performing the act of perspective projection, the shader is perfectly capable of outputting coordinates in NDC space, with a W of 1. However, since there is a division in the transform from clip space to NDC space, we can "let the hardware handle it" by setting the W coordinate of each vertex "correctly" (as in, setting W to the value that, when P is divided by it, produces R; i.e., Pz/Ez).

Remember: at this point, we’re not ready to talk about why this division step exists in hardware at all, when the shader is perfectly capable of doing it by itself. And we’re not ready to talk about the reasons for doing it, like homogeneous space clipping or perspective correct interpolation, let alone GL 4.0 homogeneous math.
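
If it helps, here's the same idea as a tiny plain-C++ sketch (not the tutorial's shader; Vec4 and makeClipPosition are made-up names):

struct Vec4 { float x, y, z, w; };

// Doing the divide ourselves:  R = P * (Ez / Pz) == P / (Pz / Ez)
// Letting the hardware do it:  pass x, y, z through and set W = Pz / Ez;
// the fixed clip-space-to-NDC divide by W then produces the same R.
Vec4 makeClipPosition(Vec4 P, float Ez)
{
    Vec4 clip = P;
    clip.w = P.z / Ez;
    return clip;
}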

But it is based on the ratio. 1:100 is no different than 10:1000, in terms of how close something must be (relative to the near/far planes) before you run out of precision.
I almost made this same comment yesterday too. Near is where it’s at. You can push far out to infinity, and allegedly you don’t lose that much precision in doing so with a perspective projection, despite the fact that the near:far ratio drops to 0. That’s why this technique was sometimes used for shadow volumes (that or depth clamp).

It makes some intuitive sense when you consider that the bulk of your precision is clustered up close to the near plane. The further you go out into the scene, the less precision is used there, to where pushing the far clip out to infinity doesn’t lose you that much.

I almost made this same comment yesterday too. Near is where it’s at. You can push far out to infinity, and allegedly you don’t lose that much precision in doing so with a perspective projection, despite the fact that the near:far ratio drops to 0. That’s why this technique was sometimes used for shadow volumes (that or depth clamp).

This is the exact reason why I suggested doing the calculation: given zNear and zFar (infinity is OK) and a z in that range (or really -z), calculate how much z has to change to be picked up by a 24-bit depth buffer.

You can push far out to infinity, and allegedly you don’t lose that much precision in doing so with a perspective projection, despite the fact that the near:far ratio drops to 0.

I'm going to assume that by "zFar at infinity" you don't actually mean plugging the floating-point +INF value into the equation, as doing so would cause both terms of the Z calculation to become 0. Instead, I'll assume you just mean "really big".

So the near:far ratio never reaches 0. Doubling the zNear is the same as halving the zFar, in terms of the overall precision distribution of the depth buffer. The only difference is one of distance: doubling a zNear of 1 makes it 2, while halving the zFar of 10,000,000 makes it 5,000,000. The back half of the world is lost to get the same gains from losing only one unit off the front.

They both say the same thing, but talking about a zFar at infinity (which is not a legitimate concept) would be confusing.

This is the exact reason why I suggested doing the calculation: given zNear and zFar (infinity is OK) and a z in that range (or really -z), calculate how much z has to change to be picked up by a 24-bit depth buffer.

Most people don’t get to choose the Z values of their scene; that’s necessitated by what is being rendered. What a user has a choice over is zNear/zFar. Because of that, the most useful question you can ask is, “how much usable depth range does a given zNear/zFar pairing give me?” Just because you put your zFar out at 50,000,000 doesn’t mean that 50,000,000 units of space is actually usable.

In this case, I choose half-precision as the limit of “usable depth range”. The equation you’re suggesting has two unknowns: zNear/zFar, and the target Z value. The equation I use only has one unknown: zNear/zFar. This makes it much more obvious as to what’s going on and how the zNear/zFar ratio affects the resulting depth buffer precision.

Oh boy.

Let's write some equations down, OK?

Let's take the standard frustum projection matrix with left = -right and bottom = -top; then:

z_clip = (n+f)/(n-f) * z_eye + 2nf/(n-f) * w_eye
w_clip = -z_eye

By "f = infinity" it is meant to let f -> infinity as a mathematical limit; the expression then becomes:

z_clip = -z_eye - 2n*w_eye
w_clip = -z_eye

Performing the w-divide, and for the common situation where w_eye = 1.0:

z_n = 1 + 2n/z_eye

Take a look at that expression now. Given z_eye, that is the normalized device co-ordinate when the far plane is infinity. Notice:

z_eye = -n gives z_n = -1, and letting z_eye go to negative infinity gives z_n -> +1.

Now for the interesting questions that are worth asking:
Using a 24-bit depth buffer, given z_eye (negative), find delta_z (negative) so that the values written to the depth buffer are distinct for z_eye and z_eye + delta_z. Do a similar exercise with a finite f as well. Call this function

delta_z(z_eye, n, f).

and here is the surprise for you:

delta_z(t*z_eye, t*n, t*f) does not equal t*delta_z(z_eye, n, f), i.e. the effects of scaling n and f are not equivalent to simply scaling z_eye by the same factor.

The concept of having the far plane out at negative infinity for shadow volumes is explained in the pdf: “Practical and Robust Shadow Volumes” here: Robust Shadow Volumes which is quite old.

You made an interesting suggestion. Well, I develop on Linux, and glIntercept has not been ported there yet. So I use the GL_DEBUG() macro. It’s not so bad really, even though it is ugly. Works in Windows too.

The concept of having the far plane out at negative infinity for shadow volumes

But we’re not dealing with uncapped shadow volumes; we’re dealing with projecting a scene for the purpose of rasterizing it. By thinking of the limit as zFar approaches infinity as the expected way to generate your projection matrices, you’re potentially throwing away valuable depth buffer precision.

Taking the limit of zFar as it approaches infinity may in fact be useful, but it’s not something you need to be introduced to when first doing perspective projections. And it’s certainly no solution for z-fighting.

delta_z(t*z_eye, t*n, t*f) does not equal t*delta_z(z_eye, n, f), i.e. the effects of scaling n and f are not equivalent to simply scaling z_eye by the same factor.

That’s because delta_z is a function of variables other than the near and far distances. The function I was using, determining the camera-space distance where 99.98% of the precision gets used, is purely a function of zNear and zFar. It does scale directly with the zNear/zFar ratio.

And I still think that it is easier to look at where 99.98% of the precision is going than to try to discuss a multi-variate function of camera z, zNear, and zFar. You can make a simple table or even a graph of the former, while the latter would require a 3D (or greater) graph.

I should never, ever post so late at night when so tired.

Many of my values for the coefficients are negated, everything is negated… sighs… and no one said anything… the correction is this:

z_clip = -z_eye - 2n
w_clip = -z_eye
so

z_n = 1 + 2n/z_eye

and this makes sense:
z_eye=-n --> z_n = -1 (in front)
z_eye=-infinity --> z_n = +1 (far in back)

Now missing details: it is not z_n that is important but z_w which is given by (unless glDepthRange has been called):

z_w= 0.5 + 0.5*z_n

and we had for zFar=infinity, z_n= 1 + 2n/z_eye so:

z_w = 1 + n/z_eye

Plugging in z_eye =-n we get z_w=0 so sanity check passes.
The interesting and important bit: z_w is really just a function of the ratio of zNear to z_eye, which also makes sense.

The more sadistic can do the case where zFar is not infinity:

z_clip = (n+f)/(n-f)*z_eye + 2nf/(n-f)

so

z_n = (n+f)/(f-n) + 2nf/(z_eye*(f-n))

so

z_w= 0.5 + 0.5*( (n+f)/(f-n) + 2nf/(z_eye*(f-n)) )

after a little simplifying :

z_w = (1 + n/z_eye)*f/(f-n)

thus the only difference is the factor f/(f-n): for n = 1 and f = 1000 the factor is 1000/999, and taking f to 10,000 it is 10000/9999. Do the maths and you will see the loss of precision in the bits of the depth buffer is minimal. Also on this subject, one sees quite clearly that the ratio of zNear to zFar is not the important issue; rather it is the ratios of zNear to z_eye and of zFar to (zFar - zNear). The above also guides one very well on how to use a floating-point depth buffer well (which all GL3 hardware can do, AND much GLES2 hardware can do too!)

The above tells you exactly how z_eye must vary in order to prevent z-fighting.
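
If you want to see it concretely, here is a throwaway snippet (my own quick sketch, not code from the tutorial) that evaluates z_w from the formula above and shows which 24-bit depth value it lands on:

#include <cmath>
#include <cstdio>

// z_w as derived above, for a finite far plane and the default glDepthRange(0, 1).
double z_w(double z_eye, double n, double f)    // z_eye is negative
{
    return (1.0 + n / z_eye) * f / (f - n);
}

int main()
{
    const double n = 1.0, f = 1000.0;
    const double steps = 16777215.0;            // 2^24 - 1: distinct values in a 24-bit depth buffer
    for (double z = -2.0; z >= -1024.0; z *= 4.0)
        std::printf("z_eye = %8.1f   z_w = %.8f   24-bit value = %.0f\n",
                    z, z_w(z, n, f), std::floor(z_w(z, n, f) * steps));
    return 0;
}

Watching how quickly the 24-bit value saturates toward the top of the range as z_eye moves away from the near plane shows exactly where the precision lives.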

Please THINK. Presenting projection matrices with the little bit of math behind them (we are talking high-school algebra here) takes the magic out of the entire process, which is critical to creating a good tutorial.