Hi all,
I know from the specs that dFdx and dFdy return screen-space derivatives in a fragment shader. I have a simple test case implemented in GLSL, and for comparison I need to reproduce the exact same case on the CPU, so I want to know how to write my own dFdx/dFdy functions there. Based on some hints from the internet, this is how I have proceeded, but it does not produce the correct output.

The specification does not say how exactly the gradients are computed; it only gives the general idea of what the functions return. The OpenGL specification isn't intended to guarantee bit-identical results across implementations, so they are free to pick and choose.

However, what you propose is most certainly not it. The parameter given to the derivative functions does not take a position; it takes the value to take the derivative of. So it only samples pixels from an image if you’re taking the derivative of the value of a texture lookup, or what you’re writing to the framebuffer.

If you take dFdx(vec3), you get a vec3, where each value is the component-wise derivative of that value in the X direction. You don’t pass a vec2 and get a float.

However, what you propose is most certainly not it. The parameter given to the derivative functions does not take a position; it takes the value to take the derivative of.

That is what is not making sense to me. If there is only a single value, how is it going to take the derivative? As I showed, you need two values whose difference gives the derivative.

Typically, GPUs calculate 2x2 blocks of fragments in parallel. This is true even if some of those fragments fall outside the primitive being drawn; they just get masked before writing to the framebuffer.

The gradients are then calculated as the difference between the values provided for two neighbouring fragments.

Hi Xmas,
Thanks for your response. So based on your reply and Alfonse's, it means that the value of the parameter passed to the dFdx/dFdy function is evaluated over the current 2x2 neighborhood, and those values are then subtracted to obtain the derivatives. So it is something along these lines, if I understood correctly. Please correct me if I am wrong.

// T is any generic type (float, vec2, vec3, ...)
// I am assuming a forward finite difference
T dFdx(T x) {
    T x2 = getValueOf(x, gl_FragCoord.x + 1);
    T x1 = getValueOf(x, gl_FragCoord.x);
    return x2 - x1;
}

T dFdy(T y) {
    T y2 = getValueOf(y, gl_FragCoord.y + 1);
    T y1 = getValueOf(y, gl_FragCoord.y);
    return y2 - y1;
}

So this means that the value that we pass in to the derivative functions will be evaluated for each fragment.

mobeen, my understanding is that your code is only half of it, covering even (or odd) values of gl_FragCoord.x/y. For odd (or even) values, it would be like:

T x2 = getValueOf(x, gl_FragCoord.x);
T x1 = getValueOf(x, gl_FragCoord.x-1);

Hi Zbuffer,
Hmmm, so this means it alternates between a forward and a backward difference depending on whether the fragment is at an even or odd coordinate?

Ok, so I think I understand how it works. One thing though: to emulate this behaviour on the CPU, if I have a single float value, say x, I think I would need to store 4 values of x (a 2x2 neighborhood) per pixel. For GPUs, do they keep these values in registers or are they stored elsewhere?

Does it matter whether they’re stored in registers? That’s an implementation detail.

The reason the derivative functions are implemented this way is that shaders always work in 2x2 blocks. You never execute a shader on a single fragment; even if the triangle only generates one fragment, the shader still runs on a full 2x2 block, and the system just discards the three useless values.

The 2x2 block is really just one piece of hardware that does the same operation to four separate pieces of data at once. So every time you say vec4 * vec4, this instruction is replicated 4 times and executed on 4 separate pieces of data, and the result is written to 4 separate values.

Derivative functions are the only ones that break this logic. They cross the boundaries and look at the other guy’s values. Those values could be in registers, or they could be in some local memory. It doesn’t matter, because it’s all the same piece of hardware.
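As a rough mental model (nothing more than a sketch; the Quad struct and the lane layout are my own invention, not how any particular GPU lays things out), the four fragments can be pictured as four SIMD lanes that every instruction touches, with a derivative simply reading a neighbouring lane:

```cpp
// Four fragments of a quad as four SIMD "lanes", ordered: 00, 10, 01, 11.
struct Quad { float lane[4]; };

// Every ordinary instruction runs on all four lanes at once.
Quad mul(const Quad& a, const Quad& b) {
    Quad r;
    for (int i = 0; i < 4; ++i) r.lane[i] = a.lane[i] * b.lane[i];
    return r;
}

// A derivative instruction is the exception: it reads the neighbour's lane.
float dFdxOfLane(const Quad& q, int lane) {
    int row = lane & 2;                    // 0 for the top pair, 2 for bottom
    return q.lane[row + 1] - q.lane[row];  // same value for both lanes of a pair
}
```

Whether lane[] lives in registers or local memory is exactly the implementation detail Alfonse mentions; the crossing-the-boundary read is cheap either way because it stays inside the same execution unit.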

Thanks Alfonse and ZbuffeR for your answers. The reason I asked this was so that I could understand what the hardware does and then try to implement it on my CPU raytracer. The derivative functions are useful for doing anti-aliasing and based on the information you people shared, I managed to get it working in my CPU raytracer for my procedural checker texture.
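For anyone finding this thread later, one common way to turn the derivatives into checker anti-aliasing (a hedged sketch of the well-known analytically box-filtered checker, not necessarily the poster's exact code) looks like:

```cpp
#include <cmath>

static float fractf(float x) { return x - std::floor(x); }

// Box-filter a unit checker analytically over the footprint given by the
// screen-space derivatives du, dv of the texture coordinates (u, v).
// Assumes du > 0 and dv > 0; returns a coverage value in [0, 1].
float filteredChecker(float u, float v, float du, float dv) {
    // Per-axis filtered square wave via the analytic-integral trick: the
    // triangle wave |fract(x/2) - 1/2| is an antiderivative of the +/-1
    // square wave (up to scale), so differencing it gives the box filter.
    auto comp = [](float x, float w) {
        return 2.0f * (std::fabs(fractf((x - 0.5f * w) * 0.5f) - 0.5f)
                     - std::fabs(fractf((x + 0.5f * w) * 0.5f) - 0.5f)) / w;
    };
    // Product of the two filtered 1D signals, remapped to [0, 1].
    return 0.5f - 0.5f * comp(u, du) * comp(v, dv);
}
```

With a tiny footprint this reproduces the hard checker (0 or 1 inside a cell); as the footprint grows across cell boundaries it fades smoothly toward 0.5, which is exactly the anti-aliased grey seen in the distance.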

Attached are two snapshots from my raytracer showing in first the results without anti-aliasing and the second with anti-aliasing using the derivatives.

Result without anti-aliasing

Result with anti-aliasing

Thanks once again for the information. You are really really helpful.