Extremely different behaviour between vec3(0.9999, 1.0, 1.0) and vec3(1.0, 1.0, 1.0)

tsojtsoj · October 31, 2019, 10:21am

Hi,
I am having an issue with this code:

#version 330

#define NUM_CASCADES 4

layout (location = 0) out vec3 shadow;

in VsOut
{
  vec4 frag_position_light_space[NUM_CASCADES];
  vec4 frag_position_world_space;
} vs_out;

uniform sampler2DArrayShadow depth_map;
uniform float cascades[NUM_CASCADES];

float directional_shadow_calculation(
  vec4 frag_position_light_space,
  sampler2DArrayShadow depth_map,
  int cascade
)
{
  vec3 proj_coords = frag_position_light_space.xyz / frag_position_light_space.w;
  proj_coords = proj_coords * 0.5 + 0.5;
  float current_depth = proj_coords.z;
  return texture(depth_map, vec4(proj_coords.xy, cascade, current_depth));
}

void main()
{
  int cascade = -1;
  float current_cascade_distance = length(vs_out.frag_position_world_space.xyz);
  for(int i = 0; i<NUM_CASCADES; i++)
  {
    if(current_cascade_distance <= cascades[i])
    {
      cascade = i;
      break;
    }
  }

  if(cascade == -1)
  {
    //shadow = vec3(0.9999999, 1.0, 1.0); // <- ISSUE IS HERE
    shadow = vec3(1.0, 1.0, 1.0);
  }
  else
  {
    shadow = vec3(1.0, 1.0, 1.0) * directional_shadow_calculation(
      vs_out.frag_position_light_space[cascade],
      depth_map,
      cascade
    );
  }
}

When I run my program with vec3(0.99999, 1.0, 1.0), it works as I intended: when an object lies within a shadow map it is rendered with shadow information, otherwise the final color is effectively multiplied by vec3(1.0, 1.0, 1.0).
However, if I run the code with vec3(1.0, 1.0, 1.0), everything turns white, including the areas with valid shadow information, except for a few small, odd artifacts.
Also, this issue doesn’t happen when I run it in a VirtualBox Lubuntu VM.
My machine:

System:    Host: tsoj-pc Kernel: 4.19.80-1-MANJARO x86_64 bits: 64 Desktop: Xfce 4.14.1 Distro: Manjaro Linux 
Machine:   Type: Laptop System: Acer product: Swift SF314-52 v: V1.08 serial: <root required> 
           Mobo: KBL model: Suntory_KL v: V1.08 serial: <root required> UEFI: Insyde v: 1.08 date: 11/28/2017 
Battery:   ID-1: BAT0 charge: 21.5 Wh condition: 35.2/50.8 Wh (69%) // <- i hate this fucking battery
CPU:       Topology: Quad Core model: Intel Core i5-8250U bits: 64 type: MT MCP L2 cache: 6144 KiB 
           Speed: 692 MHz min/max: 400/3400 MHz Core speeds (MHz): 1: 665 2: 633 3: 651 4: 622 5: 628 6: 616 7: 630 8: 633 
Graphics:  Device-1: Intel UHD Graphics 620 driver: i915 v: kernel 
           Display: x11 server: X.Org 1.20.5 driver: intel unloaded: modesetting resolution: 1920x1080~60Hz 
           OpenGL: renderer: Mesa DRI Intel UHD Graphics 620 (Kabylake GT2) v: 4.5 Mesa 19.2.2 
Audio:     Device-1: Intel Sunrise Point-LP HD Audio driver: snd_hda_intel 
           Sound Server: ALSA v: k4.19.80-1-MANJARO 
Network:   Device-1: Intel Wireless 7265 driver: iwlwifi 
           IF: wlp3s0 state: up mac: f8:59:71:6d:e5:8e 
Drives:    Local Storage: total: 238.47 GiB used: 182.60 GiB (76.6%) 
           ID-1: /dev/nvme0n1 vendor: Intel model: SSDPEKKW256G7 size: 238.47 GiB 
Partition: ID-1: / size: 233.43 GiB used: 182.60 GiB (78.2%) fs: ext4 dev: /dev/nvme0n1p2 
Sensors:   System Temperatures: cpu: 46.0 C mobo: N/A 
           Fan Speeds (RPM): N/A 
Info:      Processes: 232 Uptime: 2h 01m Memory: 7.67 GiB used: 4.40 GiB (57.4%) Shell: bash inxi: 3.0.36

Dark_Photon · October 31, 2019, 12:19pm

I’ll look back later when I have more time, but you might want to read this in the wiki:

Sampler_(GLSL)#Non-uniform_flow_control

A few other suggestions (likely not related to your problem):

I doubt you want to choose the split based on radial distance (in world space). More likely you want eye-space Z distance.

Until you get this solidly debugged, I’d recommend that you disable anisotropic filtering on your shadow map texture if it is on. Your code currently does not provide reasonable derivatives for shadow texture lookup on split boundaries (for multiple reasons), and you’re going to want to fix those if/when you enable anisotropic filtering.

Finally, check that you have a min filter set on the texture that doesn’t use MIPmaps (e.g. GL_LINEAR or GL_NEAREST). You’ve probably got this set right already though. MIPmap texture filtering is one other feature that will go nuts if your texcoord derivatives are nonsense.

tsojtsoj · October 31, 2019, 9:09pm

Ok, that link about uniform flow control helped. I changed the main function to this:

void main()
{
  float shadows[NUM_CASCADES];
  for(int i = 0; i<NUM_CASCADES; i++)
  {
    shadows[i] = directional_shadow_calculation(
      vs_out.frag_position_light_space[i],
      depth_map,
      i
    );
  }

  shadow = vec3(1.0, 1.0, 1.0);
  float current_cascade_distance = length(vs_out.frag_position_world_space.z);
  for(int i = 0; i<NUM_CASCADES; i++)
  {
    if(current_cascade_distance <= cascades[i])
    {
      shadow = vec3(1.0, 1.0, 1.0) * shadows[i];
      break;
    }
  }
}

And it works. However, it seems to be slower. I don’t have a good idea of how costly a texture lookup is, but I guess the reason for the slowdown is that I always access all of the textures. Is there a way to make this more efficient without changing the fundamental design?

Dark_Photon:

I doubt you want to choose the split based on radial distance (on world space). More likely you want eye-space Z distance.

Yes, what I had was not really accurate, but because the camera is currently always at 0,0,0 in my shaders (frag_position_world_space is really already in camera space; maybe bad variable naming), it doesn’t matter too much. I changed it a little so that it now uses just the z value, which makes more sense given how the shadow maps are created.
I didn’t completely understand the point about the derivatives. Is this important even if I don’t use anisotropic filtering?

GClements · November 1, 2019, 12:35am

It matters for anisotropic filtering and for MIPmap selection. Both of these require scale factors which are normally derived from the partial derivatives of the texture coordinates with respect to screen-space X and Y. Specifically, the GLSL texture() function finds the derivatives of the supplied texture coordinates and uses these to calculate the scale factors. But derivatives aren’t defined within non-uniform control flow.

You can get around the problem by using the textureGrad() function which accepts derivatives as separate arguments. You can compute the derivatives outside of the conditional and pass them in. E.g. rather than calling directional_shadow_calculation (which both computes the coordinates and performs the texture lookup) in the loop, you only need to calculate the coordinates for each cascade, calculate the derivatives for each cascade (using dFdx and dFdy), then perform the texture lookup only for the cascade being used. You have to calculate the derivatives for each cascade because dFdx and dFdy can’t be used within non-uniform control flow.

Also, if you know that the texture coordinates for different cascades differ only by an affine transformation, you could calculate the derivatives for one set of texture coordinates and transform the derivatives accordingly.
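A minimal sketch of the affine idea, assuming each cascade's xy texture coordinates differ from cascade 0's only by a per-axis scale and an offset (uv_i = uv_0 * scale_i + offset_i). The offset drops out when differentiating, so only the scale is needed. The uniform cascade_scale is hypothetical and would have to be supplied by the application; the other names are taken from the shader in the first post. This is untested on the driver discussed in this thread.

```glsl
#version 330

#define NUM_CASCADES 4

layout (location = 0) out vec3 shadow;

in VsOut
{
  vec4 frag_position_light_space[NUM_CASCADES];
  vec4 frag_position_world_space;
} vs_out;

uniform sampler2DArrayShadow depth_map;
uniform float cascades[NUM_CASCADES];
// Hypothetical uniform (not in the original shader): xy scale that maps
// cascade 0 texcoords to cascade i texcoords; cascade_scale[0] is vec2(1.0).
uniform vec2 cascade_scale[NUM_CASCADES];

// Perspective divide plus the [-1, 1] -> [0, 1] remap used in the original code.
vec3 to_tex_coords(vec4 p)
{
  return (p.xyz / p.w) * 0.5 + 0.5;
}

void main()
{
  // Derivatives are taken once, for cascade 0, in uniform control flow.
  vec2 uv0 = to_tex_coords(vs_out.frag_position_light_space[0]).xy;
  vec2 duv0_dx = dFdx(uv0);
  vec2 duv0_dy = dFdy(uv0);

  shadow = vec3(1.0);
  float depth = abs(vs_out.frag_position_world_space.z);
  for (int i = 0; i < NUM_CASCADES; i++)
  {
    if (depth <= cascades[i])
    {
      vec3 c = to_tex_coords(vs_out.frag_position_light_space[i]);
      // Only one lookup is performed; its gradients are the cascade 0
      // gradients rescaled for cascade i.
      shadow = vec3(textureGrad(depth_map, vec4(c.xy, float(i), c.z),
                                duv0_dx * cascade_scale[i],
                                duv0_dy * cascade_scale[i]));
      break;
    }
  }
}
```

Compared with computing dFdx and dFdy for every cascade, this needs one pair of derivatives instead of NUM_CASCADES pairs, at the cost of one extra uniform array.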

tsojtsoj · November 1, 2019, 1:52am

Ok, if I understood correctly, then it doesn’t matter if I do a texture lookup in non-uniform flow control as long as I use textureGrad(depth_map, shadows_coords[i], d_x[i], d_y[i]) (with shadows_coords[i], d_x[i], d_y[i] calculated at the beginning of the shader in uniform flow control)?

I have now tried this code, which implements the idea, but I get random noise again when I run it:

vec4 directional_shadow_calculation(vec4 frag_position_light_space, int cascade)
{
  vec3 proj_coords = frag_position_light_space.xyz / frag_position_light_space.w;
  proj_coords = proj_coords * 0.5 + 0.5;
  float current_depth = proj_coords.z;
  return vec4(proj_coords.xy, cascade, current_depth);
}

void main()
{
  vec4 shadows_coords[NUM_CASCADES];
  vec2 d_x[NUM_CASCADES];
  vec2 d_y[NUM_CASCADES];
  for(int i = 0; i<NUM_CASCADES; i++)
  {
    shadows_coords[i] = directional_shadow_calculation(vs_out.frag_position_light_space[i], i);
    d_x[i] = dFdx(shadows_coords[i].xy);
    d_y[i] = dFdy(shadows_coords[i].xy);
  }

  shadow = vec3(1.0, 1.0, 1.0); // <- this produces noise; vec3(0.999, 1.0, 1.0) works
  float current_cascade_distance = length(vs_out.frag_position_world_space.z);
  for(int i = 0; i<NUM_CASCADES; i++)
  {
    if(current_cascade_distance <= cascades[i])
    {
      shadow = vec3(1.0, 1.0, 1.0) * textureGrad(depth_map, shadows_coords[i], d_x[i], d_y[i]);
      break;
    }
  }
}

Dark_Photon · November 1, 2019, 12:45pm

tsojtsoj:

Ok, that link about uniform flow control helped. I changed the main function to this:
void main()
{
    ... // sample texture 4 times, and discard all results but 1
}
And it works. However, it seems like it is slower.

Yeah, you don’t want to do that.

As for whether this can be made more efficient: yes, definitely.

When I implemented Cascaded Shadow Maps a few years ago, I hit this too, and found several solutions. They’re detailed in this thread:

Cascaded shadow map bug between splits

The last post mentions some of the solutions I found, which I’ll just reiterate here. Either:

1. Tweak the split indices in the shader so that pixels in the same GPU pixel quad always use the same split.
2. Compute analytic texcoord gradients in the shader.

Both are described in:

Shader X7, Chapter 4.1 (Practical Cascaded Shadow Maps), “Filtering Across Splits” section, pp. 321-327
…but the code they list for #1 (derived from a snippet by Andrew Lauritzen) is buggy.

You can find the correct, original snippet here:

Variance Shadow Maps Demo (D3D10) (Andrew Lauritzen, Beyond3D post, 4/25/07)
Use log2 or his lookup table as desired. Also, the dot trick for computing the split saves a surprising number of fragment instructions.

I ended up using this (i.e. solution #1). It’s a pretty slick trick, and it works well.
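For readers unfamiliar with the dot trick, here is a minimal sketch of the general idea. It is not the snippet from the Beyond3D post. It assumes four cascades whose far distances are sorted in ascending order and packed into a hypothetical vec4 uniform named cascade_far, in place of the float array used earlier in the thread. It only computes the split index without a loop or branch; it does not by itself make the index consistent across a pixel quad.

```glsl
#version 330

#define NUM_CASCADES 4

layout (location = 0) out vec3 shadow;

in VsOut
{
  vec4 frag_position_light_space[NUM_CASCADES];
  vec4 frag_position_world_space;
} vs_out;

// Hypothetical uniform: the four cascade far distances, ascending,
// packed into one vec4 instead of float cascades[NUM_CASCADES].
uniform vec4 cascade_far;

// Returns 0..3 for a depth inside a cascade, or 4 when it lies beyond all of them.
int select_cascade(float depth)
{
  // Each component is 1.0 where depth exceeds that cascade's far distance.
  vec4 beyond = vec4(greaterThan(vec4(depth), cascade_far));
  // Summing the ones counts how many cascades the fragment has already passed.
  return int(dot(beyond, vec4(1.0)));
}

void main()
{
  float depth = abs(vs_out.frag_position_world_space.z);
  // Debug view only: shade each cascade with a different grey level.
  shadow = vec3(float(select_cascade(depth)) / float(NUM_CASCADES));
}
```

The result matches the loop from the first post (the first i with depth <= cascades[i]) as long as the distances are sorted, with 4 playing the role of the original -1.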

The derivatives definitely matter for MIPmap filtering and anisotropic texture filtering. However, even if those are disabled (and you probably want them both disabled here), texture() still makes use of derivatives (gradients), and in this case the gradients are undefined, so intuitively the texture lookup results could be undefined as well, depending on how the lookup is performed. The spec may or may not explicitly define this case as undefined; I haven’t checked.
