I am seeing thin borders along the edges of decals in our clustered forward+ renderer. In the attached image there is a very faint line in the foreground, on the left side.
Decals are rendered in the forward pass the same way lights are, so their texture coordinates are calculated only in the pixel shader rather than interpolated between vertices.
At first I thought this was an issue with non-dynamically-uniform texture reads, but now I think the problem is the mip level selected at those pixels. Is there any way around this, other than making sure the outer edge of the texture always stores 0 in the alpha channel?
I tried this code, but the result looks the same. I do not understand how it could calculate a mip level from a single point on the edge.
You’re already onto what I was going to suggest: incorrect texture derivatives … or bad MIPs in the MIP chain. Though it looks like the former.
To rule out the latter, use textureLod() with constant level numbers.
Assuming the former (bad texture derivs): IIRC these feed both trilinear filtering (which you’ve presumably got enabled) and anisotropic filtering. So to isolate the trilinear case, disable anisotropic filtering.
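To make the point about derivatives feeding trilinear filtering concrete, here is a minimal Python sketch of the isotropic level-of-detail (mip level) selection the OpenGL specification describes: scale the screen-space texcoord derivatives by the texture size, take the longer of the two footprint lengths (rho), and take log2. The function name `mip_level` is hypothetical, real hardware is allowed to approximate this, and anisotropic filtering additionally uses the ratio of the two footprint axes, which this sketch ignores.

```python
import math

def mip_level(duv_dx, duv_dy, tex_size):
    """Isotropic LOD (lambda) from texcoord derivatives, following the GL rho formula.

    duv_dx, duv_dy: (du, dv) change in normalized texcoords per one-pixel step in x / y.
    tex_size: (width, height) of mip level 0 in texels.
    """
    w, h = tex_size
    # Length of the pixel footprint, in level-0 texels, along screen x and screen y.
    len_x = math.hypot(duv_dx[0] * w, duv_dx[1] * h)
    len_y = math.hypot(duv_dy[0] * w, duv_dy[1] * h)
    rho = max(len_x, len_y)
    # lambda <= 0 means magnification (the base level is used); rho == 0 gives -inf.
    return math.log2(rho) if rho > 0 else -math.inf

print(mip_level((1 / 256, 0), (0, 1 / 256), (256, 256)))  # one texel per pixel
print(mip_level((4 / 256, 0), (0, 4 / 256), (256, 256)))  # four texels per pixel
```

Under these assumptions one texel per pixel gives level 0 and four texels per pixel gives level 2; a single large derivative in either axis is enough to push the whole sample to a small mip.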
Is your decalCoords passed in from the vertex shader and interpolated across the primitive, or is it procedurally generated?
Given that this is at a geometry edge, it could also be a bad wrap mode, such as GL_CLAMP instead of GL_CLAMP_TO_EDGE, though that doesn’t explain everything you posted above.
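A one-dimensional Python sketch of why the wrap mode matters at the very edge of a texture, assuming GL_LINEAR filtering and the legacy definition of GL_CLAMP (the coordinate is clamped to [0, 1], so at the edge the filter footprint straddles the border color) versus GL_CLAMP_TO_EDGE (only real texels are ever fetched). `sample_linear_1d` and its string `wrap` argument are hypothetical stand-ins for sampler state.

```python
import math

def sample_linear_1d(texels, s, wrap, border=0.0):
    """Linearly filtered 1D texture lookup at normalized coordinate s."""
    if wrap not in ("GL_CLAMP", "GL_CLAMP_TO_EDGE"):
        raise ValueError("unsupported wrap mode: " + wrap)
    n = len(texels)
    if wrap == "GL_CLAMP":
        s = min(max(s, 0.0), 1.0)  # legacy GL_CLAMP clamps the coordinate to [0, 1]
    u = s * n - 0.5                # texel-space position relative to texel centers
    i0 = math.floor(u)
    a = u - i0                     # weight of the right-hand texel

    def fetch(i):
        if wrap == "GL_CLAMP_TO_EDGE":
            return texels[min(max(i, 0), n - 1)]  # never leaves the real texels
        return texels[i] if 0 <= i < n else border  # GL_CLAMP: outside is border color

    return (1 - a) * fetch(i0) + a * fetch(i0 + 1)

opaque = [1.0, 1.0, 1.0, 1.0]
print(sample_linear_1d(opaque, 0.0, "GL_CLAMP"))          # blends with the border
print(sample_linear_1d(opaque, 0.0, "GL_CLAMP_TO_EDGE"))  # edge texel only
```

With an all-opaque texture and a transparent border color, GL_CLAMP returns a 50/50 mix with the border at s = 0, which would show up as a faint fringe along the decal edge.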
The coordinates are procedurally generated in the fragment shader, based on intersection with the decal volume. If mip levels for texture sampling are calculated per 2x2 block of pixels, then there will always be some edge pixels whose neighbors in the block do not perform the same sampling. I think this is the cause of the problem.
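A small Python simulation of that hypothesis, assuming coarse derivatives (one set of differences per 2x2 quad, taken from the top-left lane and its right and lower neighbors) and modeling a lane that skipped the decal code as holding an arbitrary stale texcoord of (0, 0). Both the `quad_lod` helper and the stale value are illustrative assumptions; real hardware leaves such derivatives undefined rather than producing any particular number.

```python
import math

def quad_lod(uv, tex_size):
    """Coarse-derivative LOD shared by all four lanes of one 2x2 quad.

    uv[row][col] is the (u, v) each lane computed; (0, 0) is the top-left lane.
    """
    w, h = tex_size
    ddx = (uv[0][1][0] - uv[0][0][0], uv[0][1][1] - uv[0][0][1])
    ddy = (uv[1][0][0] - uv[0][0][0], uv[1][0][1] - uv[0][0][1])
    rho = max(math.hypot(ddx[0] * w, ddx[1] * h),
              math.hypot(ddy[0] * w, ddy[1] * h))
    return math.log2(rho) if rho > 0 else -math.inf

step = 1 / 256  # decal texcoords advance one texel per pixel on a 256x256 decal

# Quad fully inside the decal volume: every lane computed a real texcoord.
inside = [[(0.5, 0.5), (0.5 + step, 0.5)],
          [(0.5, 0.5 + step), (0.5 + step, 0.5 + step)]]

# Quad on the decal edge: the right-hand lanes failed the volume test and never
# computed a texcoord, so (by assumption) they hold a stale (0, 0).
edge = [[(0.5, 0.5), (0.0, 0.0)],
        [(0.5, 0.5 + step), (0.0, 0.0)]]

print(quad_lod(inside, (256, 256)))  # 0.0
print(quad_lod(edge, (256, 256)))    # about 7.5
```

For the interior quad the level is 0; for the edge quad one bad neighbor pushes the whole quad to about level 7.5, i.e. between the 2x2 and 1x1 mips of a 256x256 decal, whose texels average the entire decal.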
Yes, that sounds right. Particularly if this results in invalid texcoords outside the decal volume.
This reminds me of the texcoord deriv issue that occurs when adjacent pixels want to sample from different cascaded shadow map splits. The fix is to use an algorithm that ensures consistent texcoord deltas across the split, such as forcing all 4 pixels in a quad to sample from the same split instead of different ones.
I describe that here. Props to Andrew Lauritzen for the original technique.
Basically, use shader thread communication to have the threads in each quad “vote” on which split index to use, and then all use that index, rather than each thread choosing its own in isolation. Then the computed texcoord derivatives are sane. The alternative is to compute analytic derivatives in the fragment shader and bypass the implicit ones computed for you.
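A Python sketch of the quad vote, assuming each lane picks a cascade from its view depth and the quad then settles on the largest (coarser) index any lane chose. Whether the coarser or the finer split should win depends on how your splits overlap, so treat that choice as an assumption; `pick_split`, `quad_vote_split` and `split_far` are hypothetical names. On the GPU the exchange itself would go through the GLSL subgroup quad operations (GL_KHR_shader_subgroup_quad) or through dFdx()/dFdy() as in the older technique.

```python
def pick_split(depth, split_far):
    """Index of the first cascade whose far distance covers this view depth."""
    for i, far in enumerate(split_far):
        if depth <= far:
            return i
    return len(split_far) - 1  # beyond the last split: clamp to the last one

def quad_vote_split(depths, split_far):
    """Make all four lanes of a 2x2 quad agree on one split index.

    depths: view-space depth of each of the four lanes.
    Returns (per-lane choices made in isolation, shared voted index).
    """
    isolated = [pick_split(d, split_far) for d in depths]
    # Assumption: the coarser split wins, so every lane projects into the same
    # shadow map and neighboring texcoords differ by a normal one-pixel step.
    shared = max(isolated)
    return isolated, shared

split_far = [10.0, 30.0, 100.0]
print(quad_vote_split([9.9, 10.1, 9.95, 10.05], split_far))  # ([0, 1, 0, 1], 1)
```

A quad straddling the 10.0 boundary would pick splits 0 and 1 in isolation; after the vote all four lanes use split 1, so the implicit derivatives are computed from coordinates in the same shadow map.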
Note the dates on those references. This was before we had “shader invocation group” (warp/wavefront) cross-thread communication functions in GLSL, so Andrew uses simple dFdx() / dFdy() here to share data with neighbors. Nowadays, with warp vote, shuffle, and swizzle operations, there are doubtless more intuitive ways to do this. For details, see:
Oh btw. Scanning the spec, I was reminded of “helper invocations” (e.g. gl_HelperInvocation). Maybe you can use these to get reasonable texcoord values (from a derivatives perspective, not an in-range perspective) computed for those slightly-out-of-range pixels, so that the implicit derivatives (dFdx()/dFdy()) just work for you by default.
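A Python sketch of that idea applied to decals, assuming the decal projection is a simple planar mapping (`decal_uv`, hypothetical) and that helper lanes see the extrapolated surface position, which is what interpolated inputs give them. Every lane computes the unclamped texcoord first, derivatives are taken from those values, and only afterwards does each lane decide whether it is inside the volume. In GLSL terms this means computing the decal coordinates and doing the implicit-derivative sample (or capturing derivatives for textureGrad()) before any discard or non-uniform branch.

```python
import math

def decal_uv(pos, origin, size):
    """Unclamped decal texcoords; values outside [0, 1] mean 'outside the volume'."""
    return ((pos[0] - origin[0]) / size[0], (pos[1] - origin[1]) / size[1])

def shade_quad(positions, origin, size, tex_size):
    """positions[row][col]: surface position seen by each lane (helpers included).

    Returns the LOD shared by the quad and a per-lane 'inside the decal' mask.
    """
    # 1. Every lane computes its texcoord, even lanes that end up outside the decal.
    uv = [[decal_uv(p, origin, size) for p in row] for row in positions]
    # 2. Coarse derivatives now difference well-defined, linearly varying values.
    w, h = tex_size
    ddx = (uv[0][1][0] - uv[0][0][0], uv[0][1][1] - uv[0][0][1])
    ddy = (uv[1][0][0] - uv[0][0][0], uv[1][0][1] - uv[0][0][1])
    rho = max(math.hypot(ddx[0] * w, ddx[1] * h), math.hypot(ddy[0] * w, ddy[1] * h))
    lod = math.log2(rho) if rho > 0 else -math.inf
    # 3. Only now decide, per lane, whether the decal contributes.
    inside = [[0.0 <= u <= 1.0 and 0.0 <= v <= 1.0 for (u, v) in row] for row in uv]
    return lod, inside

quad = [[(255.5, 10.0), (256.5, 10.0)],
        [(255.5, 11.0), (256.5, 11.0)]]  # straddles the decal's right edge
print(shade_quad(quad, (0.0, 0.0), (256.0, 256.0), (256, 256)))
```

The example quad straddles the decal's right edge, yet the level stays at 0 because the outside lanes still produced coordinates on the same linear ramp; the mask then drops their contribution.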
How often are helper invocations run? Can I count on them only being launched when the code path that calls dFdx/dFdy is hit? Skipping the decal code based on the volume test is an important optimization.
Rasterizers tend to run FS invocations in groups of 4 adjacent samples, arranged as a quad. However, at the edges of a primitive, not all of those samples lie within the primitive. Those invocations are not permitted to write anything, but they still execute, because intermediate values from all four invocations of a 2x2 quad are used to compute implicit derivatives and the results of dFdx/dFdy. That is, dFdx/dFdy simply take the difference between the value of the current invocation and that of its horizontal or vertical neighbor in the same quad.
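A Python sketch of the fine-derivative behavior: each lane's dFdx is the difference across its own row of the quad and its dFdy the difference across its own column, matching dFdxFine()/dFdyFine() from GLSL 4.50 (ARB_derivative_control). The `q[y][x]` layout and the function names are assumptions for illustration; plain dFdx()/dFdy() may instead behave like the coarse variants, which can reuse one difference for the whole quad.

```python
def dfdx_fine(q):
    """Per-lane horizontal derivative for one 2x2 quad.

    q[y][x] is the value computed by the lane at offset (x, y) inside the quad.
    Both lanes in a row get the same difference: right lane minus left lane.
    """
    return [[q[y][1] - q[y][0]] * 2 for y in range(2)]

def dfdy_fine(q):
    """Per-lane vertical derivative: lane at y = 1 minus lane at y = 0, per column."""
    return [[q[1][x] - q[0][x] for x in range(2)] for _ in range(2)]

q = [[0.0, 1.0],
     [2.0, 4.0]]  # suppose lane (1, 1) is a helper invocation
print(dfdx_fine(q))  # [[1.0, 1.0], [2.0, 2.0]]
print(dfdy_fine(q))  # [[2.0, 3.0], [2.0, 3.0]]
```

If the lane at (1, 1) is a helper invocation, its value 4.0 still shapes the dFdx of lane (0, 1) and the dFdy of lane (1, 0), which is why helper invocations have to run.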
These invocations, which execute but don’t (directly) contribute to the output, are called “helper invocations”. You can’t really choose when they get run. The primitive is rasterized into 2x2 chunks, and each chunk gets 4 shader invocations run on it. Depending on the shape of the primitive, the invocations in a chunk that fall outside the primitive are helper invocations.
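A Python sketch of which lanes become helper invocations, assuming quads aligned to even pixel coordinates and one sample per pixel (no MSAA). `classify_quads` and the `covered` predicate are hypothetical. A quad runs as soon as any of its pixels is covered, its uncovered lanes run as helpers, and fully uncovered quads launch nothing.

```python
def classify_quads(covered, width, height):
    """Group pixels into aligned 2x2 quads and label each lane.

    covered(x, y) -> bool: whether the primitive covers pixel (x, y).
    Returns {(qx, qy): {(x, y): 'live' or 'helper'}} for quads with any coverage.
    """
    quads = {}
    for qy in range(0, height, 2):
        for qx in range(0, width, 2):
            lanes = {(x, y): covered(x, y)
                     for y in (qy, qy + 1) for x in (qx, qx + 1)}
            if any(lanes.values()):  # at least one covered pixel: all 4 lanes run
                quads[(qx, qy)] = {p: 'live' if c else 'helper'
                                   for p, c in lanes.items()}
    return quads

# A primitive covering the pixels with x + y < 3 in a 4x4 region.
for origin, lanes in classify_quads(lambda x, y: x + y < 3, 4, 4).items():
    print(origin, lanes)
```

For this primitive the top-left quad is fully live, the quads at (2, 0) and (0, 2) each run one live lane and three helper lanes, and the quad at (2, 2) does not run at all.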
Thank you. I think in this specific case the right solution is to make sure decal color textures always use zero in the alpha channel of their border pixels. I did not know that helper invocations existed; maybe that knowledge will come in handy elsewhere.
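A minimal Python sketch of that preprocessing step, assuming the decal is available as rows of [r, g, b, a] texels before its mip chain is generated. `clear_border_alpha` and the `border` width are hypothetical. One caveat: with a 1-texel border, each smaller mip's edge texels average border and interior texels together, so the edge is fully transparent only at level 0. A wider border keeps it transparent for a few more levels.

```python
def clear_border_alpha(rgba, border=1):
    """Return a copy of an RGBA image (rows of [r, g, b, a] texels) with alpha
    forced to 0 in the outermost `border` texels on every side."""
    h = len(rgba)
    w = len(rgba[0]) if h else 0
    out = [[list(px) for px in row] for row in rgba]  # deep copy; input is untouched
    for y in range(h):
        for x in range(w):
            if x < border or y < border or x >= w - border or y >= h - border:
                out[y][x][3] = 0
    return out

img = [[[255, 255, 255, 255] for _ in range(3)] for _ in range(3)]
print([[px[3] for px in row] for row in clear_border_alpha(img)])
```

In this 3x3 example only the center texel keeps its alpha of 255; generate the mips from the cleared image rather than clearing each level separately.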