Per Fragment depth mask

kRogue · May 21, 2012, 12:13pm

There are some pretty nifty uses for the following:

Per-fragment depth mask, i.e. a fragment shader can dictate whether the depth buffer is written with the fragment’s depth value. The interaction with glDepthMask is an AND, i.e. the depth buffer is written if and only if glDepthMask is GL_TRUE and the fragment shader does not say “don’t write depth”. To be precise, introduce a “function” in the fragment shader, “gl_DontWriteDepth()”, which if called within the fragment shader prevents the depth buffer from being updated. Another approach is to create a new GLSL built-in variable, available only in fragment shaders:


bool gl_DepthFragmentMask;

that is initialized as true and if the value of it is false when main() of the fragment shader exits, then the depth value for the fragment is not updated.

Along similar lines, same jazz for stencil buffer would be nice too. [For non-integer color buffers, one can emulate this via blending, for per-channel masking one can still do this with blending via GL_ARB_blend_func_extended].

Ilian_Dinev · May 21, 2012, 5:14pm

What are the nifty uses that require this, to save you from doing the 2-drawcall solution of alphatest/discard on pass 1 + depthmask(false) on pass 2? I mean, it’s a thing that has been possible at a tiny extra cost for the past 10-15 years, and no one came up with a use for it?

aqnuep · May 21, 2012, 8:11pm

I’d also be interested in the use cases. It sounds simple enough that even if current hardware is incapable of doing it, maybe future hardware can easily add support for it.
Please share your use cases, seriously interested!

kRogue · May 22, 2012, 12:08am

The two pass solution is what I do now, but if the fragment shader is heavy, it is not a happy place.
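
For reference, the two-pass workaround might look roughly like the sketch below. It is only an illustration under assumptions: `u_pass`, `u_glyph`, `v_uv`, `v_color` and the threshold value are hypothetical names, and a single texture read stands in for the expensive coverage computation. The host is assumed to draw once with depth writes enabled and `u_pass = 0`, then again with `glDepthMask(GL_FALSE)` and `u_pass = 1`.

```glsl
#version 330 core

// All names below are hypothetical; they only illustrate the two-pass idea.
uniform int u_pass;         // 0 = opaque pass (depth writes on), 1 = edge pass (depth writes off on the host)
uniform sampler2D u_glyph;  // stand-in for the expensive coverage computation

in vec2 v_uv;
in vec3 v_color;

out vec4 frag_color;

const float SOME_THRESHOLD = 0.99; // assumed value

void main()
{
    // The costly part: it runs in BOTH passes, which is the complaint.
    float coverage = texture(u_glyph, v_uv).r;
    bool is_opaque = coverage >= SOME_THRESHOLD;

    if (u_pass == 0) {
        // Pass 1: only well-covered fragments survive and write depth.
        if (!is_opaque) discard;
        frag_color = vec4(v_color, 1.0);
    } else {
        // Pass 2: only edge fragments survive; depth is tested but not written.
        if (is_opaque) discard;
        frag_color = vec4(v_color, coverage);
    }
}
```

The coverage computation is evaluated for every fragment in each pass, so a heavy shader pays its cost twice.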

Roughly speaking the idea is as follows: when one runs the fragment shader, one can compute a coverage value. Now, if that coverage is not “enough”, then one would like to draw that fragment blended with the background [I am not talking about transparent stuff at all]. That bites as then that requires rendering back to front… so, break it into two separate buffers:

1. initialization: set up a 2-target FBO with RGBA (opaque color) and RGBA (transparent)
2. set the opaque target to blend with the add equation, using the factors GL_SRC_ALPHA and GL_ONE_MINUS_SRC_ALPHA
3. set the transparent target with blending OFF

Then the shader code is like this


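// Sketch only: gl_DepthFragmentMask is the proposed built-in and does not exist in GLSL today.
out vec4 opaque_color_target;
out vec4 transparent_color_target;

void main()
{
   // color (vec3) and coverage (float) are assumed to be computed here
   // by the expensive part of the shader.

   if(coverage<SOME_THRESHOLD)
   {
      // Alpha 0 leaves the opaque target unchanged under the blending set up above.
      opaque_color_target=vec4(0.0, 0.0, 0.0, 0.0);
      gl_DepthFragmentMask=false;

      transparent_color_target=vec4(color, coverage);
   }
   else
   {
      /*
        Note that we overwrite the transparent completely to nothing
       */
      opaque_color_target=vec4(color, 1.0);
      gl_DepthFragmentMask=true;

      transparent_color_target=vec4(0.0, 0.0, 0.0, 0.0);
   }
}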

Once the buffers are drawn then one presents the buffers, essentially


RGB=opaque_color_target.rgb + transparent_color_target.a*transparent_color_target.rgb
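
As a rough sketch, that present step could be a full-screen fragment shader along the following lines. The sampler names `u_opaque` and `u_transparent` are hypothetical, and the two targets are assumed to be bound as textures of the same size as the window; the arithmetic is just the formula above taken literally.

```glsl
#version 330 core

// Hypothetical sampler names for the two color targets rendered earlier.
uniform sampler2D u_opaque;
uniform sampler2D u_transparent;

out vec4 frag_color;

void main()
{
    // One texel per pixel, no filtering: read both targets at this pixel.
    ivec2 p = ivec2(gl_FragCoord.xy);
    vec4 opaque = texelFetch(u_opaque, p, 0);
    vec4 edge   = texelFetch(u_transparent, p, 0);

    // RGB = opaque.rgb + transparent.a * transparent.rgb
    frag_color = vec4(opaque.rgb + edge.a * edge.rgb, 1.0);
}
```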

With this we get coverage-based anti-aliasing wherever edges do not overlap (overlapping edges are not common enough, in terms of number of pixels, for me to freak). My target is not handling “jaggies” of primitives, but rather expensive font shaders that draw glyphs as quads, and I want to render opaque UIs in an out-of-order fashion.

Currently, I emulate this with a two-pass technique, but that means the font shader (or another expensive shader) gets run twice which, well, sucks. Also there is no guarantee that the frontmost “AA-crud-edge” will get its day in the sun. On the subject of AA based on primitives, there is this: http://www.khronos.org/registry/gles/extensions/NV/EGL_NV_coverage_sample.txt which kind of smells like the above, but it only handles the stuff at a primitive level, not at the fragment level, which is needed for font rendering and for that matter any fragment shader that uses discard.

One can naturally extend this to the idea of multiple edges on a fragment supported in much the same way one does out of order transparency (which in fact this reduces to)… but the main use case is to just avoid running a primitive and all its fragment stuff again to get around something that might be just an API oversight.

aqnuep · May 22, 2012, 8:53am

Why don’t you simply use discard and image store on the transparent color target in the following way?

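out vec4 opaque_color_target;
// writeonly: an image uniform declared without a format layout qualifier must be writeonly
writeonly uniform image2D transparent_color_target;

void main()
{
   // color and coverage are computed earlier in the shader, as in the proposal above
   if (coverage < SOME_THRESHOLD)
   {
      // The store is issued before discard, so it is not undone by it
      imageStore(transparent_color_target, ivec2(gl_FragCoord.xy), vec4(color, coverage));
      discard;
   }
   else
   {
      imageStore(transparent_color_target, ivec2(gl_FragCoord.xy), vec4(0.0, 0.0, 0.0, 0.0));
      opaque_color_target = vec4(color, 1.0);
   }
}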

This should be equivalent to your proposed approach and should have the same performance.

kRogue · May 22, 2012, 12:20pm

I had always figured that imageStore was slower than usual drawing… also imageStore is like using a 1000000 pound hammer when I really do not need something so… powerful… also I need to be a touch paranoid about the sync issues… I admit that my memory is a haze of how imageStore behaves on repeatedly setting the same texel, i.e. the last fragment logically may not be the last fragment that bops the imageStore… but I cannot remember well enough all the sync issues of it (usually folks use imageStore in combination with atomic ops so that one usually only writes to one location at most once in a frame), indeed from the spec:

  * The relative order of invocations of the same shader type are undefined. A store issued by a shader when working on primitive B might complete prior to a store for primitive A, even if primitive A is specified prior to primitive B. This applies even to fragment shaders; while fragment shader outputs are written to the framebuffer in primitive order, stores executed by fragment shader invocations are not.

this makes me a touch nervous, since I can imagine that if multiple primitives are being processed, and if they overlap, then who/what gets the last word on store could get icked up potentially…

aqnuep · May 22, 2012, 2:07pm

Fair enough. However, I feel that the spec language is a bit too conservative here. There should be an easy way to synchronize in such a simple use case.
Anyway, although I like the idea of gl_DepthFragmentMask, I don’t feel that your particular use case really justifies the need for such a hardware feature…

kRogue · May 22, 2012, 2:11pm

In brutal honesty that use case was/is my motivation, but I suspect there are other uses… I also somewhat suspect it would be possible, since a fragment shader can change the depth value (though being able to change the value is no guarantee of being able to not write a value :whistle: )…

But doing AA this way is really cheap on memory and computation, especially when compared to the computation and memory hammer of MSAA, and MSAA introduces its own set of issues [what filter, etc]. … this trick I want to do though does not handle when texture data is under-sampled (for example a fence shown edge on, though anisotropic filtering would mostly sort that out too). With the above combined with a geometry shader using triangles with neighbors to rasterize edge-crud, one can get very high quality AA… at little memory and computation cost.

My ideal anti-aliasing without MSAA “GL extension package” would have this:
- ability for a fragment shader to control if depth is written
- a per-primitive and per-edge option controlled by the geometry shader to say essentially “draw a pixel for a primitive if any portion of the primitive goes through the pixel”
- another(!) GLSL variable, gl_Coverage, that gives 1.0 if using usual GLSL rasterization rules and gives the coverage when using the new funky rule controlled by the geometry shader of “draw a pixel for a primitive if any portion of the primitive goes through the pixel”

The motivation for the 2 other things is so that one does not need to make a “guess” or icky computation on how much to inflate an edge to draw the anti-alias crud. Those above would give one the ability to do pretty high quality anti-aliasing with minimal memory and computation cost.

One can expand this to using the typical out of order transparency to also handle when there are multiple edge-cruds hitting one pixel (the images would also need to store a depth and the final present pass would ignore those values whose depth is greater than that which is on the depth buffer, this use case does not worry about the out of order thing since each entry within the image is written to at most once and the counter is incremented/used with atomic ops). I freely admit, I don’t think this expansion is worth the bother since it is worrying about when edges on the screen intersect which is not that many pixels…

aqnuep · May 22, 2012, 6:25pm

While your idea sounds neat, I don’t think that using a geometry shader for all the rendering would be less bandwidth and/or computation hungry. Having a geometry shader has its (pretty high) cost.
Also, the fact that the fragment shader can change the depth value does not necessarily mean that it can change the write mask too. One similar example is blending, where you can output one or two operands of the blend function, but you cannot change the blend function, equation or “enabledness” from the shader.

I don’t say that your idea is not good, and here I both mean the technique and the feature proposal. They are interesting. However, I’d be happy to hear more possible use cases for the depth mask thing as, personally, I don’t have any currently. But hey, that’s why this is a forum, anybody can share their ideas and use cases

Ilian_Dinev · May 23, 2012, 12:39am

You don’t need GS to do non-MSAA AA, either
http://www.humus.name/index.php?page=3D

kRogue · May 23, 2012, 11:51am

That method essentially makes the AA by comparing depth values (and requires a secondary depth buffer for back-facing primitives), identifying edges by looking at depth value differences. Nothing wrong with the technique, I think it is nifty… other AA techniques without MSAA are on Humus too (for example GPAA, which when I look at it smells like a workaround for not having that which I am suggesting/begging for).

My main thought on my begging for the first “don’t write to depth controlled by fragment shader” is that if (and it is a big freaking if) the hardware can do it but it is not exposed, then it should get exposed. Likewise same jazz for the “AA extension pack”, though I strongly suspect that those other 2 things (change the rasterization rule on an edge-by-edge basis and emit a coverage percentage) are NOT in hardware now…

Suggestions, begging, what’s the difference, eh?

Ilian_Dinev · May 23, 2012, 2:08pm

Look at the GBAA again, test it. It’s not just edge detection. It creates almost-perfect coverage info and result. It has a pitfall similar to your idea, but to a far lesser degree.

kRogue · May 24, 2012, 12:42am

The GBAA uses an extra color buffer … and a geometry shader (but the geometry shader emits 1 triangle per triangle in, so not a nasty geometry shader) … the idea being that the geometry shader emits extra information about the edge distance stuff, and then the fragment shader stores which direction to sample in case of an edge on the post pass… This method also has bits for handling when a primitive is clipped to an alpha value, but it requires that the alpha is computed inside of uniform control flow (i.e. it needs dFdx/dFdy operating on alpha)… for hard font rendering, which uses dependent texture lookups, or when the texture data to look up is not to be filtered, the alpha addition will not work correctly.

Each of the techniques at Humus for anti-aliasing without MSAA is neat, but they just seem to me an engineering effort to get around something that the hardware might be able to do…

The “Second Depth Anti-aliasing” technique could be tweaked so that rather than reading a depth value, the fragment shader writes an “object ID” to a target so that the comparison is simple != rather than worry about depth precision issues…

Ilian_Dinev · May 24, 2012, 11:29am

Ah actually I meant the “second depth” instead of GBAA.

With the “object ID”, how can you reconstruct the edge/coverage? You’ll need a drawcall IDX, primitive IDX, the modelviewproj matrix for the drawcall, and do slower complex maths. In the end you won’t get a much-better image result; the current technique seems to be already good enough.

P.S.
Btw I agree that HW should implement+expose a distance-to-nearest-edge data in the fragment shader. Still, it’s a chicken’n’egg thing, few people have regained trust in GS’s performance to start doing such a thing, that requires a GS; and so a thing that relies on distance-to-edge won’t become often-used/defacto-standard, and so won’t become accelerated. Unless some/all vendors decide to spare a few gates for this on a whim.

kRogue · May 24, 2012, 12:06pm

Ilian_Dinev wrote: “With the “object ID”, how can you reconstruct the edge/coverage? You’ll need a drawcall IDX, primitive IDX, the modelviewproj matrix for the drawcall, and do slower complex maths. In the end you won’t get a much-better image result; the current technique seems to be already good enough.”

For me… I am mostly worried about rendering UIs… for that which I work on, UI elements are batched together, so a UI element’s tag is essentially a tuple: (layer, gl-state-vector-ID, batch-ID, element-ID)… each of these values is less than 255, typically well under 50 often enough… so rather than do a depth comparison, I make a secondary RGBA target, and two pixels come from different objects if that value is different… for (flat) UIs there are “no back-facing primitives” and all edges reduce to silhouette edges… but I am beyond wary about putting this on an embedded device (AFAIK the only embedded GPU out now that does MRTs is Tegra AND Tegra does not support depth textures [and to add more pain Tegra’s depth buffer is 16 bits only]). Even if I restrict it to just Tegra, I am quite concerned about the bandwidth cost… fullscreen × 2 buffers, then read back, then draw again… lots of bits flying there!
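
To make the tag comparison concrete, a post-pass sketch could look like the following. It is only an illustration under assumptions: `u_tag` is a hypothetical name for the secondary RGBA8 target holding the tuple, and the shader only marks pixels whose tag differs from a 4-neighbor; what filter is then applied on those pixels is left open.

```glsl
#version 330 core

// Hypothetical name: RGBA8 target holding the (layer, state, batch, element) tag.
uniform sampler2D u_tag;

out vec4 frag_color;

// Fetch a tag with the coordinate clamped to the texture, since
// out-of-range texelFetch coordinates give undefined results.
vec4 tag_at(ivec2 p)
{
    ivec2 last = textureSize(u_tag, 0) - ivec2(1);
    return texelFetch(u_tag, clamp(p, ivec2(0), last), 0);
}

void main()
{
    ivec2 p = ivec2(gl_FragCoord.xy);
    vec4 here = tag_at(p);

    // Two pixels belong to different UI elements if any tag channel differs;
    // no depth precision is involved, the test is a plain != per channel.
    bool edge =
        any(notEqual(here, tag_at(p + ivec2( 1,  0)))) ||
        any(notEqual(here, tag_at(p + ivec2(-1,  0)))) ||
        any(notEqual(here, tag_at(p + ivec2( 0,  1)))) ||
        any(notEqual(here, tag_at(p + ivec2( 0, -1))));

    // Output an edge mask: white where a silhouette edge passes, black elsewhere.
    frag_color = vec4(vec3(edge ? 1.0 : 0.0), 1.0);
}
```

Because the tags are read back unfiltered from an 8-bit target, equal stored bytes compare equal exactly, which is the point of swapping the depth comparison for an ID comparison.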