I’m trying to understand a bit better how OpenGL multisampling works. The Red Book doesn’t describe the blending of samples in much detail and I don’t see where the GL_ARB_multisample spec describes how the final fragment color is resolved.
My understanding is this:
- For each fragment of a rasterized primitive, N subpixel samples are taken.
- For each of the N samples, the driver stores an interpolated depth value. Any operation that requires depth uses the sample’s depth.
- Only one color is stored, which is calculated once for the fragment. The spec states: “The single fragment color value is used for all sample operations, however, as is the current stencil value.” The Red Book (7th Edition) seems to contradict this, saying each fragment “has multiple colors, depths, and texture coordinate sets, based on the number of subpixel samples.” Which is right? Or where is the happy intersection I am not seeing?
- For each fragment, a coverage bitmask is stored. If sample i falls within the geometric footprint of the primitive, bit i is turned on.
How is the coverage mask then used to modify the color? Naively, I’d suggest that the number of 1-bits divided by N would produce a coverage factor. This factor times the color would gradate between black and the fragment color. But that wouldn’t be appropriate for a white background.
It seems to me that it must be blending with the destination color somehow, but multisampling doesn’t require blending or depth-sorting.
How does it compute the fragment color?
Here’s one possible way to consider the implementation:
- Allocate the “sample buffer”. Each logical pixel in this buffer contains (color,depth,stencil) x SAMPLES.
- Primitive rasterization produces fragments. Each fragment covers some percentage of subpixel samples (the coverage mask.)
- Fragment varyings are evaluated once per fragment, at some sample location, and the fragment shader executes.
- The resulting fragment values (color, depth, and the context stencil ref) are fed to per-sample rasterops. Some rasterops (depth, stencil test etc) may discard samples. The depth value here is re-interpolated per sample prior to testing, if it was not written by the fragment shader. The original coverage mask produced by rasterization is also an input, and will discard samples outside of the primitive.
- Samples that are not discarded write their values to the corresponding locations in the sample buffer. Samples that were discarded will retain the previous values for that sample (i.e. at the edge of a triangle.)
- Later, the app swaps or blits, incurring a downsample resolve. The samples for each pixel are averaged and written to the output.
Thus, there is no blending or sorting. Per-sample values are stored during rasterization, and averaged during downsample.
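The steps above can be sketched on the CPU. This is only a toy model, not how any driver is written: it assumes 4 samples per pixel, a GL_LESS-style depth test, and a plain box-filter average for the resolve. The names `Pixel`, `make_pixel`, `write_fragment` and `resolve` are made up for illustration.

```python
from dataclasses import dataclass

NUM_SAMPLES = 4  # assumption: 4x multisampling


@dataclass
class Pixel:
    """One logical pixel of the sample buffer: a color and a depth per sample."""
    colors: list
    depths: list


def make_pixel(clear_color, clear_depth=1.0):
    """Clear every sample of the pixel to the same color and depth."""
    return Pixel([clear_color] * NUM_SAMPLES, [clear_depth] * NUM_SAMPLES)


def write_fragment(pixel, coverage_mask, frag_color, sample_depths):
    """Apply one fragment to one pixel.

    frag_color is shaded once for the whole fragment. sample_depths holds
    the depth interpolated at each sample position. Bit i of coverage_mask
    is set when sample i lies inside the primitive.
    """
    for i in range(NUM_SAMPLES):
        if not (coverage_mask >> i) & 1:
            continue  # outside the primitive: sample keeps its old contents
        if sample_depths[i] >= pixel.depths[i]:
            continue  # fails the assumed GL_LESS depth test
        pixel.colors[i] = frag_color
        pixel.depths[i] = sample_depths[i]


def resolve(pixel):
    """Downsample: average the per-sample colors, channel by channel."""
    return tuple(
        sum(color[ch] for color in pixel.colors) / NUM_SAMPLES
        for ch in range(3)
    )


white = (1.0, 1.0, 1.0)
red = (1.0, 0.0, 0.0)

pixel = make_pixel(white)                                # white background
write_fragment(pixel, 0b0011, red, [0.5] * NUM_SAMPLES)  # edge covers 2 of 4 samples
edge_color = resolve(pixel)                              # (1.0, 0.5, 0.5)
```

The edge pixel comes out halfway between red and white, not halfway between red and black, because the two uncovered samples still hold the background color when the average is taken.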
A modern implementation is more complex:
- Typically all samples aren’t really written out to the sample buffer, since most of them are identical (full coverage mask.) Hardware uses compression to optimize framebuffer bandwidth.
- Varying evaluation may snap inside of the primitive, or not (centroid varyings.)
- The fragment shader may execute once per fragment, or more (ARB_sample_shading.)
- The rasterization coverage mask may be modified by alpha_to_coverage and/or SampleMask and/or gl_SampleMask.
Finally, the core specification is purposely ambiguous enough to allow a supersample implementation – simply allocating a buffer 2x or 4x larger and executing shaders once per sample is compliant.
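To make the coverage mask point from the list above concrete, here is a rough sketch of how the extra masks combine with the rasterizer's mask: they are all ANDed together. The alpha conversion shown is an assumption for illustration only; the real mapping from alpha to a mask is implementation-dependent and hardware usually dithers it. `alpha_to_coverage` and `final_coverage` are hypothetical helper names.

```python
NUM_SAMPLES = 4  # assumption: 4x multisampling
ALL_SAMPLES = (1 << NUM_SAMPLES) - 1


def alpha_to_coverage(alpha, num_samples=NUM_SAMPLES):
    """Hypothetical conversion: cover the first round(alpha * N) samples.

    Real implementations choose their own pattern (often dithered).
    """
    covered = int(alpha * num_samples + 0.5)
    covered = max(0, min(num_samples, covered))
    return (1 << covered) - 1


def final_coverage(raster_mask, alpha=1.0,
                   api_sample_mask=ALL_SAMPLES,
                   shader_sample_mask=ALL_SAMPLES):
    """AND the rasterizer's mask with every mask that can remove samples.

    api_sample_mask stands in for the mask set through the API, and
    shader_sample_mask for what the shader writes to gl_SampleMask.
    """
    return (raster_mask
            & alpha_to_coverage(alpha)
            & api_sample_mask
            & shader_sample_mask)


half_alpha = final_coverage(0b1111, alpha=0.5)             # 0b0011
masked = final_coverage(0b0110, api_sample_mask=0b0101)    # 0b0100
```

None of these masks can add samples; each one can only clear bits that rasterization had set.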
I think I’m getting hung up on the idea that the fragment color is resolved immediately. Are you saying that each sample’s state persists throughout a frame draw and it can be updated independently of the other samples? This would allow for the “blending” of the background and the primitive.
Some time ago I did some experiments to get a better understanding of how things worked, and concluded that NVIDIA does just that when the requested sample count is > 16 (or 8, I can’t really remember now).
Yes. The spec wording implies that the resolve is done immediately. But an implementation only needs to ensure that the resolve appears to have been done immediately. Typically, the resolve is only done when needed (swap, blit, readpixels, copyteximage, etc.)
Consider glRenderbufferStorageMultisample(). You are allocating memory for each sample. You could render into it, then do other stuff, then render into it again. The renderbuffer could live forever (across many frames) so the values for each sample, pre-resolve, need to be maintained for that to work correctly.
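A small model of that persistence, under the assumption of a single-channel (grayscale) target with 4 samples: draws only touch individual samples, and the average is computed when something actually reads the resolved value. `MultisampleTarget` is a made-up class, not a GL object.

```python
class MultisampleTarget:
    """Toy one-pixel multisample target that resolves only when read."""

    def __init__(self, num_samples, clear_value):
        self.samples = [clear_value] * num_samples
        self.resolve_count = 0  # how many times a resolve actually ran

    def draw(self, coverage_mask, value):
        """Write value into every sample whose coverage bit is set."""
        for i in range(len(self.samples)):
            if (coverage_mask >> i) & 1:
                self.samples[i] = value

    def read_resolved(self):
        """Stand-in for a swap, blit or readback: average the samples now."""
        self.resolve_count += 1
        return sum(self.samples) / len(self.samples)


target = MultisampleTarget(4, 0.0)   # cleared to black
target.draw(0b0011, 1.0)             # first primitive: samples 0 and 1
target.draw(0b0100, 0.5)             # later primitive: sample 2 only
result = target.read_resolved()      # (1.0 + 1.0 + 0.5 + 0.0) / 4 = 0.625
```

Because the samples survive between draws, two triangles that share an edge and cover complementary samples (`0b0011` then `0b1100`) fill the pixel completely, with no background leaking through.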
This will depend on the hardware and driver. Some drivers combine multisampling and supersampling to achieve higher effective sample counts like 16x, 32x.
Exactly. arekkusu answered your question. But just to add to that, when you allocate a multisample (MSAA) framebuffer, inside the SwapBuffers, you can actually see that it takes more time than when you don’t have an MSAA framebuffer (if you’re free-running; that is, have disabled sync-to-vblank and are rendering frames as fast as possible). During this time, the driver/GPU is off downsampling the framebuffer at the last minute right before it makes it available for scan-out on the DVI.
If you are rendering off-screen to an MSAA texture or renderbuffer via framebuffer object (FBO), the downsample doesn’t occur until you actually do a glBlitFramebuffer to force the downsample (or as stated previously, do something else that would demand a downsample).
So yes. An MSAA render target can store all of the color/depth/stencil values for each sample individually.