Yeah. I definitely spotted that technique, did a scan, and bookmarked for later, but you’re probably way ahead of me in detail chasing.
As I recall it wasn’t really explicit about many details.
More SRAA links:
If I understand the algorithm, the idea is to render the depth and/or normal buffers at 4x coverage (using MSAA), resolve them, run the deferred lighting pass at 1x, and then go back and post-process. The post-process can ‘guess’ the original sub-pixel coverage by looking at depth and normal variance and then blur neighboring shaded samples into place.
(First: did I get that right? Totally borked?)
That’s not exactly what I got. For instance, don’t get the impression that the MSAA depth and normal buffers are resolved to 1X (what would that mean, anyway?). Instead they are used as-is to provide sub-pixel depth and normal info for the filtering. Note that it says that:
which suggests that they may be grabbing different subsample normals/depths during the reconstruction of a pixel.
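To make that concrete, here’s a toy CPU sketch of the reconstruction idea as I understand it: for each sub-pixel depth sample, blend the neighboring 1x shaded pixels weighted by depth similarity, then average the reconstructed subsamples. This is NOT the paper’s actual distance metric (I haven’t gotten to it yet) — the Gaussian depth weight and the “sample 0” assumption are both mine.

```python
import numpy as np

def sraa_reconstruct(shaded_1x, depth_msaa, sigma=0.1):
    """Toy SRAA-style reconstruction (assumed metric, not the paper's).

    shaded_1x:  (H, W) shaded image, one sample per pixel
    depth_msaa: (H, W, S) per-pixel MSAA depth subsamples
    """
    H, W, S = depth_msaa.shape
    out = np.zeros((H, W))
    # Depth at the location each 1x shaded sample was taken
    # (assume the shading used subsample 0 -- my assumption).
    depth_1x = depth_msaa[..., 0]
    for y in range(H):
        for x in range(W):
            acc = 0.0
            for s in range(S):
                d = depth_msaa[y, x, s]  # this subsample's geometry
                wsum = csum = 0.0
                # Blend 3x3 neighboring shaded pixels, weighted by how
                # close each neighbor's depth is to this subsample's.
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < H and 0 <= nx < W:
                            w = np.exp(-((depth_1x[ny, nx] - d) / sigma) ** 2)
                            wsum += w
                            csum += w * shaded_1x[ny, nx]
                acc += csum / wsum
            out[y, x] = acc / S  # average the reconstructed subsamples
    return out
```

On a flat region (all depths equal) this degenerates to a plain box blur of a constant image, i.e. a no-op; the interesting behavior is at depth edges, where subsamples on the “far” side pull in shading from the far-side neighbors.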
I suppose this will become much more clear later when I read up on their distance metric and get my head deep inside their CUDA pseudo-code.
Now here’s the part I am befuddled about: I thought that current hardware can only rasterize to uniform multiple render targets, meaning all the same width/height, and all the same multisample depth.
Did that limitation ever get relaxed in a new generation of hw?
I don’t think so.
And…if that limitation is correct, it seems to me that I’d need to use MSAA render targets on every layer of my G-Buffer.
Not necessarily.
What they do say is that you need MSAA depth, optional MSAA normal, and a 1X shaded image (obtained … somehow). But they acknowledge you can feed in more shaded and/or geometry samples per pixel if you want.
But they acknowledge there are a few ways to get that:
It’s not really clear here, but I presumed they meant: 1) rasterize all the G-buffer channels MSAA, then light just one sample per pixel to generate the 1X shading buffer, OR 2) (they explain this later:) rasterize a few 1XAA buffers, each with a subpixel offset from the rest (old-style AA, from before MSAA).
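For option 2, the per-pass subpixel offsets are easy to fold into the projection matrix. A sketch — the rotated-grid sample positions are just a common choice, not necessarily the paper’s pattern:

```python
import numpy as np

# Assumed 4x rotated-grid subsample positions, as offsets from the
# pixel center in pixel units (a common pattern, not from the paper).
SUBSAMPLES = np.array([
    [-0.375,  0.125],
    [ 0.125,  0.375],
    [ 0.375, -0.125],
    [-0.125, -0.375],
])

def jitter_offsets_ndc(width, height):
    """Per-pass NDC translations: one pixel spans 2/width (2/height)
    in NDC, hence the scale factor."""
    return SUBSAMPLES * np.array([2.0 / width, 2.0 / height])

def jittered_projection(proj, offset_ndc):
    """Fold an NDC jitter into a 4x4 column-vector projection matrix:
    clip.x += ox * clip.w (likewise y), which shifts the image by
    (ox, oy) in NDC after the perspective divide.  Works for both
    perspective and orthographic matrices."""
    m = proj.copy()
    m[0, :] += offset_ndc[0] * m[3, :]
    m[1, :] += offset_ndc[1] * m[3, :]
    return m
```

Rendering once per offset (or once into each layer, if you route them with a geometry shader) gives you the four 1xAA buffers at the standard subsample positions.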
Before you discount 1) out-of-hand, note that rasterizing the G-buffer 4xMSAA is surprisingly cheap, possibly due to MSAA bandwidth compression on most of the pixels (and other factors I’m not aware of).
While they say they wouldn’t recommend 16X AA rasterization, note that the 4XAA G-buffer space consumption for SRAA is exactly 25% of 16X SSAA, which is what you’d expect from full 4X MSAA storage. However, their space numbers work out to 6.29 bytes/sample (??), …so I’m not really sure what to make of that…
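The 25% figure at least checks out from sample counts alone: with the same per-sample layout, storage scales linearly with samples per pixel, so the per-sample byte cost cancels out of the ratio. The 6.29 bytes/sample number is the part that doesn’t fit any layout I’d guess at (the 8-byte layout below is my assumption, not the paper’s):

```python
# Storage scales with sample count for a fixed per-sample layout,
# so 4x MSAA vs 16x SSAA is 4/16 of the space, whatever the layout.
msaa_samples = 4
ssaa_samples = 16
ratio = msaa_samples / ssaa_samples  # 0.25

# A plausible geometry-sample layout (assumed): 32-bit depth plus a
# packed 32-bit normal -- 8 bytes/sample, not the paper's 6.29.
depth_bytes, normal_bytes = 4, 4
bytes_per_sample = depth_bytes + normal_bytes  # 8
```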
(The white paper also mentioned simply doing a depth or depth/normal pass 4 times with sub-pixel camera movement, but since my app is bound by the cost of traversing the scene graph due to batches and vertex bus bandwidth limits, drawing everything 4x would be a show stopper.)
Yeah, like they say, probably just rasterizing with 4x MSAA is better.
Though there may also be a way (and I vaguely remember there is – oh yeah, sure there is!) to render all four 1xAA buffers with subpixel offsets in a single render pass. Seems like you can use the geometry shader to route different streams to different layers of a multilayer render target (google “layered rendering” in the OpenGL registry). I think this can also be used to render all the faces of a cube at once.
That said, 4x MSAA is pretty cheap, and my guess is it’s cheaper than even doing four 1xAA renders with layered rendering in one pass.