SRAA - GBuffer Costs

imported_bsupnik · April 22, 2011, 12:31pm

Hi Y’all,

I’ve been looking at subpixel reconstruction anti-aliasing:
http://graphics.cs.williams.edu/papers/SRAAI3D11/Chajdas11SRAA.pdf
and I’ve read the white paper twice now. I fear I’ve missed something obvious.

If I understand the algorithm, the idea is to rendering the depth and/or normal buffers at 4x coverage (using MSAA), resolve them, run the deferred lighting pass at 1x, and then go back and post-process. The post process can ‘guess’ the original sub-pixel coverage by looking at depth and normal variance and then blur neighboring shaded samples into place.

(First: did I get that right? Totaly borked?

Now here’s the part I am befuddled about: I thought that current hardware can only rasterize to uniform multiple render targets, meaning all the same width/height, and all the same multisample depth.

Did that limitation ever get relaxed in a new generation of hw?

And…if that limitation is correct, it seems to me that I’d need to use MSAA render targets on every layer of my G-Buffer.

So…it seems to me that this algorithm would greatly raise the VRAM and memory bandwidth costs of filling my G-Buffer. This seems like a serious limitation!

(The white paper also mentioned simply doing a depth or depth/normal pass 4 times with sub-pixel camera movement, but since my app is bound by the cost of traversing the scene graph due to batches and vertex bus bandwidth limits, drawing everything 4x would be a show stopper.)

Any hints on something I missed would be much appreciated.

cheers
Ben

PS the paper also says this:
“Our algorithm has one main limitation: in uniform regions it will introduce blur to the output.”
That seems like a really big issue to me in terms of final image quality!!

Dark_Photon · April 22, 2011, 7:38pm

Yeah. I definitely spotted that technique, did a scan, and bookmarked for later, but you’re probably way ahead of me in detail chasing.

As I recall it wasn’t really explicit about many details.

More SRAA links:

SRAA - Link to slides, pics, video, paper, etc.
Subpixel Reconstruction Antialiasing (SRAA) - NVidia research link
SRAA I3D Presentation
SRAA Paper link (another one)

If I understand the algorithm, the idea is to rendering the depth and/or normal buffers at 4x coverage (using MSAA), resolve them, run the deferred lighting pass at 1x, and then go back and post-process. The post process can ‘guess’ the original sub-pixel coverage by looking at depth and normal variance and then blur neighboring shaded samples into place.

(First: did I get that right? Totaly borked?

That’s not exactly what I got. For instance, don’t get the impression that the the MSAA depth and normal buffers are resolved to 1X (what would that mean anyway?). But instead they are used as-is to provide sub-pixel depth and normal info for the filtering. Note that it says that:

which suggests that they’re may be grabbing different subsample normals/depths during the reconstruction of a pixel.

I suppose this will become much more clear later when I read up on their distance metric and get my head deep inside their CUDA pseudo-code.

Now here’s the part I am befuddled about: I thought that current hardware can only rasterize to uniform multiple render targets, meaning all the same width/height, and all the same multisample depth.

Did that limitation ever get relaxed in a new generation of hw?

I don’t think so.

And…if that limitation is correct, it seems to me that I’d need to use MSAA render targets on every layer of my G-Buffer.

Not necessarily.

What they do say is that you need MSAA depth, optional MSAA normal, and a 1X shaded image (obtained … somehow). But they acknowledge you can feed in more shaded and/or geometry samples per pixel if you want.

But they acknowledge there are a few ways to get that:

It’s not really clear here, but I presumed they meant: 1) rasterizing all the G-buffer channels MSAA, and then light just one sample per pixel to generate 1X shading buffer, OR 2) (they explain this later:) rasterize a few 1XAA buffers, each with a subpixel offset from the rest (old-style AA before MSAA).

Before you discount 1) out-of-hand, note that rasterizing the G-buffer 4xMSAA is surprisingly cheap, possibly due to MSAA bandwidth compression on most of the pixels (and other factors I’m not aware of).

While they say they wouldn’t recommend 16X AA rasterization, note that the 4XAA G-Buffer space consumption for SRAA is exactly 25% of 16X SSAA, suggesting full 4X MSAA space. However, their space numbers work out to 6.29 bytes/sample (??), …so I’m not really sure what to make of that…

(The white paper also mentioned simply doing a depth or depth/normal pass 4 times with sub-pixel camera movement, but since my app is bound by the cost of traversing the scene graph due to batches and vertex bus bandwidth limits, drawing everything 4x would be a show stopper.)

Yeah, like they say, probably just rasterizing with 4x MSAA is better.

Though there may also be a way to (and I vaguely remember there is – oh yeah, sure there is!) a way render all four 1xAA buffers with subpixel offsets in a single render pass. Seems like you can use the geometry shader to route different streams to different layers of a multilayer render target (google “layered rendering” in the OpenGL registry). Think this can also be used to render all the faces of a cube at once.

That said, 4x MSAA is pretty cheap and my guess cheaper than even doing 4 1xAA renders with layered rendering in one pass.

imported_bsupnik · April 23, 2011, 8:00am

Resolve means: the 4x MSAA buffer is blitted with a down-scale to a 1x buffer using something like FBO-blit. My understanding is that normally MSAA buffers are ‘resolved’ either explicitly (by blitting to another 1x FBO) or implicitly (by doing something like a readback from an MSAA surface into a non-MSAA destination). The specs on MSAA + FBOs hint at this (see the first few paragraphs of http://www.opengl.org/registry/specs/EXT/framebuffer_multisample.txt).

So what I was suggesting is that the 4x MSAA G-Buffer is “resolved” (blitted) to a 1x copy that can then be used for shading calculations. But I suppose it would be possible to shade from the 4x version directly by simply reading one of four multisamples.

Note that it says that:

Right - clearly the actual post-process is happening at 4x MSAA resolution!

What they do say is that you need MSAA depth, optional MSAA normal, and a 1X shaded image (obtained … somehow). But they acknowledge you can feed in more shaded and/or geometry samples per pixel if you want.

Right…what caught my eye was…if I have 4x normals and 4x depth, don’t I end up with 4x albedo and materials too because my G-Buffer has to be uniform for all render targets.

Before you discount 1) out-of-hand, note that rasterizing the G-buffer 4xMSAA is surprisingly cheap, possibly due to MSAA bandwidth compression on most of the pixels (and other factors I’m not aware of).

Right - maybe the entire G-Buffer is 4x MSAA and that’s not as bad I think it is….

Though there may also be a way to (and I vaguely remember there is – oh yeah, sure there is!) a way render all four 1xAA buffers with subpixel offsets in a single render pass. Seems like you can use the geometry shader to route different streams to different layers of a multilayer render target (google “layered rendering” in the OpenGL registry). Think this can also be used to render all the faces of a cube at once.

That said, 4x MSAA is pretty cheap and my guess cheaper than even doing 4 1xAA renders with layered rendering in one pass.

Interesting - I hadn’t thought of using layered rendering….but…it does sound like they’re aiming at using 4x hw MSAA. My G-Buffer currently uses 32 bytes per pixel, so the notion of increasing its storage requirements by 4x still seem a little bit scary.

Cheers
Ben