MRT with different size, viewport, format

vkMarco · May 4, 2011, 5:53am

Hi,

I want to draw into a G-buffer with MRT. The G-buffer is an FBO with three attached textures that differ in size and format. So, in order to render correctly, I also need a different viewport for each texture. I looked into doing this with layered rendering but, if I understand it correctly, it does not seem possible.

What do you think?

Thank you

BionicBytes · May 4, 2011, 6:51am

OpenGL 4.1 added viewport arrays (ARB_viewport_array), which allow multiple viewports, so that may solve your issue.
I simply use MRT with the attachments of the same size.
Is there a reason you can’t do this too?

vkMarco · May 4, 2011, 7:00am

Hi,

Thank you for your answer.

I know about glViewportArrayv and I tried it with layered rendering, but that is a different thing. Besides, layered rendering requires a 3D texture or an array texture, so every layer has the same size and format.

The reason for the different sizes is to save bandwidth in deferred shading, or in any technique where you want to write multiple fragment outputs at different resolutions.

Dark_Photon · May 4, 2011, 5:41pm

I may be able to help you there.

First, ignoring layered rendering (and SAMPLE_SHADING, where the frag shader runs per sample), your frag shader runs per pixel, right? This explains why the GL spec says, regarding rendering to textures/rbuffers of different sizes via FBO:

Given this, it would seem that your render targets at least have to have the same resolution, in pixels, assuming you don’t just want some unused pixels off on the sides. Your frag shader runs per pixel, outputting per-pixel values for each render target.

So let’s take the lowest res you’d like any G-buffer channel to have, and make that the resolution in pixels that “all” of them have.

How’s that help? Consider this:

Suppose the res you want for your “higher-res” G-buffer channels is 2x2 greater than the “lower-res”. Ok, so allocate each and every G-buffer channel as 4x MSAA textures with a pixel res that’s the “lower-res” resolution. Now rasterize your scene into the N-channel 4xMSAA G-buffer.

And right now you’re thinkin’, “Whoah! Dude! I told you I wanted to ‘save’ bandwidth, not flush it down the toilet.” Well, I can tell you it’s surprisingly (and as some folks have said, “disturbingly”) cheap to rasterize a G-buffer with MSAA vs. with no AA. I don’t know half the special sauce the vendors use to make this so, but (to your bandwidth point) there’s this thing called MSAA bandwidth compression which, from what I gather, has been pretty standard in GPU tech for some time. If you’re rasterizing a pixel where all the subsamples get the same value (e.g. triangle interior, the typical case for most pixels with the frag shader running per-pixel), then only one sample’s worth of data needs to get shipped to GPU memory (even though there may be 4, 8, or 16 samples in that pixel). So you end up with roughly the same bandwidth cost for these pixels as for 1xAA. Except maybe for the depth buffer, which I’m not clear about. Anyway, it’s cheap.
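As a back-of-the-envelope illustration of that argument, here is a minimal C++ sketch of an idealized cost model. It is not a description of any vendor's actual compression scheme; the function name and the `edgeFraction` parameter (the share of pixels whose samples differ) are assumptions made up for the example.

```cpp
#include <cstddef>

// Idealized model only: real MSAA compression is vendor-specific.
// edgeFraction is a hypothetical input in [0, 1]: the share of pixels
// whose samples do not all hold the same value.
double estimatedWriteBytes(std::size_t pixels, std::size_t bytesPerSample,
                           int samples, double edgeFraction) {
    // Interior pixels cost one sample's worth; edge pixels cost all samples.
    double samplesPerPixel = (1.0 - edgeFraction) + edgeFraction * samples;
    return static_cast<double>(pixels) * bytesPerSample * samplesPerPixel;
}
```

Under this model, an 800x400 4xMSAA target with 10% edge pixels would cost about 1.3 times the 1xAA write traffic rather than 4 times. The 10% figure is purely illustrative.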

So that addresses your rasterization bandwidth savings. What about readback bandwidth?

Well the nice thing is you have “total” control there. If you only want to read buffer A once per pixel, you can just grab one of the samples in each pixel via:

texelFetch( tex, texcoord, sample_num )

If you want to read buffer B once per sample, then grab all the samples (same method). If you want to read buffer C once for every 2x2 block of pixels, feel free. It’s totally up to you. If you want to prefilter a buffer down 2x2 res or 4x4 after rasterizing before readback, by all means, go for it!
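The three readback options above can be sketched on the CPU as follows. This is only a stand-in for what a shader would do with texelFetch on a multisample sampler; the `MsTexture` type, its memory layout, and the function names are assumptions for illustration.

```cpp
#include <cstddef>
#include <vector>

// CPU stand-in for a single-channel multisample texture (hypothetical type).
struct MsTexture {
    int width, height, samples;
    std::vector<float> data;  // index: (y * width + x) * samples + s
};

// Mirrors a multisample texelFetch: integer texel coords plus a sample index.
float fetchSample(const MsTexture& t, int x, int y, int s) {
    return t.data[(static_cast<std::size_t>(y) * t.width + x) * t.samples + s];
}

// Buffer A style: read once per pixel by grabbing a single sample.
float readOnce(const MsTexture& t, int x, int y) {
    return fetchSample(t, x, y, 0);
}

// Buffer B style: read every sample of the pixel and average them.
float readAllAveraged(const MsTexture& t, int x, int y) {
    float sum = 0.0f;
    for (int s = 0; s < t.samples; ++s) sum += fetchSample(t, x, y, s);
    return sum / static_cast<float>(t.samples);
}

// Buffer C style: one value for a 2x2 block of pixels (bx, by index the block).
float readBlock2x2(const MsTexture& t, int bx, int by) {
    float sum = 0.0f;
    for (int dy = 0; dy < 2; ++dy)
        for (int dx = 0; dx < 2; ++dx)
            sum += readAllAveraged(t, bx * 2 + dx, by * 2 + dy);
    return sum / 4.0f;
}
```

Averaging is used here only because it is the simplest filter; for channels such as normals or depth a different reduction may be more appropriate.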

BionicBytes · May 5, 2011, 6:50am

Interesting, as I can switch on MSAA on my G-buffers at will. When I do, I use glBlitFramebuffer to resolve all colour attachments, and depth_stencil too. Rendering has a large performance penalty in this case.

Do you think there’s a price to pay for “resolving” the msaa buffers compared to using multisample textures as attachments?

vkMarco · May 5, 2011, 11:49am

Dark Photon, your technique seems really cool. Anyway, let me check that I understand.

Suppose we have a screen resolution of 1600x800. To save bandwidth, we implement deferred shading using, for now, only two 4xMSAA textures (color and normals) with a resolution of 800x400 and different formats.
Now we must present the result at screen resolution. Do you think it is possible to obtain the same quality as if I rendered directly into the framebuffer?

I tried it and the results were not so good. I do a simple shading calculation for each sample and accumulate the color, then average the result at the end. I don’t know exactly what happens when you access an MSAA texture from a higher-resolution target. Bilinear interpolation between samples?

BionicBytes, I just want to confirm the very bad performance of glBlitFramebuffer with MSAA renderbuffers. I don’t know why; it should be the other way around. I have always heard that a renderbuffer is better than a texture because the GPU has more room for optimization. My experience says that is not so.

vkMarco · May 5, 2011, 2:54pm

Hi,

I have studied a lot :).

MSAA is good for edges but, for the interior of a primitive, every sub-sample gets the same color/texture value. So rasterizing 4xMSAA at 800x400 will give much lower quality than 1xAA at 1600x800.

Given that, suppose we want to render the 4xMSAA 800x400 buffer to a 1xAA 1600x800 framebuffer: again, texelFetch( tex, texcoord, sample_num ) wants absolute texel coordinates, so there will not be any automatic bilinear interpolation (each 2x2 block of target pixels will have the same value).
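A tiny C++ sketch of that coordinate mapping, to make the 2x2 observation concrete. The function name and the `scale` parameter are assumptions for illustration; the 1600x800 and 800x400 sizes are the ones from the example above.

```cpp
#include <utility>

// Map a full-resolution target pixel to the low-resolution texel covering it.
// scale is the per-axis ratio (2 for a 1600x800 target over an 800x400 buffer).
std::pair<int, int> lowResTexel(int px, int py, int scale) {
    return {px / scale, py / scale};  // integer division, no filtering
}
```

Target pixels (10,6), (11,6), (10,7) and (11,7) all map to texel (5,3), so with a fixed sample index they all read the same value.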

We are back to my initial question. By different sizes I mean maintaining full quality where I need it and lower quality where I don’t. I realize that this will be difficult to obtain with MRT because it requires a separate rasterization process for each resolution.

Maybe layered rendering with different viewports is the way, but as far as I know each layer has to have the same size and format. Furthermore, with layered rendering you have to activate a geometry shader for your whole scene, and this will not be very performance friendly.

Alfonse_Reinheart · May 5, 2011, 3:20pm

The viewport array stuff should work fine. Yes, the resolution of the actual arrays will be larger than you need, but just because there’s texture data there doesn’t mean you have to render to it or read from it.

If your intent is to minimize bandwidth by determining that some areas of a scene need lower resolution than others, then the viewport array approach is probably your best bet.
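A minimal C++ sketch of the host-side setup this implies: building the flat float array that glViewportArrayv takes, with one viewport per resolution. The `Size` type and `packViewports` helper are hypothetical; each viewport is placed at the origin of its layer, which is an assumption about how the layers would be used.

```cpp
#include <vector>

struct Size { int width, height; };  // hypothetical helper type

// Pack one viewport per render resolution as {x, y, w, h} floats, which is
// the layout glViewportArrayv(first, count, v) reads from its v argument.
std::vector<float> packViewports(const std::vector<Size>& sizes) {
    std::vector<float> v;
    v.reserve(sizes.size() * 4);
    for (const Size& s : sizes) {
        v.push_back(0.0f);  // x: assumed origin of the layer
        v.push_back(0.0f);  // y
        v.push_back(static_cast<float>(s.width));
        v.push_back(static_cast<float>(s.height));
    }
    return v;
}
```

With a GL 4.1 context, the result for sizes such as 1600x800 and 800x400 would be passed as glViewportArrayv(0, 2, v.data()), and the geometry shader would pick a viewport per primitive by writing gl_ViewportIndex (alongside gl_Layer for the target layer).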

Dark_Photon · May 6, 2011, 7:07pm

I wouldn’t do that, because it’s gonna give you artifacts and poor quality. Besides, if you stop and think about it, it doesn’t make sense to lerp (or even slerp) normals, depth values, albedos, specular exponents, etc. before lighting, particularly when they’re on different faces.

Instead you want to light the samples individually to get a color (radiance) and then blend the results.
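To see why the order matters, here is a small CPU-side C++ sketch comparing the two orders for one pixel that straddles two faces. The Lambert term is only a stand-in for whatever lighting model is actually in use, and all names are made up for the example.

```cpp
#include <algorithm>
#include <cmath>

struct Vec3 { float x, y, z; };

float dot3(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

Vec3 normalized(const Vec3& v) {
    float len = std::sqrt(dot3(v, v));
    if (len == 0.0f) return v;  // degenerate case: leave as-is
    return {v.x / len, v.y / len, v.z / len};
}

// Simple Lambert diffuse term; l is the unit direction towards the light.
float lambert(const Vec3& n, const Vec3& l) { return std::max(0.0f, dot3(n, l)); }

// Light each sample's normal individually, then blend the radiance.
float lightThenBlend(const Vec3& n0, const Vec3& n1, const Vec3& l) {
    return 0.5f * (lambert(n0, l) + lambert(n1, l));
}

// Blend the normals first, then light once (the order argued against above).
float blendThenLight(const Vec3& n0, const Vec3& n1, const Vec3& l) {
    Vec3 avg{(n0.x + n1.x) * 0.5f, (n0.y + n1.y) * 0.5f, (n0.z + n1.z) * 0.5f};
    return lambert(normalized(avg), l);
}
```

With normals (1,0,0) and (0,1,0) and the light along (1,0,0), lighting then blending gives 0.5, while blending then lighting gives about 0.707: the averaged normal belongs to neither face.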

Now if you do this brute force, you’re basically doing supersampled (SSAA) lighting on a multisampled (MSAA) material rasterization. You can do this a few ways. One is ARB_sample_shading, where you tell the GPU to run the frag shader per sample rather than per pixel, generate a multisample target, and then resolve that. But even that’s extra write and resolve (read+write) bandwidth that you don’t really need, even for SSAA lighting. So…

Another approach (which I believe Killzone 2 used years ago, among others) is to run the frag shader per pixel (as is typical), have each invocation read all the samples in the corresponding G-buffer pixel, light them individually, blend them together, and write a single per-pixel color (radiance) to the lighting render target. Less GPU memory bandwidth. And faster than you might think…

But if you want it even faster, there are some approaches out there to try and get the MSAA lighting (i.e. per-pixel lighting) speed-up with Deferred Rendering. One is to mark the “edge pixels” through various methods (from looking at the depth and normal G-buffer channels, or centroid trick, or …) and then do two passes: Pass 1 does per-pixel lighting on non-edge pixels, and Pass 2 does per-sample lighting on the edge pixels. This can save a good bit of time…
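One way the depth-and-normal variant of that edge test could look, as a CPU-side C++ sketch. The `GSample` layout and both thresholds are assumptions chosen for illustration, not values from any particular engine, and normals are assumed to be unit length.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical G-buffer sample: depth plus a unit-length normal.
struct GSample { float depth; float nx, ny, nz; };

// Treat a pixel as an "edge pixel" if any sample differs noticeably from
// sample 0 in depth or in normal direction. Thresholds are illustrative.
bool isEdgePixel(const std::vector<GSample>& s,
                 float depthTol = 0.01f, float minNormalDot = 0.99f) {
    for (std::size_t i = 1; i < s.size(); ++i) {
        float nd = s[0].nx * s[i].nx + s[0].ny * s[i].ny + s[0].nz * s[i].nz;
        if (std::fabs(s[i].depth - s[0].depth) > depthTol || nd < minNormalDot)
            return true;
    }
    return false;
}
```

Pass 1 would then light pixels where this returns false once, and Pass 2 would light the remaining pixels per sample.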

…but you end up with inefficient packing of the edge pixels into the GPU’s shader cores, which are optimized for work to be done in nice, adjacent “blocks”. So some (Andrew Lauritzen at SIGGRAPH last year, for instance) repack those “edge pixels” into GPU threads for better efficiency (using a compute kernel IIRC).

Of course, if you want to handle edge AA via other means (blur, MLAA, SRAA, FXAA, etc.) you can do that too, each with its own pros and cons in quality and speed.