MSAA performance in deferred shading

I have implemented MSAA in deferred shading by calculating the lighting at every sample of the pixel and then averaging the results, something like this:

// texCoord is an ivec2 holding integer pixel coordinates
vec4 pixelIntensity = vec4(0.0, 0.0, 0.0, 0.0);
for (int i = 0; i < MSAACount; i++) {
   float depth = texelFetch(depthTextureMSAA, texCoord, i).r;
   vec4 diffuse = texelFetch(diffuseTextureMSAA, texCoord, i);
   vec4 specular = texelFetch(specularTextureMSAA, texCoord, i);
   vec4 normal = texelFetch(normalTextureMSAA, texCoord, i);
   // calculate lighting for this sample
   pixelIntensity += calculateLighting(......);
}
// average the pixel intensity
pixelIntensity = pixelIntensity / float(MSAACount);

While this gives a good result, the performance is totally killed.

So I am thinking about doing antialiasing only at the edge pixels, but that would also involve some kind of edge detection and a stencil operation to mark the pixels (which might make things even slower).

Is there any other approach (in an OpenGL 3.2-3.3 compatible way) to handle real antialiasing in a deferred shader?
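
A minimal sketch of the edge-only idea mentioned above, done with a dynamic branch in the lighting shader instead of a stencil pass. This is an illustration under assumptions, not a tested implementation: the thresholds, the stand-in calculateLighting body, the fixed light and half vectors, and the assumption that normals are stored as unit vectors in .xyz are all hypothetical. Only the texture names come from the snippet above.

```glsl
#version 150

// G-buffer textures, named as in the snippet above.
uniform sampler2DMS depthTextureMSAA;
uniform sampler2DMS diffuseTextureMSAA;
uniform sampler2DMS specularTextureMSAA;
uniform sampler2DMS normalTextureMSAA;

const int   MSAACount       = 4;      // must match the textures' sample count
const float depthThreshold  = 0.001;  // hypothetical tuning value
const float normalThreshold = 0.9;    // hypothetical: min dot product for "same surface"

out vec4 fragColor;

// Hypothetical stand-in for the real lighting function
// (position reconstruction from depth is omitted here).
vec4 calculateLighting(vec4 diffuse, vec4 specular, vec3 normal)
{
    vec3 lightDir = normalize(vec3(0.3, 0.8, 0.5));
    vec3 halfVec  = normalize(lightDir + vec3(0.0, 0.0, 1.0));
    float nDotL = max(dot(normal, lightDir), 0.0);
    float nDotH = max(dot(normal, halfVec), 0.0);
    return diffuse * nDotL + specular * pow(nDotH, 16.0);
}

// Fetch one sample of the G-buffer and light it.
vec4 shadeSample(ivec2 texCoord, int i)
{
    vec4 diffuse  = texelFetch(diffuseTextureMSAA, texCoord, i);
    vec4 specular = texelFetch(specularTextureMSAA, texCoord, i);
    vec3 normal   = texelFetch(normalTextureMSAA, texCoord, i).xyz;
    return calculateLighting(diffuse, specular, normal);
}

// A pixel counts as an edge if any sample differs noticeably from sample 0.
bool isEdgePixel(ivec2 texCoord)
{
    float d0 = texelFetch(depthTextureMSAA, texCoord, 0).r;
    vec3  n0 = texelFetch(normalTextureMSAA, texCoord, 0).xyz;
    for (int i = 1; i < MSAACount; i++) {
        float d = texelFetch(depthTextureMSAA, texCoord, i).r;
        vec3  n = texelFetch(normalTextureMSAA, texCoord, i).xyz;
        if (abs(d - d0) > depthThreshold || dot(n, n0) < normalThreshold)
            return true;
    }
    return false;
}

void main()
{
    ivec2 texCoord = ivec2(gl_FragCoord.xy);

    // Interior pixels: all samples agree, so one lighting evaluation is enough.
    vec4 pixelIntensity = shadeSample(texCoord, 0);

    // Edge pixels: light the remaining samples and average.
    if (isEdgePixel(texCoord)) {
        for (int i = 1; i < MSAACount; i++)
            pixelIntensity += shadeSample(texCoord, i);
        pixelIntensity /= float(MSAACount);
    }
    fragColor = pixelIntensity;
}
```

The classification still reads depth and normal for every sample, so the saving comes only from skipping the lighting math and the diffuse/specular fetches on interior pixels. Since GPUs run fragments in groups, a group pays for the per-sample path whenever any of its pixels is an edge, so the actual gain depends on the scene and should be measured.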


Texel fetches are slow, but they can be executed in parallel with other operations (math). Try mixing texture accesses with computation instructions.

Can you give a short example?

I don't understand the “mixing texture access with computation instructions” part.

Is it something like this?

Instead of

float depth = texelFetch(depthTextureMSAA,,i).r;
pos.z = - planes.y / (planes.x + depth);
pos.xy = intViewVector.xy / intViewVector.z*pos.z;

do this

pos.z = - planes.y / (planes.x + texelFetch(depthTextureMSAA,,i).r);
pos.xy = intViewVector.xy / intViewVector.z*pos.z;


No, I meant:

instead of:

diffuse = texelFetch(...);
specular = texelFetch(...);
somethingelse = texelFetch(...);

do this:

diffuse = texelFetch(...);
// do some math here, but do not use the 'diffuse' variable
specular = texelFetch(...);
// math again; assume that 'diffuse' is ready, but do not use 'specular'

For me this trick boosted performance by about 1.5x (I had 14 texelFetch calls in the shader).
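
A concrete version of the interleaving pattern described above might look like the following sketch. The lightDir, viewDir and lightColor uniforms and the shading math are hypothetical placeholders, and normals are assumed to be stored as unit vectors in .xyz; only the ordering of fetches and math is the point.

```glsl
#version 150

uniform sampler2DMS diffuseTextureMSAA;
uniform sampler2DMS specularTextureMSAA;
uniform sampler2DMS normalTextureMSAA;

// Hypothetical lighting inputs, assumed normalized and in the same space as the normals.
uniform vec3 lightDir;
uniform vec3 viewDir;
uniform vec4 lightColor;

out vec4 fragColor;

vec4 shadeSampleInterleaved(ivec2 texCoord, int i)
{
    vec4 diffuse = texelFetch(diffuseTextureMSAA, texCoord, i);
    // Math that does not depend on 'diffuse'.
    vec3 halfVec = normalize(lightDir + viewDir);

    vec3 normal = texelFetch(normalTextureMSAA, texCoord, i).xyz;
    // 'diffuse' is assumed ready by now; 'normal' is not used yet.
    vec4 litDiffuse = diffuse * lightColor;

    vec4 specular = texelFetch(specularTextureMSAA, texCoord, i);
    // 'normal' is assumed ready by now; 'specular' is not used yet.
    float nDotL = max(dot(normal, lightDir), 0.0);
    float highlight = pow(max(dot(normal, halfVec), 0.0), 16.0);

    // Only now is the last fetched value consumed.
    return litDiffuse * nDotL + specular * lightColor * highlight;
}

void main()
{
    fragColor = shadeSampleInterleaved(ivec2(gl_FragCoord.xy), 0);
}
```

Shader compilers often reorder instructions on their own, so the source-level ordering may or may not change what the GPU executes. It is worth timing both variants on the target hardware.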

Thanks a lot, I will definitely try this.

Does this trick also work with the standard texture2D, or is that function already fast enough?

AFAIK all texturing functions are slow, but I’m not sure.
It is definitely worth testing :).

*** UPDATE ***

It looks like re-arranging the texelFetch calls doesn't help much in my code.
Maybe it's because the lighting code is so simple that it doesn't matter when texelFetch is called.

Running my unoptimized MSAA deferred shading code on my brother's NVIDIA 9800GT gave me a surprise.

Without AA the results are roughly the same as on my HD4670,
but with MSAA the performance is much better than on my 4670.

These are the results of rendering a scene with the following configuration:

  • deferred-shaded background (22374 triangles)

  • 2 forward-rendered animated characters (smoothstep cel-shaded + hemisphere + normal map + specular map),
    lit by the skylight only (13433*2 = 26866 triangles)

  • 1 hemisphere sky light with a VSM shadow map

  • 10 point lights (randomly regenerated every frame, each with a radius of approx. 10 units/metres relative to the character size)

  • simple water surface with a refraction/DUDV map

  • bloom/tone mapping

  • camera/rigid-body motion blur (without AA only)

          9800GT  |  HD4670

No AA     78 fps  |  68 fps
2xMSAA    67 fps  |  16 fps
4xMSAA    50 fps  |   9 fps

If texelFetch performed the same on ATI as on NVIDIA, it looks like even an
inexpensive, once high-end, last-generation card could handle MSAA deferred shading very well.

I would like to try this on a newer ATI card.
Can someone suggest an ATI card that is equivalent to the 9800GT?

I hope this helps other people who want to try implementing MSAA deferred shading.

Could you test this on your RHD4670:

float _depths[MSAACount];
vec4 _diffuses[MSAACount];
vec4 _speculars[MSAACount];
vec4 _normals[MSAACount];

// fetch everything up front, one texture at a time
for (int i = 0; i < MSAACount; i++) _depths[i] = texelFetch(depthTextureMSAA, texCoord, i).r;
for (int i = 0; i < MSAACount; i++) _diffuses[i] = texelFetch(diffuseTextureMSAA, texCoord, i);
for (int i = 0; i < MSAACount; i++) _speculars[i] = texelFetch(specularTextureMSAA, texCoord, i);
for (int i = 0; i < MSAACount; i++) _normals[i] = texelFetch(normalTextureMSAA, texCoord, i);

for (int i = 0; i < MSAACount; i++) {
   float depth = _depths[i];
   vec4 diffuse = _diffuses[i];
   vec4 specular = _speculars[i];
   vec4 normal = _normals[i];
   // calculate lighting as before
}

I (wildly) speculate that on GeForce the subpixels are stored as different layers (something like texture_array_2d), while on Radeons they are contiguous in VRAM. I base this speculation on notes from many sources saying that the cards handle/manage MSAA completely differently, and on having met the same nice performance puzzles on GeForces.

As a side note, the above code IME decreases performance on GeForces (reason: it uses too many registers, which decreases the max possible number of warps, and it doesn't interleave ALU and TEX operations).

As for which Radeon is equivalent to the GF9800:

HD3870 X2, HD4850, HD5750


Thus, HD5750 is most recommended.

Thanks for your comments :)

Using the above coding style shows the same performance as the old code (~17 fps at 2xMSAA, ~10 fps at 4xMSAA).

I still haven't tested this on my brother's GeForce 9800GT (it's 1:32 AM here).