Off-screen particle rendering problems

Hi everyone.

I ran into a fill-rate bottleneck in my terrain rendering engine when I started rendering weather effects, especially when flying through clouds.

So I implemented the technique described in this excellent GPU Gems 3 article:

Here’s the way I did it in pseudocode:

Step 1: Draw the scenery into a multisampled FBO, which I'll call M.

Step 2: Blit M to an FBO S to get depth and color textures.

Step 3: Draw low-res clouds into a downsized FBO P, with S's depth texture bound as a shader uniform.

Step 4: Do an edge-detection pass into a downsized FBO E, with P's color texture bound as a shader uniform (applying a Sobel filter to that texture's alpha).

Step 5: Do a composition pass into FBO S:
  - where there are no edges, blend the scene color with the low-res cloud color and set the stencil to 1;
  - otherwise, discard the fragment so that the stencil buffer is 0 where edges are detected.

Step 6: Draw full-res clouds into FBO S only where the stencil is 0 (edges).

Step 7: Use S's color texture in a final pass that draws into the default framebuffer.
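Step 4 can be checked on the CPU before debugging the shader. The following is a minimal C++ sketch of a Sobel filter on the alpha channel of the low-res buffer P, assuming alpha is stored row-major in a `std::vector<float>`; the helper name `sobelAlphaEdges` and the threshold are hypothetical. Borders are clamped, roughly as `GL_CLAMP_TO_EDGE` sampling would behave. In the real pipeline this runs as a fragment shader reading P's alpha.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical CPU model of step 4: Sobel filter on the alpha of buffer P.
// Returns 1 where the gradient magnitude exceeds `threshold` (edge), else 0.
std::vector<int> sobelAlphaEdges(const std::vector<float>& alpha,
                                 int width, int height, float threshold) {
    assert(alpha.size() == static_cast<std::size_t>(width) * height);
    // Clamp coordinates to the buffer, like GL_CLAMP_TO_EDGE sampling.
    auto at = [&](int x, int y) {
        x = x < 0 ? 0 : (x >= width ? width - 1 : x);
        y = y < 0 ? 0 : (y >= height ? height - 1 : y);
        return alpha[static_cast<std::size_t>(y) * width + x];
    };
    std::vector<int> edges(alpha.size(), 0);
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            // Horizontal kernel [-1 0 1; -2 0 2; -1 0 1].
            float gx = -at(x - 1, y - 1) - 2.0f * at(x - 1, y) - at(x - 1, y + 1)
                       + at(x + 1, y - 1) + 2.0f * at(x + 1, y) + at(x + 1, y + 1);
            // Vertical kernel [-1 -2 -1; 0 0 0; 1 2 1].
            float gy = -at(x - 1, y - 1) - 2.0f * at(x, y - 1) - at(x + 1, y - 1)
                       + at(x - 1, y + 1) + 2.0f * at(x, y + 1) + at(x + 1, y + 1);
            edges[static_cast<std::size_t>(y) * width + x] =
                std::sqrt(gx * gx + gy * gy) > threshold ? 1 : 0;
        }
    }
    return edges;
}
```

For a buffer whose left half is transparent and right half opaque, only the two columns on either side of the boundary are flagged, which is the band that step 6 would redraw at full resolution.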

The result looks pretty much as it should, but I now have two problems to solve to make it worthwhile:

1) Step 6 still kills my framerate, as if I were drawing all my clouds at full resolution.
The weird thing is that the number of detected edges has no impact on the framerate.
Even if I disable edge detection, so that the stencil buffer is filled with 1 and the color buffer should not be modified at all, the framerate is still very low.
I'm inexperienced with the stencil buffer, so I probably made a mistake in this part of my code or misunderstood this part of the technique.

Here's how I set up the stencil state for steps 5 and 6:

    //composite main scene with low res particles except on edges to avoid artifacts
    glViewport(0, 0, m_SceneRT.getWidth(), m_SceneRT.getHeight());
    glStencilFunc(GL_ALWAYS, 1, 0xFF); //set stencil buffer to 1 unless pixel is discarded (edge)
    glStencilOp(GL_KEEP, GL_KEEP, GL_REPLACE);
    //redraw high res clouds on edges
    glStencilFunc(GL_EQUAL, 1, 0); // Pass test if stencil value is 0 (edge)
    glStencilMask(0x00); // Don't write anything to stencil buffer
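For reference, the OpenGL stencil test does not compare the raw values: it compares `ref & mask` with `stored & mask`, where `mask` is the third argument of `glStencilFunc` (separate from the write mask set by `glStencilMask`). The tiny CPU model below uses a hypothetical `StencilFunc` enum standing in for `GL_ALWAYS`, `GL_EQUAL` and `GL_NOTEQUAL`. With the arguments `(GL_EQUAL, 1, 0)` both sides become 0, so the test passes for every pixel whatever step 5 wrote; that would be consistent with step 6 costing the same as a full-res draw. A test meaning "stencil == 0" would instead be `glStencilFunc(GL_EQUAL, 0, 0xFF)`, assuming S has a stencil attachment and `GL_STENCIL_TEST` is enabled (neither is shown in the snippet).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for GL_ALWAYS / GL_EQUAL / GL_NOTEQUAL.
enum class StencilFunc { Always, Equal, NotEqual };

// CPU model of the stencil comparison: both the reference value and the
// stored value are ANDed with the comparison mask before being compared.
bool stencilPasses(StencilFunc func, std::uint8_t ref, std::uint8_t mask,
                   std::uint8_t stored) {
    assert(func == StencilFunc::Always || func == StencilFunc::Equal ||
           func == StencilFunc::NotEqual);
    const std::uint8_t r = static_cast<std::uint8_t>(ref & mask);
    const std::uint8_t s = static_cast<std::uint8_t>(stored & mask);
    switch (func) {
        case StencilFunc::Always:   return true;
        case StencilFunc::Equal:    return r == s;
        case StencilFunc::NotEqual: return r != s;
    }
    return false;
}
```

Two related points from the GL specification: `glClear(GL_STENCIL_BUFFER_BIT)` honors the current `glStencilMask`, so leaving it at `0x00` would prevent the next frame's stencil clear; and whether rejected fragments are culled before the fragment shader runs depends on the driver being able to perform the stencil test early.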

2) I lose the benefit of drawing my clouds into a multisampled buffer. Is it worth redoing steps 1 and 2 for them (drawing the full-res edge clouds into a multisampled buffer and blitting the result to a regular FBO to get a texture)?
I know this is the old way, and that it is possible to attach multisampled textures and run this whole pipeline with multisampled FBOs, but wouldn't that hurt performance too much? (After all, maybe it can't be worse than doing several blits…)
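As a rough sense of scale for problem 2, the raw storage of a full-resolution multisampled target grows linearly with the sample count. This sketch uses a hypothetical helper `msaaBytes` and assumes RGBA8 color plus DEPTH24_STENCIL8 (4 + 4 = 8 bytes per sample) and no MSAA compression, which most GPUs do apply, so it is only an upper-bound proxy for bandwidth, not a timing. At a hypothetical 1920x1080 with 4 samples it gives 66,355,200 bytes (about 63 MiB), versus 16,588,800 bytes (about 16 MiB) single-sampled.

```cpp
#include <cassert>
#include <cstdint>

// Naive size in bytes of one multisampled render target, assuming each sample
// stores `bytesPerSample` bytes uncompressed (e.g. 4 for RGBA8 + 4 for D24S8).
std::uint64_t msaaBytes(std::uint32_t width, std::uint32_t height,
                        std::uint32_t samples, std::uint32_t bytesPerSample) {
    assert(samples >= 1);
    return static_cast<std::uint64_t>(width) * height * samples * bytesPerSample;
}
```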

Can you help me with this?
Thank you.