Stencil pass for deferred lighting

Leadwerks · May 31, 2008, 2:22pm

I want to try performing a stencil pass prior to each lighting pass. The stencil pass shader would determine whether each fragment is affected by the light, and the result would be used to discard fragments from the lighting pass before the lighting fragment program even runs. I don’t know if this will make any difference on Shader Model 4 hardware, but it will probably be an improvement on SM3 cards.

Can someone explain the basics of setting up the stencil pass?

Sunray · June 1, 2008, 7:37am

I guess you want to find all pixels that are intersecting a convex volume? If so, this is the same problem as stencil shadow volumes. Look up z-fail shadow volumes.

Leadwerks · June 1, 2008, 1:15pm

No, actually I just want to read a depth buffer and calculate the intersection myself. That is not a problem. I just don’t know how to use the stencil buffer to discard fragments.
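For the manual test described here, each pixel's stored depth has to be turned back into a view-space position and then checked against the light volume. Below is a minimal Python sketch of that math (standing in for fragment shader code) for a point light. It assumes a symmetric OpenGL perspective projection like gluPerspective, the default glDepthRange(0, 1), and a light center already in view space; the function names are hypothetical.

```python
import math

def view_pos_from_depth(px, py, depth, width, height, fovy_deg, near, far):
    """Reconstruct the view-space position of pixel (px, py) from its window depth.

    Assumes a symmetric OpenGL perspective projection and depth range [0, 1];
    the camera looks down -Z, so visible points have negative z.
    """
    ndc_z = 2.0 * depth - 1.0
    # Invert ndc_z = (f + n) / (f - n) + 2fn / ((f - n) * z_eye).
    z_eye = 2.0 * far * near / ((far - near) * ndc_z - (far + near))
    # Sample at the pixel center.
    ndc_x = 2.0 * (px + 0.5) / width - 1.0
    ndc_y = 2.0 * (py + 0.5) / height - 1.0
    tan_half = math.tan(math.radians(fovy_deg) / 2.0)
    aspect = width / height
    x_eye = ndc_x * aspect * tan_half * -z_eye
    y_eye = ndc_y * tan_half * -z_eye
    return (x_eye, y_eye, z_eye)

def inside_point_light(pos, center, radius):
    """True if the reconstructed position lies within the light's sphere."""
    d2 = sum((p - c) ** 2 for p, c in zip(pos, center))
    return d2 <= radius * radius
```

In a shader, a failed test would be followed by a discard, which is exactly the per-fragment cost the stencil approach tries to avoid.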

sqrt_1 · June 1, 2008, 7:58pm

You have to use a stencil shadow volume technique as a pre-pass to flag the areas hit by the light. I use it in my light-indexed deferred rendering demo if you want example code:

http://code.google.com/p/lightindexed-deferredrender/

(Nvidia cards use a different code path using the depth range extension)

HellRaiZer · June 2, 2008, 12:27pm

> No, actually I just want to read a depth buffer and calculate the intersection myself. That is not a problem. I just don’t know how to use the stencil buffer to discard fragments.

There is no need to do the intersections yourself if you are going to use the stencil buffer; do only one of the two. Either perform the intersections manually (in a shader) and write the result to a texture, which you later use as a mask to discard fragments in the lighting shader (with a KIL instruction), or mark the pixels affected by the light volume in the stencil buffer and enable stencil testing when rendering the lighting pass.

Here are the basic steps for the stencil method (what Sunray and sqrt[-1] suggested):

1) Clear the stencil buffer to an initial value (e.g. 1).
2) Set the depth func to less and disable depth and color writes.
3) Enable stencil testing, set the stencil func to always, and configure the stencil ops (stencil fail, depth fail, depth pass) for front and back faces as in the z-fail algorithm:
3.1) For front faces, set the stencil ops to (keep, incr, keep).
3.2) For back faces, set the stencil ops to (keep, decr, keep).
4) Render the light volume (no fancy fragment shader needed, since color and depth writes are disabled).
5) Disable stencil writes and set the stencil func to pass only where the value in the stencil buffer equals 0, then render the lighting pass.
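To see why these ops leave exactly the lit pixels at 0, here is a CPU model of one pixel going through the stencil pass, written in Python rather than GL. It assumes GL_LESS depth testing against the scene depth already in the buffer, clamping incr/decr on an 8-bit stencil, and that both faces of the volume are rasterized (face culling off, or two-sided stencil). The function names are made up for illustration.

```python
STENCIL_MAX = 255  # 8-bit stencil buffer

def incr(v):  # GL_INCR clamps at the maximum
    return min(v + 1, STENCIL_MAX)

def decr(v):  # GL_DECR clamps at zero
    return max(v - 1, 0)

def depth_less(fragment_depth, buffer_depth):
    """GL_LESS: the fragment passes if it is closer than the stored depth."""
    return fragment_depth < buffer_depth

def mark_pixel(scene_depth, front_depth, back_depth, initial=1):
    """Steps 1-4 for one pixel; returns the final stencil value.

    scene_depth: depth already in the buffer from the geometry pass.
    front_depth / back_depth: depth of the light volume's front / back face
    at this pixel, or None if that face does not cover it (e.g. the front
    face is clipped because the camera is inside the volume).
    Stencil func is ALWAYS, so only the depth-fail op changes the value.
    """
    stencil = initial
    if front_depth is not None and not depth_less(front_depth, scene_depth):
        stencil = incr(stencil)  # front faces: (keep, incr, keep)
    if back_depth is not None and not depth_less(back_depth, scene_depth):
        stencil = decr(stencil)  # back faces: (keep, decr, keep)
    return stencil

def lit(scene_depth, front_depth, back_depth):
    """Step 5: the lighting pass passes only where the stencil equals 0."""
    return mark_pixel(scene_depth, front_depth, back_depth) == 0
```

The model applies the front face first, but with an initial value of 1 the result is the same in either order. In OpenGL 2.0 terms, step 3 corresponds to glStencilFunc(GL_ALWAYS, 0, 0xFF) with glStencilOpSeparate(GL_FRONT, GL_KEEP, GL_INCR, GL_KEEP) and glStencilOpSeparate(GL_BACK, GL_KEEP, GL_DECR, GL_KEEP), and step 5 to glStencilMask(0) plus glStencilFunc(GL_EQUAL, 0, 0xFF). Without separate stencil ops, the same result takes two renders of the volume with glCullFace switched between them.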

I use the above procedure for the same thing, but in a forward renderer. There are other configurations for the stencil ops, as well as for the initial and final values of the stencil buffer. The key is that you start with the same stencil value for every pixel, and after rendering the light volume, the pixels actually affected by it end up with a different value.
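As an illustration of why the initial value matters (this variant is an assumption, not necessarily one HellRaiZer uses): if you clear to 0 and keep the clamping incr/decr ops, a pixel inside the volume is decremented from 0, clamps, and stays 0, so it can no longer be told apart from an unlit pixel. Switching to the wrapping ops GL_INCR_WRAP/GL_DECR_WRAP (core since OpenGL 1.4) and passing where the stencil is not equal to 0 avoids that. A small Python model of an 8-bit stencil value, with hypothetical helper names:

```python
STENCIL_MASK = 0xFF  # 8-bit stencil value

def incr_clamp(v):  # GL_INCR
    return min(v + 1, STENCIL_MASK)

def decr_clamp(v):  # GL_DECR
    return max(v - 1, 0)

def incr_wrap(v):   # GL_INCR_WRAP
    return (v + 1) & STENCIL_MASK

def decr_wrap(v):   # GL_DECR_WRAP
    return (v - 1) & STENCIL_MASK

def apply_ops(initial, ops):
    """Apply depth-fail ops in whatever order the faces are rasterized."""
    value = initial
    for op in ops:
        value = op(value)
    return value

# Geometry inside the volume: only the back face fails the depth test.
inside_clamped = apply_ops(0, [decr_clamp])  # stays 0: looks unlit
inside_wrapped = apply_ops(0, [decr_wrap])   # wraps to 255: marked

# Geometry in front of the volume: both faces fail, in either order.
front_then_back = apply_ops(0, [incr_wrap, decr_wrap])
back_then_front = apply_ops(0, [decr_wrap, incr_wrap])
```

With the clamping ops and an initial value of 1, as in the steps above, the inside case reaches 0 without ever clamping, which is why that configuration tests for equality with 0.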

Hope that helps. Check the demo sqrt[-1] posted for code.

HellRaiZer

P.S. Forgive my English.

Leadwerks · June 2, 2008, 5:35pm

You are suggesting I use one unique full-screen texture for each light onscreen. :\

sqrt_1 · June 2, 2008, 7:21pm

Uh, no? Perhaps I misread the above posts, but you do not have to do that. Check my demo.

HellRaiZer · June 2, 2008, 11:09pm

> You are suggesting I use one unique full-screen texture for each light onscreen. :\

No, don’t do that! I was afraid this would happen… My English doesn’t help me sometimes.

What I said (in reply to your previous answer about performing the intersections manually) was that you have two choices:

1) Either use a unique full-screen texture for each light, which will act as a mask, OR
2) Use the stencil buffer for that!

Clearly, option no. 1 is a waste of time and resources. Even if you used only one channel of an RGBA8 texture per light (packing 4 lights into one texture), it would still require performing KILs in the lighting shader.

What I suggested was that you SHOULD use the stencil buffer you already have (no need for extra textures) and let the card do the work for you through stencil testing (no extra work for the lighting shaders). The algorithm I described in my previous post doesn’t require an extra texture per light, and (I think) this is what you are looking for. Check the source code of sqrt[-1]'s demo for more details. Also, using the stencil buffer to reject unaffected pixels has the advantage that the lighting shader stays the same as in your current implementation (unlike option no. 1, which requires modifications). So you can switch stencil masking on or off depending on the hardware, if you like.

Remember that even if you use the stencil buffer to reject unaffected pixels, stencil testing (usually) comes after the fragment shader. So to minimize the work for each light, also use scissoring and depth bounds testing. I didn’t have the chance to check sqrt[-1]'s source code, but I’d imagine he does something similar.
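Both limits can be computed on the CPU once per light. Below is a minimal Python sketch for a point light, assuming a symmetric OpenGL perspective projection, glDepthRange(0, 1), and a light center in view space. The scissor comes from projecting the corners of the sphere's bounding box, which is simple and conservative but looser than an exact sphere fit. The results would go to glScissor and to glDepthBoundsEXT from the EXT_depth_bounds_test extension; the function names are hypothetical.

```python
import math

def window_depth(z_eye, near, far):
    """Window-space depth of view-space z (negative in front of the camera)."""
    ndc_z = (far + near) / (far - near) + 2.0 * far * near / ((far - near) * z_eye)
    return 0.5 * ndc_z + 0.5

def depth_bounds_for_sphere(center_z, radius, near, far):
    """(zmin, zmax) for glDepthBoundsEXT, or None if the sphere lies outside
    the near/far range. The depth bounds test checks the depth already stored
    in the buffer, so pixels whose scene depth is outside the range are rejected."""
    nearest = center_z + radius   # closest to the camera (least negative z)
    farthest = center_z - radius
    if nearest < -far or farthest > -near:
        return None
    nearest = min(nearest, -near)    # clamp to the near plane
    farthest = max(farthest, -far)   # clamp to the far plane
    return window_depth(nearest, near, far), window_depth(farthest, near, far)

def scissor_for_sphere(center, radius, width, height, fovy_deg, near):
    """Conservative (x, y, w, h) for glScissor (origin bottom-left), or None if
    the light is off screen. Falls back to the full viewport when the bounding
    box reaches the near plane (e.g. the camera is inside the light)."""
    cx, cy, cz = center
    tan_half = math.tan(math.radians(fovy_deg) / 2.0)
    aspect = width / height
    xs, ys = [], []
    for dx in (-radius, radius):
        for dy in (-radius, radius):
            for dz in (-radius, radius):
                x, y, z = cx + dx, cy + dy, cz + dz
                if z > -near:
                    return (0, 0, width, height)
                ndc_x = x / (aspect * tan_half * -z)
                ndc_y = y / (tan_half * -z)
                xs.append((ndc_x * 0.5 + 0.5) * width)
                ys.append((ndc_y * 0.5 + 0.5) * height)
    x0 = max(0, math.floor(min(xs)))
    y0 = max(0, math.floor(min(ys)))
    x1 = min(width, math.ceil(max(xs)))
    y1 = min(height, math.ceil(max(ys)))
    if x1 <= x0 or y1 <= y0:
        return None  # nothing on screen: the light can be skipped
    return (x0, y0, x1 - x0, y1 - y0)
```

Projecting the box corners is valid because every corner is in front of the camera at that point, so the projected box is the convex hull of the projected corners and always contains the projected sphere.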

Sorry for the confusion. I hope it’s clearer now.

HellRaiZer

Leadwerks · June 3, 2008, 1:01am

Okay, it sounds like you know what you are talking about.

But if the stencil discard occurs after fragment processing, there isn’t much reason to use it.

Xmas · June 3, 2008, 3:34am

The stencil (and depth) test is usually performed before the fragment shader, but there are some conditions that disable early-Z/stencil on some cards.

HellRaiZer · June 3, 2008, 3:54am

There may be a reason to use stencil testing even when it comes after the fragment shader: blending isn’t performed for rejected fragments, so you might see a performance improvement (especially if you are rendering to an FP texture). I thought about blending because you are using this for lighting calculations, and you may need multiple passes to cover all the lights.

You are right. Early-Z and stencil testing can happen before the fragment shader under specific conditions. I’ve read in nVidia’s programming guide about the things that disable early-Z (alpha testing, depth writes in the fragment shader, etc.), and I suppose something similar applies to ATI’s GPUs. But I don’t remember reading anything about early stencil testing and what affects it. Do you have any info on that?

In a little demo I made a few weeks ago, I found that when rendering to an off-screen render target (either an FBO or a pbuffer), early stencil testing was disabled. I was testing the same technique I described above, and no matter what texture format I used (I tried RGB(A)8 and RGB(A)16F with DEPTH24_STENCIL8), there was no performance difference whether I used stencil testing or not. Rendering directly to the window’s framebuffer, on the other hand, showed a clear performance improvement.

So any info on what affects early stencil culling would be much appreciated.

Thanks in advance, and Leadwerks, sorry for hijacking your thread.

HellRaiZer

sqrt_1 · June 3, 2008, 5:58am

The typical way of doing this is:

1) Render the light volume geometry in a stencil pass to find which scene geometry intersects the light volume.
2) Render the light volume again, passing only where the stencil was marked.

It seems like you are doing a depth compare in your shader during the light pass to see whether the light reaches the stored depth value. I can see how this works for spheres (point lights), but it could get tricky and expensive if your light volume is a cone (spotlight).
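For comparison with the sphere test, a per-pixel spotlight test could look like the Python sketch below (standing in for shader code). It assumes the spotlight volume is a cone capped by a sphere of radius `range_` around the apex, with a unit-length axis `direction`; a mesh with a flat cap would need a slightly different range check. The function name is hypothetical. Relative to the sphere test it adds a dot product, a square root, and one more compare.

```python
import math

def inside_spotlight(pos, apex, direction, cos_half_angle, range_):
    """True if pos lies within range_ of the apex and inside the cone's
    half-angle around the unit-length axis `direction`."""
    to_p = [p - a for p, a in zip(pos, apex)]
    dist2 = sum(c * c for c in to_p)
    if dist2 > range_ * range_:
        return False          # beyond the light's range
    if dist2 == 0.0:
        return True           # the apex itself
    cos_angle = sum(c * d for c, d in zip(to_p, direction)) / math.sqrt(dist2)
    return cos_angle >= cos_half_angle
```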

Leadwerks · June 3, 2008, 1:43pm

Why would a cone make any difference? You are still just rendering a volume.

In the STALKER paper from GPU Gems 2 they talk about a stencil discard, but I don’t even want to try this until I know whether the stencil test occurs before or after fragment processing. I think the potential blending savings are negligible.

Another option would be to calculate the lit pixels in a pre-pass and write out a depth value, simply using 0.0 for affected pixels and 1.0 for the rest, since we know that depth testing will discard the fragment before it is processed. The depth buffer would probably cost more bandwidth than the stencil buffer, but I am not that familiar with stencil stuff.

NeARAZ · June 3, 2008, 11:07pm

Yes, in most cases the stencil test happens before fragment processing.

> The bandwidth for the depth buffer would probably be worse than that of the stencil buffer. I am not that familiar with stencil stuff.

In most cases the depth and stencil buffers live together (24-bit depth, 8-bit stencil), so reading one also reads the other. Exactly the same bandwidth is required for either, and if you’re already doing one of them (depth or stencil), the other comes at no additional bandwidth cost.
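To make the shared-storage point concrete: in the common packed format each pixel is a single 32-bit word, so a depth access and a stencil access touch the same memory. The Python sketch below packs and unpacks that layout, following the GL_UNSIGNED_INT_24_8 pixel type from EXT_packed_depth_stencil (depth in the upper 24 bits, stencil in the lower 8). A GPU's internal storage, including any compression, is vendor-specific and may differ; the function names are hypothetical.

```python
def pack_d24s8(depth, stencil):
    """Pack a [0, 1] depth and an 8-bit stencil value into one 32-bit word."""
    d24 = round(depth * 0xFFFFFF)          # 24-bit normalized depth
    return (d24 << 8) | (stencil & 0xFF)   # depth high, stencil low

def unpack_d24s8(word):
    """Split a packed word back into (depth, stencil)."""
    return (word >> 8) / 0xFFFFFF, word & 0xFF
```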