Shadow volumes sooo slow...


Shadow volumes eat so much fillrate. If I disable them (just not drawing them) I get framerates that are sometimes even 5 times higher! And I have already optimized my engine with portal culling so that a brush only casts a shadow when it is actually lit by a light source. But shadow volumes still make that enormous difference in framerates…

At the moment, I’m using the infinite shadow volume technique described in the NVidia paper. This probably takes even more fillrate than the conventional “Carmack’s Reverse” approach. Does anyone know if there is a way to optimize the shadow volumes so that they are less fillrate-intensive? Or how are you solving this huge performance problem in your engines?

Thanks in advance

Are you projecting the volume or just the silhouette ?
If you’re using Carmack’s reverse, you HAVE to project the volume. If you’re not using Carmack’s reverse, then you can save some fillrate by projecting only silhouettes.

Another pretty straightforward optimization is to project volumes only when it is useful. That is, if you can predict that an object “A” won’t be able to occlude light for any visible object in the viewing frustum, then don’t project the silhouettes/volume for that object “A”.

Another thing that helps fillrate is to use the scissor function. This has been discussed a bit on another thread about that nvidia paper on shadow volumes. Carmack is even using scissor clipping for the shadow volumes. I haven’t tried it myself yet, but I plan to soon.


You could try using some sort of constructive solid geometry to merge overlapping shadow volumes into one volume before rendering it.

I haven’t thought about it too much, so I don’t know if this could actually work, but you may be able to do the CSG in 2D screen space to simplify the problem.

– Zeno

Zeno : CSG would probably use the stencil buffer, which the shadow volume algorithm already uses, so CSG+shadow may be a bit messy to implement because the two have to share it. I don’t think CSG helps much here. At best it would be as fast as the current technique, and at worst it could be very slow.

LaBasX2 : I assume that you are rendering into the stencil buffer with lighting off, textures off, etc. But do you send texture coordinates and normals ? I hope not : you should only call glVertex while rendering your black silhouettes/volumes into the stencil buffer. glNormal, glTexCoord, glColor and glFogCoord (and other calls like that…) should not be made while drawing black silhouettes/volumes.

Also you should set glShadeModel(GL_FLAT).

What do you disable ? lighting ? texturing (for ALL texture units) ? fog ?

You should use scissoring; with some tweaks it gave me a 3x speedup in average cases.
You could also try to use Beamtrees or something else so that you can check for polygons that are in other polygon’s shadows.
(Note that beamtrees are a lot of work and you almost certainly need a bsp for it since you have to insert front to back)

Thanks for your help so far.

I will try using the scissor box. I hope that this will also help a bit in my case. Beamtrees and CSG also sound interesting, but since I don’t have a BSP tree they will be hard to implement.

Nearly everything is disabled while drawing the volume faces (including color buffer writing). Also I’m only sending the vertex positions to the card (even using vertex arrays). I rather think that rendering “infinite” faces costs more fillrate than rendering manually stretched faces. But I have to check this out by implementing the conventional Carmack’s Reverse again and looking at the performance.
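
For reference, the two extrusion styles being compared can be sketched as follows. This is only a minimal sketch in C, assuming a positional light; `Vec3`, `Vec4`, `extrude_infinite` and `extrude_scaled` are hypothetical names, not taken from the NVidia paper or from any engine. The infinite version emits a back vertex with w = 0 (a point at infinity, which as far as I understand the paper is paired with a far plane at infinity), while the "manually stretched" version just scales the light-to-vertex offset by a factor.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;
/* Homogeneous vertex: w = 1 is an ordinary point, w = 0 a point at infinity. */
typedef struct { float x, y, z, w; } Vec4;

/* Infinite extrusion: the back vertex is the direction from the light to the
   vertex, with w = 0 so it projects to infinity. */
Vec4 extrude_infinite(Vec3 v, Vec3 light)
{
    Vec4 r = { v.x - light.x, v.y - light.y, v.z - light.z, 0.0f };
    return r;
}

/* "Stretched" extrusion: the back vertex lies k times as far from the light
   as the original vertex (k > 1 pushes it away from the light). */
Vec4 extrude_scaled(Vec3 v, Vec3 light, float k)
{
    assert(k > 1.0f);
    Vec4 r = {
        light.x + k * (v.x - light.x),
        light.y + k * (v.y - light.y),
        light.z + k * (v.z - light.z),
        1.0f
    };
    return r;
}
```

Swapping one function for the other on the same silhouette would make it possible to measure the fillrate difference directly, which is the comparison described above.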


Are you projecting volumes or only silhouettes ?

I’m projecting volumes since I’m using the NVidia version of Carmack’s Reverse.

Could somebody point me to somewhere (or explain firsthand) how scissoring would help shadow volume rendering?

I don’t doubt it does, but I’d like to understand why (and I can’t figure it out in my head).



LaBasX2 : if you only projected silhouettes then you would save some fillrate. But that depends on what algorithm you use to reduce/eliminate artefacts of near/far plane intersection.

Mezz : for each shadowing object, if you know which objects will be shadowed, then you can bound the screen region where the shadow can appear by a rectangle. Use this rectangle as the scissor, so that OpenGL discards fragments that lie outside it early in the per-fragment pipeline.

At worst, scissor testing will be as slow as if scissor was disabled. At best it will significantly speed up your shadow algorithm. That is for GPU performance.
About CPU performance, you have to perform a few operations to detect which objects shadow which other objects.
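
To make the rectangle idea concrete, here is a minimal sketch in C. It assumes you have already projected the corners of whatever bounds the shadowed region into window coordinates (and that none of them are behind the eye, in which case you would fall back to the full viewport). `Point2`, `ScissorRect` and `scissor_from_points` are hypothetical names for illustration only.

```c
#include <assert.h>

typedef struct { float x, y; } Point2;
typedef struct { int x, y, width, height; } ScissorRect;

/* Builds the smallest pixel rectangle that contains all points, clamped to a
   viewport of vp_w x vp_h pixels. Returns 1 and fills *out on success, or 0
   when the rectangle is empty, meaning the shadow volume can be skipped. */
int scissor_from_points(const Point2 *pts, int count,
                        int vp_w, int vp_h, ScissorRect *out)
{
    assert(pts != 0 && count > 0 && out != 0);

    float min_x = pts[0].x, max_x = pts[0].x;
    float min_y = pts[0].y, max_y = pts[0].y;
    for (int i = 1; i < count; ++i) {
        if (pts[i].x < min_x) min_x = pts[i].x;
        if (pts[i].x > max_x) max_x = pts[i].x;
        if (pts[i].y < min_y) min_y = pts[i].y;
        if (pts[i].y > max_y) max_y = pts[i].y;
    }

    /* Clamp to the viewport. */
    if (min_x < 0.0f) min_x = 0.0f;
    if (min_y < 0.0f) min_y = 0.0f;
    if (max_x > (float)vp_w) max_x = (float)vp_w;
    if (max_y > (float)vp_h) max_y = (float)vp_h;
    if (min_x >= max_x || min_y >= max_y) return 0;

    /* Round outward so no covered pixel is lost (values are non-negative). */
    int x0 = (int)min_x;
    int y0 = (int)min_y;
    int x1 = (int)max_x; if ((float)x1 < max_x) ++x1;
    int y1 = (int)max_y; if ((float)y1 < max_y) ++y1;

    out->x = x0;
    out->y = y0;
    out->width = x1 - x0;
    out->height = y1 - y0;
    return 1;
}
```

The result maps directly onto glScissor(x, y, width, height) with GL_SCISSOR_TEST enabled while the volume is drawn into the stencil buffer.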

You can do something similar with clipping planes, but I think clipping planes eat too much compared to scissor since it’s much easier for the GPU to optimize scissor testing than 3D clipping.

Another optimization could be to cache shadow volumes when both the occluder and the light are static. This is done at load time, once for all…
However, in modern 3D engines, everything’s becoming dynamic.


deepmind : you’re right, but unfortunately that won’t help the fillrate much. It may help the fillrate in one special case, though : imagine that a static object A occludes the light of another static object B from a static light. Then instead of projecting two shadow volumes for A and B, you can merge the volumes using CSG and draw only one shadow volume AB. IMO it’s the only case where CSG could be useful (I mean, fast enough), and to be honest this case should be pretty rare.

As far as I know, the idea of the beamtree is actually to remove the need for CSG afterwards by merging the volumes at generation time… am I wrong or right?

CSG would probably really help, but I think it is only suitable for static geometry where it can be precalculated. At the moment nearly everything in my engine is dynamic, but I fear that I will have to give up that philosophy since performance is just too bad for the shadows.

What if I didn’t make the volumes infinite but only extended them to the light’s radius? But I think that the scissor box already has the same effect, right?

Limiting the light’s radius obviously improves fillrate. That’s what Carmack is using in his new Doom engine.
I don’t think it is related to scissor testing, even though both optimizations work on influence area/volume.
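
One cheap consequence of a finite light radius, sketched in C below, is that occluders whose bounding sphere never touches the light’s sphere of influence need no shadow volume at all. This is an assumption about how you might use the radius, not something taken from Carmack’s engine; `Sphere` and `occluder_in_light_range` are hypothetical names.

```c
#include <assert.h>

typedef struct { float x, y, z, radius; } Sphere;

/* Returns 1 when the occluder's bounding sphere overlaps the light's sphere
   of influence, 0 when the occluder is out of range and can be skipped.
   Works on squared distances, so no square root is needed. */
int occluder_in_light_range(Sphere occluder, Sphere light)
{
    assert(occluder.radius >= 0.0f && light.radius >= 0.0f);
    float dx = occluder.x - light.x;
    float dy = occluder.y - light.y;
    float dz = occluder.z - light.z;
    float dist2 = dx * dx + dy * dy + dz * dz;
    float reach = occluder.radius + light.radius;
    return dist2 <= reach * reach;
}
```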

If you really want to save some fill then you really need to use beamtrees. It’s important not to try and make everything dynamic by default. Have a static and a dynamic part for each light (a dynamic part is required for static lights too, for things that animate/move). Scissoring will save you quite a bit of fill, as pointed out by Pentagram (using infinite volumes for multiple lights is only practical in conjunction with scissoring and the optimizations mentioned above).

But having lights that are static is really not that bad. Most lights in the real world don’t move around, and you might as well take advantage of that. If you decide to move a “pre-compiled” light, just discard the static volumes and treat it as completely dynamic (and possibly rebuild the static part if/when the light hasn’t moved for a while; split this work up over a few frames or simply leave it out).
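
The bookkeeping for that discard/rebuild policy could look roughly like the C sketch below. All names (`LightShadowCache`, `ShadowPath`, `light_cache_update`) and the frame-count threshold are hypothetical; the actual volume building is left to the caller.

```c
#include <assert.h>

typedef enum {
    SHADOW_USE_CACHED,    /* static volumes are valid, draw them as-is      */
    SHADOW_ALL_DYNAMIC,   /* cache discarded, compute every volume on the fly */
    SHADOW_REBUILD_CACHE  /* light has rested long enough, rebuild the cache */
} ShadowPath;

typedef struct {
    int static_valid;    /* 1 while the cached static volumes match the light */
    int frames_at_rest;  /* consecutive frames without light movement         */
    int rebuild_after;   /* frames to wait before rebuilding (tuning value)   */
} LightShadowCache;

/* Call once per frame and per light; tells the caller which path to take. */
ShadowPath light_cache_update(LightShadowCache *c, int light_moved)
{
    assert(c != 0 && c->rebuild_after > 0);

    if (light_moved) {
        /* A moved light invalidates the precomputed static volumes. */
        c->static_valid = 0;
        c->frames_at_rest = 0;
        return SHADOW_ALL_DYNAMIC;
    }
    if (c->static_valid)
        return SHADOW_USE_CACHED;

    c->frames_at_rest += 1;
    if (c->frames_at_rest < c->rebuild_after)
        return SHADOW_ALL_DYNAMIC;

    /* The caller is expected to rebuild the static volumes on this frame. */
    c->static_valid = 1;
    return SHADOW_REBUILD_CACHE;
}
```

Spreading the rebuild over several frames, as suggested above, would need one more state, which this sketch leaves out.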

You can use the beam trees to cull animating models ( and of course other surfaces ) and thus avoid computing a shadow for it.

Remember that if you don’t use infinite volumes you’ll need to clip the shadow volumes for shadowing to work in all possible cases ( this has been discussed a few times on this board ).

When an occluder is outside the pyramid defined by the corners of the image plane and the light source, you know its shadow volume cannot be clipped by the near plane, so you can render the shadow volume of that object in “zpass” mode (which only requires the extruded silhouette polygons, not the end caps).

This is a simple test that can be done on a bounding volume of each occluder, and it definitely reduces fill consumption.
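
A minimal C sketch of such a test, using a bounding sphere per occluder, is shown here. It is conservative: it only reports "outside" when the sphere lies completely beyond one of the five pyramid planes, so anything uncertain falls back to the capped volume. The names (`can_use_zpass` and the small vector helpers) are hypothetical, and the four near-plane corners are assumed to be given in order around the rectangle.

```c
#include <assert.h>

typedef struct { float x, y, z; } Vec3;

static Vec3 v3_sub(Vec3 a, Vec3 b)
{
    Vec3 r = { a.x - b.x, a.y - b.y, a.z - b.z };
    return r;
}

static float v3_dot(Vec3 a, Vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

static Vec3 v3_cross(Vec3 a, Vec3 b)
{
    Vec3 r = { a.y * b.z - a.z * b.y,
               a.z * b.x - a.x * b.z,
               a.x * b.y - a.y * b.x };
    return r;
}

/* 1 if the sphere (c, r) lies entirely on the outer side of the plane through
   p with (unnormalized) normal n. `inside` is a point on the inner side and
   is used to orient the plane. Degenerate planes report 0 to stay safe. */
static int sphere_outside_plane(Vec3 p, Vec3 n, Vec3 inside, Vec3 c, float r)
{
    float side = v3_dot(n, v3_sub(inside, p));
    float d = v3_dot(n, v3_sub(c, p));
    if (side == 0.0f) return 0;
    if (side > 0.0f) d = -d;   /* make positive d mean "outer side" */
    return d > 0.0f && d * d > r * r * v3_dot(n, n);
}

/* Returns 1 when the occluder's bounding sphere is outside the pyramid whose
   apex is the light and whose base is the near-plane rectangle, so its shadow
   volume cannot cross the near rectangle and zpass is safe. */
int can_use_zpass(Vec3 light, const Vec3 near_corners[4],
                  Vec3 center, float radius)
{
    assert(radius >= 0.0f);

    Vec3 mid = {
        (near_corners[0].x + near_corners[1].x + near_corners[2].x + near_corners[3].x) * 0.25f,
        (near_corners[0].y + near_corners[1].y + near_corners[2].y + near_corners[3].y) * 0.25f,
        (near_corners[0].z + near_corners[1].z + near_corners[2].z + near_corners[3].z) * 0.25f
    };

    /* Four side planes, each through the light and two adjacent corners. */
    for (int i = 0; i < 4; ++i) {
        Vec3 a = near_corners[i];
        Vec3 b = near_corners[(i + 1) % 4];
        Vec3 n = v3_cross(v3_sub(a, light), v3_sub(b, light));
        if (sphere_outside_plane(light, n, mid, center, radius)) return 1;
    }

    /* Base plane: the near rectangle itself, with the light on the inner side. */
    Vec3 n = v3_cross(v3_sub(near_corners[1], near_corners[0]),
                      v3_sub(near_corners[3], near_corners[0]));
    return sphere_outside_plane(near_corners[0], n, light, center, radius);
}
```

Occluders for which this returns 0 would still be drawn with the capped volumes.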

Scissor is another Good Thing to use when you know you can crop the region of possible illumination.

Mark Kilgard and I are seriously considering a follow-up paper that focuses totally on optimization issues for stenciled shadow volumes.

Thanks -



A follow-up paper sounds like a good idea. Unfortunately, the best optimizations require a lot of work and depend on what you’re doing (and these are at the scene-graph level). Take a deformable model: this is the absolute worst case (I think the optimization you mentioned + scissor is the only thing that’ll reduce fill consumption for that case).

I spent a short amount of time messing around with nvidia’s shadow volume paper. I implemented it in my own short demo. I have 2 dynamic models that animate and everything else is completely static. The lights are static too. The 2 animating characters take as long as the entire rest of the scene. This is simply a result of having to compute the 2 characters’ shadow volumes every frame, while the volumes for the entire environment are only calculated once. It’s a bummer. Truly dynamic means just that: anything can change and anything goes, which means nothing can be precomputed.