Hi,
Many thanks to everyone who is interested in “Parallel-Split Shadow Maps”.
For your convenience, I have updated the PSSM project webpage, where you can find all related information:
http://www.cse.cuhk.edu.hk/~fzhang/pssm_project

By the way, I added a brief analysis of partitioning shadow map algorithms (like PSSMs, CSMs …) to my project webpage. Here I want to highlight two advantages of PSSMs over other partitioning algorithms:

----------------------you may see more details on my webpage---------------
Question: What are the differences between PSSM and CSM (Cascaded Shadow Maps)?

Answer: Nothing comes from nowhere, and PSSMs are no exception. The idea of using multiple shadow maps was introduced by Tadamura et al. (“Rendering optimal solar shadows with plural sunlight depth buffers”) in 2001 and further studied by Lloyd et al. (“Warping and Partitioning for Low Error Shadow Maps”) in 2006. It was also implemented as cascaded shadow mapping in Futuremark’s benchmark application 3DMark 2006.
PSSMs better handle the following two major problems common to all these algorithms:

1. How to determine the split positions?
For this issue, we proposed the practical split scheme to achieve a better tradeoff between theory and practice; see our paper for more information. Of course, the split positions can also sometimes be precomputed or adjusted manually. Let me share my personal experience of searching for a “practical” split scheme. When I first saw the nice paper “Light Space Perspective Shadow Maps” (EGSR’04), I thought the logarithmic split scheme might be the best choice. This split scheme has been implemented in “Warping and Partitioning for Low Error Shadow Maps” (EGSR’06) and “Logarithmic Shadow Maps” (SIGGRAPH’06 sketch). However, as explained in our paper, EXACTLY reproducing this split scheme on discrete buffers, in order to obtain the theoretically even distribution of perspective aliasing over the whole depth range, may not work very well: in practice the logarithmic split scheme usually results in an “over-strong” split effect. For example, for n=1 and f=1000 in PSSM(3), the first logarithmic split lies at n·(f/n)^(1/3) = 10, so the first split part occupies only about the first 1% of the depth range! I know this conclusion may be a little “subjective”, so I strongly recommend trying all three split schemes (uniform, logarithmic, practical) and drawing your own conclusion. In general, the practical split scheme is more flexible in most cases.

2. How to alleviate the performance drop caused by multiple rendering passes?
We discuss this issue thoroughly in our GPU Gems 3 chapter. For the split scheme PSSM(m) (the frustum is split into m parts), the number of rendering passes is 2m without hardware acceleration, m+1 with DX9-level hardware acceleration, and 1+1 with DX10-level hardware acceleration. In particular, compared with the standard shadow mapping approach, our DX10 implementation eliminates ALL extra rendering passes. For more details, see the upcoming book GPU Gems 3 and the accompanying source code.
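To make the comparison of split schemes in question 1 concrete, here is a minimal Python sketch that computes the split distances C_0..C_m for the uniform, logarithmic and practical schemes. It assumes the usual PSSM formulas: uniform C_i = n + (f-n)·i/m, logarithmic C_i = n·(f/n)^(i/m), and practical as a weighted average of the two. The blend weight `lam` and its default of 0.5 are illustrative assumptions, not tuned values.

```python
def split_positions(n: float, f: float, m: int, lam: float = 0.5) -> dict:
    """Split distances C_0..C_m for the three PSSM split schemes.

    n, f : near and far distances of the view frustum (0 < n < f)
    m    : number of splits, as in PSSM(m)
    lam  : illustrative blend weight; 1.0 = purely logarithmic, 0.0 = purely uniform
    """
    if not (0 < n < f) or m < 1:
        raise ValueError("need 0 < n < f and m >= 1")
    # Uniform: equal slices of the depth range.
    uniform = [n + (f - n) * i / m for i in range(m + 1)]
    # Logarithmic: equal ratios between consecutive split distances.
    logarithmic = [n * (f / n) ** (i / m) for i in range(m + 1)]
    # Practical: weighted average of the logarithmic and uniform positions.
    practical = [lam * c_log + (1.0 - lam) * c_uni
                 for c_log, c_uni in zip(logarithmic, uniform)]
    return {"uniform": uniform, "logarithmic": logarithmic, "practical": practical}


if __name__ == "__main__":
    splits = split_positions(1.0, 1000.0, 3)
    for name, c in splits.items():
        print(name, [round(x, 1) for x in c])
    # Fraction of the depth range covered by the first logarithmic split part.
    log = splits["logarithmic"]
    print("first log split covers", (log[1] - log[0]) / (1000.0 - 1.0))
```

With n=1, f=1000 and m=3, the logarithmic splits fall at roughly 1, 10, 100 and 1000, which is where the “first 1%” figure above comes from. Raising or lowering `lam` moves the practical splits between the two extremes.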

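The pass counts in question 2 can be reproduced by counting shadow-map passes and scene passes separately. The Python sketch below tabulates them. The breakdown (per split shadow map plus scene pass without acceleration; m shadow maps plus one scene pass with DX9; one geometry-shader shadow pass plus one scene pass with DX10) is my reading of the 2m, m+1 and 1+1 figures and should be treated as an assumption. The `accel` labels are made up for illustration.

```python
def pssm_render_passes(m: int, accel: str) -> int:
    """Total passes (shadow-map passes + scene passes) for PSSM(m).

    accel : "none", "dx9" or "dx10" (illustrative labels for the three cases)
    """
    if m < 1:
        raise ValueError("m must be >= 1")
    # (shadow-map passes, scene passes) for each case
    counts = {
        "none": (m, m),  # per split: render its shadow map, then shade its part of the scene
        "dx9": (m, 1),   # m shadow maps, then one scene pass that samples all of them
        "dx10": (1, 1),  # one geometry-shader pass fills all m shadow maps
    }
    shadow_passes, scene_passes = counts[accel]
    return shadow_passes + scene_passes


def extra_passes(m: int, accel: str) -> int:
    """Passes beyond standard shadow mapping (one shadow pass + one scene pass)."""
    return pssm_render_passes(m, accel) - 2


if __name__ == "__main__":
    for accel in ("none", "dx9", "dx10"):
        print(accel, pssm_render_passes(4, accel), extra_passes(4, accel))
```

In the DX10 case, `extra_passes` is zero for any m, which is the sense in which ALL extra passes are eliminated.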
Any suggestions will be appreciated. Thanks.

Best Regards,
Fan Zhang

I can’t help thinking that we’re all missing a trick with shadows. Every solution has its problems and shortcomings, and they all seem overly expensive and complicated.
Come on guys, think! There must be a better solution than this.

There must be a better solution than this.
Ray tracing. It’s the simplest solution for everything.

It’s also the least performant.

Thank you, FanZhang, a very interesting overview.

By the way, I have always wondered why you don’t use a perspective trick like TSM in each split. Is it that you don’t want to lose double-speed Z, or something else?

Don’t listen to all these envious people who can’t invent anything worthwhile themselves.

I was by no means trying to detract from FanZhang’s achievements; I’m grateful for the paper. It’s just that every time I see something like this, it makes me wonder whether we’re missing some obvious shortcut.

I think that if there were a proper solution that handled all cases, somebody would have found it and told us by now ))

We use PSSM and we are quite satisfied with it. Sure, it has some drawbacks, but it is simple enough and, unlike all the perspective-transform solutions, it has no problems with offsets.

Hi everyone, especially Jackis :), I really appreciate your comments. A few explanations are listed below; I hope they answer most of your questions.

1. PSSM + Warping Algorithms (e.g. PSM, LiSPSM, TSM).
In my paper, I have already integrated PSSM with other warping algorithms. Personally, I would say PSSM+LiSPSM is the better combination. You may refer to the part “Question: In which directions, we can further improve PSSMs?” on my webpage for more discussion.

2. PSSM+Filtering Techniques (e.g. PCF or VSM)
Unlike warping algorithms, PSSM won’t worsen the offset problem (i.e. incorrect self-shadowing). Moreover, combined with PCF or VSM, it can quickly produce fake soft shadows. Of course, these soft shadows are only visually plausible, but they are suitable for most real-time 3D games.

3. Performance Issue.
Personally, I would say this is not a problem at all on current GPUs. Even without DX9- or DX10-level hardware acceleration, rendering is fast enough in most cases. Please refer to the part “Games/Projects/Engines using PSSMs” on my webpage for more details.
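To illustrate the PCF part of point 2, here is a minimal Python sketch of percentage-closer filtering on a CPU-side depth array (a real implementation would run in a fragment shader against each split’s shadow map). Each texel in a small kernel is compared against the receiver depth and the binary results are averaged, which gives the soft-looking edge. The constant `bias` value and kernel `radius` are illustrative assumptions, and VSM is not shown.

```python
def pcf_lit_fraction(shadow_map, x, y, depth, bias=0.005, radius=1):
    """Percentage-closer filtering over a (2*radius+1)^2 texel kernel.

    shadow_map : 2D list of light-space depths, indexed shadow_map[row][col]
    x, y       : texel column and row of the receiver's projected position
    depth      : receiver depth in the same light space
    bias       : constant depth offset against self-shadowing (illustrative value)
    Returns the fraction of kernel samples that are lit (0.0 .. 1.0).
    """
    h, w = len(shadow_map), len(shadow_map[0])
    lit = total = 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Clamp to the map edges, like a clamp-to-edge sampler.
            sx = min(max(x + dx, 0), w - 1)
            sy = min(max(y + dy, 0), h - 1)
            total += 1
            # Compare first, then average: this is what distinguishes PCF
            # from simply filtering the stored depths.
            if depth - bias <= shadow_map[sy][sx]:
                lit += 1
    return lit / total


if __name__ == "__main__":
    # A blocker covers the left column of a 3x3 map; the receiver sits in the middle.
    edge = [[0.2, 1.0, 1.0] for _ in range(3)]
    print(pcf_lit_fraction(edge, 1, 1, 0.5))
```

Because the offset does not change from split to split, a single constant bias like this is usually enough for plain shadow maps in each PSSM split.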

PSSMs give us a flexible FRAMEWORK which can be integrated with other shadow rendering algorithms. I would be grateful to anyone who can help me further improve this framework. If you have any questions, please feel free to email me at fzhang@cse.cuhk.edu.hk.

Thanks a lot.

Best Regards,
Fan Zhang

Ok, FanZhang, I see.

One small question: why LiSPSM and not TSM? When we implemented it, we found that TSM gives much more control over near/far quality than LiSPSM. Argh, actually they are all the same, trying to find the most “tight-fit” quadrilateral to project onto the near plane, and I think TSM is a little closer to the nature of things.

Actually, in our very first PSSM attempt, we combined it with TSM in each split. We had to use fragment Z-replace to get proper biasing, so we lost double-speed Z on nVidia (which is actually not double, but about 1.5x).

The quality of PSSM-TSM with two 1Kx1K splits was about the same as PSSM with standard shadow maps using three 1Kx1K splits, so we decided not to pursue PSSM-TSM and stayed with simple shadow mapping because of its simplicity and constant offset. You’re absolutely right that the extra render passes are almost free: when everything consists of simple static-geometry render calls, we can do lots of render passes without hurting performance.

Looking forward to your Gems chapter to learn how you got N passes down to only one on G80 )) Might be interesting!

Hi, Jackis, here are my answers.

1. TSM vs. LiSPSM.
First of all, I don’t want to get involved in a debate about which one is better; the explanation here is just my personal experience. Through my thorough theoretical research on warping algorithms, I can explain why TSM usually (not always) produces better shadow quality than LiSPSM, although personally I really love the theory behind LiSPSM. However, my research has not been published yet. What I can tell you now is that in some cases TSM may produce better shadow quality, but given the “stupid patent” on TSM (I really hate this behavior), LiSPSM is the best choice for us, especially for commercial applications. Frankly, the major reason for not using TSM is that stupid patent held by the TSM authors. They even claimed that the “trapezoidal approximation of the view frustum as seen from the light” is part of the patent. If everyone filed everything (including trivial things) as a patent, what could we do?
2. Would you please explain the term “double-speed Z”? I’m not quite sure what you mean.

3. I really want to disclose all the details of how to eliminate the extra rendering passes here, but due to copyright issues, I can’t say much right now. I have already sent a request to NVIDIA for permission to post the paper and code on my website, but unfortunately they haven’t replied so far. What I can tell you is that the key is the geometry shader. Once I get permission from NVIDIA, I will definitely post the materials on my webpage. Thanks for your understanding!

Originally posted by Jackis:
Looking forward to your Gems chapter to learn how you got N passes down to only one on G80 )) Might be interesting!
DX10 hardware supports rendering into multiple render targets (a render target array). When the geometry shader on DX10 hardware emits a primitive (e.g. a triangle), it can specify the render target into which that primitive will be rasterized. So a possible implementation binds the PSSM levels to individual render targets (slices of the array); the geometry shader classifies each input triangle against the ranges of the individual levels and emits an output triangle for each render target that needs it. This way, all shadow map levels can be rendered in one pass.
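The routing step described above can be modelled on the CPU to check the logic. This Python sketch assumes each split’s range is given as an axis-aligned crop rectangle in light space (a hypothetical representation). It uses a conservative bounding-box overlap test and emits one (layer, triangle) copy per overlapped split, the way the geometry shader would emit one primitive per render target. A real geometry shader would also transform the emitted vertices by each split’s crop matrix; that step is omitted here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]               # light-space (x, y) of a vertex
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax) crop region of one split


def target_layers(tri: Sequence[Point], crops: Sequence[Rect]) -> List[int]:
    """Indices of the splits whose crop rectangle the triangle's bounding box overlaps."""
    xs = [p[0] for p in tri]
    ys = [p[1] for p in tri]
    lo_x, hi_x, lo_y, hi_y = min(xs), max(xs), min(ys), max(ys)
    return [i for i, (x0, y0, x1, y1) in enumerate(crops)
            if hi_x >= x0 and lo_x <= x1 and hi_y >= y0 and lo_y <= y1]


def route_triangles(tris: Sequence[Sequence[Point]], crops: Sequence[Rect]):
    """Mimic the geometry shader: emit one (layer, triangle) copy per overlapped split."""
    out = []
    for tri in tris:
        for layer in target_layers(tri, crops):
            out.append((layer, tuple(tri)))
    return out


if __name__ == "__main__":
    crops = [(-1.0, -1.0, 0.0, 0.0), (-0.5, -0.5, 1.0, 1.0)]
    tris = [[(-0.25, -0.25), (-0.1, -0.2), (-0.2, -0.1)],
            [(0.5, 0.5), (0.6, 0.5), (0.5, 0.6)]]
    for layer, tri in route_triangles(tris, crops):
        print(layer, tri)
```

The first triangle lies where the two crop regions overlap, so it is emitted twice. The second one only reaches split 1. The cost of this approach therefore grows with how many casters straddle several splits.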

Very slowly. That sounds like it would be slower than multiple passes.

Originally posted by knackered:
Very slowly. That sounds like it would be slower than multiple passes.
It depends on how good the hw implementation is.
I do not have a G80, so I cannot test that. There is also the capability to specify multiple viewports (and select among them in the GS), so it might be faster to render into a single render target using several viewports (UPDATE: I have not yet found an equivalent of this DX10 functionality among the G80 extensions).
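With the viewport alternative, the geometry shader would write the split index as the viewport index, and all splits would land in tiles of one shadow texture. The following Python sketch covers only the bookkeeping side: it lays out m square tiles in a near-square grid and maps split-local shadow coordinates into atlas coordinates for the lookup pass. The grid layout and function names are assumptions, and the sketch says nothing about whether this is actually faster on G80.

```python
import math
from typing import List, Tuple

Viewport = Tuple[int, int, int, int]  # (x, y, width, height) in texels


def atlas_viewports(m: int, tile: int) -> Tuple[List[Viewport], Tuple[int, int]]:
    """Lay out m square shadow-map tiles in a near-square grid inside one texture."""
    cols = math.ceil(math.sqrt(m))
    rows = math.ceil(m / cols)
    viewports = [((i % cols) * tile, (i // cols) * tile, tile, tile) for i in range(m)]
    return viewports, (cols * tile, rows * tile)


def atlas_uv(viewport: Viewport, atlas_size: Tuple[int, int],
             u: float, v: float) -> Tuple[float, float]:
    """Map split-local shadow coordinates (u, v in [0, 1]) into atlas coordinates."""
    x, y, w, h = viewport
    aw, ah = atlas_size
    return (x + u * w) / aw, (y + v * h) / ah


if __name__ == "__main__":
    vps, size = atlas_viewports(3, 1024)
    print(size, vps)
    print(atlas_uv(vps[1], size, 0.5, 0.5))
```

For three 1024x1024 splits this gives a 2048x2048 texture with one tile left unused. This is one reason the render-target-array route may be preferable when the split count is not a perfect square.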

Originally posted by Komat:
When the geometry shader on DX10 hardware emits a primitive (e.g. a triangle), it can specify the render target into which that primitive will be rasterized.
I didn’t know about that! I haven’t had time to experiment with all these new extensions, and I didn’t know you could choose a specific render target. Thank you for the info! If so, the solution is quite evident ))

Originally posted by FanZhang:
They even claimed that the “trapezoidal approximation of the view frustum as seen from the light” is part of the patent. If everyone filed everything (including trivial things) as a patent, what could we do?
Well, I see; I didn’t know about that patent either. By the way, when we implemented it, we replaced their “trapezoidal approximation” with a much more general approach, a quadrilateral-to-near-plane method by Ilie and someone else (I don’t remember who), so we didn’t use their “cool patented scheme” )) I think patenting such things is not a good idea at all. Why don’t they patent breathing, walking, etc.?? Now I understand why you used LiSPSM.

Originally posted by FanZhang:
Would you please explain the term “double-speed Z”? I’m not quite sure what you mean.
I mean that on nVidia hardware (I don’t know about ATI), depth-only or stencil-only rendering (with color writes disabled, no alpha test or fragment kills, and depth replace disabled) runs at double speed. Really it’s not double, but a boost of 1.5x or better is guaranteed.
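The conditions above can be written as a simple checklist to run over the render state of each shadow-map pass. The Python sketch below uses a hypothetical state record, and its rule encodes only the conditions stated in this post about nVidia hardware. It is not a vendor-documented guarantee.

```python
from dataclasses import dataclass


@dataclass
class DepthPassState:
    """Hypothetical summary of the render state of a shadow-map pass."""
    color_writes: bool   # color mask enabled?
    alpha_test: bool     # alpha test enabled?
    fragment_kill: bool  # shader uses discard/kill?
    depth_replace: bool  # shader writes its own depth?


def fast_depth_path_expected(state: DepthPassState) -> bool:
    """True only if none of the conditions listed above would disable the fast path."""
    return not (state.color_writes or state.alpha_test
                or state.fragment_kill or state.depth_replace)


if __name__ == "__main__":
    plain_sm = DepthPassState(False, False, False, False)   # PSSM with standard shadow maps
    pssm_tsm = DepthPassState(False, False, False, True)    # TSM split needing Z-replace
    print(fast_depth_path_expected(plain_sm), fast_depth_path_expected(pssm_tsm))
```

This matches the PSSM-TSM experience described earlier: the Z-replace needed for proper biasing is enough to drop the pass off the fast path.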