A conceptually simple(r) way to derive exponential shadow maps + sample code

A few months ago, while working on an improved version of exponential shadow maps, I stumbled upon a new way to derive the ESM equations that looks simpler and more intuitive than previous attempts.

There is no need to invoke Markov’s inequality, higher order moments or convolutions. In fact, all we have to do is write the basic percentage closer filtering (PCF) formula for n equally weighted occluders o_i and a receiver r:

\displaystyle\frac{1}{n}\sum_{i=1}^{n}H(o_i-r)
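To make the notation concrete, here is a minimal brute-force sketch of this sum in C++ (CPU-side and purely illustrative, not the shader code from the article); H is taken as 1 for non-negative arguments, and the occluder depths would come from the shadow map texels under the filter kernel:

```cpp
#include <vector>

// Brute-force PCF: average the binary depth test H(o_i - r) over n equally
// weighted occluder samples o_i around the receiver depth r. A sample
// contributes 1 when it does not lie in front of the receiver.
float pcf_occlusion(const std::vector<float>& occluders, float r)
{
    float sum = 0.0f;
    for (float o : occluders)
        sum += (o - r >= 0.0f) ? 1.0f : 0.0f; // H(o_i - r)
    return sum / static_cast<float>(occluders.size());
}
```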

The role of the step function H(x) is to perform a depth test on each occluder; the depth test results are then averaged together to obtain a filtered occlusion term. There are many ways to write H(x), and a limit of exponential functions guarantees fast convergence:

\displaystyle H(o_i-r) = \lim_{k \to +\infty} \frac{e^{ko_i}}{e^{ko_i}+e^{kr}}
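For a finite k this approximation is easy to evaluate directly. A small illustrative C++ version is sketched below; it uses the algebraically equivalent form 1/(1 + e^{k(r - o_i)}), which avoids computing large exponentials:

```cpp
#include <cmath>

// Smooth depth test: exp(k*o) / (exp(k*o) + exp(k*r)) = 1 / (1 + exp(k*(r - o))).
// As k grows the result tends to 1 for o > r and to 0 for o < r, recovering
// the hard depth test H(o - r).
float smooth_depth_test(float o, float r, float k)
{
    return 1.0f / (1.0f + std::exp(k * (r - o)));
}
```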

We can rewrite the original PCF equation as:

\begin{array}{ccc} \displaystyle\frac{1}{n}\sum_{i=1}^{n}H(o_i-r)&=&\displaystyle\frac{1}{n}\sum_{i=1}^{n}\lim_{k \to +\infty} \frac{e^{ko_i}}{e^{ko_i}+e^{kr}} \\ &=&\displaystyle\lim_{k \to +\infty}\frac{1}{ne^{kr}}\sum_{i=1}^{n}\frac{e^{ko_i}}{e^{k(o_i - r)}+1} \end{array}

If we make the hypothesis that our shadow receiver is planar within the filtering window, we are also implicitly assuming that the receiver is the most distant occluder (otherwise it might occlude itself, which can’t happen given our initial hypothesis); thus we have r > o_i.
Armed with this new assumption, we observe that the term e^{k(o_i - r)} quickly converges to zero for all occluders:

\begin{array}{ccc} \displaystyle\lim_{k \to +\infty}\frac{1}{ne^{kr}}\sum_{i=1}^{n}\frac{e^{ko_i}}{e^{k(o_i - r)}+1} &\approx&\displaystyle\lim_{k \to +\infty}\frac{1}{ne^{kr}}\sum_{i=1}^{n}e^{ko_i} \\ &\equiv&\displaystyle\lim_{k \to +\infty}\frac{E[e^{ko}]}{e^{kr}} \\ \end{array}

As we already know, k controls the sharpness of our step function approximation and can be used to fake soft shadows. Ultimately we can drop the limit, and we obtain the ESM occlusion term formula:

\displaystyle \frac{E[e^{ko}]}{e^{kr}}
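To make the final expression concrete, here is a small illustrative C++ sketch (again, not the ShaderX6 sample code): in a real implementation e^{ko} is what gets stored and pre-filtered in the shadow map, while e^{-kr} is applied at lookup time, and the result is clamped because the ratio can exceed 1 when the r > o_i assumption is violated.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// ESM occlusion term: E[exp(k*o)] / exp(k*r).
// 'filtered_exp_ko' would normally be fetched from a shadow map that stores
// exp(k * depth), so it can be blurred / mip-mapped in hardware; here it is
// computed directly from a list of occluder depths for clarity.
float esm_occlusion(const std::vector<float>& occluders, float r, float k)
{
    float filtered_exp_ko = 0.0f;
    for (float o : occluders)
        filtered_exp_ko += std::exp(k * o);
    filtered_exp_ko /= static_cast<float>(occluders.size()); // E[exp(k*o)]

    // exp(-k*r) is applied at lookup time; clamp to 1 because the ratio can
    // exceed 1 when some occluder lies behind the receiver (broken assumption).
    return std::min(1.0f, filtered_exp_ko * std::exp(-k * r));
}
```

Comparing esm_occlusion against the brute-force PCF sketch for increasing k shows the two converge whenever the receiver is the farthest surface within the filter window.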

Exponential shadow maps can be seen as a very good approximation of a PCF filter when all the occluders are located in front of our receiver (no receiver self-shadowing within the filtering window). There’s not much else to add, except that this new derivation clearly shows the limits of this technique, and that any future improvement will necessarily be based on a relaxed version of the planar receiver hypothesis.

For unknown reasons some old and buggy ESM test code was distributed with ShaderX6. The FxComposer 2.0 sample code that was originally meant to be released with the book can be grabbed here.

Another brief update: GDC08 and ShaderX6

How do you feel right before giving the first lecture of your life? The answer is: not well!

I felt so tense that I forgot half of what I wanted to say, and even though I was probably speaking a bit too fast, I believe my lecture at GDC went quite well. The room was packed with people, despite the fact that it was the last talk of the day. At the end of the presentation I was asked a fair number of pertinent questions, and I was kind of surprised to realize that some attendees didn’t believe such extremely simple approaches could possibly work!

For all those who couldn’t be there, Wolfgang Engel (organizer of the Core Techniques & Algorithms in Shader Programming day at GDC) has collected all the presentations on one handy page. You can also download my presentation about some new and exotic shadow map filtering schemes directly from this link. Comments were added to the most obscure slides, and as usual feel free to ask questions, leave comments, point out errors, etc. on this blog.

This talk also represented the first occasion to publicly introduce Exponential Shadow Maps. The main idea has been going around for a while now (and it seems some highly anticipated game will soon ship with it..), but it wasn’t possible to fully explain the algorithm and why it works (or doesn’t work, in some cases :) ) due to the limited amount of time available.

On the other hand, ShaderX6 is now widely available, and in it you will find a long and detailed article about ESM (and some sample code), among many other interesting contributions.

Time for a brief update

Four months ago I tried to stimulate your curiosity with this post.
Next month I will try to stimulate it even more with this talk at the upcoming Game Developers Conference in San Francisco.
A few simple approaches will be presented, though I will focus a bit more on a technique called ESM (short for Exponential Shadow Maps), which is also the subject of an article that will appear in ShaderX6.
That’s all for now, hope to see you at GDC!

Fast Percentage Closer Filtering on Deferred Cascaded Shadow Maps

Here’s a nice trick for whoever has implemented deferred cascaded/parallel-split shadow maps (as a single pass that operates in screen space) on hardware that does not allow indexing into the texture samplers associated with different view-space splits.

One way to address this issue is to store multiple shadow maps (usually one per split) in a single depth texture; the correct shadow map can then be sampled by computing (per pixel!) the texture coordinates for every split and selecting the right set through predication. Another, quite similar method replaces the predication step with dynamic branches. This can end up being faster than the predication method on some hardware, especially if we have to select the correct sampling coordinates among many splits.
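As an illustration of the per-pixel selection both methods rely on, here is a hypothetical C++ sketch of the select logic (the real thing would live in the pixel shader, with the compiler turning the conditional assignments into predicated moves or a dynamic branch); the split count and names are placeholders:

```cpp
#include <array>

constexpr int kNumSplits = 4;      // hypothetical split count

struct Float3 { float x, y, z; };  // shadow-map u, v and reference depth

// Compute-all-then-select, as in the predication approach: coordinates are
// evaluated for every split, and the set matching the pixel's view-space
// depth wins. A dynamic branch would instead evaluate only the chosen set.
Float3 select_split_coords(const std::array<Float3, kNumSplits>& splitCoords,
                           const std::array<float, kNumSplits>& splitFar,
                           float viewDepth)
{
    Float3 result = splitCoords[0];
    for (int i = 1; i < kNumSplits; ++i)
        if (viewDepth > splitFar[i - 1]) // pixel lies beyond split i-1
            result = splitCoords[i];
    return result;
}
```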

But what if we want to take a variable number of PCF samples per split without using dynamic branching? (I love DB, but it’s not exactly fast on every piece of hardware out there; it’s up to you to decide when it’s a good idea to use it or not.)
It’s indeed possible to take a dynamic number of samples per pixel using a hardware feature that was initially introduced by NVIDIA to accelerate… stencil shadows! (ironic, isn’t it?)

I am talking about the depth bounds test, a ‘new’ kind of depth test that does not involve the depth value of the incoming fragment, but rather the depth value already written in the depth buffer at the screen-space coordinates determined by the incoming fragment. This depth value is checked against some min and max depth values; when it’s not contained within that region the incoming fragment gets discarded. Setting the depth min and max values around a shadow map split is an easy way to (early!) reject all those pixels that don’t fall within a certain depth interval. At this point we don’t need to compute multiple texture coordinates per pixel anymore; we can directly evaluate the sampling coordinates that map a pixel onto the shadow map associated with the current split and take a certain number of PCFed samples from it.
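As a concrete example, here is how the bounds could be set around one split, assuming an OpenGL context where the EXT_depth_bounds_test extension is available (loaded here through GLEW). Note that the bounds are window-space depth values in [0, 1], the same space as the depth buffer, so the view-space split distances have to be converted first:

```cpp
#include <GL/glew.h>

// Restrict a screen-space occlusion pass to the pixels whose depth-buffer
// value falls inside one shadow-map split. zMin/zMax are the split boundaries
// already converted to window-space depth in [0, 1].
void set_split_depth_bounds(float zMin, float zMax)
{
    if (GLEW_EXT_depth_bounds_test)
    {
        glEnable(GL_DEPTH_BOUNDS_TEST_EXT);
        glDepthBoundsEXT(zMin, zMax);
    }
}
```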

Multiple rendering passes are obviously needed (one per split) to generate an occlusion term for the whole image, but this is generally not a big problem because the depth bounds test can happen before our pixel shader is evaluated, and multiple pixels can be early-rejected per clock via hierarchical z-culling. (This approach won’t be fast if our image contains a lot of alpha-tested geometry, as this kind of geometry doesn’t generate occluders in the on-chip hi-z representation, forcing the hardware to evaluate the pixel shader and only then reject a shaded pixel.)

The multipass approach can be a win because we can now use a different shader per shadow map split, making it possible to take a variable number of samples per split: typically we want to take more samples for pixels that are closer to the camera and fewer samples for distant objects. Another, indirect advantage of this technique is improved texture cache usage, as the GPU won’t jump from one shadow map to another anymore. In fact, with the original method each pixel can map anywhere in our big shadow map, while going multipass forces the GPU to sample locations within the same shadow map/parallel split.
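Putting the pieces together, the whole screen-space shadowing step could look roughly like the sketch below (all helper functions are hypothetical stand-ins for the application's own plumbing): one full-screen pass per split, each with a shader variant that takes its own number of PCF samples, while the depth bounds test performs the per-pixel split selection for free.

```cpp
#include <GL/glew.h>

// Hypothetical helpers standing in for the application's own code.
void set_split_depth_bounds(float zMin, float zMax); // see previous snippet
void bind_pcf_shader(int numSamples);
void bind_shadow_map_for_split(int split);
void draw_fullscreen_quad();

// One full-screen pass per split: the depth bounds test early-rejects every
// pixel outside the current split, so each pass can afford a different PCF
// sample budget (more samples near the camera, fewer far away).
void render_occlusion_term(const float* splitBoundsZ, int numSplits)
{
    static const int samplesPerSplit[4] = { 16, 9, 4, 1 }; // example budget

    for (int i = 0; i < numSplits && i < 4; ++i)
    {
        set_split_depth_bounds(splitBoundsZ[i], splitBoundsZ[i + 1]);
        bind_pcf_shader(samplesPerSplit[i]);
        bind_shadow_map_for_split(i);
        draw_fullscreen_quad();
    }
    glDisable(GL_DEPTH_BOUNDS_TEST_EXT);
}
```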

I like this trick because, even though it doesn’t work on every GPU out there, it puts to some use hardware that was designed to accelerate a completely different shadowing technique. Hope you enjoyed this post, and as usual comments, ideas and constructive criticism are welcome!