Fast Multi-Pass Exact Screen-Space Shadows using an Accumulation Buffer, a Stencil and a Z-Buffer ================================================================ Sebastien Loisel, zed@scylla.math.mcgill.ca This algorithm was first presented in a paper in 1991. I'll try to figure out the precise reference in the near future. I attempt to describe a fast shadow algorithm for point light sources using modern graphics acceleration hardware. The down side of this algorithm is that it is multipass, with 1 base pass and two passes per light source. It is also hard to incorporate this with A-buffering methods, forcing one to actually generate full blown high resolution pictures and filtering down to a lower resolution for anti-aliasing. The up side is that this algorithm is linear time in the number of polygons in the scene (which favorably compares with shadow volume methods, usually) and that it does not suffer from aliasing artifacts that are seen in shadow maps in some extreme cases. Each pass of the algorithm executes using O(1) space, namely some minimal space for execution and a frame buffer. That is, the algorithm does not need to store all the polygons thus we are not limited by memory space. All these operations can be performed with a solid implementation of an OpenGL machine. The algorithm does not handle transparent surfaces or light-bending lenses and mirrors. Overview ======== Everything is always drawn from the point of view of the eye (as opposed to the shadow map algorithms). The first pass fills in the Z-buffer with the z value of the closest thing in all the pixels, the RGB values are also generated assuming that the object falls inside the shadow of some other object (eg, only the ambient term is used). In the second pass, each triangle generates a (truncated) shadow polyhedron. We 'draw' each face of the shadow polyhedra and count how many of those shadow faces fall behind the z value for each pixel. If we get an even value, the pixel is in the light, if we get an odd value, the pixel is in the shade. In the third pass, we draw the picture again with all lighting options turned on (diffuse, specular, etc...) except ambient light, which is turned off for it has already been computed but only draw to pixels which have been found not to be in the shadow volumes. The second and third passes are repeated once for each light source and the results for all light sources can be accumulated with the picture from the first pass. The resulting picture will have correct screen-space shadows. First Pass ========== In the first pass, the frame buffer (RGB and Z values) are cleared. Then, the scene is drawn normally with no light except ambient light. Second Pass =========== For this step we need a stencil buffer. We first clear the stencil buffer. We have no need whatsoever of the RGB values generated in the first step, and we need read only access to the z-buffer generated in the first step. The first thing to do is to clear the stencil buffer to, say, all ones. We now need to decide on a L=(x,y,z) light position. Then, for each triangle given by three vertices P1, P2, P3, we generate the truncated shadow pyramid (NOT infinite). Let A be P1-L, B be P2-L and C be P3-L and normalize A,B and C. Then, for some constant k, you get the points Q1=L+kA, Q2=L+kB, Q3=L+kC. The shadow frustrum is then given by the 5 following sides: 1- the triangle P1P2P3 2- the quadrilateral Q1Q2P2P1 3- the quadrilateral Q1Q3P3P1 4- the quadrilateral Q2Q3P3P2 5- the triangle Q1Q2Q3 (the 'cap') Triangle 5 (Q1Q2Q3) is there simply to 'cap' the otherwise infinite shadow volume. We need our shadow volumes to be finite to allow us to check for containment in the shadow volume later on. Unfortunately, for any choice of k it is possible to cook up an example in the unit cube (or sphere or any compact set) where the shadow frustrum above misses some parts of the unit cube that it should not miss. This means that some stuff may be mistakenly lit. A fix is to use an heptahedron instead of the pentahedron above (a heptahedron is a polyhedron with 7 faces). Let V be A+B+C, normalized. Then let R be L+k*V, then the heptahedron is defined by the following faces: 1' through 4' are the same as 1 through 4 above. 5'- Triangle Q1Q2R 6'- Triangle Q1Q3R 7'- Triangle Q2Q3R 5' through 7' are now the cap. Now one can argue that the shadow volume delimited by 1' through 7' restricted to the sphere of radius k*sqrt(3)/3 is correct I think (I need to check this). This may be more reassuring than the previous case but on the other hand it increases time spent performing computations. Furthermore, if the light source can be guaranteed to be some sane distance away from everything else, the pentahedral trick may be made to work by choosing a suitably large k. Note that if A+B+C is of zero norm then this triangle does not project a shadow for the light is in the plane of the triangle. So now we have the shadow volume faces. We need to go through a rasterizing process. For each pixel, the Z check is performed. If the shadow polygon is found to lie behind the Z value computed in step 1, the stencil value of the pixel is negated (or some other sort of toggling operation). No other operations are performed, in particular, the Z values of the z-buffer are not updated. Note here that in a certain sense, we are doing a strange thing, all the work is being done when the Z test fails. Any Z test that passes is simply discarded. At this point, we can invoke the even/odd rule for checking whether a point is inside the shadow volume. For each pixel, we've effectively counted whether a ray coming in from infinity crosses the shadow volumes an even or odd number of times before it reaches the Z value stored in the Z buffer. If even, then the pixel is outside the shadow volume, if odd the pixel is inside the shadow volume. So if the stencil was cleared to 1 initially and we negate that for each shadow plane of interest, a 1 says that there is no shadow and a -1 says that there is a shadow. For numerical reasons, it is probably desirable to instruct the rasterizer to add a small epsilon to the Z values of the shadow volumes. This is because the frontmost faces will generate one of the faces for shadow volumes and both the shadow face and the 'real' face will have the same Z values for the pixels, plus or minus roundoff errors. The roundoff will make the face jump in and out of shadows in an apparently random fashion. Moving all the shadow volumes away a little (adding a small epsilon to their Z values) should settle this problem. This problem can also be eliminated completely with no numerical hack at the cost of a little extra complexity. In the first pass, each pixel is assigned an (even) primitive ID in the stencil. In the second pass, we toggle the low bit of the primitive ID in the pixel if and only if the shadow face is behind the Z value of the pixel as given by the Z buffer from the first pass AND the shadow face is not generated by the same primitive as one whose ID is in the stencil (modulo the lowest bit). In effect, this is the same algorithm excepted that a face is not affected by its own shadow volume. Third Pass ========== In this pass, the Z buffer is cleared and the RGB buffer is saved away then cleared to zero. The scene is drawn with diffuse and specular highlights but with stenciling turned on. We only want to draw the parts of the picture which are lying in the light (continuing with the example where a 1 means in the light and a -1 means in the shadow, we want to draw only to pixels for which the stencil contains a 1). Once we are done, we accumulate the RGB values with the saved RGB values. If the hardware/API allows this, the RGB buffer need not be saved and cleared, RGB values can be accumulated as we go. This will yield the correct shadows we are interested in. Conclusion ========== I have described a fast shadow rasterization method. There are some numerical glitches which complicate the algorithm but these can be done away with by a careful programmer. The only real limitation here is that we be able to easily find a k for step two which will work. Using the heptahedron variation, this can be easily done by finding the largest distance of a light to a vertex, which is linear in time as a function of the number of vertices. All other operations are linear in time as a function of the number of triangles (or vertices) or sublinear, so the overall performance of the algorithm for a single light is linear in time with the number of triangles in the scene. As we go to multiple lights, performance grows linearly with the number of lights. Therefore, the overall performance of the algorithm is O(n*m), where n is the number of triangles (or vertices) and m is the number of lights. Note that I am assuming that time to render a triangle is roughly constant. Unfortunately, no provisions are made for transparent surfaces et al. One can always draw the solid surfaces first with shadows and then transparent surfaces without shadows, hoping that the viewer will not notice this but a better solution should be hoped for. Unfortunately, it is noteworthy that transparent surfaces are very problematic for Z-buffering algorithms in general. The algorithm only requires one extra pass when compared to shadow maps. Indeed, shadow maps require 2 passes per light where as this algorithm requires two passes per light plus a set-up pass. We can thus hope that any performance difference between this and shadow maps will decrease as the number of lights increases. It will also have much less aliasing problems, which has proven to be an important issue in some applications (eg, visual simulation). The object-space shadow map algorithm has classically been hard to accelerate in hardware though it would make it easier to anti-aliase things without actually increasing the resolution of the picture we generate. It is also hard to control the number of fragments produced by object-space shadow volume algorithms.