Light Indexed Deferred Rendering - new technique

I have been experimenting with a lighting technique which I think might be new. The technique seems very obvious, so I am hoping people here can let me know of any prior papers (before I make a fool of myself).

This approach simply assigns each light a unique index and then stores this index at each fragment the light hits, rather than storing all the light or material properties per fragment. These indexes can then be used in a fragment shader to look up a table of lighting properties for the data needed to light the fragment.

This technique can be broken down into three basic render passes:

  1. Render depth only pre-pass
  2. Disable depth writes (depth testing only) and render light volumes into a light index texture.
    Standard deferred lighting / shadow volume techniques can be used to find what fragments
    are hit by each light volume.
  3. Render geometry using standard forward rendering – lighting is done using the light index
    texture to access lighting properties in each shader.

This achieves the main advantage of deferred rendering (complex light/scene interactions) while offering ways around its disadvantages (fat buffer sizes, MSAA and transparency issues).

This technique has an obvious downside of limiting the number of lights that can hit a fragment - but this can be easily managed in a game-editor context.
However, I think artists would prefer to have as many non-shadowing lights as they want and deal with overlap issues, rather than the current situation of X lights per object and having to break objects up into small pieces.

I wrote up a paper explaining it fully here:
http://lightindexed-deferredrender.googlecode.com/files/LightIndexedDeferredLighting1.0.pdf

The demo with full source will follow soon when I have cleaned up the code.

Nothing new… you can accelerate the loop on a GF8 with something like:


#extension GL_EXT_gpu_shader4 : enable

uniform vec4 col[8]; // dummy - should be replaced by a true light calculation
uniform int mask;    // the mask should be read from an integer texture
void main(void)
{
    gl_FragColor = vec4(0.0);
    int i = mask;
    while (i != 0) {
        int b = int(log2(float(i))); // index of the highest set bit - skips all zero bits
        gl_FragColor += col[b];      // calculate light b
        i -= 1 << b;                 // clear that bit (an xor would do it too)
    }
}

A small problem with that loop is that typing errors can create endless loops :stuck_out_tongue:

oc2k1, if it is not new, could you tell me what games/apps/papers use this technique so I can update my paper?

Yeah, I know you can do bit-math on GeForce 8 - I even mention it in the paper - but I did not want to add anything I could not test.

Although it might be “nothing new”, I don’t think any modern games use such an approach. I think the idea is good. How many overlapping lights one needs in practice remains to be found out, but I assume a maximum of 4 lights per pixel should work pretty well for many games, since most games still use ambient lighting on a per-sector basis instead of many non-shadow-casting lights.

I am not sure whether this would work, but to store the light indices you might be interested in this demo by Humus:
http://www.humus.ca/index.php?page=3D

I haven’t yet read your paper, so I am not sure how exactly you want to store the light indices per fragment.

Jan.

Read your paper and like the technique.
As far as its advantages over my own forward rendering code go, it’s just saving me some CPU work finding the lights that affect objects, and saving me shader swapping. That’s not much work saved, especially considering I sort by auto-generated shader. It also saves this work at the expense of a complicated shader, which won’t scale well backwards to 3-year-old cards.
What other advantages over forward rendering is it offering? You seem to focus on its advantages/disadvantages over deferred shading.

@Jan - Humus actually took that idea from a discussion I had on these forums about data packing:
http://www.opengl.org/discussion_boards/ubbthreads.php?ubb=showflat&Number=230242#Post230242

@knackered - Well, I wrote the test demo on a 3+ year old card (6800 GT), and at 80/60 FPS at 1024x768 for one or two lights per pixel I think that is acceptable (and scene complexity is not affected by this technique - it is all fragment bound). (I have also not seriously profiled it.)
You did not say what sort of forward rendering you do, but assuming the “multiple lights in one shader” approach, you will not really get any advantage if you only have a few lights on screen.
The best sort of case would be an app featuring a terrain system with player point lights moving over it. You would typically not want to re-submit all the terrain for each point light (or break it up), and there may be more point lights than can easily be supported in a single shader.
So basically, unless your scene is the sort that calls for deferred rendering (e.g. a nighttime city scene, or Christmas lights), this is probably not an ideal technique.
Also, I have worked with engines before where object-to-light intersection tests were not exactly fast.

But half the beauty of this technique is that it can easily be layered on top of existing forward rendering approaches. So if you want, you can turn it on for high-end cards only and have lots of little PFX lights (e.g. lights from a plasma gun, lights from sparks falling, fire effects, etc.).

Interesting paper, thanks for the reference. The single pass packing via bit-shifting is really an awesome idea.

It would be REALLY interesting to see the performance of this on the newer hardware with constant buffers!

BTW, have you ever thought about trying to do image-space fake global illumination by using the final framebuffer RGB data and Z buffer, then simplify this (using something like a custom mipmap generation shader) to generate a much smaller framebuffer which contains a new point light source per pixel. Then use these new point light sources to light the next frame? Basically light the next frame with the bounce light from the previous frame … might work great in your current light indexed deferred rendering framework.

Actually, I think your blend-max packing version is more appropriate for games (when limited to two lights per fragment).

As for constant buffers - when I release the source, I am sure someone with a GeForce 8 could test this out :wink:

I actually did not think of faking global illumination like that, as I would not have thought the results would look any good (as opposed to the light-space positioning mentioned in the paper).

But hey, I never would have thought screen space ambient occlusion would look any good either. When I run out of demo ideas I might give it a try. (Going to try order independent transparency again and then mess with the Wii controller next.)

Send me a message when you get your example finished…

BTW, you should be able to get 4 bins in 1 pass using the max blend method if you separate the lights into two non-overlapping sets. There are some other tricks which can be used to do this as well.

Screen/image space methods are only now beginning to be explored, as GPUs are becoming fast enough to do them in real time. All sorts of stuff can be done in image space that, while fake, looks awesome (great for games). Take Mario Galaxy, for example: I’d bet that all the reflections/refractions (like on the crystals, etc.) are done using a copy of the framebuffer. Image space “refractive” transparency is easily done this way. Image space subsurface scattering can be done very fast if you cheat! Now that image space ambient occlusion is here, it is only a matter of time before someone does image space global illumination approximations as well. You can even do parts of the lighting without ever using the traditional diffuse or specular methods and simply use image space techniques alone. I’m doing this for my current project. Probably very tough to see in this screen shot, but

OK, the demo is up now. Keep in mind this is only a “tech” demo and is not really flashy at all. (Based off an old Humus demo.)
(It will probably only work for GeForce 6/7/8 users - I have not tested ATI at all.)
Demo Link

Scene with 255 lights

Also there is a small revision to the main document:
Doc Ver 1.1

I like it. I’ve been racking my brains trying to think of ways of improving it, like storing a texture containing all the combinations of lights and then using a single index to reference the fragment’s particular combination. Crap idea, I know, but at least it’s got me thinking.

Just installed Windows a few days ago, so I can actually try your binary. On an XFX GeForce 8600 GTS XXX (overclocked) I got, for 4 lights/pixel:

640x480 -> 100 fps deferred / 30 fps non-deferred
1024x768 -> 85 fps deferred / 28 fps non-deferred
1600x1200 -> 38 fps deferred / 25 fps non-deferred

The number of lights/pixel didn’t make any difference to the framerate.

FYI: If you “massage” the makefile you can probably compile it on Linux.

I find it strange that varying the number of “lights per pixel” does not change the frame rate. (I will see if I can run it on an 8600 to compare.)

OK I managed to try it on a Nvidia 8600 GT and here are my results:

Resolution / Lights per pixel/ Framerate

640x480 4x - 175 FPS
640x480 2x - 245 FPS
640x480 1x - 295 FPS
640x480 non-deferred - 35 FPS

1024x768 4x - 95 FPS
1024x768 2x - 140 FPS
1024x768 1x - 180 FPS
1024x768 non-deferred - 29 FPS

So are you sure changing the lights per pixel makes no difference? (Also make sure deferred lighting is enabled - it is only a deferred-lighting setting.)

OK, an update to the demo is now available:
http://lightindexed-deferredrender.googlecode.com/files/LightIndexedDeferredRendering1.1.zip

This should fix most ATI issues and by default it uses a moving light scene.

Without indexed lights it runs very slow (<= 5 FPS) when looking at the center of the room, and very fast when looking into the corners.

With indexed lights there is no lighting, though it runs smooth (> 60 FPS).

ATI Radeon X1600 Mobility, Catalyst 8.1.

Jan.

That is very strange; I tested a Radeon 9550, and Humus said he tested X1800 and HD 2900 XT cards.

Did you try the 1 or 2 lights per fragment option, or toggling the stencil?

Perhaps even using the “Precision test” option and seeing if you only see yellow.

I checked all options. Toggling number of lights and stencil doesn’t change anything. Enabling “Precision Test” makes the whole screen yellow (except for the HUD, of course).

Jan.

sqrt[-1], and Humus, did you use cat 8.1 or older drivers?

Just speculating that 8.1 introduced … unintended behaviour.

Or could it be some issue that it’s “Mobility”? (UMA?)

I used Cat 8.1 on the 9550.