I’m trying to implement deferred shading in my engine, but I have one problem:
How can I use more than one material at a time? As long as I only change textures there is no problem, but what do I have to do when I want some objects drawn with a different fragment shader?
Consider for example reflection and refraction. When I render the scene to a cubemap, I get a fully shaded image of the scene, and I don’t want this to be shaded again when applied to an object.
I already thought about storing the depth of fragments in one of the textures and using one pass per fragment shader, overriding the depth value in the shader so I can use the depth buffer to determine which pixel gets shaded by which material. This would also have the advantage that I get a correct depth buffer out of this, so I could draw transparent objects afterwards. But this method kinda defeats the purpose of deferred shading…
Am I missing something obvious? Or do we just have to live with this limitation of deferred shading?
It is absolutely critical to deferred shading that you have enough memory in the intermediate destination buffers to store all differentiating information for the pixels that ultimately win the visibility test. Exactly what you need to store depends heavily on how you represent your surfaces. You could store a texture color, for example, or a texture coordinate and texture ID. You could store material parameters, or just one or two parameters (for example diffuse RGB & a specular term in the alpha channel), or you could store a material ID. Imagine a material ID being used to look up one axis of a 2D texture in the deferred shader, with all material properties stored along the other axis. It would be a lot of texture fetches, but it illustrates the point that there really are no rules and no limits to the solutions you can come up with in a deferred shading implementation. It’s just a technique where you defer shading calculations until after the visibility test; you still need all the data somehow to perform the shading.
FWIW it’s never been less useful IMHO, since there’s coarse zbuffer and early z tests now, and many complex variable inputs to each fragment that might require a lot of storage or a very complex deferred shading pass (e.g. lots of deferred data fetches using indexes as I described).
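To make the material ID idea above a bit more concrete, here is a minimal CPU-side sketch in Python. It is not GPU code: the `Material` record, the `MATERIALS` table and `shade_pixel` are hypothetical names invented for this illustration, and the Blinn-Phong style lighting is only an assumption about what the deferred pass might compute. The point is just that the G-buffer holds an ID and the properties are fetched from a table at shading time.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Material:
    diffuse: tuple      # (r, g, b)
    specular: float     # specular intensity
    shininess: float    # specular exponent

# Hypothetical material table: one entry per material ID. On the GPU this
# would be one row (or column) of the 2D lookup texture.
MATERIALS = {
    0: Material((0.8, 0.1, 0.1), 0.0, 1.0),    # matte red
    1: Material((0.5, 0.5, 0.5), 1.0, 32.0),   # glossy grey
}

def shade_pixel(material_id, n_dot_l, n_dot_h):
    """Deferred pass for one pixel: fetch the material by ID, then light it."""
    m = MATERIALS[material_id]
    diffuse = max(n_dot_l, 0.0)
    # No highlight on surfaces facing away from the light.
    if n_dot_l > 0.0:
        specular = m.specular * max(n_dot_h, 0.0) ** m.shininess
    else:
        specular = 0.0
    return tuple(c * diffuse + specular for c in m.diffuse)
```

Only `material_id`, `n_dot_l` and `n_dot_h` would have to come from the aux buffers here; everything else lives in the table. That covers per-material ("uniform") data, but not values that differ per fragment.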
I mean it’s interesting, and would probably be more interesting if it wasn’t for coarse and early z. Just eliminate all that complexity with a depth pass and you’re golden. Coarse and early Z effectively mean that deferred shading is built into the hardware if you do a pass to the zbuffer, and it’s a helluva lot easier to write that than to store all sorts of contorted crap in auxiliary buffers. It’s just an academic exploration of an old obsession in computer graphics.
All the hype and marketing is just that. There is nothing ‘special’ about deferred shading; it’s actually potentially stifling as a technique IMHO.
P.S. To summarize, the main problem here is that shading data needs to come from framebuffer memory for the single pixel winning the visibility test. All fragments that win that test during rendering need to write data to the aux buffers. Normal shading would just use the data directly from the registers & texture fetches during the initial hit. Fetching that data is a big part of the cost anyway. There’s an explosion in data complexity feeding shaders, and there’s also ubiquitous early z optimization. Reducing the shader data stored in a deferred algorithm means a lot of complicated fetching & resource usage in the final shading (even if you had all the precision you needed in storage). So deferred shading is a silly thing to pursue: even when it works well, there are trivial ways of getting the equivalent optimization with the hardware support that’s there now, without the complexity & buffer storage :-). Topping this off, deferred shading implies a certain homogeneity to the final shading algorithm, i.e. the monolithic, data-driven single shading algorithm of Doom 3 vs. a more arbitrary mix of shaders.
That’s just what I suspected. Although only one geometry pass, independent of the number of lights, sounds tempting… That would push up my maximum triangle count per scene quite a bit.
But on the other hand, I guess it’s not long until we can write shaders that do an arbitrary number of lights in one pass.
That’s one scheme. It could become a win if you shaded your 2D deferred passes with a light position & other light information; however, this doesn’t address the need to handle shadows. I agree it still helps with the passes, if your hardware can efficiently handle the material parameters in aux buffers.
P.S. You should bear in mind that management limits the scope of illumination from individual lights, limiting the passes from lights to the portions of the scene they illuminate.
The Stalker guys had an interesting solution to this problem - they store a material lookup table in a 3D texture, indexed by (N.L, N.H, materialID). The texture only needs to be small (they used 64x256x4), and you get interpolation between the material types for free.
Excuse the plug, but I honestly recommend reading their chapter in GPU Gems 2.
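As a rough illustration of the lookup described above, here is a CPU-side Python sketch of a table indexed by (N.L, N.H, material). It makes several assumptions that are not from the chapter: each material layer is a simple diffuse/specular-power response, the resolution is tiny, and only the material axis is interpolated (a GPU would filter all three axes of a 3D texture). `build_table` and `sample` are hypothetical names.

```python
def build_table(n_l_res, n_h_res, shininess_per_material):
    """table[m][i][j] = (diffuse, specular) at N.L = i/(res-1), N.H = j/(res-1)."""
    table = []
    for shininess in shininess_per_material:
        layer = [[(i / (n_l_res - 1), (j / (n_h_res - 1)) ** shininess)
                  for j in range(n_h_res)]
                 for i in range(n_l_res)]
        table.append(layer)
    return table

def lerp(a, b, t):
    return a + (b - a) * t

def clamp(x, lo, hi):
    return min(max(x, lo), hi)

def sample(table, n_dot_l, n_dot_h, material_coord):
    """Nearest texel in N.L and N.H, linear blend along the material axis."""
    n_l_res = len(table[0])
    n_h_res = len(table[0][0])
    i = round(clamp(n_dot_l, 0.0, 1.0) * (n_l_res - 1))
    j = round(clamp(n_dot_h, 0.0, 1.0) * (n_h_res - 1))
    m = clamp(material_coord, 0.0, len(table) - 1)
    m0 = int(m)                          # lower material layer
    m1 = min(m0 + 1, len(table) - 1)     # upper material layer
    t = m - m0                           # blend weight between the two
    d0, s0 = table[m0][i][j]
    d1, s1 = table[m1][i][j]
    return (lerp(d0, d1, t), lerp(s0, s1, t))
```

A fractional `material_coord` such as 0.5 blends two neighbouring material layers, which is the "interpolation between the material types for free" part. It still only varies the lighting response, not the shader itself.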
I think you misunderstood what I meant with different materials. Of course I can get additional material attributes at the cost of an additional texture lookup, but the problem is that I may want to use an entirely different shader for some materials.
If I may borrow these terms from GLSL, the limiting factor is not “uniform” attributes (which could be put into an additional lookup texture), but “varying” attributes (that are not per material but per fragment).
This results in an extreme explosion of needed material parameters. For example, if I need reflection and refraction, I need the color of the reflected beam, the color of the refracted beam, the diffuse color of the material, the world space position and the normal. And these are all parameters that can’t be put in an external lookup texture, so I only have a single float left for a material ID that may contain e.g. a specular exponent or material transparency (assuming the current maximum of 4 render targets and not wanting to multipass…). So I can’t have a glossmap or an alpha map, because for that I would need another per fragment parameter… The really bad thing is that I lose this for my WHOLE scene, not only for the reflecting and refracting object, because I have to fit all possibilities into a single shader, and therefore have to reserve space for every possible combination of needed parameters.
In the worst case I need a material ID that either gets fed into a big “if … else if …” statement or some similar construct using one pass per material and killing the fragment based on the ID…
This makes deferred shading rather useless for a generic engine design, because the final pass shader will either always limit me in what material types I can use in a single scene, or use extreme amounts of parameters.
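The channel budget behind the "single float left" remark can be checked with a few lines of Python. This is only arithmetic on the numbers given in the post; the assumption added here is that each of the 4 render targets has 4 channels (RGBA) and that every colour, position and normal takes 3 of them. `free_channels` is a hypothetical helper.

```python
RENDER_TARGETS = 4        # "current maximum" quoted in the post
CHANNELS_PER_TARGET = 4   # assumption: every target is RGBA

# Per-fragment ("varying") values listed in the post, in channels.
REFLECT_REFRACT = {
    "reflected color": 3,
    "refracted color": 3,
    "diffuse color": 3,
    "world-space position": 3,
    "normal": 3,
}

def free_channels(params, targets=RENDER_TARGETS, channels=CHANNELS_PER_TARGET):
    """Channels left after packing params; a negative result means no fit."""
    return targets * channels - sum(params.values())
```

With the five attributes above, 16 channels minus 15 leaves exactly one, which is the float reserved for the material ID. Adding a gloss value uses that last channel, and adding an alpha value on top no longer fits without a fifth target or another pass.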
1/ render depth pass
2/ for each shader ID in scene…
2.1/ render all objects in scene with this shader ID into the aux buffers.
2.2/ render fullscreen quad with shader enabled (skipping fragments where alpha is zero, or something)
2.3/ ‘combine’ this shader’s output with the framebuffer in some way
That’s a nice design, isn’t it?
Early out in the shader means fill rate shouldn’t be an issue.
You’re not going to have that many shaders in your scene…certainly not so many as to make this approach impractical.
However, I’m probably missing the point, as I haven’t carefully read this thread.
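The loop listed in this post can be simulated on the CPU to see which pixels each shader ends up touching. The Python sketch below uses a one-dimensional "screen" and flat objects for brevity; the object dictionaries, the `render` function and the equality depth test are all assumptions made for this illustration, and 'combine' is taken to be a plain overwrite.

```python
FAR = float("inf")

def depth_pass(objects, width):
    """Step 1: keep the nearest depth per pixel."""
    depth = [FAR] * width
    for obj in objects:
        for x in obj["pixels"]:
            depth[x] = min(depth[x], obj["depth"])
    return depth

def render(objects, shaders, width):
    depth = depth_pass(objects, width)
    framebuffer = [None] * width
    writes = [0] * width                      # how often each pixel was shaded
    for shader_id, shade in shaders.items():  # step 2
        aux = [None] * width                  # step 2.1: fill aux buffer
        for obj in objects:
            if obj["shader"] != shader_id:
                continue
            for x in obj["pixels"]:
                if obj["depth"] == depth[x]:  # only the visible surface passes
                    aux[x] = obj["albedo"]
        for x in range(width):                # step 2.2: fullscreen pass
            if aux[x] is None:
                continue                      # early out: not this shader's pixel
            framebuffer[x] = shade(aux[x])    # step 2.3: combine (overwrite)
            writes[x] += 1
    return framebuffer, writes
```

For example, with a "lit" object covering pixels 0 to 2 at depth 1.0 and a "mirror" object covering pixels 2 and 3 at depth 0.5, the contested pixel 2 is shaded only by the mirror shader, and no pixel is shaded more than once.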
No, I think you understood my problem.
Your algorithm is basically what I meant with “some similar construct using one pass per material”. I just didn’t think of the depth only pass first…
The combining of the different shader outputs shouldn’t be a problem, because the depth only pass at the beginning ensures that every shader writes only “its own” fragments, there should be no overdraw, eliminating my initial problem…
The final algorithm, including multipassing over the lights, should be the following:
1) render depth pass to renderbuffer
2) copy the depth buffer of the renderbuffer to screen (probably just another bind, because I will most likely have a postprocessing stage after that)
3) for each material
3.1) render objects into aux buffers
3.2) render fullscreen quad with ambient shader (if needed by the material)
3.3) for each light
3.3.1) render light volume with light shader
4) render non-deferred materials (alpha transparency and similar things)
Sums up to two geometry passes and one shader pass per light (not per material, because I will have no overdraw).
Looks like it should work… It puts the materials in the outer loop instead of the lights if I don’t want to keep too many copies of the aux buffers around. But that’s no problem unless I do stencil shadows, which I’m currently not planning…
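The pass structure listed above, with materials in the outer loop and lights in the inner loop, can be written down as a small planner. This Python sketch only builds the ordered list of passes; it does no rendering, and `plan_frame`, the pass labels and the `"ambient"` flag are hypothetical names chosen for the example.

```python
def plan_frame(materials, lights):
    """Return the ordered (kind, description) passes for one frame.

    materials: dict mapping material name -> {"ambient": bool}
    lights:    list of light names
    """
    passes = [
        ("geometry", "depth-only pre-pass"),
        ("copy", "depth buffer to screen"),
    ]
    for name, mat in materials.items():
        # Each object belongs to one material, so it is drawn here once,
        # in addition to the depth pre-pass: two geometry passes per object.
        passes.append(("geometry", f"{name}: fill aux buffers"))
        if mat["ambient"]:
            passes.append(("fullscreen", f"{name}: ambient"))
        for light in lights:
            passes.append(("light-volume", f"{name}: {light}"))
    passes.append(("forward", "non-deferred materials"))
    return passes
```

Note that the number of light-volume draws in this plan is materials times lights. The "one shader pass per light" count holds per pixel, since the depth pre-pass leaves each pixel to a single material.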
This approach would require GF6800 class hardware, because AFAIK killing a fragment does not save bandwidth on older cards, or am I wrong here? What about the X800?
Anyways, thanks for the help. Sometimes I just need someone to point me to the obvious.
Wow, I actually helped someone.
Weird warm feeling in my tummy…
Now, back to the trolling…
Maybe using bindless textures is a good idea.
Instead of rendering material info to the G-buffer, you can render only a material ID,
and fetch all the material textures in one pass using bindless textures.
Here is the post and implementation from MJP:
the TheRealMJP DeferredTexturing repo on GitHub, which also links to the post.