Uniform branching vs preprocessor ifdefs?

Hi everyone, I’m developing a material system for a game engine. It should allow creating materials with parameters such as using an albedo texture or just a color, enabling/disabling a normal map, using an emissive map or color, etc. I’m unsure which design approach is better:

  1. Create a general shader and pass material parameters via UBOs. Then, in the shader, we will branch on these parameters at runtime.
  2. Create a shader “template” where every material parameter branching point is wrapped in #ifdefs. This shader will be compiled with the needed defines later at runtime, so only the code that is actually needed ends up in the shader binary.
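As a rough illustration of the two options on the engine side (a sketch only: the flag names, the `MaterialParamsUBO` layout, and the `USE_*` define names are hypothetical, not taken from any particular engine):

```cpp
#include <cstdint>
#include <string>

// Hypothetical material feature flags (illustrative names).
enum MaterialFeature : std::uint32_t {
    kAlbedoTexture = 1u << 0,
    kNormalMap     = 1u << 1,
    kEmissiveMap   = 1u << 2,
};

// Option 1: one general shader; the flags travel to the GPU in a uniform
// block and the shader branches on them at runtime. Assumed to mirror:
//   layout(std140) uniform MaterialParams {
//       vec4 albedoColor; vec4 emissiveColor; uint features;
//   };
struct MaterialParamsUBO {
    float         albedoColor[4];
    float         emissiveColor[4];
    std::uint32_t features;
    std::uint32_t pad[3];  // std140 rounds the block size up to 16 bytes
};
static_assert(sizeof(MaterialParamsUBO) == 48, "unexpected UBO size");

// Option 2: a shader template; the flags become #define lines that are
// placed in front of the template before it is compiled.
std::string buildDefines(std::uint32_t features) {
    std::string out;
    if (features & kAlbedoTexture) out += "#define USE_ALBEDO_TEXTURE 1\n";
    if (features & kNormalMap)     out += "#define USE_NORMAL_MAP 1\n";
    if (features & kEmissiveMap)   out += "#define USE_EMISSIVE_MAP 1\n";
    return out;
}
```

With option 1 a material change is a UBO update; with option 2 each distinct flag combination is a separate compiled shader.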

As far as I know, the first way is called static branching and has very little overhead nowadays because all shader invocations within a GPU warp take the same branch. For some reason, most modern game engines use the #ifdef approach despite all its drawbacks. But aren’t pipeline compilation and rebinding very expensive operations? Or am I missing something very beneficial about this approach?

Your question is a good one, and your assumptions are basically correct.

Rather than reinvent the wheel here, let me point you to some good links that discuss this (see the forum thread linked below). In particular, I’d start with the last two (the most recent). They’ll get you familiar with the pros and cons of each approach.

Then feel free to follow up here with specific questions.

Just a few specific comments about your question.

Both are useful. Don’t think of this as an either-or that must be determined identically for all shader permutation states.

One brief comment on #2. To get this, you don’t need to use ugly #ifdefs that reduce readability. Standard language if/else/switch/etc. on "const"ant expressions (whether declared constant or not) will typically trigger dead code elimination in the GLSL compiler/linker (but check the compiled output to be sure!). This is great for improving shader readability. The tradeoff is that the compiler still needs to chew through this disabled code (e.g. “if (false)”) to discover that it needs to throw it out. If your shader sources aren’t huge and your permutation count isn’t huge, this is no problem. But…
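Here is a minimal sketch of that idea, assuming the engine patches a `const bool` declaration into the shader source right after the `#version` line instead of emitting #defines. The helper names, the `USE_ALBEDO_TEXTURE` constant, and the tiny fragment shader are made up for illustration, and whether the dead branch is really removed depends on the driver's compiler:

```cpp
#include <string>

// Illustrative GLSL template: the branch condition is a plain constant,
// so the compiler is free to drop the untaken side.
const char* const kFragmentTemplate = R"(#version 330 core
uniform sampler2D albedoMap;
uniform vec4 albedoColor;
in vec2 uv;
out vec4 fragColor;
void main() {
    vec4 albedo = albedoColor;
    if (USE_ALBEDO_TEXTURE) {
        albedo *= texture(albedoMap, uv);
    }
    fragColor = albedo;
}
)";

// Formats one GLSL constant declaration, e.g. "const bool X = true;".
std::string constBool(const std::string& name, bool value) {
    return "const bool " + name + " = " + (value ? "true" : "false") + ";\n";
}

// Inserts `preamble` after the first line of `source`, which is assumed
// to be the "#version ..." directive (GLSL requires it to come first).
std::string injectAfterVersion(const std::string& source,
                               const std::string& preamble) {
    const std::size_t eol = source.find('\n');
    if (eol == std::string::npos) return source + "\n" + preamble;
    return source.substr(0, eol + 1) + preamble + source.substr(eol + 1);
}
```

Usage would look like `injectAfterVersion(kFragmentTemplate, constBool("USE_ALBEDO_TEXTURE", false))`, with the result handed to the normal shader compile call.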

Yes. However, there are other factors that can drive shader performance, such as total register usage, which also counts the inactive parts of your shader. Plus, wildly different inputs for some shader permutations might suggest approach #2 over #1 in specific cases.

Generally though, for a specific shader permutation state, if it has the same input resources and should have about the same register usage (check!), then I personally prefer #1. In any non-trivial engine or material system, at some point you have a “shader permutation explosion”. And that creates lots of problems and consumes non-trivial dev time that could be spent elsewhere.
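To put a number on the “explosion”: with n independent on/off toggles baked in at compile time, the worst case is 2^n variants. The sketch below shows that count plus a lazy cache that only compiles the combinations actually requested. The class name, the `Handle` type, and the compile callback are hypothetical stand-ins for whatever your engine uses to build a program or pipeline:

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Worst-case variant count for `toggles` independent boolean switches
// (assumes toggles < 64).
constexpr std::uint64_t permutationCount(unsigned toggles) {
    return std::uint64_t{1} << toggles;
}

// Compiles a variant the first time its feature mask is requested and
// reuses the result afterwards.
class VariantCache {
public:
    using Handle = std::uint32_t;  // stand-in for a program/pipeline handle

    explicit VariantCache(std::function<Handle(std::uint32_t)> compile)
        : compile_(std::move(compile)) {}

    Handle get(std::uint32_t featureMask) {
        auto it = cache_.find(featureMask);
        if (it != cache_.end()) return it->second;  // already compiled
        const Handle handle = compile_(featureMask);
        cache_.emplace(featureMask, handle);
        return handle;
    }

    std::size_t size() const { return cache_.size(); }

private:
    std::function<Handle(std::uint32_t)> compile_;
    std::unordered_map<std::uint32_t, Handle> cache_;
};
```

Ten toggles already give `permutationCount(10) == 1024` possible variants, which is why moving some of them to runtime uniforms keeps the cache (and build times) manageable.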

You’re right. It’s a trade-off. More shader binds (potential perf--) and sometimes a more streamlined shader program that might run faster (perf++), vs. fewer shader binds (perf++) and sometimes a slower shader that takes longer (perf--). No easy answer. You just need to profile and see. Your intuition on this gets better with experience.


Thanks for the quick response and great explanation.

Related: The Shader Permutation Problem - Part 1: How Did We Get Here?

Yep. That’s one of the two I was referring to here: