How to avoid code duplication in GLSL shaders?

I noticed that once the number of shaders in a project grows, code duplication between them becomes an issue. It is mainly caused by two factors:

The first is the need to share constants, functions, UBOs, etc. between various shaders. For example, a UBO that contains lighting information must be defined in every shader that needs to take light into account.
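For concreteness, this is the kind of block that ends up copy-pasted verbatim (a sketch; the member names are made up):

```glsl
// Shared lighting UBO: this exact declaration has to be repeated in
// every shader that does lighting, since core GLSL has no #include.
layout(std140) uniform LightData {
    vec4 lightPositions[8];
    vec4 lightColors[8];
    int  lightCount;
};
```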

The second factor is configuration management. For example, if I want to give the user a choice of whether to apply lighting calculations to particles, I might need two almost identical shaders for the particle system: one that contains the light calculations and one that doesn’t, and load the appropriate shader at runtime based on user settings.
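The closest thing to a stock solution I know of is injecting a #define at compile time and branching on it in the preprocessor, e.g. (a sketch; names are made up):

```glsl
// Shader body, compiled with a preamble string prepended via
// glShaderSource: "#version 330 core\n" plus, for the lit
// permutation, "#define ENABLE_LIGHTING\n". The body itself
// deliberately has no #version line.
in vec3 vNormal;
out vec4 fragColor;

uniform vec4 uBaseColor;
uniform vec3 uLightDir;

void main() {
    vec4 color = uBaseColor;
#ifdef ENABLE_LIGHTING
    // Lit permutation: a simple Lambert term, purely illustrative.
    color.rgb *= max(dot(normalize(vNormal), -uLightDir), 0.0);
#endif
    fragColor = color;
}
```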

Not only does this lead to a lot of copy-pasting, but keeping all the shader code consistent is a big overhead: simply changing a UBO or function definition requires propagating (i.e. copy-pasting) that change to every shader that uses it.

In software development these kinds of issues are typically solved either by translating from a higher-level language that supports the features needed to avoid duplication (e.g. TypeScript -> JavaScript), or by code templating. So my question is: what do people (especially game developers) usually use to address shader code duplication? Are there well-known templating engines or translators for GLSL? Or is everyone constructing shaders from pieces passed to glShaderSource()?

Bear in mind that a shader is more like an object file than a complete program. You can attach multiple shaders of the same type (vertex, fragment, geometry, etc.) to a single program, so long as exactly one shader of each type defines a main() function, and you can attach a shader to multiple programs. For functions which are used by multiple programs, this should be preferred to simply replicating the definition in the source code of multiple shaders.
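In code, that pattern looks roughly like this (a sketch: the vs handle and the *Src strings are assumed to exist, and error checking is omitted):

```c
/* Compile the shared lighting code once, as a fragment shader object
 * with no main(), and attach it wherever it is needed. */
GLuint lightingLib = glCreateShader(GL_FRAGMENT_SHADER);
glShaderSource(lightingLib, 1, &lightingLibSrc, NULL); /* defines computeLighting() */
glCompileShader(lightingLib);

GLuint particleFS = glCreateShader(GL_FRAGMENT_SHADER);
glShaderSource(particleFS, 1, &particleFSSrc, NULL);   /* declares computeLighting(), defines main() */
glCompileShader(particleFS);

GLuint prog = glCreateProgram();
glAttachShader(prog, vs);          /* vertex shader, compiled elsewhere */
glAttachShader(prog, lightingLib); /* the same object can be attached to other programs */
glAttachShader(prog, particleFS);
glLinkProgram(prog);
```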

Structures and interface blocks need to be declared in each shader which references them, and there isn’t a #include directive. But I’m fairly sure that this is a major reason why glShaderSource() takes multiple input strings.
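That is, you can keep a single copy of the shared declarations in the application and pass it as the leading strings of every shader that needs it, something like (a sketch):

```c
/* One shared copy of the declarations; glShaderSource concatenates
 * all strings in order before compilation. */
const char *version  = "#version 330 core\n";
const char *lightUBO =
    "layout(std140) uniform LightData {\n"
    "    vec4 lightPositions[8];\n"
    "    vec4 lightColors[8];\n"
    "    int  lightCount;\n"
    "};\n";
const char *fsBody =                 /* per-shader part, e.g. loaded from disk */
    "out vec4 fragColor;\n"
    "void main() { fragColor = lightColors[0]; }\n";

const char *strings[] = { version, lightUBO, fsBody };
GLuint fs = glCreateShader(GL_FRAGMENT_SHADER);
glShaderSource(fs, 3, strings, NULL);
glCompileShader(fs);
```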

There’s also: ARB_shading_language_include.
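For anyone unfamiliar with it: the extension lets you register GLSL snippets under virtual paths and then use #include in shader source. Roughly (a sketch, assuming the extension is supported; the path and function names are made up):

```c
/* Register a chunk of GLSL under a virtual path, then compile a
 * shader that #includes it. */
const char *lightGlsl =
    "vec3 applyLight(vec3 c, vec3 n) { return c * max(n.y, 0.0); }\n";
glNamedStringARB(GL_SHADER_INCLUDE_ARB, -1, "/lib/lighting.glsl",
                 -1, lightGlsl);

const char *fsSrc =
    "#version 330 core\n"
    "#extension GL_ARB_shading_language_include : require\n"
    "#include \"/lib/lighting.glsl\"\n"
    "in vec3 vNormal; out vec4 fragColor;\n"
    "void main() { fragColor = vec4(applyLight(vec3(1.0), vNormal), 1.0); }\n";

GLuint fs = glCreateShader(GL_FRAGMENT_SHADER);
glShaderSource(fs, 1, &fsSrc, NULL);

const char *searchPaths[] = { "/lib" };
glCompileShaderIncludeARB(fs, 1, searchPaths, NULL);
```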

But this doesn’t really get to the heart of the OP’s question, which is: are folks using systems/mechanisms more complex than just appending source-code fragments together (through whatever mechanism) to generate their shader source?

You can attach multiple shaders of the same type … For functions which are used by multiple programs, this should be preferred to simply replicating the definition in the source code of multiple shaders.

Why? Inquiring minds want to know.

This has reduced utility compared to programmatically including the same source snippets, because of the lack of constant folding (the constants involved often vary between shader permutations) and the dead-code elimination that follows from it.
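For example (a sketch, reusing the LightData block from the OP): if the light count is injected as a preprocessor constant, the compiler can unroll or eliminate the loop per permutation; the same function compiled once into a shared shader object only ever sees the general case and must keep it.

```glsl
// With "#define LIGHT_COUNT 0" injected into this permutation, the
// loop is provably dead and can be eliminated entirely. A shared,
// pre-compiled shader object cannot be specialized per permutation.
vec3 accumulateLights(vec3 n) {
    vec3 sum = vec3(0.0);
    for (int i = 0; i < LIGHT_COUNT; ++i)
        sum += lightColors[i].rgb * max(dot(n, lightPositions[i].xyz), 0.0);
    return sum;
}
```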

Yeah, I’m aware of that, but unfortunately it doesn’t really solve the issue. You’d still need some system to figure out how to “compose” a program from multiple shaders.

Why do you say it should be the preferred way? Is there any difference compared to simply including the function definition in multiple shaders, apart from presumably smaller memory consumption on the GPU?

I’ve seen this extension, but it only partially solves the problem (the same functionality can easily be achieved with something like the M4 preprocessor). Besides, I don’t want to depend on this extension being supported.

You are correct, my question is not so much about “how to do it”, but more about “how do real game developers do it”. For example, in modern web development virtually no one writes CSS directly, because it suffers from issues similar to GLSL’s, which lead to poor code reusability and copy-paste-driven development. But there are tools like Less and Sass that probably over 90% of developers use. So I’m just wondering whether something like this exists for GLSL and is widely used in the gamedev industry.

Real™ Game™ Developers™ don’t author shaders directly. Shader code export is part of the content toolchain.

Alternatively, a game may use a small number of shaders for 90%+ of its use cases. Scenarios such as that described in the OP may be purely theoretical, and actual real programs may not need that level of flexibility. Or maybe the same effect is achieved by setting uniform values to 1.0 or 0.0, or by binding an all-white or all-black texture; a sketch of that approach follows below. Doom (2016), for example, does 80% of its drawing with only two shaders (reference: https://mobile.twitter.com/idSoftwareTiago/status/968976047382433795).
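The uniform-toggle idea, sketched (names made up): one shader covers both the lit and unlit cases, and the application just flips a uniform instead of compiling two permutations.

```glsl
#version 330 core
// uLightingAmount is set by the application: 1.0 = lit, 0.0 = unlit.
in vec3 vNormal;
out vec4 fragColor;

uniform vec4  uBaseColor;
uniform vec3  uLightDir;
uniform float uLightingAmount;

void main() {
    float lambert = max(dot(normalize(vNormal), -uLightDir), 0.0);
    float light   = mix(1.0, lambert, uLightingAmount);
    fragColor = vec4(uBaseColor.rgb * light, uBaseColor.a);
}
```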

Alternatively, look at the function prototype for glShaderSource, realise that you can use it - without extensions - to combine any number of code snippets, and use it that way.

I figure that there must be some reason the specification explicitly allows multiple shader objects of the same type, in addition to making explicit provision for doing the same thing at the source code level. If an implementation doesn’t want to make use of it, it can make glCompileShader() almost a no-op (beyond what’s needed for testing compilation status) and put everything into glLinkProgram(). But it’s much harder to do the reverse.

OTOH, even if it has an effect, it’s unlikely to be significant overall. Compilation is a one-time cost, and I can’t imagine code size being an issue.

Ok, thanks GClements. Just curious whether you knew of a subtle optimization I hadn’t seen before.