Performance: more shader programs vs. if statements in a shader?


What is the better option: compiling and linking separate shaders for different tasks/objects (some are textured, others not, etc.), or using one big shader with if statements in it? I heard that if statements are very taxing on the GPU.

Best regards, R.

It is a tough call.

There is a significant cost to switching shaders - and you’re right that “if” statements can be expensive (especially in fragment shaders).

So this ends up being a tricky compromise. If you are switching shaders every few hundred polygons - then using an “if” statement would make a lot of sense. However, a lot depends on how different the two shaders are. If your “if” ends up with 50 lines of code in the “then” part and another 50 in the “else” - then using an if statement is going to be horribly expensive because many GPUs will take as much time as it would take to run all 100 lines of code for every pixel! (From a performance perspective, it’s as though the “then” and the “else” parts are both being executed, no matter which way the “if” turns out).
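To make the arithmetic above concrete, here is a minimal sketch of a toy cost model. It is not a measurement of any real GPU; the function name `if_cost` and the idea of counting "lines" as cost units are assumptions made purely for illustration.

```python
def if_cost(then_lines, else_lines, takes_then, runs_both_sides):
    """Per-pixel cost (in 'lines of code') of an if/else under a toy model.

    runs_both_sides=True models the case described above, where the
    hardware effectively pays for both sides regardless of the condition.
    """
    if runs_both_sides:
        return then_lines + else_lines
    # Idealised branching: the pixel only pays for the side it takes.
    return then_lines if takes_then else else_lines


# The 50 + 50 example from the post above.
pays_for_both = if_cost(50, 50, takes_then=True, runs_both_sides=True)    # 100
pays_for_one = if_cost(50, 50, takes_then=True, runs_both_sides=False)    # 50
```

Under this model the "if" only pays off when the two sides are short or share most of their code, which matches the rule of thumb in the post.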

So if (for example) your program only uses a couple of shaders and one has specular lighting but the other doesn’t - but which shader is used switches from one object to the next - then using an “if” to eliminate shader switching wouldn’t be such a bad idea.

But if you have (for example) a horribly complicated shader for rendering water, and another horribly complicated one for rendering people - then using an “if” would be highly counter-productive because they’d share almost no code.

Sadly, there isn’t a single clear answer to this one because it depends so much on the application.

For most “general purpose” stuff - I use separate shaders. However, that pushes up the per-object setup cost - so I work very hard to keep my objects as large as possible (not split them up into many meshes) and I make sure to sort objects so that those that use the same shader are rendered consecutively in an effort to minimise shader switching time.

YMMV though.

Thanks for the input

Yeah, same here. For most of my usual tasks I use separate shaders, but that does push up the setup cost for each object, like Steve brought up. The easiest way around this is to keep your objects as big as possible and to sort them so that objects using the same shader are rendered one after the other. This has definitely reduced my shader switching time.

Yep - I agree.

It’s a tricky subject though - you’ll find people that’ll tell you either story. I’m definitely in the “more shaders, fewer if-statements” camp…but it does depend on the nature of the art, how old your users’ graphics cards might be, etc.

I try hard to keep all of my meshes up over 512 vertices - which is in the “sweet spot” for most GPUs. My art guys use all sorts of sneaky tricks to be sure to have no more than one mesh per real-world object…and we group objects together where possible if they are really low polygon count.
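As a rough sketch of the grouping idea above: the function below greedily merges consecutive small meshes until each batch reaches the 512-vertex figure quoted in the post. The name `group_small_meshes` is hypothetical, and the sketch assumes that all the meshes passed in can legally be drawn together (same shader and material), which a real engine would have to check.

```python
MIN_VERTICES = 512  # figure quoted in the post; treat it as a rule of thumb


def group_small_meshes(vertex_counts, min_vertices=MIN_VERTICES):
    """Greedily merge consecutive meshes until each batch has enough vertices."""
    batches = []
    current = []
    for count in vertex_counts:
        current.append(count)
        if sum(current) >= min_vertices:
            batches.append(current)
            current = []
    if current:
        # The last batch may still be under the threshold.
        batches.append(current)
    return batches


batches = group_small_meshes([600, 100, 200, 300, 40])
# -> [[600], [100, 200, 300], [40]]
```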

I sometimes sort by shader type - but most often it’s enough to make sure that if I have 30 objects of type A and 30 of type B, that I render all of the A’s before all of the B’s and not go A,B,A,A,B,A,B,B,A,B… or whatever. For most of the kinds of scenes I draw, that cuts the amount of shader switching time down to almost negligible proportions.
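The effect of grouping draws by shader can be sketched by counting how often the bound shader changes over a draw order. This is only an illustration: `count_shader_binds` is a made-up helper, the letters stand in for shader programs, and the first bind is counted as well.

```python
def count_shader_binds(draw_order):
    """Count how many times the bound shader changes while drawing in order."""
    binds = 0
    current = None
    for shader in draw_order:
        if shader != current:
            binds += 1
            current = shader
    return binds


# The interleaved order from the post above, then the same objects grouped.
interleaved = list("ABAABABBAB")
grouped = sorted(interleaved)

interleaved_binds = count_shader_binds(interleaved)  # 8
grouped_binds = count_shader_binds(grouped)          # 2
```

With the draws grouped, the number of binds equals the number of distinct shaders, however many objects there are.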

If the code on either side of the ‘if’ is very small and contains no texture lookups - then it’s not so horribly expensive. What makes me flinch is when someone will say “if high quality lighting is needed then do some fancy normal mapped thing - else use a pre-lit texture instead”, not realizing that far from optimizing their code - they just made it do BOTH the normal map lookup and all the fancy calculations AND the pre-lit lighting lookup and calculation - only to throw away the results from whichever one wasn’t needed!!

I’ve seen commercial game programmers (who were not graphics-savvy) do six varieties of lighting and shading in one shader with ‘if’ statements to pick the most efficient one under different situations because they found that their lighting was too slow with only two versions. They were blown away when I removed all but the most expensive lighting code path of all…and their renderer went four times faster!

The GPU may look like a general-purpose CPU - but it’s really not. Understanding why things like ‘dependent texture read’ (where the output of one texture map can influence which texel is read from a second texture map) is so bad is an important part of writing good shader code. Very often, things that would be really efficient in C++ are a disaster in GLSL - and vice versa.

However, GPUs are getting gradually better - and gradually more general-purpose - and perhaps in 5 or 10 years, we’ll be able to ignore all of this arcane stuff and just write code.

– Steve

I wonder. My understanding is that the potential cost of if statements is associated with divergence – different threads in the GPU warp (i.e., a gang of 32 threads) running different paths in the conditional. If it can be guaranteed that every thread in the warp is running the same path, then there are no thread stalls due to divergence, and the cost of the conditional is not as severe as if there was divergence. (Threads in a warp all execute the same instruction at the same time. When there is thread divergence due to different paths through the conditional, then some threads stall while others execute different paths. This is what causes the performance hit for conditionals.)

It sounds like, in your usage, this conditional is more like a ‘pass’ mode, and it might be possible to guarantee that for a given pass mode, the warps stay lock-stepped, without stalls.

But in general, if the conditional is instance data dependent, like a local area gather filter, then there is no way to avoid divergence, and the conditional will cause stalls. The stalls serialize the threads in the warp that take different paths through the conditional.
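The divergence argument above can be sketched with a toy model in which a warp runs each side of the conditional once if at least one of its threads takes that side. The 32-thread warp size comes from the post; the function name `warp_cost` and the cost units are assumptions for illustration, not a description of any particular GPU.

```python
WARP_SIZE = 32  # threads per warp, as stated in the post above


def warp_cost(takes_then, then_cost, else_cost):
    """Cost of one warp: every side taken by at least one thread runs once."""
    cost = 0
    if any(takes_then):
        cost += then_cost
    if not all(takes_then):
        cost += else_cost
    return cost


# "Pass mode" style condition: every thread in the warp agrees.
uniform = [True] * WARP_SIZE
# Data-dependent condition: neighbouring threads disagree.
divergent = [i % 2 == 0 for i in range(WARP_SIZE)]

uniform_cost = warp_cost(uniform, 50, 50)      # 50
divergent_cost = warp_cost(divergent, 50, 50)  # 100
```

In this model a single disagreeing thread is enough to make the whole warp pay for both sides, which is why a condition that is uniform per draw call is much cheaper than a per-pixel one.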

I have found issues with some devices.

As far as I can tell, some of them are still generating SM1 (Shader Model 1) code.

SM1 has no if statement, it only has conditional assignment.

So if you wrote

if (i==0)
    30 lines of code mark 1
else
    30 lines of code mark 2

what the gpu actually does is

30 lines of code mark 1
30 lines of code mark 2
res = mark 2
if (i==0)
    res = mark 1

Of course if your device is generating SM2 code then things are much better.
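The rewrite described above can be sketched as follows. The functions `mark_1` and `mark_2` are placeholders for the two 30-line blocks (their bodies are invented), and the `calls` list exists only to show which blocks actually ran. The final `if` in `with_conditional_assignment` stands in for a conditional-assignment instruction, not a real branch.

```python
calls = []  # records which placeholder blocks actually ran


def mark_1(x):
    calls.append("mark 1")
    return x * 2.0


def mark_2(x):
    calls.append("mark 2")
    return x + 1.0


def with_branch(i, x):
    # What the source code says: only one block runs.
    if i == 0:
        return mark_1(x)
    return mark_2(x)


def with_conditional_assignment(i, x):
    # What a target without real branches does: run both, then select.
    r1 = mark_1(x)
    r2 = mark_2(x)
    res = r2
    if i == 0:  # stands in for a conditional-assignment instruction
        res = r1
    return res
```

Both versions return the same result for any `i`, but the second one always executes both blocks, so its cost is the sum of the two.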
