I’ve been optimizing my code a little and found, through some reading, that (obviously) many calls and state changes lead to overhead, so reducing the number of calls is always a good idea.
On the shading side, sorting the shaded objects by shader, in order to reduce shader binds, is also a good idea.
But if we wanted to go one step further, would it be better to use a single shader program for my entire application, with multiple pieces of shader code attached to it (like libraries), and switch between drawing styles inside the shader’s main function rather than having multiple shaders?
In summary: is it better to have 1 shader program with sections for different shading styles in it than 5 or 6 shaders, each with its own style?
What you’re referring to is often called an “ubershader”, in case you want to search the net for references. While possible, and definitely the better option in some instances, it’s a trade-off.
At one extreme, you have a separate shader program per “shader generation state permutation” (I’ll just refer to this as shader-gen state). Each program is tightly optimized for only that permutation. So we get minimum register usage, minimum varyings/interpolators passed between stages, and tight code, with no time spent on evaluating branch conditions and performing branches. That gives you the maximum number of warps/wavefronts running on the shader multiprocessors (max occupancy for that shader-gen state), which gives you the maximum memory latency hiding potential.
At the other extreme, you have one ubershader for all shader state permutations, using branches in the shader on the individual shader permutation states. Coherent dynamic branches are pretty cheap nowadays, so that’s not really a concern. But in the limit, you end up with a shader whose inputs, varyings/interpolators, and register counts must encompass the worst case across all of the shader permutations. Don’t need 5 texcoords and 5 textures? Too bad, you’re passing them in and through the shader stages anyway. That bloats the register footprint of your shader, which reduces occupancy, which reduces its memory latency hiding potential, which can increase the chance that the shader will be memory access bound.
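To put rough numbers on the register footprint vs. occupancy link, here is a minimal C++ sketch. The `GpuLimits` struct, the function name, and any limits you plug in are assumptions for illustration only, not figures for a particular GPU, and the model deliberately ignores register allocation granularity and every other resource limit.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-multiprocessor limits; real values depend on the GPU.
struct GpuLimits {
    int registersPerMultiprocessor; // size of the register file, in registers
    int threadsPerWarp;             // threads that execute together as one warp/wavefront
    int maxResidentWarps;           // hard cap on resident warps, independent of registers
};

// Upper bound on resident warps imposed by register usage alone.
// Simplified: ignores allocation granularity and other resource limits.
int registerLimitedWarps(const GpuLimits& gpu, int registersPerThread) {
    assert(registersPerThread > 0);
    const int registersPerWarp = registersPerThread * gpu.threadsPerWarp;
    const int byRegisters = gpu.registersPerMultiprocessor / registersPerWarp;
    return std::min(byRegisters, gpu.maxResidentWarps);
}
```

With made-up limits of 65536 registers, 32 threads per warp, and a cap of 64 warps, a shader using 32 registers per thread is not register-limited in this model, while one using 128 registers per thread fits only a quarter as many warps, so there are fewer warps available to cover memory latency.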
So what probably makes sense is not to pick one extreme or the other, but rather to take each shader-gen state case by case. Consider how the different values of that state change the number/type of shader inputs, varyings, and the register footprint used by the shader. If they don’t change it at all, I’d definitely consider just using a dynamic branch in the shader to handle that! That should be very cheap, and it means fewer shader permutations and fewer shader binds/batches. On the other hand, if they wildly change the inputs/varyings/register footprint, you might want to use that state to generate separate shader permutations rather than “switching on and off” logic in the shader code using dynamic branches.
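One way to express that case-by-case split on the application side is sketched below in C++. Everything here is hypothetical: the `ShaderGenAxis` type, the axis names, and the idea of turning compile-time axes into `#define` lines (which would have to go after the `#version` line of the GLSL source) are assumptions for illustration, and each axis is assumed to be a simple on/off switch.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// One axis of shader-gen state (hypothetical type, on/off axes only).
struct ShaderGenAxis {
    std::string name;
    bool changesInterface; // true if it alters inputs, varyings, or register footprint
    bool enabled;          // current value for the object being drawn
};

struct ShaderSelection {
    std::string defines;                   // compile-time switches selecting a permutation
    std::vector<std::string> runtimeFlags; // flags to set as uniforms for dynamic branches
};

// Axes that change the shader interface pick a separate permutation;
// the rest stay in one shader and are toggled through uniforms.
ShaderSelection selectPermutation(const std::vector<ShaderGenAxis>& axes) {
    ShaderSelection selection;
    for (const ShaderGenAxis& axis : axes) {
        assert(!axis.name.empty());
        if (!axis.enabled) continue;
        if (axis.changesInterface) {
            selection.defines += "#define " + axis.name + " 1\n";
        } else {
            selection.runtimeFlags.push_back(axis.name);
        }
    }
    return selection;
}

// Number of distinct programs needed: only interface-changing axes multiply it.
std::size_t permutationCount(const std::vector<ShaderGenAxis>& axes) {
    std::size_t count = 1;
    for (const ShaderGenAxis& axis : axes) {
        if (axis.changesInterface) count *= 2;
    }
    return count;
}
```

With three on/off axes of which only two change the interface, this needs 4 programs instead of 8; every axis moved to a dynamic branch halves the permutation count.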
Ultimately though, what’s best is going to depend on your application and its usage. If you have an insane number of shader-gen state axes and/or batches, it may very well make sense to push more shader-gen state to dynamic shader branches so you can do fewer shader binds (definitely consider binning batches per shader permutation, when possible, to avoid rebinding the same shader permutation repeatedly). OTOH, if your batch or shader permutation counts are pretty low, you don’t need to lean so hard in the ubershader direction to get your batch and shader bind counts down.
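The binning mentioned above can be sketched in C++ roughly as follows. The `Batch` struct and its fields are hypothetical, and this assumes the batches can be reordered freely (for instance opaque geometry where draw order does not affect the result).

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical draw batch: which shader permutation it needs and what it draws.
struct Batch {
    std::uint32_t shaderId;
    std::uint32_t meshId;
};

// Counts how often the bound shader would change when submitting in this order.
int countShaderBinds(const std::vector<Batch>& batches) {
    int binds = 0;
    bool anyBound = false;
    std::uint32_t boundShader = 0;
    for (const Batch& batch : batches) {
        if (!anyBound || batch.shaderId != boundShader) {
            ++binds;
            boundShader = batch.shaderId;
            anyBound = true;
        }
    }
    return binds;
}

// Groups batches by shader permutation; stable, so batches that share a
// shader keep their original relative order.
void binByShader(std::vector<Batch>& batches) {
    std::stable_sort(batches.begin(), batches.end(),
                     [](const Batch& a, const Batch& b) {
                         return a.shaderId < b.shaderId;
                     });
}
```

Comparing `countShaderBinds` before and after `binByShader` on a captured frame is a cheap way to see how much rebinding the submission order alone is causing, before deciding how far to lean toward an ubershader.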