There’s not much in-depth information about shader subroutines on the web.
How are they implemented, and how do they work under the hood? I’d like to know whether they cause any performance overhead, especially when several subroutines are in use, so I can decide which is the better practice for creating different versions of a shader: subroutines or separate shaders.
Subroutine variables are pretty much like function pointers in C, or if we have to, we can call them “virtual functions” (as D3D refers to them).
They are probably implemented on all hardware as actual function pointers, i.e. calling a subroutine will translate to an indirect CALL instruction that takes the called address from a register/memory location.
Thus you can expect them to likely be more efficient than switch case statements in shaders or switching between multiple shaders.
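As a rough CPU-side analogy (plain C, not GLSL, and not a claim about what any particular GPU or driver actually does), the difference between calling through a subroutine variable and branching with a switch could be sketched as follows. The names shade_fn, shade_flat, shade_tinted and the MODE_ constants are made up for illustration.

```c
/* Hypothetical "subroutine type": any function with this signature. */
typedef float (*shade_fn)(float);

/* Two hypothetical "subroutines" compatible with that type. */
float shade_flat(float x)   { return x; }
float shade_tinted(float x) { return x * 0.5f; }

enum { MODE_FLAT = 0, MODE_TINTED = 1 };

/* Switch-style dispatch: the mode is re-tested on every invocation. */
float shade_with_switch(int mode, float x)
{
    switch (mode) {
    case MODE_FLAT:   return shade_flat(x);
    case MODE_TINTED: return shade_tinted(x);
    default:          return 0.0f;
    }
}

/* Subroutine-style dispatch: the target was chosen once beforehand,
   so each invocation is a single indirect call through the pointer. */
float shade_with_pointer(shade_fn current, float x)
{
    return current(x);
}
```

Setting `shade_fn current = shade_tinted;` once and then calling `shade_with_pointer(current, x)` for every value plays the role of selecting a subroutine before drawing; both paths return the same result, only the way the target is chosen differs.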
However, this is only how it should work; inefficiencies in particular hardware or drivers could make them slower than they need to be, but that’s another story. Unfortunately, subroutines also have an inherent inefficiency that comes from the API itself: the association between subroutine variables and subroutines is not stored with the program, so you have to re-specify it (using glUniformSubroutinesuiv) every single time you bind a program that has subroutine variables. This is an unfortunate inheritance from D3D, as GL copied this weird behavior from there.
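To make the re-specification requirement concrete, here is a toy model in plain C. It is only a sketch: mock_context, mock_use_program, mock_set_subroutines and bind_with_subroutines are hypothetical names and none of them is the OpenGL API. In real code the two steps would correspond to glUseProgram followed by glUniformSubroutinesuiv. The model resets every selection to index 0 on bind, which is a simplifying assumption; a real implementation may pick any compatible default.

```c
#include <stddef.h>

#define MAX_SUBROUTINE_UNIFORMS 4

/* Toy stand-in for the context's subroutine selection state.
   Hypothetical: this is a model, not the OpenGL API. */
typedef struct {
    unsigned bound_program;
    unsigned selected[MAX_SUBROUTINE_UNIFORMS];
} mock_context;

/* Models binding a program: whatever was selected before is discarded. */
void mock_use_program(mock_context *ctx, unsigned program)
{
    ctx->bound_program = program;
    for (size_t i = 0; i < MAX_SUBROUTINE_UNIFORMS; ++i)
        ctx->selected[i] = 0;  /* assumption: reset to index 0 */
}

/* Models specifying the selection for all subroutine uniforms at once. */
void mock_set_subroutines(mock_context *ctx, size_t count,
                          const unsigned *indices)
{
    for (size_t i = 0; i < count && i < MAX_SUBROUTINE_UNIFORMS; ++i)
        ctx->selected[i] = indices[i];
}

/* Helper pattern: never bind without re-specifying the selection. */
void bind_with_subroutines(mock_context *ctx, unsigned program,
                           size_t count, const unsigned *indices)
{
    mock_use_program(ctx, program);
    mock_set_subroutines(ctx, count, indices);
}
```

Wrapping both steps in one helper, as bind_with_subroutines does here, is one way to make sure the selection is never forgotten after switching programs; the cost is that the selection array is resubmitted on every bind.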
So subroutine variables perform better than swapping shaders, recompiling, or selecting functionality with in-shader “if” conditionals.
However, as far as the performance of a single shader is concerned, it is better off without them.
I have found that for a small number of tests (2-3), an if test is quicker than using subroutines; but there are good reasons for using them in complex shaders, as they can make it a lot easier to extend functionality.