Beating a compiler’s optimizer for a CPU is already a task that most people fail at. For the vast majority of simple optimizations, the compiler already knows how to do them.
Beating a GPU compiler’s optimizer is going to be so much harder. In the desktop space, x86 is dominant, and optimizing for an AMD CPU is not much different from optimizing for an Intel one. With GPUs… that isn’t the case. GPUs from different vendors can differ enormously. GPUs from different generations (or even cheaper versus more expensive parts within the same generation) can be radically different. What is fast on one may be painfully slow on another.
Furthermore, by the nature of how GPUs do their work, you have to take a holistic approach to optimization if you ever want to get anything done. You can’t just optimize sqrt in isolation; you need to optimize when each individual computation happens. Some GPUs can hide a scalar operation by bundling it with an existing vector operation (if that vector operation uses 3 components or fewer). That would make the sqrt computation take no extra time at all, but it requires finding an appropriate vector operation to hide it within. And that depends on how the rest of the shader is written.
You might hide it in a dot product in one shader, a cross product in another shader, a matrix multiply in a third, etc.
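To make that concrete, here is a minimal, made-up fragment shader sketch. The uniforms and names are purely illustrative, and whether the sqrt actually gets co-issued is entirely up to the driver's compiler for that particular GPU:

```glsl
#version 330 core

in vec3 vNormal;
in vec3 vWorldPos;
out vec4 fragColor;

// Hypothetical uniforms, purely for illustration.
uniform vec3 lightPos;
uniform vec3 albedo;
uniform float lightRadius;

void main()
{
    // Vector work: 3-component operations that occupy the vector lanes.
    vec3 toLight = normalize(lightPos - vWorldPos);
    float nDotL  = max(dot(normalize(vNormal), toLight), 0.0);

    // Scalar work: on hardware that can co-issue a scalar op alongside a
    // 3-component (or smaller) vector op, the compiler may pack this sqrt
    // into the same instruction slot as one of the vector ops above,
    // making it effectively free. Where it ends up depends on the rest
    // of the shader.
    float falloff = sqrt(lightRadius);

    fragColor = vec4(albedo * (nDotL / falloff), 1.0);
}
```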
If such optimizations are possible, your GPU compiler is likely very good at finding ways to rearrange the shader code to hide the computation. Unless you are an engineer at AMD/NVIDIA/Intel, you are not going to beat the compiler in the vast majority of cases, even with access to the hardware’s assembly.
In short, you’re not going to beat the compiler at generating code for the various GLSL standard functions. So just use them and stop worrying about it.
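As a trivial illustration (again, the shader below is made up), hand-rolling a normalize commits the compiler to one particular recipe, while the builtin leaves the driver’s compiler free to emit whatever is fastest on that GPU:

```glsl
#version 330 core

in vec3 vDir;
out vec4 fragColor;

void main()
{
    // Hand-rolled normalize: you've committed the compiler to one recipe.
    vec3 manualUnit = vDir * inversesqrt(dot(vDir, vDir));

    // Builtin normalize: the driver's compiler picks whatever instruction
    // sequence is fastest on the target GPU, including any co-issue tricks.
    vec3 builtinUnit = normalize(vDir);

    fragColor = vec4(0.5 * (manualUnit + builtinUnit), 1.0);
}
```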