These are shader functions, so of course they run on the HW and are therefore ‘accelerated’. Whether the compiler simply expands them as a macro (a short sequence of ordinary instructions) or there’s some more targeted hardware support is going to vary by vendor, and is the stuff of GPU wars.
It would be trivial to implement transpose in HW: on a mat4 it’s a single specific case of a 16-register swizzle. The question is whether a designer would look at this and think it’s worth the effort when you can just do a few copies. There’s also a transpose flag when sending in matrix uniforms, which is where an app should do this if possible. A robust full inverse would not be so straightforward, and is either a macro or some hybrid.
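To make the "it's just a swizzle" point concrete, here is a rough CPU-side sketch in C. The names mat4_transpose and kTransposeSwizzle are made up for illustration, and this says nothing about how any particular GPU or driver actually does it; it only shows that the operation is a fixed permutation of 16 values with no arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* A 4x4 matrix stored as 16 contiguous floats. The element at
   index r*4 + c moves to index c*4 + r, so transpose is a fixed
   permutation (swizzle) of the 16 values.
   Hypothetical names, for illustration only. */
static const unsigned char kTransposeSwizzle[16] = {
    0, 4,  8, 12,
    1, 5,  9, 13,
    2, 6, 10, 14,
    3, 7, 11, 15
};

void mat4_transpose(const float in[16], float out[16])
{
    assert(in != out); /* this copy loop is not safe in place */
    for (size_t i = 0; i < 16; ++i)
        out[i] = in[kTransposeSwizzle[i]];
}
```

On the app side, the transpose flag mentioned above is the GLboolean transpose argument of glUniformMatrix4fv, so the permutation can happen at upload time instead of in the shader.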
MAD is common enough and useful enough that it’s sure to be in there as a single instruction. It probably was already, and optimizing compilers would have been emitting it for a multiply followed by an add anyway.
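For anyone unfamiliar with the pattern, a small C sketch of what MAD looks like in practice. The helper names mad and dot4_mad are hypothetical. Whether each a * b + c becomes one instruction is up to the compiler and the target, which is really the point of the paragraph above.

```c
/* Multiply-add: a * b + c. On hardware with a MAD or FMA
   instruction a compiler may emit it for this expression;
   that is an assumption about the toolchain, not a guarantee. */
static float mad(float a, float b, float c)
{
    return a * b + c;
}

/* A vec4 dot product is one multiply followed by three
   multiply-adds, which is why the pattern shows up so often
   in shader code. */
float dot4_mad(const float a[4], const float b[4])
{
    float acc = a[0] * b[0];
    acc = mad(a[1], b[1], acc);
    acc = mad(a[2], b[2], acc);
    acc = mad(a[3], b[3], acc);
    return acc;
}
```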
So I think it’s ALL going to be hardware accelerated wherever the vendor supports the API. The real issue is how many instructions each function costs in hardware. It’d be nice to call inverse on a 4x4 in a shader and get a one clock solution, but it ain’t gonna happen. It will still run on the GPU and it will be HW accelerated. Keep in mind, though, that a shader runs once per vertex or per fragment, so if you throw that kind of code in there under the wrong circumstances you’re doing a lot of potentially redundant matrix inversions. Make sure it’s justified.
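A rough sketch of the redundancy point, written as plain C rather than shader code and using a 2x2 matrix only to keep it short. Mat2, mat2_inverse, the two transform functions and the g_inverse_calls counter are all invented for this example. The first loop stands in for calling inverse inside the shader, the second for inverting once in the app and passing the result in as a uniform.

```c
#include <stddef.h>

typedef struct { float m[4]; } Mat2; /* row-major: [m0 m1; m2 m3] */

unsigned g_inverse_calls = 0; /* counts how often we invert */

/* 2x2 inverse via the adjugate. The caller must make sure the
   determinant is not zero. */
Mat2 mat2_inverse(Mat2 a)
{
    ++g_inverse_calls;
    float inv_det = 1.0f / (a.m[0] * a.m[3] - a.m[1] * a.m[2]);
    Mat2 r = {{  a.m[3] * inv_det, -a.m[1] * inv_det,
                -a.m[2] * inv_det,  a.m[0] * inv_det }};
    return r;
}

/* Like calling inverse() in the shader: the same matrix is
   inverted again for every vertex. */
void transform_per_vertex(Mat2 m, const float *in, float *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        Mat2 inv = mat2_inverse(m);
        out[2*i]     = inv.m[0] * in[2*i] + inv.m[1] * in[2*i + 1];
        out[2*i + 1] = inv.m[2] * in[2*i] + inv.m[3] * in[2*i + 1];
    }
}

/* Like inverting once in the app and sending the result in as
   a uniform: one inversion no matter how many vertices. */
void transform_hoisted(Mat2 m, const float *in, float *out, size_t n)
{
    Mat2 inv = mat2_inverse(m);
    for (size_t i = 0; i < n; ++i) {
        out[2*i]     = inv.m[0] * in[2*i] + inv.m[1] * in[2*i + 1];
        out[2*i + 1] = inv.m[2] * in[2*i] + inv.m[3] * in[2*i + 1];
    }
}
```

Both functions produce the same output; the only difference is that the first performs n inversions and the second performs one.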