GL_ARB_shader_subroutine

Has anyone tried GL_ARB_shader_subroutine? If so, do you prefer it over a series of if…else statements based on an int uniform? I wonder how subroutines perform compared to static branching. I suppose the real benefit is that you don’t need to manage a bunch of precompiled shaders, each with a different uniform value, to fake polymorphism.
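For anyone who hasn't seen the syntax, here is a minimal sketch of what the subroutine version might look like in a fragment shader. It assumes GLSL 4.00, where subroutines are core. All names (ShadeFunc, shadeFlat, shadeInverted, shade, vColor, fragColor) are hypothetical and only there for illustration.

```glsl
#version 400 core

// Subroutine type: the signature every selectable implementation must match.
// (All identifiers in this shader are illustrative, not taken from the spec.)
subroutine vec3 ShadeFunc(vec3 baseColor);

// First implementation: pass the color through unchanged.
subroutine(ShadeFunc)
vec3 shadeFlat(vec3 baseColor)
{
    return baseColor;
}

// Second implementation: invert the color.
subroutine(ShadeFunc)
vec3 shadeInverted(vec3 baseColor)
{
    return vec3(1.0) - baseColor;
}

// Subroutine uniform: the application picks which implementation is active.
subroutine uniform ShadeFunc shade;

in vec3 vColor;
out vec4 fragColor;

void main()
{
    // Calls whichever function is currently bound to the subroutine uniform.
    fragColor = vec4(shade(vColor), 1.0);
}
```

On the application side, the active function would typically be looked up with glGetSubroutineIndex and selected with glUniformSubroutinesuiv. How much a call through `shade` costs compared to a branch is exactly the open question here.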

I assume this requires GL 4 class hardware since the extension requires ARB_gpu_shader5. What prevents this on older hardware?

Regards,
Patrick

Apparently that’s the case, since the latest NVIDIA GL4 drivers don’t advertise this extension on a GTX 285.

What prevents this on older hardware?

That is an excellent question, and that it’s not available on GL3 hardware (apparently) strongly suggests that this isn’t just doing a silly “recompile/reoptimize the program” on uniform change under-the-hood.

Perhaps it’s dynamically changing call instructions in the program code and physically doing subroutine calls (which suggests the presence of a stack, or at least a 1-level jump-back mechanism). If so, I wonder about the max subroutine nesting level (e.g. main->A->B->C). Haven’t read the spec, but since I find no mention of “nest” or “level” in it, this probably isn’t it…

Or is it more like a “switch” statement, where the “jump address” is patched in dynamically based on the uniform value selected, and all alternative subroutine paths rejoin (“jump back”) at the same spot regardless? If so, this should be like switching on a uniform, except that the branch is unconditional rather than conditional, so it should be more efficient.
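For reference, the “switching on a uniform” baseline could be sketched as below. This is only an illustration assuming GLSL 4.00. The names (shadeMode, shadeFlat, shadeInverted, vColor, fragColor) and the mode values are hypothetical, and how a driver actually compiles the branch is implementation-dependent.

```glsl
#version 400 core

// Int-uniform baseline: the application sets shadeMode, and the shader
// branches on it at run time. (Names and mode values are illustrative.)
uniform int shadeMode;   // 0 = pass-through, 1 = inverted

in vec3 vColor;
out vec4 fragColor;

vec3 shadeFlat(vec3 baseColor)     { return baseColor; }
vec3 shadeInverted(vec3 baseColor) { return vec3(1.0) - baseColor; }

void main()
{
    vec3 c;

    // Conditional branch evaluated per invocation; the condition is the same
    // for every fragment in a draw call because it comes from a uniform.
    switch (shadeMode) {
        case 1:  c = shadeInverted(vColor); break;
        default: c = shadeFlat(vColor);     break;
    }

    fragColor = vec4(c, 1.0);
}
```

All paths rejoin after the switch, which is the “jump back to the same spot” behavior speculated about above.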

And this is versus the usual ubershader scheme, where each code path switches based on const bool/int/etc. values in the shader which are varied per ubershader variation, and the GLSL compiler “compiles out” the dead paths not used in that ubershader permutation:

const bool FEATURE_ON = false;  // Varies per ubershader permutation

if ( FEATURE_ON )
{
  ...feature_implementation...
}

I too am curious how this compares not only to static branches but also to “no” branches (i.e. the traditional ubershader approach, where you rebind a different shader rather than presumably hot-tweaking some jump values in the currently bound shader). But alas, no GL4 card on my desk to play with just yet…

I didn’t see anything specific about nesting but the spec states there is an implementation-dependent limit on the number of subroutines:

There is a limit on the number of subroutines per shader stage (MAX_SUBROUTINES) and also a limit on the number of subroutine uniform locations (MAX_SUBROUTINE_UNIFORM_LOCATIONS).

Patrick

that it’s not available on GL3 hardware (apparently) strongly suggests that this isn’t just doing a silly “recompile/reoptimize the program” on uniform change under-the-hood.

Right. I think the benefit is that you take a small (not sure how small) performance hit on each subroutine call, but switching a subroutine uniform to a different function is very fast compared to swapping shaders or setting a uniform that causes a recompile. I suppose an application that previously had hundreds of shader swaps per frame may only need dozens now, and the performance gained there outweighs the subroutine call overhead.

Patrick