Is disabling a color attachment an optimization in a MRT, when it is not necessary?

Dark_Photon · October 1, 2021, 12:52pm

To the latter, there is a measurable performance cost to writing to 2 attachments vs. 1. Memory bandwidth you’d expect. But also…

To the former (and the latter), yes, there’s a cost to the shader needlessly computing output values that aren’t needed (more registers, more cycles, more mem B/W, lower occupancy, etc.). However, it’s implementation-dependent whether the driver ever “discards the operations that involves these disabled [fragment color] outputs”, and if it does, which states will cause it to do so. ^(**)

Moreover, even if the driver does support this “state-based recompile” feature (e.g. for glColorMask*(), glDrawBuffers*(), etc.), there’s a serious real-time rendering performance cost to your app when the driver just up and decides that it needs to recompile your shader in the middle of rendering because you’ve never rendered with that shader + state combination before in this run (or possibly ever!). For web searching, “state-based recompiles” are also referred to as shader patching, shader recompilation, shader reoptimization, or shader relinking. I’ve also included a few links on this below.

Bottom line: On the shader side, as much as possible, I wouldn’t depend on the driver doing squat for you besides basic dead code elimination. And to the extent that it does more, you will want to bubble knowledge of that up to your shader permutation generator level so that you can “pre-bake” the permutations needed, so the driver never has to recompile your shader at run-time. This is one example of many where you should do this.
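To make the “pre-bake” idea a bit more concrete, here is a minimal C++ sketch of an app-side permutation cache that compiles everything at load time and never compiles at draw time. All names here (PermutationKey, PermutationCache, CompileFn, ProgramHandle) are hypothetical and not from any real API. The compile callback stands in for whatever your engine does to build a GL program. Note this only covers permutations your app generates. If a driver does state-based recompiles internally, you would probably also have to issue a warm-up draw with each shader + state combination, and that part is not shown.

```cpp
#include <functional>
#include <map>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical key: which shader, and how many color outputs it writes.
struct PermutationKey {
    std::string shaderName;
    int numColorBuffers;

    bool operator<(const PermutationKey& other) const {
        return std::tie(shaderName, numColorBuffers) <
               std::tie(other.shaderName, other.numColorBuffers);
    }
};

// Stand-ins (assumptions): a program handle and an engine-supplied compile step.
using ProgramHandle = unsigned int;
using CompileFn = std::function<ProgramHandle(const PermutationKey&)>;

class PermutationCache {
public:
    explicit PermutationCache(CompileFn compile) : compile_(std::move(compile)) {}

    // Compile every listed permutation once, e.g. at load time.
    void prebake(const std::vector<PermutationKey>& keys) {
        for (const PermutationKey& key : keys) {
            if (programs_.find(key) == programs_.end()) {
                programs_[key] = compile_(key);
            }
        }
    }

    // Draw-time lookup. Never compiles; returns false on a missing permutation
    // so the caller can log it and add it to the pre-bake list.
    bool lookup(const PermutationKey& key, ProgramHandle& out) const {
        auto it = programs_.find(key);
        if (it == programs_.end()) return false;
        out = it->second;
        return true;
    }

private:
    CompileFn compile_;
    std::map<PermutationKey, ProgramHandle> programs_;
};
```

The design choice here is that a cache miss at draw time is treated as a bug to fix in the permutation list, rather than something to silently compile on the spot.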

As to your question specifically, I would ensure that your shader doesn’t write to outputs that shouldn’t be written to in memory. Then the issue is all internalized to your shader code. If you can’t/don’t want to presume aggressive dead code elimination by the GLSL compiler, then you need to #ifdef out all code feeding those unused outputs. However, if you can (and want to) depend on the vendor’s GLSL compiler to aggressively eliminate dead code in your shader (as NVIDIA does), then you can do something like the following snippet. That is, have the code always compute all outputs in scratch vars (MY_ColorOut), but only declare and write to the ones that should be written to for this shader permutation (e.g. ColorOut[ i ] for i < NUM_COLOR_BUFFERS, a preprocessor define that you set at shader compile time).

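#define NUM_COLOR_BUFFERS 1   // Set per shader permutation at shader compile time

layout( location = 0 ) out vec4 ColorOut[ NUM_COLOR_BUFFERS ];

void main()
{
    // Scratch copies of all outputs; always computed
    vec4 MY_ColorOut[ 2 ];

    // ... Compute and write to MY_ColorOut[0..1] ...

    // Only write the outputs declared for this permutation
    ColorOut[0] = MY_ColorOut[0];
#if NUM_COLOR_BUFFERS >= 2
    ColorOut[1] = MY_ColorOut[1];
#endif
}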
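As for setting NUM_COLOR_BUFFERS at shader compile time, one common way is to prepend the define to the shader text on the host side before handing it to the compiler. Here is a minimal C++ sketch of that. The function name buildFragmentSource and the "#version 330 core" line are my assumptions, not something from your code. It also assumes the shader body has no #version line of its own and omits its own #define NUM_COLOR_BUFFERS line. The define is placed after #version because GLSL requires #version to come before everything except comments and whitespace.

```cpp
#include <string>

// Hypothetical helper: builds the source for one permutation of a fragment shader.
// Assumption: `body` contains neither a #version line nor its own
// "#define NUM_COLOR_BUFFERS ..." line.
std::string buildFragmentSource(const std::string& body, int numColorBuffers) {
    std::string source = "#version 330 core\n";  // assumed GLSL version
    source += "#define NUM_COLOR_BUFFERS " + std::to_string(numColorBuffers) + "\n";
    source += body;
    return source;
}
```

You would then pass the returned string to your usual shader compile path, once per permutation you intend to render with.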

^(**) There is evidence that NVIDIA’s driver will do this, probably for both glColorMask*() and glDrawBuffers*() (NV_command_list, Stuttering in Game Graphics: Detection and Solutions). However, you have no guarantees that any driver will do this, or do this reliably.


State-based Recompile Links