Clarify meaning of merge block

seanbax · July 7, 2020, 5:48pm

gist.github.com

https://gist.github.com/seanbaxter/608e855c1247e921c59f5672bd65f9f9#file-foo-spvasm-L105

foo.frag

#version 450 core // note the GLSL version!
layout(location=0) out vec4 FragColor;
layout(binding=2) uniform ProgramBlock {
  ivec2 counter;  
};


int func(int x, int y) {
  int sum = 0;
  for(int i = 0; i < x; ++i) {

This file has been truncated. show original

foo.spvasm

; SPIR-V
; Version: 1.0
; Generator: Google Shaderc over Glslang; 9
; Bound: 70
; Schema: 0
               OpCapability Shader
          %1 = OpExtInstImport "GLSL.std.450"
               OpMemoryModel Logical GLSL450
               OpEntryPoint Fragment %main "main" %FragColor
               OpExecutionMode %main OriginUpperLeft

This file has been truncated. show original

What is the meaning of the OpSelectionMerge merge block? It’s not a post-dominator of the subsequent conditional branch.

int func(int x, int y) {
  int sum = 0;
  for(int i = 0; i < x; ++i) {
    sum += i * i;
    if((i % y) == 4)
      continue;
  }
  sum /= 2;
  return sum;
}

The merge block for the OpConditionalBranch corresponding to the if-statement is the next statement after the if-statement (which is empty and just hits the closing brace). But the body of the if-statement branches to the continue block of the containing loop. It doesn’t pass through the merge block.

Does making the merge block a post-dominator of the conditional branch improve performance by allowing partial reconvergence? glslc also emits multiple return instructions, which obviously prevents the merge block from being entered after a conditional branch.

Is it safe just to threat the merge block as “the next statement after this control flow structure,” even when the body of the control flow statement branches beyond the merge block?

dneto · August 19, 2020, 10:54pm

Yes, that’s a good way to think about it.

Merge instructions and their corresponding merge block become significant with uniform control flow: if control was uniform at entry to the construct, and all invocations exit the construct through the merge block then control will be uniform again at that merge block. (See the definition of “Uniform control flow” in section 2.2.5 of the SPIR-V spec.)

But if there were the equivalent of break, continue, discard, or return which bypass that merge block, then the rule doesn’t apply.

seanbax · August 31, 2020, 1:55pm

Afraid I’m still very confused. Consider this function:

float get_range_attenuation(float range, float distance) {
  if(range <= 0)
    return range;

  return distance;
}

gist.github.com

https://gist.github.com/seanbaxter/644a6b765ab0fccad64c301303141b53

branch.comp

#version 450

float get_range_attenuation(float range, float distance) {
  if(range <= 0)
    return range;

  return distance;
}

layout(binding=0)

This file has been truncated. show original

branch.spvasm (with spirv-opt -O)

       %main = OpFunction %void None %3
          %5 = OpLabel
         %34 = OpAccessChain %_ptr_StorageBuffer_float %_ %int_0 %uint_0
         %35 = OpLoad %float %34
         %38 = OpAccessChain %_ptr_StorageBuffer_float %_ %int_0 %uint_1
         %39 = OpLoad %float %38
               OpSelectionMerge %62 None
               OpSwitch %uint_0 %55
         %55 = OpLabel
         %57 = OpFOrdLessThanEqual %bool %35 %float_0

This file has been truncated. show original

spvasm generated by Circle for C++ shader

%_Z9comp_mainv = OpFunction %void None %8
          %9 = OpLabel
         %13 = OpAccessChain %_ptr_StorageBuffer_v4float %buffer %uint_0
         %15 = OpAccessChain %_ptr_StorageBuffer_float %13 %uint_0
         %16 = OpLoad %float %15
         %17 = OpAccessChain %_ptr_StorageBuffer_v4float %buffer %uint_0
         %19 = OpAccessChain %_ptr_StorageBuffer_float %17 %uint_1
         %20 = OpLoad %float %19
         %23 = OpFOrdLessThanEqual %bool %16 %float_0
               OpSelectionMerge %24 None

This file has been truncated. show original

This is the IR I generate for the C++ shader:

         %23 = OpFOrdLessThanEqual %bool %16 %float_0
               OpSelectionMerge %24 None
               OpBranchConditional %23 %25 %24
         %25 = OpLabel
               OpBranch %26
         %24 = OpLabel
               OpBranch %26
         %26 = OpLabel
         %27 = OpPhi %float %20 %24 %16 %25

error: line 41: block 24 branches to the selection construct, but not to the selection header 9
%24 = OpLabel

glslc + spirv-opt produces the identical sequece of instructions:

         %57 = OpFOrdLessThanEqual %bool %35 %float_0
               OpSelectionMerge %60 None
               OpBranchConditional %57 %58 %60
         %58 = OpLabel
               OpBranch %62
         %60 = OpLabel
               OpBranch %62
         %62 = OpLabel
         %64 = OpPhi %float %35 %58 %39 %60

Why does glslc’s code pass and mine fails? I notice it encloses the function in a single-case switch:

               OpSelectionMerge %62 None
               OpSwitch %uint_0 %55
         %55 = OpLabel
         %57 = OpFOrdLessThanEqual %bool %35 %float_0
               OpSelectionMerge %18 None
               OpBranchConditional %16 %17 %18
...

The OpSwitch isn’t in the -O0 output. It’s added by spirv-opt when it performs --merge-return. Is the OpSwitch a necessary backstop? It’s a weird non-local CFG transformation. I kind of understand why it’s there, but it feels like a workaround for glslc allowing edges out of control flow that don’t pass through the merge block. What am I supposed to be doing here?

seanbax · August 31, 2020, 2:16pm

I hoisted the OpSwitch out of the get_range_attenuation and put it and its merge block into the shader entry point. That breaks spirv-val. Apparently functions must be treated as integral units with a OpSelectionMerge backstop, even if the calling function has its own backstop.

seanbax · September 11, 2020, 3:13pm

I can’t deal with structured control flow.

gist.github.com

https://gist.github.com/seanbaxter/5a962b2726d175ed989d162f8bbbdcda

comp.cxx

struct buffer_t {
  ivec4 data;
};

[[using spirv: buffer, binding(0)]]
buffer_t buffer;

[[spirv::comp(1)]]
void comp_main() { 
  for(int i = 0; i < buffer.data.x; ++i) {

This file has been truncated. show original

shader.spvasm

               OpCapability Shader
               OpExtension "GL_EXT_scalar_block_layout"
               OpMemoryModel Logical GLSL450
               OpEntryPoint GLCompute %_Z9comp_mainv "_Z9comp_mainv" %buffer
               OpExecutionMode %_Z9comp_mainv LocalSize 1 1 1
               OpName %buffer_t "buffer_t"
               OpMemberName %buffer_t 0 "data"
               OpName %buffer "buffer"
               OpName %_Z9comp_mainv "_Z9comp_mainv"
               OpDecorate %buffer_t Block

This file has been truncated. show original

spirv-val

error: line 40: Header block 25[%25] is contained in the loop construct headed by 17[%17], but its merge block 24[%24] is not
  %25 = OpLabel

An if-statement conditional inside a loop is generating this code:

               ; This shader fails to validate when the merge block is %23, 
               ; even though it dominates both branch targets %31 and %32.
               ; It passes when the merge block is %32, even though that 
               ; does not dominate both branch targets.
               ; OpSelectionMerge %32 None 

               OpSelectionMerge %23 None
               OpBranchConditional %30 %31 %32

         %31 = OpLabel
               OpBranch %23
         %32 = OpLabel
               OpBranch %23

         %23 = OpLabel
         %33 = OpLoad %uint %11
         %34 = OpIAdd %uint %33 %uint_1
               OpStore %11 %34
               OpBranch %13   ; branch back to loop header

Header block 24[%24] is contained in the loop construct headed by 13[%13], but its merge block 23[%23] is not
%24 = OpLabel

The merge block %23 terminates with a branch back to the loop header block, and all paths through %24 go through %23, so I don’t see how spirv-val contends that it’s not contained by the loop construct.

seanbax · September 11, 2020, 6:05pm

I spent a good hour debugging spirv-val. My suspicion is that you can’t name a single block as the merge target of the selection and the continue target of the enclosing loop, although that’s not stated in the specification.

dneto · September 11, 2020, 6:20pm

I spent a good hour debugging spirv-val. My suspicion is that you can’t name a single block as the merge target of the selection and the continue target of the enclosing loop,

That is correct. It comes from the rule:

when a construct contains another header block, it also contains that header’s corresponding merge block if that merge block is reachable in the CFG

The selection’s header is inside the loop’s loop-construct. So the selection’s merge block must also be in the loop-construct. Note that the loop-construct is disjoint from the corresponding continue-construct.

But then, why is it valid if %32 is the declared as the merge block of the selection? Well that’s because now that makes block %31 a continue-block: it contains a branch that escapes the selection and goes to the nearest enclosing continue target. In C/GLSL terms the structure is now:

  if (cond) { 
      // This is block %31
   continue; // branch to %32
 } 
 // This is block %32
 // And this is block %23, the continue target

dneto · September 11, 2020, 6:24pm

I massaged your SPIR-V assembly example to run it through the SPIR-V → WGSL flow from Tint

Specifically:

rename the entry point so it doesn’t have a leading underscrore
removed %buffer from the entry point interface (since Tint only understands up to SPIR-V 1.3)
changed the continue target for the selection to be %32

Then running it through the translator to produce (draft) WGSL, I get the following. That should show fairly directly what’s going on structure-wise.

$ tint --format wgsl a.spvasm 
entry_point compute = Z9comp_mainv;

type buffer_t = [[block]] struct {
  [[offset 0]] data : vec4<u32>;
};

[[binding 0, set 0]] var<storage_buffer> buffer : buffer_t;

fn Z9comp_mainv() -> void {
  var x_11 : u32;
  x_11 = 0u;
  loop {
    const x_14 : u32 = x_11;
    const x_19 : u32 = buffer.data.x;
    if ((as<i32>(x_14) < as<i32>(x_19))) {
    } else {
      break;
    }
    const x_28 : u32 = buffer.data.y;
    const x_29 : u32 = x_11;
    if ((as<i32>(x_28) < as<i32>(x_29))) {
      continue;
    }

    continuing {
      const x_33 : u32 = x_11;
      x_11 = (x_33 + 1u);
    }
  }
  return;
}

Note that I really would like conditional-break and conditional-continue, but the W3C community group rejected that. So we get weird constructs like the first conditional-break in the loop: a full if-then-else where the break is in the else.

dneto · September 11, 2020, 6:29pm

To answer your earlier question about the switch in the inlined function:
There are only certain prescribed ways to exit a structured construct. However, returning from a function can occur in nearly arbitration places (anywhere except from within a continue construct).

When a function that is being inlined has a return from the middle of the control flow, spirv-opt has to work hard to make that “return” into a proper structured exit. In past we’ve wrapped the code in a single-iteration loop; but that doesn’t work well when the function has a loop already. So now the transform wraps the code with a switch. That too has limitations but has worked better for us.

(The validator does not decide validity based on which generator generated the code.)

seanbax · September 11, 2020, 6:38pm

At this point I’d rather give up than keep at it. Is there a way to get spirv-opt to structurize my IR?

seanbax · September 11, 2020, 7:17pm

I see the distinction you’re making: the merge block of the selection is part of the continue construct of the loop, not the loop construct. However I don’t see why the continue construct is disjoint from the loop construct. Control flow from the if-statement necessarily passes to block %23, where the threads will reconverge, then hits OpBranch %13 and goes back to the loop header. I don’t see why %23 is considered outside of the loop construct when it’ll be visited during every iteration of the loop. It terminates with a back-edge, but that doesn’t effect the stuff that executes above it. I suppose the answer is “that’s just what the spec says.”

If I were to just break out the back-edge OpBranch and make that its own block, that would result in valid code? This disjoint aspect seems contrived.

The proximate cause of my invalid code is my simplify-cfg pass. My codegen does seem to lower enclosed selections so that they entirely lie within the enclosing loop construct. But the LLVM-like simplify-cfg is too eager.

https://llvm.org/docs/Passes.html#simplifycfg-simplify-the-cfg

What are the additional semantics for a SPIR-V simplify-cfg? I just added checks that neither of the two blocks being merged are both merge or continue targets, but I suspect that’s an insufficient defense.

dneto · September 11, 2020, 7:52pm

Is there a way to get spirv-opt to structurize my IR?

It can’t legalize for you, as without it already being valid, we can’t tell what you intended in the first place. The optimizer is only intended to operate on already-valid code (that passes validity checks).

However I don’t see why the continue construct is disjoint from the loop construct.

It’s from the definition:

a loop construct : includes the blocks dominated by a loop header, while excluding both that header’s continue construct and the blocks dominated by the loop’s merge block

It’s not what you expected, probably. But they are self-consistent definitions.
If there are no loop breaks or returns, then on each iteration of the loop that reaches the backedge, both the loop construct and the continue construct are executed. Pretend that “loop construct” was renamed “first-part-of-loop construct”.
So yes, in spirit “the spec says what it says”, and it’s not what you were expecting.

Here’s another exercise: A single block loop is valid:

 %loop_head = OpLabel
 OpLoopMerge %merge %loop_head None
 OpBranchConditional %cond %loop_head %merge

%merge = OpLabel  ....

In this case the “loop construct” is empty (surprise!), and the continue construct consists of the %loop_head block. This fits the definitions.

If I were to just break out the back-edge OpBranch and make that its own block, that would result in valid code? This disjoint aspect seems contrived.

The continue construct can have many blocks in it. The only constraints are that the continue target block dominates the backedge block, and the backedge block post-dominates the continue target.
Why on earth did we do that? Well, it’s so that the definitions continue to work even if the continue construct included a call to a complex (multi-block) function, and some step in a toolchain inlined that function.
That might look contrived, but it’s exceedingly common for shader compilers to fully inline the whole call tree. So SPIR-V needed to be resilient to that pattern, while still capturing the essential structure.

The proximate cause of my invalid code is my simplify-cfg pass.

There you go. LLVM is a CPU compiler, in essence. It doesn’t have to concern itself with uniformity concerns that end up being critical to GPU compilation. There are all manner of challenges when you let LLVM mutate the CFG. You very easily lose essential information. For example, this is why the SPIR-V backend to DXC completely avoids generating LLVM IR.

Lastly, you mentioned reconvergence.
See the example at Introduce Subgroup Operations Extension by mehmetoguzderin · Pull Request #954 · gpuweb/gpuweb · GitHub to see how intuition about reconvergence is often mistaken, particularly when subgroup operations are involved.