Is there a way to get spirv-opt to structurize my IR?
It can’t legalize the module for you: unless the input is already valid, we can’t tell what you intended in the first place. The optimizer is only intended to operate on code that already passes validation.
However I don’t see why the continue construct is disjoint from the loop construct.
It’s from the definition:
loop construct: includes the blocks dominated by a loop header, while excluding both that header’s continue construct and the blocks dominated by the loop’s merge block
It’s not what you expected, probably. But they are self-consistent definitions.
If there are no loop breaks or returns, then on each iteration of the loop that reaches the backedge, both the loop construct and the continue construct are executed. Pretend that “loop construct” was renamed “first-part-of-loop construct”.
So yes, in the spirit of “the spec says what it says”, it’s just not what you were expecting.
Here’s another exercise. A single-block loop is valid:
%loop_head = OpLabel
OpLoopMerge %merge %loop_head None
OpBranchConditional %cond %loop_head %merge
%merge = OpLabel ....
In this case the “loop construct” is empty (surprise!), and the continue construct consists of the %loop_head block. This fits the definitions.
If I were to just break out the back-edge OpBranch and make that its own block, that would result in valid code? This disjoint aspect seems contrived.
The continue construct can have many blocks in it. The only constraints are that the continue target block dominates the backedge block, and the backedge block post-dominates the continue target.
Why on earth did we do that? Well, it’s so that the definitions continue to work even if the continue construct included a call to a complex (multi-block) function, and some step in a toolchain inlined that function.
That might look contrived, but it’s exceedingly common for shader compilers to fully inline the whole call tree. So SPIR-V needed to be resilient to that pattern, while still capturing the essential structure.
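If it helps to see the definitions applied mechanically, here is a minimal Python sketch (nothing from SPIRV-Tools) that computes both constructs for a tiny CFG. It assumes the continue construct is the set of blocks dominated by the continue target and post-dominated by the back-edge block, alongside the loop construct definition quoted above. The function names, the dict-based CFG encoding, and the block names entry and continue are made up for illustration.

```python
def dominators(succ, entry):
    """Map each block to the set of blocks that dominate it (iterative data flow)."""
    blocks = set(succ)
    preds = {b: set() for b in blocks}
    for b, targets in succ.items():
        for t in targets:
            preds[t].add(b)
    # Start from "everything dominates everything" and shrink to a fixed point.
    dom = {b: set(blocks) for b in blocks}
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for b in blocks - {entry}:
            incoming = [dom[p] for p in preds[b]]
            new = {b} | (set.intersection(*incoming) if incoming else set())
            if new != dom[b]:
                dom[b] = new
                changed = True
    return dom


def reverse(succ):
    """Flip every edge, so dominators on the result are post-dominators."""
    rev = {b: [] for b in succ}
    for b, targets in succ.items():
        for t in targets:
            rev[t].append(b)
    return rev


def loop_constructs(succ, entry, exit_block, header, continue_target, merge):
    """Return (loop construct, continue construct) as sets of block names."""
    dom = dominators(succ, entry)
    pdom = dominators(reverse(succ), exit_block)
    # Assumption: exactly one back-edge block, i.e. one block that branches
    # to the header and is dominated by it.
    (back_edge,) = [b for b in succ if header in succ[b] and header in dom[b]]
    continue_construct = {
        b for b in succ if continue_target in dom[b] and back_edge in pdom[b]
    }
    dominated_by_merge = {b for b in succ if merge in dom[b]}
    dominated_by_header = {b for b in succ if header in dom[b]}
    loop_construct = dominated_by_header - continue_construct - dominated_by_merge
    return loop_construct, continue_construct


# The single-block loop from above: the header is its own continue target.
single = loop_constructs(
    {"entry": ["loop_head"], "loop_head": ["loop_head", "merge"], "merge": []},
    entry="entry", exit_block="merge",
    header="loop_head", continue_target="loop_head", merge="merge",
)

# The same loop with the back edge split out into its own (hypothetical) block.
split = loop_constructs(
    {"entry": ["loop_head"], "loop_head": ["continue", "merge"],
     "continue": ["loop_head"], "merge": []},
    entry="entry", exit_block="merge",
    header="loop_head", continue_target="continue", merge="merge",
)

print(single)  # (set(), {'loop_head'})
print(split)   # ({'loop_head'}, {'continue'})
```

In both cases the two constructs come out disjoint. Splitting the back edge into its own block simply moves the header out of the continue construct and into the loop construct. This sketch only applies the set definitions; it does not check any of the other structured control flow rules.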
The proximate cause of my invalid code is my simplify-cfg pass.
There you go. LLVM is a CPU compiler, in essence. It doesn’t have to worry about the uniformity concerns that end up being critical to GPU compilation. There are all manner of challenges when you let LLVM mutate the CFG, and you can very easily lose essential information. For example, this is why the SPIR-V backend to DXC completely avoids generating LLVM IR.
Lastly, you mentioned reconvergence.
See the example in gpuweb/gpuweb pull request #954, “Introduce Subgroup Operations Extension” by mehmetoguzderin, to see how intuition about reconvergence is often mistaken, particularly when subgroup operations are involved.