Understanding OpPhi and its Performance

I have a question regarding the OpPhi instruction.

Following the standard:

https://www.khronos.org/registry/SPIR-V/specs/unified1/SPIRV.html#OpPhi

3.42.17. Control-Flow Instructions

Within a block, this instruction must appear before all non-OpPhi instructions (except for OpLine and OpNoLine, which can be mixed with OpPhi).

A) Does this mean that OpPhi must always be placed at the very beginning of a basic block?

B) Is there any report or paper analyzing the performance impact of OpPhi, i.e. of removing the need for loads/stores?

Thanks, any pointers will be appreciated.
Juan

OpPhi is not a performance feature. It’s a common tool in static single assignment (SSA) languages for dealing with conditional branching.

In SSA, each variable a function creates can only ever be written to once. But if you want to conceptually have a conditional branch that writes to a variable based on that condition, that seems impossible with SSA. You would have to have two writes to the same variable, which SSA does not allow.

The way SSA languages work around this is with the “Phi” operation. In the two branches, each branch writes to a different ResultID. After the branches, the Phi function writes to a third ResultID, using data read from the ResultIDs generated in the two branches. Which one’s data fills the third ResultID is determined by which branch was taken at runtime. In effect, the Phi instruction merges the results from the two branches.

SPIR-V’s way of handling Phi operations deals with blocks. You use a conditional branch instruction to jump into one of several blocks. At the end of each of those blocks, you use an unconditional branch to jump into the same subsequent block. Within that block, you execute OpPhi instructions to merge the results from the conditional branch together.
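To make the merge concrete, here is a minimal Python sketch of what a Phi does at runtime. It is not SPIR-V and not any real API: `phi`, `run_diamond`, the block names "then"/"else" and the arithmetic in each branch are all made up for illustration.

```python
def phi(incoming, predecessor):
    """Return the value paired with the block we just came from.

    `incoming` maps predecessor block name -> value, mirroring the
    (value, parent block) pairs that an OpPhi lists as operands.
    """
    return incoming[predecessor]


def run_diamond(cond, x):
    # Hypothetical if/else: only the branch that runs produces its result.
    if cond:
        predecessor, then_val, else_val = "then", x + 1, None
    else:
        predecessor, then_val, else_val = "else", None, x - 1

    # Merge block: the phi is evaluated first and writes a third result,
    # chosen by the branch that was actually taken.
    return phi({"then": then_val, "else": else_val}, predecessor)
```

With these assumed branches, `run_diamond(True, 10)` takes the "then" path and yields 11, while `run_diamond(False, 10)` yields 9. Neither branch ever overwrites the other's result, which is the whole point of the Phi.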

So OpPhi being at the start of a block also means that OpPhi instructions are always the first thing you do after a conditional branch. This avoids situations where the compiler has to look ahead in the branching structure to see where you’re merging conditional branching results.
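As a rough illustration of the ordering rule quoted from the spec, the check below walks the opcodes of a single block and rejects any OpPhi that follows a non-OpPhi instruction other than OpLine or OpNoLine. It is only a sketch of that one rule, not how spirv-val is implemented; the function name `phis_are_first` and the list-of-opcode-names input are assumptions made for the example.

```python
# Opcodes allowed to be interleaved with OpPhi, per the quoted rule.
DEBUG_LINE_OPS = {"OpLine", "OpNoLine"}


def phis_are_first(opcodes):
    """Check the OpPhi ordering rule for one block.

    `opcodes` is the list of opcode names that follow the block's OpLabel.
    Returns False if any OpPhi appears after a non-OpPhi instruction
    other than OpLine / OpNoLine, and True otherwise.
    """
    seen_other = False
    for op in opcodes:
        if op == "OpPhi":
            if seen_other:
                return False
        elif op not in DEBUG_LINE_OPS:
            seen_other = True
    return True
```

So a block like `["OpLine", "OpPhi", "OpNoLine", "OpPhi", "OpIAdd"]` passes, while `["OpIAdd", "OpPhi"]` does not.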

It isn’t there to “remove the need for loads/stores”. It’s a tool that makes SSA work.

Thank you very much @Alfonse_Reinheart. This answers my questions.

Just to give a bit of context, I am working on a tool that generates this type of structure:

%B1_kernel0 = OpLabel 
        %55 = OpPhi %uint %51 %B0_kernel0 %54 %B2_kernel0
        %56 = OpSLessThan %bool %55 %25
              OpBranchConditional %56 %B2_kernel0 %B3_kernel0 
%B2_kernel0 = OpLabel 
        ... ; LOOP BODY
        %54 = OpIAdd %uint %68 %55
              OpBranch %B1_kernel0
%B3_kernel0 = OpLabel 
              ... 

The SPIR-V validator passes and execution on Intel HD graphics seems to be correct. I was wondering whether I could place the OpPhi in a different position. As you mention, it should be possible to create a new block, merge the results with an OpPhi, and then jump accordingly.
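For anyone following along, the loop in the listing can be traced in plain Python. This is a sketch under assumptions: I am reading %51 as the value arriving from B0, %25 as the loop bound and %68 as the step, and the elided loop body is represented only by recording the current value. The function name `run_loop` is made up for the example.

```python
def run_loop(start, bound, step):
    """Trace the B0 -> B1 <-> B2 -> B3 structure with plain Python values.

    start ~ %51 (value arriving from B0), bound ~ %25, step ~ %68.
    These roles are assumptions read off the listing, not known facts.
    """
    predecessor = "B0"
    v54 = None  # %54 does not exist until B2 has run once
    trace = []
    while True:
        # B1: the phi picks %51 on entry from B0, %54 on the back edge from B2.
        v55 = start if predecessor == "B0" else v54
        if not (v55 < bound):  # %56 = OpSLessThan, false -> branch to B3
            return v55, trace
        # B2: loop body (elided in the listing), then %54 = %68 + %55.
        trace.append(v55)
        v54 = step + v55
        predecessor = "B2"
```

Calling `run_loop(0, 3, 1)` visits the body with 0, 1 and 2 and leaves the loop with 3. The step has to be positive for the sketch to terminate.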

Thanks