Hi,
I have a question regarding the behavior of the unary negation operator -x for floating-point operands x.
Q: May the result of the float unary negation operation be flushed to zero or not?
The OpenCL 3.0 specification states: “Denormalized single precision floating-point numbers passed as input or produced as the output of single precision floating-point operations such as add, sub, mul, divide, and the functions defined in math functions, common functions, and geometric functions may be flushed to zero”.
Take for example the following two pieces of code:
Code 1:
float a = 1.4013e-44f; /* Denormal */
float b = 0.0f - a;
Code 2:
float a = 1.4013e-44f; /* Denormal */
float b = -a;
If I understand the specification correctly, when ‘a’ is a denormal and denormals are flushed to zero:
- In Code 1, ‘b’ will be zero (0.0f, or possibly -0.0f), since it is a binary floating-point subtraction, which the rule above covers.
- In Code 2, it’s unclear what the behavior should be.
I couldn’t find anything in the spec indicating what the output of Code 2 should be when denormals are flushed to zero.
If we compile the two snippets above to LLVM IR, the operation in Code 1 is mapped to an “fsub” instruction, and the operation in Code 2 is mapped to an “fneg” instruction.
The following OpenCL code:
float Code1(float a) {
return 0.0f - a;
}
float Code2(float a) {
return -a;
}
is compiled to:
define dso_local noundef float @Code1(float noundef %0) local_unnamed_addr #0 {
%2 = fsub float 0.000000e+00, %0
ret float %2
}
define dso_local noundef float @Code2(float noundef %0) local_unnamed_addr #0 {
%2 = fneg float %0
ret float %2
}
According to the LLVM Language Reference, fneg is considered a bit-manipulation operation; therefore, it is not allowed to flush its input to zero. Quoting the LLVM Language Reference:
“For the purpose of this section, bitcast as well as the following operations are not “floating-point math operations”: fneg, llvm.fabs, and llvm.copysign. These operations act directly on the underlying bit representation and never change anything except possibly for the sign bit”.
All of the above leads me to believe that the value of ‘b’ in Code 2 should not be 0.0f as in Code 1, but the negated denormal -1.4013e-44f.
Am I correct?