# Maybe a bug in VC++

Has anyone encountered the following unexpected result from VC++?

Assume both “var3” and “plane.x_low_bound” are double floating point variables and are absolutely the same numerically.

if (var3 == plane.x_low_bound)

“var3 == plane.x_low_bound” returns “false”

Yes, this is a common problem. It is due to imprecision in floating point numbers. Performing two different calculations that should theoretically arrive at the same result can actually give slightly different results.

And due to the way the IDE rounds values before displaying them in the watch, the two values may even look the same. For instance, one value may be 1.000000 and the other value may be 1.000001, but the IDE will show them both as 1.000, so they look equal.
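To make this concrete, here is a minimal sketch in standard C++ (nothing VC++-specific). The helper names `show`, `computed_sum` and `literal_value` are made up for illustration. It assumes ordinary IEEE 754 doubles: 0.1 + 0.2 is not exactly the same double as 0.3, yet both print identically when only a few decimals are shown.

```cpp
#include <cassert>
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: format a double with a fixed number of decimals,
// roughly like a watch window that only shows a few digits.
std::string show(double value, int decimals) {
    assert(decimals >= 0);
    std::ostringstream out;
    out << std::fixed << std::setprecision(decimals) << value;
    return out.str();
}

// Two ways of arriving at "the same" number.
double computed_sum() {
    volatile double a = 0.1;  // volatile: force the addition to happen at run time
    volatile double b = 0.2;
    return a + b;
}

double literal_value() {
    return 0.3;
}
```

With three decimals both values display as 0.300, while `computed_sum() == literal_value()` is false; printing with 17 decimals makes the difference visible.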

The trick to testing floating point numbers for equality is to accept that the numbers may have a slight bit of error in them, and then see if the difference between these numbers is less than a set threshold.

EX:
if ((var1-var2) < 0.000001)
    cout << "var1 and var2 are equal" << endl;
else
    cout << "var1 and var2 are NOT equal" << endl;

[This message has been edited by LordKronos (edited 01-21-2001).]

Oh, Lord Kronos, I think it’s just because you write so fast…
Actually your method works only if the first floating point value is bigger than the second.

it has to be:

#define ALLOWED_DELTA 0.0000001f
#define ARE_FLOATS_EQUAL(x,y) \
((((x)-(y))<ALLOWED_DELTA) && \
(((y)-(x))<ALLOWED_DELTA))

Maybe that’s after all wrong too…

Originally posted by Michael Steinberg:
Oh, Lord Kronos, I think it’s just because you write so fast…
Actually your method works only if the first floating point value is bigger than the second.

In the words of a very wise man named Homer…Doh!

Yes, you are right, I was typing faster than I was thinking. I meant to do

(fabsf(var1-var2) < 0.000001f)

But your way will work too. Not sure which is faster though. I tend to trust the library functions, hoping that they will use something faster (like hardware instructions) that doesn’t involve conditionals (which cause pipeline stalls in the CPU). Using VC++, trusting the libraries and the optimizer usually pays off, but I’m not sure about this case.
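For reference, the symmetric comparison can be wrapped in a small function. This is only a sketch: the name `nearly_equal` and both default tolerances are assumptions, not anything from VC++ or its libraries. The relative term is an addition for large magnitudes, where a fixed threshold like 0.000001 is smaller than the spacing between adjacent values.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical helper: true when a and b differ by less than abs_tol,
// or by less than rel_tol times the larger of the two magnitudes.
bool nearly_equal(double a, double b,
                  double abs_tol = 1e-6, double rel_tol = 1e-9) {
    assert(abs_tol >= 0.0 && rel_tol >= 0.0);
    // fabs makes the test symmetric: the order of a and b does not matter.
    const double diff = std::fabs(a - b);
    if (diff < abs_tol) {
        return true;
    }
    return diff < rel_tol * std::max(std::fabs(a), std::fabs(b));
}
```

A one-sided test such as `(a - b) < 0.000001` would report -5.0 and 5.0 as equal; this version does not.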

[This message has been edited by LordKronos (edited 01-21-2001).]

Library functions aren’t always that fast; they tend to set FPU state and so on instead of just doing what they’re supposed to do. If you want it fast, you should write small assembler functions and use them instead of the library ones. Like this:

#pragma warning(disable: 4035)
float sin(float x){
    __asm {
        FLD x
        FSIN
    }
}

float cos(float x){
    __asm {
        FLD x
        FCOS
    }
}

float sqrt(float x){
    __asm {
        FLD x
        FSQRT
    }
}

float fabs(float x){
    __asm {
        FLD x
        FABS
    }
}

> (fabs(var1-var2) < 0.000001f)
this version is faster: floating point comparison is slow.

It may be even faster to compare as integer numbers:

float small_delta = 0.000001f; // must be >=0
//===============
float fcmptemp = fabs(var1-var2);
if(*(int*)&fcmptemp < *(int*)&small_delta)
{
}
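The integer trick relies on a property of IEEE 754: for non-negative, non-NaN floats, the bit patterns order the same way as the values. Below is a sketch of that idea in portable C++, using `memcpy` rather than a pointer cast to read the bits. The helper names `float_bits` and `less_nonnegative` are made up, and a 32-bit IEEE 754 `float` is assumed. Whether this is actually faster on a given compiler would have to be measured.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Copy the bits of a float into a 32-bit unsigned integer.
// Assumes float is a 32-bit IEEE 754 value.
std::uint32_t float_bits(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// Hypothetical helper: "a < b" for floats known to be non-negative and not NaN.
// The assert is a debug-only sanity check; it does not catch -0.0f, whose
// sign bit would make the integer comparison wrong.
bool less_nonnegative(float a, float b) {
    assert(a >= 0.0f && b >= 0.0f);
    return float_bits(a) < float_bits(b);
}
```

In the comparison above, the first argument would be the result of `fabs(var1-var2)` and the second the threshold, both of which are non-negative.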

float fabs(float x){
    __asm {
        FLD x
        FABS
    }
}

This replacement is slower than regular fabs().
(for VC with optimization).

Originally posted by Serge K:
This replacement is slower than regular fabs().
(for VC with optimization).

Actually, his version of fabs (and the other functions) don’t do anything, since they don’t return any values.

Actually, a function returns its floating point result on top of the FPU stack.

Floating Point Coprocessor and Calling Conventions

Originally posted by Serge K:

Actually, a function returns its floating point result on top of the FPU stack.

Nope, when I tried to compile his fabs function, VC++ complained to me about not returning a value and refused to compile it. I don’t really know much about integrating assembly into VC++ code, so maybe there is some keyword that you need to use to indicate the function returns its value on the FPU stack, but as it stands it wouldn’t compile for me.

[This message has been edited by LordKronos (edited 01-22-2001).]

Is there any printable book on the internet which teaches assembly language very well? However, I was always working platform independent. Maybe my version is faster than the normal fabs function, where the assembler optimization only works on x86. Maybe not?

Originally posted by LordKronos:
Nope, when I tried to compile his fabs function, VC++ complained to me about not returning a value and refused to compile it.

Nonsense.
With #pragma warning(disable: 4035) you should not have any warnings about these functions.

BUT if you’re using <math.h>, you have to rename cos, sin, fabs, … to something different.

Originally posted by LordKronos:
maybe there is some keyword that you need to use to indicate the function returns its value on the FPU stack

No. It is the standard calling convention for x86.

Originally posted by Serge K:
Nonsense.
With #pragma warning(disable: 4035) you should not have any warnings about these functions.

OK, my mistake. It generated a warning, not an error, so it can compile (maybe I had another error in my code that I confused with it, or maybe I was just being dumb).

OK, quick test using VC++ compiling with full optimizations. Called fabs in a loop 1,000,000,000 times. As it turns out, the library version of fabs is about 5 times faster than Humus’ version of fabs. However, using the debug build, Humus’ version is about 40% faster than the VC++ library version of fabs, so that’s probably why he thought it was faster.

Guess this just helps to reinforce the idea that you should NEVER try to optimize your code based on timings taken in the debug build. I remember one time long ago when I spent about 2 hours optimizing some code. It seemed that my optimizations had made the code about 3 times faster. Then when I switched to release build, my optimized release build was the exact same speed as the unoptimized release build. You can bet I haven’t made that mistake since then.
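For anyone who wants to repeat this kind of measurement, here is a rough timing sketch in standard C++. It is not the code used for the numbers above; `TimedResult` and `time_fabs_loop` are made-up names. The `volatile` input is there so an optimizing build cannot hoist the call out of the loop, and the checksum is returned so the loop has a visible result. Build it with optimizations on before trusting the timings.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>

// Hypothetical result type: the accumulated sum plus elapsed wall-clock time.
struct TimedResult {
    double checksum;
    double seconds;
};

// Call std::fabs 'iterations' times on a volatile input and time the loop.
TimedResult time_fabs_loop(long iterations) {
    assert(iterations >= 0);
    volatile float input = -1.5f;  // re-read on every iteration
    double checksum = 0.0;
    const auto start = std::chrono::steady_clock::now();
    for (long i = 0; i < iterations; ++i) {
        checksum += std::fabs(input);
    }
    const auto stop = std::chrono::steady_clock::now();
    return {checksum, std::chrono::duration<double>(stop - start).count()};
}
```

To compare a replacement function, swap it in for `std::fabs` and run both versions in the same build configuration.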

Somehow, this discussion got way off topic

I’d like to see that library function…

Okay, I did a little benchmark with my own inline assembler fabs function.
In the worst case scenario (a rand for the float, so the compiler can’t optimize too much for the fabs function) mine takes 46 seconds for a large number of iterations (too large for me to remember), whereas fabs takes 38 seconds. (The benchmarked code is exactly the same for both, I believe.)
If anybody could tell me how to get only a single byte from the float into a register, that could optimize it a bit more, I guess.

Forgot __inline … that speeds things up considerably. I haven’t benchmarked it much, but with __inline I get essentially the same speed as the release libs, except for fabs, which was much slower (the compiler perhaps just does an AND with 0x7FFFFFFF). The reason there’s no speed improvement is that the FPU still works internally with doubles, just as the lib functions do. So you should set the FPU to float precision and then use those functions. Then you’ll see speed improvements. For inline assembler to really pay off, however, you should use longer functions than those, since the function calling overhead may destroy all performance gains.
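The AND with 0x7FFFFFFF mentioned above clears bit 31, which is the sign bit of a 32-bit IEEE 754 float. A portable sketch of that idea follows; `abs_by_mask` is a made-up name, and this is not a claim about what the VC++ library actually does.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: absolute value by clearing the sign bit (bit 31).
// Assumes float is a 32-bit IEEE 754 value.
float abs_by_mask(float value) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);   // read the raw bits
    bits &= 0x7FFFFFFFu;                       // keep exponent and mantissa only
    float result;
    std::memcpy(&result, &bits, sizeof result);
    assert(!std::signbit(result));             // debug-only postcondition
    return result;
}
```

The exponent and mantissa are left untouched, so the magnitude is unchanged.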

Hey, I’m actually also only doing that and masking out the sign bit. But mine is, as I said, a bit slower than the compiler’s. (It is inline.)

I wonder how far a compiler optimizes a __inline call. Does anyone know if it still pushes the arguments on the stack or does it store them in registers?