Fast Distance Algorithm

Hi, I am trying to optimize a particle system that uses several distance equation calculations for each particle.


I am wondering if it is possible to get a reasonably similar result without using the square root function. The distance function doesn’t have to be exact, only close.

Use a*a and b*b instead of using pow(). And use the distance squared instead if you can. When sorting by distance, for example, the actual distance is not needed; the distance squared will work.
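
To make that concrete, here is a minimal sketch of comparing squared distances. The function names (dist_sq, within_radius) are made up for illustration, and it assumes 2D points and a non-negative radius:

```c
#include <stdbool.h>

/* Squared distance between (ax, ay) and (bx, by): no sqrt, no pow(). */
static float dist_sq(float ax, float ay, float bx, float by)
{
    float a = bx - ax;
    float b = by - ay;
    return a * a + b * b;
}

/* Radius test done entirely in squared space.
   Assumes radius >= 0, so squaring both sides keeps the ordering. */
static bool within_radius(float ax, float ay, float bx, float by, float radius)
{
    return dist_sq(ax, ay, bx, by) <= radius * radius;
}
```

Since squaring preserves the ordering of non-negative values, sorting particles by dist_sq gives the same order as sorting by the real distance.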

Some people approximate with max( abs(a), abs(b) ). This simulates the behaviour that if one of the components is large, the other has little impact.
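
A rough sketch of that approximation, in case it helps. The name approx_dist_max is hypothetical, and a and b are assumed to be the per-axis differences:

```c
#include <math.h>

/* Approximate sqrt(a*a + b*b) by the larger absolute component.
   This never overestimates; the worst case is |a| == |b|, where the
   true distance is sqrt(2) times the returned value. */
static float approx_dist_max(float a, float b)
{
    float abs_a = fabsf(a);
    float abs_b = fabsf(b);
    return abs_a > abs_b ? abs_a : abs_b;
}
```

Whether an underestimate of up to about 29% is acceptable depends on what the particles are doing with the result.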

There’s a quick-and-dirty approximate distance formula in one of the Graphics Gems books. They are at home and I’m at work, so I can’t post it right now.

Some processors (e.g. PPC) have fast reciprocal sqrt estimate instructions. So you can replace:

float factor = a*a + b*b;
float norm = 1.0f/sqrt (factor);

with:

float factor = a*a + b*b;
float norm = __frsqrte(factor);

which is much faster, at the cost of precision.

You can improve the precision cheaply by using the Newton-Raphson method:

norm = norm * (1.5f - (0.5f * factor * norm * norm));

Typically one iteration is enough for on-screen graphics precision.
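
As a portable sketch of that refinement step (refine_rsqrt is a made-up name, and the estimate can come from wherever you like, e.g. a hardware estimate instruction):

```c
/* One Newton-Raphson step for y = 1/sqrt(factor).
   Given a rough estimate, returns
   estimate * (1.5 - 0.5 * factor * estimate^2),
   which is closer to the true value when the estimate is reasonable. */
static float refine_rsqrt(float factor, float estimate)
{
    return estimate * (1.5f - 0.5f * factor * estimate * estimate);
}
```

For example, with factor = 4 (exact answer 0.5) and a deliberately poor estimate of 0.48, one step gives roughly 0.4988, so the error drops from 0.02 to about 0.0012.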

You could use the Manhattan distance (aka city block distance). This looks like:

dist = abs(a) + abs(b)

which gives these distances from the x:

    3 2 3
  3 2 1 2 3
3 2 1 x 1 2 3
  3 2 1 2 3
    3 2 3
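
In C that would be something along these lines (the name manhattan_dist is just for illustration; a and b are the per-axis differences):

```c
#include <math.h>

/* Manhattan (city block) distance: sum of absolute per-axis differences.
   Always >= the Euclidean distance, and equal to it along the axes. */
static float manhattan_dist(float a, float b)
{
    return fabsf(a) + fabsf(b);
}
```

Note that it overestimates on the diagonals (by a factor of sqrt(2) at worst), which is why the equal-distance contour comes out as a diamond rather than a circle.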


For what it’s worth…

I’ve written just about every kind of sqrt approximation under the sun over the last 10 years or so. My favorite is taking the bits of an already-floating-point value, treating them as an integer, and converting that with int-to-float, which is essentially the same as taking the log to base 2. Multiply by a power, add in a “magic number”, and convert back to float and you’ve basically got a good approximation of raising to an arbitrary float power very cheaply. For sqrt, you’d use a power of 0.5.
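
A minimal sketch of the idea, not the exact code I used: it assumes 32-bit IEEE 754 floats, positive normal inputs only, and uses the plain bit pattern of 1.0f as the "magic number" (tuned constants give a smaller error). The name approx_pow is made up for illustration.

```c
#include <stdint.h>
#include <string.h>

#define FLOAT_ONE_BITS 0x3f800000 /* bit pattern of 1.0f (assumed IEEE 754) */

/* Approximate x^power for positive, normal x.
   The float's bit pattern read as an integer is roughly a scaled and
   offset log2(x), so scaling it scales the logarithm. */
static float approx_pow(float x, float power)
{
    int32_t bits;
    memcpy(&bits, &x, sizeof bits);                  /* reinterpret float bits */
    float log_like = (float)(bits - FLOAT_ONE_BITS); /* ~ log2(x) * 2^23 */
    int32_t result_bits = (int32_t)(log_like * power) + FLOAT_ONE_BITS;
    float result;
    memcpy(&result, &result_bits, sizeof result);    /* reinterpret back */
    return result;
}
```

With power = 0.5 this is exact for powers of 4 and off by about 6% in the worst case, e.g. approx_pow(2.0f, 0.5f) gives 1.5 instead of 1.414.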

These days when I need a sqrt, I just use plain ol’ sqrt(), being careful that the compiler optimizations are enabled to translate this into the FPU’s sqrt instruction instead of emulating it. The sqrt is relatively expensive as a single instruction, but it’s almost always better than 4 or 5 simpler instructions, especially if there are memory fetches involved, and it will eat up fewer registers. Except for the most rudimentary approximations, the full-precision CPU instruction will be faster. When you consider that you’re just going to be bottlenecked by cache fills anyway, why bother.

So, long story short, sqrt(a*a + b*b) will be about as fast as it gets assuming math optimizations are properly enabled.

As previously mentioned, many times sqrt isn’t needed. Gravity, for example, is proportional to distance^2, and sqrt(a*a + b*b)^2 == a*a + b*b.

Correction: gravity is inversely proportional to distance^2.
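
To illustrate, a sketch of the force magnitude computed straight from the squared distance. The function and parameter names are hypothetical, and the caller is assumed to make sure the two bodies are not at the same position:

```c
/* Magnitude of the gravitational attraction g * m1 * m2 / distance^2,
   where (a, b) is the offset between the two bodies.
   No sqrt is needed because only distance^2 appears.
   Assumes a and b are not both zero. */
static float gravity_magnitude(float g, float m1, float m2, float a, float b)
{
    float dist_sq = a * a + b * b;
    return g * m1 * m2 / dist_sq;
}
```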

[This message has been edited by deshfrudu (edited 01-06-2004).]

Originally posted by deshfrudu:
When you consider that you’re just going to be bottlenecked by cache fills anyway, why bother.

I am somewhat shocked each time I read it.

Anyway, I would just say that, if I am not wrong, SSE has packed float square root and packed float reciprocal square root instructions. Maybe you can get something out of that, provided you want to use SIMD of course.
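
Something like this, as a rough sketch using the SSE intrinsics from xmmintrin.h. It assumes an x86 target with SSE and that the per-axis differences for four particles are already stored in two float arrays; distances4 is a made-up name:

```c
#include <xmmintrin.h>

/* Compute four distances at once: out[i] = sqrt(a[i]*a[i] + b[i]*b[i]).
   a, b and out each point to at least 4 floats (no alignment required,
   since the unaligned load/store intrinsics are used). */
static void distances4(const float *a, const float *b, float *out)
{
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    __m128 sum = _mm_add_ps(_mm_mul_ps(va, va), _mm_mul_ps(vb, vb));
    _mm_storeu_ps(out, _mm_sqrt_ps(sum)); /* packed full-precision sqrt */
}
```

If an approximate 1/distance is enough, _mm_rsqrt_ps can be applied to the sum instead of _mm_sqrt_ps; it is faster but only an estimate.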