(Last updated: February 4, 2019)

**Back To:**

y-cruncher uses Newton's Method iteration for performing the Inverse Square Root.

This operation is needed for the following constants:

- Square Root of x
- Golden Ratio
- Pi - Chudnovsky/Ramanujan

Find:

Start with an initial guess:

Iterate:

Since this is a 1st order iteration, the number of correct digits doubles with each iteration.

There are several ways to arrange the formula. This particular arrangement was chosen because *r _{n}^{2} x - 1* has destructive cancellation of roughly half the digits. This cancellation reduces the size of the final multiply by

In y-cruncher, the initial "guess" is obtained by simply calling std::sqrt() in <cmath> which gives 53 bits.

At each stage, y-cruncher's implementation will choose either of the following iterations:

Which it chooses depends on whether *r _{n}* turns out to be an under-estimate or an over-estimate of the correct answer. If it is an under-estimate, it uses the second formula to avoid negative numbers. Over-estimates are faster because subtraction

Therefore, y-cruncher's implementation will purturb the result of each iteration slightly to increase the chances that it will be an overestimate for the next iteration.

Lastly, there are two multiplications by *r _{n}*.

*r*_{n}^{2}*r*_{n}* (...)

Since multiplies are the same size, y-cruncher reuses the forward FFT for *r _{n}*.

**Complexity (Asymptotic):**

Inverse Square Root: *O( n log(n) )*

The complexity of each iteration is ** O( M(n) )**. Since Newton's Method is a self-correcting algorithm, full precision is not needed thoughout the iteration. Therefore the sum of all the iterations is a geometric sequence resulting in the same complexity:

If we assume that multiplication is ** M(n) = n log(n)**, then the entire operation is

**Complexity (Practical):**

Memory: *1.33 M(n)*

Swap: **1.50 M(n)**

Each iteration contains:

- 1 Half-precision square
- 1 Half-precision multiply
- 2 Add/Subtracts (negligible)
- 1 Divide-by-two (negligible)

One forward FFT is reused for a total of 2 forward and 2 inverse FFTs.

For memory computations, we can assume that forward and inverse FFTs are of equal cost. Therefore a multiplication is equal to 3 FFTs.

For swap computations, the inverse FFT is about 2x the cost of a forward FFT. Therefore a multiplication is approximately 4 forward FFTs.

Operation |
Size |
Quantity |
Total Cost |
Total Cost (Memory) |
Total Cost (Swap) |

Forward FFT | 0.5 p | 2 | 2 * F(0.5 p) | 0.33 * M(p) | 0.25 * M(p) |

Inverse FFT | 0.5 p | 2 | 2 * I(0.5 p) | 0.33 * M(p) | 0.50 * M(p) |

Add/Subtract | 1.0 p | 1 | - | - | - |

Divide by 2 | 1.0 p | 1 | - | - | - |

Total | 0.66 * M(p) | 0.75 * M(p) |

Summing these up through a binary geometric sequence gives exactly twice these numbers.