(Last updated: February 5, 2024)

y-cruncher uses a Newton's Method iteration to compute the Inverse Square Root.

This operation is needed for the following constants:

- Square Root of x
- Golden Ratio
- Pi - Chudnovsky/Ramanujan

Find:

*1 / sqrt(x)*

Start with an initial guess:

*r_0 ≈ 1 / sqrt(x)*

Iterate:

*r_{n+1} = r_n - ( r_n ( r_n^2 x - 1 ) ) / 2*

Since this is a 1st order iteration, the number of correct digits doubles with each iteration.

There are several ways to arrange the formula. This particular arrangement was chosen because *r_n^2 x - 1* has destructive cancellation of roughly half the digits. This cancellation reduces the size of the final multiply by half.

In y-cruncher, the initial "guess" is obtained by simply calling `std::sqrt()` from `<cmath>`, which gives 53 bits of precision.

At each stage, y-cruncher's implementation will choose either of the following iterations:

*r_{n+1} = r_n - ( r_n ( r_n^2 x - 1 ) ) / 2*

*r_{n+1} = r_n + ( r_n ( 1 - r_n^2 x ) ) / 2*

Which one it chooses depends on whether *r_n* turns out to be an under-estimate or an over-estimate of the correct answer. If it is an under-estimate, it uses the second formula to avoid negative numbers. Over-estimates are the faster case.

Therefore, y-cruncher's implementation perturbs the result of each iteration slightly to increase the chances that it will be an over-estimate for the next iteration.

Lastly, there are two multiplications by *r_n*:

*r_n^2* and *r_n · (...)*

Since the multiplies are the same size, y-cruncher reuses the forward FFT of *r_n*.

**Complexity (Asymptotic):**

Inverse Square Root: *O( n log(n) )*

The complexity of each iteration is *O( M(n) )*. Since Newton's Method is a self-correcting algorithm, full precision is not needed throughout the iteration. Therefore the costs of the iterations form a geometric sequence that sums to the same complexity:

*O( M(n) )*

If we assume that multiplication is *M(n) = n log(n)*, then the entire operation is *O( n log(n) )*.
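The geometric-series argument can be made explicit. If the final precision is *n*, each iteration works at the precision it is about to produce, and M(n) grows at least linearly, then:

```latex
\sum_{k=0}^{\lceil \log_2 n \rceil} M\!\left(\frac{n}{2^k}\right)
\;\le\; M(n) \sum_{k=0}^{\infty} \frac{1}{2^k}
\;=\; 2\,M(n)
\;=\; O\big(M(n)\big)
```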

**Complexity (Practical):**

y-cruncher's inverse square root for small x is currently:

Memory: *1.33 M(n)*

Swap: *1.50 M(n)*

Each iteration contains:

- 1 Half-precision square
- 1 Half-precision multiply
- 2 Add/Subtracts (negligible)
- 1 Divide-by-two (negligible)

One forward FFT is reused, for a total of 2 forward and 2 inverse FFTs. y-cruncher also has an implementation for large x, but since it isn't needed for Pi, it is poorly optimized and won't be covered here.

For memory computations, we can assume that forward and inverse FFTs are of equal cost. Therefore a multiplication is equal to 3 FFTs.

For swap computations, the inverse FFT is about 2x the cost of a forward FFT. Therefore a multiplication is approximately 4 forward FFTs.
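Under these conventions, the per-iteration entries in the table below follow from *F(0.5 p) ≈ 0.5 F(p)* (treating the transform cost as linear and ignoring the log factor):

```latex
\text{Memory: } M(p) = 3F(p)
  \;\Rightarrow\; 2F(0.5p) \approx F(p) = 0.33\,M(p),\quad
  2I(0.5p) \approx F(p) = 0.33\,M(p)
```

```latex
\text{Swap: } I = 2F,\; M(p) = 4F(p)
  \;\Rightarrow\; 2F(0.5p) \approx F(p) = 0.25\,M(p),\quad
  2I(0.5p) \approx 2F(p) = 0.50\,M(p)
```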

| Operation | Size | Quantity | Total Cost | Total Cost (Memory) | Total Cost (Swap) |
| --- | --- | --- | --- | --- | --- |
| Forward FFT | 0.5 p | 2 | 2 * F(0.5 p) | 0.33 * M(p) | 0.25 * M(p) |
| Inverse FFT | 0.5 p | 2 | 2 * I(0.5 p) | 0.33 * M(p) | 0.50 * M(p) |
| Add/Subtract | 1.0 p | 1 | - | - | - |
| Divide by 2 | 1.0 p | 1 | - | - | - |
| **Total** | | | | 0.66 * M(p) | 0.75 * M(p) |

Summing these up through the binary geometric sequence of iterations gives exactly twice these numbers: *1.33 M(n)* for memory and *1.50 M(n)* for swap.