## By Alexander Yee

(Last updated: February 4, 2019)

Back To:

For bignum libraries that use a binary representation, there are generally two types of conversions:

Radix-to-Binary is needed for converting string input to internal representation. Binary-to-Radix converts internal representation to readable strings.

Radix-to-Binary is faster and easier to implement.

Since y-cruncher only outputs digits, it only has Binary-to-Radix conversion. Nevertheless, this article will cover both directions.

Basecase Algorithms for Integers

Given an array A that represents a integer in an arbitrary base:

Number = A + A*base1 + A*base2 + A*base3 + ...

Converting it to binary is as simple as evaluating this polynomial in binary representation. The done easily using Horner's Method.

Working Example: Base 10 -> Base 16

Input: "68312548"

x = 6 = 616

x = x * 10 + 8 = 4416

x = x * 10 + 3 = 2ab16

x = x * 10 + 1 = 1aaf16

x = x * 10 + 2 = 10ad816

x = x * 10 + 5 = a6c7516

x = x * 10 + 4 = 683c9616

x = x * 10 + 8 = 4125de416

This algorithm is simple and runs in O( n2 ) time.

Converting from binary to radix works the same, but in reverse. It uses division and modulus rather than multiply-add.

Working Example: Base 16 -> Base 10

Input: 4125de416

x = 4125de416

out = x % 10 = 8

x = x / 10 = 683c9616

out = x % 10 = 4

x = x / 10 = a6c7516

out = x % 10 = 5

x = x / 10 = 10ad816

out = x % 10 = 2

x = x / 10 = 1aaf16

out = x % 10 = 1

x = x / 10 = 2ab16

out = x % 10 = 3

x = x / 10 = 4416

out = x % 10 = 8

x = x / 10 = 616

out = x % 10 = 6

out = {8,4,5,2,1,3,8,6}

Reversing this array gives "68312548" which is the original string from the first example.

Since this conversion uses division, it is significantly slower than the radix-to-binary algorithm.

Nevertheless, it still runs in O( n2 ) time.

Fast Algorithms for Integers

The O( n2 ) basecase algorithms from the previous section can both be generalized into Divide-and-Conquer algorithms that run in O( log(n) * M(n) ) time.

If we assume that multiplication is M(n) = n log(n), then both directions of radix conversions can be done in O( n log(n)2 ).

It's unknown when the Divide-and-Conquer radix conversion was first discovered, but it is detailed in Richard Brent's Modern Computer Arithmetic.

Today, these Divide-and-Conquer algorithms are widely used in bignum libraries including GMP.

The basecase algorithms work by converting one digit at a time. For the Divide-and-Conquer version, we convert (n/2) digits at once and do it recursively. The method works for both directions of conversion (binary <--> radix).

1. Split the number in half with N/2 digits each: {high, low}
2. Recursively convert low and high into binary representation.
3. Compute: X = low + high * bN/2

The final result X will be the original number converted into binary representation.

The binary -> radix conversion is the same but in reverse.

Start with an N-digit number X in base 16. You wish to convert this into an M-digit number R in base b.

1. Compute: high = floor( X / bM/2 )
2. Compute: low = X - bM/2 * high
3. Recursively convert low and high. Append the results.

The final result R will be the original number converted into base b.

Complexity Analysis:

These Divide-and-Conquer recursions are of logarithmic depth.

• The radix -> binary algorithm uses multiplications and therefore has a complexity of O( log(n) * M(n) ).
• The binary -> radix algorithm uses divisions which is slower but has the same complexity as multiplication. Therefore the complexity is the same.

Therefore, radix conversions to and from binary can both be done in O( log(n) * M(n) ) time. O( n log(n)2 ) if we assume M(n) = n log(n) multiplication.

Although both conversions have the same complexity, binary -> radix is a lot slower because it uses divisions.

Optimizations:

The most basic optimization is to stop the recursion below a (tunable) threshold and switch to the basecase algorithms. In addition to that:

• Both directions of conversions involve using the same powers bM/2 many times. Therefore, these can be computed once and re-used multiple times.
• The binary -> radix conversion requires division by the same powers. This can be improved by pre-computing the reciprocal of the powers. This converts the divisions into multiplications.

After applying these optimizations:

• The radix -> binary conversion requires 1 multiplication of size N/2 per recursive call.
• The binary -> radix conversion requires 2 multiplications of size N/2 per recursive call.

So the binary -> radix conversion is slower by a factor of two. In practice, the time needed to precompute the powers and reciprocals of the powers is not negligible. Therefore the binary -> radix conversion is more than 2x slower than a radix -> binary conversion.

Assuming FFT-based multiplication, one final optimization is to precompute the forward transforms of the powers and their reciprocals. While this can bring up to an additional ~30% improvement, it significantly increases the memory requirement.

Floating-Point Conversions

Conversions of floating-point numbers can be done using only integer conversions using a bit of math.

A floating-point number will be one of the following forms:

 Form Readable Representation Example Integer xxxxxxxxx.0 123456789.0 Fractional 0.xxxxxxxxx 0.123456789 Split xxxxx.xxxxx 1234.56789 Large xxxxx0000000000000 ... 000000.0 123456789 * 101000 Small 0.00000000000000 ... 0000xxxxx 123456789 * 10-1000

All the constants that y-cruncher can compute are of the "fractional" form or the "split" form with an small integer part. Therefore y-cruncher is only equipped to handle conversions of the "fractional" form. Nevertheless, we will discuss methods that will handle any form.

There are many ways to convert floating-point numbers with varying levels of complexity and performance. Here we will only describe one approach for each direction.

Radix-to-Binary conversions are fairly easy. You simply evaluate the expression using the native (binary) arithmetic.

For example 123456789 * 101000 can be converted by first converting all of the integers in the expression:

• 123456789
• 10
• 1000

The powering can be done using Exponentiation by Squaring. If the exponent is negative, ignore the sign and take the reciprocal.

The only tricky part are the split expressions: "1234.56789"

These can be broken up into:

1234 + 56789 * 10-5

So again it breaks down to easy math.

The goal is to convert: (using base 10 and 16 as examples)

BigFloat = (a0*160 + a1*161 + a2*162 + ... ) * 16p

to

BigFloat = (b0*100 + b1*101 + b2*102 + ... ) * 10q

The "mantissa" portion is merely an integer. We already know how to convert that from the previous sections. But the difficult part is the 16p term.

In fact, there is no way to convert it. So instead, the trick is to multiply the entire number by a power-of-ten such that 16p disappears.

(a0*160 + a1*161 + a2*162 + ... ) * 16p * 10s    -->    (a'0*160 + a'1*161 + a'2*162 + ... ) * 10s

Now that 16p has been eliminated, we can convert the mantissa as an integer. The 10s becomes the decimal exponent.

Finding the right scaling factor 10s will require some logarithmic math. Essentially the goal is to solve the equation: 16p = 10-s as closely as possible using an integer s.

s = -log10(16) * p

The choice on how to round s to the nearest integer is implementation specific. Being off by one is common. So it is often necessary to shift the mantissa to eliminate the last factor of 16 or 16-1 before performing the integer conversion on the mantissa.

Working Example:

 1 Start: x = 4125de416 * 16-100 2 Find Scaling Factor: s = -log10(16) * -100 = 120.412 ~ 120 3 Multiply by Scaling Factor: x * 10120 = 193aa8616 * 160 4 Convert Mantissa: 193aa8616 = 26454662 5 Output: x = 26454662 * 10-120

Choosing how to display the final result (decimal? scientific?) is of personal preference and will not be covered here.

Fractional Part Algorithms

All the radix conversions method discussed so far all boil down to an integer conversion. But here, things start to get interesting with fractional-part algorithms.

This is the class of conversion algorithms used by y-cruncher and Fabrice Bellard's TachusPi which have both set world records in computations of Pi.

Fractional-part algorithms are largely unknown. Fabrice Bellard seems to be the first to use it when he computed 2.7 trillion digits of Pi in 2009.

But despite this world record and all the publicity it generated, fractional-part conversion algorithms are still largely unused. None of the major bignum libraries use it.

Integer algorithms convert between: bbbbb.0 <--> rrrrr.0

Fractional-part algorithms convert: 0.bbbbb -> 0.rrrrr

Below is a table detailing the different conversion algorithms:

 Algorithm Type Direction Number Forms Primary Operation Integer Radix -> Binary rrrrr.0 -> bbbbb.0 Multiplication Integer Binary -> Radix bbbbb.0 -> rrrrr.0 Division + Modulus Fractional Part Binary -> Radix 0.bbbbb -> 0.rrrrr Multiplication Fractional Part Radix -> Binary 0.rrrrr -> 0.bbbbb Division + Modulus

The power of the fractional-part algorithm is that it allows Binary -> Radix conversion to be done using multiplications rather than divisions. Furthermore, no scaling is needed to output numbers like 3.14159... since they are already of the "fractional" form.

Here we describe a simple O( n2 ) fractional-part conversion algorithm.

Converting from binary to radix involves using repeated multiplications. At each step, you multiply by the radix. The integer part becomes the next digit of the output. Chop off the integer part and repeat until you have all the digits you want.

Working Example:

x = 0.4125de416

x = x * 10 = 2.8b7aae816

out = Floor(x) = 2

x = x * 10 = 5.72cad1016

out = Floor(x) = 5

x = x * 10 = 4.72cad1016

out = Floor(x) = 4

x = x * 10 = 4.d739a4016

out = Floor(x) = 4

x = x * 10 = 8.684068016

out = Floor(x) = 8

x = x * 10 = 4.128410016

out = Floor(x) = 4

x = x * 10 = 0.b928a0016

out = Floor(x) = 0

out = {2,5,4,4,8,4,0} = 0.2544840

This approach can be thought of as a "mirror image" of the integer Binary -> Radix algorithm that uses divisions to generate each additional digit.

While this method also runs in O( n2 ) time, it is much faster (smaller Big-O) since it uses multiplications rather than division.

The major drawback of the fractional part algorithm is that it is inherently a floating-point algorithm. So results are inexact and error-analysis is required to determine how many digits are actually correct.

A Radix -> Binary algorithm can be devised as an exact reverse of the Binary -> Radix algorithm. But since it uses divisions, it is slower than the integer algorithm. Therefore it is not particularly interesting except for academic purposes.

Floating-Point Conversions:

Earlier, we described how converting an arbitrary floating-point number can be reduced to only integer conversions using scaling.

The same approach can be used to reduce an arbitrary floating-point conversion to use only fractional-part conversions.

Scaled Remainder Tree

Just like the integer conversion algorithms, it is possible to derive Divide-and-Conquer versions of the fractional-part algorithms. The intuition is the same, rather than converting one digit at a time, we recursively convert N/2 digits.

Background:

When Fabrice Bellard computed 2.7 trillion digits of Pi, he used a conversion algorithm which he called, the "Bernstein Scaled Remainder Tree" - possibly eluding to this publication by Daniel Bernstein.

Fabrice Bellard left no details about his implementation or the algorithm. So we don't know precisely what it is. But there is strong evidence to suggest that the

"Scaled Remainder Tree" is in fact the Divide-and-Conquer version of the fractional-part conversion algorithm.

1. TachusPi's radix conversion exhibits CPU/memory usage patterns that are consistent with a non-parallelized infix recursive Divide-and-Conquer algorithm.
2. The fractional-part algorithm can be either infix or prefix recursive. Conventional algorithms can only be prefix recursive.
3. The fractional-part algorithm has the same operation count and optimization opportunities as described in Fabrice Bellard's paper.
4. When attempting to parallelize y-cruncher's implementation, there was a major issue involving unbounded memory usage. Eventually, an elaborate work-around was found - but not before breaking through layers of design encapsulation and redesigning the entire algorithm. This begs the question of whether the same issue prevented Fabrice Bellard from parallelizing his implementation.

A recent publication seems to also suggest that the Scaled Remainder Tree and the Divide-and-Conquer fractional-part algorithm are the same thing.

Performance Analysis:

The Scaled Remainder Tree (SRT) has a similar recursive structure as the Divide-and-Conquer integer algorithms. And it only requires one multiplication of size (3/4)N at each stage. However, SRT's only multiplication can be done using FFT Middle-Product wrap-around. This reduces the effective size of that multiply to N/2. Furthermore, SRT does not require computation of the reciprocals that the optimized integer binary -> radix algorithm needs.

The result is that SRT is more than 2x the speed of the (conventional) integer binary -> radix algorithm.

When y-cruncher switched from the integer algorithm (with precomputed powers/reciprocals) to SRT, its binary -> radix conversion improved by a factor of 2.5x.

Implementations:

The Scaled Remainder Tree is still a relatively new and little-known algorithm. While there seem to be plenty of researchers who have experimental implementations, as of 2014, y-cruncher and TachusPi appear to be the only fully working implementations that are in production.

• Scaled Remainder Tree
• Middle Product
• Swap Mode (out-of-core) Computation
• possibly others...

• Scaled Remainder Tree
• Middle Product
• Removal of Trailing Zeros
• Precomputation of the forward FFTs of the powers.
• Parallelization with bounded memory usage.
• Swap Mode (out-of-core) Computation

A fully optimized and parallelized Scaled Remainder Tree is actually quite difficult to do:

• The inexact nature of the algorithm means there are nasty corner cases involving carryout.
• Middle product requires support from the large multiplication and is full of corner cases.
• Precomputation of FFTs requires a transform-only interface for the large multiplication.
• Removal of trailing zeros brings a sizable speedup to bases like 10. But it complicates the middle product operations.
• Parallelizing the algorithm the naive way will lead to O(N log(p)) memory usage (p is the # of processors). Getting it down to O(N) without sacrificing performance will screw up everything unless you design for it from the very beginning.

Putting it Together

The integer algorithms are best at Radix -> Binary conversions.

The fractional-part algorithms are best at Binary -> Radix conversions.

So in an optimal bignum implementation:

• Radix -> Binary conversions are done by scaling to an integer and using the integer conversion algorithms.
• Binary -> Radix conversions are done by scaling to a fractional-part number and using the fractional-part algorithms.