y-cruncher - A Multi-Threaded Pi-Program

From a high-school project that went a little too far...

By Alexander J. Yee

(Last updated: November 4, 2018)

 


The first scalable multi-threaded Pi-benchmark for multi-core systems...

 

How fast can your computer compute Pi?

 

y-cruncher is a program that can compute Pi and other constants to trillions of digits.

It is the first of its kind that is multi-threaded and scalable to multi-core systems. Since its launch in 2009, it has become a common benchmarking and stress-testing application for overclockers and hardware enthusiasts.

 

y-cruncher has been used to set several world records for the most digits of Pi ever computed.

 

Current Release:

Windows: Version 0.7.6 Build 9487 (Released: October 14, 2018)

Linux: Version 0.7.6 Build 9487 (Released: October 14, 2018)

 

Official HWBOT thread.

Official XtremeSystems Forums thread.

 

News:

Computation of Pi using a Custom Formula

 

Custom Formulas: (October 28, 2018)

 

For those of you who have been paying attention to the record lists, you'll have noticed some new constants which y-cruncher has not supported before. Long story short, I've been experimenting with a custom formula feature similar to PiFast's User Constants. And at this point, it's in good enough shape to announce.

 

y-cruncher has always focused on a small number of popular constants and on taking them to the extreme. But it's been in the back of my mind for years to add some sort of formula feature that lets people use y-cruncher's capabilities to compute other things as well. I'm glad to say that this will finally be coming in v0.7.7.

 

The new formula feature will allow users to input arbitrary (finite sized) formulas with functions such as:

These encompass nearly all of y-cruncher's internal functionality. The AGM and large number square root are new to y-cruncher v0.7.7 and didn't exist before. This set of functions is not final and will probably grow a bit more before v0.7.7 is released.
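For readers who haven't run into the AGM (arithmetic-geometric mean) before, here is a minimal sketch of the iteration in Python. It is not y-cruncher's implementation and says nothing about the formula feature's syntax; the names `agm` and `sqrt_decimal` are made up for this example, and it uses the standard `decimal` module instead of a large-number library.

```python
from decimal import Decimal, localcontext


def sqrt_decimal(n, digits=50):
    """Square root of n to roughly `digits` decimal digits (illustrative helper)."""
    with localcontext() as ctx:
        ctx.prec = digits + 10  # a few guard digits
        return Decimal(n).sqrt()


def agm(a, b, digits=50):
    """Arithmetic-geometric mean of two positive numbers.

    Repeatedly replaces (a, b) with their arithmetic and geometric means
    until the two values agree to roughly `digits` decimal digits.
    """
    with localcontext() as ctx:
        ctx.prec = digits + 10
        a, b = Decimal(a), Decimal(b)
        tolerance = Decimal(10) ** -(digits + 5)
        for _ in range(100):  # converges quadratically, so this cap is generous
            if abs(a - b) <= tolerance * a:
                break
            a, b = (a + b) / 2, (a * b).sqrt()
        return a


# AGM(1, sqrt(2)) begins 1.19814023...
print(agm(1, sqrt_decimal(2)))
```

Each pass roughly doubles the number of correct digits, which is why the AGM is attractive for high-precision work: the cost is dominated by a handful of full-precision multiplications and square roots.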

 

A list of formulas can be found here: https://github.com/Mysticial/y-cruncher-Formulas/tree/master/Formulas

 

 

This custom formula feature may seem like it's coming out of nowhere. But it's been in the making for years. There's a long list of reasons why it took so long:

 

The custom formula feature is not yet complete. Given the scale of such a feature, it will require more than the usual amount of testing. So it's still months away.

 

 

Older News

 

Records Set by y-cruncher:

y-cruncher has been used to perform a number of world-record-sized computations.

 

Blue: Current World Record

Green: Former World Record

Red: Unverified computation. Does not qualify as a world record until verified using an alternate formula.

Date Announced | Date Completed | Source | Who | Constant | Decimal Digits | Time | Computer
October 29, 2018 | October 29, 2018 | Screen | Alexander Yee | Gamma(1/4) | 50,000,000,000 | Compute: 15.5 hours, Verify: 19.1 hours | Intel Core i7 5960X @ 4.0 GHz, 64 GB - 16 x 2TB 7200 RPM
October 6, 2018 | October 6, 2018 | Screen | Alexander Yee | Zeta(5) | 20,000,000,000 | Compute: 25.3 hours, Verify: 17.3 hours | Intel Core i9 7940X @ 3.6 GHz, 128 GB DDR4
September 23, 2018 | September 28, 2018 |  | Tizian Hanselmann | Golden Ratio | 2,600,000,000,000 | Compute: 9.25 days, Verify: 9.03 days | 2 x Intel Xeon E5620 @ 2.4 GHz, 128/144 GB DDR4, 4 x 4TB
August 28, 2018 | August 28, 2018 |  | "yoyo" & Ryan Moore | Golden Ratio | 2,500,000,000,000 | Compute: 42.2 hours, Verify: 169 hours | 2 x Intel Xeon E5-2696 v4 @ 2.2 GHz, 768 GB; Intel Core i7 7700K @ 4.2 GHz, 32 GB
August 9, 2018 | August 4, 2018 |  | Gerald Hofmann | e | 8,000,000,000,000 | Compute: 28.5 days, Not Verified | 2 x AMD Epyc 7551 @ 2.0 GHz, 256 GB
August 24, 2017 | August 23, 2017 |  | Ron Watkins | Euler-Mascheroni Constant | 477,511,832,674 | Compute: 34.4 days, Verify: 141 days | 4 x Xeon E5-4660 v3 @ 2.1 GHz - 1 TB; 2 x Xeon X5690 @ 3.47 GHz - 128 GB
August 14, 2017 | August 13, 2017 |  | Ron Watkins | Zeta(3) - Apery's Constant | 500,000,000,000 | Compute: 19.7 days, Verify: 29.8 days | 8 x Xeon 6550 @ 2.0 GHz - 512 GB; 2 x Xeon X5690 @ 3.46 GHz - 142 GB
November 15, 2016 | November 11, 2016 | Blog, Sponsor | Peter Trueb | Pi | 22,459,157,718,361 | Compute: 105 days, Verify: 28 hours, Validation File | 4 x Xeon E7-8890 v3 @ 2.50 GHz, 1.25 TB DDR4, 20 x 6 TB 7200 RPM Seagate
September 3, 2016 | August 29, 2016 |  | Ron Watkins | e | 5,000,000,000,000 | Compute: 48.6 days, Verify: 48.7 days | 2 x Xeon X5690 @ 3.47 GHz, 141 GB
July 11, 2016 | July 5, 2016 |  | "yoyo" | Golden Ratio | 10,000,000,000,000 | Compute: 6.2 days, Not Verified | 2 x Intel Xeon E5-2696 v4 @ 2.2 GHz, 768 GB
June 28, 2016 | June 19, 2016 |  | Ron Watkins | Square Root of 2 | 10,000,000,000,000 | Compute: 18.8 days, Verify: 25.2 days | 2 x Xeon X5690 @ 3.47 GHz, 141 GB
June 4, 2016 | May 29, 2016 |  | Ron Watkins | Lemniscate | 250,000,000,000 | Compute: 91.7 hours, Verify: 270 hours | 4 x Xeon E5-4660 v3 @ 2.1 GHz - 1TB; 4 x Xeon X6550 @ 2 GHz - 512 GB
April 24, 2016 | April 18, 2016 |  | Ron Watkins | Log(2) | 500,000,000,000 | Compute: 12.8 days, Verify: 14.4 days | 4 x Xeon X5690 @ 3.47 GHz - 141 GB
April 17, 2016 | April 12, 2016 |  | Ron Watkins | Catalan's Constant | 250,000,000,000 | Compute: 204 hours, Verify: 207 hours | 4 x Xeon E5-4660 v3 @ 2.1 GHz, 1 TB
April 9, 2016 | April 3, 2016 |  | Ron Watkins | Log(10) | 500,000,000,000 | Compute: 14.4 days, Verify: 15.2 days | 2 x Xeon X5690 @ 3.47 GHz, 141 GB
February 8, 2016 | February 6, 2016 |  | Mike A | Catalan's Constant | 500,000,000,000 | Compute: 26.1 days, Not Verified | 2 x Intel Xeon E5-2697 v3 @ 2.6 GHz, 128 GB
July 24, 2015 | July 22, 2015; July 23, 2015 | Source | Ron Watkins; Dustin Kirkland | Golden Ratio | 2,000,000,000,000 | Compute: 77.3 hours, Verify: 76.33 hours; Compute: 79.3 hours, Verify: 80.8 hours | 4 x Xeon X6550 @ 2 GHz - 512 GB; Xeon E5-2676 v3 @ 2.4 GHz - 64 GB
October 8, 2014 | October 7, 2014 |  | Sandon Van Ness (houkouonchi) | Pi | 13,300,000,000,000 | Compute: 208 days, Verify: 182 hours, Validation File | 2 x Xeon E5-4650L @ 2.6 GHz, 192 GB DDR3 @ 1333 MHz, 24 x 4 TB + 30 x 3 TB
December 28, 2013 | December 28, 2013 | Source | Shigeru Kondo | Pi | 12,100,000,000,050 | Compute: 94 days, Verify: 46 hours | 2 x Xeon E5-2690 @ 2.9 GHz, 128 GB DDR3 @ 1600 MHz, 24 x 3 TB

See the complete list including other notably large computations. If you want to set a record yourself, the rules are in that link.
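The requirement that a record be verified "using an alternate formula" can be illustrated at toy scale. The sketch below computes Pi twice with two different arctangent formulas and accepts the digits only if both runs agree. The formulas (Machin's and Hutton's) are chosen for brevity and are assumptions of this example, not the ones y-cruncher uses; the function names are likewise made up.

```python
def arctan_inv(x, scale):
    """Return arctan(1/x) * scale as an integer, using the Taylor series."""
    term = scale // x          # first term of the series: 1/x
    total = term
    x_squared = x * x
    n = 1
    sign = -1
    while term:
        term //= x_squared     # next odd power of 1/x
        n += 2
        total += sign * (term // n)
        sign = -sign
    return total


def pi_digits(formula, digits, guard=10):
    """Digits of Pi as a string: "3" followed by `digits` decimals.

    `formula` is a list of (coefficient, x) pairs meaning
    Pi/4 = sum(coefficient * arctan(1/x)).
    Guard digits absorb the truncation error of the integer arithmetic.
    """
    scale = 10 ** (digits + guard)
    quarter_pi = sum(c * arctan_inv(x, scale) for c, x in formula)
    return str(4 * quarter_pi // 10 ** guard)


MACHIN = [(4, 5), (-1, 239)]   # Pi/4 = 4*arctan(1/5) - arctan(1/239)
HUTTON = [(2, 3), (1, 7)]      # Pi/4 = 2*arctan(1/3) + arctan(1/7)


def verified(digits):
    """True if both formulas produce identical digits."""
    return pi_digits(MACHIN, digits) == pi_digits(HUTTON, digits)


print(pi_digits(MACHIN, 50))
print("verified:", verified(1000))
```

The point of using a second formula is that a hardware or software error is very unlikely to corrupt two different computations in exactly the same way, so matching digits are strong evidence that both are correct.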

 

 

Features:

 

The main computational features of y-cruncher are:

 

Download:

Sample Screenshot: 1 trillion digits of Pi

Core i7 5960X @ 4.0 GHz - 64 GB DDR4 @ 2133 MHz - 16 HDs

 

Latest Releases: (November 1, 2018)

OS Download Link Size

Windows

y-cruncher v0.7.6.9487b.zip

39.7 MB

Linux (Static)

y-cruncher v0.7.6.9487-static.tar.xz

25.3 MB

Linux (Dynamic)

y-cruncher v0.7.6.9487-dynamic.tar.xz

18.7 MB

 

 

 

 

 

 

 

 

The Linux version comes in both statically and dynamically linked versions. The static version should work on most Linux distributions, but lacks Cilk Plus and NUMA binding. The dynamic version supports all features, but is less portable due to the DLL dependency hell.

 

The Windows download comes bundled with the HWBOT submitter which allows benchmarks to be submitted to HWBOT.

 

System Requirements:

Windows:

Linux:

All Systems:

Very old systems that don't meet these requirements may be able to run older versions of y-cruncher. Support goes all the way back to even before Windows XP.

 

Version History:

 

Other Downloads (for C++ programmers):

 

Advanced Documentation:

 

 

Benchmarks:

Comparison Chart: (Last updated: October 19, 2018)

 

Computations of Pi to various sizes. All times are in seconds. All computations were done entirely in RAM.

The timings include the time needed to convert the digits to decimal representation, but not the time needed to write out the digits to disk.

 

Blue: Benchmarks are up-to-date with the latest version of y-cruncher.

Green: Benchmarks were done with an old version of y-cruncher that is comparable in performance with the current release.

Red: Benchmarks are significantly out-of-date due to being run with an old version of y-cruncher that is no longer comparable with the current release.

 

 

Laptops + Low-Power:

Processor(s): Core i7 3630QM VIA C4650 Pentium N4200 Xeon E3-1535M v5 Core i7 6820HK Core i7 8850H
Generation: Intel Ivy Bridge VIA Isaiah Intel Apollo Lake Intel Skylake Intel Skylake Intel Coffee Lake
Cores/Threads: 4/8 4/4 4/4 4/8 4/8 6/12
Processor Speed: 3.2 GHz 2.0 GHz 1.1 - 2.5 GHz 2.9 GHz 3.2 GHz ?? GHz
Memory: 16 GB - 1600 MT/s 16 GB 4 GB 16 GB 48 GB - 2133 MT/s 16 GB
Version: v0.7.6 ~ Hina v0.7.2 ~ Hina v0.7.2 ~ Ushio v0.7.1 ~ Kurumi v0.7.6 ~ Kurumi v0.7.6 ~ Kurumi
Instruction Set: x64 AVX x64 AVX x64 SSE4.1 x64 AVX2 + ADX x64 AVX2 + ADX x64 AVX2 + ADX
25,000,000 3.895 17.207 11.739 1.865 1.656 1.539
50,000,000 8.761 39.049 26.289 4.102 3.591 3.309
100,000,000 19.338 87.626 65.147 9.007 7.916 7.546
250,000,000 57.716 277.711 192.473 25.444 21.959 20.011
500,000,000 132.171 587.516 493.551 56.566 48.781 47.37
1,000,000,000 304.784 1,350.868   130.055 108.108 109.754
2,500,000,000 895.365 3,884.838     308.514 291.586
5,000,000,000         681.761  
10,000,000,000         1,535.329  
Credit: Oliver Kruse Tralalak Kaupo Karuse   yoyo

 

 

Mainstream Desktops:

Processor(s): Core i7 7700K Ryzen 7 1800X Core i7 8700K Core i7 9700K Core i9 9900K
Generation: Intel Kaby Lake AMD Zen Intel Coffee Lake Intel Coffee Lake Intel Coffee Lake
Cores/Threads: 4/8 8/16 6/12 8/8 8/16
Processor Speed: 4.9 GHz (OC) 3.7 GHz 4.9 - 5.0 GHz (OC) 4.6 GHz 4.7 GHz
Memory: 64 GB - 3200 MT/s 64 GB - 3000 MT/s 16 GB - 3600 MT/s 16 GB - 3600 MT/s 32 GB - 3600 MT/s
Program Version: v0.7.6 ~ Kurumi v0.7.6 ~ Yukina v0.7.6 ~ Kurumi v0.7.6 ~ Kurumi v0.7.6 ~ Kurumi
Instruction Set: x64 AVX2 + ADX x64 AVX2 + ADX x64 AVX2 + ADX x64 AVX2 + ADX x64 AVX2 + ADX
25,000,000 1.049 1.247 0.930 0.730 0.675
50,000,000 2.314 2.655 2.023 1.630 1.496
100,000,000 5.077 5.759 4.352 3.605 3.259
250,000,000 14.277 16.115 11.925 10.213 9.032
500,000,000 31.878 35.783 25.883 22.960 20.018
1,000,000,000 70.806 79.345 56.387 50.819 44.175
2,500,000,000 204.042 228.840 157.515 145.464 125.223
5,000,000,000 448.258 498.923     279.321
10,000,000,000 976.451 1,092.887      
Credit: Oliver Kruse   Nehal Prasad ji lcpd
Processor(s): Phenom II X3 720 Core i7 920 FX-8350 Core i7 4770K Core i7 5775C
Generation: AMD K10 Intel Nehalem AMD Piledriver Intel Haswell Intel Broadwell
Cores/Threads: 4/4 (unlock from 3/3) 4/8 8/8 4/8 4/8
Processor Speed: 2.8 GHz 3.5 GHz (OC) 4.0 GHz 4.0 GHz (OC) 3.8 GHz (OC)
Memory: 12 GB - 1333 MT/s 12 GB - 1333 MT/s 32 GB - 1600 MT/s 32 GB - 2133 MT/s 16 GB - 2400 MT/s
Program Version: v0.7.6 ~ Kasumi v0.7.5 ~ Ushio v0.7.6 ~ Miyu v0.7.6 ~ Airi v0.7.1 ~ Kurumi
Instruction Set: x64 SSE3 x64 SSE4.1 x64 AVX + XOP x64 AVX2 x64 AVX2 + ADX
25,000,000 9.357 5.046 3.239 1.524 1.730
50,000,000 19.678 11.117 7.167 3.365 3.940
100,000,000 43.794 24.855 15.700 7.527 8.739
250,000,000 127.530 73.794 43.787 20.766 25.073
500,000,000 283.572 164.814 97.843 46.358 56.343
1,000,000,000 648.422 375.974 219.344 102.451 125.967
2,500,000,000 1,832.422 1,066.704 633.021 291.632 369.738
5,000,000,000     1,408.939 645.998  
10,000,000,000          
Credit:         André Bachmann

 

 

High-End Desktops:

Processor(s): Core i7 5820K Core i7 5960X Threadripper 1950X Core i9 7900X Core i9 7940X
Generation: Intel Haswell Intel Haswell AMD Threadripper Intel Skylake X Intel Skylake X
Cores/Threads: 6/12 8/16 16/32 10/20 14/28
Processor Speed: 4.5 GHz (OC) 4.0 GHz (OC) 3.5 - 3.7 GHz 4.3/4.0/3.6 GHz* (3.0 GHz cache) 4.6/4.0/3.6 GHz* (2.8 GHz cache)
Memory: 32 GB - 2400 MT/s 64 GB - 2133 MT/s 128 GB - 3000 MT/s 128 GB - 3600 MT/s 128 GB - 3466 MT/s
Program Version: v0.7.3 ~ Airi v0.7.6 ~ Airi v0.7.6 ~ Yukina v0.7.6 ~ Kotori v0.7.6 ~ Kotori
Instruction Set: x64 AVX2 x64 AVX2 x64 AVX2 + ADX x64 AVX512-DQ x64 AVX512-DQ
25,000,000 1.287 0.812 0.747 0.522 0.520
50,000,000 2.499 1.942 1.516 1.117 1.052
100,000,000 5.401 4.072 3.203 2.362 2.177
250,000,000 14.732 10.991 8.733 6.209 5.409
500,000,000 32.294 23.929 19.139 13.204 11.412
1,000,000,000 71.225 52.768 42.345 28.827 24.232
2,500,000,000 200.323 149.365 119.526 79.854 66.592
5,000,000,000 443.543 330.414 266.066 178.786 147.719
10,000,000,000   722.456 579.380 394.887 323.079
25,000,000,000     1,629.994 1,119.634 911.097
Credit: Sean Heneghan   Oliver Kruse    

*All-core non-AVX/AVX/AVX512 CPU frequency.

 

 

Multi-Processor Workstation/Servers:

 

Due to high core counts and the effects of NUMA (Non-Uniform Memory Access), performance on multi-processor systems is extremely sensitive to various settings. Therefore, these benchmarks may not be entirely representative of what the hardware is capable of.

Processor(s): Xeon E5-2683 v3 Xeon E5-2687W v4 Xeon E5-2696 v4 Xeon E7-8880 v3 Epyc 7601 Xeon Gold 6130F Xeon Platinum 8124M
Generation: Intel Haswell Intel Broadwell Intel Broadwell Intel Haswell AMD Naples Intel Skylake Purley Intel Skylake Purley
Sockets/Cores/Threads: 2/28/56 2/24/48 2/44/88 4/64/128 2/64/128 2/32/64 2/36/72
Processor Speed: 2.03 GHz 3.0 GHz 2.2 GHz 2.3 GHz 2.2 GHz 2.1 GHz 3.0 GHz
Memory: 128 GB - ??? 64 GB 768 GB - ??? 2 TB - ??? 256 GB - ?? 256 GB - ?? 137 GB - ??
Program Version: v0.6.9 ~ Airi v0.7.6 ~ Kurumi v0.7.1 ~ Kurumi v0.7.1 ~ Airi v0.7.3 ~ Yukina v0.7.3 ~ Kotori v0.7.5 ~ Kotori
Instruction Set: x64 AVX2 x64 AVX2 + ADX x64 AVX2 + ADX x64 AVX2 x64 AVX2 + ADX x64 AVX512-DQ x64 AVX512-DQ
25,000,000 0.907 0.490 0.715 1.176 2.459 1.150 0.540
50,000,000 1.745 1.072 1.344 2.321 4.347 1.883 0.981
100,000,000 3.317 2.303 2.673 4.217 6.996 3.341 1.905
250,000,000 8.339 6.196 6.853 8.781 14.258 7.731 5.085
500,000,000 17.708 13.046 14.538 15.879 24.930 15.346 10.372
1,000,000,000 37.311 27.763 31.260 32.078 47.837 31.301 21.217
2,500,000,000 102.131 76.202 84.271 78.251 111.139 82.871 55.701
5,000,000,000 218.917 165.046 192.889 164.157 228.252 179.488 118.151
10,000,000,000 471.802 356.487 417.322 346.307 482.777 387.530 247.928
25,000,000,000 1,511.852 1,006.131 1,186.881 957.966 1,184.144 1,063.850  
50,000,000,000   2,202.558 2,601.476 2,096.169      
100,000,000,000     6,037.704 4,442.742      
250,000,000,000       17,428.450      
Credit: Shigeru Kondo Cameron Giesbrecht "yoyo" Jacob Coleman Dave Graham Jacob Coleman
Processor(s): Xeon X5482 Xeon E5-2690
Generation: Intel Penryn Intel Sandy Bridge
Sockets/Cores/Threads: 2/8/8 2/16/32
Processor Speed: 3.2 GHz 3.5 GHz
Memory: 64 GB - 800 MT/s 256 GB - ???
Program Version: v0.7.2 ~ Ushio v0.7.5 ~ Nagisa v0.6.2/3 ~ Hina
Instruction Set: x64 SSE4.1 x64 AVX
25,000,000 4.548 4.248 2.283
50,000,000 9.779 9.148 4.295
100,000,000 20.834 19.580 8.167
250,000,000 60.049 56.226 20.765
500,000,000 134.978 126.448 42.394
1,000,000,000 308.679 286.903 89.920
2,500,000,000 874.588 824.820 239.154
5,000,000,000 1,946.683 1,836.808 520.977
10,000,000,000 4,317.677 4,000.065 1,131.809
25,000,000,000     3,341.281
50,000,000,000     7,355.076
Credit:     Shigeru Kondo

 

 

Fastest Times:

The full chart of rankings for each size can be found here:

These fastest times may include unreleased betas.


Got a faster time? Let me know: a-yee@u.northwestern.edu

Note that I usually don't respond to these emails. I simply put them into the charts which I update periodically (typically within 2 weeks).

 

 

Performance Tips:

 

Decimal Digits of Pi - Times in Seconds

Core i9 7940X @ 3.7 GHz AVX512

Memory Frequency: 2666 MT/s 3466 MT/s
25,000,000 0.839 0.758
50,000,000 1.424 1.338
100,000,000 2.701 2.425
250,000,000 6.489 5.877
500,000,000 13.307 11.917
1,000,000,000 27.913 24.915
2,500,000,000 76.837 68.322
5,000,000,000 168.058 148.737
10,000,000,000 365.047 322.115
25,000,000,000 1,037.527 916.039

High core count Skylake X processors are known to be heavily bottlenecked by memory bandwidth.
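To put numbers on the table above, here is a small Python sketch that computes how much faster each run was with the higher memory frequency. The timings are copied from the table; the variable and function names are just for illustration.

```python
# Times in seconds from the table above: (2666 MT/s, 3466 MT/s).
TIMES = {
    25_000_000: (0.839, 0.758),
    50_000_000: (1.424, 1.338),
    100_000_000: (2.701, 2.425),
    250_000_000: (6.489, 5.877),
    500_000_000: (13.307, 11.917),
    1_000_000_000: (27.913, 24.915),
    2_500_000_000: (76.837, 68.322),
    5_000_000_000: (168.058, 148.737),
    10_000_000_000: (365.047, 322.115),
    25_000_000_000: (1037.527, 916.039),
}


def speedups(times):
    """Map each size to (time at 2666 MT/s) / (time at 3466 MT/s)."""
    return {digits: slow / fast for digits, (slow, fast) in times.items()}


for digits, ratio in speedups(TIMES).items():
    print(f"{digits:>14,} digits: {ratio:.3f}x faster at 3466 MT/s")
```

For these numbers the ratios all land between roughly 1.06x and 1.13x, from a memory clock that is about 1.30x faster (3466 / 2666) on the same processor.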

Memory Bandwidth:

 

Because of the memory-intensive nature of computing Pi and other constants, y-cruncher needs a lot of memory bandwidth to perform well. In fact, the program has been noticeably memory-bound on nearly all high-end desktops since 2012 as well as the majority of multi-socket systems since at least 2006.

 

Recommendations:

Don't be surprised if y-cruncher exposes instabilities that other applications and stress-tests do not. y-cruncher is unusual in that it simultaneously places a heavy load on both the CPU and the entire memory subsystem.

 

 

 

Parallel Performance:

 

y-cruncher has a lot of settings for tuning parallel performance. By default, it makes a best effort to analyze the hardware and pick the best settings. But because of the virtually unlimited combinations of processor topologies, it's difficult for y-cruncher to get every setting right on every system. So sometimes the best performance can only be achieved with manual settings.

*These are advanced settings that cannot be changed if you're using the benchmark option in the console UI. To change them, you will need to either run benchmark mode from the command line or use the custom compute menu.

 

Load imbalance is a fairly common problem in y-cruncher. The usual causes are:

  1. The number of logical cores is not a power-of-two.
  2. The cores are not homogeneous. Common reasons include:
    • The cores are clocked at different speeds.
    • The cores have access to different amounts of memory bandwidth due to an imbalanced NUMA topology.
    • The cores are different generation cores hidden behind a virtual machine.
  3. CPU-intensive background processes are interfering with y-cruncher's ability to use all the hardware. This applies to all forms of system jitter.
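A toy model helps show why a core count that isn't a power of two can hurt. The sketch below assumes a divide-and-conquer workload that splits into a power-of-two number of equal-sized tasks, run in rounds with no work stealing. That assumption is mine and is not a description of y-cruncher's actual scheduler; `efficiency` and `tasks_for` are hypothetical names.

```python
import math


def tasks_for(cores):
    """Smallest power of two that is >= cores (one binary split per level)."""
    return 1 << (cores - 1).bit_length()


def efficiency(cores, tasks):
    """Fraction of core-time spent on useful work.

    Equal-sized tasks are run in rounds of `cores` at a time, so the last
    round may leave some cores idle.
    """
    rounds = math.ceil(tasks / cores)
    return tasks / (rounds * cores)


for cores in (4, 6, 8, 12, 16):
    tasks = tasks_for(cores)
    print(f"{cores:>2} cores, {tasks:>2} tasks: {efficiency(cores, tasks):.0%} busy")

# Splitting the same work into many more, smaller tasks shrinks the idle tail.
print(f" 6 cores, 64 tasks: {efficiency(6, 64):.0%} busy")
```

In this model, 6 cores running 8 equal tasks need two rounds and sit idle a third of the time, while 8 cores finish in one round. Breaking the work into many more tasks than cores recovers most of the loss, which is one reason task decomposition settings can matter on such systems.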

 

Swap Mode:

 

This is probably one of the most complicated features in y-cruncher.

 

 

Known Issues:

 

Everything in this section is in the process of being re-verified and moved to: https://github.com/Mysticial/y-cruncher/issues

 

 

Performance Issues:


Algorithms and Developments:

 

FAQ:

 

Pi and other Constants:

 

Hardware and Overclocking:

 

Academia:

 

Programming:

 

Program Usage:

 

Other:

 

Links:

Here's some interesting sites dedicated to the computation of Pi and other constants:

 

Questions or Comments

Contact me via e-mail. I'm pretty good about responding unless it gets caught in my school's junk mail filter.