y-cruncher v0.1.0 - Benchmarks
By Alexander J. Yee
(Last updated: August 1, 2009)
π = 3.14159265358979323846...
RAM Only
Chudnovsky Formula
All times are in seconds and include base conversions.
| Processor(s): | 2.8 GHz Pentium D 920 Presler | 2.0 GHz Core 2 Duo T7200 Merom | 2.67 GHz Core i7 920 Bloomfield (overclocked to 3.2 GHz) | Dual 3.2 GHz Quad-Core Xeon X5482 Harpertown |
|---|---|---|---|---|
| Memory: | 3 GB DDR2 533 MHz (dual channel) | 4 GB DDR2 667 MHz (dual channel) | 6 GB DDR3 1600 MHz (triple channel) | 64 GB DDR2 FB-DIMM 800 MHz (quad channel) |
| Courtesy Of: | Alexander Yee | Raymond Chan | Johnny Sun | "Nagisa" - my gaming rig |
| Version: | v0.1.0.6013 (x86 SSE3) | v0.1.0.6013 (x64 SSE3) | v0.1.0.6013 (x64 SSE3) | v0.1.0.6351 (x64 SSE3 - AV*) |

| Decimal Digits | Pentium D: 1 thread | Pentium D: 2 threads | Pentium D: Scaling | Core 2 Duo: 1 thread | Core 2 Duo: 2 threads | Core 2 Duo: Scaling | Core i7: 1 thread | Core i7: 8 threads | Core i7: Scaling | Dual Xeon: 1 thread | Dual Xeon: 8 threads | Dual Xeon: Scaling |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1,000,000 | 4.240 | 3.179 | 1.33 | 2.229 | 1.380 | 1.62 | 1.169 | 0.499 | 2.34 | 1.375 | 0.579 | 2.37 |
| 2,000,000 | 9.048 | 7.086 | 1.28 | 4.954 | 2.950 | 1.68 | 2.606 | 0.889 | 2.93 | 3.001 | 1.063 | 2.83 |
| 5,000,000 | 27.988 | 20.838 | 1.34 | 14.430 | 8.483 | 1.70 | 7.551 | 2.340 | 3.23 | 8.734 | 2.484 | 3.52 |
| 10,000,000 | 67.706 | 46.700 | 1.45 | 32.320 | 19.57 | 1.65 | 16.817 | 4.882 | 3.44 | 19.562 | 4.750 | 4.12 |
| 20,000,000 | 154.935 | 105.123 | 1.47 | 71.340 | 40.127 | 1.78 | 36.815 | 10.452 | 3.52 | 43.406 | 9.110 | 4.76 |
| 50,000,000 | 467.09 | 297.864 | 1.56 | 207.272 | 115.285 | 1.80 | 105.831 | 28.033 | 3.78 | 125.672 | 22.375 | 5.62 |
| 100,000,000 | 1060.7 | 643.124 | 1.65 | 460.481 | 255.772 | 1.80 | 235.248 | 59.873 | 3.93 | 279.485 | 46.094 | 6.06 |
| 200,000,000 | | | | 1017.690 | 571.168 | 1.78 | 519.479 | 128.403 | 4.05 | 617.657 | 97.421 | 6.34 |
| 500,000,000 | | | | | | | 1478.928 | 365.200 | 4.05 | 1759.440 | 271.641 | 6.48 |
| 1,000,000,000 | | | | | | | | | | 3,861 | 594.062 | 6.50 |
| 2,000,000,000 | | | | | | | | | | 8,473 | 1276.34 | 6.64 |
| 5,000,000,000 | | | | | | | | | | 23,326 | 3523.2 | 6.62 |
| 10,000,000,000 | | | | | | | | | | | 7,801 | -- |
*AV stands for "Author's Version". AV builds are essentially my test builds, with extra redundancy checks as well as debugging information.
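The Scaling columns are consistent with the 1-thread time divided by the multi-thread time. Here is a minimal sketch of that arithmetic, assuming that definition and using three rows from the Dual Xeon X5482 columns. The `scaling` helper and the `times` dictionary are illustrative names only, not part of y-cruncher.

```python
# Times in seconds, copied from the Dual Xeon X5482 columns of the table.
# Each entry maps decimal digits -> (1-thread time, 8-thread time).
times = {
    100_000_000: (279.485, 46.094),
    1_000_000_000: (3861.0, 594.062),
    5_000_000_000: (23326.0, 3523.2),
}


def scaling(t_single, t_multi):
    """Speedup of the multi-threaded run over the single-threaded run."""
    return t_single / t_multi


for digits, (t1, tn) in times.items():
    print(f"{digits:>13,} digits: {scaling(t1, tn):.2f}x")
```

Rounded to two decimals, these come out to 6.06, 6.50, and 6.62, matching the table.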
| Pi - 1 billion digits (10 minutes) | Pi - 10 billion digits (2 hours, 10 minutes) |
|---|---|
| v0.1.0.6376 (x64 SSE3 - AV) | v0.1.0.6376 (x64 SSE3 - AV) |
| Dual 3.2 GHz Quad-Core Xeon X5482 Harpertown | Dual 3.2 GHz Quad-Core Xeon X5482 Harpertown |
| 64 GB DDR2 FB-DIMM 800 MHz (quad channel) | 64 GB DDR2 FB-DIMM 800 MHz (quad channel) |
A few comments:
- y-cruncher scales better on x64 than x86 for small computations. On larger computations, x86 tends to scale better because it is slower and less bandwidth limited.
- On Core i7, y-cruncher appears to scale slightly past 4x on its 4 cores with Hyper-Threading enabled.
- On the entire Core 2 Quad family (which includes Kentsfield, Yorkfield, Clovertown, and Harpertown), y-cruncher is somewhat limited by memory bandwidth and has difficulty scaling past 80% multi-core efficiency when running x64 (x86 is fine because it's too slow to be memory bottlenecked).
- Version 0.2.1 includes a few bandwidth-reducing optimizations and appears to achieve 90% multi-core efficiency on Nagisa for x64 (more than 7x scaling across 8 cores).
- On a side note, I can easily get Nagisa to scale to 95% multi-core efficiency (7.6x scaling) if I clock her down to 2.4 GHz. But of course, that's cheating... Goes to show how desperately Intel needed to get rid of that Front Side Bus for some Nehalem Core i7 awesomeness...
- Going back to a few months before we built Nagisa, I was fortunate enough to get my hands on a 2.4 GHz Core 2 Quad Kentsfield to play with and run some tests on it.
- Based on those tests, I was able to anticipate that a dual Harpertown setup would be severely bandwidth limited. (But still better than dual Opteron Barcelonas...)
- So when Raymond and I had to choose between the Xeon X5482 (3.2 GHz, 1600 FSB) and Xeon X5470 (3.33 GHz, 1333 FSB), it was an easy decision to sacrifice some clock speed for the much-needed 20% boost in memory bandwidth. Hence we chose the X5482...
- The X5492 (3.4 GHz, 1600 FSB) was also an option, but was a little beyond our budget... Not to mention that the extra 200 MHz wouldn't help much in an already bandwidth-limited system.
- The QX9775 (Core 2 Extreme, 3.2 GHz, 1600 FSB, unlocked multiplier) was another option, but our motherboard didn't support overclocking... Also a little beyond our budget...
- In retrospect, the X5470 could be BSEL modded to 4.0 GHz, 1600 FSB... But considering that we needed absolute stability under months of sustained 100% CPU load (which is much more than your average overclocker needs)... That probably wouldn't have been a good idea...
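The efficiency figures in these comments can be reproduced with a little arithmetic. This is a minimal sketch, assuming multi-core efficiency means scaling divided by the number of physical cores (which matches the 95% = 7.6x figure for 8 cores). It also assumes memory bandwidth scales linearly with the FSB rate. The function names are illustrative only.

```python
def multicore_efficiency(scaling, cores):
    """Fraction of ideal linear speedup actually achieved."""
    return scaling / cores


def scaling_for(efficiency, cores):
    """Speedup implied by a given multi-core efficiency."""
    return efficiency * cores


CORES = 8  # dual quad-core Xeon X5482 ("Nagisa")

# Best 8-thread scaling in the v0.1.0 table (2,000,000,000 digits): 6.64x.
print(f"v0.1.0 best case: {multicore_efficiency(6.64, CORES):.0%}")

# Scaling implied by the 90% and 95% efficiency figures.
print(f"90% efficiency:   {scaling_for(0.90, CORES):.1f}x")
print(f"95% efficiency:   {scaling_for(0.95, CORES):.1f}x")

# Assumption: memory bandwidth scales linearly with the FSB transfer rate.
fsb_gain = 1600 / 1333 - 1
print(f"1600 vs 1333 FSB: +{fsb_gain:.0%} bandwidth")
```

Under those assumptions, the best v0.1.0 result works out to about 83% efficiency, and the 1600 FSB parts give roughly a 20% bandwidth advantage over the 1333 FSB X5470.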