y-cruncher - Version History

(Last updated: October 14, 2017)

 

 


Version History:

Download older versions here.

Release Date Version   Changes

October 14, 2017 v0.7.4.9477

Windows + Linux
 

New Features:

  • Stress-Tester Revamp:
    • New tests: BBP and SFT. These two tests are designed to stay in cache and avoid the memory bottleneck that plagues the Skylake X processor line.
    • Full control has been added to choose which logical cores to test and which to leave idle.
    • On Linux, thread pinning is now more robust to unusual hardware configurations like disabled cores.
    • Tests now display how much memory is actually used.
    • Each test now shows a slider indicating whether it is CPU-bound or memory-bound.
    • The stress-tester now has support for configuration files.

Fixes:

  • Fixed a bug that caused excessive memory usage with large task decompositions. It is now possible to do 25 billion digit Pi benchmarks within 128GB of memory with a lot of threads.
  • Fixed a subset of the BufferTooSmallException errors that occur in swap mode computations using a very small amount of memory. These are caused by incorrect pre-computation memory calculations. A more comprehensive fix is slated for the next feature release.

Retunings:

  • The "00-x86", "04-P4P", and "05-A64 ~ Kasumi" binaries have been re-tuned for AMD Phenom II.
  • "16-KNL" and "17-SKX" have been retuned for smaller caches.
  • Tuning parameters for most of the other binaries have been updated.

Optimizations:

  • Bulldozer: ~6%
  • Haswell: ~4%
  • Skylake: ~5%
  • Ryzen: ~6%
  • Skylake X: ~10%

September 14, 2017 v0.7.3.9475

Windows + Linux
 

Fixes:

  • Fixed a bug that may cause RAM-only computations of e to fail with "Memory Allocation Failed".

August 15, 2017 v0.7.3.9474

Windows + Linux
 

Fixes:

  • Fixed a bug that prevented the I/O benchmark from being run from a config file via command line.
  • Fixed a bug in the stress-tester that may cause improper thread binding.
  • Errors during stress-testing will be printed out immediately.
  • Speculative fix for bugged NUMA detection on AMD Threadripper and Epyc on Linux.

July 12, 2017 v0.7.3.9472

Windows
 

Fixes:

  • Fixed a bug in Windows involving processor groups that prevents the stress-tester and the Push Pool framework from setting thread affinities on AMD Epyc systems with exactly 128 vcores. (Thanks to Dave Graham for reporting this.)

July 6, 2017 v0.7.3.9471

Windows + Linux
 

New Features:

  • 17-SKX ~ Kotori: A new binary tuned for Skylake Purley processors with full-throughput AVX512.

  • 16-KNL: An untuned AVX512 binary that will run on Knights Landing Xeon Phi host processors.

  • Configuration Files:
    • The Custom Compute and I/O Benchmark menus now have support for saving to and loading from configuration files.
    • The functionalities for these menus can also be triggered directly from the command line with a configuration file.

  • Memory Allocator Improvements:
    • New memory allocators that support node interleaving. This may improve performance on NUMA systems.
    • Allocators now have sub-options that can be selected.
    • The memory allocator can now be set with the command line options.
    • On Windows, it is now possible to lock pages without using large pages.

Fixes:

  • Improved detection of available memory in Linux. In the past, it only detected unused pages. Now it detects all available memory, including pages used for caching which can be released if needed.

  • Invalid command-line options will now terminate the program instead of being silently skipped.

  • Fixed a bug in swap mode that causes the primary formula for the Euler-Mascheroni Constant to require much more memory than is necessary. This bug was introduced when fixing the "working memory is too small" bug in v0.7.2.

  • Fixed an issue in the Custom Compute swap mode menu that may cause it to fail an internal assertion. This is also related to the Euler-Mascheroni Constant memory calculations.
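
To make the distinction between "unused" and "available" memory on Linux concrete, here is a minimal sketch that parses text in the format of /proc/meminfo. The kernel (3.14 and later) reports both MemFree (unused pages) and MemAvailable (an estimate that includes reclaimable cache). Whether y-cruncher reads these particular fields is an assumption, not something stated in these notes; the function name and the sample values are made up for illustration.

```python
def meminfo_kib(text: str) -> dict[str, int]:
    """Parse /proc/meminfo-style text into {field: KiB}."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields

# Made-up sample values, in the format of /proc/meminfo.
SAMPLE = """MemTotal:       16314380 kB
MemFree:         1204512 kB
MemAvailable:   11873920 kB
Cached:         10412288 kB
"""
info = meminfo_kib(SAMPLE)
print("unused:", info["MemFree"], "KiB; available:", info["MemAvailable"], "KiB")
```

On a real system the same function could be fed the contents of /proc/meminfo instead of the sample string.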

Other:

  • The dynamically-linked Linux binaries now have a dependency on libnuma due to the new node-interleaving allocators.
  • The concept of local and global "Min I/O Size" has been renamed and replaced with "Bytes per Seek".

 

The new features in this release may seem a bit loaded in terms of their magnitude given that it's only been 4 months since the last major release. But in reality, most of the stuff has been a work-in-progress for a long time.


June 3, 2017 v0.7.2.9469

Windows + Linux
 

Fixes:

  • Fixed a serious bug in the Push Pool that could cause failures when limiting the # of threads. This bug was introduced in one of the many refactors between v0.7.1 and v0.7.2.

March 14, 2017 v0.7.2.9468

Windows + Linux
 

The refactorings from v0.7.1 continue into this release. Other than that, there are few user-visible changes.

 

Due to the time-constraints with trying to make this happen for Pi Day, this release is not as well tested as it should've been. So don't be surprised if stuff breaks.

 

Fixes:

  • The Linux binaries will now properly read the CPU topology on multi-socket systems.
  • Fixed an integer overflow bug in I/O benchmark for the x86 binaries.
  • Fixed a corner case where computations of the Euler-Mascheroni Constant may fail with, "Working memory is too small."

New Features:

  • 17-ZD1 ~ Yukina: A new binary tuned for AMD Zen processors.

  • In Windows, y-cruncher will automatically create a minidump file if it crashes. This should make it easier to debug crashes which cannot be reproduced locally in the development environment.

  • The BBP app now has the option to pin all threads to different cores. This solves the problem of imbalanced processor groups.

Optimizations:

  • Minor speedups for most processors:
    • Penryn: ~6%
    • Nehalem: ~7%
    • Sandy Bridge: ~5%
    • Bulldozer: ~7%
    • Haswell: ~7%
    • Skylake: ~5%
    • Zen: ~14%

Retunings:

  • The default "Min IO" parameter has been raised to 1 MB to keep in line with modern hard drives.

Other:

  • The binaries have been renamed. Instead of using the name of an instruction set, they now use a year and an acronym for the processor architecture. This was done because there are simply too many instruction sets and newer ones don't necessarily imply the existence of all the older ones.

September 16, 2016 v0.7.1.9466

Windows + Linux
 

Fixes:

  • Fixed a bug in swap mode that causes objects to under-allocate. This sometimes leads to assertion failures about writing beyond the end of a swap file. This bug dates all the way back to the "risky" refactorings from v0.6.6. This took almost 2 years to catch because it only seems to affect the Euler-Mascheroni Constant.

For what it's worth, v0.7.1 has been surprisingly stable given the sheer magnitude of the internal changes.


May 16, 2016 v0.7.1.9465

Windows + Linux
 

This is the first version since v0.6.1 that is dedicated primarily to paying back technical debt. While there are no major functional changes, expect to see a lot of minor differences from v0.6.9 which are remnants of the internal refactorings.

 

New Features:

  • Spot Checking: For computations of major constants, the digits are automatically spot-checked against a table of known digits. This was originally an internal feature meant to streamline QC testing. But it turned out to be useful enough to enable publicly. This feature completely replaces the ad hoc digit-checker used for Pi benchmarks.

  • x64 ADX ~ Kurumi: A new binary tuned for desktop Skylake processors. It utilizes the new add-with-carry instructions. This binary will also run on Broadwell processors.

  • The x86 binary (no SSE) is back. It disappeared for all of v0.6.x for performance reasons, but it's back for v0.7.1. The binary won't be able to run on any old processors anyway since Windows Vista or later is a requirement. But it's there for the purpose of comparing the effects of the various instruction sets.

  • Disk I/O buffer sizes are now configurable. In the past, they were locked to 64MB/path. This resulted in suboptimal performance when the path(s) were themselves a RAID of multiple drives.

  • When entering the number of digits, you can now specify suffixes. (e.g. "25m", "10b")

  • y-cruncher will now attempt to use large pages and lock them in memory to prevent destructive disk swapping by the OS.
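
As a rough illustration of the digit-count suffixes mentioned above ("25m", "10b"), the sketch below parses such strings into integers. The function name `parse_digit_count` and the full suffix table (k/m/b/t as decimal powers) are assumptions for illustration; the notes only give "m" and "b" as examples and do not document the actual parser.

```python
# Hypothetical helper: not y-cruncher's actual input parser.
# Assumed suffixes: k = thousand, m = million, b = billion, t = trillion.
SUFFIXES = {"k": 10**3, "m": 10**6, "b": 10**9, "t": 10**12}

def parse_digit_count(text: str) -> int:
    """Convert strings such as "25m" or "10b" into a digit count."""
    s = text.strip().lower().replace(",", "")
    if not s:
        raise ValueError("empty input")
    multiplier = 1
    if s[-1] in SUFFIXES:
        multiplier = SUFFIXES[s[-1]]
        s = s[:-1]
    if not (s.isascii() and s.isdigit()):
        raise ValueError(f"not a digit count: {text!r}")
    return int(s) * multiplier

print(parse_digit_count("25m"), parse_digit_count("10b"))
```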

HWBOT Integration Improvements:

  • Improved detection of the processor topology.
  • Added detection of the operating system version.
  • Added detection for the motherboard and memory configuration.
  • Added detection of the reference clock.
  • The validation files have been renamed so that they don't overwrite each other anymore.

Fixes:

  • Previously, y-cruncher would fail to pick up the username when running the binaries directly. This has been fixed.

  • Unicode is now properly handled everywhere in the program. Note that the ability to display unicode characters is still subject to the limitations of the console window.

  • Fixed a bug where checkpoint-restart would fail if any path had an equals character (=) in it.

  • The stress-tester and BBP app will now be able to use multiple processor groups on Windows.

Optimizations:

  • Global retunings for all processor-specific binaries. Expect both speedups and slowdowns across the board for all computations and on all processor targets. This is most noticeable on older processors.

  • A custom thread pool has been added to the parallel frameworks. This is the default for Linux. On Windows, it is the default when there is only one processor group.

  • Swap mode computations are slightly more aggressive with keeping things in memory.

  • The initial memory allocation is now parallelized.

Other:

  • y-cruncher no longer requires administrator privileges to run basic computations. Instead, it will request that it be re-run with elevation for the following features:
    • Swap mode in Windows: Requires "SeManageVolumePrivilege".
    • Large pages in Windows: Requires "SeLockMemoryPrivilege".
    • Locked pages in Linux: Requires "CAP_IPC_LOCK".

  • In Windows, when you run y-cruncher by double-clicking, it will set the console window dimensions to 80 x 25. Windows has historically defaulted to a window size of 80 x 25. But in Windows 10, they increased this to 120 x 30 - which looks a bit weird.

  • The Custom Compute menu options have been rearranged a bit.

  • The licensing has been reworded to explicitly allow the use of y-cruncher for tech reviews even if they are for commercial purposes.

March 1, 2016 v0.6.9.9464

Windows + Linux
 

Fixes:

  • Checkpoint restart will now properly save and restore the parallel framework. Previously, it would always load the default framework upon resuming a computation from a checkpoint. This bug has existed since v0.6.8 when parallel frameworks were added.

  • The framework threads will also be saved and restored across checkpoints.

  • Fixed a bug where the program may improperly detect the number of logical processors on Windows Server 2012 R2. This may also apply to other versions of Windows as well. (Thanks to Mike A for reporting this and suggesting a fix.)

  • Fixed a bug in the x86 binaries that would prevent the I/O benchmark from using more than 4GB of disk.

  • Fixed a bug in the Digit Viewer where it fails to print new lines when counting digit frequencies.

Optimizations:

  • The threshold for defaulting to Cilk Plus on Windows has been increased from 33 to 65 logical cores.

December 5, 2015 v0.6.9.9462

Windows + Linux
 

New Features:

  • On Windows, partial support has been added for Processor Groups*:
    • The AVX, XOP, and AVX2 binaries are now able to detect all the logical cores in the system even if there are more than 64.
    • The AVX and AVX2 binaries will default to Cilk Plus when there is more than one processor group.
  • Support for Cilk Plus has been added for Linux.
  • The maximum task decomposition has been increased from 256 to 65,536.

*This feature has not been added to the older SSE binaries because it requires Windows 7. As of v0.6.8, y-cruncher maintains backwards compatibility with Windows Vista. But all the AVX binaries require Windows 7 SP1 anyway, so nothing is lost by using Win7-specific API calls.

 

Fixes:

  • The app will no longer crash when AVX is disabled on a processor that supports it. This was a Visual Studio bug that was fixed by upgrading to Visual Studio 2015.
  • The command line option, "-C:-1" for compressing to a single file has been fixed. This was due to the (incorrect) use of an unsigned integer parser which parses -1 as zero thereby disabling compression.
  • The dispatcher will select AVX2 instead of SSE3 for AMD Zen.
  • Swap mode computations may cause excessive disk swapping by the OS. This has been alleviated by reducing the default memory usage from 31/32 of available memory to 15/16.
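
The "-C:-1" fix above is easier to picture with a small sketch of how an unsigned-only parser can turn "-1" into zero. Both parser functions below are hypothetical stand-ins, not y-cruncher code; they only reproduce the failure mode described in the note.

```python
# Illustrative only: these parsers are hypothetical, not y-cruncher code.

def parse_unsigned(text: str) -> int:
    """Read leading decimal digits; any other character ends the number."""
    value = 0
    for ch in text:
        if ch not in "0123456789":
            break          # "-" is not a digit, so "-1" yields 0
        value = value * 10 + int(ch)
    return value

def parse_signed(text: str) -> int:
    """Same idea, but accept an optional leading minus sign."""
    if text.startswith("-"):
        return -parse_unsigned(text[1:])
    return parse_unsigned(text)

option = "-C:-1"
argument = option.split(":", 1)[1]   # the part after the colon: "-1"
print(parse_unsigned(argument), parse_signed(argument))
```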

Changes:

  • The Linux version is now available in both static and dynamic binaries. The static binaries are the most portable, but they lack Cilk Plus. The dynamic binaries support Cilk Plus, but have a dependency on Glibc-2.19 (and possibly others). This unfortunate DLL hell is because Intel refuses to provide Cilk Plus as a static library.
  • The ".out" extension for the Linux binaries has been removed. This was some very old legacy stuff that had something to do with GCC outputting "a.out" by default.
  • On Linux, Cilk Plus is the default for everything. This will improve the performance for small computations and on systems with many cores. But it may also cause minor performance regressions under some situations.

Optimizations:

  • On Windows, the binaries that support Cilk Plus will default to Cilk Plus when there are more than 32 logical cores.
  • Minor speedups for specific architectures.

Compiler Upgrades:

  • Windows: Visual Studio 2013 -> 2015
  • Linux: GCC 4.8 -> 5.1

May 7, 2015 v0.6.8.9461

Windows + Linux
 

Fixes:

  • Fixed a bug in the x86 binaries that was causing excessive memory usage.
  • Fixed a performance regression from v0.6.7 of up to 10% on older processors. This was the result of a bad refactor that accidentally replaced the size of the LLC (last level cache) with that of the L1 cache. Oops...
  • Fixed a bug in the BBP app where the compile-time CPU-dispatcher caused the "x64 SSE4.1 ~ Ushio" binary to use the "default" path instead of the SSE4.1 path.

March 17, 2015 v0.6.8.9460

Windows + Linux
 

Fixes:

  • Fixed a performance regression from v0.6.7 that could slow down Haswell processors by as much as 5%.
  • Fixed a bug in the x86 binaries that would cause stack corruption when calculating the memory requirements for a computation larger than ~500 billion digits. This bug has been there since v0.6.1.

March 14, 2015 v0.6.8.9458

Windows + Linux
 

New Features:

  • The BBP side-project from 5 years ago has been revived, rewritten, and integrated into y-cruncher.
  • The Digit Viewer has also been re-integrated into y-cruncher.
  • The multi-threading framework has been revamped. Now you can choose from several parallel computing frameworks. This feature is mostly experimental for now as the options vary by binary and are quite limited on Linux.

Fixes:

  • Fixed a rare but potentially serious bug in the basecase multiplication where a carryout may be missed. This was introduced in v0.6.6 when y-cruncher started using add-with-carry intrinsics. This only affected the 64-bit Windows binaries. The Linux versions are unaffected since GCC lacked these intrinsics.
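
For readers unfamiliar with add-with-carry chains, the following sketch adds two multi-limb integers using 64-bit limbs and shows where a carry-out can be lost. It is a generic illustration with hypothetical names; it is not the basecase multiplication code and does not reproduce the actual bug.

```python
# Sketch of an add-with-carry chain on 64-bit limbs (least significant first).
# Hypothetical illustration; y-cruncher's basecase multiply is not shown here.
MASK = (1 << 64) - 1

def add_limbs(a: list[int], b: list[int]) -> tuple[list[int], int]:
    """Add two equal-length limb arrays; return (sum limbs, carry-out)."""
    assert len(a) == len(b)
    out = []
    carry = 0
    for x, y in zip(a, b):
        total = x + y + carry
        out.append(total & MASK)
        carry = total >> 64      # 0 or 1; dropping this loses a carry
    return out, carry

limbs, carry_out = add_limbs([MASK, MASK], [1, 0])
print(limbs, carry_out)
```

If the caller ignores the returned carry-out, the result is silently short by 2^128 in this example, which is the kind of error that only shows up for specific operand values.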

Limits:

  • The theoretical limit of y-cruncher has been increased to 10^15 decimal digits for all constants.

Optimizations:

  • The Windows versions now use thread pools by default.
  • Other miscellaneous optimizations mostly affecting AMD processors.

Changes:

  • The concept of "threads" has been replaced with "Task Decomposition" and "Parallel Framework". In the past, a computation was run using N threads. Now it is run using framework X with a task decomposition of Y. Both the framework and the task decomposition can be manually set.

February 8, 2015 v0.6.7.9457

Windows + Linux
 

Swap Mode Improvements:

  • Swap mode computations will now create a folder for the swap files. When multiple paths are used, each folder will have a unique name. This makes manual backups easier.
  • A swap mode multiplication tester has been added to the advanced options. Anyone who is attempting a world record with y-cruncher should first run this tester to sanity check the program's ability to do arithmetic at the target precision.
  • The I/O Benchmark has been revamped:
    • The old benchmark for strided access was vulnerable to request coalescing which cannot happen in a real computation. This was producing inflated bandwidth numbers. This has been somewhat mitigated in this version.
    • The recommendations have been updated to be more relevant for newer hardware.*

*Based on the runtime stats of the 12.1 and 13.3 trillion digit Pi computations, the new recommendation is to have an IO/compute ratio of 2.0. (i.e. the disk bandwidth should be double the compute bandwidth.) But the reality is that this will be very difficult to achieve on a modern high-end processor using conventional hard drives. This is the unfortunate result of the ever increasing performance gap between CPU and disk.

 

For example, a stock i7 5960X would require around 6 GB/s of disk bandwidth. At 100 MB/s per hard drive, that would be 60+ drives assuming linear scaling. While it is easier to do it using SSDs, they are smaller in size. Furthermore, the sheer volume of writes that y-cruncher will issue means that if you want to use SSDs, you need to be willing to expend them like consumables.
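
The arithmetic behind that drive count can be sketched as follows. The compute bandwidth of roughly 3 GB/s is inferred here from the stated 6 GB/s requirement and the 2.0 ratio; it is not given directly in the notes. The function name and the use of 1 GB = 1000 MB are likewise assumptions.

```python
import math

def drives_needed(compute_gb_per_s: float, io_compute_ratio: float,
                  drive_mb_per_s: float) -> int:
    """Drives required to reach the target disk bandwidth, assuming linear scaling."""
    target_mb_per_s = compute_gb_per_s * io_compute_ratio * 1000
    return math.ceil(target_mb_per_s / drive_mb_per_s)

# ~3 GB/s compute (inferred), ratio 2.0, 100 MB/s per hard drive.
print(drives_needed(3.0, 2.0, 100.0))
```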

 

 

Fixes:

  • Fixed a serious bug that would cause large multiplications to hit an assertion failure under the right conditions.
  • In Linux, swap files are now created with only the necessary permissions. (read+write for owner)
    In the past, they were created with permissions 777 (everything). This was a huge oversight dating back to 2010 when the program was first ported to Linux.
  • Fixed a problem where swap mode in Linux wouldn't clean up the "pathcheck.ysf" file.
  • Fixed a minor console coloring problem.
  • Fixed a minor bug in swap mode that could cause suboptimal algorithm selection.
  • Fixed a bug in the multi-layer raid-file implementation that may cause the program to crash. This bug has existed since v0.6.1 and is extremely rare - requiring degenerate input data which may be impossible outside of a developer build.
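
As an illustration of the permissions fix above, this sketch creates a file that only its owner can read and write (mode 0600) instead of one open to everyone (0777). It uses the standard POSIX open flags through Python's `os` module; the file name is hypothetical, and this is not the program's actual file-creation code.

```python
import os
import stat
import tempfile

def create_private_file(path: str) -> None:
    """Create a file readable and writable by its owner only (mode 0600)."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    fd = os.open(path, flags, 0o600)   # the old behaviour corresponds to 0o777
    os.close(fd)

with tempfile.TemporaryDirectory() as tmp:
    swap_path = os.path.join(tmp, "example.ysf")   # hypothetical file name
    create_private_file(swap_path)
    mode = stat.S_IMODE(os.stat(swap_path).st_mode)

print(oct(mode))
```

Note that the process umask can only remove permission bits from the requested mode, so group and other access stay disabled regardless of the umask.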

Optimizations:

  • The XOP and AVX2 binaries are now faster.
  • The default "Min-IO Size" parameter has been reduced from 1MB to 256k.

Other:

  • On Windows, the stress tester now runs in the lowest possible priority to ensure responsiveness of CPU monitoring programs which also run in low priority.
  • On Windows, the default process priority has been reverted to "Below Normal" for the same reason. As always, this priority can be changed via Task Manager.
  • The minimum value for the "Min-IO Size" parameter has been reduced from 256k to 4k. SSDs as well as some hard drive configurations have very low average seek latencies which can benefit from a smaller Min-IO parameter.

December 21, 2014 v0.6.6.9452

Windows + Linux
 

Fixes:

  • Fixed a problem where the "Min I/O" parameter would be incorrectly set when resuming from a computation using more than one drive. This bug has performance consequences for large swap mode computations.
  • Fixed an issue that could prevent the program from detecting errors when performing swap mode multiplications. Anything that could allow errors to go undetected is serious since it means that a computation could finish with the wrong digits.

November 28, 2014 v0.6.6.9451

Linux
 

Fixed a problem reported by Matt Hesse where the Linux SSE3 binary would instantly fail on older Intel processors.

 

The problem is that the SSE3 binary was (incorrectly) compiled with "-march=barcelona" when it should have been "-mtune=barcelona". Unlike "-mtune", which only tunes the generated code for a processor, "-march" lets the compiler emit AMD-specific Advanced Bit Manipulation (ABM) instructions, which did not exist on Intel processors prior to Haswell.


November 19, 2014 v0.6.6.9450

Windows + Linux
 

Fixed a bad regression in swap mode where disk I/O errors would not stop the computation. Instead, they would be ignored and the computation would continue with corrupted data.

 

Blame this on bad refactoring and insufficient corner-case testing...


November 5, 2014 v0.6.6.9449

Windows + Linux
 

In memory of my grandfather who passed away last month. He loved numbers and is probably why I do too...

 

New Features:

  • Command Line Options

  • A "Stop on Error" option in the stress-tester to halt when an error is detected.

Critical Fixes:

  • Fixed an issue that can cause failures when using a non-power-of-two number of threads.
  • Fixed an issue that will cause most computations larger than 20 trillion digits to fail. This bug has existed since v0.6.1 and is caused by an off-by-one error in the creation of the precomputed twiddle factor tables.
  • Fixed an issue in the 32-bit versions that would cause Pi computations larger than 40 billion digits to fail the base convert verification. This was caused by an integer overflow in the "size_t" datatype when a 64-bit file offset should have been used instead. This bug has existed since v0.6.1 and was never caught because 32-bit y-cruncher has never been tested at such large sizes until now. (v0.6.1 does not support swap mode for Pi, so v0.6.2 is actually the earliest version that is affected.)
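
The "size_t" overflow described above can be simulated by masking an offset to 32 bits. The sketch below uses made-up block sizes purely to show the wraparound; it does not model how y-cruncher actually lays out its files or how 40 billion digits maps to a byte offset.

```python
# Simulate unsigned 32-bit and 64-bit arithmetic with masks.
U32 = (1 << 32) - 1
U64 = (1 << 64) - 1

def file_offset(block_index: int, block_bytes: int, mask: int) -> int:
    """Byte offset of a block, truncated to the width implied by mask."""
    return (block_index * block_bytes) & mask

# Hypothetical numbers: block 5000 of 1 MiB blocks lies past the 4 GiB mark.
wrapped = file_offset(5000, 1 << 20, U32)   # what a 32-bit size_t would hold
correct = file_offset(5000, 1 << 20, U64)   # what a 64-bit file offset holds
print(wrapped, correct)
```

The 32-bit result points at a location 4 GiB earlier in the file than intended, so the wrong data is read without any error being raised at the point of the overflow.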

Minor Fixes:

  • Fixed some crashes that may occur when using a very large number of threads.
  • Fixed an issue when attempting to print a line that is longer than 79 characters.
  • Fixed an issue in the I/O Benchmark that may cause parity failures under RAID 3.
  • Fixed another issue with detecting OS support for AVX.

Optimizations:

  • Swap mode computations now require ~10% less disk space than before.

  • Minor overall speedups.

Retunings:

  • In Windows, the default process priority has been changed to "Normal". It used to be "Below Normal".
  • Threads that perform disk I/O now run at the maximum priority that the OS will allow.
  • The program has been retuned in various ways. So expect some performance differences (both up and down) from the previous version.
  • The SSE3 binaries have been retuned in favor of larger computations. So they are faster for large computations, but slower for small ones. (similar to the SSE4.1 and AVX binaries in v0.6.3)

Internal Changes:

  • The compilers have been upgraded:
    • Intel Compiler (13 -> 14)
    • Visual Studio 2013 (Original -> Update 2)

August 23, 2014 v0.6.5.9444b
(fix 2)

Windows + Linux
 

Fixed a bug in the base conversion that may cause a computation in swap mode to enter an infinite loop.

(Credit to Yifang Sun for discovering this.)

 

This bug has existed since v0.6.1 and only occurs when the number of threads is small (1 or 2). The cause of the bug is a subtle design flaw in the handling of misaligned data sizes. The fix was to align the data and delete the flawed misalignment code.


July 24, 2014 v0.6.5.9443b
(fix 1)

Windows + Linux
 

Fixed a serious bug in the swap file implementation that may cause incorrect disk I/O leading to data corruption and ultimately an incorrect computation.

 

This bug only affects swap mode and is most likely to affect the Euler-Mascheroni Constant's primary algorithm.

 

The only change in this version is a single if-statement. Nevertheless, all bugs that affect the correctness of the computed digits are considered "serious" bugs that warrant an immediate patch and an unscheduled release.


May 25, 2014 v0.6.5.9442

Windows + Linux
 

New Features:

  • x64 AVX2 ~ Airi: A new binary tuned for Intel Haswell processors. It uses AVX2, FMA3, and BMI2 instructions.
  • The CPU dispatcher has been added to the Linux version.
  • Attempting to run a binary that is not compatible with the host will give a warning. An incompatible binary may still be able to run. But don't count on it.

Fixes:

  • Fixed the detection of AVX support by the OS. Previous versions only checked the OS version. It did not check if XSAVE was actually enabled to run AVX.
  • Units for memory have been changed to MiB/GiB/TiB etc... They have always been in binary, but the old labels were unclear as to whether it was binary or decimal.
  • The Linux version will attempt to reset the console colors to their defaults when the program exits.

Removed Features:

  • x64 SSE4.1 ~ Nagisa: This binary was tuned for my Harpertown workstation (Core 2 Penryn). But Core 2 is pretty old now and this binary has become redundant with the "x64 SSE4.1 ~ Ushio" binary. So not much is lost by dropping it. Getting rid of it also reduces the size of the download.

Internal Changes:

  • The entire CPU dispatching framework has been completely rewritten from scratch.
  • The underlying tool-chain has been completely revamped for this release. The AVX and AVX2 binaries use the Intel Compiler 13. The rest of them use Visual Studio 2013.
  • No optimizations have been done since the last release. But because of the compiler upgrades, there will be slight differences in performance from the previous version.

March 14, 2014 v0.6.4.9424

Windows + Linux
 

New Features:

  • x64 XOP ~ Miyu:
    • A new binary tuned for AMD Bulldozer line processors. It uses FMA4 and XOP instructions.
    • Up to 10% faster than the SSE3 binary. (usually about 8% faster for large computations)

Fixes:

  • Fixed a bug in the inverse square root code that may cause the routine to under-estimate the amount of memory that is needed. The result is memory corruption when the computation uses more than is allocated.

February 21, 2014 v0.6.3.9416b
(fix 1)


Windows + Linux
 

Bug fixes. One of which was bad enough to warrant patching v0.6.3 instead of waiting for v0.6.4.

 

Fixes:

  • Fixed a display issue where the secondary formula for Log(n) would show up as the "Primary Machin-Like Formula".
  • Fixed an issue in the Stress Tester where certain classes of soft-errors would not get reported.
  • Fixed a serious bug in the SSE3 binaries that would sometimes cause very large multiplications to fail.*

*This bug was introduced in v0.6.3 when the VST algorithm was refactored. I rewrote some of the processor-specific macros and screwed up the SSE3 version. The bug is rare and does not affect the SSE4.1 or AVX binaries.

 

This slipped through internal testing because it is rare and the SSE3 binaries are not tested as much as the SSE4.1 and AVX binaries.


December 29, 2013 v0.6.3.9415

Windows + Linux
 

The Digit Viewer has been (almost) entirely rewritten. It has been open-sourced on my GitHub.

 

New Constants:

  • Lemniscate Constant
  • Euler-Mascheroni Constant: Not really new to y-cruncher, but it's back after disappearing for two versions.

New Features:

  • The Digit Viewer can now count the # of occurrences of each digit.
  • The same digit counts are now logged in the validation files. This is useful for making sure that the process of writing the digits to disk is actually correct.
  • Errors are now more descriptive. This should aid in identifying whether issues are software or hardware related.
  • The validation files now contain a hash for the decimal digits. This can be used to verify that digits are correctly written to disk.
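
A minimal sketch of the two checks described above, digit counts and a hash of the decimal digits, might look like this. SHA-256 is an assumption chosen for illustration; these notes do not say which hash the validation files use, and the function name is hypothetical.

```python
import hashlib
from collections import Counter

def digit_stats(digits: str) -> tuple[dict[str, int], str]:
    """Return per-digit counts and a SHA-256 hex digest of the digit string."""
    counts = Counter(ch for ch in digits if ch in "0123456789")
    table = {d: counts.get(d, 0) for d in "0123456789"}
    digest = hashlib.sha256(digits.encode("ascii")).hexdigest()
    return table, digest

# First 20 decimal digits of Pi after the decimal point.
counts, digest = digit_stats("14159265358979323846")
print(counts)
print(digest)
```

Comparing both values against those recorded at computation time would catch a digit file that was truncated or corrupted while being written.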

Fixes:

  • The Linux versions now correctly use O_DIRECT for raw I/Os.
  • AMD Bulldozer line processors will now choose "x64 SSE3 ~ Kasumi" instead of "x64 AVX ~ Hina". This is because 256-bit AVX performance on Bulldozer and Piledriver processors is worse than SSE3.

Tweaks:

  • The "x64 SSE4.1 ~ Ushio" and "x64 AVX ~ Hina" binaries have been retuned. They are slightly faster for large computations (> 1 billion digits). But performance for small computations has been decreased slightly.
  • The VST algorithm has been refactored in preparation for future instruction sets. So there will be some performance differences.

Changes:

  • The VST algorithm warning for AVX-capable processors has been removed. Prime95 appears to be more stressful now that it has support for AVX.
  • Checkpoints are now overlapping. Old checkpoints are not destroyed until the new checkpoint has been made. Previously, there were small portions of time when there would be no checkpoint alive. So if an error occurred at just the right time, you'd be screwed.
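
The overlapping-checkpoint idea can be sketched with a common pattern: write the new checkpoint to a separate file, flush it to disk, and only then replace the old one. The file name, the JSON format, and the function are all hypothetical; the actual checkpoint format is not described in these notes.

```python
import json
import os
import tempfile

def save_checkpoint(directory: str, state: dict) -> str:
    """Write the new checkpoint fully before the old one is replaced."""
    final_path = os.path.join(directory, "checkpoint.json")   # hypothetical name
    temp_path = final_path + ".new"
    with open(temp_path, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())           # make sure the new data is on disk
    os.replace(temp_path, final_path)  # old checkpoint survives until this point
    return final_path

with tempfile.TemporaryDirectory() as tmp:
    save_checkpoint(tmp, {"step": 1})
    path = save_checkpoint(tmp, {"step": 2})
    with open(path) as f:
        restored = json.load(f)

print(restored)
```

At every moment either the old or the new checkpoint is intact on disk, at the cost of briefly holding both.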

June 16, 2013 v0.6.2.9316
June 30, 2013 v0.6.2.9322

Windows + Linux
 

This release of the v0.6.x series adds the swap modes and checkpoint restart.

 

New Features:

  • Huvent's Formula for Catalan's Constant.
  • Swap Modes and Checkpoint restart have been added for all constants that v0.6.2 supports.
  • Resuming from a checkpoint will automatically remove all non-checkpoint files leftover from an interrupted computation.

Note that the checkpoint-restart in this version (0.6.2) has more granularity than in earlier versions. Checkpoints can now be made deep into the Binary Splitting recursions of all the series-based constants. Therefore the average time between checkpoints is much smaller and less work is lost upon a failure.

 

Changes:

  • All versions are now compiled using Visual Studio 2012. There may be slight changes in performance for all versions except the "x64 AVX ~ Hina" binary.
  • (Internal) - v0.6.2 now has a C++11 dependency. All prior versions of y-cruncher were compilable with only C99.

Regressions:

  • Swap modes require more disk space than in v0.5.x. This is because the checkpoint-restart requires that new checkpoints are written before (or shortly after) the previous checkpoint is destroyed. The result is that most operations must be done out-of-place thus increasing memory usage.
    In v0.5.x, checkpoints were only done in places where disk usage was low. So the pressure of needing to preserve extra data was not felt. But in v0.6.2, checkpoints are everywhere - including places with high disk usage.

Fixes:

  • Fixed a bug where, in some scenarios, the program will halt with "Memory Allocation Failure".
  • Fixed some bugs that would cause small computations to error.
  • Fixed tons of other minor corner-case bugs.

Postponed features:

  • XOP and AVX2 support has been put off until I acquire the hardware.
  • Euler-Mascheroni Constant hasn't been started yet. (Although I should probably do this since that's what the "y" in "y-cruncher" stands for. Doesn't do much justice for a program to be missing its mascot. :P)
  • Lemniscate Constant is "extremely low priority".

One thing worth noting is that the error-correction in v0.6.x is less aggressive than in v0.5.x. Version 0.6.2 relies more heavily on the new checkpoint-restart system to recover from errors. So instead of attempting to correct errors on the fly, v0.6.2 will usually just terminate the computation - thereby forcing the user to resume from the last checkpoint.

 

Note that the error-detection is just as aggressive as before. It's only the error-correction that has been relaxed.


February 17, 2013 v0.6.1.9282

Windows Only
 

After such a long rewrite, this is the first of the v0.6.x series. Not all features are implemented yet. Most notably, the majority of the constants are still missing swap mode. But it's good enough for ram-only benchmarks.

 

This release is mostly to put an end to the nearly 2-year break of releases.

 

New Features:

  • More Detailed Output: The program will display much more detailed output during a computation. This feature used to be developer and private-version only. But it is now enabled in all public releases.
  • Component Stress-Tester: Runs individual algorithms in y-cruncher to stress different parts of the system.
  • I/O Benchmark: Benchmarks and evaluates a swap configuration for large computations.
  • Multi-layer Raid-File:
    • Hybrid RAID 0+3 swap-file management to allow for redundancy while preserving the multi-HD functionality.
  • Compilation Options: An option that displays how each binary has been compiled. Originally a developer-only feature for development purposes, it has been enabled in the public release to satisfy those who are curious.
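
The redundancy half of a RAID 0+3 style layout can be illustrated with XOR parity. This is only a minimal sketch and not y-cruncher's actual swap-file code: the function names, the block contents, and the "three data drives plus one parity drive" layout are all assumptions made for illustration.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_parity(data_blocks):
    """Parity block stored on the (hypothetical) dedicated parity drive."""
    return xor_blocks(data_blocks)

def rebuild_missing(surviving_blocks, parity):
    """Recover the one lost block from the survivors plus the parity block."""
    return xor_blocks(surviving_blocks + [parity])

# Three data stripes, one per hypothetical drive.
stripes = [b"3141", b"5926", b"5358"]
parity = make_parity(stripes)

# Pretend the second drive failed and rebuild its stripe.
recovered = rebuild_missing([stripes[0], stripes[2]], parity)
print(recovered)
```

Striping spreads the data over several drives for bandwidth, while the parity block is what allows a single failed drive to be reconstructed.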

Removed Features:

  • Batch Benchmark Pi
  • The Stress-Tester from v0.5.5 and earlier.
  • Support for x86 without SSE3.
  • Basic Swap Mode

Changes:

  • Privilege elevation is now required to run y-cruncher. This should put an end to all those file allocation problems (which have gotten worse in Windows 8 due to the different UAC settings).
  • Windows Vista or higher is now required to run y-cruncher.
  • "Advanced Swap Mode" renamed to simply "Swap Mode".
  • The division step at the end of each series computation has been separated and given its own timer.
  • The "Frequency Sanity Check" has been disabled.
  • The validation files now include detailed event logs of the computation.

Limits:

  • All constants have a hard-limit of 90 trillion digits.

Optimizations:

  • Faster Division
  • Faster Square Roots
  • Faster Base Conversion
  • Faster Multiplication for large products > 50 billion digits
  • Numerous other minor optimizations

Fixes:

  • Detection for AVX on Windows 8 has been fixed.

Missing Features: The following features are not complete and will not be in this release. All of these exist in v0.5.5, so use that if you wish to use these features.

  • Checkpoint Restart
  • Swap Modes for: e, Pi, Log(n), Zeta(3), Catalan's Constant
  • Euler-Mascheroni Constant

2013 ??? 0.6.x  

Version 0.6.x will be the first major rewrite of y-cruncher. The following changes are planned. Everything is subject to change.

 

Due to the large feature set, these will be rolled out incrementally over multiple versions of v0.6.x.

 

New Features:

  • (v0.6.1) Component Stress-Tester: Runs individual algorithms in y-cruncher to stress different parts of the system.
  • (v0.6.1) I/O Benchmark: Benchmarks and evaluates a swap configuration for large computations.
  • (v0.6.1) Multi-layer Raid-File:
    • Hybrid RAID 0+3 swap-file management to allow for redundancy while preserving the multi-HD functionality.
    • Failed drives can be replaced on the fly without rolling back to a checkpoint.
  • (v0.6.3) Lemniscate Constant: Arc-length of a lemniscate = 5.24411510858423962...
  • (v0.6.3) Swap Mode: Added for the Euler-Mascheroni Constant.

Removed Features:

  • Batch Benchmark Pi
  • The current Stress-Tester
  • Support for x86 without SSE3.
  • Basic Swap Mode
  • Basic Swap Mode optimizations/algorithms*

Changes:

  • (v0.6.1) "Advanced Swap Mode" renamed to simply "Swap Mode".
  • (v0.6.2) Ramanujan's formula for Catalan's Constant will be replaced with Huvent's BBP. (and will support Swap Mode)
  • (v0.6.1) Completely overhauled Benchmark Validation system.

Limits:

  • (v0.6.1) All constants will have a hard-limit of 90 trillion digits.

Optimizations:

  • (v0.6.4) x64 XOP: Specially tuned for the AMD Bulldozer processor line.
  • (v0.6.1) Faster Base Conversion.
  • (v0.6.1) Faster Multiplication for large products > 50 billion digits.
  • (v0.6.2) More fine-grained Checkpoint-Restart.
  • (v0.6.3) Raw I/O support for Linux.
  • Numerous other minor optimizations.

*Due to incompatibilities with new and future planned optimizations, Basic Swap Mode and the optimizations/algorithms that come with it (which are also used by Advanced Swap Mode) will be omitted in v0.6.x.

 

This, along with the rewrite, means that v0.6.x will be the first version of y-cruncher that will not be strictly faster than an earlier version. This means that some computations, under certain situations, will actually be slower in v0.6.x than with v0.5.5.


April 6, 2011 0.5.5 Build 9180
(fix 2)

Windows Only
 

Fixes:

  • The x64 AVX ~ Hina binary is now compatible with non-Intel processors. As a side-effect, the x64 AVX ~ Hina binary is about 1% faster on Intel processors as well.
  • The contact email address has been changed to a-yee@u.northwestern.edu.

February 20, 2011
0.5.5 Build 9179
(fix 1)

Windows + Linux
 

Fixes:

  • Fixed a major bug in the ArcCoth code that may cause incorrect computation of all dependent constants:
    • Log(2)
    • Log(10)
    • Euler-Mascheroni Constant

This bug has probably been present in y-cruncher since v0.4.1.


February 1, 2011
February 3, 2011
0.5.5 Alpha
Build 9178

Windows + Linux
 

New Features:

  • Support for the new Advanced Vector Extensions (AVX) instruction set.

Changes:

  • All Windows binaries with SSE/AVX are now compiled using the Intel Compiler 11.1.
  • All Linux binaries are now compiled using GCC 4.4.5. Furthermore, all binaries are compiled as C code, not C++.
  • Minor changes in speed due to rewritten code.

Optimizations:

  • x64 AVX ~ Hina:
    • Specially tuned for the Intel Sandy Bridge Core i7 processor line.
    • ~10% faster than x64 SSE4.1 on Sandy Bridge Core i7.
    • Requires Windows 7 Service Pack 1 or later.

  • The final output of digits at the end of each computation is now faster.
    Note that this has no effect on benchmarks since outputting digits to disk does not count towards computation time.

  • The built-in Digit Viewer is now faster.

Fixes:

  • The Linux binaries are now statically linked.

August 28, 2010 0.5.4 Build 9157
(fix 1)

Linux Only
 

Optimizations (Linux):

  • Retuned I/O. This may or may not be faster than before. Note that raw I/Os are still not used because the current implementation is slower than using straightforward buffered I/Os. (This is in contrast to Windows, where raw I/Os are faster than buffered I/Os.)
  • Slightly improved speed. Added "-ffast-math" to the compile options.

New Features:

  • Colored console output has been added.
  • CPU brand detection has been added.
  • CPU frequency detection has been added.
  • Memory detection has been added.
    • Automatic memory selection has been enabled in Advanced Swap Mode.

August 16, 2010 0.5.4 Build 9150
(fix 1)

Linux Only
 

New Features:

  • This is the first Linux release. It is slower and does not support all the features of the Windows version, but it's a start.

August 5, 2010 0.5.4 Build 9148
(fix 1)
 

Fixes:

  • Fixed a bug that would cause all computations longer than 24.8 days to trigger a "Sanity Check Error".
  • Fixed a bug that would cause a "Write Error" when using more than 10 drives in Advanced Swap Mode.
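
The 24.8-day threshold happens to match the point where a signed 32-bit count of milliseconds overflows. The changelog does not state the cause of the bug, so treat this as an assumption; the helper names below are made up for the sketch.

```python
INT32_MAX = 2**31 - 1
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 milliseconds

def days_until_int32_ms_overflow():
    """Days that fit in a signed 32-bit millisecond counter (assumed timer type)."""
    return (INT32_MAX + 1) / MS_PER_DAY

def wrap_int32(value):
    """Emulate two's-complement wraparound of a signed 32-bit integer."""
    return (value + 2**31) % 2**32 - 2**31

limit_days = days_until_int32_ms_overflow()
elapsed_ms = 25 * MS_PER_DAY          # a computation that has run for 25 days
wrapped = wrap_int32(elapsed_ms)      # what a 32-bit counter would report

print(f"overflow after {limit_days:.3f} days")
print(f"25 days stored in int32 milliseconds: {wrapped}")
```

A negative elapsed time is exactly the kind of value a sanity check would reject.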

August 2, 2010 0.5.4 Alpha
Build 9146
 

New Features:

  • Checkpointing: Advanced Swap Mode computations can now be interrupted and restarted at certain checkpoints. This allows large computations to survive events such as power outages and unrecoverable computational errors. It also allows computations to be paused and restarted.

Improvements:

  • Better Error-Correction: The program is now better able to recover from computational errors. Some computational errors that were uncorrectable in previous versions are now correctable in v0.5.4.

May 13, 2010 0.5.3 Build 9134b
(fix 2)
 

Fixes:

  • Fixed a memory leak at the end of each computation. This affects Batch Mode the most because it runs many computations in succession.

April 26, 2010 0.5.3 Build 9133b
(fix 1)
 

Fixes:

  • Fixed a bug in the Compute + Verify option for Euler's Constant.

April 15, 2010 0.5.3 Alpha
Build 9132
 

Changes:

  • The Stress Test feature will now run in below normal priority to increase system responsiveness.
  • When an error is detected in the stress test feature, both threads will stop after completing (or failing) their current tests.
  • These are thanks to a number of requests that I have received from some people.

  • When switching to Advanced Swap Mode, the program will now choose a default memory setting based on the amount of total and available physical memory that is in the system.

    The ability to set the memory usage in Advanced Swap Mode was less than obvious in v0.5.2. This resulted in some users using the default lowest memory setting when it could have been a lot faster to use more memory.

  • In the Benchmark feature, benchmark sizes that require more memory than there is available are faded out. (Though they can still be run.)

  • The "Validation.txt" files that the program outputs can now be customized with your name/screenname (i.e. a way to identify that the benchmark was done by you).

Fixes:

  • Though technically not a fix, this version adds a check that detects a bug in Windows where thread creation will sometimes return a normal return value when in fact the thread fails to be created due to insufficient memory.

    Previously, this would result in silent errors that would cause a computation to give incorrect digits or trigger other redundancy checks later in the program.

  • The sensitivity of the cheat-detection has been slightly decreased as it had been giving a lot of false positives on certain motherboards with less precise hardware timers.

  • Fixed an integer-overflow bug in the 32-bit binaries that would occur when writing decimal digits at the end of a computation that is larger than ~41 billion digits.

  • Fixed a possible stack-corruption bug for computations larger than 500 billion digits.

  • Fixed some minor bugs in the interface.

Optimizations:

  • Algorithmic change in the final Base Conversion for all constants. The new algorithm is a partial implementation of the Scaled Remainder Tree method that was used in the current world record for Pi.
    • This switch provides a near 2x speed up for the conversion - or about 10% for Pi computations.
    • The rest of the algorithm will be put off to a later version and is expected to give another 30 - 40% speedup for the conversion.

  • As a side-effect of the new conversion, the memory requirements for square roots, Golden Ratio, e, and Pi have decreased slightly.

New Redundancy Checks:

  • The speedups brought on by the new conversion algorithm open up an opportunity to add some new redundancy checks to increase the reliability of the program without decreasing performance.

  • A verification has been added to the Base Conversion at the end of each computation.
    Though somewhat expensive, this verification is done after writing the decimal digits to disk and does not count towards the "Computation Time" parameter. Therefore it does not really count as a performance penalty.

    This verification is needed because of a change in algorithm for the conversion. (see "Optimizations")

    Unlike the old algorithm, this new algorithm is not sufficiently self-verifying. Therefore, a verification is needed to catch any computation errors that fail to propagate to the last 100 digits. (since only the last 51 - 100 digits are checked to see if a computation is correct)

    This verification also guarantees that the entire base conversion has been done correctly with a certainty of 2^61. (An error has a 1 in 2^61 chance of not getting caught.)

  • A verification has also been added to the "Final Multiply" for Advanced Swap Mode Pi computations using the Chudnovsky algorithm. This ensures that any computational error that fails to propagate to the last few hexadecimal digits will be caught with a certainty of 2^61. This comes with a slight performance hit.

  • A number of new and extremely aggressive redundancy checks have been added to:
    1. The "series" for all applicable constants except for Euler's Constant.
      (Redundancy checks for Euler's Constant will be included in a later version.)

      In the future, the program will also attempt to correct for errors as well.

    2. Newton's Method for Division and Square Roots.

    3. Within the new Base Conversion algorithm. (This will actually attempt to correct for errors too.)
    • Note that the first of these does come with a noticeable (but small) performance hit.
    • Also note that these redundancy checks (although aggressive) still in no way guarantee that a computation that finishes will finish with the correct results. Verification of the digits will always require a separate and independent algorithm. (or comparison against known pre-computed results)
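
The changelog does not say how the new verifications are implemented. One common way to cheaply check a huge multiplication is to compare both sides modulo a large prime. The sketch below assumes the Mersenne prime 2^61 - 1 as the modulus; that choice and the function name are assumptions for illustration, not a description of y-cruncher's code.

```python
MERSENNE_61 = 2**61 - 1  # assumed modulus for this sketch

def product_checks_out(a, b, claimed_product, modulus=MERSENNE_61):
    """Check a * b == claimed_product using only small residues.

    If the claimed product is wrong, the residues still agree only when the
    error happens to be a multiple of the modulus, which for a random error
    is roughly a 1 in 2^61 event.
    """
    expected = (a % modulus) * (b % modulus) % modulus
    return expected == claimed_product % modulus

a = 3**200
b = 7**150
good = a * b
corrupted = good + 1  # simulate a single computational error

print(product_checks_out(a, b, good))
print(product_checks_out(a, b, corrupted))
```

The check costs a few reductions of the operands instead of a second full-size multiplication, which is why this kind of test can be added with only a small performance hit.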

March 10, 2010 0.5.2 Alpha 3
Build 9082
 

Fixes:

  • Removed two hidden line-feed characters that were present in the validation files.
    These were unintentional and were causing validation problems because they are non-standard and were being messed up by various text editors and viewers.

  • Fixed an issue that would prevent the program from being able to perform arithmetic above ~20 trillion digits. As of v0.5.2, only Square Roots and Golden Ratio are unlocked beyond 10 trillion digits. Neither of them use full size arithmetic so they will not actually fail until ~28 trillion digits.

March 4, 2010 0.5.2 Alpha 3
Build 9074
 

Fixes:

  • Fixed an issue where the program will halt with an assertion error when it tries to print a line that is longer than 78 characters.
  • Only computations larger than 10 trillion digits will be large enough to trigger this.

March 3, 2010 0.5.2 Alpha 3
Build 9072
 

Reliability Update:

  • y-cruncher now uses raw, non-buffered I/Os. This serves to bypass a number of MAJOR memory issues arising from sub-optimal OS buffering.

  • The primary purpose of this is to fix one MAJOR issue when handling extremely large files.
    When creating a large file for non-sequential writes, Windows will attempt to cache a "small" percentage of the file. What exactly is it caching? I have no idea; my guess is that it's trying to cache the portion of the MFT that maps the file.

    The problem arises when that "small" percentage is not that "small" anymore when the swap files are terabytes large...

    In one of my test runs, a 2.7 trillion digit multiplication failed when the program attempted to do non-sequential writes to four 1 TB swap files (total 4 TB large). The result was that the system cache exploded which immediately triggered Windows Error Code 1450 because of insufficient virtual memory. Because of the work-around that was added in 0.5.2.9040, the program was able to continue after increasing the virtual memory size. However, it continued to thrash virtual memory for several hours before the program was terminated manually. The thrashing simply showed no signs of stopping.

    After diagnosing the cause of the system cache spike (which was more than 10 GB large), it was determined that it was due to the OS's stupid caching schemes. (of course it was never really designed for this kind of use...)

    The only true work-around to the problem was to completely avoid OS buffering by using raw I/Os.

February 26, 2010 0.5.2 Alpha 2
Build 9040
 

Reliability Update:

  • Added a work-around for an issue where page-thrashing can cause a Windows Error Code: 1450 (ERROR_NO_SYSTEM_RESOURCES).

  • This is actually not a bug in the program. It is an issue in Windows. In Advanced Swap Mode, there may be long periods of time where y-cruncher does not use all of its allocated memory. As a result, Windows will page out some (or all) of the unused portion. However, when y-cruncher finally does need to use it, Windows will thrash the pagefile like crazy. The resulting stall can sometimes be enough for the OS to fail an I/O with an error code 1450.

  • This "work-around" isn't really a work-around at all. Instead of terminating the program when it encounters an I/O error, it simply pauses and retries until it either completes successfully, or the user decides to kill the program because something else is clearly wrong. This may also give other types of I/O failures another chance in case of a random failure of some sort.
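
The pause-and-retry behavior described above can be sketched as a simple loop. This is an illustration only: `write_with_retry`, `flaky_write`, and the one-second pause are hypothetical names and values, not anything taken from y-cruncher.

```python
import time

def write_with_retry(do_write, pause_seconds=1.0, sleep=time.sleep):
    """Call do_write() until it succeeds, pausing after every I/O error.

    There is deliberately no retry limit: the loop only ends on success or
    when the user kills the program.
    """
    attempts = 0
    while True:
        attempts += 1
        try:
            do_write()
            return attempts
        except OSError as error:
            print(f"I/O error ({error}); retrying in {pause_seconds}s")
            sleep(pause_seconds)

# Simulated writer that fails twice (e.g. while the pagefile is thrashing).
failures_left = [2]

def flaky_write():
    if failures_left[0] > 0:
        failures_left[0] -= 1
        raise OSError("insufficient system resources")

# sleep is stubbed out so the example finishes instantly.
attempts_needed = write_with_retry(flaky_write, sleep=lambda seconds: None)
print(f"succeeded after {attempts_needed} attempts")
```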

February 25, 2010 0.5.2 Alpha 2
Build 9037
 

Fixes:

  • Fixed an issue that may prevent extremely large computations from working properly when a low memory cap is selected.

  • This is an issue in the 5-step convolution algorithm for squaring. This does not affect 5-step convolution for multiplication. Not all cases of squaring via 5-step convolution are affected. Only when the memory selection is very low does it occur. (3.3 trillion digits using less than 8GB of ram will trigger this.)

  • When this issue is triggered, one of two things may happen:
    1. The program will be tricked into using 3-step convolution - which may result in extreme performance degradation.
    2. The program will terminate with an error stating that there is insufficient memory.

February 23, 2010 0.5.2 Alpha
Build 9025
 

Fixes:

  • MAJOR fix to Advanced Swap Mode. This version fixes an issue that was causing a major performance degradation in Advanced Swap Mode.

  • This stems from slight differences between the public releases and the private betas.

    Generally speaking, only the internal builds of the program are tested. Those internal builds have extra code in them that displays detailed debugging information. None of this code is compiled in the public releases.

    It just unfortunately turns out that there was some "required" code that was accidentally put with the debugging code. So it was not compiled in the public versions for v0.5.2.9021.

    This build fixes those errors.

February 23, 2010 0.5.2 Alpha
Build 9021
 

New Features:

  • Advanced Swap Mode:
    • Yeah, it's about time... This is probably the only thing that matters in this version. :P
    • Allows large computations to be done using very little ram.

    • Full support for multiple hard drives: Although this may seem redundant with Raid 0, it allows for unlimited drives. This serves to bypass limitations imposed by Raid 0. (which is usually limited to 4 - 6 drives)

      Total bandwidth scales linearly with the # of drives, but bottlenecked by the slowest drive. This is potentially better than Raid 0 in some cases. You will need to play with the settings to achieve the optimal combination of Raid 0 and the multi-hard drive setting in y-cruncher.
      (For example: 3 x 4-way Raid 0 vs. 4 x 3-way raid 0.)

    • Note that this is a very primitive version of Advanced Swap Mode. It has also yet to be burn-in tested so it's potentially very buggy. And lastly, it isn't supported for all constants and algorithms yet.

      The x86 versions now use 64-bit indexing for Advanced Swap Mode. So they should be clear for computations greater than 20 or 41 billion digits (which are the respective limits for signed and unsigned 32-bit indexing).
      However, I have yet to test them above those sizes so they may still fail if there are any remaining 32-bit indexes that should have been converted to 64-bit.

      The x64 versions have always used 64-bit indexing for everything, so they are clear for all sizes up to the theoretical limit of the program.

      Future improvements should include:
      • Reduced total disk memory usage.
      • Reduced # of disk I/Os.
      • Checkpointing and crash-recovery.
      • Support for all constants and algorithms.
      • A 3rd algorithm for Catalan's Constant. The current secondary algorithm is extremely I/O bound due to its use of the AGM (Arithmetic Geometric Mean).

  • New Validation Scheme:
    • The validation now provides much more detail than in the previous versions.

    • All computations are now validated.
      • All constants. Not just Pi.
      • All algorithms.
      • All computation modes.

    • Even failed computations and benchmarks are validated. They will simply be marked as "failed" in the validation certificate.
    • The validator is now easier to use. Just upload the file. No more manually entering fields.
    • Only the benchmark feature for Pi will be able to auto-verify the computed digits. But the last 50-100 digits that are computed will be included in the validation so that they can be verified using external sources.
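
The 20 and 41 billion digit figures quoted above for 32-bit indexing can be reproduced with a little arithmetic. The sketch assumes that each index addresses one 32-bit word of the number; the changelog does not spell this out, so treat the word size and the names below as assumptions.

```python
import math

# Decimal digits carried by one 32-bit word: log10(2^32), about 9.633.
DIGITS_PER_WORD = 32 * math.log10(2)

def max_decimal_digits(index_bits, signed):
    """Largest number of decimal digits addressable with the given index type."""
    usable_bits = index_bits - 1 if signed else index_bits
    addressable_words = 2**usable_bits
    return addressable_words * DIGITS_PER_WORD

signed_limit = max_decimal_digits(32, signed=True)
unsigned_limit = max_decimal_digits(32, signed=False)

print(f"signed 32-bit index:   {signed_limit / 1e9:.1f} billion digits")
print(f"unsigned 32-bit index: {unsigned_limit / 1e9:.1f} billion digits")
```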

Fixes:

  • Memory estimation is more accurate. (Previously, it would underestimate actual memory usage by as much as 50MB.)
  • The Stress Test feature will no longer over-shoot the target memory usage by about 50MB. (Same bug as above.)
  • When entering a write path or a swapfile path, the program will now actually check to see if the path is valid and writable.
  • Fixed the timers in the "Compute + Verify" option for e.

Limits:

  • Advanced Swap Mode without raising the limits of the program is kinda useless:
    • The limits for e, and Pi have been raised to 10 trillion digits.
    • The limits for Log(2), Log(10), and Zeta(3) have been raised to 1 trillion digits.
    • The limits for Catalan's Constant, and Euler's Constant have been raised to 250 billion digits.

  • Note that these limits are well above the current world records for each respective constant. (At the time of this writing.)

    So feel free to attempt a world record if you have the resources. But bear in mind that the program has NOT been tested at these sizes. So there is no guarantee that it will function correctly.

    I can no longer afford to tie up my machines for extended periods of time, so I can't do any more long-running computations on my own machines.

  • The x86 versions are all limited to 80 billion digits. This is because computations above that size become extremely inefficient without using more than 2GB of ram (which is the limit for x86).

    Internally, x86 versions are capable of performing much larger computations than a mere 80 billion digits. But it would be completely impractical to do so without the use of SSDs (Solid State Disks) - which is not recommended anyway because of write-wear.

  • As with the previous few releases, Square Roots and Golden Ratio have no limit.
    They are capped at 90 trillion digits to give a couple orders of safety margin before reaching the precision limit of 64-bit floating-point - which is the true theoretical limit of the program.

Optimizations:

  • All x64 binaries are now a bit faster. (16 register tuning)
    • Prior to this version, the vast majority of performance-critical code was written on x86 and tuned for 8 GP and SSE registers - which is sub-optimal for x64. The x64 binaries for this version are better tuned for 16 registers (GP and SSE).
    • 5 - 12% faster on AMD K10.
    • 2 - 6% faster on Core 2.
    • 2 - 5% faster on Core i7.
    • The speed of the x86 binaries is unchanged since v0.4.4.

Other:

  • This version is not speed consistent with v0.4.3 - v0.4.4 (which have become semi-standardized). Furthermore, this version will likely be the first in a series of successive optimizations. Therefore the use of v0.5.x for competitive benchmarking should be held back until the speed of the program stabilizes.

  • Advanced Swap Mode opens the possibility for hard drive benchmarking. But this will be heavily biased towards machines with a lot of ram and a lot of hard drives (or SSDs) running in parallel.

    (Which could turn into a competition of who has the "most" hardware - rather than who has the "best" or "best tweaked" hardware...)

January 6, 2010 0.4.4 Build 7762b
(fix 2)
 

Fixes:

  • Fixed the benchmark validator.

December 2, 2009 0.4.4 Build 7760
(fix 1)
 

New Features:

  • The last 50 - 100 digits are printed out at the end of a computation.
  • The Compute + Verify modes for all constants that support it will now actually compare the digits from the computation and verification runs to see if they do indeed match. This auto-compare already existed in v0.1.0 - v0.2.1, but was taken out completely from v0.3.1 onwards. This release re-enables this feature. But for the sake of efficiency and ease of implementation, it only compares the last few digits of the two runs to determine if the computations match (whereas v0.1.0 - v0.2.1 compared ALL the digits).
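
A tail comparison like the one described above can be sketched in a few lines. The function name and the default of 50 digits are assumptions for illustration; the changelog only says that "the last few digits" of the two runs are compared.

```python
def tails_match(computed_digits, verification_digits, tail_length=50):
    """Return True when both runs agree on their final tail_length digits."""
    same_length = len(computed_digits) == len(verification_digits)
    long_enough = len(computed_digits) >= tail_length
    if not (same_length and long_enough):
        return False
    return computed_digits[-tail_length:] == verification_digits[-tail_length:]

# Stand-in digit strings; a real run would read these from the output files.
run_a = "14159265358979323846" * 5
run_b = "14159265358979323846" * 5
run_c = run_a[:-1] + "7"  # same as run_a except for the final digit

print(tails_match(run_a, run_b))
print(tails_match(run_a, run_c))
```

Comparing only the tail is far cheaper than comparing every digit, at the cost of relying on errors propagating to the end of the result.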

Fixes:

  • Fixed a bug in the CPU consumption and utilization %'s.
  • Fixed some minor bugs in the x86 binaries.

Changes:

  • The CPU consumption and utilization measurements no longer include the time needed to write digits to disk. They now only measure the actual computation time. Writing digits to a slow disk had the effect of drastically lowering utilization and efficiency %'s, leading some to believe that the program is a lot less efficient than it really is. Enabling vs. disabling hexadecimal output also had a huge effect on the measurements.

November 18, 2009
0.4.4 Build 7748
 

New Features:

  • New specially tuned binary for AMD K10 Processors.
  • Start and end dates have been added to computations. (Useful for those extra long computations.)
  • CPU utilization and multi-core efficiency statistics have been added.
  • Added an "Advanced Options" section. The benchmark validator has been moved there.
  • Users who are running an x86 OS on an x64 SSE3 capable system will be informed.
  • Added detection support for AVX and FMA instruction sets.

Optimizations:

  • x64 SSE3 ~ Kasumi: (Credit to Raymond Chan.)
    • Specially tuned for Phenom II X4.
    • 0.5 - 2 % faster than v0.4.3 (x64 SSE3) on Phenom II X4.
  • Slightly faster Log(2), Log(10), and Euler's Constant.

September 29, 2009
0.4.3 Build 7681
 

New Features:

  • Colored Console Output:
    • Slightly less dull-looking than previous versions. :)
  • Automatic Version Detection:
    • Launch Executable. It will automatically choose the best version of the program to run.
  • Validated Batch Benchmarks:
    • Standard and SuperPi-sized batch benchmarks now provide validation.
  • Stronger Anti-Tampering Protection:
    • Binaries that have been tampered with will not run.
    • Helps guard against validation cracking via modding the executables.

Fixes:

  • Fixed a bug in the Digit Compare feature.

Optimizations:

  • All Binaries:
    • All SSE binaries are now compiled using the Intel Compiler.
    • Numerous internal optimizations.
    • Status refreshing has been capped to once/second to reduce printing overhead for small benchmarks.
  • x64 SSE4.1 ~ Ushio:
    • Specially tuned for Core i7.
    • 5 - 18% faster than v0.4.2 (x64 SSE3) on Core i7.
  • x64 SSE4.1 ~ Nagisa:
    • Specially tuned for Harpertown.
    • 0 - 12% faster than v0.4.2 (x64 SSE3) on 2x Harpertown.
  • x64 SSE3:
    • Retuned for a smaller cache. (Previously tuned for 3MB cache/thread.)
    • Speed up vs. v0.4.2 (x64 SSE3) varies by processor. (Typically around 5 - 10%)
  • x86 SSE3:
    • Retuned for a smaller cache. (Previously tuned for 2MB cache/thread.)
    • Much improved multi-core efficiency for small computations.
    • 15 - 50% faster depending on computation size.
    • Single-threaded timings are now competitive with PiFast 4.3.
  • x86:
    • Retuned for a smaller cache. (Previously tuned for 2MB cache/thread.)
    • Much improved multi-core efficiency for small computations.
    • 10 - 40% faster depending on computation size.

Overall:

  • This is the first release dedicated primarily to optimizations. There are few functional changes.
  • Note that the binaries have gotten a lot larger since v0.4.2. This is because the Intel Compiler does more aggressive optimizations than the Visual Studio Compiler.

August 10, 2009
0.4.2 Build 7438
 

New Features:

  • Batch mode option for running automated benchmarks.
  • Stress Testing option for stability checking and burn-in testing.

Fixes:

  • Corrected the name for the secondary formula for Catalan's Constant.
  • Corrected some spelling errors.

July 24, 2009
0.4.1 Build 7412 (fix 1)
 

Fixes:

  • Fixed a major bug in the Basic Swap mode for the x64 binaries.

July 22, 2009
0.4.1 Build 7409
 

Fixes:

  • Fixed an issue where a "Sanity Check Error" would sometimes occur for extremely fast benchmarks that take less than a few seconds.

July 20, 2009
0.4.1 Build 7408
 

New Constants:

  • Square Root of any small integer
  • Golden Ratio
  • e

New Features:

  • Added "SuperPi" sized benchmarks:
    • 1M, 2M, 4M, etc... up to 128G.
  • Existing benchmarks have been extended to 100b.
    • To satisfy those who have access to server-racks and super-computers... Don't even try these on a desktop... :)
  • Digit Compare is back and with full support for compressed digits.
  • Compute and Verify is back for the constants that benefit from reusing steps.
    • e
    • Log(2) and Log(10)
    • Euler's Constant

Fixes:

  • Fixed a bug in the secondary formula for Euler's Constant where it would sometimes terminate with an "Allocation Failure" even when there is plenty of memory.
  • Fixed a bunch of bugs in the x86 binaries...

Optimizations:

  • Minor speed-ups in a few random places.

Limits:

  • Thread limit has been increased from 64 to 256 threads.
  • 200 billion digits for Square Roots, Golden Ratio, e, and Pi.

Other:

  • A lot of code has been rewritten and retuned in preparation for some future features. So there may be some minor speed differences for all computations.
  • Dropped support for x64 without SSE3.

May 14, 2009
0.3.2 Alpha
Build 6953 (fix 1)
 

Fixes:

  • Fixed a major bug in the digit viewer where it may incorrectly view compressed decimal digits in .ycd files larger than ~2 GB.

April 30, 2009
0.3.2 Alpha
Build 6945
 

New Features:

  • Added a single-threaded mode for Benchmarks.
  • Minor improvements to benchmark validation.
  • Benchmark validation is now slightly more resistant to cheating.

Optimizations:

  • Computations now require less memory. (~20% for Pi, less so for other constants)
    • The 2.5b, 5b, and 10b benchmarks will just barely fit into 12GB, 24GB, and 48GB of ram respectively - perfect for triple channel Nehalem systems.
  • The % complete status now has a bit more resolution.

Fixes:

  • The error-correction feature has been fixed. In benchmark mode, errors will automatically fail a benchmark even if the error is recoverable.
  • Some inconsistencies with the reported cpu frequency have been fixed. Note that the incorrect readings on multiplier-jacked CPUs have NOT been fixed yet.

April 17, 2009
0.3.1 Alpha
Build 6897
 

New Features:

  • Benchmark Validation and Anti-cheat protection.
    • Pre-set sizes for validated benchmarks:
      • x86: 25m, 50m, 100m, and 250m
      • x64: 25m, 50m, 100m, 250m, 500m, 1b, 2.5b, 5b, and 10b
        • The larger benchmarks will require a LOT of ram. New Challenge!
        • How high can you overclock while maintaining a full ram configuration?
        • How high can you overclock a fully-loaded workstation?
    • Benchmark computations will be verified against known digits to ensure that they are correct.
    • Anti-clock tampering protection.
      • Timings now use hardware clocks - which are more accurate and cannot be tampered with via system clock.
      • Try to tamper with the clocks (there's more than one), and it will fail validation.
    • Validation Checksum
      • Checksums are computed from Benchmark Time, CPU frequency, CPU type... (among other things).
      • Protects against output tampering.
      • Protects against system substitution. (transferring the output of a valid benchmark from a faster computer to a slower one)
  • New Layout for Option Selection
    • The program starts with a set of default options - which can be changed manually. This avoids all the option selection from the previous versions.
    • Auto-detect # of threads.
    • Shows estimated disk usage for swap computations.
    • Output path can now be specified.
  • Compressed Digit Format
    • Hexadecimal digits will compress to 50% of text-file size.
    • Decimal digits will compress to roughly 42% of text-file size.
    • Compressed digits can be read directly by the new digit viewer.
    • Compressed digits can be split into smaller files and accessed individually by the digit viewer.
    • This feature is already present in the new Digit Viewer. Version 0.3.1 fully integrates it.
  • Euler's Constant can now be computed to any # of digits. (They were locked to specific sizes in the previous versions.)
  • Compute and Verify + File Compare have been temporarily disabled as they need to be updated to support the new compressed digit format.
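
The compression ratios quoted for the Compressed Digit Format can be reproduced with a little arithmetic. The sketch below assumes that a text file stores one 8-bit character per digit and that compressed digits are packed into 64-bit words. The packing layout is an assumption made for illustration, not something stated in these notes. The names `TEXT_BITS_PER_DIGIT` and `packed_ratio` are hypothetical.

```python
from fractions import Fraction

# Assumption: a plain text file spends one 8-bit character on each digit.
TEXT_BITS_PER_DIGIT = 8


def packed_ratio(digits_per_word: int, word_bits: int = 64) -> Fraction:
    """Size of word-packed digits relative to one-character-per-digit text."""
    return Fraction(word_bits, digits_per_word * TEXT_BITS_PER_DIGIT)


# 16 hexadecimal digits fill a 64-bit word exactly (4 bits each).
hex_ratio = packed_ratio(16)

# 10**19 < 2**64 < 10**20, so at most 19 decimal digits fit in one word.
# (Assumed packing; it happens to match the "roughly 42%" figure.)
dec_ratio = packed_ratio(19)

print(f"hexadecimal: {float(hex_ratio):.1%} of text size")
print(f"decimal:     {float(dec_ratio):.1%} of text size")
```

Under these assumptions the hexadecimal ratio is exactly 1/2 and the decimal ratio is 8/19, which is about 42.1%.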

Overall:

  • This release consists of mostly interface changes. No optimizations. No bug fixes.

April 10, 2009
0.2.1 Alpha
Build 6841
 

New Constants:

  • Log(10)
  • Zeta(3) - Apery's Constant
  • Catalan's Constant

Fixes:

  • Fixed a pagefile thrashing problem when writing digits at the end of a large computation that used all the ram in a computer.
  • FFT settings have been pulled back to more conservative levels. This comes at a slight speed penalty, but is necessary to ensure reliability.
  • Fixed some errors that were caused by the program being a little bit too aggressive with multi-threading. This also comes at a slight speed penalty.
  • Added an extra redundancy check for base conversions. (see below)*

Optimizations:

  • Faster "Compute and Verify" for Log(2) via a better pair of Machin Formulas.
  • Improved multi-core efficiency. Barely noticeable on dual-core but an obvious improvement on 8-core. (Nearly 10% improvement in some cases on 8-core.)
  • Basic Swap Mode is now a bit faster and requires only half the memory from before.

Size Limits:

  • x86
    • 466 million digits for Euler's Constant
    • 840 million digits for all other constants
  • x64
    • 29.8 billion digits for Euler's Constant
    • 31 billion digits for all other constants

Overall:

  • This second release (as well as the next few) consists mainly of new features and bug fixes. There won't be much in the way of optimizations. Therefore, the next few releases won't be much faster (and may even be a bit slower if certain bug fixes necessitate it). I'll make up for it when I start doing optimizations.

*This extra redundancy check is needed to close a small weakness in the method that y-cruncher uses to verify its base conversions.

In order to understand the following paragraph, you must be familiar with radix conversions on floating point numbers.

For a record-size computation to qualify as a new world record, it must be verified.
y-cruncher performs a base conversion on a number by first normalizing it to an integer, and then base converting the integer.
The current method of verifying a base conversion is to do it twice using different cutting parameters and apply a modular hash check on the final (integer) base conversion. However, I have found that the powering stage of the normalization step goes through much of the same arithmetic even with different cutting parameters. This opens up a weakness. Since the base conversion is done twice, any hardware errors will be caught. However, if there is a bug (programmer or compiler error) that affects the normalization, it may result in the same incorrect answer for both conversions because of the "shared" arithmetic (and thus pass final verification).
To close this weakness, I have added a modular hash check to the powering stage of the normalization. All existing records that have been set prior to this change should still be fine because y-cruncher already has redundancy checks built into its multiplication. And of course, the digits agree with previous records.
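
As a rough illustration of what a modular hash check on a powering stage can look like, the sketch below computes a large power with ordinary big-integer arithmetic and then compares its residue against the same power computed entirely modulo a small prime. This is not y-cruncher's code. The modulus `P` and the function names `power_by_squaring` and `checked_power` are hypothetical, and in Python both paths share the same interpreter, so it only demonstrates the idea rather than giving real independence.

```python
# Illustrative hash modulus: the Mersenne prime 2**61 - 1 (an assumption).
P = (1 << 61) - 1


def power_by_squaring(base: int, exp: int) -> int:
    """Compute base**exp with full-size multiplications (the 'shared' arithmetic)."""
    result = 1
    while exp > 0:
        if exp & 1:
            result *= base
        exp >>= 1
        if exp:
            base *= base
    return result


def checked_power(base: int, exp: int) -> int:
    """Compute base**exp and verify it against a modular hash of the same power."""
    value = power_by_squaring(base, exp)
    # Second path: the same power, but every step is reduced modulo P,
    # so it never touches the large multiplications being checked.
    expected_hash = pow(base % P, exp, P)
    if value % P != expected_hash:
        raise ArithmeticError("modular hash mismatch in powering stage")
    return value


# Example: the kind of scaling factor a radix conversion might need.
scale = checked_power(10, 1000)
```

A wrong power would pass this check only if the error happened to be a multiple of `P`, which is why a bug in the large arithmetic is very unlikely to slip through unnoticed.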


January 19, 2009
0.1.0 Alpha
Build 6013
 

Initial Constants:

  • Pi
  • Log(2)
  • Euler-Mascheroni Constant

Initial Features:

  • Versions: x86, x86 SSE3, x64, and x64 SSE3
  • Basic Swap Mode
  • Multi-Threading
  • Multi-Hard Drive
  • Semi-Fault Tolerance

Size Limits:

  • x86
    • 233 million digits for Euler's Constant
    • 420 million digits for all other constants
  • x64
    • 7.4 billion digits for Euler's Constant
    • 10 billion digits for all other constants