(Last updated: April 22, 2019)
This is a table of all the active branches of y-cruncher. As this is just a side hobby, overall development is slow and there is no formal release cycle. But in recent years, feature releases have come out approximately every 4 to 8 months, with multiple patches in between.
Branch  Version #  Changes (from v0.7.7.9501) 
v0.7.7  v0.7.7.9501 

trunk  v0.7.8949285  Swap Mode Changes:
New Features:
Optimizations:
Other:

As of 2019, y-cruncher is over half a million lines of code. For a single-person hobby project, this is large enough to easily collapse under its own weight. So to keep things manageable, development of the project has slowed down in recent years to refocus on long-term maintainability rather than rushing out tons of features and enhancements in a short amount of time.
The result of these "policies" is that it can take a very long time for anything to get done, easily months or years. But for a personal project with no deadlines or oversight, this is perfectly acceptable. Since there is less than one person working on y-cruncher, development is optimized for long-term throughput rather than short-term latency of new features.
Because of the long incubation periods of various features/improvements, there are typically many things going on at once. So I can work on whatever I feel like at any given point. If I get bored of something, I'll shelve it and come back (potentially years) later. What's shown on this page is just a subset of all things going on.
Feature  Description  Status 
Nth Root Radicals  Custom Formulas:
Add support for nth root radicals. This will enable the computation of Gamma(1/3). 
This was originally intended for v0.7.7 with the first release of the Custom Formula feature. But it was dropped due to some unexpected numerical stability issues combined with the large backlog of other tasks for v0.7.7.
This is now confirmed for v0.7.8. The new function is called "InvNthRoot" and will compute x^(-1/r). 
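To make the inverse nth root concrete, here is a minimal sketch of the textbook Newton's Method iteration for y = x^(-1/r), written with Python's decimal module. The function name inv_nth_root and its parameters are made up for illustration; this is not y-cruncher's implementation and it ignores the numerical stability issues mentioned above.

```python
from decimal import Decimal, localcontext

def inv_nth_root(x, r, digits=50):
    """Approximate x**(-1/r) to roughly `digits` decimal digits (hypothetical helper)."""
    with localcontext() as ctx:
        ctx.prec = digits + 10                    # a few guard digits
        x = Decimal(x)
        # Seed with a double-precision estimate (about 15 correct digits).
        y = Decimal(float(x) ** (-1.0 / r))
        correct = 14
        while correct < ctx.prec:
            # Newton step for f(y) = y**(-r) - x. Note that it needs no division by y.
            y = y + y * (1 - x * y**r) / r
            correct *= 2                          # each step roughly doubles the correct digits
        return +y                                 # round to the working precision

print(inv_nth_root(2, 3))   # approximately 2**(-1/3)
```

The iteration only divides by the small integer r, which is why the inverse root is usually the more convenient primitive. The root itself can then be recovered with extra multiplications.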
Slave Mode  Provide a way to control y-cruncher through TCP. This will allow 3rd party applications to build a GUI around y-cruncher.
More details here: https://github.com/Mysticial/ycruncherGUI 
An early version of this was launched with v0.7.6. As of v0.7.7, it theoretically should be complete enough to implement a full GUI for both the stress tester and custom compute options.
Incremental progress will continue to be made. In particular a unified way to express menus and suboptions is still needed. 
Feature  Description  Status 
exp(x)  Custom Formulas:
Add support for the exponential function. This will enable the fully generic noninteger power function for real numbers.
This involves inverting log(x) with Newton's Method. 
This one is going to be ugly. While it's conceptually easy to invert the natural logarithm with Newton's Method, there's a whole mess of other things to deal with:
This was on the roadmap for v0.7.8, but it's looking increasingly uncertain now. 
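The "conceptually easy" part can be sketched in a few lines. The snippet below inverts the natural logarithm with Newton's Method using Python's decimal module. The name exp_by_newton is hypothetical, the library's own ln() stands in for a real log(x) implementation, and none of the messy parts (argument reduction, precision control, and so on) are handled.

```python
import math
from decimal import Decimal, localcontext

def exp_by_newton(x, digits=50):
    """Approximate exp(x) by solving log(y) = x with Newton's Method (illustrative only)."""
    with localcontext() as ctx:
        ctx.prec = digits + 10                 # a few guard digits
        x = Decimal(x)
        # Seed with a double-precision estimate of exp(x).
        y = Decimal(math.exp(float(x)))
        correct = 14
        while correct < ctx.prec:
            # Newton step for f(y) = log(y) - x:
            #   y_next = y - f(y) / f'(y) = y * (1 + x - log(y))
            y = y * (1 + x - y.ln())
            correct *= 2                       # quadratic convergence
        return +y

print(exp_by_newton(1))   # approximately e
```

Each iteration costs one full logarithm, so in a sketch like this the exponential is always a constant factor slower than log(x).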
Improved Stress Tester  The current stress tester has increasingly poor coverage on today's processors. In fact, the program's unit and integration test frameworks are a better stress test than the dedicated stress test itself!
Find a way to build a new stress test around these "better" workloads. 
The difficulty here is that the "best" stress test requires mixing all the different instruction sets (SSE, AVX, AVX512). But each of y-cruncher's binaries focuses on only one of them, and it's not easy to use a "lower" ISA when compiling for a "higher" one since the compiler will try to vectorize the code to use the higher one. 
Optimized Square Root  Custom Formulas:
The current square root is just the inverse square root followed by a multiply by the input.
It may be possible to do better by merging that multiply into the final iteration, as is the case for division and reciprocal. 
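One way to picture the merged multiply is the well-known coupled iteration for square roots: run the inverse square root only up to about half the target precision, multiply by the input at that point, and let a final correction step bring the result to full precision. The sketch below assumes this particular formulation and uses a made-up name, sqrt_merged. It is not taken from y-cruncher, and for simplicity every operation here runs at full precision, so it shows the arithmetic but not the speedup.

```python
from decimal import Decimal, localcontext

def sqrt_merged(x, digits=50):
    """Square root via inverse square root, with the multiply folded into the last step."""
    with localcontext() as ctx:
        ctx.prec = digits + 10
        x = Decimal(x)
        y = Decimal(float(x) ** -0.5)          # double-precision seed for 1/sqrt(x)
        correct = 14
        # Inverse square root iterations, stopping at about half the target precision.
        while correct * 2 < ctx.prec:
            y = y + y * (1 - x * y * y) / 2
            correct *= 2
        # Final step: s = x*y is a half-precision sqrt(x). The correction below
        # doubles its precision without ever computing a full-precision 1/sqrt(x).
        s = x * y
        s = s + y * (x - s * s) / 2
        return +s

print(sqrt_merged(2))   # approximately sqrt(2)
```

In a real big-number library the product x*y and the factor y in the last step would only need half-precision operands, which is where the savings over "inverse square root, then multiply" would come from.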

Optimize the AGM  Custom Formulas: In the final iterations of the AGM, much of the output is already known. These can be used to skip some of the early iterations of the Newton's Method square root. 

Optimize log(x)  Custom Formulas:
The current log(x) implementation is a dumb wrapper around AGM(1,x).
There are ways to make the AGM require fewer iterations that should be investigated. 
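For reference, the textbook AGM-based logarithm looks roughly like the sketch below. It uses the classical approximation ln(s) ≈ π / (2·AGM(1, 4/s)) for large s, with the input scaled by a power of two to make s large enough. This is an assumed formulation in double precision with hypothetical names (agm, log_agm), not y-cruncher's code.

```python
import math

def agm(a, b):
    """Arithmetic-geometric mean of two positive floats."""
    for _ in range(64):                        # converges quadratically; 64 is a generous cap
        if abs(a - b) <= 1e-15 * abs(a):
            break
        a, b = (a + b) / 2, math.sqrt(a * b)
    return a

def log_agm(x, bits=53):
    """Natural log of x > 0 using ln(s) ~ pi / (2 * AGM(1, 4/s)) with s = x * 2**m."""
    # Pick m so that s is at least 2**(bits/2); the approximation error is on the order of 1/s**2.
    m = max(0, bits // 2 + 4 - math.frexp(x)[1])
    s = math.ldexp(x, m)
    # ln(2) is treated as a precomputed constant here.
    return math.pi / (2 * agm(1.0, 4.0 / s)) - m * math.log(2)

print(log_agm(10.0), math.log(10.0))
```

The number of AGM iterations grows with how far apart the two starting values are, so a smaller scaling factor means fewer iterations but also a less accurate approximation. That tradeoff is the kind of thing worth investigating.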

Non-Monotonically Convergent Hypergeometric Series  Custom Formulas:
Extend the SeriesHypergeometric function to allow series that are not monotonically convergent.
This is needed for confluent hypergeometric functions at large inputs where the series initially diverges before eventually converging.
This will allow certain approximation algorithms to be implemented:

This is expected to be very difficult.
Irregular convergence behavior wreaks havoc on the Binary Splitting implementation since a lot of assumptions break down. While the Brent-McMillan formula for the Euler-Mascheroni Constant already exhibits this behavior, it is specially handled.
The most difficult part of this is precision control. In order to do precision control, y-cruncher needs to know:
#1 is complicated, but likely doable. #2 doesn't seem approachable in the generic case with the mathematical techniques that I know of.
#2 is difficult due to destructive cancellation. Confluent hypergeometric functions are notorious for this behavior. And it is just the simplest case.
Take a series, and something as simple as taking its derivative can drastically alter the magnitude of the value that it converges to (e.g. exp(-x^2) and the Error Function for very large x).
It is likely that #2 will require the user to explicitly tell y-cruncher what the final magnitude will be. 
These are projects which have either stalled, made no recent progress, or are so long term that there is no roadmap.
Feature  Description  Status 
Rewrite the Radix Conversion  The current radix conversion code is actually a prototype that ended up in production. It has so many problems that it's basically unsalvageable.
The current code is actually the 3rd radix conversion to be used in y-cruncher (and the 2nd to use the Scaled Remainder Tree). But it's still a prototype since it's the first to combine all of the following features/optimizations:
At the time (2011), this was very ambitious, perhaps a little too ambitious. In the end, it required so many hacks that the thing became a complete mess.
The radix conversion will need to be completely redesigned and rewritten. 
Stalled. No progress has been made in years.
Still trying to figure out a way to approach this in a way that will be maintainable without sacrificing any performance...
The existing radix conversion has virtually no internal abstraction. This makes it both very fast and unmaintainable.
No progress has been made in years due to a backlog of higher priority issues.
The only work has been maintenance to keep the current work-in-progress code working with respect to breaking changes in other parts of y-cruncher. 
Reduced Memory Mode  For Pi Chudnovsky and Ramanujan, add a mode that will allow a computation to be done using less memory/disk at the cost of slower performance.
For a typical computation, most of the work requires very little memory. It's the occasional memory spike that causes y-cruncher to have such a high memory requirement.
There are about 4 large memory spikes in a Pi computation. In approximate descending order of size, they are:
These spikes can be flattened via space-time tradeoffs in the respective algorithms. Since the tradeoff only needs to be done at the spikes, the overall performance hit should be reasonably small. 
Stalled. No progress has been made in years.
As of v0.6.8, only the first two memory spikes have been suppressed. The overall memory reduction is not enough to be worth enabling the feature publicly.
The last two spikes both involve the radix conversion which is completely blocked pending the rewrite of the radix conversion code.
This partially completed feature was used for the 12.1 trillion digit computation of Pi.
As of 2017, this is low priority due to a backlog of higher priority issues. 
MRFM  MRFM stands for "Multi-Region Far Memory". It is a very large experimental project that will attempt to solve the NUMA problem and more generally, the supercomputer problem.
The current design of MRFM that is planned is a fundamental departure from y-cruncher's battle-tested model of computation. So virtually all high-level code will need new implementations for MRFM.
As a result, the plan calls for two completely new computation modes:
This will bring the total number of modes to 4. But if all goes as planned, Multi-Region Swap will become a no-compromise generalization of the current Swap Mode. So it will be possible to remove the current Swap Mode without losing much functionality.
Due to the sheer scale of this project along with a large number of unknowns, this was expected to be (and has become) a multi-year project with no guarantee of success. 
Much of the planning for this began years ago. But actual coding didn't start until around September of 2016.
As of 2018, progress is stalled due to large amounts of technical debt in y-cruncher's high-level code which have been accumulating for years. The radix conversion mentioned above is one such example of this technical debt.
MRFM remains a multi-year project with no end in sight. 
Disk I/O is the bane of large number computations. Historically, it was bad only because it is slow. Nowadays, it's also unreliable.
"Unreliability" comes in several forms:
(1) and (2) are easily solved using checkpoint-restart and periodic backups. y-cruncher has supported this since v0.5.4, so it isn't really a problem anymore. But (3) is very scary and remains a huge problem today.
Silent data corruption is the worst since it's undetected. It plagues database applications, and unfortunately, y-cruncher is extremely vulnerable as well. If an error manages to slip through y-cruncher's built-in redundancy checks, it will cause a computation to finish with the wrong digits.
Analysis:
Hard drives already use error-correction and CRCs to ensure data integrity. And transfers between the drive and the controller are also protected with CRCs. So you would expect that data corruption would be detected by the operating system, right? Nope...
When a hard drive fails to read a sector due to CRC failures, it usually propagates all the way back into the program and manifests as a (1). So that's not a problem. But transfer errors between the drive and the controller are less ideal. Supposedly, transfer errors lead to retransmission. But this isn't always the case.
Throughout the years of developing y-cruncher, transfer errors have also been observed to:
(1) is to be expected if the connection quality is so bad that the data gets stuck in a retransmission loop.
(2) is also to be expected if the corrupted data passes a CRC by chance. (I have no idea how long the CRC is, but if it's CRC32, a 1 in 4 billion chance isn't small.)
(3) makes absolutely no sense at all. If the hard drive was able to detect the error, then why the hell doesn't it notify the OS that the operation failed?
In any case, there are other things that don't add up. And in the end, we are forced to accept the reality.
To date, y-cruncher's only defense against silent data corruption is RAID 3. When the swap file configuration is set to use RAID 3, (almost) all reads are parity checked. If the data is bad, it will report a failure. The parity bits are flipped so that errors which zero everything will still fail the parity check.
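The point about flipped parity bits is easy to show with a toy model. The sketch below computes a byte-wise XOR parity over a stripe of blocks and then inverts every bit. The function names and the block layout are invented for illustration and say nothing about how y-cruncher's raid-file code is actually organized.

```python
def inverted_parity(blocks):
    """Byte-wise XOR of equal-length data blocks, with every parity bit flipped."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(b ^ 0xFF for b in parity)

def check_stripe(blocks, parity):
    """Return True if the stored parity matches the data blocks."""
    return inverted_parity(blocks) == parity

data = [b"\x01\x02\x03", b"\x10\x20\x30", b"\xff\x00\x0f"]
parity = inverted_parity(data)
print(check_stripe(data, parity))                 # consistent stripe

# A failure that zeroes the data AND the parity would pass a plain XOR check
# (0 ^ 0 ^ 0 == 0), but it fails here because the expected parity is all 0xFF.
print(check_stripe([bytes(3)] * 3, bytes(3)))
```

Note that parity like this can only say that something in the stripe is wrong. It cannot tell which block is bad, and it will miss errors that happen to cancel out.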
Unfortunately, this is far from sufficient:
To date, silent data corruption has yet to bring down a world-record attempt. But it happens regularly in the development test machines (which contain some very old hard drives). There has also been an instance reported on a forum where a 1 trillion digit computation of Pi failed, presumably due to silent data corruption.
Approximately 80% of the disk I/Os that y-cruncher does are covered by redundancy checks at a higher level. So in the absence of RAID 3, silent data corruption will be detected with 80% probability if we assume uniform distribution. To put it simply, 80% is not good enough. But at the very least, failing a redundancy check should be an immediate and serious red flag.
Possible Solutions:
Use a filesystem designed for data integrity (such as ZFS). While this is almost too obvious, there will also be performance tradeoffs.
The other approaches involve adding checksumming into y-cruncher's raid-file implementation. But this is easier said than done:
Inlining checksums into the data will break sector alignment and incur copy-shifting overhead. But perhaps this can be merged with the RAID interleaving. Placing the checksums elsewhere will add seek overhead. Either way, it will be messy.
Due to the poor state of the current raidfile code, any significant change will likely imply a complete rewrite of the entire raidfile implementation.
y-cruncher is currently a shared memory program. So it will run efficiently given the following assumptions:
As of 2016, these assumptions hold for almost all single-socket systems, which include the majority of personal desktop computers and low-end servers.
But due to the laws of physics and the speed of light, assumption #2 does not hold for larger systems. So now we're into the territory of Non-Uniform Memory Access (NUMA) and cluster/distributed computing. For now the focus will be on NUMA systems since that's what the majority of the multi-socket systems are.
In short, y-cruncher does not run efficiently on NUMA. While it scales reasonably well onto dual-socket, it all goes downhill after that. There have been numerous reports of non-scaling (and even backwards scaling) on quad-Opteron systems. And all of this is due to the NUMA.
To exacerbate the problem, OS's normally default to a "first-touch" policy for memory allocation. The problem is that y-cruncher allocates all its memory at the start of a computation with a single thread. With the first-touch policy, all the memory will be allocated (or heavily biased) on one NUMA node. So during the computation, all the cores from all the NUMA nodes will be hammering that one node for memory access. The interconnect going in and out of that one node will be overwhelmed by the traffic, which leads to terrible performance.
y-cruncher v0.7.3 adds the ability to interleave memory across nodes. This balances the interconnect traffic and leads to a significant performance improvement on modern dual- and quad-socket systems. But ultimately, this doesn't actually solve the problem of NUMA since the memory accesses (which are now randomized) will still be hitting remote nodes over the interconnect.
To be efficient, the program needs to be aware of the NUMA and needs to be designed specifically for it. This is a fairly difficult task which is made worse by the unlimited combinations of NUMA topologies. Long story short, making y-cruncher NUMA-aware will require changing the way that memory is fundamentally stored and managed. This means that it will need to be done as a new "mode" much like how "Ram Only" and "Swap Mode" have completely different data storage formats.
Currently, the most promising solution is to generalize the functionality of swap mode in the following ways:
Many binary splitting recursions have a lot of common factors between the fraction numerators and denominators. Implement an optimization that seeks to remove these common factors so that the numbers are smaller in size.
GCD Factorization was first described here (now a dead link). Since then, there have been subsequent publications that describe the method:
The general idea of the optimization is to keep the prime factorization of some of the recursion variables. Then use this factorization to obtain the GCD of the numerator and denominator terms. Obtaining this prime factorization is done using a sieve.
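As a toy illustration of that idea, the sketch below uses a smallest-prime-factor sieve to keep products in factored form, then reads off the GCD of a numerator and denominator as the minimum exponent of each shared prime. All names here are hypothetical and the inputs are small integers. A real binary splitting implementation has to deal with much larger terms and with deciding which factors are worth tracking.

```python
from collections import Counter
from math import gcd, isqrt

def smallest_prime_factors(limit):
    """Sieve: spf[n] is the smallest prime dividing n, for 2 <= n <= limit."""
    spf = list(range(limit + 1))
    for p in range(2, isqrt(limit) + 1):
        if spf[p] == p:                            # p is prime
            for multiple in range(p * p, limit + 1, p):
                if spf[multiple] == multiple:
                    spf[multiple] = p
    return spf

def factored_product(terms, spf):
    """Prime factorization (prime -> exponent) of the product of small terms."""
    factors = Counter()
    for n in terms:
        while n > 1:
            factors[spf[n]] += 1
            n //= spf[n]
    return factors

def expand(factors):
    """Multiply a factorization back out into an integer."""
    result = 1
    for prime, exponent in factors.items():
        result *= prime ** exponent
    return result

spf = smallest_prime_factors(100)
numerator = factored_product([12, 18, 35], spf)
denominator = factored_product([8, 27, 49], spf)
common = numerator & denominator                   # minimum exponent of each shared prime
print(expand(common), gcd(12 * 18 * 35, 8 * 27 * 49))
```

The appeal is that the GCD falls out of cheap bookkeeping on small exponents instead of a full-size GCD computation on very large numbers.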
Current implementations of this optimization include GMP Pi Chudnovsky and TachusPi. And most literature reports a speedup of about 20 to 30% for Pi Chudnovsky.
In the context of y-cruncher, this optimization is applicable to the following constants/algorithms:
Current Problems:
Even though GCD factorization has been known for years, y-cruncher has yet to use it for a number of reasons:
While (1) and (3) are certainly solvable, (2) is more difficult. All the ideas so far for attacking (2) are either complicated and fragile, or they require breaking out of the current memory model. For now, there are bigger fish to fry.