User Guides - Swap Mode

By Alexander Yee

 

(Last updated: July 17, 2017)

 

 

This guide explains how to use the various features in y-cruncher that are related to swap mode. This page is currently meant for version 0.7.1. But things haven't changed too much from the v0.6.x series.

 


 

What is Swap Mode?

 

Swap mode is a feature that allows computations to be performed using disk. This makes it possible to do computations that would otherwise not fit in main memory.

In academia, this is known as "out-of-core computation". It is currently the only way to perform trillion digit computations on commodity hardware.

 

Because swap mode uses disk, it can be much slower depending on the speed of the storage. The usual solution is to simply run many hard drives in parallel. Since most out-of-the-box computers have only one hard drive, a significant amount of customization is needed to build a system that can efficiently run large computations in swap mode. (16 hard drives or more is typical for a high-end system.)

 

Solid State Drives (SSDs) may be a viable alternative as they are faster and more reliable. But a theoretical analysis of the write-wear that y-cruncher will subject them to is currently not very encouraging. So for now, do not use SSDs unless you are willing to expend them like consumables.

 

Enabling Swap Mode:

 

y-cruncher is meant to be easy to use for the casual overclocking enthusiast.

 

That's why the very first two options in the main menu are for benchmarks and stress-tests. Swap Mode is a non-trivial feature that's difficult to use correctly. Therefore it is well hidden.

 

In version 0.7.1, Swap Mode can be enabled as follows:

If you did everything correctly, it should look like the screenshot to the right.

 

Now is a good time to get yourself familiar with the Custom Compute menu if you aren't already. This is essentially the "real" main menu that lets you access all the computational features of y-cruncher.

 

Options 0 - 8 are pretty self-explanatory and are not specific to Swap Mode. So we'll skip those.

 

 

Set up the Disk Configuration: (Option 9)

 

Between v0.5.5 and v0.6.1, y-cruncher's built-in raid system was completely redone and replaced with a new one that is much more powerful (and unnecessarily complicated). y-cruncher still uses this new system as of v0.7.1.

 

Use of this built-in raid is not mandatory. You are free to use hardware or OS-supported raid. But the built-in raid is usually fastest since it is optimized for the use case in y-cruncher.

 

Either way, the use of multiple hard drives is basically a must for performance reasons. So you will need to use some kind of raid to utilize more than one hard drive.

 

 

The raid system in v0.6.1 is flexible and allows for a wide variety of setups.

 

No RAID: (option 1)

 

Selecting option 1 will set your configuration to use a single path - the current directory that the program is running in.

 

The performance of this setting will be exactly the performance of whatever drive the current directory is on.

 

This option is rarely used in serious computations. Multiple hard drives are a must. So the only real use case is if you already have a raid array set up by other means (such as hardware or OS). As a developer of y-cruncher, I use this option with a ram drive to run tests.

 

 

RAID 0: (option 2)

 

Option 2 sets up a simple RAID 0 configuration. It prompts you for the number of paths and then the paths themselves. It's pretty straightforward.

There is a limit of 64 drives, which is a result of using a 64-bit integer for status flags. But it can be increased without too much effort.

 

Performance Characteristics:

*The consequence of this is that a slow drive (straggler) will slow down the entire array. So pick your drives carefully.

This option is the most commonly used one for serious swap computations. It is the fastest option and gets the most performance from the drives. But be aware that there is no fault-tolerance. Failure of any drive and that's it.

 

 

RAID 3: (option 3)

 

Option 3 sets up a simple RAID 3 parity array. Like the RAID 0 option, it prompts you for the number of paths and then the paths themselves.

The RAID 3 parity allows any one of the drives to fail without failing the entire array. But due to technical limitations, there is a limit of 8 drives. More drives can still be used, but will require the Manual Configure option (next section).

 

Performance Characteristics:

*The overhead comes from the need to do parity calculations which requires fine-grained synchronization of the drives.
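To see why a single dead drive is survivable, here is a minimal sketch of byte-level XOR parity in the style of RAID 3. This is a generic illustration (the function names and data are made up), not y-cruncher's actual implementation:

```python
# Minimal illustration of RAID 3 style XOR parity (not y-cruncher's actual code).
# Data is striped across N data "drives" plus one dedicated parity drive.
# Losing any single drive is recoverable by XOR-ing the survivors.
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# Three data stripes (equal size) and their parity.
data = [b"AAAAAAAA", b"BBBBBBBB", b"CCCCCCCC"]
parity = xor_blocks(data)

# Simulate losing drive 1: rebuild it from the remaining stripes + parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
assert rebuilt == data[1]
print("Drive 1 rebuilt:", rebuilt)
```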

 

This option is rarely used. The improved checkpointing system (released in v0.6.2), when used in combination with manual backups, proved so effective at handling hard drive failures that it rendered the RAID 3 obsolete before I could finish implementing the ability to rebuild a dead drive. The 20% performance hit didn't help either.

 

As a final downside to this RAID 3 option, there's no way to rebuild a dead drive. So you will need to continue in the degraded state and hope the remaining drives hold up. The ability to rebuild a dead drive was part of the original spec. But it was never completed.

 

 

Manual Configure: (option 4 - 5)

 

Options 4 and 5 let you utilize the full functionality of the RAID 0/3 system. But before we move on, we need to know how it actually works.

 

y-cruncher v0.6.1's built-in raid is actually a two-level raid system.

A generic setup would look something like this:

In this example, suppose Group 1 is a RAID 3 of drives A, B, and C, while Group 2 is a RAID 0 of drives D and E, with the two groups striped together at the top level. This configuration can tolerate a failure in any of the drives A, B, or C. But not in either D or E.

 

 

The simple RAID 0 and RAID 3 setups are merely special cases of this fully generic setup.

Simple RAID 0 (option 2) Simple RAID 3 (option 3)

The performance characteristics are more complicated to model in the fully generic case. But these simple rules should get you most of the way there:

*The 20% synchronization overhead mentioned earlier in the simple RAID 3 setup always applies to layer 1 - regardless of whether it's RAID 0 or RAID 3.

 

Option 4 lets you add a new layer 1 array. (a "Group")

Option 5 lets you add a new path to an existing layer 1 array. (a "Drive")

 

Using options 4 and 5, you can customize the setup you want. Some interesting and useful setups are:

2 x RAID 3: Tolerate up to one failure in each group.
Rebalancing Unequal Hard Drive Speeds*: If you have 2 fast drives and 3 slower ones, rebalance them.

*This latter setup is never actually used. The 20% overhead for using layer 1 pretty much wipes out any gain from rebalancing hard drive speeds.
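For intuition, here is a rough model of the straggler rule and the 20% layer-1 synchronization overhead mentioned above. The exact scaling is an illustrative assumption, not y-cruncher's measured behavior:

```python
# Rough model of the performance rules above (numbers are illustrative assumptions):
#  - Striping runs every member at the pace of the slowest one (the "straggler" rule).
#  - Using a layer-1 array costs roughly 20% in synchronization overhead.
LAYER1_OVERHEAD = 0.80   # assumed ~20% penalty when layer 1 is involved

def flat_raid0(drive_speeds):
    """Simple RAID 0 across individual drives: N x the slowest drive."""
    return len(drive_speeds) * min(drive_speeds)

def grouped(groups):
    """Layer-1 groups striped together: pays the synchronization overhead."""
    per_group = [flat_raid0(g) for g in groups]
    return len(per_group) * min(per_group) * LAYER1_OVERHEAD

# 2 fast drives (200 MB/s) + 3 slow drives (100 MB/s):
print(flat_raid0([200, 200, 100, 100, 100]))     # 500 MB/s - stragglers dominate
print(grouped([[200, 200], [100, 100, 100]]))    # 480 MB/s - rebalancing gain is wiped out
```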

 

 

Overall, the manual configure is rarely used. Thanks to the checkpointing system, the simple RAID 0 seems to be sufficient for all large computations - including potentially record-setting computations.

 

 

Set per-drive Buffer Size: (option 6)

 

y-cruncher needs a scratch buffer to perform sector alignment and data-interleaving. In the past, this buffer was hard-coded to 64 MB per logical drive.

Starting from v0.7.1, the buffer size is configurable.

 

If the buffer is too small, there will be a lot of overhead. If it's too large, it would be a waste of memory. y-cruncher automatically adjusts the total buffer size based on the number of drives that are in the RAID configuration. However, it can't do this if the RAID is being done outside of y-cruncher since it doesn't know if the path(s) that you provide are physical drives or RAID of multiple drives. Therefore, you should manually set the buffer size if you are using RAID outside of y-cruncher.

 

Example:

64 MB is probably not large enough for 16 drives sitting behind a single path. Therefore you should manually set the per-drive buffer size to 1 GB.
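A sketch of that sizing logic, assuming the simple rule that each physical drive should still get roughly the old 64 MB default. This is an illustration of the example above, not y-cruncher's exact heuristic:

```python
# y-cruncher sizes the scratch buffer per *logical* drive, so an external RAID that
# hides many physical drives behind one path needs a manually enlarged buffer.
DEFAULT_PER_DRIVE_MB = 64          # historical default per logical drive

def suggested_buffer_mb(physical_drives_per_path):
    """Scale the default so each physical drive still gets ~64 MB of buffer."""
    return DEFAULT_PER_DRIVE_MB * physical_drives_per_path

print(suggested_buffer_mb(1))    # 64 MB   - one plain drive per path
print(suggested_buffer_mb(16))   # 1024 MB (~1 GB) - 16-drive hardware/OS RAID behind one path
```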

 

 

A Note About Implementation:

 

Internally, y-cruncher uses raw disk I/O. The data in this scratch buffer goes directly to/from disk via DMA. This has the following performance characteristics:

The overhead of the disk seeks can be significant if the drive has high sequential bandwidth and high seek latency. (As is the case with large RAID 0 arrays.) So it's best to set the buffer large enough such that the overhead of the seeks is negligible. At the other extreme, there is no benefit to setting the buffer larger than 1 GB per drive since there's a limit in the system API for the largest I/O request.
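As a rough illustration of what that buffer has to deal with, the sketch below carves a large transfer into sector-aligned requests capped at 1 GB each. The 4096-byte sector size and the helper itself are assumptions for illustration, not y-cruncher's actual I/O layer:

```python
# Illustrative sketch: splitting a large raw-I/O transfer into sector-aligned
# requests of bounded size (assumptions: 4096-byte sectors, 1 GB request cap).
SECTOR = 4096
MAX_REQUEST = 1 << 30   # 1 GB

def io_requests(offset, length):
    """Yield (offset, size) pairs: sector-aligned, each at most MAX_REQUEST bytes."""
    assert offset % SECTOR == 0 and length % SECTOR == 0, "raw I/O must be sector-aligned"
    while length > 0:
        size = min(length, MAX_REQUEST)
        yield offset, size
        offset += size
        length -= size

# A 2.5 GB transfer becomes three requests: 1 GB + 1 GB + 0.5 GB.
for off, size in io_requests(0, 5 * (1 << 29)):
    print(off, size)
```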

 

 

Clear All and Start Over: (option 7)

 

Pretty self-explanatory.

 

 

Save/Load Configuration: (option 8 - 9)

 

This option was never implemented. As of v0.7.3, it goes away as it is superseded by the configuration files for both the Custom Compute and I/O Benchmark menus.

 

 

Other Tips:

 

 

The Backstory to the RAID 0/3:

 

When we ran 10 trillion digits of Pi back in 2011, one thing was clear: Something had to be done about the hard drive failures.

 

The result was two features that were added:

The RAID 0/3 was done first. But it was never quite finished. Nevertheless, it made it into v0.6.1. The checkpoint-restart was done later and released with v0.6.2.

 

But the checkpoint-restart proved so effective at handling hard drive failures that the RAID 3 was no longer needed. So I never bothered to finish that last part which was to rebuild a dead hard drive in RAID 3.

 

The end result is this overly complicated two-layer RAID where the middle layer (layer 1) never gets used...

 

 

 

 

The "Bytes/Seek" Parameter: (Option 10)

 

The "Bytes/Seek" used to be called the "Min I/O Size". But it has been renamed and redefined in y-cruncher v0.7.3.

 

The "Bytes/Seek" parameter is the # of bytes that can be read sequentially in the same amount of time as a seek.

Yes, this sounds confusing, so let me explain. This is particularly important for extremely large computations that require many times more storage than would fit in memory since setting it wrong can lead to severe performance degradation of 10x or more.

 

As of 2017, most hard drives have a sequential bandwidth of about 200 MB/s and a seek time of about 10ms. This means that in the time it takes to perform a seek (10ms), the hard drive could have done about 2 MB of sequential I/O.

 

Therefore, the "Bytes/Seek" parameter is 2 MB for a single hard drive.

 

 

What if we have 4 drives in RAID 0?

 

The seek time of the entire array is still about 10ms. But the total bandwidth is quadrupled to 800 MB/s. So while the "Bytes/Seek" for each individual drive is still 2 MB, the "Bytes/Seek" for the entire array is 8 MB.
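The same arithmetic, written out as a small sketch using the rough 2017 figures quoted above:

```python
# Bytes/Seek = sequential bandwidth x seek time, per drive and for the whole array.
# Numbers are the rough 2017 figures quoted above (200 MB/s, 10 ms).

def bytes_per_seek(seq_bandwidth_mb_s, seek_time_ms, num_drives=1):
    """Seek time stays the same for the array, but bandwidth scales with the drives."""
    return seq_bandwidth_mb_s * num_drives * (seek_time_ms / 1000.0)   # result in MB

print(bytes_per_seek(200, 10))                # 2.0 MB - single hard drive
print(bytes_per_seek(200, 10, num_drives=4))  # 8.0 MB - 4-drive RAID 0 array
```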

 

So there are two definitions here: the per-drive "Bytes/Seek" of each physical drive, and the global "Bytes/Seek" for the entire array.

Internally, y-cruncher uses the global value. That's because all the computational and mathematical code is oblivious to the disk configuration. It only sees it as a single large virtual drive. The details of the disk configuration are hidden behind an abstraction layer.

 

This is where the difference between v0.7.2 and v0.7.3 lies:

If you don't touch the "Bytes/Seek" parameter, y-cruncher v0.7.3 will heuristically pick one for you based on the disk configuration. But the moment you override it, it stops doing this for you.

 

When loading or running configuration files, you are forced to pick a value. In these cases, y-cruncher will not attempt to adjust it, nor will it warn about a potentially improper setting.

 

 

What are the consequences of improperly setting the Bytes/Seek?

 

For computations that only require slightly more storage than there is memory, it won't matter. The majority of casual swap-mode computations fall into this category. But if you're running something very large (like a world record) where the storage requirement is 100x more than your memory, then the parameter will probably matter.

If the "Bytes/Seek" is small:

 

y-cruncher thinks disk seeks are faster than they really are. So it will pick algorithms that do less disk I/O (in bytes), but at the cost of more disk seeks. Therefore if the "Bytes/Seek" is set too low, the computation will spend a very large amount of time performing disk seeks rather than useful disk I/O.

 

If the "Bytes/Seek" is large:

 

y-cruncher thinks disk seeks are slower than they really are. So it will pick algorithms that do fewer disk seeks at the cost of more disk I/O (in bytes). Therefore if the "Bytes/Seek" is set too high, the computation will be doing a lot of unnecessary disk I/O.

The trade-off is usually quadratic. Reducing the "Bytes/Seek" parameter by a factor of 2 will:

Therefore, there is a much higher risk to setting the "Bytes/Seek" too low than too high.

 

 

What's going on internally?

 

The use-case for the Bytes/Seek parameter is in the disk-swapping FFT algorithms.

 

It is almost always possible to perform a disk-swapping FFT with only 2 passes over the dataset. But the larger it is with respect to the amount of physical memory, the more disk seeks are required.

 

The # of disk seeks needed to perform a 2-pass FFT is asymptotically:

The square in this equation is what causes the quadratic trade-off mentioned earlier. If the data size is many times larger than the physical memory, the number of disk seeks can be so large that the computation literally spends all its time doing seeks and nothing else.

 

 

Assuming that Bytes/Seek is properly set, y-cruncher will recognize when the 2-pass algorithm becomes problematic due to disk seeks and switch to algorithms that require more passes: 3-pass, 4-pass... as many as necessary. But more passes over disk requires more disk I/O in bytes - hence the trade-off.

 

 

 

 

Working Memory: (Option 11)

 

This is the last option available in the Custom Compute menu. It lets you choose the memory allocator as well as how much memory the computation should use.

 

y-cruncher uses memory as a cache for disk. So the more the better: Give it all the memory you have, but leave some room for the OS. By default, it will use about 94% of your available physical memory.

 

The option to select the memory allocator is new to v0.7.1. You shouldn't need to change this. The default on both Windows and Linux is to use large pages that are locked in memory. But it will fall back to normal pages if that's not possible.

 

More information about large and locked pages can be found here.

 

 

Note that recent versions of Linux are eager to swap memory out to disk even when there is enough memory to hold everything in ram. Unfortunately, y-cruncher gets caught up in this. When y-cruncher allocates memory, it expects it to be in memory and treats it as such. So when the OS pages it out to disk, the result is extremely severe performance degradation. There are 4 possible solutions to this:

 

 

 

The I/O Benchmark:

 

The purpose of the I/O Benchmark is to measure the speed of your disk configuration and suggest improvements.

 

It will tell you: (example screenshot to the right)

This is the benchmark that can determine whether the "Bytes/Seek" parameter is set correctly.

 

 

Sequential Read/Write:

 

This is pretty self-explanatory. As of 2017, most 7200 RPM hard drives will get about 150 - 200 MB/s of sequential bandwidth. Combine a bunch of them and you'll have a decent amount of bandwidth.

 

The example to the right is using 16 hard drives in RAID 0. Some of the drives are very old, so the performance of this array isn't great.

 

 

Threshold Strided Read/Write:

 

This is the measurement that tests whether the Bytes/Seek is set correctly.

 

It works by performing random access over the disk using block sizes equal to the Bytes/Seek value.

Due to imbalances in y-cruncher's algorithms, the recommended value is about 1/3 of the sequential bandwidth as opposed to 1/2.

 

Because of the high penalty for having too low a Bytes/Seek parameter, you will want to avoid the red values.
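A tiny sketch of that rule of thumb, using the 1/3 recommendation above and the 1/4 floor mentioned later in the "What to do" section. The thresholds are the guide's; the helper itself is just an illustration and not what the benchmark actually does internally:

```python
# Flag a Bytes/Seek that is probably set too low for this disk configuration.
def check_bytes_per_seek(seq_mb_s, strided_mb_s):
    """Compare the strided (seek-limited) bandwidth against the sequential bandwidth."""
    ratio = strided_mb_s / seq_mb_s
    if ratio < 1.0 / 4.0:
        return "too low - increase Bytes/Seek"          # the dangerous case
    if ratio < 1.0 / 3.0:
        return "borderline - consider increasing it"
    return "ok"

print(check_bytes_per_seek(seq_mb_s=1600, strided_mb_s=350))   # too low
print(check_bytes_per_seek(seq_mb_s=1600, strided_mb_s=600))   # ok
```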

 

 

Overlapped VST-I/O Ratio:

 

This is the relative speed of the disk array to the CPU compute power.

 

For very large computations, y-cruncher will spend a lot of time streaming data off the disk, crunching it in memory, and streaming it back. The disk I/O is often done in parallel with the computational work on data that's already in memory. So you get a "race" to see who finishes first.

 

The overlapped VST-I/O ratio is essentially a measure of which is faster. The streaming to and from disk? Or the computational work?

For small computations (< 10x of physical memory), a ratio of 1.0 should suffice. Larger computations will be better served with a ratio of 2.0 or higher.

 

As of 2015, the unfortunate reality is that it will be very difficult for a high-end processor to reach a ratio of 2.0 with mechanical hard drives. It's much easier with SSDs, but as mentioned before, SSDs are not recommended as they are not guaranteed to last under a sustained y-cruncher workload. Hopefully the longevity of SSDs will eventually increase to the point where this is no longer an issue.

 

 

What to do:

 

Once you have your hardware and the disk configuration set up, there isn't much you can do with the software other than to optimize the Bytes/Seek value.

Play with it until the benchmark stops complaining about it and you're set. If the strided read and strided write speeds are drastically different for your system, this may not even be possible. If that's the case, err on the side of caution and make sure the lower of the two is not less than 1/4 of the sequential access.

 

The I/O benchmark lets you adjust the working memory size. It's usually best to use as much memory as you can. If you use less, disk caching from the OS may screw up the results. By default it will choose about 94% of your available physical memory. So you shouldn't need to change it.

 

 

 

 

Resuming an Interrupted Computation:

 

For long enough computations, bad things happen:

Fortunately, y-cruncher will have created a file, "y-cruncher Checkpoint File.txt". (Unless you're at the start of the computation - in which case you haven't lost much anyway.)

That file contains all the metadata state from the last checkpoint it made. It's a readable text file that contains a list of all the swap paths as well as the execution stack-trace with filenames pointing to all the swap files that are needed to resume the computation.

 

Anyway, if you re-run the program, it will see that checkpoint file and resume from that checkpoint. Nothing else needs to be done.

If you want to give up on a computation, just delete that file.

 

But there are a few things to note:

*Under extreme circumstances - such as being far into a very large (world record attempt) computation when you hit a problem that requires editing the checkpoint file or some other intervention on my part - let me know and I'll see what I can do. I do have the ability to edit a checkpoint file and regenerate the hash. But since the program is not designed for such changes, there is no guarantee that such a hack will work and allow the computation to complete with the correct results.

 

 

Why can't the checkpoint file be changed mid-computation?

 

In short, y-cruncher does not support the editing of parameters in the middle of a computation. The rules for what can and can't be changed will vary between versions of y-cruncher. And such a feature would open up several degrees of freedom in the program that need to be tested and validated.

 

Most importantly, the majority of the things that are useful to change cannot be changed. Anything that affects the "computation plan" absolutely cannot be changed. And anything that fundamentally changes the swapfile configuration is not possible without some sort of converter which doesn't exist.

 

The following things affect the computation plan and fundamentally cannot be changed:

Things that conditionally affect the computation plan. It may be possible to change them under some circumstances.

The things that can be changed, but require extra tools:

The things that are always safe to change, but of course cannot be changed since y-cruncher locks down the checkpoint file:

 

 

Dealing with Failures:

 

As mentioned in an earlier section, things are going to go wrong when you have a lot of hardware running for a long time.

This section will show a bunch of different ways a computation can break down. It will also show how the program handles it and what you need to do to recover.

 

 

External Software Crash:

Something in the system other than y-cruncher crashes, and somehow it manages to make the system inoperable.

 

What to do:

  1. Close y-cruncher and attempt to safely shutdown the computer.
  2. If you fail, go to "Unsafe Shutdown". Otherwise continue.
  3. Fix the problem. I can't help you here since this is an external issue.
  4. When ready, restart the program and it will resume from the last checkpoint.

 

Unsafe Shutdown:

This is a pretty broad range of failures. It covers everything from power outages to BSODs from hardware instability.

 

Any unsafe shutdown is bad. Because you don't know if the OS managed to flush the disk cache for the checkpoint files. y-cruncher does actually force a flush on the disk cache before establishing a checkpoint. But there is no guarantee that the OS or the hardware actually complies.

 

What to do:

  1. Figure out what caused the unsafe shutdown and fix that first.
  2. You have a tough decision to make: Do you want to trust that the checkpoint files are not corrupted?

A good way to help your decision is to look at the timestamp of when the checkpoint was created. If it was created hours before the unsafe shutdown, then you're probably safe. The OS will almost certainly have flushed the buffers for that checkpoint after that much time.

 

If you don't know exactly when the shutdown occurred because it happened overnight or while you were away, you'll want to look at the last created/modified timestamps of the other files in the system. A good way is to look at the Windows Event Log. That can give a rough idea of when the machine went down.

 

If you discover that the checkpoint was made minutes or even seconds before the unsafe shutdown... well... that's your call. Ideally, you will have made a backup of an earlier checkpoint to fall back to.

 

 

y-cruncher Crash:

There are a number of possibilities in this case. So let's get straight to the point: the likely culprits are a bug in y-cruncher, unstable hardware, or a bug in the OS.

A bug in the OS is the least likely, but nevertheless, it has happened before.

 

What to do:

  1. If there is an obvious possible source of instability (overclocking, new hardware, etc...) - check that out first.
  2. Otherwise, if you are somewhat confident that the hardware is stable, then attempt to reproduce the crash.
  3. Re-run the program. If it made a checkpoint, then let it resume. Otherwise, re-run it with the exact same settings as before.
    • If you get the same crash in roughly the same place, it's most likely a bug in y-cruncher. Please let me know with all the details.
    • In all other cases, you should suspect that the hardware is unstable.

y-cruncher is a multi-threaded program. But it is also (mostly) deterministic. It was designed this way to avoid the problem of heisenbugs - which are common in asynchronous applications and can be extremely difficult to track down and fix.

 

Because of this determinism, crashes and errors that happen intermittently are more likely to be caused by hardware instability rather than a bug in y-cruncher.

 

 

Raid-File 0/3 Exception:

The "Raid-File" is the built-in raid system. When something goes wrong, it will throw an exception and print out the file that errored.

 

y-cruncher will recognize any of the following issues:

  1. Access Denied
  2. Path does not exist.
  3. Not Enough Memory
  4. Disk is Full
  5. Write Fault
  6. Read Fault
  7. Cyclic Redundancy Check - Possible Hardware Failure
  8. The device has been disconnected.

Most of these are pretty self-explanatory. 5 - 7 usually imply a hardware issue with either a hard drive or a disk controller.

 

Unless you have RAID3 in the setup, it's unlikely you'll be able to do anything to continue the computation on the fly. Sometimes you'll get an option to retry the operation, but from my experience, it rarely ever works.

 

So the only viable option is to quit the program and restart it. It will resume from the last checkpoint that was made. Which isn't bad at all.

But if you start getting these often, pay attention to what it's failing on. If it's always on the same drive... You might wanna replace that drive before it dies.

 

Older versions of y-cruncher (<= 0.5.5) were more aggressive in letting you retry failed disk operations. But this is no longer necessary in v0.6.x because the checkpoint restart is much better.

 

Final Note: If you find that you are getting read errors on a checkpoint file, you are basically screwed since the current checkpoint is unreadable.

Hopefully you have made a backup of an earlier (non-corrupt) checkpoint.

 

 

Modular Redundancy Check Failed:

When you see this, it usually means one of three things:

  1. Memory instability.
  2. CPU instability.
  3. An undetected buffer-overrun. (bug in y-cruncher)

Memory instability is by far the most common cause of this.

 

The modular redundancy check is a very high level of error-checking.

Getting here means that the data passed the lower-level checks. And then got corrupted in memory causing it to fail this high-level check.

 

CPU instability is possible, but less likely. Cache errors usually cause BSODs instead. But there are parts of a computation where there are no lower-level error checks. So CPU errors in those places will be caught by a high-level check like this one.

 

And lastly, there's the possibility of a buffer-overrun in y-cruncher. y-cruncher has checks to detect basic buffer-overruns, but they aren't foolproof. If this is indeed the case, you will consistently get this failure.

 

Depending on whether the overrun crosses boundaries into another thread, the hash numbers may differ between runs. But at the very least, you will consistently get a "Modular Redundancy Check Failed" in the exact same place. As with all y-cruncher bugs, please let me know so I can fix it.

 

Starting from v0.6.1, y-cruncher will not attempt to recover from this type of failure. Instead, just quit the program and relaunch it. It will resume from the last checkpoint.

 

 

Generic Exception:

These don't say much other than an error code.

 

What these error codes mean is not documented at all. Even I don't know them and I need to refer to the source code to see what it is.

 

The most common of these is error code 1, which means that a large multiplication has failed a sanity check. But that isn't particularly useful since almost everything is a large multiplication.

 

These can be caused by anything. So be ready to suspect everything.

In most cases, y-cruncher will not attempt to correct for such an error. So just kill the program and relaunch it. Checkpoint-restart is kind of the "catch all" for errors.

 

 

 

 

 

 

Hard Drive Failures:

We're not talking about the occasional read error or CRC fail. What do you do when the entire drive just dies on you?

 

Well... y-cruncher has no forward error-correction aside from the clunky (and incomplete) RAID3 implementation. So the only thing you can really do is to prepare ahead of time. Make backups of the checkpoints!

 

Such backups need to be done manually. y-cruncher currently has no automatic system to do it.

  1. Stop the program. Just close it.

  2. Go to each of the swap paths and you will find that all the files are in a folder with a name like "ycs-00-0". The numbers will be unique among all the paths. So copy all those swap folders to a backup. If you open up these folders, you will find swap files with different names. The only ones that y-cruncher needs are the ones with the word "checkpoint" in them. Everything else can be thrown out. (And you probably will want to do that since they eat up a lot of space.) A sketch of this step is shown after this list.

  3. Copy the "y-cruncher Checkpoint File.txt" to the backup as well.

  4. Make sure the backup is disconnected only when it's safe to do so. y-cruncher has no protection against incomplete copying of swap files. So you want to make sure the disk buffers are flushed before you disconnect anything.

  5. When you're done making the backup, just rerun y-cruncher. It will pick up from the last checkpoint.
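Here is a minimal sketch of steps 2 and 3 above. The drive letters, backup location, and the "ycs-*" folder pattern are illustrative assumptions - adjust them to your own setup, and only run it while y-cruncher is closed:

```python
# Sketch of the manual checkpoint backup described above (paths are hypothetical).
import shutil
from pathlib import Path

SWAP_PATHS = [Path(r"D:\ycs"), Path(r"E:\ycs")]     # your swap paths
WORK_DIR   = Path(r"C:\y-cruncher")                 # where y-cruncher runs
BACKUP     = Path(r"F:\backup")                     # backup destination
BACKUP.mkdir(parents=True, exist_ok=True)

# Step 2: copy only the checkpoint swap files out of each "ycs-*" folder.
# (The exact folder naming may differ - the guide shows names like "ycs-00-0".)
for swap in SWAP_PATHS:
    for folder in swap.glob("ycs-*"):
        for f in folder.iterdir():
            if "checkpoint" in f.name.lower():
                dest = BACKUP / folder.name
                dest.mkdir(parents=True, exist_ok=True)
                shutil.copy2(f, dest / f.name)

# Step 3: copy the checkpoint metadata file as well.
shutil.copy2(WORK_DIR / "y-cruncher Checkpoint File.txt", BACKUP)
```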

It's useful to "learn" where y-cruncher does its checkpoints. That way you can close it just after a checkpoint to reduce the amount of computation that's wasted.

 

Restoring from a backup is as simple as copying all the swap files back to the original locations. If you replaced any drives, just set the drive letter(s) to the same as before. Re-running the program will resume the computation as usual. y-cruncher doesn't know (and doesn't care) that you even replaced a drive.

 

 

Other Errors:

y-cruncher has tons of other sanity checks not mentioned here. Most of them are related to computational failures. And in such cases, try to reproduce it. That will determine whether or not it's a software or hardware issue.

 

In some cases, the program will automatically attempt to recover from the error. But these aren't reliable and memory corruption can lead to repeated failures (infinite loops of retries). As with all cases, the best course of action is just to kill the app and rely on the nuclear option of checkpoint restart.

 

 

Final Tips:

When hardware becomes unstable, expect anything. In most cases, hardware errors are easy to identify because they are almost always intermittent. This distinguishes them from bugs in y-cruncher, which are mostly deterministic.

 

But consistent hardware errors are possible and I've seen them happen before. There was a case where an unstable I/O controller caused a computation to end 3 times with exactly the same (incorrect) results. Assuming it was a bug in the program, I spent days trying to trace it down. But after a while, those (consistent) errors became not so consistent. When I ran the exact same computation on another computer with the exact same settings, the results were always correct. After switching around the hard drives on the original computer, the errors went away completely...

 

 

 

Undetected Errors and Failed Computations:

A failed computation of Pi to 100 billion digits. The digits are wrong.
This failure is believed to be caused by an undetected SATA transfer error.

 

 

The point of this section is not to scare anyone, but to shed some light on the reality of running large computations as well as some of the unsolved problems in the y-cruncher project.

 

 

y-cruncher has a lot of safeguards that will interrupt a computation the moment that things go bad. But unfortunately, it isn't foolproof. While it's uncommon, it is possible for a computation to finish with the wrong result. This is the reason why world records need to be verified using two separate formulas.

 

To the right is a screenshot of a failed computation of Pi to 100 billion digits. It was one of several such failures that delayed the launch of v0.6.7.

 

The program gave no errors during the computation, yet the digits were incorrect. The same computation using the exact same settings succeeded a few days later after removing a video card and reformatting all the hard drives.

 

Needless to say, such failures are the worst thing that can happen. Since you've wasted all this time to get a bad result. And you have no idea what caused the bad result.

 

Historically, the cause of such failures has been split evenly between hardware instability and software bugs in y-cruncher:

Bugs in y-cruncher tend to be reproducible and can be fixed. So while it's a problem, it's something that can be dealt with.

 

The real problem is the hardware errors. All versions of y-cruncher to date have no fault-tolerance for disk I/O when running in RAID 0. So errors that occur in disk I/O will go undetected unless they manage to indirectly trigger other redundancy checks.

 

Of all the operations that operate on disk, large multiplication and Newton's Method iterations are the only ones that have their own redundancy checks. These will usually detect errors caused by disk I/O.

 

But all other operations (addition/subtraction/hashing) do not have redundancy checks. The reason for this is simply that all known methods for checking floating-point operations will incur massive performance overhead.

 

 

What to take from this is that dealing with undetected disk I/O errors is a problem for which y-cruncher currently has no good solution. So you need to bet on the stability of your hardware. y-cruncher's RAID 3 capability will detect the majority of errors. But it isn't foolproof and it has tremendous performance overhead.

 

Filesystems like ZFS that are designed for data integrity are also an option at the cost of performance.

Silent data corruption on a failing hard drive that was detected via RAID3 parity.
The hard drive at "h:/" had been acting up for a while before it was put through this test.

 

A Note About Solid State Drives (SSDs)

 

Solid State Drives (SSDs) are faster alternatives to hard drives both in bandwidth and seek latency. However, they come with two severe drawbacks:

 

Size and Pricing:

 

The world record for Pi currently stands at 13.3 trillion digits. A logical "next step" would be 20 trillion digits.

 

Using y-cruncher, 20 trillion digits of Pi would require around 86 - 100 TiB of storage depending on the settings. And that doesn't include backups and digit output.

As of 2016, the largest (reasonably priced) hard drives are around 8 TB for about $250 each. So 16 of them will suffice for about $4,000 USD.

 

But if you were to go the SSD route, the largest SSDs are around 2 TB in size at around $650. You'll need about 50 of them for over $32,000 USD! And that doesn't include the SATA controllers that would be needed to provide those 50 ports. The bright side is that 50 SSDs will be absurdly fast. So if you have powerful processor(s) to complement the SSDs, you won't have to wait very long to get those 20 trillion digits.

 

 

Endurance:

 

SSD technology has certainly matured enough where they will last decades under "normal use". But y-cruncher is not "normal use". It will put the devices that it runs on under near continuous load 24/7 for as long as the computation is still running.

 

Let's run some numbers. As of 2016, a typical consumer SSD has the following specs: about 500 GB of capacity, a rated endurance of 5000 P/E cycles, and roughly 500 MB/s of sequential write bandwidth.

These numbers are leaning towards the optimistic side. Most consumer SSDs are rated for far fewer than 5000 P/E cycles. But in practice, they will last much longer than that provided that the firmware doesn't intentionally brick the device once the limit is reached. We also assume a write amplification of only 2x since y-cruncher's workload consists of mostly sequential access on a relatively empty drive. Even though y-cruncher will require about 100 TiB of storage to run 20 trillion digits of Pi, the usage will usually be less than half of that. It's only the "spikes" in usage that will reach the full storage requirement.

 

In a typical y-cruncher computation, the amounts of reads and writes are about the same (with slightly more reads than writes). Assuming that the computation is I/O bound, it will be doing disk I/O around 80 - 95% of the time at full bandwidth. To be on the safe side, let's assume it to be 100%. So the SSD will be sustaining about half the full write bandwidth.

 

With these assumptions, let's calculate how long the 500 GB SSD will last:

Writes Until Failure = (500 GB) * (5000 P/E cycles) / (2x write amplification) = 1,250,000 GB = 1.25 PB

 

Time to Failure = (1.25 PB) / (500 MB/s) / (1/2 portion of time doing writes) = 5,000,000 seconds = 1,389 hours = 58 days

58 days is not a very long time. Given that the last few Pi computations have taken months, it might not even be enough for a single computation. Sure, this calculation wasn't very scientific in that some of the assumptions were hand-wavy. But regardless, it doesn't instill much confidence. Do you really want to burn through that much expensive hardware in such a short amount of time?
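The same estimate as a quick sketch, using the assumptions stated above:

```python
# Reproducing the endurance estimate above with the stated assumptions:
# 500 GB drive, 5000 P/E cycles, 2x write amplification, 500 MB/s writes,
# and writing roughly half of the time.
capacity_gb         = 500
pe_cycles           = 5000
write_amplification = 2
write_bandwidth_mb  = 500          # MB/s
write_duty_cycle    = 0.5          # about half the I/O time is spent writing

writes_until_failure_gb = capacity_gb * pe_cycles / write_amplification
seconds = writes_until_failure_gb * 1000 / (write_bandwidth_mb * write_duty_cycle)

print(writes_until_failure_gb / 1e6, "PB of writes")   # 1.25 PB
print(seconds / 3600, "hours")                         # ~1,389 hours
print(seconds / 86400, "days")                         # ~58 days
```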

 

In short, consumer SSD technology isn't quite ready for large number computations. Perhaps the enterprise stuff is better, but they're also a lot more expensive.

 

 

 

Frequently Asked Questions

 

What's the point of swap mode? Why can't you just use the page file?

The operating system's page file swapping policies are optimized for "normal" applications. They perform terribly for specialized programs like y-cruncher. Unfortunately, it is difficult to quantify how much slower because nearly all attempts to run a ram only computation using more memory than is physically available lead to so much thrashing of the pagefile that the system becomes unresponsive. (I call it the "Thrash of Death".) Typically the only way to recover is to hard shutdown the machine.

 

Technically, it's possible to wait out the Thrash of Death, but it's not productive to wait for something that could potentially take years to run without any indication of progress. (As mentioned, the system becomes completely unresponsive.) The tests that did finish suggest that a ram only computation using the pagefile will be "orders of magnitude" slower than using the built-in swap mode. Not a surprise at all. A page fault to disk is indeed "orders of magnitude" slower than a cache miss to memory.

 

Why is the pagefile so bad? Because the memory access patterns in y-cruncher are not disk friendly. y-cruncher's swap mode uses alternate algorithms that are specially designed to be disk friendly. It uses domain-specific knowledge to modify the internal algorithms to sequentialize disk accesses and minimize disk seeks. It also knows what it will be accessing in the future and can properly prefetch them in parallel with on-going computation.

 

Why can't this be done with pagefile hints? Because it would be more complicated than just doing the disk I/Os manually. Furthermore, there would be no guarantee that it would actually behave as desired. In short, the OS pagefile is a black box that's better not to mess with.

 

To summarize, y-cruncher wants nothing to do with the OS pagefile. It wants the OS to get the hell out of the way and let y-cruncher run unimpeded. Windows is pretty good at allowing this with the right API calls. But not Linux. Starting from v0.7.1, y-cruncher will try to lock pages in memory to prevent the OS from entering the Thrash of Death.

 

Why does swap mode require privilege elevation on Windows?

Privilege elevation is needed to work around a security feature that would otherwise hurt performance. Swap Mode creates large files and writes to them non-sequentially. When you create a new file and write to offset X, Windows will zero the file from the start to X. This zeroing is done for security reasons to prevent the program from reading data that has been left over from files that have been deleted.

 

The problem is that this zeroing incurs a huge performance hit - especially when these swap files could be terabytes large. The only way to avoid this zeroing is to use the SetFileValidData() function which requires privilege elevation.

 

Linux doesn't have this problem since it implicitly uses sparse files.

 

Where is the option to rebuild a dead drive for RAID 3?

The entire RAID 3 feature fell out of use after the improved checkpointing system was released in v0.6.2.

For that matter, the entire multi-level RAID implementation is a rat's nest. It's still maintained, but no longer developed. So it will remain incomplete.