News (2015)

 

Back To:

 


Version 0.6.9: (December 5, 2015) - permalink

 

Intel seems to be taking its fine time with the AVX512 stuff. So I guess that's not happening any time soon...

 

While that endless wait continues, there's some scalability improvements that have been sitting on my desk for months:

*The Linux versions of y-cruncher have historically been statically linked as a means to avoid the DLL dependency hell on Linux. Unfortunately, Intel does not provide a static library for Cilk Plus - thereby forcing dynamic linking. After fiddling with this for multiple weekends, I can't get anything that will run on just my 3 Linux boxes.

 

I give up. For now, y-cruncher for Linux will be available in both static and dynamic versions. The dynamic version will target a recent version of Ubuntu and will support Cilk Plus. The static version will run almost anywhere as before, but it lacks Cilk Plus.

 

 

 


Performance Announcement for Ubuntu 15.10: (November 8, 2015) - permalink

 

This applies to Ubuntu 15.10, but may also apply to other Linux distributions with the same kernel version.

 

When performing swap mode computations on Ubuntu 15.10, the OS has been observed to do excessive swap file access when y-cruncher is performing heavy I/O. This swapping is so severe that the OS becomes unresponsive and the computation stalls. It has not been observed in Ubuntu 15.04.

 

So far, the only solution that seems to work reliably is to completely disable the swap partition. Then reduce the amount of memory that y-cruncher should use so that you don't run out of memory. Another possible solution is to set the "swappiness" value to zero. But this is untested.

 

The next version (v0.6.9) will have a lower default memory setting for swap mode.

 

 

 


200 billion digits of Catalan's Constant: (June 8, 2015) - permalink

 

I'm pleased to announce that after running for more than half a year, Robert Setti has computed Catalan's Constant to 200 billion digits.

This is very impressive because Catalan's Constant is one of the slowest to compute. (among popular constants that can be computed in quasi-linear time)

 

 

 


Version 0.6.8 (fix 2): (May 7, 2015) - permalink

 

A lot of unexpected personal stuff happened this last month. I'll be starting a new job next week that is potentially much more stressful than ever before.

So I've decided to push out all the remaining bugfixes for v0.6.8. Depending on how things turn out, this may very well be the last version until the Skylake Xeon.

 

 

 


Version v0.6.8 Patched: (March 17, 2015) - permalink

 

I totally knew this would happen... The moment I rush a release (for Pi Day), something breaks.

 

It turns out there was a very large (up to 5%) performance regression on Haswell processors that scales inversely with the memory bandwidth of the system. Normally, such large regressions are caught long before they can be released. But since my primary test machine has 4 channels of overclocked DDR4, I never noticed the regression at all. It was only after Shigeru Kondo reported this did I test it on a different Haswell machine which did reproduce the regression.

 

This screw-up involved a cache optimization that was designed into the algorithm from the very beginning rather than being added later as a result of profiling. Being a premature optimization, it backfired for small computations and had no noticeable effect for large ones on my machine. Therefore I disabled it sometime between v0.6.7 and v0.6.8. Not a big deal, I expect some of these premature micro-optimizations to backfire.

 

Well... It turns out that the premature optimization wasn't really that premature after all. On large computations, it reduces the demand on memory bandwidth. If the processor is bandwidth-starved (such as mainstream Haswell), it translates to an increase in performance. Therefore, this patch re-enables the optimization on all processors except for AMD Bulldozer - which takes a 10% performance hit for some unknown reason.

 

The sad part is that, back in 2012, I did this optimization because I predicted that it would reduce the demand on memory bandwidth. But it wasn't until 2015 did the hardware and software become fast enough for it to actually make a difference. During those 3 years, I completely forgot about the optimization and left it in the "on" position until a recent refactoring touched it. That prompted me to re-evaluate it and (erroneously) disable it for the Pi day v0.6.8 release.

 

 

 


Pi Day of the Century: (March 14, 2015) - permalink

 

Surprise! There was no way I could possibly pass this day up right?

 

y-cruncher v0.6.8 is out with some new features:

 

 


Version 0.6.7 Released: (February 8, 2015) - permalink

 

It turns out that v0.6.6 had yet another serious bug that would cause a large multiplication to fail under the right circumstances. But at least this time, it wasn't related to swap mode. So after fixing that and cherrypicking it into v0.6.7 (along with 3 other things), I think we're good to go.

 

About that unstable workstation...

While the system still isn't entirely stable in Linux yet, it's in good enough shape to do longer running stuff.

 

This instability turned out to be a good opportunity to test the program's never-used RAID3 implementation. The hard drive configuration was 2 sets of 8 drives each in RAID3. One computation tolerated 9 hard drive read errors and still managed to finish correctly.

 

This entire instability mess has prompted me to update the user guide for Swap Mode with a new section.

 

 

 


Version 0.6.7 Preview: (January 27, 2015) - permalink

 

Version 0.6.7 has been built and is undergoing final testing. But I have no idea how long that will take. While everything looks good on Windows, testing on Linux is currently blocked while I diagnose an instability on my storage workstation with Linux.

 

The likely source of the instability is a massive hardware upgrade in December. Since Windows is fine and Linux is unstable, I suspect it's a driver issue. But I have yet to sort it out. The fact that I'm not much of a Linux person isn't really helping the situation.

 

In any case, part of that hardware upgrade involved adding 8 hard drives to the machine for a total of 16 drives. So v0.6.7 consists of mostly swap mode improvements that I decided to do after playing around with this 16 hard drive toy.

 

 

The main feature of v0.6.7 is a swap-mode multiplication tester which has two purposes:

The second point is important for anyone attempting world record computations. As the sole developer of y-cruncher, I only have the resources to test large multiplications up to around 5 trillion digits. Which means that I cannot reach the sizes that are now required to set Pi world records.

 

In the past, there have been bugs in the multiplication which only manifest at sizes that nobody has ever reached before. The scenario that I want to avoid is for someone else to spend months attempting a world record only to fail because of a bug in my code. In a sense, it's somewhat miraculous that y-cruncher is 4 for 4 in world record attempts for Pi. (i.e. no fatal software bugs)