News (2013)

 


 


Version 0.6.3 is out: (December 29, 2013)

 

It has been sitting on my desk for quite a few months, and now it's out. It adds the Lemniscate Constant and brings back the Euler-Mascheroni Constant.

There have also been a number of refactorings and re-tunings since v0.6.2. So expect some slight differences in performance (both up and down).

 

As some of you already know, AVX is slower than SSE on AMD processors. The reason for this is explained here.

 

Unfortunately, y-cruncher is no exception: The performance of the AVX binary is much worse than that of the SSE3 and SSE4.1 binaries. Therefore, the dispatcher for v0.6.3 has been reconfigured to fall back to SSE3 for all AMD processors even if they support AVX. It falls all the way back to SSE3 instead of SSE4.1 because the SSE4.1 binaries are tuned specifically for Intel processors and don't run as well on AMD processors.
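The fallback rule above can be sketched in a few lines. This is only an illustration, not y-cruncher's actual dispatcher: the names `Vendor`, `Binary`, `CpuFeatures`, and `select_binary` are hypothetical, and treating vendors other than AMD the same way as Intel is an assumption.

```cpp
// Hypothetical sketch of the v0.6.3 dispatch rule described above.
// None of these names come from y-cruncher itself.

enum class Vendor { Intel, AMD, Other };
enum class Binary { SSE3, SSE41, AVX };

// Instruction sets reported by the CPU (filled in by the caller, e.g. from CPUID).
struct CpuFeatures {
    bool sse41;
    bool avx;
};

// Pick the binary to launch.
// - AMD: always SSE3, even when SSE4.1 or AVX is available, because the
//   SSE4.1 and AVX binaries are tuned for Intel.
// - Everything else (assumption: treated like Intel): best supported binary.
constexpr Binary select_binary(Vendor vendor, CpuFeatures f) {
    return vendor == Vendor::AMD ? Binary::SSE3
         : f.avx                 ? Binary::AVX
         : f.sse41               ? Binary::SSE41
                                 : Binary::SSE3;
}
```

For example, `select_binary(Vendor::AMD, CpuFeatures{true, true})` yields `Binary::SSE3`, while the same feature set on an Intel processor yields `Binary::AVX`.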

 

A new binary (x64 XOP ~ Miyu) will be coming out in v0.6.4 that is specifically tuned for AMD Bulldozer and Piledriver. It will use SSE4.1, FMA4, and XOP instructions.

 

 

 


12.1 Trillion Digits of Pi: (December 28, 2013)

 

The 10 trillion digit record had been standing for 2 years and it didn't look like anybody was trying to beat it. So we threw y-cruncher v0.6.3 at it along with some new hardware. More details here.

 

y-cruncher v0.6.3 will be released in a few days. Still needs a bit more testing...

 

 

 


119 billion digits of Euler's Constant: (December 22, 2013)

 

Using a beta version of y-cruncher v0.6.3, it took 50 days to compute and 38 days to verify a computation of 119,377,958,182 digits of the Euler-Mascheroni Constant.

 

This is by far the longest computation I've ever attempted by myself using my own hardware. The main computation was interrupted multiple times due to overheating problems and a blown-out power supply. After replacing the power supply and reseating the heatsinks, there were no more hardware issues. So the verification was able to run from start to finish in a single contiguous run lasting 38 days.

 

This is also the last long-running computation that will ever run on my aged workstation. Afterwards, the machine will be retired. I'll still be keeping it around, but I will no longer be running anything stressful on it.

 

 

 


User Guide for v0.6.x's Swap Mode: (November 24, 2013)

 

y-cruncher's swap mode got a lot more complicated in v0.6.x. The lack of documentation also made it a lot harder to use.

I've finally gotten around to writing a user guide for y-cruncher's swap mode functionality.

In the future, I may add more of these for other features of y-cruncher.

 

 

 


Version 0.6.2 (fix 1) Canceled - Digit Viewer Source Released: (October 9, 2013)

 

Stuff happens... :(

 

I was originally gonna release v0.6.2 (fix 1) back in August, but some of the code-refactoring that I did had touched a bit too much of the program. So I didn't feel it was stable enough for a public release. (Shigeru Kondo knows this pretty well after I sent him some broken binaries. :P)

 

So v0.6.2 (fix 1) will be skipped and everything will be pushed into v0.6.3.

 

In addition to everything that was supposed to be in v0.6.2 (fix 1), v0.6.3 will also have:

I don't have a timeline or a release date yet. There's still a lot of testing to be done and I have less free time than when I was still in school.

 

In the meantime, I've released the source code to the Digit Viewer on my GitHub.

This is the exact same source that will be used to compile the Digit Viewer binaries that will be released with v0.6.3.

 

 

 


Some random things... (July 14, 2013)

 

Any C++ programmers out there? I've been toying around with a "tiny" Pi program that can do millions of digits of Pi. Feel free to play with it.

It isn't very fast, but it hits all the necessary algorithms to get quasi-linear run-time.

 

I've found and fixed the problem with "O_DIRECT" on Linux. Getting that to work cut the CPU usage in half. While it's a decent improvement, it wasn't as good as I had expected. And after trying out numerous tweaks, that last chunk of CPU usage from the I/O threads won't disappear. So I'll let that rest.

 

The fix for the "O_DIRECT" issue will be rolled into the next patch: v0.6.2 (fix 1).

 

 

 


v0.6.2 for Linux (June 30, 2013)

 

After finally getting everything to work on Linux, I did some tests and noticed something that really bothered me: Large swap computations on Linux are significantly slower than on Windows.

 

Take a look:

Notice the difference. Same computer, same settings, everything is the same except for the OS.

 

After digging around I was finally able to trace the issue. It turns out that the I/O operations on Linux were using a lot of CPU. And I mean a LOT - as in half a core per hard drive. (8 hard drives = 4 cores) WTF?!?!

Why does this matter? Because the program overlaps disk I/O and computation. If disk I/O is using a lot of CPU, then the computation threads will be denied a lot of CPU time. In the test case above, the disk I/O threads hogged half of the 8 cores! On Windows, the disk I/O uses close to nothing and the computation threads get nearly all of the 8 cores to grind at.

On Windows, I use "FILE_FLAG_NO_BUFFERING" to DMA all the I/O operations and bypass the OS cache. So there is no overhead - and almost no CPU usage.

Likewise, on Linux, I use "O_DIRECT" to achieve the same thing.

 

However, it seems that the "O_DIRECT" flag has no effect. The performance is the same with or without it. Furthermore, it seems that I can pass in misaligned buffers and sizes. So in other words, the flag isn't working. If it were, the I/O calls should fail on the misaligned parameters.
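The misalignment check described above can be turned into a small probe. This is a minimal sketch, not code from y-cruncher: `probe_o_direct`, `ProbeResult`, and the 4096-byte `kAssumedAlignment` are hypothetical, and the real alignment requirement depends on the kernel, the filesystem, and the device. It assumes Linux with glibc, where `O_DIRECT` is visible because g++ defines `_GNU_SOURCE` by default.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

// Assumed alignment for direct I/O; the true value varies by system.
constexpr std::size_t kAssumedAlignment = 4096;

// True if `value` is a whole multiple of `alignment`.
constexpr bool is_multiple_of(std::size_t value, std::size_t alignment) {
    return alignment != 0 && value % alignment == 0;
}

// True if the pointer's address is a multiple of `alignment`.
inline bool is_aligned(const void* p, std::size_t alignment) {
    return is_multiple_of(reinterpret_cast<std::uintptr_t>(p), alignment);
}

enum class ProbeResult {
    OpenFailed,          // could not open the file with O_DIRECT at all
    AllocFailed,         // could not allocate the test buffer
    MisalignedRejected,  // write failed with EINVAL: alignment is being enforced
    MisalignedAccepted,  // write succeeded: alignment is not being enforced
    WriteFailed          // write failed for some other reason
};

// Open `path` with O_DIRECT and attempt a deliberately misaligned write.
inline ProbeResult probe_o_direct(const char* path) {
    int fd = open(path, O_WRONLY | O_CREAT | O_DIRECT, 0644);
    if (fd < 0) return ProbeResult::OpenFailed;

    void* buffer = nullptr;
    if (posix_memalign(&buffer, kAssumedAlignment, 2 * kAssumedAlignment) != 0) {
        close(fd);
        return ProbeResult::AllocFailed;
    }
    std::memset(buffer, 0, 2 * kAssumedAlignment);

    // Shift the pointer by one byte and use an odd length so that both the
    // buffer address and the transfer size are misaligned.
    const char* misaligned = static_cast<const char*>(buffer) + 1;
    ssize_t written = write(fd, misaligned, kAssumedAlignment - 1);
    int saved_errno = errno;  // capture before free/close can overwrite it

    free(buffer);
    close(fd);

    if (written >= 0) return ProbeResult::MisalignedAccepted;
    return saved_errno == EINVAL ? ProbeResult::MisalignedRejected
                                 : ProbeResult::WriteFailed;
}
```

Calling `probe_o_direct` on a scratch file on the drive in question gives a quick hint: `MisalignedAccepted` matches the symptom described above, though some filesystems may accept misaligned direct I/O legitimately, so it is not conclusive on its own.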

 

Until I can figure out what's preventing "O_DIRECT" from working, the Linux version will not perform as well as the Windows version.

This issue has probably existed since v0.5.x, but I never did any serious benchmarks on Linux until now.

 

 

Other things: I still plan to open-source the digit viewer. But I need to clean up the code first. It's pretty unreadable to anyone other than me ATM.

 


There were older entries. But I no longer have a record of them... :(