News (2022)

Zen4's AVX512: (September 26, 2022) - permalink

Now that the embargos have lifted, I have published my breakdown of Zen4's AVX512 over on Mersenneforum.

So if you're a SIMD programmer or just curious about architecture in general, this might be worth a read.

Version 0.7.10 and AMD Zen4: (August 31, 2022) - permalink

Zen4 is set to be AMD's first processor to support AVX512. You know what that means - a new y-cruncher binary for it!

AMD has graciously provided me a pre-release sample of their Ryzen 9 7950X. And using that, I'm able to produce a Zen4-optimized binary - well ahead of launch and in time for the hardware reviewers to pick up.

Since most information about Zen4 is still under embargo, I cannot say anything about it at this time. If you happen to have access to a Zen4 system, feel free to try out this new release.

If you are a hardware reviewer who uses y-cruncher as one of your benchmarks, you will need to grab this latest version of y-cruncher to get the best results on Zen4.

The existing Intel-optimized AVX512 binaries for Skylake and Tiger Lake do not run optimally on Zen4, so you will need the new binary. Fortunately, the performance of the other binaries remains unchanged in v0.7.10. So Zen4 benchmarks on v0.7.10 can be directly compared with those of other processors using y-cruncher v0.7.9. Thus you do not need to redo your benchmarks for competing processors if they are already done with v0.7.9.
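If you are not sure whether a machine reports AVX512 at all, a quick check on Linux is to look for the `avx512f` (AVX512 Foundation) flag in `/proc/cpuinfo`. The sketch below is only an illustration and is not part of y-cruncher; the function names `cpu_flags` and `has_avx512f` are made up for this example. Note that the flag alone does not tell you whether the chip is Zen4, since Intel's AVX512 processors report it too.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags found in /proc/cpuinfo-style text.

    Hypothetical helper for illustration; it only understands the
    "flags : ..." line that Linux prints on x86 systems.
    """
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx512f(cpuinfo_text):
    """True if the AVX512 Foundation flag is listed."""
    return "avx512f" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        # Not Linux, or the file is unreadable: treat as "no information".
        text = ""
    print("AVX512F reported:", has_avx512f(text))
```

On systems without `/proc/cpuinfo` the script simply reports `False`, which means "unknown" rather than "unsupported".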

Overall, this was a very fun project. Being pre-release meant that the optimization and architectural resources that I usually rely on did not exist yet. So I had to do all the reverse engineering myself to figure out enough of the architecture to optimize for it. Unless someone beats me to it (via leaks), I intend to publish my findings as soon as it is allowed.

AMD's support for AVX512 may be the trigger that finally breaks AVX512's chicken-and-egg problem. For the better part of the last decade, nobody used AVX512 because of poor support. And since nobody used it, it received poor support. Now with AMD's backing, adoption of AVX512 may finally start to increase and perhaps put Intel at a competitive disadvantage until they bring it back to the consumer market.

I mentioned earlier this year that Zen4 and Sapphire Rapids X were the other two chips I wanted to test and optimize for. Now with Zen4 fulfilled (for now), that leaves Sapphire Rapids - which looks like it's having its fair share of delays. So I obviously have no timeline for that and I may end up skipping it if it ends up being cost prohibitive. Just in case anyone from Intel is paying attention...

100 Trillion Digits of Pi: (June 8, 2022) - permalink

I'm glad to announce that Google has reclaimed the Pi world record by computing 100 trillion digits of Pi!

This computation took 158 days from October 14 to March 21. Like last time, it was run on the Google Cloud platform, but with newer and improved hardware for both compute and storage.

Hardware Specs:

128 vCPU Ice Lake Xeon (2.6 - 3.1 GHz)
864 GB RAM
663 TB of total storage

For more details check out Google's blog here. If you are interested in the details, you can download them from here.
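As a rough sanity check on the figures above, the run length and the average digit rate can be worked out in a few lines. This is only a back-of-the-envelope sketch: the years (2021 to 2022) are an assumption, since the text gives only "October 14 to March 21", and the rate is a plain wall-clock average over the whole run, not a measure of the speed of any particular phase.

```python
from datetime import date

# Assumed years: the announcement only says "October 14 to March 21".
START = date(2021, 10, 14)
END = date(2022, 3, 21)
DIGITS = 100 * 10**12  # 100 trillion decimal digits

SECONDS_PER_DAY = 86_400


def elapsed_days(start, end):
    """Whole days between two calendar dates."""
    return (end - start).days


def average_digits_per_second(digits, days):
    """Wall-clock average over the full run, including any idle time."""
    return digits / (days * SECONDS_PER_DAY)


if __name__ == "__main__":
    days = elapsed_days(START, END)
    rate = average_digits_per_second(DIGITS, days)
    print(f"{days} days, about {rate / 1e6:.1f} million digits per second")
```

Under these assumptions the date difference matches the 158 days quoted above, and the average works out to roughly 7.3 million digits per second.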

Pi Day and Version 0.7.9: (March 14, 2022) - permalink

Happy Pi day everyone!

I'm back! Kinda... The project has been on hiatus for 2 years as I took a break for other things. While I'm not officially back to working on y-cruncher, I am giving it some much-needed updates.

A new binary for Zen 3: AMD graciously provided me a Zen3 system to do this, though the results are mixed. The Zen 3 tunings generated by the superoptimizer aren't strictly better than Zen 2's tunings. For benchmarking, you may want to try both and see which one is faster for your specific system.

The 18-CNL binary for Cannon Lake has gotten a facelift and is now retuned for an 8-core Tiger Lake laptop with 64 GB of RAM. This significantly bigger system has produced a binary that is noticeably faster than the old one (a 3-5% improvement). This improvement should translate to Alder Lake and Sapphire Rapids.

Added some minor features and bug fixes for some new issues that arose on some very large computers.

Compilers and libraries have been brought forward 2 years to their latest versions. I have yet to test if the latest TBB fixes the low CPU utilization issue from before.

A lot of the old binaries (00-x86, 04-P4P, 05-A64, 08-NHM, 11-SNB, 11-BD1) have been either removed or retuned for newer processors. In these past 2 years of hiatus, I cleaned out my lab and retired a bunch of the older computers which were used to tune these binaries. So don't expect these binaries to be any faster on the old systems they were originally meant for.

Looking forward, I do expect to do updates for Zen 4 (AVX512!) and possibly Sapphire Rapids HEDT - though the latter is questionable depending on pricing.

Generally speaking, I try to max out a system's core-count and memory since that gives the best tuning results (and thus best performing binary). But this can be expensive to do for higher end systems since nearly all my hardware is out-of-pocket.