What about w28?

A project log for PEAC Pisano with End-Around Carry algorithm

Add X to Y and Y to X, says the song. And carry on.

Yann Guidon / YGDES, 01/25/2022 at 19:02

w26 was swift and w27 was a walk in the park. w28 should be similar, but longer... Is it useful, though?

In itself, I don't expect anything from w28, just as with w27 and w29. It's not the result that counts but how it is obtained, since I develop the tools further each time and can better extrapolate and refine the ideas. And if w28 has some unexpected property, it's still good to know. I have a hunch that w28 is not remarkable, but it's important to be sure.

What can I predict? From what I have seen, the fusion would take about 12-15 minutes, the complete log file will be almost 7 GB, and the scan will take about 40 days using the 2 hexa-core processors (24 threads in parallel overall).

What can be done to speed this up again? Well, a lot has already been done.

What's left? AVX2 and then CUDA. The first will bring a 4× to 8× speedup, and the second will bring the last boost.

Considering that w28 will take about 40 days as is, and that the SIMD/AVX2 version will bring 4×, w28 would take only about 10 days, or a week if all goes well. This gives me 30 days to convert my code and still break even. There are some challenges in adapting the existing scalar algorithm to SIMD, so the intermediate step of "multi-scalar" lets me get the housekeeping algos right.
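To make the "multi-scalar" idea concrete, here is a minimal sketch in plain C, not the actual scanner. Several assumptions are baked in: the step rule (t = X + Y + C, then X takes Y, Y takes the low W bits of t, C takes the carry), the definition of a walk (start at X = s, Y = 0, C = 0 and stop when Y comes back to 0), the toy width W = 8, the 8 lanes, and the MAXSTEP safety cap. All names (`peac_step`, `arc_scalar`, `arc_multi`, `lanes_t`, `lane_load`) are hypothetical. The point is only the shape of the code: independent walks advance in lockstep, and a lane that finishes is retired and refilled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define W       8                       /* toy width, so the demo finishes instantly */
#define MASK    ((1u << W) - 1u)
#define LANES   8                       /* assumption: 8 x 32-bit lanes, like one AVX2 register */
#define MAXSTEP (1u << (2 * W + 1))     /* safety cap: number of possible (X, Y, C) states */

/* Assumed step rule: t = X + Y + C; X <- Y; Y <- t mod 2^W; C <- carry out of t. */
static inline void peac_step(uint32_t *x, uint32_t *y, uint32_t *c) {
    uint32_t t = *x + *y + *c;
    *x = *y;
    *y = t & MASK;
    *c = t >> W;
}

/* Scalar reference: steps taken from (start, 0, 0) until Y is 0 again (capped). */
uint32_t arc_scalar(uint32_t start) {
    uint32_t x = start, y = 0, c = 0, n = 0;
    do {
        peac_step(&x, &y, &c);
        n++;
    } while (y != 0 && n < MAXSTEP);
    return n;
}

/* State of all lanes, stored as arrays so each field maps onto one vector register. */
typedef struct {
    uint32_t x[LANES], y[LANES], c[LANES], n[LANES], id[LANES];
    int live[LANES];
} lanes_t;

/* Housekeeping: give lane i the next pending start value, or switch it off. */
static inline void lane_load(lanes_t *l, int i, uint32_t *next, uint32_t count) {
    l->live[i] = *next < count;
    if (l->live[i]) {
        l->id[i] = l->x[i] = (*next)++;
        l->y[i] = l->c[i] = l->n[i] = 0;
    }
}

/* Multi-scalar version: walks for start values 0..count-1, LANES at a time. */
void arc_multi(uint32_t count, uint32_t *len) {
    lanes_t l;
    uint32_t next = 0, done = 0;
    assert(len != NULL);
    for (int i = 0; i < LANES; i++) lane_load(&l, i, &next, count);
    while (done < count) {
        for (int i = 0; i < LANES; i++) {       /* compute part: would become SIMD ops */
            if (l.live[i]) { peac_step(&l.x[i], &l.y[i], &l.c[i]); l.n[i]++; }
        }
        for (int i = 0; i < LANES; i++) {       /* housekeeping part: retire and refill */
            if (!l.live[i]) continue;
            if (l.y[i] != 0 && l.n[i] < MAXSTEP) continue;
            len[l.id[i]] = l.n[i];
            done++;
            lane_load(&l, i, &next, count);
        }
    }
}
```

Calling `arc_multi(1u << W, len)` fills `len[s]` for every start value s, and each entry should match `arc_scalar(s)`. Keeping the compute loop and the retire/refill loop separate means the first can later be swapped for vector instructions while the second is debugged in ordinary scalar code.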

I have no doubt that AVX2 will be easy to implement. w28, w29 and w30 are reachable. But CUDA is a totally different beast... I'm watching tutorials on YouTube for now.