
Compute better, compute more

A project log for PEAC Pisano with End-Around Carry algorithm

Add X to Y and Y to X, says the song. And carry on.

Yann Guidon / YGDES, 11/24/2021 at 07:47

Two main, complementary methods reduce the time needed to scan a whole state space:

  1. compute more arcs in parallel
  2. test for 2 conditions to detect complementary/symmetrical orbits.

The first method yields the best speedup, relying on multi-processor parallel processing. So far I can use POSIX threads, but I also plan to use CUDA on an RTX 3070. Before I can set up the software chain, I am also exploring SIMD: AVX (AVX2 for integer operations) promises 256-bit-wide packed integers that GCC seems to handle mostly fine. https://gcc.gnu.org/onlinedocs/gcc/Vector-Extensions.html looks "reasonable", except that I need a bit more than basic operations: carry wrap-around and conditional jumps on comparison will require some intrinsics or even assembly...

SIMD code is quite important because this is inherently how GPGPUs work. Developing the AVX version will help make a better CUDA implementation.
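Here is a minimal sketch of how GCC's vector extensions could run eight independent iterations in lockstep. The recurrence used here (t = X + Y + C, then Y = X, with the carry re-injected on the next step), the 16-bit width and every identifier are assumptions made for illustration, not code from this project. Storing each w-bit value in a 32-bit lane turns the carry into an ordinary bit, so the step itself needs no intrinsics. The "did any lane hit the target?" test is done with a plain loop, which is the place where intrinsics or assembly would probably come in.

```c
#include <assert.h>
#include <stdint.h>

#define WIDTH 16u                   /* assumed word width w, in bits */
#define MASK  ((1u << WIDTH) - 1u)  /* keeps the low w bits of a value */
#define LANES 8                     /* 8 x 32-bit lanes = 256 bits */

/* GCC vector extension types: 32 bytes wide, 8 elements each. */
typedef uint32_t v8u __attribute__((vector_size(32)));
typedef int32_t  v8i __attribute__((vector_size(32)));

/* Scalar reference for one orbit (hypothetical layout). */
typedef struct { uint32_t x, y, c; } peac_state;

static void peac_step(peac_state *s)
{
    assert(s->x <= MASK && s->y <= MASK && s->c <= 1u);
    uint32_t t = s->x + s->y + s->c;  /* fits in w+1 bits */
    s->y = s->x;
    s->c = t >> WIDTH;                /* carry out, added back next step */
    s->x = t & MASK;
}

/* Same step applied to 8 independent orbits at once. */
typedef struct { v8u x, y, c; } peac_lanes;

static void peac_step_lanes(peac_lanes *s)
{
    v8u t = s->x + s->y + s->c;       /* element-wise additions */
    s->y = s->x;
    s->c = t >> WIDTH;                /* scalar shift count is broadcast */
    s->x = t & MASK;
}

/* Returns 1 if at least one lane of *a equals the same lane of *b.
   The vector comparison yields -1 (true) or 0 (false) per lane. */
static int any_lane_equal(const v8u *a, const v8u *b)
{
    v8i m = (*a == *b);
    int32_t hit = 0;
    for (int i = 0; i < LANES; i++)
        hit |= m[i];
    return hit != 0;
}

/* Self-check: runs `steps` iterations on 8 lanes with different seeds
   and compares every lane with the scalar reference.
   Returns 1 if all lanes agree at every step, 0 otherwise. */
static int lanes_match_scalar(unsigned steps)
{
    peac_state ref[LANES];
    peac_lanes v = { {0}, {0}, {0} };

    for (int i = 0; i < LANES; i++) {
        ref[i] = (peac_state){ .x = (uint32_t)(i + 1), .y = 0, .c = 0 };
        v.x[i] = ref[i].x;
    }
    for (unsigned n = 0; n < steps; n++) {
        peac_step_lanes(&v);
        for (int i = 0; i < LANES; i++) {
            peac_step(&ref[i]);
            if (v.x[i] != ref[i].x || v.y[i] != ref[i].y || v.c[i] != ref[i].c)
                return 0;
        }
    }
    return 1;
}
```

Compiling with `gcc -O2 -mavx2` should let GCC map `v8u` onto 256-bit registers. Without that flag the same source still compiles, with each vector operation split into narrower ones. The one-orbit-per-lane layout is also the shape a CUDA kernel would take, with one thread per orbit.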

The second method was introduced in log "76. More efficient scan of the state space". It only brings a speedup of (almost) 2, but even that is significant. What if a computation ran for 3 months instead of 5? I still have to get the details right though, and this is the priority because the result will be the basis for the parallel versions.
