An individual job and all the necessary processing

A project log for PEAC Pisano with End-Around Carry algorithm

Add X to Y and Y to X, says the song. And carry on.

Yann Guidon / YGDES 07/09/2021 at 16:22

As I evaluate the potential ways to distribute computations among potentially thousands of cores (discussed in log 41. I need a big cluster.), I have reached the point where I must define the "golden algorithm" for all the jobs sent to the "cores", and test it before adapting it to the chosen platform.

Of course, all the "cores" must be configured identically with the same parameter W. This could be verified at first by computing a known arc and comparing the results.

The core receives a single argument X >= 1. The internal value of Y is cleared, C=0, and the loop starts.

void Pisano_Forward(uint32_t Xi) {
  uint64_t N = 0;      // arc length counter
  uint32_t X = Xi,
           Y = 0,
           C = 0;

  do {
    C += X + Y;        // accumulate with the incoming carry
    Y = X;
    X = C & MASK;      // keep the low WIDTH bits
    C >>= WIDTH;       // end-around carry for the next step
    N++;               // count this step
  } while (Y != 0);

  register_result(Xi, X, C, N);  // Y is 0 here; the landing point is X
}

TODO: I have to check the effect and relevance of C. X=n, C=1 amounts to X=n-1, C=0, which matters during reconstruction, but only one of those is possible.

The results are handled by the function

register_result(Xi, Y, C, N);

which is "left as an exercise to the readers" because I wanted to keep the example code short.
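For the sake of illustration, here is one possible shape for it: a minimal sketch that just appends each tuple to an in-memory table (the `Result` struct and `results[]` array are my own names, not part of the project; a real core would stream the tuples back to its scheduler):

```c
#include <stdint.h>

#define WIDTH 2                      // toy width, just for testing
#define MASK  ((1u << WIDTH) - 1)

typedef struct {     // one completed arc, starting and ending on Y=0
  uint32_t Xi;       // starting X
  uint32_t X;        // landing X
  uint32_t C;        // carry at the landing point
  uint64_t N;        // arc length in steps
} Result;

static Result results[1u << WIDTH];
static int result_count = 0;

static void register_result(uint32_t Xi, uint32_t X, uint32_t C, uint64_t N) {
  results[result_count++] = (Result){ Xi, X, C, N };
}

static void Pisano_Forward(uint32_t Xi) {
  uint64_t N = 0;
  uint32_t X = Xi, Y = 0, C = 0;
  do {
    C += X + Y;
    Y = X;
    X = C & MASK;
    C >>= WIDTH;
    N++;
  } while (Y != 0);
  register_result(Xi, X, C, N);     // Y is 0 at this point
}
```

With WIDTH=2, the arc starting at Xi=1 takes 8 steps and lands on X=0 with C=1.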

Tabulation/scheduling/job management is not very complicated: it receives a "range" to process, distributes the individual jobs picked from this range to the available resources, waits for their completion (in random order) and stores/sends the resulting block of results to the top scheduler. This is a trivial application of parallel computing.
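A sequential sketch of that job loop, with names of my own choosing (a real scheduler would hand each Xi to a free core and collect the results as they arrive, in any order):

```c
#include <stdint.h>

#define WIDTH 2                      // toy width, just for testing
#define MASK  ((1u << WIDTH) - 1)

// Run one forward arc from (Xi, 0, 0) and return its length;
// the landing X and C are written through the pointers.
static uint64_t arc(uint32_t Xi, uint32_t *Xout, uint32_t *Cout) {
  uint64_t N = 0;
  uint32_t X = Xi, Y = 0, C = 0;
  do {
    C += X + Y;
    Y = X;
    X = C & MASK;
    C >>= WIDTH;
    N++;
  } while (Y != 0);
  *Xout = X;
  *Cout = C;
  return N;
}

// Process a whole range of starting points; the returned step total
// stands in for "storing/sending the block of results".
static uint64_t process_range(uint32_t first, uint32_t last) {
  uint64_t total = 0;
  for (uint32_t Xi = first; Xi <= last; Xi++) {
    uint32_t Xend, Cend;
    total += arc(Xi, &Xend, &Cend);
  }
  return total;
}
```

For WIDTH=2 the three arcs starting at Xi = 1, 2, 3 take 8, 3 and 6 steps respectively, so process_range(1, 3) totals 17 steps.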

Reconstruction is where all the intelligence has to shine. It could start before all the individual range blocks are received, to make sure the partial results are coherent and valid. Some ranges sent to two different schedulers could overlap, to ensure they return the same values.

Due to the duality of the space, there should be 2 arcs with the same length, which also hints at which one belongs to the primary and which to the complementary orbit(s). Hence, sorting by length could help separate the orbits. => wrong, I retract this...

Another helpful aspect is to run a string of jobs "in order" for the first orbit (both forward and backwards, even), which gives an advance hint of which arc belongs to what. It also provides additional validation of the parallel jobs.

In the end, all the gathered data must be sorted into orbits. Each orbit is a collection of job outputs strung in the order they cross the Y=0 axis. The lengths of the orbits must be equal pair by pair, in reverse order. Here is a dumb example with fake values:

Orbit 1: (1, 53), (6, 35), (3, 12), (5, 6)
Orbit 2: (7, 6), (2, 12), (4, 35), (8, 53)

=> wrong, as shown by runs with w16, so I retract this...

In-memory sorting sounds possible up to w28 (depending on available RAM). Above that, stream processing using files becomes necessary.
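For the in-memory case, a plain qsort on the result table does the job; here is a sketch of my own (sorting by the starting Xi is my choice of key for the example, so that the arc beginning at a given crossing can be looked up by binary search during reconstruction):

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {     // one completed arc (same tuple as register_result)
  uint32_t Xi, X, C;
  uint64_t N;
} Result;

// Sort key: the starting coordinate Xi, so that reconstruction can
// quickly find "the arc that starts where the previous one landed".
static int by_Xi(const void *a, const void *b) {
  uint32_t xa = ((const Result *)a)->Xi;
  uint32_t xb = ((const Result *)b)->Xi;
  return (xa > xb) - (xa < xb);  // avoids overflow of plain subtraction
}

static void sort_results(Result *r, size_t n) {
  qsort(r, n, sizeof r[0], by_Xi);
}
```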


Update :

w16 has two orbits, one with 2147516415 steps and 33035 crossings (Y=0).

2147516415 = 2³¹ + 2¹⁵ -1 (check)

33035 = 2¹⁵ + 267 ???

I have to verify the behaviours for all the known widths.
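One way to run such a check, sketched here at the toy width w=2 (my own brute-force walk, not part of the project's code): iterate the raw (X, Y, C) state until it returns to its starting point, counting total steps and Y=0 crossings along the way.

```c
#include <stdint.h>

#define WIDTH 2                      // toy width, small enough to walk fully
#define MASK  ((1u << WIDTH) - 1)

// Walk the whole orbit passing through (X0, 0, 0), counting the total
// number of steps and how many times the trajectory crosses Y=0.
static void walk_orbit(uint32_t X0, uint64_t *steps, uint64_t *crossings) {
  uint32_t X = X0, Y = 0, C = 0;
  *steps = 0;
  *crossings = 0;
  do {
    uint32_t S = C + X + Y;          // one PEAC step
    Y = X;
    X = S & MASK;
    C = S >> WIDTH;                  // end-around carry
    (*steps)++;
    if (Y == 0)
      (*crossings)++;                // same exit test as the job loop
  } while (X != X0 || Y != 0 || C != 0);
}
```

Walking from X0=1 and from X0=2 gives two distinct orbits of 9 steps and 2 crossings each, and 9 = 2³ + 2¹ - 1 matches the 2³¹ + 2¹⁵ - 1 pattern seen above for w16.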