Close

Fusion 2: the manual tinkering

A project log for PEAC Pisano with End-Around Carry algorithm

Add X to Y and Y to X, says the song. And carry on.

yann-guidon-ygdesYann Guidon / YGDES 01/08/2022 at 03:020 Comments

That log's going to be a long one...

The log 100. Fusion was about the ideas and principles but all the mental simulation will not replace actual data. So I started to play with small logs to get an idea of the correctness of my assertions.

And... you can guess that it didn't go as planned. I had overlooked a nuance or two that break the idealistic algorithm I had in mind. It's not totally thrown away though it must be refined and updated with new principles.

It took a few tries to get something off the ground but at least, now, I see what was missing from my first idea: the principle of odd/evenness of the position of a semi-arc. If you fuse all pairs of semi-arcs, then you have to be careful because they are provided in random order. So if you process them as they come, every other 2 semi-arc you encounter may be at an "odd" position, thus comprising a start and end index that will be (or has been) already eliminated through fusion.

Imagine, you have these elements that make an orbit :

1-2 2-3 3-4 4-5 5-6 6-7 7-8 8-1

You can pair them as they come :

We get 1-3 3-5 5-7 7-1 and with the above algo, we get fewer sequences: 1-5 (drop 3) 5-1 (drop 7), and the last pass finds that we get to 1-1 which is a closed orbit.

But the numbers are not provided in such a neat sorted way. It's a pretty good random order, in fact, which creates some problems, particularly in the context where 1) we can only access the list of semi-arcs sequentially 2) RAM can only contain a limited amount of data compared to the stream size. So let's have a look at a simple case: w3.

The list of semi-arcs sorted by the "source" or beginning is:

0 1 1  1
1 6 1  6
2 7 1 10
3 2 1  5
4 5 1  3
5 3 1  4
6 4 1  5
7 0 1  1

When sorted in order of "destination", or end of the semi-arc:

7 0 1  1
0 1 1  1
3 2 1  5
5 3 1  4
6 4 1  5
4 5 1  3
1 6 1  6
2 7 1 10

Together, side by side :

0 1 1  1    7 0 1  1
1 6 1  6    0 1 1  1
2 7 1 10    3 2 1  5
3 2 1  5    5 3 1  4
4 5 1  3    6 4 1  5
5 3 1  4    4 5 1  3
6 4 1  5    1 6 1  6
7 0 1  1    2 7 1 10

This is what the fusion program will see, and will process sequentially.

The basic algorithm is quite simple:

if the entry is valid and contains no removed value, then
   *  output the new entry calculated as
         dst.src  src.dst  src.arcs+dst.arcs  src.len+dest.len
   *  mark entries that contain src.src to be removed/ignored

But it won't work. So let's see how things would be ideally, by ignoring the constraint of "only going forward", and let's fuse the entries by hand:

0 1 1  1    7 0 1  1   fuses to 7  1  2 2 , remove 0
let's get the line with dest.src=1:
6 4 1  5    1 6 1  6   => 1  4  2 11,  remove 6
Now let's find the line with dest.src=4:
5 3 1  4    4 5 1  3   => 4  3  2  7  remove 5
and now, dest.src=3:
2 7 1 10    3 2 1  5   => 3  7  2 15  => remove 2

The start and end are the same number = 7 so we're good. But in practice, in the order of the stream, this gives:

0 1 1  1    7 0 1  1   => 7  1  2 2  => remove 0
1 6 1  6    0 1 1  1  removed by 0
2 7 1 10    3 2 1  5   => 3  7  2 15  => remove 2
3 2 1  5    5 3 1  4  removed by 5
4 5 1  3    6 4 1  5  removed by 6
5 3 1  4    4 5 1  3   => 4  3  2  7  remove 5
6 4 1  5    1 6 1  6   => 1  4  2 11  remove 6
7 0 1  1    2 7 1 10  removed by 2

The result is a shuffled mess, as well:

7  1  2  2
3  7  2 15 
4  3  2  7
1  4  2 11

So how do we get there from the original sequence ?

First, we have to define an origin, a standard starting point. This is typically the entry starting at 0, so it ends at 1, and implicitly it is fused with the semi-arc that starts at MASK (the equivalent of -1 if signed numbers existed).

Second, we must use a few inferences to remove some entries from partial information. Let's fuse the first 2 lines:

src           dst
0 1 1  1    7 0 1  1     => 7  1  2  2  =>   remove 0
1 6 1  6    0 1 1  1   dst.src removed by 0

Since 0-1 on the 2nd line is already processed by the first line (since that line contains dst.src < src.src), not only can we drop that line, but we can deduce that 6 will later be removed as well, because it will be at an odd position, while 1 is even. So we can already mark 6 as dropped. And the src.dst value at 6 can also be marked too... The next 2 lines are similar:

2 7 1 10    3 2 1  5     => 3  7  2 15  =>   remove 2
3 2 1  5    5 3 1  4   2 already removed, mark 5 as to be removed

The 4th line contains a src.dst that is already processed, src.src > src.dst, and marked as "remove" by the precedent line, so it will not be fused, and we can deduce that the dst.src will be "odd" as well.
The next 2 lines get pretty interesting:

4 5 1  3    6 4 1  5    we know 5 is to be removed, mark 6 to be removed
5 3 1  4    4 5 1  3    5 to be removed & src.src & dst.dst => 4  3  2  7  => remove 5

So there is a distinction: a "removed" entry gets fused and output if the src.src and dst.dst fields match. Otherwise, for example if the removed entry matches dst.src, the entry is dropped/discarded (as shown on the 2nd line). The last line confirm this:

6 4 1  5    1 6 1  6   6 marked to be removed but = src.src & dst.dst,  output 1  4  2 11  
7 0 1  1    2 7 1 10   dst.src=2 => to be removed

The same process is repeated for the remaining entries, the new list is sorted in 2 ways and pasted side by side:

src:
1  4  2 11
3  7  2 15
4  3  2  7
7  1  2  2

dst:
7  1  2  2
4  3  2  7 
1  4  2 11
3  7  2 15

1  4  2 11    7  1  2  2
3  7  2 15    4  3  2  7
4  3  2  7    1  4  2 11
7  1  2  2    3  7  2 15

 And the fusion restarts, line by line:

1  4  2 11    7  1  2  2    remove 1, output 7  4 4 13   
3  7  2 15    4  3  2  7    remove 3, output 4  7 4 22
4  3  2  7    1  4  2 11    removed by 1
7  1  2  2    3  7  2 15    removed by 3

And from 

7  4 4 13     4  7 4 22 ==> 4 4 8 35

we see that src=dst, so the orbit is closed with 8 semi-arcs and 35 steps. This is as expected so it's a success!

But that was an easy one, with only 8 semi-arcs and one orbit. Things will get naughty with more complex cases.

___________________________________________________________

Let's see if the algorithm still holds for w4:

src           dst
 0  1  1  1   15  0  1  1
 1  3  1 18    0  1  1  1
 2 11  1  6   12  2  1 16
 3  7  1  5    1  3  1 18
 4 10  1 14   13  4  1  7
 5  6  1  4   11  5  1  8
 6 14  1  5    5  6  1  4
 7  8  1 10    3  7  1  5
 8  9  1  3    7  8  1 10
 9 13  1 24    8  9  1  3
10 12  1  4    4 10  1 14
11  5  1  8    2 11  1  6
12  2  1 16   10 12  1  4
13  4  1  7    9 13  1 24
14 15  1  9    6 14  1  5
15  0  1  1   14 15  1  9

The easy ones are the first lines, as usual:

 0  1  1  1   15  0  1  1      => 15  1  2   2  mark 1 and 15 as "to be fused" and 0 to be removed
 1  3  1 18    0  1  1  1     dst.src=0 : marked to be removed, so mark src.dst=3 as "odd"

So in fact, each line gives us several hints about future entries: on entries that can be fused, we can mark future entries that will be fused, these are the end and start indices, and the intermediary value (between start and end) should drop/remove entries where it is found as dst.src. I think maybe 3 scoreboards will be necessary, which is not impossible since one scoreboard for w32 is 512MB "only", overall that's 1.5GB, or 1/10th of my RAM.

 2 11  1  6   12  2  1 16    => 12 11  2  22  , 2 removed

Here, we don't know. Is this line odd or even ?

Just output it as is, maybe, or fuse it and mark all the start, intermediary and end accordingly. But then, some line must then be compensated somewhere later. 2 is odd, 11 and 12 are even.

 3  7  1  5    1  3  1 18  =>  1  7  2  23

Here, we find 1 as a known "end" already processed, as a dst.src field, and 3 is already marked "known odd", so the entry can safely be fused, and 7 is marked as even.

The next line is the same case as the 3rd line, since we have no way of knowing if it is odd or even so we fuse it anyway..

 4 10  1 14   13  4  1  7  => 13 10  2  21  , 4 removed

4 is marked as removed/odd, 13 as source (even) and 10 as destination (even), for the later entries.

 5  6  1  4   11  5  1  8  => 11  6  2  12  , 5 removed

Here we find 11 as start and it was already marked as destination (even) on the 3rd line so it can be safely fused, and 6 marked as a destination (even). Thus 5 is marked as odd.

 6 14  1  5    5  6  1  4  

Here 6 is already marked on the previous line as a destination (even), but it's both src and dest so this line is odd... but we can mark 14 as odd (?), and output

=> 6 14  1   5

unchanged because the rest of the line is already fused.

But since 14 is a destination, 14 should be marked as even.

Here, we already know that 7 is even and 3 is odd:

 7  8  1 10    3  7  1  5

The end of the line is already fused so we can output

 7  8  1 10 

and mark 8 as odd... but since 8 > 7 it will come later (hopefully with a fusion) so we can simply drop the entry as well, as it will come later, right ?

And indeed, the next line:

 8  9  1  3    7  8  1 10  =>  7  9  2  13   8 removed

8 is odd and removed, 7 is even as a start, and we get 9 as even too !

Then comes 9, which we already know is even:

 9 13  1 24    8  9  1  3

so we can mark 13 as odd, and 8 is already odd.

But on line 4, 13 is already marked as even (source, already output) !

9 and 13 are already output so the only solution is to "realign" the stream and just output

=> 9 13  1 24

as is.

10 12  1  4    4 10  1 14

4 is already removed, 10 already a destination (odd) and 12 is already even (source). 4-10 is already fused so we can output

=> 10 12  1  4 

The next one is quite.... useless:

11  5  1  8    2 11  1  6  removed by 2 AND 5

5 is odd, 11 is even and 2 is odd. The right half is already processed at the 3rd line, and the left half already fused. So there is nothing to output here, just like this next line:

12  2  1 16   10 12  1  4

The right half is already output, 2 lines above, and the left half has already been output/fused on the 3rd line.

13  4  1  7    9 13  1 24

Again,  right half is already output, just as the left half. Nothing to output.

14 15  1  9    6 14  1  5

Here, the right  half is already output (on line 6) but not the right half. 15 and 14 are even so we can output

=> 14 15  1  9

Finally:

15  0  1  1   14 15  1  9

The first half is already fused by the first line, and the second half output by the previous line, so nothing to output.

Overall, when everything is chained in order, we get:

15  1  2   2
 1  7  2  23
 7  9  2  13
 9 13  1  24
13 10  2  21
10 12  1   4
12 11  2  22
11  6  2  12
 6 14  1   5
14 15  1   9

2+23+13+24+21+4+22+12+5+9 = 135

The sum of the arcs is right, the output contains 6 fusions and 4 straight copies. The same process as above is repeated to shorten the list, little by little... It seems that the use of appropriate flags helps a lot, it is not necessary to peek and poke at random locations all the time. The only random access is a non-critical write of forward flags, they are read in order.

_____________________________________________

The second phase starts with the output sorted in 2 ways:

sort src:
 1  7  2  23
 6 14  1   5
 7  9  2  13
 9 13  1  24
10 12  1   4
11  6  2  12
12 11  2  22
13 10  2  21
14 15  1   9
15  1  2   2

sort dst:
15  1  2   2
11  6  2  12
 1  7  2  23
 7  9  2  13
13 10  2  21
12 11  2  22
10 12  1   4
 9 13  1  24
 6 14  1   5
14 15  1   9

And we get:

 1  7  2  23    15  1  2   2
 6 14  1   5    11  6  2  12
 7  9  2  13     1  7  2  23
 9 13  1  24     7  9  2  13
10 12  1   4    13 10  2  21
11  6  2  12    12 11  2  22
12 11  2  22    10 12  1   4
13 10  2  21     9 13  1  24
14 15  1   9     6 14  1   5
15  1  2   2    14 15  1   9

As usual, the 1st line is "even" by default:

 1  7  2  23    15  1  2   2   => 15 7 4 25

1 is marked odd, 15 and 7 are even.

The next line, we don't know, so let's consider it even:

 6 14  1   5    11  6  2  12  =>  11 14 3 17

We know nothing about 6, 14, or 11 so 6=odd, 11=even, 14=even

 7  9  2  13     1  7  2  23

7 is even, 1 is odd, 9 unknown: the right half is already output/fused so output only the left side.

=> 7  9  2  13

and 9=odd.

 9 13  1  24     7  9  2  13

9 is odd, 13 unknown, but 7 is even and the right side is already output, so output the left side.

=> 9 13  1  24 

and mark 13 as even since 9 is odd. But these last 2 entries could have been fused...

10 12  1   4    13 10  2  21  => 13  12  3 25

13 is even, but 10 and 12 are unknown, so merge, and 10=odd and 12 even.

11  6  2  12    12 11  2  22

11=even, 6=odd (left half already fused) and 12 is even but right half is not already sent. So copy it to the output:

=> 12 11  2  22
12 11  2  22    10 12  1   4

right half is output on the last line and the left half is already fused so nothing to do.

13 10  2  21     9 13  1  24

The right half is already output. left half already merged.

14 15  1   9     6 14  1   5

The right half is already merged. Left half not yet so copied to the output:

=> 14 15  1   9
15  1  2   2    14 15  1   9

Left half is already merged at the first line. The right half is already output at the previous line. Nothing to do.

Only 3 merges, reduced the length from 10 to 7 so it's not really a "halving".

Previously I thought about alternating passes with forward and backward scans, is that more efficient ?

_________________________________________________________

3rd pass:

sorted by source:
 7  9  2  13
 9 13  1  24
11 14  3  17
12 11  2  22
13 12  3  25
14 15  1   9
15  7  4  25

Sorted by destination:
15  7  4  25
 7  9  2  13
12 11  2  22
13 12  3  25
 9 13  1  24
11 14  3  17
14 15  1   9

Side by side:
 7  9  2  13    15  7  4  25
 9 13  1  24     7  9  2  13
11 14  3  17    12 11  2  22
12 11  2  22    13 12  3  25
13 12  3  25     9 13  1  24
14 15  1   9     11 14  3  17
15  7  4  25    14 15  1   9

As usual: first line is merged.

 7  9  2  13    15  7  4  25   =>  15  9  6  38

mark 7 and 15 as output.

 9 13  1  24     7  9  2  13

7 was already output at the first line but not 9 so output 9 :

=> 9 13  1  24 
11 14  3  17    12 11  2  22  => 12 14 5 39

11 and 12 are not yet output, mark them as such.

12 11  2  22    13 12  3  25

12 is already output, but not 13 (yet):

=> 13 12  3  25
14 15  1   9     11 14  3  17

11 is already output but not 14:

=> 14 15  1   9
15  7  4  25    14 15  1   9

15 and 14 are already output, so do nothing.

___________________________________________

4th pass:

src:
  9 13  1  24
 12 14  5  39
 13 12  3  25
 14 15  1   9
 15  9  6  38

dst
 15  9  6  38
 13 12  3  25
  9 13  1  24
 12 14  5  39
 14 15  1   9

side by side:
  9 13  1  24      15  9  6  38
 12 14  5  39      13 12  3  25
 13 12  3  25      9 13  1  24
 14 15  1   9       12 14  5  39
 15  9  6  38     14 15  1   9

 You know the drill now:

  9 13  1  24      15  9  6  38  => 15  13  7 62
 12 14  5  39      13 12  3  25  => 13  14 8  64
13 12  3  25      9 13  1  24 => nothing (both already out)
 14 15  1   9       12 14  5  39 => 14 15 1 9 (12 already out)
 15  9  6  38     14 15  1   9  => nothing (both already out)

=>

 15  13  7  62
 13  14  8  64
 14  15  1   9

___________________________________________

5th pass:

sorted by src:
 13  14  8  64
 14  15  1   9
 15  13  7  62

sorted by dest:
 15  13  7  62
 13  14  8  64
 14  15  1   9

side by side:
 13  14  8  64     15  13  7  62
 14  15  1   9     13  14  8  64
 15  13  7  62     14  15  1   9

here we go again:

13  14  8  64     15  13  7  62  => 15  14  15  126
14  15  1   9     13  14  8  64  => 14  15  1 9  (13 already out)
 15  13  7  62     14  15  1   9  => nothing (14 & 15 already out)

___________________________________________

6th pass:

15  14  15 126    14  15 1 9  =>  loop closed @ 14, 16 semiarcs, len=135
14  15   1   9    15  14  15  126  => nothing (14 & 15 out)

___________________________________________

So the whole process took 6 passes instead of 4. The 50% increase is considered tolerable, though it could be improved a bit, with the help of a "purgatory buffer": fused entries are written directly to disk but unmodified semi-arcs are held in a temporary FIFO, in case they can be combined anyway while they are still in memory. The point is to reach the point where all entries fit in RAM as soon as possible.

...

By the way: did anyone notice the sizes of the lists ?

16, 10, 7, 5, 3, 2, 1

We get some fun results in OEIS such as A052011 Number of primes between successive Fibonacci numbers.

Primes and Fibonacci numbers at the same time, hummmm....

Discussions