Close

gPEAC to 1 million

A project log for PEAC Pisano with End-Around Carry algorithm

Add X to Y and Y to X, says the song. And carry on.

yann-guidon-ygdesYann Guidon / YGDES 04/27/2023 at 00:180 Comments

With the last log 139. Heuristic breakthrough, we found a sieve-like structure for the moduli that give perfect and maximal orbits. We also found that perfect and maximal orbits obey to the same rules except for one (the base 4 "detail").

I have only tested up to the prime 131 but it is interesting to get more of these because they will help to better sieve out more candidates with little effort. In return, more effort is needed with more thorough heuristics. As a warm-up, let's find the prime numbers that match our criteria: let's find a list of primes, up to 1 million for example, and keep only those with 1 and 9 as last digits. Well I know I could write a sieve but I have better than that to do so let's find something ready in the interwebs. It was a breeze with https://numbergenerator.org/numberlist/prime-numbers. which let me download a file of  78496 odd numbers up to 999983. That should be enough.

grep '[19]$' Primes.1M.txt > Primes.19.txt
wc -l Primes.19.txt

ok that was easy. 39210 numbers left. What is way less easy is to get  the corresponding pairs of forbidden remainders for each of these bases. I have already noticed some funky patterns in them and the sieve implies some successive elimination algorithm but let's not get too dizzy: the numbers, not the ideas, rule, so let's make them speak first. But for that we need *more* than the 131540 orbits already computed. It's going to take even more time than before...

Fortunately now we have even more insight into them and we can apply 2 new heuristics:

  1. We can stop the iteration once they exceed the maximal threshold, thus halving the computation time one in every 4 unsieved candidates. This would save about 20% of computation time overall.
  2. We can pre-sieve out more numbers. A previous version would discard all 2 mod 5 candidates but we can do much better.

Oh and we can run the code on more cores at once and concatenate the results. But first, a bit of benchmarking.

gPEAC_bases_basic.c scans the first 10000 orbits in

...
* 9974  99490648  -49745324
* 9975  49755299  0
* 9978  99570460  -49785230
* 9985  49855104  0
 1383 / 10000
147.60user 0.02system 2:27.94elapsed 99%CPU

Stopping the long iterations early is easy to code.  Here is the output of gPEAC_bases_earlystop.c

M 9959  49595819
P 9974  49745329
M 9975  49755299
P 9978  49785235
M 9985  49855104
 1383 / 10000
153.53user 0.01system 2:33.95elapsed 99%CPU 

I expected a 1/5th speed increase but there is no gain, probably because the early stop adds a test/condition in the inner loop. More inlining and peep-hole optimising might be required, removing conditions inside the computation. gcc -O3 barely made a dent:

M 9985  49855104
 1383 / 10000
147.81user 0.01system 2:28.18elapsed 99%CPU 

 A bit of unrolling removed a few seconds. Streamlining the return mechanism (smaller loop exit computation, gotos...) removes a few more seconds. But nothing significant yet.

I got a small gain with this code:

uint64_t orbit(uint64_t modulo, uint64_t max) {
  uint64_t len=0, X = 1, Y = 0;

reloop:
    Y += X;
    if (Y >= modulo) {
      Y -= modulo;
      X++;
    }
    if ((0== ((Y ^ 1) | X))
       || (len > max))
      goto return_1;

    X += Y;
    if (X >= modulo) {
      X -= modulo;
      Y++;
    }
    len+=2;

 if ((X^1) | Y )
    goto reloop;
  return len;

return_1:
  len++;
  return len;
}

I like that it saves the S variable but the actual gain is in the % range. The loop exit conditions don't affect much the runtime. I suppose that the most taxing part is the "if > modulo" part, which flushes the pipeline so often. I can come up with some x86 sequences but inline code is dirty, right ?

.

.

.

Oh wait, I opened another can of worms. See the next log: The end of benchmarking.

Discussions