Close

Post-processing: the strategy looks good

A project log for PEAC Pisano with End-Around Carry algorithm

Add X to Y and Y to X, says the song. And carry on.

yann-guidon-ygdesYann Guidon / YGDES 12/07/2021 at 20:290 Comments

I slept on the last log 91. Post-processing: the strategy is coming  and I just realised that the join/coalesce/merge/fusion program can work in any direction, so there is only one program to write, yay. There is only one command line option to add, to allow fusions with forwards or backwards arcs. Otherwise, since there are 2 synchronised streams, the fusion program doesn't care if it's sorted in increasing or decreasing order.

So far, the rest of the post-processing algorithm relies heavily on the features of the stock GNU sort program. It provides all the required options (-n, -k) and a load test just showed it uses all the available processors, without having to enable a flag/option. The only remaining issue is that it only deals with text data, which more than doubles the required storage space. Later, I'll see if/how I can re-encode the logs with the "MIDI-like" format (with 7 bits of data per byte).

Anyway, the algorithm to process w32 spends maybe 4 or 5 passes with files that exceed the RAM size, each time halving the size of the dataset. I use an external SSD for development, I could add a second one to double the bandwidth. I should check the laptop's ports if enough high speed links are available.

Each pass contains a pair of merge operations:

The burden of development is reasonable, I suppose that most of the time will be spent on sorting the huge log files. Sadly sort can't work on p4k files directly, I must develop another file/encoding format... p7ck ?

The post-processing method seems clear enough and I can return to develop the parallel scanners ;-)

Discussions