As w27 is about to complete, the same question arises again: what to do next?
The most important task is to convert the code to "multi-scalar" and then to SIMD (SSE3, AVX2), as a way to prepare for the CUDA code. Progress is slow, though, because other practical concerns become clear as I ramp up the computational power.
The next big thing is a way to seamlessly manage all the computation resources. A networked dispatcher would be good, since I already use 3 computers and must allocate the ranges by hand. I also have a bunch of Raspberry Pi boards that still need to be clustered (#Clunky McCluster), and it will be a mess to manage them or let them participate. Hence #CHOMP. Efficient distribution of the workload is not really a critical issue, though.
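To give an idea of what the dispatcher would replace, here is a minimal sketch of the bookkeeping only, with no networking at all. Everything in it is an assumption made for illustration: the names `split_range` and `Dispatcher`, the fixed chunk size, and the idea that a job is just a `(lo, hi)` pair of range bounds. It is not how #CHOMP works.

```python
from collections import deque


def split_range(start, stop, chunk):
    """Cut [start, stop) into consecutive (lo, hi) pieces of at most `chunk` items."""
    return [(lo, min(lo + chunk, stop)) for lo in range(start, stop, chunk)]


class Dispatcher:
    """Hypothetical in-memory dispatcher: hands out one range per worker on request."""

    def __init__(self, start, stop, chunk):
        self.pending = deque(split_range(start, stop, chunk))
        self.assigned = {}  # worker name -> range currently being computed
        self.done = []      # ranges whose results have been reported

    def request(self, worker):
        """Return the next range for `worker`, or None when nothing is left."""
        if worker in self.assigned:
            # The worker asks again without reporting: repeat the same range.
            return self.assigned[worker]
        if not self.pending:
            return None
        job = self.pending.popleft()
        self.assigned[worker] = job
        return job

    def complete(self, worker):
        """Record that `worker` finished its range."""
        self.done.append(self.assigned.pop(worker))

    def fail(self, worker):
        """Put the range of a lost worker back at the front of the queue."""
        self.pending.appendleft(self.assigned.pop(worker))
```

A fast PC simply calls `request()` more often than a Raspberry Pi, so the load balances itself without any tuning, which fits the idea that efficient distribution is not the critical part.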
A third development axis is to make a better fusion program. The existing one has a constraint: it can only work on a set of semi-arcs that is already complete. Running the fusion program on intermediate, incomplete sets could help soon, as I get partial results from various computers. A partial fusion would slightly reduce the size of the data before the parts from the different computers are amalgamated. The script must change too: it currently stops only when the output file is empty, whereas a partial fusion ends with a file whose length no longer changes. That condition also covers the existing case, where the file remains empty...
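The new stop condition can be sketched as a loop that runs until the file size stops changing. This is only an illustration under assumptions: `fuse_until_stable` and `run_pass` are hypothetical names, `run_pass(path)` stands for one invocation of the fusion program that rewrites the file in place, and `max_passes` is a safety limit that the real script may not need.

```python
import os


def fuse_until_stable(run_pass, path, max_passes=1000):
    """Call run_pass(path) until the size of `path` stops changing.

    Returns the number of passes that were run. The last pass is the one
    that left the size unchanged, so an already-empty file needs one pass.
    """
    previous = os.path.getsize(path)
    for count in range(1, max_passes + 1):
        run_pass(path)  # hypothetical: one fusion pass, in place
        current = os.path.getsize(path)
        if current == previous:
            return count  # nothing was fused this time: stop
        previous = current
    raise RuntimeError("file size still changing after max_passes passes")
```

With a complete set the file shrinks to zero bytes and then stays there, so the same test ends the loop in both the partial and the complete case.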
This last development axis seems to be the simplest and fastest to complete, maybe in a few days, and it can help a bit during very large computations, since partial fusion can be run progressively.