So lately I started tinkering with #YAMS and things started to get weird because the most important place where I would need a good sorting algo would be here, to manage the merging of the trajectories to rebuild the whole orbit(s). Previously I used the traditional ** sort** utility which implies a specific type of encoding. See

94. New packing: p7k

95. New packing at work

But if I have my own sorting infrastructure, not only I can simplify/compact the records, but I could also exploit some parallelism here and there, eventually avoiding ** sort** completely.

For now the goal is to reach w32 and this determines the format:

- uint32_t
**Xstart**: range from 0 to 0xFFFFFFFF - uint32_t
**Xend**: range from 0 to 0xFFFFFFFF - uint32_t
**Arcs**: range from 1 to 0xFFFFFFFF - uint64_t
**Len**: range from 1 to 0xFFFFFFFFFFFFFFFD - uint8_t
**Parity**flag : 0 or 1

Total : 21 bytes per record, down from 26, so the final log for w32 would be about 90GB. The last byte has very little entropy but it is required to check that w26 is perfect, not just maximal.

Maybe the record could be reduced a bit by using variable-sized **Arcs** and **Len** fields, because they are pretty low at the start and increase in size with each pass of fusion. **Xstart** and **Xend** remain fixed (to ease sorting). I'm fine with variable-sized records because random access is not required here.

**Arcs** ranges from 1 to 4 bytes (2 bits), **Len** from 1 to 8 bytes (3 bits) so there would be 1 bit left in a byte to signal the parity flag. So the minimum size of a record would be 11 bytes. The 2 remaining bits could help reduce the size further: with 3 bits, one can encode either the number of bytes, or a direct value from 1 to 4.

The fusion program should be able to perform the sorts only, so partial logs can be preprocessed for fusion. The input processing would likely be decoupled from the output processing (usint pthread ?) to keep the whole program fast, despite all the I/O stalls. The sum of all the lengths should always be computed, to catch any flaw during processing.

**Edit : **

For w32, the **Len** field should be capable to hold more than 64 bits... so 4 bits are required to encode more than 4 bytes, because the orbit could be perfect...

## Discussions

## Become a Hackaday.io Member

Create an account to leave a comment. Already have an account? Log In.