The Sorting Demon

I must have watched too many videos of graphical sorting algorithm visualisations. These are hypnotic... I'm so impressed by radix sort and bucket sort, and all the others have a very good lesson in algorithmics to give. But there is always this voice in your head that says, without telling how or why, that "there must be a better way" for simpler situations, which explains why there are so many sorting algorithms out there, and sorting is used in many places so why resist...

I was looking at the Fibonacci Heap algorithm and wondered how it could be turned into a sorting algorithm, so... nah. I want something much simpler. And it seems I have devised something that borrows from many domains. It is probably not the best (radix sort being the best ever, yeah) but "at least it's mine and i know how to do it" (a variant on the NIH syndrome). It may already exist under a name I don't know, but do I care ?

It is based around the merge sort principle, which I love due to its simplicity, but I want it to work well both on totally random and on sorted inputs. So it has 2 stages :

First : scan the incoming sequence for an ascending, then a descending sequence. Once they are identified/tagged, a merge sort can be applied to obtain one ascending sequence, by scanning the descending sequence backwards.
Second : merge sort all of the "rectified" sequences. But this is where it gets less trivial. The issue is that merge sorting works best when the inputs have a similar length. So the order of merging would be... ideally determined by a modified Huffman tree ? A simpler version would use avalanches or something like that...

The last part has all the "smarts" because the first part is pretty easy. The Huffman approach may be overkill but the idea of "avalanche" sounds great and applies an adaptive binary approach naturally, while allowing in-place operation with the help of a reasonably-sized stack.

The merge algorithm is under-utilised if the 2 input sequences have a length ratio of about 2 (approximately) so this should be avoided. The only way to be sure that a new sequence is at least as large as the current sequence is if the new sequence is larger, and then check/iterate again... an avalanche...

The algorithm would alternate between the 2 stages :

scan the input for ascending and descending extents (complexity : N comparisons), then merge them (N comparisons and N moves).
see if the size of the current extent is larger than the previously available in memory. If yes, merge them. Loop to 2.

I'm sure I'm reinventing something here... but some other details must be somewhat new :-P

Now, there is something to be aware of : if a sequence is perfectly scrambled/shuffled with a random permutation, half of the sequences are 1-datum long, one quarter has 2 numbers, one eighth is 3 data long... So a pre-filter already does a LOT of work.

Anyway, with the first scan part, the algo works ideally if the input is already sorted in both ascending and descending, and the 2nd part does the rest...

I should try to code it !

...

Well OK I'm coding it.

...

20220815 : hmmm'ok, it looks like it's a cousin of Timsort. Except I don't bother with insertion sort or even a minimal size of the merge sort. The "cascade" condition of merge ensures there is no stack problem.

Going (mostly) Software

I don't make projects.

Discussions

Become a Hackaday.io Member