I think I cracked it :-)
The MUX8s are all identical, and 7 of the bits are controlled through a circular permutation. The last bit gets a different permutation so that the gates reach the ideal fanout. Hopefully this will let me build a better register set, both with relays (easier construction) and in VHDL (shorter, more generic code).
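A quick way to sanity-check the fanout argument is a small Python sketch. Everything structural in it is an assumption, not taken from the actual schematic: each MUX8 is modelled as a binary tree of 1 + 2 + 4 two-way selectors, there are 3 select lines and 8 data bits, and the names `LEVEL_LOADS`, `rotate`, `fanout` and `spread` are made up for the illustration. The permutation it picks for the last bit is just what this model suggests, not necessarily the sequence actually used.

```python
from itertools import permutations

# Assumption: a MUX8 is a tree with 1, 2 and 4 two-way selectors per level,
# so the three select lines of one MUX8 see loads of 1, 2 and 4.
LEVEL_LOADS = (1, 2, 4)
BITS = 8  # assumed data width of the register set


def rotate(seq, n):
    """Circularly rotate a tuple by n positions."""
    n %= len(seq)
    return seq[n:] + seq[:n]


def fanout(assignments):
    """Sum the load that each of the 3 select lines sees over all bits."""
    totals = [0, 0, 0]
    for loads in assignments:
        for line, load in enumerate(loads):
            totals[line] += load
    return totals


def spread(totals):
    """Difference between the most and least loaded select line."""
    return max(totals) - min(totals)


# Same wiring on every bit: one select line drives 4x more than another.
naive = fanout([LEVEL_LOADS] * BITS)

# Bits 0..6: circular permutation of the select lines.
circular = [rotate(LEVEL_LOADS, bit) for bit in range(BITS - 1)]

# Last bit: try all 6 permutations and keep the best-balanced one.
last = min(permutations(LEVEL_LOADS),
           key=lambda p: spread(fanout(circular + [p])))
balanced = fanout(circular + [last])

print(naive)     # [8, 16, 32]
print(last)      # (4, 2, 1)
print(balanced)  # [19, 18, 19]
```

Under these assumptions the 56 selectors cannot be split more evenly than 19/18/19 over 3 lines, and the permutation that gets there for the last bit is not one of the three rotations, which fits the idea of treating that bit separately.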
I'm just trying to reduce the length of the wires and the long crossings :-)
Oh, that's even better:
The sequence of permutations is:
I now have to rewrite my register set VHDL code...