I decided to follow the KISS rule from the start.
For the slave outputs I decided to use chained shift registers driven over SPI. If I want to make it more complicated later, I can add a small CPLD/uC to hold more registers and be more "intelligent".
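As a rough sketch of how such a chain could be driven from the master side: the code below assumes 74HC595-style 8-bit registers, SPI sending the most significant bit first, and a chain of three registers. The names `NUM_REGS`, `chain_set_output` and `chain_build_frame` are made up for illustration and are not taken from the actual firmware. The idea is to keep a "shadow" copy of all outputs in RAM and to build the SPI frame in reverse order, because the first byte clocked out ends up in the register farthest from the master.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical chain length: three 8-bit registers, 24 outputs in total. */
#define NUM_REGS 3

/* Set or clear one output in the shadow buffer.
 * Output 0 is bit 0 of the register nearest to the master,
 * output 8 is bit 0 of the next register, and so on. */
static void chain_set_output(uint8_t shadow[NUM_REGS], unsigned output, int on)
{
    assert(output < NUM_REGS * 8u);
    uint8_t mask = (uint8_t)(1u << (output % 8u));
    if (on)
        shadow[output / 8u] |= mask;
    else
        shadow[output / 8u] &= (uint8_t)~mask;
}

/* Build the byte sequence to send over SPI.
 * The first byte sent is pushed through the whole chain, so it must be
 * the byte for the farthest register; the last byte sent stays in the
 * register nearest to the master. */
static void chain_build_frame(const uint8_t shadow[NUM_REGS],
                              uint8_t frame[NUM_REGS])
{
    for (size_t i = 0; i < NUM_REGS; i++)
        frame[i] = shadow[NUM_REGS - 1 - i];
}
```

On real hardware the frame would then be passed to whatever SPI write routine the platform provides, followed by a pulse on the latch line so that all outputs change at once. Both of those steps are platform specific and left out here.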
I am also thinking about adding extra lines for UART and CAN buses.
The first test is here:
The next step is to make this idea more robust.