It took the better part of a day, but I've finally got a low-transistor-count version of a single-bit ALU.
It does both arithmetic and logical operations and the result ends up on a single output pin.
Most ALU designs have a rather big multiplexer at the output stage that selects the result of the desired operation. That would not be a big issue when using ICs, since just one additional chip would be required, but when building from discrete NAND gates it's not that simple.
My ALU slice has the usual A, B and CarryIn inputs, with Result and CarryOut as outputs. In addition there are seven control inputs, shared among all the slices that make up the complete ALU.
One of the control inputs is required in both a normal and an inverted version. I could have put the inverter inside the slice and then only had to distribute six control signals along the bus. But I chose to put the inverter on the control PCB instead, so a single shared inverter replaces one per slice, saving seven transistors in total.
Depending on the state of the control inputs, the ALU slice performs the following common functions (plus a number of strange ones that I won't have any use for):
a AND b
a OR b
a XOR b
a ADD b
a SUB b
ADD and SUB each come in variants with and without carry.
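To have something to compare the hardware against, here is a rough behavioural model of one slice in C++. It is only a sketch: the `Op` names and the `aluSlice` function are made up for illustration and say nothing about how the seven control lines are actually encoded. It also assumes that SUB is done the usual two's complement way, as a + (NOT b) + CarryIn, which may not match what the real slice does internally.

```cpp
// Hypothetical operation names. The real slice is steered by seven
// control lines whose encoding is not modelled here.
enum class Op { And, Or, Xor, Add, Sub };

struct SliceOut {
    bool result;
    bool carryOut;
};

// Behavioural model of a single ALU bit slice.
// Assumption: SUB is computed as a + (NOT b) + carryIn, so a subtract
// without borrow needs carryIn = 1 at the lowest slice.
// Assumption: the logic ops leave carryOut low.
SliceOut aluSlice(Op op, bool a, bool b, bool carryIn) {
    switch (op) {
        case Op::And: return {a && b, false};
        case Op::Or:  return {a || b, false};
        case Op::Xor: return {a != b, false};
        case Op::Add:
        case Op::Sub: {
            // Invert b for subtraction, then run a plain full adder.
            bool bEff  = (op == Op::Sub) ? !b : b;
            bool half  = (a != bEff);                      // a XOR bEff
            bool sum   = (half != carryIn);                // half XOR carryIn
            bool carry = (a && bEff) || (carryIn && half);
            return {sum, carry};
        }
    }
    return {false, false};  // unreachable for valid Op values
}
```

In this model the "without carry" variants just mean forcing CarryIn to a fixed value at the lowest slice (0 for ADD, 1 for SUB) instead of feeding in the carry flag.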
So I ended up with 14 transistors and 38+(14*3) = 80 diodes per slice. Outside the slice I'll also have to add some logic for the shift and roll instructions. I think I'll implement them in parallel with the A/B/Result pins: no arithmetic or logical instruction takes place at the same time, so bypassing the main ALU will reduce the propagation delay a lot.
I'll breadboard this up and connect it to an Arduino to verify that all ALU ops are correct before I design a PCB for it.
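The verification itself can be quite simple, since a slice has only eight combinations of A, B and CarryIn per operation. Below is a sketch of the kind of exhaustive check I have in mind. The names `Pins`, `SliceFn` and `countMismatches` are invented for this example. On the Arduino, the `dut` callable would presumably wrap `digitalWrite()` on the input pins and `digitalRead()` on Result and CarryOut, but that wiring is not shown here.

```cpp
#include <functional>

struct Pins {
    bool result;
    bool carryOut;
};

// Any callable that maps (a, b, carryIn) to the two output pins.
// Hypothetical type: on real hardware it would drive and read I/O pins.
using SliceFn = std::function<Pins(bool a, bool b, bool carryIn)>;

// Apply all 8 input combinations to both the device under test and a
// reference model, and count how many combinations disagree.
int countMismatches(const SliceFn& dut, const SliceFn& reference) {
    int mismatches = 0;
    for (int v = 0; v < 8; ++v) {
        bool a       = (v & 1) != 0;
        bool b       = (v & 2) != 0;
        bool carryIn = (v & 4) != 0;
        Pins got  = dut(a, b, carryIn);
        Pins want = reference(a, b, carryIn);
        if (got.result != want.result || got.carryOut != want.carryOut) {
            ++mismatches;
        }
    }
    return mismatches;
}
```

Running this once per control-line setting should cover the whole slice, and a non-zero count points straight at the operation that is wired wrong.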