[ Edit: I eventually settled on a totally different ALU design than the one sketched here! I keep this log entry just to preserve history. -MvK ]
We will do a custom ALU, not because we don't have any 74181 ICs available, but because it is more fun.
There is a beautiful 12-chip MUX-based design out there, nicely described by Dieter Mueller. It even has a shift-right instruction, which the 74181 lacks. Without that it would be 10 chips. I'm tempted to use this design, but I still worry about the many control lines that go in: 9 if I count correctly, or 8 if we drop the SHR support. Many control lines means many chips in the decoder, unless we use a ROM, but that is slow.
That's why I consider something else, based on 6 chips per nibble, with less flexibility but therefore also fewer control lines. The necessary operations are there (A+B, A-B, A&B, A|B, A^B). There is also a "B" operation that we can use to load data without modifying it. We need that because all traffic to registers goes through the ALU in our design. In our data path we can also put AC on the BUS, so we have things like A+A. And there is an "A+1" that we will make use of in the STIX instruction ("store-and-increment-X") later on.
The main part is straightforward: three stages, with some logic on top, some multiplexers in the middle to select intermediates, and a final addition stage. With 4 control lines we can generate our desired functions plus a handful more that are not very useful, such as "(A ^ B) + 1". Of our functions, only "A|B" is a bit difficult to visualise, because there is no OR-chip in the circuit. It uses the identity A|B = (A&B)+(A^B) instead. This works because A&B and A^B never have a 1 in the same bit position, so the addition produces no carries. Finally, we won't store the carry as we don't want to have a status register. Maybe in a later phase we can use the carry in some useful way.
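To convince myself of the OR identity, here is a minimal Python sketch that checks it exhaustively for all 8-bit operand pairs. The function name `or_via_adder` and the 8-bit width are just illustrative choices for this check, not part of the circuit.

```python
def or_via_adder(a: int, b: int, width: int = 8) -> int:
    """Compute A|B the way the sketched ALU would: (A&B) + (A^B)."""
    mask = (1 << width) - 1
    and_term = a & b   # bits set in both operands
    xor_term = a ^ b   # bits set in exactly one operand
    # The two terms never share a set bit, so the adder sees no carries.
    assert and_term & xor_term == 0
    return (and_term + xor_term) & mask


# Exhaustive check over all 8-bit operand pairs.
mismatches = [
    (a, b)
    for a in range(256)
    for b in range(256)
    if or_via_adder(a, b) != (a | b)
]
print(len(mismatches))  # prints 0
```

Since the adder never carries in this case, the same final addition stage that does A+B can produce A|B without any OR gates.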
Four control lines is acceptable. With that we could make an opcode scheme where 4 bits select the desired ALU operation directly, without any further decoding, and let the other bits select the addressing modes. Then we assign the less useful codes to instructions that don't use the ALU, such as store and jump instructions. We need to derive the "write" and "jump" detectors with some logic, but that shouldn't be hard. Also, during the first phase of the clock the "load" lines into the registers and the "write" line into the RAM must be muted anyway (for different reasons), which means there is no need to worry about glitches while deriving these signals with combinatorial logic.
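The opcode idea can be sketched in a few lines of Python. Everything specific here is an assumption made up for illustration: the position of the ALU field in the upper nibble, the codes `WRITE_CODE` and `JUMP_CODE`, and the names `decode` and `gated`. None of these is a decided encoding.

```python
# Hypothetical layout: upper 4 bits go straight to the ALU control lines,
# lower 4 bits select the addressing mode.
WRITE_CODE = 0b1110  # assumed: a "less useful" ALU code reused for store
JUMP_CODE = 0b1111   # assumed: a "less useful" ALU code reused for jump


def decode(opcode: int) -> tuple[int, int, bool, bool]:
    """Split an 8-bit opcode into ALU select, mode, and the two detectors."""
    alu_sel = (opcode >> 4) & 0xF   # wired directly to the ALU, no decoding
    mode = opcode & 0xF             # addressing mode bits
    is_write = alu_sel == WRITE_CODE  # small detector for store instructions
    is_jump = alu_sel == JUMP_CODE    # small detector for jump instructions
    return alu_sel, mode, is_write, is_jump


def gated(signal: bool, second_phase: bool) -> bool:
    """Mute a load/write strobe during the first clock phase."""
    return signal and second_phase


alu_sel, mode, is_write, is_jump = decode(0xE3)
print(alu_sel, mode, is_write, is_jump)  # prints: 14 3 True False
print(gated(is_write, second_phase=False))  # prints: False
```

The point of `gated` is that the detectors may glitch while the opcode settles, but the strobes they drive are held inactive until the second clock phase, so the glitches never reach the registers or the RAM.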
I haven't decided on this one yet.