I decided to redesign my ALU, the old one worked but it wasn't easy to see how it worked. It was too clever, I think people expect an ALU to compute everything (AND, OR, XOR, ADD/SUB, ASR) in a way they can see and then select the right output - that is much easier to understand.
Minimal ALUs are limited by the ripple-carry propagation time, so the first thing to do was to get my fast carry propagation design working. The slow bit of bipolar junction transistors is switching them off, so I got rid of all of them and used diodes to do all the carry propagation (these will be Schottky diodes for fast switching and low voltage drop). So here it is in it;s simplest form:
Generate, G, is high if both inputs are high. Kill, K, is high if both inputs are low. The carry goes in on the left, if K1 is high then the first NPN is on and the carry is grounded, if G1 is high then the carry out is high else the carry is propagated. That's pretty simple, I feel confident I can both get this implemented in a way that you can see the signal flow (as above) and thus be able to explain how this time-critical bit of a CPU works to all.
There's more detail at http://www.nanocpu.org/alu