I spent a lot of time looking for optimal ALU solutions.
The best I could find (for TTL) was this:
[ Please note that the function table shown here is a little different from that of the Square Inch ALU ]
The key to the schematic is that a single multiplexer (the lower one) can generate every possible logic function of two variables. (Only the most useful ones are in the table.)
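To illustrate the idea, here is a minimal Python sketch of a 4-to-1 multiplexer used as a programmable logic gate. It assumes that the two operands A and B drive the select inputs and that four control lines drive the data inputs; the schematic may wire it differently. The names `mux4`, `AND`, `OR`, `XOR`, `PASS_A` and `PASS_B` and the bit order of the codes are made up for this example and do not come from the function table.

```python
from itertools import product

def mux4(data, a, b):
    """4-to-1 multiplexer: the select inputs (a, b) pick one of the four data inputs."""
    return data[(a << 1) | b]

# Hypothetical function codes. Entry i is the output for (a, b) = (i >> 1, i & 1),
# so each code is simply the truth table of the wanted function.
AND    = (0, 0, 0, 1)
OR     = (0, 1, 1, 1)
XOR    = (0, 1, 1, 0)
PASS_A = (0, 0, 1, 1)
PASS_B = (0, 1, 0, 1)

def truth_table(code):
    """Outputs of the multiplexer for all four input combinations."""
    return tuple(mux4(code, a, b) for a, b in product((0, 1), repeat=2))

# The 16 possible codes give 16 different truth tables,
# which is every logic function of two variables.
all_functions = {truth_table(code) for code in product((0, 1), repeat=4)}
```

Because the four data inputs are the truth table itself, choosing a function is only a matter of putting the right four bits on the control lines.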
For logic functions, all orange carry wires should be "0". To accomplish this for the carry output, the grey CARRY ENABLE signal must be low. (The connection of the grey wire to the multiplexer is not very obvious, because the multiplexer symbol that I used in the drawing did not have an ENABLE input.) The output of the ALU is then equal to the logic output (on the red wire).
For addition, the upper multiplexer part generates the "majority" function of the 3 inputs, which gives the new carry at its output. The incoming carry is XORed with the logic output (the logic function should be set to XOR for addition).
The ALU can also pass one of the input signals unmodified, so it can be used for a LOAD instruction to get immediate or memory data into one of the registers. For the control section, the LOAD is just another arithmetic instruction, which simplifies the control section of your CPU.
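The three modes described above (logic, addition and pass-through) can be sketched as a behavioural model in Python. This is not a description of the actual wiring. It assumes that the three inputs of the majority function are A, B and the incoming carry, that CARRY ENABLE simply gates the carry output, and that the function code is the truth table on the data inputs of the lower multiplexer. The names `alu_bit`, `alu`, the 8-bit width and the function codes are made up for the example.

```python
# Hypothetical function codes: entry i is the logic output for (a, b) = (i >> 1, i & 1).
AND    = (0, 0, 0, 1)
XOR    = (0, 1, 1, 0)
PASS_B = (0, 1, 0, 1)

def majority(a, b, c):
    """1 when at least two of the three inputs are 1."""
    return (a & b) | (a & c) | (b & c)

def alu_bit(code, a, b, carry_in, carry_enable):
    """One bit slice: returns (output bit, carry out)."""
    logic = code[(a << 1) | b]                         # lower multiplexer
    out = logic ^ carry_in                             # incoming carry XORed with logic output
    carry_out = majority(a, b, carry_in) & carry_enable
    return out, carry_out

def alu(code, x, y, carry_enable=0, carry_in=0, width=8):
    """Ripple the bit slices from bit 0 upward: returns (result, final carry)."""
    result, carry = 0, carry_in
    for i in range(width):
        bit, carry = alu_bit(code, (x >> i) & 1, (y >> i) & 1, carry, carry_enable)
        result |= bit << i
    return result, carry

total, carry = alu(XOR, 100, 55, carry_enable=1)       # addition: XOR with carry enabled
masked, _ = alu(AND, 0b1100, 0b1010)                   # logic: all carries stay 0
loaded, _ = alu(PASS_B, 0x00, 0x5A)                    # LOAD: pass the B input unmodified
```

With CARRY ENABLE low and no incoming carry, every carry stays "0", so the XOR with the carry has no effect and the output is just the selected logic function. That is also why a LOAD needs no special hardware: it is the pass-through function code with the carry disabled.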
With the addition of a 74HC151, it is possible to do a fast-carry that calculates a carry for three levels at once.
The above schematic can easily be adapted to relay or other technologies.
For a relay-based ALU, see the file "relay CPU technology V1.7" in the files section of https://hackaday.io/project/11012-risc-relay-cpu, which is based on the same idea. There, the logic function is performed in diode-resistor logic, combined with relay logic.
A transistor version can be found HERE.
(A previous version of this log was a comment made to AMBAP in 2016)