Today I finished building the Arithmetic-Logic Unit (ALU) for my CPU!
It took almost 3 months, 7 big perfboards and 124 logic chips (logic gates, multiplexers and a couple of bus drivers).
It can operate at up to 5 MHz, and draws up to 200 milliamps of current.
This ALU has five inputs:
1 -- microinstruction, which has 8 lines controlling the operation of the whole ALU:
- ALU_enable line, which enables the ALU result onto the data bus,
- 3 lines selecting one of the eight types of ALU functions,
- Carry_in_enable line (controlling several function flavours),
- Arithmetic_shift line (used only when Shift function is selected),
- Subtract/Invert/Reverse line, which inverts the second operand in two-operand functions, and reverses the shift direction,
- Use_const line, which replaces the second operand with an 8-bit constant value sourced from the instruction;
2 -- Carry_in, which has only 1 line and carries the value of carry_in, used in arithmetic operations;
3 -- Src1, 16-bit, the first operand;
4 -- Src2, 16-bit, the second operand;
5 -- Const, 8-bit, the substitute second operand, sourced from the instruction.
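To make the 8 microinstruction lines concrete, here is a minimal Python sketch that splits an 8-bit control word into the six fields listed above (1 + 3 + 1 + 1 + 1 + 1 = 8 lines). The bit positions, the `MicroInstruction` class and the `decode` function are assumptions made for illustration only; the real wiring order and the meaning of each 3-bit function code are not stated here.

```python
from dataclasses import dataclass


@dataclass
class MicroInstruction:
    alu_enable: bool               # put the ALU result onto the data bus
    function: int                  # 3-bit selector, 0..7 (code-to-function mapping not specified)
    carry_in_enable: bool
    arithmetic_shift: bool         # only meaningful for the Shift function
    subtract_invert_reverse: bool
    use_const: bool                # use Const instead of Src2


def decode(word: int) -> MicroInstruction:
    """Split an 8-bit control word into fields (bit positions are ASSUMED)."""
    if not 0 <= word <= 0xFF:
        raise ValueError("microinstruction must fit in 8 bits")
    return MicroInstruction(
        alu_enable=bool(word & 1),                       # assumed bit 0
        function=(word >> 1) & 0b111,                    # assumed bits 1..3
        carry_in_enable=bool((word >> 4) & 1),           # assumed bit 4
        arithmetic_shift=bool((word >> 5) & 1),          # assumed bit 5
        subtract_invert_reverse=bool((word >> 6) & 1),   # assumed bit 6
        use_const=bool((word >> 7) & 1),                 # assumed bit 7
    )


mi = decode(0b10011011)
print(mi)
```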
The ALU also has 2 outputs:
1 -- Result, 16-bit;
2 -- Flags, 4 lines: side effects which are stored into the status register and used in further ALU operations or in conditional jumps (branch operations).
This ALU is capable of 8 types of functions, most of which have several variants, all operating on 16-bit data:
1: Byte Sign Extend -- a simple function which replaces the high 8 bits of the Src1 input with copies of bit 7 of this input;
2: Shift -- shifts the word given in the Src1 input by 1 bit, has several flavours:
a) shift left (default),
b) shift right,
c) arithmetic shift right (preserves most significant bit),
d) rotate left through carry (msb outputs as carry_out, while carry_in goes into lsb),
e) rotate right through carry (lsb outputs as carry_out, while carry_in goes into msb);
3: Rotate -- rotates the word given in the Src1 input to the left by a set number of bits, has two flavours:
a) rotate using the amount encoded in the instruction,
b) rotate using amount given by Src2 input;
4: Invert -- simply inverts all bits of the Src1 input;
5: ADD (more precisely, instructions which use the adder), has several flavours:
a) Add value of Src2 to the value of Src1,
b) Add value of Src2 and Carry_in to the value of Src1,
c) Add Const value to the value of Src1,
d) Add Const value and Carry_in to the value of Src1,
e) Subtract value of Src2 from the value of Src1,
f) Subtract value of Src2 with borrow (Carry_in) from the value of Src1,
g) Subtract Const value from the value of Src1,
h) Subtract Const value with borrow (Carry_in) from the value of Src1;
6: XOR, has 4 flavours:
a) Src1 XOR Src2,
b) Src1 XOR Const,
c) Src1 XOR ~Src2,
d) Src1 XOR ~Const;
7: OR, has 4 flavours:
a) Src1 OR Src2,
b) Src1 OR Const,
c) Src1 OR ~Src2,
d) Src1 OR ~Const;
8: AND, has 4 flavours:
a) Src1 AND Src2,
b) Src1 AND Const,
c) Src1 AND ~Src2,
d) Src1 AND ~Const.
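The function list above can be summarised as a small behavioural model. This is only a Python sketch of what each function computes on 16-bit words, not a description of the actual circuit. The function names are made up for illustration, the flags are left out, and treating the shifted-out bit of the plain shifts as a carry is an assumption (the list only states it for the rotate-through-carry flavours). Const is omitted because how the 8-bit value is widened to 16 bits is not specified.

```python
MASK16 = 0xFFFF


def sign_extend_byte(src1: int) -> int:
    """Function 1: replace the high 8 bits with copies of bit 7."""
    low = src1 & 0x00FF
    return low | 0xFF00 if low & 0x80 else low


def shift(src1, *, right=False, arithmetic=False, through_carry=False, carry_in=0):
    """Function 2: shift by one bit; returns (result, bit shifted out)."""
    src1 &= MASK16
    if right:
        out = src1 & 1
        if through_carry:
            fill = carry_in & 1      # carry_in goes into the msb
        elif arithmetic:
            fill = src1 >> 15        # keep the most significant bit
        else:
            fill = 0
        return (src1 >> 1) | (fill << 15), out
    out = src1 >> 15
    fill = carry_in & 1 if through_carry else 0   # carry_in goes into the lsb
    return ((src1 << 1) & MASK16) | fill, out


def rotate_left(src1: int, amount: int) -> int:
    """Function 3: rotate left by `amount` bits (taken modulo 16)."""
    src1 &= MASK16
    amount %= 16
    return ((src1 << amount) | (src1 >> (16 - amount))) & MASK16


def add(src1, src2, *, carry_in=0, subtract=False):
    """Function 5, flavours a, b, e, f: returns the 16-bit result only."""
    if subtract:
        return (src1 - src2 - carry_in) & MASK16   # carry_in acts as borrow
    return (src1 + src2 + carry_in) & MASK16


def bitwise(op, src1, src2, *, invert=False):
    """Functions 6 to 8: XOR / OR / AND, optionally with an inverted second operand."""
    if invert:
        src2 = ~src2
    ops = {"xor": src1 ^ src2, "or": src1 | src2, "and": src1 & src2}
    return ops[op] & MASK16


print(hex(sign_extend_byte(0x1280)))
print(shift(0x8001, right=True, arithmetic=True))
print(hex(rotate_left(0x8001, 4)))
print(hex(add(0x0000, 0x0001, subtract=True)))
print(hex(bitwise("and", 0x0F0F, 0x00FF, invert=True)))
```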
I have measured the signal delay of the whole circuit -- well, the worst-case delay, or the delay of the longest path, to be exact.
This worst-case delay is incurred in the following situation: Src1 has the value 0xFFFF, Src2 changes from 0x0000 to 0x0001, and the operation is addition. The measured output is the Zero flag. The path is outlined in orange on the schematic below:
The signal change needs to propagate through the Incrementor, the Negator, the Fast Adder (actually, all 4 four-bit sections of it), the function selector and, finally, the zero detector. As per the model, this is 19 gate delays.
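A quick way to see why this input pair exercises the longest path: reading the scenario as Src2 switching from 0x0000 to 0x0001 while Src1 stays at 0xFFFF, the carry has to travel through every bit position, and all 16 result bits plus the Zero flag change. The Python sketch below uses a plain bit-by-bit adder purely as an illustration; it is not a model of the Fast Adder's internal structure, and `add16` is a made-up helper name.

```python
def add16(a: int, b: int):
    """Bit-by-bit 16-bit add; returns (result, carries), carries[i] = carry out of bit i."""
    result, carry, carries = 0, 0, []
    for i in range(16):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
        carries.append(carry)
    return result, carries


before, carries_before = add16(0xFFFF, 0x0000)
after, carries_after = add16(0xFFFF, 0x0001)

print(hex(before), "zero flag:", before == 0, "carries:", sum(carries_before))
print(hex(after), "zero flag:", after == 0, "carries:", sum(carries_after))
print("result bits that changed:", bin(before ^ after).count("1"))
```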
The propagation time was measured to be 76 to 80 nanoseconds, which is consistent with the model and the spec gate delays of ~5 ns for HC logic chips. These timings give me hope that the full CPU could operate at a clock frequency of up to 5 MHz, if ALU operations turn out to be the longest path.
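The arithmetic behind these numbers, as a short Python sketch. It uses only the figures given above (19 gate delays, 76 to 80 ns measured, ~5 ns spec delay, 5 MHz target). The variable names are arbitrary, and how much of the remaining clock period the rest of the CPU will need is not something these numbers can tell.

```python
GATE_DELAYS = 19            # longest path, from the model
MEASURED_NS = (76, 80)      # measured propagation time, min and max
SPEC_GATE_NS = 5            # approximate spec delay per HC gate
CLOCK_HZ = 5_000_000        # target clock frequency

per_gate_ns = tuple(t / GATE_DELAYS for t in MEASURED_NS)
spec_path_ns = GATE_DELAYS * SPEC_GATE_NS
period_ns = 1e9 / CLOCK_HZ
slack_ns = period_ns - max(MEASURED_NS)

print("measured delay per gate: %.2f to %.2f ns" % per_gate_ns)
print("path delay at spec gate delay: %d ns" % spec_path_ns)
print("clock period at 5 MHz: %.0f ns" % period_ns)
print("period left after the ALU settles: %.0f ns" % slack_ns)
```

So the measured path works out to roughly 4 ns per gate, a little under the ~5 ns spec figure, and the ALU settles well inside a 200 ns clock period.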
The (slightly messy) process of measuring the circuit delay:
View from the top:
All of the ALU parts before assembly: