2 days ago •
This entry simply announces the start of building the Register File component. It should serve as a statement of commitment.
The Register File will have eight 16-bit registers and their selection logic.
Here is how I envision the board arrangement: there will be just 3 big boards (two 8-to-1 multiplexer boards and one "RF backplane" board, which will have a handful of chips on it and 8 slots for small register boards), and 8 small boards, each having two 74HC273 chips and LEDs indicating content and selection status. The individual small register boards will connect to the "RF backplane" through pin header connectors.
I am not sure whether I need a single interface board similar to the one I made for the ALU.
Overview schematic of the Register File to be built:
04/03/2021 at 20:52 •
Today I finished building the Arithmetic-Logic Unit for my CPU!
It took almost 3 months, 7 big perfboards and 124 logic chips (logic gates, multiplexers and a couple of bus drivers).
It can operate at up to 5 MHz, and draws up to 200 milliamps of current.
This ALU has five inputs:
1 -- microinstruction, which has 8 lines controlling the operation of the whole ALU:
- ALU_enable line, which enables output of the ALU operation result onto the data bus,
- 3 lines selecting one of the eight types of ALU functions,
- Carry_in_enable line (controlling several function flavours),
- Arithmetic_shift line (used only when Shift function is selected),
- Subtract/Invert/Reverse line, which inverts the second operand in two-operand functions, and reverses the shift direction,
- Use_const line, which replaces second operand with 8-bit constant value sourced from instruction;
2 -- Carry_in, which has only 1 line and carries the value of carry_in, used in arithmetic operations;
3 -- Src1, 16-bit, the first operand;
4 -- Src2, 16-bit, the second operand;
5 -- Const, 8-bit, the substitute second operand, sourced from the instruction.
The ALU also has 2 outputs:
1 -- Result, 16-bit;
2 -- flags, 4 lines, the side effects, which are to be stored in the status register and used in further ALU operations or in conditional jumps (branch operations).
This ALU is capable of 8 types of functions, most of which have several variants, all operating on 16-bit data:
1: Byte Sign Extend -- simple function which replaces high 8 bits of the Src1 input with copies of bit 7 of this input;
2: Shift -- shifts word given in the Src1 input by 1 bit, has several flavours:
a) shift left (default),
b) shift right,
c) arithmetic shift right (preserves most significant bit),
d) rotate left through carry (msb outputs as carry_out, while carry_in goes into lsb),
e) rotate right through carry (lsb outputs as carry_out, while carry_in goes into msb);
3: Rotate -- rotates the word given in the Src1 input to the left by a set number of bits, has two flavours:
a) rotate using the amount encoded in the instruction,
b) rotate using the amount given by the Src2 input;
4: Invert: simply inverts all bits of the Src1 input;
5: ADD (more exactly, instruction which uses the adder), has several flavours:
a) Add value of Src2 to the value of Src1,
b) Add value of Src2 and Carry_in to the value of Src1,
c) Add Const value to the value of Src1,
d) Add Const value and Carry_in to the value of Src1,
e) Subtract value of Src2 from the value of Src1,
f) Subtract value of Src2 with borrow (Carry_in) from the value of Src1,
g) Subtract Const value from the value of Src1,
h) Subtract Const value with borrow (Carry_in) from the value of Src1;
6: XOR, has 4 flavours:
a) Src1 XOR Src2,
b) Src1 XOR Const,
c) Src1 XOR ~Src2,
d) Src1 XOR ~Const;
7: OR, has 4 flavours:
a) Src1 OR Src2,
b) Src1 OR Const,
c) Src1 OR ~Src2,
d) Src1 OR ~Const;
8: AND, has 4 flavours:
a) Src1 AND Src2,
b) Src1 AND Const,
c) Src1 AND ~Src2,
d) Src1 AND ~Const.
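The five Shift flavours can be sketched in a few lines of Python. This is only a behavioural model under assumptions: the argument names `reverse`, `arithmetic` and `carry_in_enable` stand in for the Subtract/Invert/Reverse, Arithmetic_shift and Carry_in_enable lines, the way they combine is my guess from the descriptions above, and reporting the shifted-out bit as carry_out for the plain shifts is also an assumption.

```python
MASK = 0xFFFF  # 16-bit data path


def shift(src1, reverse=0, arithmetic=0, carry_in_enable=0, carry_in=0):
    """Shift a 16-bit word by 1 bit; return (result, carry_out).

    Assumed mapping of control lines to flavours:
      reverse=0                      -> shift left
      reverse=1                      -> shift right
      reverse=1, arithmetic=1        -> arithmetic shift right
      carry_in_enable=1              -> rotate through carry (either direction)
    """
    src1 &= MASK
    if not reverse:
        carry_out = (src1 >> 15) & 1            # msb leaves the word
        fill = carry_in if carry_in_enable else 0
        result = ((src1 << 1) | fill) & MASK    # fill enters at the lsb
    else:
        carry_out = src1 & 1                    # lsb leaves the word
        if carry_in_enable:
            fill = carry_in                     # rotate right through carry
        elif arithmetic:
            fill = (src1 >> 15) & 1             # keep the sign bit
        else:
            fill = 0
        result = (src1 >> 1) | (fill << 15)     # fill enters at the msb
    return result, carry_out


print(shift(0x8001))                            # shift left
print(shift(0x8001, reverse=1, arithmetic=1))   # arithmetic shift right
```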
I have measured the signal delay of the whole circuit -- the worst-case delay, or the delay of the longest path, to be exact.
This worst-case delay occurs in the following situation: Src1 has the value 0xFFFF, Src2 changes from 0x0000 to 0x0001, and the operation is addition. The measured output is the Zero flag. The path is outlined by the orange line on the scheme below:
The signal change needs to propagate through the Incrementor, the Negator, the Fast Adder (all 4 four-bit sections of it, in fact), the function selector and, finally, the zero detector. According to the model, this is 19 gate delays.
The propagation time was measured to be 76 to 80 nanoseconds, which is consistent with the model and the specified gate delay of ~5 ns for HC logic chips. These timings give me hope that the full CPU could operate at a clock frequency of up to 5 MHz, if ALU operations turn out to be the longest path.
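As a quick sanity check of these numbers, the arithmetic can be written out in Python. The variable names are mine; the only inputs are the figures quoted above (19 gate delays, 76 to 80 ns measured, 5 MHz target).

```python
GATE_DELAYS = 19          # longest path, counted from the model
MEASURED_NS = (76, 80)    # measured propagation time, min and max
CLOCK_HZ = 5_000_000      # hoped-for clock frequency

# Average delay per gate implied by the measurement
per_gate_ns = [t / GATE_DELAYS for t in MEASURED_NS]

# Clock period at 5 MHz and the time left over after the ALU settles
period_ns = 1e9 / CLOCK_HZ
slack_ns = period_ns - max(MEASURED_NS)

print(per_gate_ns, period_ns, slack_ns)
```

The measurement works out to roughly 4.0 to 4.2 ns per gate, a little under the 95 ns that 19 gates at 5 ns each would predict, and it leaves about 120 ns of each 200 ns clock period for the rest of the CPU.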
(A bit messy) process of measuring circuit delay:
View from the top:
All of the ALU parts before assembly:
03/29/2021 at 18:17 •
While completing the function selector board I got the idea to make one more board for the ALU. This last board should provide a single interface through which the ALU is connected to the rest of the CPU. I also hoped that the magnitude determination circuit would not be too complex and would fit on this board. It turned out that size wasn't the main concern -- I think it could fit on that board with room to spare; the main drawback was the long chain of OR gates it needs, which would make signal propagation a big issue. So I opted for the small byte sign extend (BSE) function, which just copies bit 7 to all the higher bits.
Instead of adding a complex ALU function to the interface board, I opted to make it a display for the ALU: it has LED banks to show inputs and outputs, as well as individual LEDs to show flags and to indicate which instruction the ALU is executing at the moment.
The board also has bus drivers on the result output, which make use of the ALU enable signal -- if it is 0, the ALU output is floating.
Below is the board's photo with captions:
The next step is integrating all these boards into a functional ALU and testing it.
03/10/2021 at 08:17 •
I finally came up with a circuit that converts a 16-bit number into its size (i.e. finds how many bits the number occupies without leading zeroes).
It has 3 stages:
- first -- set all bits below the most significant "one" bit to "one" as well (like 0001 0110 => 0001 1111), using OR gates;
- second -- find the edge with XOR gates -- this turns 0001 1111 into 0001 0000;
- third -- encode the result of the second stage into the final magnitude value (0001 0000 => 0000 0101, i.e. there are 5 bits in the number).
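The three stages can be modelled in Python as follows. This is a behavioural sketch, not a gate-level description: the function name `magnitude` is made up for illustration, and the loops stand in for the OR chain and the encoder.

```python
def magnitude(x):
    """Return (smeared, edge, size) for a 16-bit input.

    smeared: every bit below the leading 1 is also set   (stage 1)
    edge:    only the leading 1 survives                 (stage 2)
    size:    number of bits without leading zeroes       (stage 3)
    """
    x &= 0xFFFF

    # Stage 1: ripple an OR from the msb downwards
    smeared = 0
    seen = 0
    for i in range(15, -1, -1):
        seen |= (x >> i) & 1
        smeared |= seen << i

    # Stage 2: XOR each bit with its upper neighbour to find the edge
    edge = smeared ^ (smeared >> 1)

    # Stage 3: encode the one-hot position as a bit count
    size = 0
    for i in range(16):
        if (edge >> i) & 1:
            size = i + 1
    return smeared, edge, size


print(magnitude(0b0001_0110))  # (0b0001_1111, 0b0001_0000, 5)
```

The serial dependency on `seen` in stage 1 is the software counterpart of the OR ripple: each bit has to wait for every bit above it.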
Here is the schematic:
One significant drawback here is the long ripple through the OR gates, which makes the whole operation up to 18 gate delays long -- likely one of the longest paths in the ALU circuit. This is a subject of further investigation right now. Maybe there is a way to make this go faster with fancier wiring.
For the time being, I am trying to evaluate whether this circuit is needed at all. It would be most useful in a division routine, and maybe also in floating point routines, but I do not see it as a frequently used feature. Everything it does can be done with other ALU parts, in several operations.
03/07/2021 at 15:55 •
The function selector board is completed.
This is a 16-bit 8-to-1 multiplexer, using 16 74HC151 chips and quite a lot of wire.
Here it is:
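To make the "one chip per result bit" arrangement concrete, here is a small Python model. The names `mux8_bit` and `function_selector` are hypothetical; `mux8_bit` plays the role of a single 8-input multiplexer chip, and the order of the eight function results is not meant to match the real wiring.

```python
def mux8_bit(inputs, select):
    """One 8-input, 1-bit multiplexer: pick inputs[select]."""
    return inputs[select & 7] & 1


def function_selector(results, select):
    """Pick one of eight 16-bit results, one multiplexer per bit.

    results: list of eight 16-bit words, one per ALU function.
    select:  3-bit function number shared by all 16 multiplexers.
    """
    out = 0
    for bit in range(16):
        # Each multiplexer sees the same bit position of all eight results
        column = [(word >> bit) & 1 for word in results]
        out |= mux8_bit(column, select) << bit
    return out


results = [0x1111 * i for i in range(8)]   # 0x0000, 0x1111, ... 0x7777
print(hex(function_selector(results, 3)))  # 0x3333
```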
Additional ALU refinements
1. Adjusting ROT instruction:
For now, the ROT instruction can only be used with the rotation value hardcoded into the instruction itself. Very recently it occurred to me that with a small change (the addition of one 4-bit 2-to-1 multiplexer) it could also take the rotation value from the Src2 register.
Modifying the ROT instruction this way requires one more 74HC157 chip.
2. Replacing ZERO instruction by something more useful:
There are several ways to put a zero value into a register: subtracting it from itself, XOR with itself, AND with zero, and maybe some others. So having a special ZERO instruction feels unnecessary. Therefore, I decided to incorporate some additional functions into the ALU instead: Byte Sign Extend (BSE) and Magnitude (Mag).
BSE would just copy bit 7 into all higher bits, making byte values signed.
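In software terms BSE amounts to the following (a minimal sketch; the function name is mine, and in hardware it is just a matter of routing bit 7 to the upper eight output lines):

```python
def byte_sign_extend(src1):
    """Copy bit 7 of a 16-bit word into bits 8..15."""
    low = src1 & 0x00FF
    if low & 0x80:
        return low | 0xFF00   # negative byte: upper half becomes all ones
    return low                # non-negative byte: upper half becomes zero


print(hex(byte_sign_extend(0x0080)))  # 0xff80
print(hex(byte_sign_extend(0x127F)))  # 0x7f
```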
Mag should give the size of a number in bits, i.e. if the source register holds the number 0b 0000 0110 1100 1100,
the result would be 0b 0000 0000 0000 1011 (the number is eleven bits long).
This will probably take one more board; together with all the boards already soldered, a fully functional ALU can then be assembled.
ALU scheme with updates:
1: Rotation amount source selector is added to barrel rotator board:
2: Thoughts on scaling back additional functionality:
As I try to come up with a circuit that would output the number's magnitude, it is starting to look like a non-trivial task, and such a circuit most probably won't be implemented.
So this would leave only the BSE function, which is implemented purely by wiring.
03/01/2021 at 08:21 •
Since the last update I've soldered, assembled and tested two new boards -- one containing multiple functions, and the other a barrel rotator, which performs arbitrary bit rotations of 16-bit words.
Miscellaneous components board
Here is the overall scheme of the ALU, with the components on the Misc board in the shaded area:
The components take from 3 (zero detector) to 7 (shifter) chips each, so it was possible to place all of them on a single board.
Here is the board itself:
Barrel Rotator board
The barrel rotator performs word rotations to the left by an amount ranging from 0 to 15 bits, in one clock cycle. This module is useful for operations like swapping the bytes in a word, or for speeding up operations involving floating point numbers.
It is constructed as 4 levels of 16-bit 2-to-1 multiplexers, each level multiplexing bits that are increasingly far apart. Here is the schematic:
and the actual board looks like this:
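The four-level structure maps directly onto a short loop. In this Python sketch (function name is illustrative), each pass of the loop corresponds to one level of 2-to-1 multiplexers: the level either passes the word through unchanged or rotates it left by 1, 2, 4 or 8 bits, depending on one bit of the rotation amount. Which level handles which distance on the real board is an assumption here.

```python
def barrel_rotate_left(word, amount):
    """Rotate a 16-bit word left by 0..15 bits using 4 mux levels."""
    word &= 0xFFFF
    for level in range(4):
        step = 1 << level                 # 1, 2, 4, 8
        if (amount >> level) & 1:         # select line of this level
            word = ((word << step) | (word >> (16 - step))) & 0xFFFF
    return word


print(hex(barrel_rotate_left(0x1234, 8)))  # 0x3412, i.e. a byte swap
```

Rotating by 8 gives the byte swap mentioned above, and any amount from 0 to 15 is the sum of the levels that are switched on.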
To complete the ALU, only one board is left to build -- the 16-bit 8-to-1 multiplexer, which will select one of the outputs from the previously created boards.
After that I'll start working on the Register File and the beginnings of the control module.
02/18/2021 at 13:37 •
This is just an overview of the core number crunching component of the processor:
There are two parts to it - the ALU and the Register File.
The ALU is described in the previous log.
The register file combines eight 16-bit registers, a 3-to-8 decoder and two 8-to-1 16-bit multiplexers. One of the registers can be selected to be written to while, at the same time, the outputs of two others are channelled to the respective ALU inputs.
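A behavioural model of such a register file might look like this in Python. It is a sketch under assumptions: the class and method names are invented, and the `write_enable` argument is my addition to make the write optional; the log only specifies the eight registers, the decoder and the two read multiplexers.

```python
class RegisterFile:
    """Eight 16-bit registers, one write port, two read ports."""

    def __init__(self):
        self.regs = [0] * 8

    def read(self, sel_a, sel_b):
        """Two 8-to-1 multiplexers: pick the Src1 and Src2 values."""
        return self.regs[sel_a & 7], self.regs[sel_b & 7]

    def clock(self, write_enable, dest, value):
        """3-to-8 decoder: exactly one register latches on a clock edge."""
        load_lines = [int(bool(write_enable) and i == (dest & 7))
                      for i in range(8)]
        for i, load in enumerate(load_lines):
            if load:
                self.regs[i] = value & 0xFFFF


rf = RegisterFile()
rf.clock(1, 3, 0x1234)
rf.clock(1, 5, 0xBEEF)
print([hex(v) for v in rf.read(3, 5)])  # ['0x1234', '0xbeef']
```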
The Register File acts as a sort of very small memory with 8 addressable words. Together with the ALU it forms what I call the Main Data Path -- the computing core of the processor, which is quite capable by itself. By feeding it the right sequence of commands it is possible to do multiplications and divisions, and probably some other functions not provided by the ALU directly.
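As an illustration of what "the right sequence of commands" could mean, here is the classic shift-and-add multiplication written with only additions and 1-bit shifts on 16-bit values. This is a generic textbook routine in Python, not the command sequence actually used on this CPU, and it keeps only the low 16 bits of the product.

```python
def multiply(a, b):
    """Multiply two 16-bit values using only add and 1-bit shifts."""
    a &= 0xFFFF
    b &= 0xFFFF
    product = 0
    for _ in range(16):
        if b & 1:                          # test the lowest multiplier bit
            product = (product + a) & 0xFFFF
        a = (a << 1) & 0xFFFF              # shift left by 1
        b >>= 1                            # shift right by 1
    return product


print(multiply(7, 6))  # 42
```

Each loop iteration uses only operations the ALU already provides (an add, two shifts and a test of one bit), with the three working values held in registers.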
Here is a screenshot of its current implementation (together with the ALU instruction decoder):
02/15/2021 at 09:06 •
Following are descriptions of design changes:
1. Some reshuffling of the ALU schematic -- mainly for clarity (compare to the schematic in one of the early logs):
Most of the glue logic (individual gates controlling such things as carry flipping on subtraction) was moved into the functional blocks. The blocks themselves were redrawn with 74-family chip outlines to serve as a reference while building the hardware -- as there were some unused pins/gates on those chips, they were repurposed for the glue logic functions.
There are 2 main 16-bit input busses (Src1 and Src2), one 8-bit input (Const), and one 16-bit output bus.
For some operations the 8-bit constant is switched in instead of "Src2", using the Incrementor block.
Next, the signal from "Src2" | "Const" goes through the Negator block, which inverts it along with "Carry_in" to facilitate subtraction.
The signals then go in parallel through 4 blocks which do different operations:
- adder takes "Src1" and (+/-)"Src2"|"Const", and outputs 16-bit sum;
- logic operations unit uses the same values as adder, and outputs results of its own operations;
- shifter works on "Src1" input and does simple and arithmetic left/right shifts and rotations through carry;
- barrel rotator also works on "Src1" input, and does rotation to the left (0 to 15 bits).
2. Slight change to command encoding - mending a couple of irregularities:
Previously, there were two ways of encoding ALU functions: one where one of the two source registers was also the destination, and another where the destination could be a third specified register. Now the first form is only used with a constant value; otherwise three arguments are specified in the instruction (two sources and a destination for two-operand operations).
There was also a compare command that interfered with three-argument ops, so that registers 0 and 1 couldn't be used as a destination. This interference is now resolved with a different encoding of the compare instruction.
3. More top-down approach for overall CPU design:
When I started designing the CPU, I had no clear idea of what the addressing scheme would be or how all its workings would be organised. The clearest idea was that this should be a 16-bit machine (16-bit data bus and 16-bit instructions). That was dictated primarily by my assessment of the likely complexity: an 8-bit machine would have too complex an addressing scheme and instruction encoding -- it would most likely have to be microcoded. On the other hand, 32 bits would be too much in terms of the sheer number of components needed at the level I wanted to build it (simplest logic gates). So 16 bits seemed the "golden mean".
I wanted the machine to have a register file (a number of identical registers addressed in the instruction) and an ALU capable of adding, subtracting, logical operations and shifts, and also the ability to increment/decrement a value by a set number, hence a set of commands with 8-bit constant values. Overall this constrained me to 8 registers in the register file. This is also convenient, as addressing 8 values needs only one 3-to-8 decoder, which is a single IC.
So the ALU and Register File were the first parts for which I had a fairly good idea of what I wanted them to be, unlike the other parts. I started by building the ALU and then the Register File in the simulator, and then added all the other parts in the order I found them necessary at the time. This led to quite a complicated mess, which incrementally grew in ability and complexity... and in how difficult it was to understand how it all works.
That is why I am restarting almost from scratch (well, many parts are already done, they just need some tidying up). With a more holistic understanding of how I want this CPU to work, I will recreate the simulation in a clearer and more understandable way.
Following is the high-level scheme of the CPU parts:
The scheme summarises the overall CPU design in just 4 main blocks. At a high level it is similar to what I've done to date, just laid out more clearly. The differences are in the particulars, mostly in the addressing block -- a 24-bit adder is now present, and a 4th 24-bit register is added (Frame Pointer). The presence of the adder removes the need to build PC and SP from presettable counters, making the design more regular.
- Main data path:
--- eight 16-bit general purpose registers in register file
--- main ALU (16-bit)
--- four 24-bit memory pointer registers ( PC, BP, SP, FP )
--- its own dedicated secondary/address ALU (24-bit)
--- Instruction Register
--- instruction decode circuitry
- Memory ( + memory-mapped input/output )
- Miscellaneous small bits:
--- four 8-bit special registers, (SR, Hi8, OP, IP), placed around the CPU.
--- Boot loading and DMA circuits.
--- Interrupt handling circuitry
--- Some other things I do not know I need yet.
02/13/2021 at 20:56 •
Five small boards are completed:
1: Source1 + Source2 16-bit input adaptor for double-board fast adder
2: Output adaptor (16-bit) for double-board fast adder
3,4: Two 16-bit input boards
5: 16-bit output board (leds).
The boards were used to test the 16-bit Adder unit and the 16-bit Logic Operations unit. While testing the latter, a small bug was found (a short in one place) and fixed.
The output adaptor ties the two 8-bit adder boards together by routing the Carry_out signal of one board to the Carry_in of the other, so that they act as one 16-bit adder. The output and input adaptors also provide some mechanical support, making the construction stiffer.
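The carry routing done by the output adaptor can be pictured with a short Python model. The function names are mine; `add8` stands for one adder board and `add16` for the pair joined through the adaptor.

```python
def add8(a, b, carry_in):
    """One 8-bit adder board: return (sum, carry_out)."""
    total = (a & 0xFF) + (b & 0xFF) + (carry_in & 1)
    return total & 0xFF, total >> 8


def add16(a, b, carry_in=0):
    """Two 8-bit boards chained: low board's carry_out feeds the high board."""
    low, carry_mid = add8(a, b, carry_in)
    high, carry_out = add8(a >> 8, b >> 8, carry_mid)
    return (high << 8) | low, carry_out


print(add16(0x00FF, 0x0001))  # (256, 0): the carry crosses between the boards
```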
The input boards are just switch banks connected to an IDC connector, while the output board has 16 red LEDs connected to the same kind of connector.
The next step will be a board combining the functions of the Incrementor and Negator units, plus some glue logic.
The Incrementor is just a 16-bit 2:1 multiplexer which switches between the 16-bit value from Source 2 and the 8-bit constant coming from the instruction word.
As for the Negator, this circuit will consist of five 74HC86 (4 x XOR) ICs, four of which conditionally invert the 16-bit value, while the fifth handles the conditional inversion of the C_in and C_out signals.
This same board could probably accommodate the shifter as well -- I am not sure, and will investigate.
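How the Incrementor and Negator work together with the adder can be sketched in Python. Several details here are assumptions rather than facts from the build: the 8-bit constant is taken to be zero-extended, Carry_in is assumed to be forced to 0 unless a `carry_in_enable` line is set, and all function and argument names are invented for illustration.

```python
MASK = 0xFFFF  # 16-bit data path


def incrementor(src2, const, use_const):
    """2:1 multiplexer: pass Source 2 or the 8-bit constant."""
    return (const & 0xFF) if use_const else (src2 & MASK)


def negator(operand, carry, subtract):
    """XOR gates: invert operand and carry together when subtracting."""
    flip = 1 if subtract else 0
    return operand ^ (MASK * flip), carry ^ flip


def add_sub(src1, src2, const=0, carry_in=0,
            use_const=0, subtract=0, carry_in_enable=0):
    """Return (result, carry_out, zero) for the ADD family of operations."""
    operand = incrementor(src2, const, use_const)
    carry = carry_in if carry_in_enable else 0      # assumed gating
    operand, carry = negator(operand, carry, subtract)
    total = (src1 & MASK) + operand + carry
    result = total & MASK
    # C_out is inverted on subtraction, so it then reads as a borrow
    carry_out = (total >> 16) ^ (1 if subtract else 0)
    return result, carry_out, int(result == 0)


print(add_sub(5, 3, subtract=1))   # (2, 0, 0): 5 - 3, no borrow
print(add_sub(3, 5, subtract=1))   # (65534, 1, 0): 3 - 5 wraps and borrows
```

Inverting the operand and the carry together is what turns the adder into a subtractor: Src1 + ~Src2 + 1 equals Src1 - Src2 in two's complement, and with an incoming borrow the inverted carry drops that extra 1.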
Input adaptor tied to adder boards:
Input and output boards:
Here are all these boards in action, while testing the adder: