Nits Processor V1
The goal is to build an 8-bit TTL based CPU and learn from the experience.
It is based on the SAP-1 and on the designs of Ben Eater, James Bates and James Sharman, with many changes and tweaks.
Nits Processor V2
V2 is an improved version of the Nits Processor with expanded ALU, 16-bit address space, more registers, interrupt manager and rich instruction set.
This project was triggered by the incredible set of videos from Ben Eater and James Bates. Many thanks to both of them.
I do not pretend to be an expert in computer hardware. As a professional in the software side of things, I wanted to learn more and build myself a working TTL based CPU to improve and try out ideas.
Even though this build is based on the designs of both Ben Eater and James Bates, I wanted to really understand every bit of the design and therefore I have made many changes along the way. I do not pretend they are better; they are mainly driven by the wish to test things (different choice of chips, different approach, try and learn).
One thing that was really improved is the instruction set. The choice of having 8-bit instructions and 256 bytes of memory really made it possible.
Once the Nits Processor was up and running, I started working on an improved V2.
Our current CPU has many limitations, including on the register level. This post will look at how to improve the registers that handle memory or output addresses.
First, let's look at the current register capabilities:
there are basically only two memory address registers: the Program Counter (PC) and the Memory Address Register (MAR)
these registers are 8-bit registers, therefore they can only address 256 bytes of memory or IO
Only one provides the increment capability (the Program Counter); however, it cannot directly address the memory and has to go through the MAR
None can decrement
There is no stack pointer
There is no index register
So let's try to design a single General Purpose Address Register (GPAR) that would provide:
16 bits in order to address 64k words of memory space
increment and decrement capability in order to serve also as PC (Program Counter) and SP (Stack Pointer) without the need of the ALU
Read and Write capability from the databus which is only 8-bit, in order to access both MSB (Most Significant Byte, the byte of higher value of the 16-bits) and LSB (Least Significant Byte, the byte of lower value of the 16-bit)
Write capability to the 16-bit address bus
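To make the requirements above more concrete, here is a small behavioural sketch in Python of one such register. It is only a model to reason with, not the hardware design: the class name AddressRegister and its method names are made up for this illustration, and the wrap-around at 0xFFFF / 0x0000 is an assumption about how the counters would behave.

```python
class AddressRegister:
    """Behavioural model of one 16-bit general purpose address register.

    Names and wrap-around behaviour are assumptions made for illustration.
    """

    def __init__(self):
        self.value = 0

    def clear(self):
        # Reset the whole 16-bit value to zero
        self.value = 0

    def inc(self):
        # Increment without the ALU, wrapping on 16 bits
        self.value = (self.value + 1) & 0xFFFF

    def dec(self):
        # Decrement without the ALU, wrapping on 16 bits
        self.value = (self.value - 1) & 0xFFFF

    def load_msb(self, byte):
        # Load the upper byte from the 8-bit data bus, keep the lower byte
        self.value = ((byte & 0xFF) << 8) | (self.value & 0x00FF)

    def load_lsb(self, byte):
        # Load the lower byte from the 8-bit data bus, keep the upper byte
        self.value = (self.value & 0xFF00) | (byte & 0xFF)

    def msb(self):
        # Byte published on the data bus for the MSB
        return self.value >> 8

    def lsb(self):
        # Byte published on the data bus for the LSB
        return self.value & 0xFF


pc = AddressRegister()
pc.load_msb(0x12)
pc.load_lsb(0xFF)
pc.inc()  # the carry from the LSB propagates into the MSB
```

After these calls pc.value is 0x1300, which shows why the increment has to work on the full 16 bits even though the data bus only ever sees one byte at a time.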
Now, what type of instructions should these registers be able to handle:
LD PC-MSB, D2 : meaning load the MSB of the PC address register with the content of register D2
LD D2, IX-LSB : meaning load D2 with the content of the LSB of the index pointer IX
LD IX-LSB, IY-MSB : meaning load the LSB part of IX with the content of the MSB part of IY
This shows that at any given time, it must be possible to output any of the content of any part of the register ***AND*** input any other part of any other register.
Possibly even within the same register:
LD IX-MSB, IX-LSB: meaning load the MSB part of IX with the LSB value of IX
We therefore need to separate the output action functions from the input action functions.
For output and reset:
Publish MSB to data bus: -msb-out, asynchronous
Publish LSB to data bus : -lsb-out, asynchronous
Publish value to Address bus : -add-out, asynchronous
Clear value : -clear, asynchronous
For input and other:
load MSB from data bus : -msb-in, on clock rising edge
load LSB from data bus : -lsb-in, on clock rising edge
Increment value : -inc, on clock rising edge
decrement value : -dec, on clock rising edge
In the end we need:
3 bits to select the output register (value 0b000 serves as not selected)
2 bits to select the output/reset function (MSB, LSB, address bus or reset)
3 bits to select the input register (value 0b000 serves as not selected)
2 bits to select the input/other function (input MSB, input LSB, inc, dec)
With a total of 10 bits, we can perform any type of function on the General Purpose Address Registers.
Here is the naming convention for the register select bits:
None: code 0b000
Program Counter : PC, code 0b001
Stack Pointer : SP, code 0b010
Index Pointer 1 : IX, code 0b011
Index Pointer 2 : IY, code 0b100
Here is the convention for the output action (all are asynchronous):
Publish MSB to data bus: -msb-out, code 0b00
Publish LSB to data bus: -lsb-out, code 0b01
Publish value to Address bus : -add-out, code 0b10
Clear value : -clear, code 0b11
Here is the convention for the input/other functions (all are on the clock rising edge):
load MSB from data bus: -msb-in, code 0b00
load LSB from data bus: -lsb-in, code 0b01
Increment value : -inc, code 0b10
decrement value : -dec, code 0b11
Let's now review what the 10 bits would be for the instructions listed above:
LD PC-MSB, D2
select output register NONE: 0b000 (D2 is not considered, as we are focusing on the GPAR only)
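The encoding of the example instructions can be sketched in Python as follows. The register and function codes are the ones listed above; the order in which the four fields are packed into the 10 bits (output register, output function, input register, input function, from most to least significant) and the use of 0b00 as a "don't care" function when no register is selected are assumptions made for this sketch, as is the function name gpar_control_word.

```python
# Codes taken from the conventions listed in this post
REGISTERS = {"NONE": 0b000, "PC": 0b001, "SP": 0b010, "IX": 0b011, "IY": 0b100}
OUT_FUNCS = {"msb-out": 0b00, "lsb-out": 0b01, "add-out": 0b10, "clear": 0b11}
IN_FUNCS = {"msb-in": 0b00, "lsb-in": 0b01, "inc": 0b10, "dec": 0b11}


def gpar_control_word(out_reg="NONE", out_func="msb-out",
                      in_reg="NONE", in_func="msb-in"):
    """Pack the four fields into 10 bits.

    Assumed layout (not confirmed by the design):
    [out_reg:3][out_func:2][in_reg:3][in_func:2]
    """
    return ((REGISTERS[out_reg] << 7)
            | (OUT_FUNCS[out_func] << 5)
            | (REGISTERS[in_reg] << 2)
            | IN_FUNCS[in_func])


EXAMPLES = {
    # D2 is not a GPAR, so only the input side is selected
    "LD PC-MSB, D2": gpar_control_word(in_reg="PC", in_func="msb-in"),
    # D2 is not a GPAR, so only the output side is selected
    "LD D2, IX-LSB": gpar_control_word(out_reg="IX", out_func="lsb-out"),
    "LD IX-LSB, IY-MSB": gpar_control_word("IY", "msb-out", "IX", "lsb-in"),
    "LD IX-MSB, IX-LSB": gpar_control_word("IX", "lsb-out", "IX", "msb-in"),
}

for name, word in EXAMPLES.items():
    print(f"{name:20s} {word:010b}")
```

With this layout, LD IX-LSB, IY-MSB gives 1000001101: IY selected for output (100), MSB out (00), IX selected for input (011), LSB in (01).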
The first version of the Nits Processor is now finished. It is Turing complete with a very basic set of instructions: it is possible to upload a program and run it, and it can display a result on the 7-segment display.
Here are a few photos.
The instruction decoder (3 EEPROMs for 17 signals and a set of gates to decode the flags):
The memory (256 bytes stored in Non Volatile Static RAM, with the Address Register, the memory value display).
The two flat cables come from the memory loader module (Arduino based). One is the Address bus and one is the data bus. They are used in PROG mode to upload the program to the memory.
It is now time to think about the next steps. What are the current limitations and how can they be improved?
Improvements can be of 3 sorts:
Improve the instruction set
Improve the architecture
Improve the build
Improve the instruction set
The instruction set is very limited and really needs to be expanded to provide usable capabilities. For instance:
Add shift functions (Shift, Shift Circular, Shift with carry)
Add compare functions (zero, equal, greater than)
Add Push and Pop capability (requires a dedicated stack pointer register)
Add Call and Return
Add bit management (test flags, store flags, bit operations)
Improve the architecture
With only 2 registers, no stack pointer and only 256 bytes of memory for both data and program, the current architecture can really be improved:
Expand the address bus to 16 bits (hence 64 Kbytes of memory). However, this requires many changes because the address bus is then double the size of the data bus and of the ALU, which creates a challenge when computing addresses
Expand the number of registers, at least to 4 General Purpose Registers
Separate memory from Input-Output. This will provide double the addressing capability
Add a way to interact with the system, for instance with a proper serial interface
add a ROM with basic functions including initial setup and serial management
add a stack pointer and index registers for points in memory
add interrupt management (required for the serial interface)
expand the ALU capabilities
Improve the build
In its current form (built on breadboards), the CPU works well at 1 MHz; however, with a 4 MHz oscillator, it breaks. This is to be expected considering the capacitance of the breadboard and how the cables are set up.
It would therefore be interesting to improve on the design with:
a PCB backplane to handle all the busses and the clocks with proper connectors (I'm investigating the 96-pin DIN 41612 connector)
PCB modules for very stable elements such as registers, clock
Keep the breadboards for test modules and modules that keep being improved (ALU, IO)
A simple software to compile the assembly code into binary
An arduino based RAM loader
Indeed since the beginning of the project I had to manually write the binary code and upload it using dip switches and this is very error prone and it takes forever.
My assembly compiler is very basic, written in PHP (just because it's the language I'm most comfortable with). The input is an assembly file such as (custom format):
; Brute force find three consecutive integers whose sum is equal to 204
var x1
const expected 204
LD A, 0
LD [x1], A
; compute the 3 values and total
LD A, [x1]
LD B, 1
LD [x1], A
comments (anything that follows the semicolon)
var definitions (only unsigned bytes)
And it produces a binary file and a human readable processed file very useful to debug both the software and the hardware:
VAR x1 at address 11111111
VAR x2 at address 11111110
VAR x3 at address 11111101
VAR result at address 11111100
CONST expected = 11001100
00000000 LD A, 0x0 0010011000000000
00000010 LD [x1], A 0100100011111111
00000100 LD A, [x1] 0010010111111111
00000110 OUT A 00011000
00000111 LD B, 0x1 0010111000000001
Each line of compiled code contains
the start address of the code
the assembly code
the binary code once compiled (one or two bytes depending on the operand)
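The three columns described above can be reproduced with a few lines of Python. This is only a sketch of the listing format, not the PHP compiler itself: the function name format_listing is made up, and the opcode bytes in PROGRAM are simply the values as I read them from the sample listing in this post, so treat them as assumptions.

```python
def format_listing(program):
    """Build listing lines: start address, assembly text, binary code.

    program is a list of (assembly_text, [byte, ...]) tuples.
    """
    lines = []
    address = 0
    for text, code in program:
        binary = "".join(format(byte, "08b") for byte in code)
        lines.append(f"{address:08b} {text} {binary}")
        # One or two bytes depending on the operand
        address += len(code)
    return lines


# Opcode bytes read from the sample listing (assumed, not a full opcode table)
PROGRAM = [
    ("LD A, 0x0", [0b00100110, 0x00]),
    ("LD [x1], A", [0b01001000, 0xFF]),
    ("LD A, [x1]", [0b00100101, 0xFF]),
    ("OUT A", [0b00011000]),
    ("LD B, 0x1", [0b00101110, 0x01]),
]

LISTING = format_listing(PROGRAM)
for line in LISTING:
    print(line)
```

The start address of each line is just the running total of the bytes emitted so far, which is why OUT A (one byte) is followed by an instruction at an odd address.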
Once the binary file is obtained, the code has to be loaded into the memory. For this I used an Arduino Nano connected to a 74HC595 in a way very close to Ben Eater's EEPROM programmer.
The Arduino takes over the Address and Data bus of the memory by activating the PROG mode; this basically disconnects the memory from the bus through 74HCT245 chips. Once the memory is isolated, the Arduino goes through all the needed addresses using the 74HC595 (a shift register) and uploads the data.
a dedicated signal is used to write the value on the bus to memory
Lines A0 to A2 of the arduino are set to digital
Yes, it would be possible to connect all 8 lines of Data and 8 lines of Address directly to the Arduino, but I wanted to try out the shift register for the time when I will have more lines.
Overall everything works and it is a good way to finish 2019. I hope 2020 will bring new features such as:
PCB backplane with a 5A power supply
Stack Pointer register with associated Push and Pop
Now that the first version of my CPU is running, it is time to fix some issues. One of them is manual switches and debouncing.
I will not write one more article about debouncing, as all this is very well detailed in the great article by Elliot Williams, "Debounce your noisy buttons".
In this first version of my CPU I ended up with the following switches
Bus Publish : This pushbutton publishes the value of a dip switch to the bus
Master Reset : This pushbutton resets the computer
Memory manual Write : This pushbutton writes the data set on the data dip switch at the memory address set on the address dip switch [PROG mode only]
PROG/BUS selector : this is a two way selector used to program the memory (PROG mode) or to use the memory through the regular bus and Memory Address Register (BUS mode)
MANUAL / AUTO clock selector : this is a two way selector used to select between the automatic clock (slow or fast) or the manual pulse pushbutton
SLOW / FAST clock selector : this is a two way selector to switch between the 555-based slow clock (between 0.5 Hz and 300 Hz) and the fast crystal oscillator based clock (1 MHz) [valid for Auto clock mode only]
MANUAL / AUTO uCode selector : this is a two way selector disabling the microcode decoder in order to use manual action signals (debugging purposes only)
3 are pushbuttons and 4 are two way selectors (slide buttons).
In his article, Elliot explains how to debounce using an RC (Resistor/Capacitor) circuit and a Schmitt trigger inverter. The inverter can be found in the 74HCT14 IC.
Here is an example of a complete debouncer. Note that the signal is inverted : when pressing the button, the signal ACTION_MANUAL_BUS goes low.
Here a short description of how it works :
When the switch is open, the capacitor is charged through the 10k + 10k resistors and reaches VCC. The output signal is then 0V (inverted input)
When the switch is pressed, the capacitor is discharged through the 10k resistor; it will therefore take about 1 ms to reach 1/3 VCC and trigger the change of state of the inverter
When the switch is released, the capacitor is charged again; it will reach 2/3 VCC in about 2 ms and trigger the change of state of the inverter.
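The two delays above can be checked with the usual RC charge/discharge formula. The sketch below is a back-of-the-envelope calculation in Python, under assumptions that are not stated in this post: the capacitor value (100 nF, chosen because it reproduces the quoted 1 ms and 2 ms), a capacitor that is fully discharged before the button is released, and thresholds of exactly 1/3 and 2/3 of VCC.

```python
import math


def rc_time(resistance_ohm, capacitance_farad, v_start, v_final, v_target):
    """Time for an RC node starting at v_start and heading towards v_final
    to cross v_target."""
    tau = resistance_ohm * capacitance_farad
    return -tau * math.log((v_target - v_final) / (v_start - v_final))


VCC = 5.0
C = 100e-9  # assumed capacitor value, not given in the post

# Button pressed: discharge from VCC towards 0 V through 10k, threshold 1/3 VCC
PRESS_DELAY = rc_time(10e3, C, VCC, 0.0, VCC / 3)

# Button released: charge from 0 V towards VCC through 10k + 10k, threshold 2/3 VCC
RELEASE_DELAY = rc_time(20e3, C, 0.0, VCC, 2 * VCC / 3)

print(f"press:   {PRESS_DELAY * 1e3:.2f} ms")
print(f"release: {RELEASE_DELAY * 1e3:.2f} ms")
```

With these assumed values the press delay comes out at roughly 1.1 ms and the release delay at roughly 2.2 ms, in line with the figures quoted above.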
For the slide switches (type is break before make), we need to prevent any oscillation between the two states and prevent an unknown state. The best solution here is a simple SR latch.
An SR latch (in this case an SR NOR latch, as it is built using 2 NOR gates) can only be in 1 of 2 states.
When moving the switch from one position to the other, what may happen is the following:
bounce off the first state
stay undefined (in between the two states)
bounce on the second state
In such a situation, the SR latch will prevent any oscillation following the reasoning:
When the switch bounces off, the latch will stay in the same status (set or reset)
when the switch is undefined, the latch will stay at the same status
the first time the switch touches the other position the latch will toggle, but even if the switch bounces it will stay in that second position.
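The reasoning above can be replayed with a tiny behavioural model in Python. This is a sketch, not a gate-level simulation: the function names and the sample sequence are invented for illustration, and it assumes the two switch contacts drive S and R with pull-downs, so that both inputs are 0 while the wiper is in between.

```python
def sr_latch(q, s, r):
    """Next state of an SR latch with active-high S and R inputs."""
    if s and r:
        raise ValueError("S and R must not be active at the same time")
    if s:
        return 1
    if r:
        return 0
    return q  # both inputs inactive: hold the previous state


def run(samples, q=1):
    """Apply a sequence of (S, R) samples and record the output."""
    trace = []
    for s, r in samples:
        q = sr_latch(q, s, r)
        trace.append(q)
    return trace


SAMPLES = [
    (1, 0), (0, 0), (1, 0), (0, 0),          # bounce off the first contact
    (0, 0), (0, 0),                          # in between the two contacts
    (0, 1), (0, 0), (0, 1), (0, 0), (0, 1),  # bounce on the second contact
]
TRACE = run(SAMPLES)
TRANSITIONS = sum(1 for a, b in zip(TRACE, TRACE[1:]) if a != b)
print(TRACE, TRANSITIONS)
```

Despite nine input changes, the output changes exactly once, at the first contact with the second position.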
So in the end, I have built a dedicated breadboard with all the switches, RC circuits and ICs to debounce everything and have perfectly clean manual signals.
The decoder section of the architecture diagram shows a 13-bit decoder register:
What is the rationale behind the need for this register?
In order to properly set the action signals, 13 bits are needed to address the EEPROM used to implement the micro-code:
3 bits for the steps (count from 0 to 7 steps maximum, however most macro-instructions will only need 3 or 4 steps)
8 bits for the instructions (not all 256 capabilities will be used)
2 bits for a combination based on the flags register
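As an illustration, the 13-bit EEPROM address can be assembled from the three fields like this. The field widths come from the list above; the position of each field inside the address (flags on top, step at the bottom) is an assumption made for this Python sketch, as is the function name microcode_address.

```python
def microcode_address(step, instruction, flags):
    """Combine step (3 bits), instruction (8 bits) and flags (2 bits)
    into a 13-bit address.

    Assumed layout (not confirmed by the design):
    [flags:2][instruction:8][step:3]
    """
    if not 0 <= step < 8:
        raise ValueError("step must fit in 3 bits")
    if not 0 <= instruction < 256:
        raise ValueError("instruction must fit in 8 bits")
    if not 0 <= flags < 4:
        raise ValueError("flags combination must fit in 2 bits")
    return (flags << 11) | (instruction << 3) | step


# Hypothetical instruction code 0x26, step 2, flag combination 1
ADDRESS = microcode_address(step=2, instruction=0x26, flags=1)
print(f"{ADDRESS:013b}")
```

Whatever the layout, the largest address is 0x1FFF, which matches the 8K locations (13 address lines) of an AT28C64.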
However, when looking at the specs of the EEPROM (AT28C64) we can see that in the worst case, between the stabilization of the address inputs and the availability of proper stable output it can take up to 250ns. This means that if bits from the address are not stable the output might not be valid.
This is why a register that takes a stable snapshot of the 13 bits helps provide stable action signals.
This is actually a perfect use case for the 74HCT273 register we already talked about. What we need is the capability to snapshot the status of the 3 elements (instruction register, steps, flags) at a given time and keep it stable until the next clock cycle:
Now the question is : when are values changed (step, instruction register, flags)? When in the overall instruction cycle are they stable enough that I can snapshot them ?
Well if we look at the clock cycle we can identify 3 moments:
- main clock (rising edge) : when the actions take place (add, load, etc)
- step clock : when the step is incremented
- uCode clock : when to hold the value of the 3 elements that constitute the micro-instruction register
So we need 3 clock rising edges per step cycle:
If the main clock signal is at 1 MHz, one clock period lasts 1 µs (1000 ns), i.e. 500 ns per half period. So that is:
500ns between T0 and T1
500ns between T1 and T2
1000ns between T2 and the next T0 which is more than enough to stabilize the EEPROM values
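The margin can be checked with a short calculation. The Python sketch below simply restates the three intervals listed above in terms of the clock half period and compares the last one with the 250 ns worst-case EEPROM figure mentioned earlier; the function name and the dictionary keys are made up for this illustration.

```python
EEPROM_ACCESS_NS = 250  # worst-case figure quoted for the AT28C64


def step_intervals_ns(clock_hz):
    """Intervals between the three rising edges of one step cycle,
    following the spacing listed in this post."""
    half_period = 1e9 / (2 * clock_hz)
    return {
        "T0->T1": half_period,
        "T1->T2": half_period,
        "T2->T0": 2 * half_period,
    }


INTERVALS = step_intervals_ns(1_000_000)
MARGIN_NS = INTERVALS["T2->T0"] - EEPROM_ACCESS_NS
print(INTERVALS, MARGIN_NS)
```

At 1 MHz this leaves 750 ns of margin between the snapshot of the 13 bits and the next main clock edge.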
After watching many YouTube videos on breadboard TTL computers, I noticed that there is often a misunderstanding regarding the types of TTL register chips available and their best use cases.
The most common is the 74HCT173. This one brings everything you need from a register IC:
4 bit register
Common clock signal (rising edge)
Input enable signal (/E1 and /E2)
Output enable signal (/OE1, /OE2)
3 state output
Asynchronous master reset
The only drawbacks are that this IC is only 4 bits and the pinout is really weird.
However, in many breadboard computers, designers use the 74HCT273 without really understanding the differences. The 273 has the following features:
8 bit register
Common clock signal (rising edge)
Asynchronous master reset
This IC has no Input Enable signal and no Output enable signal. What it means is that at EACH clock signal, the input is latched and that the output is always on. In other words : The output mimics the exact value of the input as it was on the previous rising edge of the clock.
No big deal regarding the output as we can use a 74HCT245 to buffer the bus.
The issue is with the input. Do we want to latch the input value at each rising edge of the clock ? Most of the time NO ! We want an input enable signal. Some would say, it's easy: just use an AND gate between the clock and the input enable and it will work. This is not true and should not be done. Here's why:
Example 1, the Input Enable is activated a bit before the clock rises. This is basically what we would expect. The input Enable signal activates the clock, the register latches the value on the bus at the time of the rising edge. The AND between the clock and the Input Signal looks like the clock when activated:
Example 2, the Input Enable signal is not really aligned with the clock, it misses the first rising edge and stays on for the second (note that the duration of the Input Enable is the same as above):
In that situation, the AND signal provides 2 rising edges. Therefore the register will latch (if fast enough) the information twice, at points in time that are not expected. It will miss the first rising edge of the clock, latch on the rising edge of the Input Enable signal (unexpected) and then latch again on the second clock rising edge.
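The two examples can be reproduced with a small discrete-time simulation in Python. It is only a sketch: time is counted in arbitrary samples, the clock period (10 samples) and the enable positions are invented to mimic the two situations, and propagation delays are ignored.

```python
def square_clock(total, period):
    """Clock samples: high during the first half of each period."""
    return [1 if (t % period) < period // 2 else 0 for t in range(total)]


def pulse(total, start, length):
    """Enable signal: high for `length` samples starting at `start`."""
    return [1 if start <= t < start + length else 0 for t in range(total)]


def rising_edges(signal):
    """Sample indexes where the signal goes from 0 to 1."""
    return [t for t in range(1, len(signal))
            if signal[t - 1] == 0 and signal[t] == 1]


def gated_clock_edges(enable_start, enable_length, total=40, period=10):
    """Rising edges seen by a register clocked by (clock AND enable)."""
    clk = square_clock(total, period)
    enable = pulse(total, enable_start, enable_length)
    gated = [c & e for c, e in zip(clk, enable)]
    return rising_edges(gated)


# The clock itself rises at samples 10, 20 and 30
EXAMPLE_1 = gated_clock_edges(enable_start=8, enable_length=10)
EXAMPLE_2 = gated_clock_edges(enable_start=12, enable_length=10)
print(EXAMPLE_1, EXAMPLE_2)
```

In the first case the register sees a single edge at sample 10, aligned with the clock. In the second case, with an enable pulse of the same length, it sees two edges: one at sample 12, in the middle of a clock high phase, and one at sample 20.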
It is not recommended to apply gates on the clock signal to enable/disable clocks for the microinstructions. The 74HCT273 is not recommended in our use cases.
However, there is a nice IC that matches better the needs for typical registers: the 74HCT377. It provides the following features:
8 bit register
Common clock signal (rising edge)
Input Enable signal (/CLKEN)
It still lacks some nice features of the 173 (master reset, output enable) but it is quite convenient to get 8 bits with Input Enable.
Here is a good example of the usage of the 377 for our Instruction Register (it doesn't need to be cleared and the output is always on):
In a TTL type CPU, there is a need for two rising edges of the clock clearly separated:
One rising edge to trigger the decode of the microinstruction and set the action signals
One rising edge to trigger the action itself (update the register, etc)
Most TTL CPUs use a single clock with an inverted signal to produce two rising edges per clock cycle (one from the clock signal, one from the inverted clock signal).
However, this might be playing dangerously as there are chances that one rising edge happens while the other clock signal is still active, creating situations that are very difficult to trace.
So one alternative would be to clearly separate the two signals with absolutely no overlapping time.
Here is an example (all screenshots with a 1 MHz input clock signal from a quartz oscillator):
In such a situation, a rising edge can never happen while the other signal is active.
So how can you build such a signal? Let's look at this simple circuit and analyze what is going on:
Let's input a clock on the JK flip-flop (pin 12). The JK is set with both J and K active, meaning that it will toggle its outputs (Q at pin 3, inverse Q at pin 2) at each clock signal. Note that the 74LS107 triggers on the falling edge.
Below is the clock signal (pin 12 of the 74LS107, top signal) and the Q output (pin 3 of the 74LS107, bottom signal).
At each falling edge, the J-K flip flop toggles. It is therefore a frequency divider by 2. The output is half the frequency of the clock with a 50% duty cycle (ratio between high and low duration).
What happens then if we AND the clock and the Q output ?
Below is the graph of the clock signal (pin 12 of the 74LS107, top signal) and the output of the AND gate (pin 3 of the 74LS08, bottom signal):
We get the high part of the clock signal but only once out of two.
If at the same time, we AND the clock and the inverse Q we get the two reciprocating signals.
Below is the output of both AND gates (pin 3 of the 74LS08, top signal and pin 6 of the 74LS08, bottom signal) :
With these 2 clock signals, there is no chance of getting a rising edge while the other signal is high. The drawback however is that the clock is now half the frequency.
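The circuit can be checked with a short simulation. The Python sketch below models the JK flip-flop in toggle mode on the falling edge and the two AND gates; it ignores propagation delays, and the names two_phase, PHASE_A and PHASE_B are made up for this illustration.

```python
def rising_edges(signal):
    """Sample indexes where the signal goes from 0 to 1."""
    return [t for t in range(1, len(signal))
            if signal[t - 1] == 0 and signal[t] == 1]


def two_phase(clock):
    """Return (clock AND Q, clock AND not Q) for a JK flip-flop
    wired to toggle on each falling edge of the clock."""
    q = 0
    previous = clock[0]
    phase_a, phase_b = [], []
    for clk in clock:
        if previous == 1 and clk == 0:
            q ^= 1  # falling edge: toggle Q
        phase_a.append(clk & q)
        phase_b.append(clk & (q ^ 1))
        previous = clk
    return phase_a, phase_b


CLOCK = [0, 0, 1, 1] * 4  # four clock periods, two samples low then two high
PHASE_A, PHASE_B = two_phase(CLOCK)

OVERLAP = any(a and b for a, b in zip(PHASE_A, PHASE_B))
print(rising_edges(PHASE_A), rising_edges(PHASE_B), OVERLAP)
```

Each output gets one pulse every two clock periods, the pulses alternate between the two outputs, and they are never high at the same time.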
So we've been talking about the bus for some time. But when building a CPU we need two things from the bus:
see what is going on (visualize the value currently set on the bus)
manually input a value if needed (for instance when trying to test a register, we need to set a known value on the bus to make sure the register can store it (register_in) and send it back (register_out))
For this I have built a simple module that manually sets the value of the bus and displays the value of the bus.
However, there are a few constraints to take into account when building such a module:
You need to set a default value for the bus. If a line of the bus is left floating (not connected to anything), chances are that you will get either random values or unexpected behavior. This is called terminating a bus. It is usually done by connecting each line of the bus to a known value (Ground or VCC - 0V or 5V) through a resistor (called pull down or pull up).
It is not a good idea to plug LEDs directly to the bus as they can draw many milliamps from the bus and eventually overload the IC that is active on the bus. To prevent that, a buffer was added (as usual a 74HCT245).
We are using standard DIP switches to set a value, but as explained above, the value is actually not directly connected to the bus but with a 3-state buffer and an action signal
In the end this module is quite simple with only 2 74HCT245, a bunch of LEDs and resistor networks (it is easier to use a resistor network than to manually plug 8 resistors).
[Note: do not forget the pull down resistors on the DIP switches otherwise the line would be seen as hanging when the switch is opened]
[note : the NAND gate is used as an inverter, just because there was one available near the module]
Manual setting of the bus value:
Displaying the Bus value and bus termination:
Preview of the Breadboard prototype (right of the photo):