-
Designing and Implementing the MicroCode Sequencer
05/06/2025 at 21:21
It’s time to translate my intent for the MicroCode Sequencer into design decisions.
Narratively, when the CPU executes an instruction:
1. The Instruction Controller presents the MC Program Counter where execution should start (MC_PC_START) and sets the MC_START bit
2. The MC Sequencer outputs the control signals from the MC ROM at the address specified by the MC_PC to the ALU and other parts of the CPU
3. At every clock cycle when an instruction is executing*, the MC Sequencer increments the MC_PC or, in the event of a Jump or Branch, loads a new value entirely, then loops back to 3
4. At the end of an instruction, the MC ROM sets a signal (MC_DONE) that prevents the MC_PC from loading a new value and instead stops execution and signals the Instruction Controller that the MicroCode Sequencer is done.
*Note (from above): whenever the MicroCode Sequencer requests a memory read or write from the CPU bus controller, it should wait any number of clock cycles until the memory operation completes. So, in the case of memory-wait condition, the MC PC won’t actually advance on every clock cycle.
The inputs to the MicroCode Sequencer will be:
- MC_START - Instruction Controller calls the MicroSequencer to start
- MC_PC_START - Starting MicroSequencer Program Counter, supplied by Instruction Controller
- BUS_CONTROLLER_DONE - Memory has been read and data is available, or data has been written
- COND - Branch condition bits extracted from the Instruction Word by the Instruction Controller
- BRA_OFF - Branch condition relative offset extracted from the Instruction Word by the Instruction Controller
The outputs of the MicroCode Sequencer are:
- To the ALU
  - A bunch of signals that I now realize I can’t define until I design the ALU
- Other
  - BUS_CONTROLLER_READ_REQUEST - to read instructions and data
  - BUS_CONTROLLER_WRITE_REQUEST - to write data to RAM
  - MC_DONE - Signal to Instruction Controller that the MicroSequencer is done executing the current instruction
  - MC_PC_SEL - Select which data to load into the MC_ProgramCounter on the next clock cycle:
    - 00 = 0, used for RESET
    - 01 = BRANCH_REGISTER
    - 10 = INSTRUCTION CONTROLLER START ADDR
    - 11 = NEXT = MC_PC+1
Let’s address the implementation itself.
We’ll use a simple S-R flip-flop to track whether the MC_Seq is busy or idle.
When the MC_Seq is done with an instruction, we’ll use an S-R flip flop to generate a one-clock-cycle DONE pulse to the Instruction controller.
We’ll use a register to capture the MC_START_ADDRESS on every clock cycle when the MC_Seq is idle, so that when we receive MC_START, the start address is already captured.
We’ll need some logic to examine MC_Seq control signals like ready/busy, branch, jump, done, and RESET to determine which value to load into the MC_PC. Conceptually:
- If RESET, load the value 0x0000
- If the MCSeq is not busy, load from MC_PC_START
- If the MCSeq is busy and done, don’t care
- If the MCSeq is busy, not doing a branch or jump, and not done, load from MC_PC_INC
- If the MCSeq is busy and doing a jump, load from MC_ROM_LOW_BITS
- If the MCSeq is busy and doing a branch, and the CCR_Condition matches, load from MC_ROM_LOW_BITS
- If the MCSeq is busy, and doing a branch, and the CCR_Condition does not match, load from MC_PC_INC
I’ll use a 4-1 multiplexer for the routing of the MC_PC address.
Select mapping for the multiplexer is:
| Select | Input |
| --- | --- |
| 00 | 0x00 |
| 01 | MC_ROM output low-bits |
| 10 | MC_START_ADDR |
| 11 | MC_PC + 1 |
To implement the logic of selecting which source to use for the next MC_PC, I’ll use this Karnaugh map:
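| Reset, Busy, Done \ Jump, Branch, CCR Match | 000 | 001 | 011 | 010 | 110 | 111 | 101 | 100 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 000 | 10 | 10 | 10 | 10 | 10 | 10 | 10 | 10 |
| 001 | d/c | d/c | d/c | d/c | d/c | d/c | d/c | d/c |
| 011 | d/c | d/c | d/c | d/c | d/c | d/c | d/c | d/c |
| 010 | 11 | 11 | 01 | 11 | n/a | n/a | 01 | 01 |
| 110 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 |
| 111 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 |
| 101 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 |
| 100 | 00 | 00 | 00 | 00 | 00 | 00 | 00 | 00 |

n/a - not applicable, condition should not arise
d/c - don’t care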
Select_Bit_1 = AND(Not_RESET, Not_Busy) OR AND(Not RESET, Not_Jump, Not_Branch) OR AND(Not_RESET, Branch, Not_Match)
Select_Bit_0 = AND(Not_RESET, Busy, Not_Done)
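As a sanity check on the two select equations, here is a minimal Python sketch that evaluates them and maps the result onto the multiplexer inputs listed earlier. The function names and the `MUX_SOURCE` labels are mine, chosen for illustration; they are not part of the circuit.

```python
# Hypothetical software model of the MC_PC select logic (names are illustrative).
MUX_SOURCE = {
    (0, 0): "ZERO",             # 00: RESET
    (0, 1): "MC_ROM_LOW_BITS",  # 01: jump / taken branch target
    (1, 0): "MC_START_ADDR",    # 10: start address from the Instruction Controller
    (1, 1): "MC_PC_INC",        # 11: MC_PC + 1
}

def mc_pc_select(reset, busy, done, jump, branch, match):
    """Return (Select_Bit_1, Select_Bit_0) from the equations in the post."""
    not_reset = not reset
    bit1 = (
        (not_reset and not busy)
        or (not_reset and not jump and not branch)
        or (not_reset and branch and not match)
    )
    bit0 = not_reset and busy and not done
    return int(bool(bit1)), int(bool(bit0))

def next_pc_source(reset, busy, done, jump, branch, match):
    """Name the mux input that would feed MC_PC on the next clock."""
    return MUX_SOURCE[mc_pc_select(reset, busy, done, jump, branch, match)]

if __name__ == "__main__":
    print(next_pc_source(reset=0, busy=1, done=0, jump=0, branch=1, match=1))
```

Running through the busy, not-done row of the Karnaugh map this way reproduces the 11 / 01 entries, which is a cheap cross-check before committing the equations to gates.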
I’ll route the output of the MC_PC latch to an adder to prepare MC_PC + 1.
I’ll route the output of the MC_PC to the MicroCode ROM, which will be quite wide (I’m expecting 20-40 bits wide).
Except during MicroCode branch operations, I’ll want to route the MC ROM output to the ALU (and other CPU sections), but will want to suppress that output for some of the bits during Jump and Branch instructions, so I’ll use some tri-state output buffers to control when the whole output of the MC ROM is passed to the ALU, etc.
During MicroCode Branch operations, I’ll want to route the condition bits to the ALU CCR section to perform the comparison, so I’ll use some tri-state output buffers to control when the condition select bits are routed to the CCR condition test logic.
Here’s the resulting circuit diagram. If you look in detail, you’ll see that it is currently set up for a 4-bit MC_ROM address space (OK for simulation, not realistic) and a 16-bit MC_ROM output (also OK for simulation, not realistic).
I did a simulation of the circuit and it seems to work OK. Again, famous last words…
Next, I’ll take on the design of the ALU. That will help me define the ALU control signals I’ll need.
-
Planning the MicroCode Sequencer
05/05/2025 at 17:06
The purpose of the MicroCode Sequencer is to fetch MicroCode instructions in order and direct them to the ALU and other CPU components in order to manipulate data that accomplishes the intended machine instructions. The MicroCode Sequencer needs to be able to jump and branch within the MicroCode, either to effectuate branch instructions or to loop (potentially in the case of multiply or divide instructions). The MicroSequencer needs to be able to pause its operation while it waits for the Bus Controller to fetch memory contents (and maybe write to memory, though I think that could potentially happen asynchronously, with the MicroSequencer carrying on before the Bus Controller confirms a write is complete). Finally, the MicroSequencer output will control the elements of the ALU and CPU without further intermediation, so the control outputs of the MicroSequencer will be enabling and disabling buffers, adders, and multiplexers directly. As a result, we expect the MicroSequencer to have a very wide data output word, potentially dozens of bits and perhaps as many as 50 bits wide.
Narratively, I need the MicroSequencer to:
1. Receive a starting address from the Instruction Controller
2. Receive a request to start from the Instruction Controller
3. Fetch the requested MicroCode Instruction
4. Direct the MC Instruction outputs to the ALU or other elements of the CPU
5. When necessary, evaluate a branch condition and proceed to a non-sequential next MicroCode instruction
6. Otherwise, fetch the next sequential MicroCode Instruction, and repeat from 4
7. Detect the last MicroCode instruction in a machine instruction and halt operation
8. Signal the Instruction Controller that the current machine instruction has been completed
As a reminder, the Instruction Controller will call “subroutines” within the MicroSequencer to fetch and store Machine Instruction operands according to the addressing mode specified in the operand fields within an Instruction Word. These subroutines will not be accessible directly to the Machine Instruction programmer, but only indirectly by specifying a particular addressing mode in an instruction. Thus, when the MicroController is called to perform a Machine Instruction (like ADD or SHIFT LEFT), all necessary operands will already have been latched into the ALU operand latches. Likewise, storage of the result of a Machine Instruction happens invisibly to the MicroCode of that Machine Instruction.
For the most part, I expect the code for a Machine Instruction will be relatively simple. For example, for ADC (add with carry), in the first clock cycle, the MicroController would simultaneously output control signals to:
- Enable gating the Condition Code Register’s Carry bit to the Adder Carry-In
- Select ALU operand 2 to be routed to the Adder directly
- Enable gating of the Adder’s carry-out bit to the latch for the Condition Code Register’s Carry bit
- Select the Adder output to be routed to the ALU result latch
- Enable the ALU result register to latch its input
- Increment the MicroCode Program Counter
In the second clock cycle, the MicroController would simultaneously output control signals to:
- Latch the ALU Adder carry bit into the CCR
- Latch the ALU result register
- Enable output of the ALU result register to the Internal Data Bus
- Signal the Instruction Controller that the MicroController is done
Branching and jumping seem to present some complexity. When we branch, we load a new value into the MicroCode Program Counter register, potentially based on the value of the CCR. We need to represent the destination address somewhere, and that has to be in a MicroCode instruction. The destination address field has to be the full width of the MicroCode ROM address word. The MicroCode ROM data word will already be quite wide. Do I really want to dedicate another 8-12 bits to hold branch and jump destination addresses?
One solution I see is to allow the bottom 8-12 bits of the MC ROM to hold the destination address when we are branching or jumping, and to function in their usual role as ALU/CPU control signals for instructions other than a branch or jump.
I propose that when we execute a MicroCode branch:
- The upper bits continue to function as normal, performing functions required even when branching
- A branch bit indicates a branch instruction
- The branch bit also serves to suppress output of the MicroController to the ALU and other CPU sections so that the next lower bits that usually control those other sections can instead perform their special branch-mode functions
- The next 3 bits encode the branch condition
- The bottom N bits contain the MC_PC address to jump to if the condition is satisfied
- If the branch condition is not met, the MC_PC is loaded with MC_PC_INC
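To make the proposed word layout concrete, here is a small Python sketch that pulls the branch fields out of a MicroCode word and picks the next MC_PC. The exact bit positions are my assumption for illustration only (a 4-bit address in the bottom bits, the 3 condition bits above it, and the branch bit above those); the real positions are not settled in this post.

```python
# Assumed layout (illustrative only): [ ... | branch | cond(3) | addr(ADDR_BITS) ]
ADDR_BITS = 4                        # assumption: small address space for simulation
COND_BITS = 3
BRANCH_BIT = ADDR_BITS + COND_BITS   # bit position of the branch flag under this layout
ADDR_MASK = (1 << ADDR_BITS) - 1
COND_MASK = (1 << COND_BITS) - 1

def decode_branch_fields(word):
    """Return (is_branch, condition_bits, target_address) for a MicroCode word."""
    is_branch = bool((word >> BRANCH_BIT) & 1)
    cond = (word >> ADDR_BITS) & COND_MASK
    target = word & ADDR_MASK
    return is_branch, cond, target

def next_mc_pc(word, mc_pc, condition_met):
    """Taken branch loads the target; everything else uses MC_PC + 1."""
    is_branch, _cond, target = decode_branch_fields(word)
    if is_branch and condition_met:
        return target
    return (mc_pc + 1) & ADDR_MASK

if __name__ == "__main__":
    word = (1 << BRANCH_BIT) | (0b011 << ADDR_BITS) | 0b1010
    print(decode_branch_fields(word), next_mc_pc(word, 2, True))
```

The point of the sketch is only that the same low bits are read as an address when the branch bit is set and ignored as an address otherwise; the suppression of the ALU control outputs happens in hardware and is not modeled here.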
In my notes for “Instruction Set and Addressing Modes”, I had proposed these test conditions and instruction bit-field for the test condition:
| Condition Bits | Condition |
| --- | --- |
| 000 | Always |
| 001 | Carry Set |
| 010 | Carry Clear |
| 011 | Zero |
| 100 | Non-Zero |
| 101 | Negative |
| 110 | Overflow |
| 111 | Half-Carry |
As I thought ahead to implementing multi-programming and multi-tasking, I realized it would be helpful to have a bit to indicate if the CPU is in the middle of processing an interrupt and when it is in “Supervisor” mode. That means two more bits in the CCR and two more conditions to test for. To make these new tests fit in the 8 choices provided by my 3-bit condition field in the instruction, I need to trim.
My solution is to test only for one version of each status bit, “set.” If my program logic wants to test for the opposite condition, I’ll just have to reverse my if and else clauses. Also, I have to give up the “Always” condition. I’ll need to implement a different MicroCode for unconditional branches, jumps, and returns.
These are the revised branch condition codes:
| Condition Bits | Condition |
| --- | --- |
| 000 | Parity Set |
| 001 | Carry Set |
| 010 | Overflow Set |
| 011 | Zero Set |
| 100 | Negative Set |
| 101 | Interrupt Set |
| 110 | Half-Carry Set |
| 111 | Supervisor Set |
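A quick Python sketch of how the revised condition codes could be evaluated against the CCR. Representing the CCR as a dictionary of named flags is purely my convenience for illustration; the hardware will use individual bits whose positions I haven’t fixed here.

```python
# Revised 3-bit branch condition codes, each testing that one CCR flag is set.
# The flag names used as dictionary keys are illustrative, not signal names.
BRANCH_CONDITIONS = {
    0b000: "parity",
    0b001: "carry",
    0b010: "overflow",
    0b011: "zero",
    0b100: "negative",
    0b101: "interrupt",
    0b110: "half_carry",
    0b111: "supervisor",
}

def condition_met(cond_bits, ccr):
    """True when the CCR flag selected by the 3-bit condition field is set."""
    flag = BRANCH_CONDITIONS[cond_bits & 0b111]
    return bool(ccr.get(flag, False))

if __name__ == "__main__":
    ccr = {"zero": True, "carry": False}
    # "Branch if zero" is direct; "branch if non-zero" has no code of its own,
    # so the program swaps its if and else paths around a Zero Set test.
    print(condition_met(0b011, ccr), condition_met(0b001, ccr))
```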
-
Implementing the Instruction Controller
04/18/2025 at 21:20
Before I dig in:
As I was doing the circuit design, I made some observations that call for more revision of past decisions. Sorry, that’s just the way design goes. You lay something out, later reality points out an impracticality, you revise.
Also, I kind of realized that the Instruction Controller was over-reaching into the domain of the MicroCode lookup and MicroCode Sequencer. I’m going to reduce the boundary of the Instruction Controller so that it sends the MicroCode Start address and MicroController Start signal to a separate block. Here’s the reduced block diagram:
Instruction Group Mapping - ReDo
As I was designing the Instruction Group Decoder circuit, I realized there are four cases when the low 4-bits of the Instruction Decoder Address have to be driven by different bit-fields of the IW:
- 2-operand
- 1-operand
- 0-operand
- Special-group instructions.
This clearly calls for a 4-1 multiplexer to map different parts of the IW to the low bits of IDA. Unfortunately, I assigned the Instruction Group IDs for these four cases as 001, 010, 011, and 100.
I could build some logic to map these to 00, 01, 10, and 11. Or, I can just redefine IDA6-IDA4 so that just two of the three bits are needed to drive a multiplexer directly. To do that, I need to re-map the Instruction Group for “Fetch Operand” to a different Instruction Group ID. So, I have to change some of the tables in previous posts. Sorry if this generates confusion.
Here’s the result:
| Lookup Type | IDA6-IDA4 | IDA3-IDA0 |
| --- | --- | --- |
| Special 0-Operand | 000 | (Logic Mapping) |
| 2-Operand Instruction | 001 | IR15, IR14, IR13, IR12 |
| 1-Operand Instruction | 010 | IR9, IR8, IR7, IR6 |
| 0-Operand Instruction | 011 | IR3, IR2, IR1, IR0 |
| Operand Fetch | 100 | 0, MLB2, MLB1, MLB0 |
| Result Save | 101 | 0, MLB2, MLB1, MLB0 |
| Unused | 110 | n/a |
| Reserved | 111 | (Logic Mapping) |
Looking more closely at how the IW influences ID6-ID4:
| Instruction Word | Instruction | ID6-ID4 | Instruction / Instruction Group |
| --- | --- | --- | --- |
| 1111 0 dddddddd ccc | BRA | 000 | Special |
| 1111 100000 mmmmmm | IM | 000 | Special |
| 1111 100001 mmmmmm | SWI | 000 | Special |
| 1111 100010 000ccc | JMP | 000 | Special |
| 1111 100011 000ccc | JSR | 000 | Special |
| 0000 aaaaaa bbbbbb to 1110 aaaaaa bbbbbb | LD, ADC, ADD, AND, CMP, OR, SUB, SBC, XOR | 001 | 2-operand instructions |
| 1111 110000 bbbbbb to 1111 110111 bbbbbb | NOT, NEG, INC, DEC, ROT*, SHIFT* | 010 | 1-operand instructions |
| 1111 111111 00 0000 to 1111 111111 00 1111 | RTS, SWI, RTI, NOP, STC, CLC, etc. | 011 | 0-operand instructions |
Using a Karnaugh map, I derive these equations to produce ID6-ID4:
ID6 = 0
ID5 = AND(IW15, IW14, IW13, IW12, IW11, IW10, NOT IW9) OR
AND(IW15, IW14, IW13, IW12, IW11, IW10, IW9, IW8, IW7, IW6, NOT IW5)
ID4 = NOT(AND(IW15, IW14, IW13, IW12)) OR
AND(IW15, IW14, IW13, IW12, IW11, IW10, IW9, IW8, IW7, IW6, NOT IW5)
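Here is a short Python sketch of the ID6-ID4 equations, handy for spot-checking a few Instruction Words against the table. The helper names are mine; the logic follows the equations term for term, including the shared 4-input AND of the top bits.

```python
def bit(word, n):
    """Return bit n of a 16-bit Instruction Word as 0 or 1."""
    return (word >> n) & 1

def instruction_group(iw):
    """Compute the 3-bit group ID (ID6-ID4) from the Instruction Word."""
    top4 = all(bit(iw, n) for n in (15, 14, 13, 12))     # reusable intermediate term
    zero_op = top4 and all(bit(iw, n) for n in range(6, 12)) and not bit(iw, 5)
    id6 = 0
    id5 = (top4 and bit(iw, 11) and bit(iw, 10) and not bit(iw, 9)) or zero_op
    id4 = (not top4) or zero_op
    return (id6 << 2) | (int(bool(id5)) << 1) | int(bool(id4))

if __name__ == "__main__":
    for word in (0xF000, 0xF8C0, 0x0000, 0xFC00, 0xFFC0):
        print(f"{word:016b} -> {instruction_group(word):03b}")
```

Because ID5 and ID4 double as OP1 and OP0, the same function also yields the multiplexer select for the low IDA bits.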
If I reassign the mapping for OP1 and OP0 as follows, I can even recycle the ID5 and ID4 bits to drive OP1 and OP0, saving some logic:
| Instruction Group | OP1, OP0 |
| --- | --- |
| Special | 00 |
| 2-operand | 01 |
| 1-operand | 10 |
| 0-operand | 11 |
OP1 = ID5
OP0 = ID4
Finally, I realized I made a minor mistake with mapping the ID3-0 bits for Special instructions. I left the bits undefined for BRA instruction. Here’s the fixed table.
| Instruction Word | Instruction | ID3-ID0 |
| --- | --- | --- |
| 1111 0 dddddddd ccc | BRA | 0100 |
| 1111 100000 mmmmmm | IM | 0000 |
| 1111 100001 mmmmmm | SWI | 0001 |
| 1111 100010 000ccc | JMP | 0010 |
| 1111 100011 000ccc | JSR | 0011 |
Here’s the logic for Special Instructions when ID6-4 = 000:
ID3 = 0
ID2 = AND(IW15, IW14, IW13, IW12, not IW11)
ID1 = AND(IW15, IW14, IW13, IW12, IW11, not IW10, not IW9, not IW8, IW7)
ID0 = AND(IW15, IW14, IW13, IW12, IW11, not IW10, not IW9, not IW8, IW6)
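The Special-instruction mapping is easy to check in software. This Python sketch evaluates ID3-ID0 for an Instruction Word that is already known to be in the Special group (ID6-4 = 000); the function name is mine and the sketch does not itself verify the group.

```python
def special_low_bits(iw):
    """Compute ID3-ID0 for a Special-group Instruction Word."""
    b = lambda n: (iw >> n) & 1
    top4 = b(15) and b(14) and b(13) and b(12)
    # Shared prefix 1111 1000 xx that identifies IM, SWI, JMP, and JSR.
    prefix = top4 and b(11) and not b(10) and not b(9) and not b(8)
    id3 = 0
    id2 = int(bool(top4 and not b(11)))   # BRA
    id1 = int(bool(prefix and b(7)))      # JMP, JSR
    id0 = int(bool(prefix and b(6)))      # SWI, JSR
    return (id3 << 3) | (id2 << 2) | (id1 << 1) | id0

if __name__ == "__main__":
    for name, word in (("BRA", 0xF000), ("IM", 0xF800), ("SWI", 0xF840),
                       ("JMP", 0xF880), ("JSR", 0xF8C0)):
        print(name, f"{special_low_bits(word):04b}")
```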
Back to the Instruction Controller Design
I used Logisim-evolution to design and simulate the circuitry, not based on extensive research but because it was free and came up near the top of my Google search.
To keep the circuit diagram from becoming too complex, I used Logisim’s “Subcircuits” ability to design each of the functional blocks in my block diagram in a separate Logisim tab and then glued them together with data busses.
I chose to design the master controller as a Moore state machine, meaning the outputs are determined only by the current state. I did this because the states are used primarily to track what should happen next, rather than to condition the response of the system to current inputs. The current state ID plus the inputs determine what the next state ID will be, though. On every clock transition, the next state is latched from the state lookup table output to the data latch and determines the new current-state.
Let’s work through each of the major sections:
- Instruction Group Decoder
- Addressing Mode Decoder
- Register Address Decoder
- Operand Selector
- Register Address Selector
- Instruction Decoder Address Selector
- REG_INC / REG_INC_LATCH
Then I’ll weave them all together.
Operand Count Decoder
I realized I could just use the IDA bits 5 and 4 for this, so I no longer need a separate circuit.
Instruction Group Decoder
At the top, we use glue logic to implement the decoding of the IW and generation of IDA_6, IDA_5, and IDA_4. Note that IDA_5 and IDA_4 also serve as OP_1 and OP_0. Note that a 12-input AND gate would be required in the top section, something that isn’t available as a discrete component. I chose to cascade a 4-input AND of the top 4-bits of the IW, since that is an intermediate term that can be reused in several other parts of the circuit. This structure adds a gate delay to the production of the IDA output, but seems worth it.
In the middle section, we extract the relevant fields of the IW for use in the Special instructions and route those bits to the low-bit multiplexers.
In the lower section, we see multiplexers used to route sub-fields of the IW to the lower-four bits of the IDA. For Special instructions, the data come from the mapping function in the middle of the circuit diagram. Note that bits IDA_5 and IDA_4 are used to drive the select inputs of the multiplexers.
Addressing Mode Decoder
This is a straightforward NOT-AND-OR implementation of the logic equations specified earlier.
Register Address Decoder
This is a straightforward NOT-AND-OR implementation of the logic equations specified earlier.
Operand Selector
Here we route bits 5-0 and 11-6 from the Instruction Word to a 2-1 multiplexer driven by the OP_SEL signal. This circuit selects which operand within the IW should be routed to the Addressing Mode decoder and the Register Address Decoder.
Register Address Selector
The Register Address Selector either passes the Register Address input to the output when RSA_SRC_SEL signal is low, or forces REG_ADDR_OUT to 0x1f when RSA_SRC_SEL is high. This is used to forcibly select the PC when the IC is incrementing the PC during the instruction cycle.
Instruction Decoder Address Selector
This circuit simply selects whether the Instruction Group Decoder or the Addressing Mode Decoder output should be routed to the Instruction Decoder lookup ROM, according to control by the IN_OP_SEL (Instruction / Operand Select) input.
Register Incrementer and Latch
This circuit adds 0x01 to the current REG_INC_IN value and captures the value with a latch, then controls the output with an Output Enable signal. In practice both the REG_INC_IN and REG_INC_OUT words will be connected together at the Internal Data Bus.
Bringing it All Together
The diagram below shows all of the foregoing blocks, plus the Instruction Decoder ROM and a Register RAM. The IC_STATE_MACHINE is a straightforward Moore state machine that could be implemented with logic (4-bits input + 12-bits output + 4 bits state) and a 4-bit register to store the current state.
It all seems to work in simulation… (I know, I know, famous last words.)
To really test everything, I’ll need the MicroSequencer circuit so I can tie the two together. Even with the MicroSequencer, I’ll have to manually simulate the bus controller to provide instruction and operand reads and writes, but those are expected to be slow for now and not timing dependent.
So, my next step is to design the MicroCode Sequencer!
-
Planning the Instruction Controller
04/01/2025 at 04:00
Today, I’m going to focus on the CPU Instruction Controller. The Instruction Controller will orchestrate its subsystems to fetch an instruction, get any needed operands, perform the instruction itself, potentially store a result, and queue up the next instruction.
Overview
My initial vision is a sequencer that will:
1. Load the PC to the Address Latch
2. Ask the Bus Controller to fetch an instruction and store it in the Instruction Register
3. Have the Addressing Mode Decoder examine the Instruction Register to see if we need an operand, or skip to 7.
4. Load the MicroCode PC with the start address of the routine to fetch the first operand according to the Addressing Mode. The MicroCode Sequencer will execute until the first operand is stored in the appropriate ALU latch.
5. Have the Addressing Mode Decoder examine the Instruction Register again to see if we need an additional operand, or skip to 7, leaving any computed or fetched indirect address in the Address Scratch register.
6. Load the MicroCode PC with the start address of the routine to fetch the second operand according to the Addressing Mode. The MicroCode Sequencer will execute until the second operand is stored in the appropriate ALU latch. This step will potentially overwrite the computed or fetched indirect address in the Address Scratch register with a new address, which will serve as the destination of the result, if appropriate.
7. Have the Addressing Mode Decoder examine the Instruction Register again and route the appropriate bits to the Instruction Decoder.
8. The Instruction Decoder looks up the start address of the MicroCode routine for the subject instruction.
9. The Controller turns control over to the MicroCode Sequencer, which executes the necessary instructions to perform the actual data manipulation. The MicroCode will store the result of any computation, possibly making use of the address in the Address Scratch register or the Register Address latch, before returning.
10. When the MicroCode signals it is done, the Controller will then return to step 1 above.
Since the MicroCode for the actual instruction may take several cycles, I thought I might try to pipeline operations a little and start the instruction fetch and Addressing Mode Decoder steps as soon as the Controller turns control over to the MicroCode Sequencer for the instruction itself. Whoever finishes first would have to wait.
Hmm. It sounds nice, but as I type this I see a potential problem. If the result of the current instruction alters one of the operands of the subsequent instruction, that would be a problem. In theory, I could check to see if the addresses are the same and then pause the pre-fetch logic if they are, or even detect if the operand >changed<. Or it could be more trouble than it’s worth. I’m voting for the latter.
Maybe the Controller could at least pre-fetch the next instruction. Ah, but what if there’s a branch? I could pre-fetch the instruction as long as the current instruction isn’t a branch, jump, RTS, RTI, etc. OK, KISS for now.
All right, in any case, I see the Instruction Sequencer (IS) using a shift register to sequence the steps as:
1. Route PC to AR, fetch instruction to IR, evaluate IR with AMD logic on the fly, preload the IS if needed to jump ahead to 3 or 4
2. Load Operand 2 (src) to ALU-Left, evaluate IR with AMD logic on the fly
3. Load Operand 1 (dest / src+dest) to ALU-RIGHT
   - Start here for 1-operand instructions
4. Lookup Instruction MC Start Address
   - Start here for 0-operand instructions
5. Start MC for instruction processing
   - Instruction processing is responsible for writing result of any instruction to dest
   - Instruction processing is responsible for incrementing PC
6. When instruction MC is done, signal IS to jump back to 1
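The step skipping can be sketched in a few lines of Python. This assumes the top-level items above are steps 1 through 6, with 1-operand instructions entering at step 3 and 0-operand instructions at step 4 after the fetch; the step labels are my shorthand, not signal names.

```python
# Shorthand labels for the six top-level Instruction Sequencer steps (illustrative).
IS_STEPS = {
    1: "fetch instruction",
    2: "load operand 2 to ALU-Left",
    3: "load operand 1 to ALU-Right",
    4: "look up instruction MC start address",
    5: "run instruction MicroCode",
    6: "done, return to step 1",
}

def sequencer_steps(operand_count):
    """List the step numbers visited for an instruction with 0, 1, or 2 operands."""
    if operand_count not in (0, 1, 2):
        raise ValueError("operand_count must be 0, 1, or 2")
    # After the fetch, the IS is preloaded to jump ahead past unneeded operand loads.
    first_after_fetch = {2: 2, 1: 3, 0: 4}[operand_count]
    return [1] + list(range(first_after_fetch, 7))

if __name__ == "__main__":
    for count in (2, 1, 0):
        print(count, [IS_STEPS[s] for s in sequencer_steps(count)])
```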
Addressing Mode Decoder
The Addressing Mode Decoder (AMD) unit needs to do a few computations:
- Examine the Instruction Register and determine the number of operands, this decides when to turn control over to the instruction MicroCode and which ALU input latch each operand is routed to
- Examine an operand field (6-bits) and determine the MicroCode start address for the routine to handle fetching the operand
- Examine an operand field (6-bits) and determine how to route some of the operand field bits to the Register Address bus
- For certain instructions, IM, SWI, BRA, JMP, JSR, latch or gate some of the operand-field bits to make them available to the MicroCode.
- Examine the IR and route appropriate bits to the Instruction Decoder to determine the MicroCode start address for the actual instruction itself.
Let’s look at each computation in turn.
Number of Operands
There are fifteen permitted instructions that require two operands. I’ve designed these so that if the high 4 bits of the IR are 0000 - 1110, there are 2 operands.
If the high 4 bits of the IR are all 1, then we examine the next 6 bits, IR11 - IR6. If they are 000000 - 001111, we have a 1-operand instruction.
If IR11-IR10 are 01 or 1X, we have a 0-operand instruction, but we need to parse the OpCode bits in a special way.
Finally, if IR15-IR6 are all 1, then we have a 0-operand instruction without special treatment of the operand field.
Two Operand = NAND(IR15-IR12) = OR(IR15*, IR14*, IR13*, IR12*)
One Operand = AND(IR15, IR14, IR13, IR12, IR11*, IR10*)
Zero Operand = AND(IR15, IR14, IR13, IR12, IR11) OR AND(IR15, IR14, IR13, IR12, IR10)
Each of these expressions has 2 gate delays, which is the best we can do.
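For a quick check of the three operand-count expressions, here is a Python sketch that evaluates them directly, taking the first Zero Operand term to use IR11. It models the encoding as it stands at this point in the post; the function name is mine.

```python
def operand_count(ir):
    """Return 2, 1, or 0 according to the Two/One/Zero Operand expressions."""
    b = lambda n: (ir >> n) & 1
    top4 = b(15) and b(14) and b(13) and b(12)
    two_operand = not top4
    one_operand = top4 and not b(11) and not b(10)
    zero_operand = (top4 and b(11)) or (top4 and b(10))
    # Exactly one of the three expressions is true for any 16-bit IR.
    assert [bool(two_operand), bool(one_operand), bool(zero_operand)].count(True) == 1
    if two_operand:
        return 2
    return 1 if one_operand else 0

if __name__ == "__main__":
    for word in (0x0000, 0xE000, 0xF000, 0xF3C0, 0xF400, 0xFFFF):
        print(f"{word:04X} -> {operand_count(word)} operand(s)")
```

The internal assertion confirms the three expressions partition the instruction space, which is what lets each one be a flat two-level AND/OR term.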
Operand Fetch MicroCode Start Address
To actually fetch the operand and latch it in a register or one of the two ALU operand latches, we will turn control over to the MicroCode. There will be a separate MicroCode routine to handle each Addressing Mode. We need to examine the Addressing Mode and determine the start address in MicroCode to handle it. That is this task.
We will use a small ROM to examine the addressing mode bits and lookup the MicroCode starting address. In principle, we could just route all 16 bits of the IR to the Instruction Decoder to find the start address, but because there is so much redundancy, that is inefficient. We would need a 64Kx16 memory to look up just 8 distinct Addressing Modes. Better to reduce it with some logic.
Note that we’ll need to do this for each operand, so we’ll need logic to select the Addressing Mode select bits for each of the operands in sequence when we have two operands, and just for the one operand when we have just one.
As a reminder, here are the addressing modes and how the bits of the Operand field signal the mode.
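| Mode Name | OP5 | OP4 | OP3 | OP2 | OP1 | OP0 | Addr Mode ID |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Register | 0 | X | X | X | X | X | 0 = 000 |
| Indexed | 1 | 0 | 0 | X | X | X | 1 = 001 |
| Indirect | 1 | 0 | 1 | X | X | X | 2 = 010 |
| Doubly Indir | 1 | 1 | 0 | X | X | X | 3 = 011 |
| Indir Pre-Dec | 1 | 1 | 1 | 0 | X | X | 4 = 100 |
| Indir Post-Inc | 1 | 1 | 1 | 1 | X | X | 5 = 101 |
| Immediate | 1 | 1 | 1 | 1 | 1 | 1 | 6 = 110 |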
We’ll map the OP bits to the Addressing Mode ID to look up the MC start address of the routine to handle the Addressing Mode. We need to generate a 3-bit value. Here are the Karnaugh maps to generate the low 3 bits of the MC start-address lookup. We’ll deal with the upper bits later.
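Mode Lookup Bit 2

| OP5-3 \ OP2-0 | 000 | 001 | 011 | 010 | 110 | 111 | 101 | 100 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 001 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 011 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 010 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 110 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 111 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 101 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |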
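Mode Lookup Bit 1

| OP5-3 \ OP2-0 | 000 | 001 | 011 | 010 | 110 | 111 | 101 | 100 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 001 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 011 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 010 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 110 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 111 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 101 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 100 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |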
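Mode Lookup Bit 0

| OP5-3 \ OP2-0 | 000 | 001 | 011 | 010 | 110 | 111 | 101 | 100 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 001 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 011 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 010 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 110 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 111 | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 |
| 101 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 100 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |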
So:
- MLookupBit2: OP5 AND OP4 AND OP3
- MLookupBit1: AND(OP5, OP4, OP3*) OR AND(OP5, OP4*, OP3) OR AND(OP5, OP4, OP3, OP2, OP1, OP0)
- MLookupBit0: AND(OP5, OP4, OP3*) OR AND(OP5, OP4*, OP3*) OR AND(OP5, OP4, OP3, OP2, OP1, OP0*) OR AND(OP5, OP4, OP3, OP2, OP1*)
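The three MLookupBit equations can be checked against the addressing-mode table with a short Python sketch. The function name is mine; each boolean expression mirrors one equation, with `not` standing in for the `*` (inverted) terms.

```python
def mode_lookup_id(op):
    """Map a 6-bit operand field (OP5-OP0) to the 3-bit Addressing Mode ID."""
    op5, op4, op3, op2, op1, op0 = (bool((op >> n) & 1) for n in range(5, -1, -1))
    bit2 = op5 and op4 and op3
    bit1 = (
        (op5 and op4 and not op3)
        or (op5 and not op4 and op3)
        or (op5 and op4 and op3 and op2 and op1 and op0)
    )
    bit0 = (
        (op5 and op4 and not op3)
        or (op5 and not op4 and not op3)
        or (op5 and op4 and op3 and op2 and op1 and not op0)
        or (op5 and op4 and op3 and op2 and not op1)
    )
    return (int(bit2) << 2) | (int(bit1) << 1) | int(bit0)

if __name__ == "__main__":
    for name, op in (("Register", 0b000000), ("Indexed", 0b100000),
                     ("Indirect", 0b101000), ("Doubly Indir", 0b110000),
                     ("Indir Pre-Dec", 0b111000), ("Indir Post-Inc", 0b111100),
                     ("Immediate", 0b111111)):
        print(f"{name:15s} {op:06b} -> {mode_lookup_id(op):03b}")
```

Note how Immediate is carved out of the Post-Inc pattern: only 111111 maps to 110, while 111100 through 111110 stay at 101.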
Register Address Bit Routing
For most of the addressing modes, I need to route register select information from the operand field to the register memory address lines (RA). Most of the development for this topic is in the post “Instruction Set and CPU Architecture.”
The logic is:
RA4 = OP5 OR OP4
For RA3:
Conceptually:
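| RA3: OP5, 4, 3 \ OP2, 1, 0 | 000 | 001 | 011 | 010 | 110 | 111 | 101 | 100 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 000 | OP3 | OP3 | OP3 | OP3 | OP3 | OP3 | OP3 | OP3 |
| 001 | OP3 | OP3 | OP3 | OP3 | OP3 | OP3 | OP3 | OP3 |
| 011 | OP3 | OP3 | OP3 | OP3 | OP3 | OP3 | OP3 | OP3 |
| 010 | OP3 | OP3 | OP3 | OP3 | OP3 | OP3 | OP3 | OP3 |
| 110 | OP2 | OP2 | OP2 | OP2 | OP2 | OP2 | OP2 | OP2 |
| 111 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 101 | OP2 | OP2 | OP2 | OP2 | OP2 | OP2 | OP2 | OP2 |
| 100 | OP2 | OP2 | OP2 | OP2 | OP2 | OP2 | OP2 | OP2 |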
In detail:
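| RA3: OP5, 4, 3 \ OP2, 1, 0 | 000 | 001 | 011 | 010 | 110 | 111 | 101 | 100 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 000 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 001 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 011 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 010 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 110 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| 111 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 101 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| 100 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |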
Algebraically:
RA3 = AND(OP5, OP4, OP3) OR AND(OP5*, OP3) OR AND(OP5, OP2)
For RA2:
Conceptually:
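| RA2: OP5, 4, 3 \ OP2, 1, 0 | 000 | 001 | 011 | 010 | 110 | 111 | 101 | 100 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 000 | OP2 | OP2 | OP2 | OP2 | OP2 | OP2 | OP2 | OP2 |
| 001 | OP2 | OP2 | OP2 | OP2 | OP2 | OP2 | OP2 | OP2 |
| 011 | OP2 | OP2 | OP2 | OP2 | OP2 | OP2 | OP2 | OP2 |
| 010 | OP2 | OP2 | OP2 | OP2 | OP2 | OP2 | OP2 | OP2 |
| 110 | OP2* | OP2* | OP2* | OP2* | OP2* | OP2* | OP2* | OP2* |
| 111 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 101 | OP2* | OP2* | OP2* | OP2* | OP2* | OP2* | OP2* | OP2* |
| 100 | OP2* | OP2* | OP2* | OP2* | OP2* | OP2* | OP2* | OP2* |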
In detail:
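| RA2: OP5, 4, 3 \ OP2, 1, 0 | 000 | 001 | 011 | 010 | 110 | 111 | 101 | 100 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 000 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| 001 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| 011 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| 010 | 0 | 0 | 0 | 0 | 1 | 1 | 1 | 1 |
| 110 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 111 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 101 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |
| 100 | 1 | 1 | 1 | 1 | 0 | 0 | 0 | 0 |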
Algebraically:
RA2 = AND(OP5, OP4, OP3) OR AND(OP5*, OP2) OR AND(OP5, OP2*)
For RA1 and RA0:
RA1 = OP1
RA0 = OP0
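Pulling the five RA equations together, this Python sketch computes the 5-bit register address from the 6-bit operand field. The function name is mine; the expressions are the algebraic forms above with `not` for the inverted terms.

```python
def register_address(op):
    """Compute RA4-RA0 from the 6-bit operand field (OP5-OP0)."""
    op5, op4, op3, op2, op1, op0 = (bool((op >> n) & 1) for n in range(5, -1, -1))
    ra4 = op5 or op4
    ra3 = (op5 and op4 and op3) or (not op5 and op3) or (op5 and op2)
    ra2 = (op5 and op4 and op3) or (not op5 and op2) or (op5 and not op2)
    ra1, ra0 = op1, op0
    value = 0
    for ra_bit in (ra4, ra3, ra2, ra1, ra0):   # assemble MSB first
        value = (value << 1) | int(ra_bit)
    return value

if __name__ == "__main__":
    for op in (0b010110, 0b100010, 0b100110, 0b111000, 0b111111):
        print(f"OP={op:06b} -> RA={register_address(op):05b}")
```

Two properties fall out of the equations: with OP5 = 0 the low five operand bits pass straight through as the register address, and with OP5-OP3 = 111 the top three address bits are forced to 1.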
Instruction Start Address Lookup
We will use a small ROM (the Instruction Decoder, ID) to look up the starting address in the MicroCode to either fetch an operand or to perform an instruction. We don’t yet know how many address lines we’ll need to select from the ID, but let’s refer to them as IDA0, IDA1, …
To conserve resources, we would like the lookups in the ID to be relatively compact, meaning that there should not be large groups of input addresses that are unused. That just wastes ROM real-estate.
- For operand fetch, we use logic to map the Addressing Mode to 3 bits: MLookupBit2, 1, and 0
  - Map MLB2 to IDA2, MLB1 to IDA1, MLB0 to IDA0.
  - We’ll have to figure out IDA3… later
- For 2-operand instructions, we can route IR15-IR12 to IDA3-IDA0
  - We’ll have to figure out IDA4… later
- For 1-operand instructions, we can route IR9-IR6 to IDA3-IDA0
  - We’ll have to figure out IDA4… later
- For most 0-operand instructions, we can route IR3-IR0 to IDA3-IDA0
- For 0-operand instructions IM, SWI, BRA, JMP, and JSR, we will want to find a way to map to IDA3-IDA0 as well
- I haven’t mentioned it yet, but I’ll also need some routines in the MicroCode to store the result of instruction operations according to the addressing mode of the destination operand (or the single operand). I can probably reuse MLB2-MLB0, but will need to map the options to a different block of the ID.
Of course, we can see that we will need to generate the upper bits of IDA to avoid collisions among the different groups of instructions that all want to use IDA3-IDA0. The 2-, 1-, and 0-operand instructions all want 4-bits each, so let’s allocate 16 addresses to each major block, and map the blocks as follows:
| Lookup Type | IDA6-IDA4 | IDA3-IDA0 |
| --- | --- | --- |
| Operand Fetch | 000 | 0, MLB2, MLB1, MLB0 |
| 2-Operand Instruction | 001 | IR15, IR14, IR13, IR12 |
| 1-Operand Instruction | 010 | IR9, IR8, IR7, IR6 |
| 0-Operand Instruction | 011 | IR3, IR2, IR1, IR0 |
| Special 0-Operand | 100 | (Logic Mapping) |
| Result Save | 101 | 0, MLB2, MLB1, MLB0 |
| Unused | 110 | n/a |
| Reserved | 111 | (Logic Mapping) |
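In software terms, the 7-bit Instruction Decoder address is just the block ID in the upper three bits with the 4-bit index below it. Here is a minimal Python sketch of that composition using the block assignments from this table; the dictionary keys and function name are mine.

```python
# Block IDs (IDA6-IDA4) as assigned in the table at this point in the design.
LOOKUP_BLOCK = {
    "operand_fetch": 0b000,
    "two_operand": 0b001,
    "one_operand": 0b010,
    "zero_operand": 0b011,
    "special": 0b100,
    "result_save": 0b101,
}

def decoder_address(lookup_type, low4):
    """Combine a block ID with a 4-bit index to form IDA6-IDA0."""
    return (LOOKUP_BLOCK[lookup_type] << 4) | (low4 & 0xF)

if __name__ == "__main__":
    # A 2-operand instruction whose IR15-IR12 bits are 1110 lands in block 001.
    print(f"{decoder_address('two_operand', 0b1110):07b}")
```

With 16 entries per block, the eight blocks fill a 128-entry lookup with no gaps between blocks, which is the compactness goal stated above.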
As I was typing this out, it occurred to me that I could use the “Reserved” block of lookups (IDA6-4: 111) to lookup MicroCode to handle RESET, and Interrupt handling. I’ll keep that in my back pocket for later.
Finally, I need to design some logic to map the special 0-operand instructions (IM, SWI, BRA, JMP, JSR) to a compact list of values for IDA3-IDA0. Let’s tackle that.
| Instruction | IR | Desired IDA3-0 |
| --- | --- | --- |
| IM | 1111 0100 00mm mmmm | 0000 |
| SWI | 1111 0100 01mm mmmm | 0001 |
| BRA | 1111 1ccc dddd dddd | 0010 |
| JMP | 1111 1111 1110 0ccc | 0011 |
| JSR | 1111 1111 1110 1ccc | 0100 |
I observed two possible optimizations here.
First, if I reorganize the bits of BRA to move the condition codes to the low bits, it will make it easier to use the condition code bits because they will always be in the same place and I won’t need logic to select them from two different places.
Second, if I set the desired IDA for JMP and JSR differently, I can map IR3 to IDA0 to differentiate between JMP and JSR with less logic. Using the same observation, I see that I can likely map IR6 to IDA0 to differentiate between IM and SWI.
OK, so here’s the redesigned map for IDA3-0.
| Instruction | IR | Desired IDA3-0 |
| --- | --- | --- |
| IM | 1111 0100 00mm mmmm | 0000 |
| SWI | 1111 0100 01mm mmmm | 0001 |
| BRA | 1111 1ddd dddd dccc | 0010 |
| JMP | 1111 1111 1110 0ccc | 0100 |
| JSR | 1111 1111 1110 1ccc | 0101 |
Here’s the logic to control the IDA3-0 and the routing of the bit fields for ccc, dddd dddd, and mmm mmm.
IDA3 = 0
IDA2 = AND(IR15-IR5, IR4*)
IDA1 = AND(IR15-IR11)
IDA0 = AND(IR15-IR12, IR11*, IR10, IR9*, IR8*, IR7*, IR6) OR AND(IR15-IR5, IR4*, IR3)
MMMGate = AND(IR15-IR12, IR11*, IR10, IR9*, IR8*, IR7*)
DDDGate = AND(IR15-IR11)
CCCGate = AND(IR15-IR5, IR4*)
At the cost of one additional gate of propagation delay, it would be possible to re-use AND(IR15-IR12), AND(IR11*, IR10, IR9*, IR8*, IR7*), and AND(IR11-IR5). Also note that some of the ANDs are so wide that I will have to cascade two AND gates.
Whoopsie! Do-Over Required
[Edit 3/26/25] I am such a numbskull. The design above for the instruction group detection won’t work! I can’t believe I didn’t see the problem. This is embarrassing.
Clearly, there are values of dddd dddd and ccc that make BRA indistinguishable from JMP and JSR. That means the instruction decoder can’t tell which instruction to execute. That’s a problem.
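To make the collision concrete, here is a small sketch using the redesigned layouts from the table above. The `encode_*` helper names are hypothetical, and I am assuming the 8-bit displacement sits in IR10-IR3 with its most significant bit at IR10.

```python
def encode_bra(displacement: int, ccc: int) -> int:
    """Redesigned BRA layout: 1111 1ddd dddd dccc."""
    return 0xF800 | ((displacement & 0xFF) << 3) | (ccc & 0b111)

def encode_jmp(ccc: int) -> int:
    """JMP layout: 1111 1111 1110 0ccc."""
    return 0xFFE0 | (ccc & 0b111)

def encode_jsr(ccc: int) -> int:
    """JSR layout: 1111 1111 1110 1ccc."""
    return 0xFFE8 | (ccc & 0b111)

# A branch with displacement -4 is bit-for-bit a JMP; displacement -3 is a JSR.
for ccc in range(8):
    assert encode_bra(-4, ccc) == encode_jmp(ccc)
    assert encode_bra(-3, ccc) == encode_jsr(ccc)
```

So under that bit-order assumption, every condition code collides for those two displacements, and no amount of decoder logic can separate them.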
I need to reassign the distinguishing upper bits of the third hex digit from the right of the Instruction Word so that the Instruction Decoder can distinguish between the instructions, regardless of what the operand is.
Since BRA needs the most auxiliary bits, let me start fixing it by making the op code for BRA:
- 1111 0ddd dddd dccc
Then Special Instructions IM, SWI, JMP, and JSR can be (respectively):
- 1111 1000 00 mmm mmm
- 1111 1000 01 mmm mmm
- 1111 1000 10 000 ccc
- 1111 1000 11 000 ccc
This approach keeps the bits that distinguish the instruction class all together and clusters the bits that differentiate the specific instruction within the class all together too.
The standard 1-Operand instructions can be:
- 1111 11xx xxpp pppp
- Where xxxx encode one of 16 instructions, which should be enough
- pp pppp encode the operand
Finally, Zero-Operand instructions can be:
- 1111 111111 00 xxxx
To summarize:
| Instruction Word | Instruction | ID6-ID4 | Instruction / Instruction Group |
| --- | --- | --- | --- |
| 0000 aaaaaa bbbbbb - 1110 aaaaaa bbbbbb | LD, ADC, ADD, AND, CMP, OR, SUB, SBC, XOR | 001 | 2-operand instructions |
| 1111 0 dddddddd ccc | BRA | 100 | Special |
| 1111 100000 mmmmmm | IM | 100 | Special |
| 1111 100001 mmmmmm | SWI | 100 | Special |
| 1111 100010 000ccc | JMP | 100 | Special |
| 1111 100011 000ccc | JSR | 100 | Special |
| 1111 110000 bbbbbb - 1111 110111 bbbbbb | NOT, NEG, INC, DEC, ROT*, SHIFT* | 010 | 1-operand instructions |
| 1111 111111 00 0000 - 1111 111111 00 1111 | RTS, SWI, RTI, NOP, STC, CLC, etc. | 011 | 0-operand instructions |
aaaaaa = Source operand bits
bbbbbb = Destination operand bits
ccc = Condition code for branches and conditional jumps
dddddddd = Displacement (-128 to +127) for relative branches
mmmmmm = bit mask for interrupts
OK, now that seems to be sorted, let me rebuild the logic that will detect the instruction group (for the high bits of the Instruction Decoder) and route the bit fields that distinguish the specific instructions to the low bits of the Instruction Decoder.
Here is the logic for the high bits of the Instruction Decoder lookup table.
Note that ID6-4 = 000 means we are looking up the subroutine to fetch an operand according to the Addressing Mode. The Instruction Sequencer state machine will force ID6-ID4 = 000 during those states and will drive ID3-ID0 from the Addressing Mode Decoder.
ID6 = AND(IW15, IW14, IW13, IW12, IW11*) OR
AND(IW15, IW14, IW13, IW12, IW11, IW10*)
(By observation)
ID5 = AND(IW15, IW14, IW13, IW12, IW11, IW10, IW9*) OR
AND(IW15, IW14, IW13, IW12, IW11, IW10, IW9, IW8, IW7, IW6, IW5*, IW4*)
(By Karnaugh map)
ID4 = NAND(IW15, IW14, IW13, IW12) OR
AND(IW15, IW14, IW13, IW12, IW11, IW10, IW9, IW8, IW7, IW6, IW5*, IW4*)
(By observation)
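As a sanity check, the group-detection equations can be modelled in a few lines and swept across the encodings in the summary table. This is only a sketch: the function names are mine, and I am assuming the first ID6 term tests IW11* (the bit that separates BRA from the other 1111 encodings), since that is what it takes for every BRA word to land in the Special group.

```python
def bit(word: int, n: int) -> bool:
    """True when bit n of the word is set."""
    return bool((word >> n) & 1)

def id_high_bits(iw: int) -> int:
    """Return ID6-ID4 as a 3-bit integer for a 16-bit instruction word."""
    top = bit(iw, 15) and bit(iw, 14) and bit(iw, 13) and bit(iw, 12)
    # IW15-IW6 all 1, with IW5* and IW4*
    zero_op = (iw & 0xFFF0) == 0xFFC0
    # Assumption: first term uses IW11* so that all of 1111 0xxx (BRA) is covered.
    id6 = top and (not bit(iw, 11) or (bit(iw, 11) and not bit(iw, 10)))
    id5 = (top and bit(iw, 11) and bit(iw, 10) and not bit(iw, 9)) or zero_op
    id4 = (not top) or zero_op
    return (id6 << 2) | (id5 << 1) | id4

# Sweep each range from the summary table.
assert all(id_high_bits(w) == 0b001 for w in range(0x0000, 0xF000))  # 2-operand
assert all(id_high_bits(w) == 0b100 for w in range(0xF000, 0xFC00))  # BRA, IM, SWI, JMP, JSR
assert all(id_high_bits(w) == 0b010 for w in range(0xFC00, 0xFE00))  # 1-operand
assert all(id_high_bits(w) == 0b011 for w in range(0xFFC0, 0xFFD0))  # 0-operand
```

Words in the 1111 111x range that are not 0-operand encodings come out as 000 in this model, which is the code the Instruction Sequencer reserves for operand fetch.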
Let’s see how I will route the instruction bits to the low bits of the ID.
| Instruction Group | ID6-ID4 | Source for ID3-ID0 |
| --- | --- | --- |
| Operand Fetch | 000 | AMD |
| 2-Operand | 001 | IR15-IR12 |
| 1-Operand | 010 | IR9-IR6 |
| 0-Operand | 011 | IR3-IR0 |
I’ll use four 4-1 multiplexers to do this. I’ll need to select parts with an Output Enable function so I can suppress the outputs when ID6-ID4 is 100.
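A rough behavioural model of that multiplexer bank, following the routing table above. The function name is hypothetical, and I am assuming ID5-ID4 drive the shared select lines while ID6 de-asserts Output Enable; the actual select wiring has not been pinned down yet.

```python
def select_id_low(id_high: int, ir: int, amd: int):
    """Model four 4-to-1 multiplexers sharing one select; None means outputs disabled."""
    if id_high & 0b100:              # ID6 set (Special group): outputs suppressed
        return None
    sources = (
        amd & 0xF,                   # select 00: Addressing Mode Decoder
        (ir >> 12) & 0xF,            # select 01: IR15-IR12
        (ir >> 6) & 0xF,             # select 10: IR9-IR6
        ir & 0xF,                    # select 11: IR3-IR0
    )
    return sources[id_high & 0b011]  # ID5-ID4 pick the source
```

Returning `None` stands in for the high-impedance state, which leaves the ID3-ID0 lines free for the Special-instruction logic to drive.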
For the Special instructions, there is no “instruction field” in the IW, so we need to synthesize a compact set of values for ID3-ID0 based on the IR bits.
First, I lay out the table of the information I have to work with. Reflecting, I can see that bits IW9-6 map directly to the instruction, except for BRA.
| Instruction Word | Instruction | ID3-ID0 |
| --- | --- | --- |
| 1111 0 dddddddd ccc | BRA | xxxx |
| 1111 100000 mmmmmm | IM | 0000 |
| 1111 100001 mmmmmm | SWI | 0001 |
| 1111 100010 000ccc | JMP | 0010 |
| 1111 100011 000ccc | JSR | 0011 |
So, I could implement a little logic to map BRA to a value like 0100 and I’d be set.
ID3 = 0
ID2 = AND(IW11*, IW10*) OR
AND(IW11*, IW10)
ID1 = AND(IW11, IW10*, IW9*, IW8*, IW7)
ID0 = AND(IW11, IW10*, IW9*, IW8*, IW6)
I generated these with four Karnaugh maps for ID3, ID2, ID1, and ID0 with inputs: IW11, 10, 9, 8, 7, 6.
I can probably wire-OR these to the outputs of the 4-1 multiplexers I mentioned above, or use a buffer with a tri-state output.
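The four equations for the Special group are small enough to check exhaustively. The sketch below writes them term for term (helper names are mine) and confirms the values in the table, with BRA landing on 0100.

```python
def bit(word: int, n: int) -> int:
    return (word >> n) & 1

def inv(x: int) -> int:
    """Model a * (inverted) input."""
    return x ^ 1

def special_id_low(iw: int) -> int:
    """ID3-ID0 for the Special group, written term for term from the equations above."""
    b11, b10, b9, b8, b7, b6 = (bit(iw, n) for n in (11, 10, 9, 8, 7, 6))
    id3 = 0
    id2 = (inv(b11) & inv(b10)) | (inv(b11) & b10)   # reduces to IW11*
    id1 = b11 & inv(b10) & inv(b9) & inv(b8) & b7
    id0 = b11 & inv(b10) & inv(b9) & inv(b8) & b6
    return (id3 << 3) | (id2 << 2) | (id1 << 1) | id0

assert all(special_id_low(w) == 0b0100 for w in range(0xF000, 0xF800))  # BRA
assert all(special_id_low(0xF800 | m) == 0b0000 for m in range(64))     # IM
assert all(special_id_low(0xF840 | m) == 0b0001 for m in range(64))     # SWI
assert all(special_id_low(0xF880 | c) == 0b0010 for c in range(8))      # JMP
assert all(special_id_low(0xF8C0 | c) == 0b0011 for c in range(8))      # JSR
```

One thing the model makes visible: the two ID2 terms differ only in IW10, so ID2 collapses to a single inverter on IW11.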
END OF THE DO-OVER
One Final Comment
At the start of the project, I set as one of my objectives to use only TTL. I’m seeing now that using PLDs (22V10, 16V8, etc.) will reduce my chip-count (and wiring work) by a LOT. I think I’m going to back down on my initial commitment not to use PLDs. And, I’ve never used one before, so that will be fun.
-
Instruction Set and Addressing Modes
03/21/2025 at 13:01 • 0 comments
I think my first tasks are to define my instruction set and addressing modes, i.e., the "Instruction Architecture".
I’m going to steer clear of a full CISC instruction set. Forget polynomial evaluation, floating point, and single-instruction block transfers or string lookups.
After reviewing manuals for the Motorola 6800, Motorola 68020, Zilog Z-80, Digital PDP-11, and MIPS RISC designs, I’ve settled on these instructions as sufficient to do anything I would need:
- LD (load)
- ADD, ADC - add with and without carry
- SUB, SBC - subtract with and without carry
- CMP (compare)
- AND
- OR
- NOT
- XOR
- NEG (negate)
- INC
- DEC
- Rotate (right/left with/without carry)
- RR, RRC
- RL, RLC
- Shift (right/left arithmetic/logical)
- SRA (arithmetic new MSB = old MSB), SRL (logical new MSB = 0)
- SL
- SLB, SRB (shift left/right 1 byte)
- BRA (relative jump on a variety of conditions, 1 word instruction)
- JMP (absolute jump on variety of conditions)
- JSR (jump to subroutine)
- IM (set interrupt mask)
- SETC (set CCR carry bit)
- CLRC (clear CCR carry bit)
- RTS (return from subroutine)
- SWI (software interrupt)
- RTI (return from interrupt)
- NOP (no operation)
I also feel my CPU needs to support these addressing modes:
- Implicit (no operand)
- Immediate (operand follows instruction in program memory)
- Register
- Indexed (address of operand is immediate address plus offset contained in a register)
- Indirect (content of register is address of operand)
- Indirect with pre-decrement of register (nice to have, but negotiable)
- Indirect with post-increment of register (nice to have, but negotiable)
- Doubly Indirect (content of register is the address of the address of the operand)
From my research, there are some addressing modes the M68020 has (and I think the VAX too) that I thought I might like, such as "Address Register Indirect with Index" where the address of an operand is a register, plus a constant times a second register, plus a fixed displacement. This is really convenient for accessing arrays of objects, where the base register is the start of the array, the “constant” is the size of the object, the register that multiplies the constant is the index in the array, and the final displacement is the offset within the object of the member value you want to access. I’ll have to live without it.
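For reference, the address calculation described above boils down to one line. This is just an illustration of the arithmetic with made-up numbers and a hypothetical function name, not something my CPU will implement.

```python
def effective_address(base: int, index: int, scale: int, displacement: int) -> int:
    """base register + (index register * constant) + fixed displacement."""
    return base + index * scale + displacement

# Example values (made up): an array of 8-byte objects starting at 0x2000,
# reading the member at offset 4 inside element 5.
address = effective_address(base=0x2000, index=5, scale=8, displacement=4)
assert address == 0x202C
```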
I was unsure initially about how many registers I wanted to support. My experience with the M6800 left me feeling that two registers was not enough. The Z-80’s accumulator plus six 8-bit or three 16-bit registers seemed to be the bare minimum. I left the question open, but came to a conclusion from another angle.
I didn’t want to deal with a combination of single-word and multi-word instructions, so I preferred that the instruction word should be able to contain the entire instruction, addressing mode, and source and destination register information. For immediate mode addressing, I was going to have to live with fetching the operand in the word after the instruction. I may be able to squeeze the (8-bit) relative offset of a branch instruction into the 16-bit instruction word if I’m creative. I’ll leave that for later.
My first thought was to have a certain bit-field in the instruction word for the instruction itself, then other bit fields for the addressing mode and register ID of each operand. If I allowed 3-bits for addressing mode and 3-bits for register ID for each of two operands, that would occupy 2 x (3 + 3) = 12 bits of my 16-bit word. That left just 4 bits to encode my 22-plus instructions.
At first, I saw my 8 registers melting away to two, but then I had an insight (OK, "insight" might just have been remembering it from someone else's architecture). Only a few instructions require two operands: LD, ADD, ADC, SUB, SBC, CMP, AND, OR, XOR. That’s 9. I can allow 4 bits to encode those, and if all 4 bits are 1s, say, that could indicate that the instruction was a one-operand instruction, and that the next 6 bits of the instruction word, no longer needed to encode the second operand, can be used to represent the instruction. Six bits is more than enough to encode all the one-operand instructions. And I can use the same trick so that if the middle 6 bits are all 1s, the instruction will be a zero-operand instruction (i.e., implicit) and the instruction can be encoded in the bottom 6 bits.
It looks like this:
| Instructions | OpCode Bits 15-12 | OpCode Bits 11-6 | OpCode Bits 5-0 |
| --- | --- | --- | --- |
| LD, ADD, SUB, CMP, AND, OR, XOR | 0000 - 1110 | Operand 2 Mode + Reg | Operand 1 Mode + Reg |
| NOT, NEG, INC, DEC, ROT*, SHIFT*, BRA | 1111 | 00 0000 - 00 1111 | Operand 1 Mode + Reg |
| IM | 1111 | 01 0000 | mmmmmm |
| SWI | 1111 | 10 0000 | mmmmmm |
| BRA | 1111 | 1 ddddd | ddd ccc |
| SETC, CLRC, RTS, RTI, NOP | 1111 | 111 111 | Operand 1 Mode + Reg |
| JMP | 1111 | 111 111 | 100 ccc |
| JSR | 1111 | 111 111 | 101 ccc |

mmm mmm = interrupt mask
dddd dddd = 8 bit displacement
ccc = branching condition code
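The basic escape trick (all 1s in one field means "look in the next field") can be sketched as below. The function name is mine, and this deliberately ignores the IM, SWI, and BRA carve-outs in the table; it only shows the two-level escape.

```python
def operand_count(word: int) -> int:
    """Operand count implied by the escape scheme for a 16-bit instruction word."""
    if (word >> 12) & 0xF != 0b1111:
        return 2                      # opcode in bits 15-12, two operand fields follow
    if (word >> 6) & 0x3F != 0b111111:
        return 1                      # opcode in bits 11-6, one operand field follows
    return 0                          # opcode in bits 5-0, no operand fields
```

For example, `operand_count(0xE123)` gives 2, `operand_count(0xF003)` gives 1, and `operand_count(0xFFC5)` gives 0.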
IM and SWI permit setting the interrupt mask value in one instruction word by reserving a specific combination of opcode bits to contain the new mask value. I may enforce a privilege system by checking the top bits of the PC and restricting access to lower mask values (a low mask value restricts interrupts to the highest priority). For example, the PC must be in the top 6% of memory (PC = 1111 XXX) to access mask values below 001000, and in the next 6% (PC = 1110 XXX) to access the next group of mask values (below 010000). All other PC values could access higher mask values. Attempts to write a non-permitted mask value would throw a mask permission fault interrupt.
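If I do go ahead with the privilege idea, the check might look something like this. It is only a sketch under several assumptions that are not settled above: a 16-bit PC, regions identified by the top hex digit, and the topmost region also being allowed to write the second group of mask values. The function name is hypothetical.

```python
def mask_write_allowed(pc: int, new_mask: int) -> bool:
    """Decide whether code running at `pc` may set the 6-bit interrupt mask to `new_mask`."""
    top = (pc >> 12) & 0xF            # top hex digit of a 16-bit PC
    if new_mask < 0b001000:
        return top == 0xF             # most privileged group: PC = 1111 XXX only
    if new_mask < 0b010000:
        return top >= 0xE             # assumption: the 1111 region keeps this right too
    return True                       # higher mask values are open to all code
```

A `False` result is where the hardware would raise the mask permission fault interrupt.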
BRA (conditional branch) deserves a little explanation. My idea is that if I reserve a particular combination of OpCode bits 11-6 (i.e., starting with 1), then the ccc bits can encode the condition I want to test and the dddd dddd bits (8 bits) can encode a -128 to +127 offset. That keeps my branch instruction to one word.
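To show that the displacement and condition really do fit in one word, here is a sketch that unpacks them, assuming the layout 1111 1ddd dddd dccc with the displacement's sign bit at bit 10. The function name is mine, and what the offset is measured from (and whether it counts words or bytes) is left open.

```python
def decode_bra(word: int):
    """Split a 16-bit BRA word (1111 1ddd dddd dccc) into (signed displacement, ccc)."""
    if (word >> 11) != 0b11111:
        raise ValueError("not a BRA encoding")
    raw = (word >> 3) & 0xFF
    displacement = raw - 0x100 if raw & 0x80 else raw   # sign-extend the 8-bit field
    return displacement, word & 0b111
```

With this layout, `decode_bra(0xFBFB)` yields `(127, 3)` and `decode_bra(0xFC01)` yields `(-128, 1)`.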
I should make sure my "ccc" condition field is enough to encode what I need:
| Code | Condition | CCR Logic |
| --- | --- | --- |
| 000 | Always | n/a |
| 001 | Carry Set | C=1 |
| 010 | Carry Clear | C=0 |
| 011 | Zero | Z=1 |
| 100 | Non-Zero | Z=0 |
| 101 | Negative | N=1 |
| 110 | Overflow | V=1 |
| 111 | Half-Carry | H=1 |

At this point, I am satisfied that I can have my full instruction set and all my addressing modes and keep my 8 registers.
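The condition table translates directly into a lookup. A minimal sketch, with a hypothetical function name and the CCR flags passed in as plain booleans:

```python
def condition_met(ccc: int, c: bool, z: bool, n: bool, v: bool, h: bool) -> bool:
    """Evaluate a 3-bit branch condition code against the CCR flags."""
    table = (
        True,      # 000 Always
        c,         # 001 Carry Set
        not c,     # 010 Carry Clear
        z,         # 011 Zero
        not z,     # 100 Non-Zero
        n,         # 101 Negative
        v,         # 110 Overflow
        h,         # 111 Half-Carry
    )
    return bool(table[ccc & 0b111])
```

In hardware this is just an 8-to-1 multiplexer with ccc on the select lines and the flags (or their inverses) on the data inputs.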
I realized I could apply the same bit-saving trick to register addressing, though. What if I split up my 6-bits for addressing mode and register ID differently for different modes?
I'll say that “Register” mode is represented by an addressing mode with the lead-bit 0, and the remaining 5 bits are used to select one of 32 registers.
If the lead bit is 1, that signals that the two next bits (1mm) can encode 3 (not 4) addressing modes (Indirect, Indexed, Doubly Indirect). The remaining 3 bits select one of 8 registers (1mm rrr).
If the first 3 bits are 1, that signals that the next bit (111n) is used to select between two addressing modes (Indirect pre-decrement, Indirect post-increment) but with only 3 register options (111n rr). I have to reserve rr=11 to indicate immediate addressing mode.
Addressing mode 1111 11 indicates immediate mode, which requires no register information.
Now, I have 32 registers that can be used for data, and of those, 8 registers can be used for any of 3 addressing modes. I have a further 2 addressing modes that can be applied to 3 of the registers.
It will make the implementation logic a little more complex, but I gain a lot of registers.
I'll make the registers accessible by the Indirect, Indexed, and Doubly Indirect modes different from the registers accessible by pre-dec and post-inc. That way the programmer won't have to trade off which registers to assign to which modes, although it may encroach on the registers available for general data use.
That leaves me with:
| Mode | Assembly Format | Addr Mode Bits | Register Bits | Available Registers |
| --- | --- | --- | --- | --- |
| Register | Rn | 0 | rrrrr | R0-R31 |
| Indexed | (Rn+D) | 100 | rrr | R20-R27 |
| Indirect | (Rn) | 101 | rrr | R20-R27 |
| Doubly-Indirect | @(Rn) | 110 | rrr | R20-R27 |
| Indirect Pre-Dec | -(Rn) | 1110 | rr | R28-R30 |
| Indirect Post-Inc | (Rn)+ | 1111 | rr | R28-R30 |
| Immediate | #data | 111111 | n/a | n/a |

Registers 20-27 will function as traditional index registers.
Registers 28-30 are equipped to serve as stack pointers.
Register 30, which is one of the three that support pre-decrement and post-increment, can serve as my system stack pointer. It doesn’t need any special functionality, but I’ll notionally reserve it "by convention."
Register 31 isn’t available to the pre-dec and post-inc modes (because I need to reserve yy=11 to signify immediate mode), so I’ll reserve it as the program counter. Having the PC as a general register gives me some flexibility since now code can be self-aware of the PC. Jumps are just an alias for LD to R31. Relative jumps can be implemented by adding an offset to R31. The one thing I won't be able to do is to read a data byte at a relative offset to the PC. (Someone tell me what that's good for if you know. Some processors permit it.)
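Putting the variable-width mode prefixes together, a decoder for the 6-bit operand field could be sketched as follows. The function and mode names are mine, and I am assuming the unused pattern 1110 11 is simply rejected, since only 1111 11 is assigned (to immediate mode).

```python
def decode_operand_field(field: int):
    """Decode a 6-bit operand field into (mode, register number or None)."""
    field &= 0x3F
    if field >> 5 == 0:                        # 0 rrrrr
        return "register", field & 0x1F
    if field == 0b111111:                      # 1111 11
        return "immediate", None
    if field >> 3 == 0b111:                    # 111n rr
        rr = field & 0b11
        if rr == 0b11:
            raise ValueError("rr=11 is reserved")   # assumption: 1110 11 is left undefined
        mode = "indirect_post_inc" if field & 0b100 else "indirect_pre_dec"
        return mode, 28 + rr                   # R28-R30
    mode = ("indexed", "indirect", "doubly_indirect")[(field >> 3) & 0b11]
    return mode, 20 + (field & 0b111)          # R20-R27
```

The order of the tests matters: the longest prefixes (immediate, then 111n) have to be checked before the generic 1mm case.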
[Later...] I can see one potential challenge with this system. It would be easiest to route the register select bits from the addressing mode bit field to specific bits of the register address. That means the register ranges for different addressing modes may not be adjacent as they are in the table above. Mapping a 3-bit field so it addresses registers R20-R27 probably won't be convenient. If I simplify the logic, it may mean that there are data-only registers between the ranges of registers used for indexed-type access.
To illustrate, when I use Register Mode, I can just run my 5-bit field to the register address lines.
For Pre-dec and Post-Inc modes, I would prefer that the register address lines are: 111yy, with yy=11 not permitted. This addresses registers R28 - R30.
For Indirect, Indexed, and Doubly Indirect, most convenient would be to set the register address lines as 10yyy, which addresses registers R16-R23. That leaves an "island" of registers, R24-R27 (11000 - 11011) that are data-only registers. This would be an irritant to the programmer (me).
If I want Indirect, Indexed, and Doubly Indirect modes to run from R20 - R27 instead, then my register address lines in this mode need to run from 10100 to 11011, which is not possible to achieve just by routing address bits from the op code operand field to the register address lines. I could do it with some gates to a) invert bit 2 of the op code register select field (y2) on its way to bit 2 of the register address, b) make bit 3 of the register address the inverse of bit 2 of the register address when any of these three modes are selected, and c) run the remaining register select bits (y1, y0) straight to bits 1 and 0.
That would look like this (y2-y0 are operand mode bits, ra4-ra0 are register address lines):
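| Y2 | Y1 | Y0 | RA4 | RA3 | RA2 | RA1 | RA0 | Reg |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 0 | 0 | 1 | 0 | 1 | 0 | 0 | R20 |
| 0 | 0 | 1 | 1 | 0 | 1 | 0 | 1 | R21 |
| 0 | 1 | 0 | 1 | 0 | 1 | 1 | 0 | R22 |
| 0 | 1 | 1 | 1 | 0 | 1 | 1 | 1 | R23 |
| 1 | 0 | 0 | 1 | 1 | 0 | 0 | 0 | R24 |
| 1 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | R25 |
| 1 | 1 | 0 | 1 | 1 | 0 | 1 | 0 | R26 |
| 1 | 1 | 1 | 1 | 1 | 0 | 1 | 1 | R27 |

OK, that's not as painful as I expected it to be. It does add a gate delay during register select. If that turns out to be a problem, I may have to remove the feature and revert to the system with the "island" of data-only registers.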
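The truth table reduces to a single inverter: RA4 is tied high, RA3 follows Y2, RA2 is Y2 inverted, and RA1/RA0 pass straight through. A quick sketch (function name mine) confirms it produces R20-R27 in order:

```python
def register_address(y: int) -> int:
    """Map the 3-bit register select field (Y2-Y0) to register address lines RA4-RA0."""
    y2, y1, y0 = (y >> 2) & 1, (y >> 1) & 1, y & 1
    ra4 = 1           # tied high for these three addressing modes
    ra3 = y2          # Y2 passes straight to RA3
    ra2 = y2 ^ 1      # the one inverter: RA2 = NOT Y2
    ra1, ra0 = y1, y0
    return (ra4 << 4) | (ra3 << 3) | (ra2 << 2) | (ra1 << 1) | ra0

assert [register_address(y) for y in range(8)] == list(range(20, 28))
```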
To sum it all up (OF = Operand Field):
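| Mode | OF Bit 5 | OF Bit 4 | OF Bit 3 | OF Bit 2 | OF Bit 1 | OF Bit 0 |
| --- | --- | --- | --- | --- | --- | --- |
| Register | 0 | RA4 | RA3 | RA2 | RA1 | RA0 |
| Indexed | 1 | 0 | 0 | RA3, RA2* | RA1 | RA0 |
| Indirect | 1 | 0 | 1 | RA3, RA2* | RA1 | RA0 |
| Doubly Indirect | 1 | 1 | 0 | RA3, RA2* | RA1 | RA0 |
| Indir Pre-Dec | 1 | 1 | 1 | 0 | RA1 | RA0 |
| Indir Post-Inc | 1 | 1 | 1 | 1 | RA1 | RA0 |
| Immed | 1 | 1 | 1 | 1 | 1 | 1 |
-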
Inception
03/19/2025 at 19:26 • 0 comments
When I was 14, I built an Altair 680b computer from a kit. This was in the earliest days of personal computing. Earlier than that, even. There were instructions. There was a technical manual. There was a CPU manual. But it came with no software, no operating system, no applications. There was not even a hard disk. For the first six months I had the machine, there was no way to save a program; everything was lost the instant I turned the computer off.
Generic Altair 680b (https://www.retrotechnology.com/restore/altair680.html)
Later, I bought a 16K memory card (for $600 IIRC). It came with a text editor, an assembler program, and a debugger program. But, they were on paper tape, and I did not have a paper tape reader. I spent a lot of time looking at the paper tapes I had, longing for a tape reader or an assembler program I could use.
Eventually, I purchased a cassette tape interface, so I could store programs. As it happened, the tape interface came with a 4K Basic interpreter, on cassette tape. So, I finally had a programming language I could use, even if it did take 10 minutes to boot up. For a while, I forgot about an assembler. I wrote programs in Basic.
Years later, I decided to build a Z-80 based S-100 computer. I bought a CPU card, a 64K RAM card, a disk controller and two eight-inch floppy disk drives. But I never got the pieces to work together, and never wrote any code on that machine. By that time I was in college, writing code in C on a VAX-11/750. I was in heaven. I put the Z-80 parts in a box in my parents’ basement and forgot about them.
Recently, forty years later, I found the box of Z-80 parts in my own attic. I thought about putting them back together. And I realized: I had no operating system, no interpreter or compiler, no software. What would I do with my Z-80?
So, as I began assembling the parts to reconstitute the computer, I decided to write my own Z-80 assembler. From scratch. From first principles. Not copying anyone else’s design or code.
I studied the Z-80 CPU databook. I thought about what I had learned while earning a bachelor’s and then a master’s degree in computer science. (If I have time, I'll try to weave my development notes into a separate blog.) After I finished the assembler, and tested it on a Z-80 emulator (I haven’t finished the Z-80 hardware yet), I happened to re-read a book, The Soul of a New Machine.
The Soul of a New Machine, by Tracy Kidder, chronicles a small team of engineers at Data General, a mini-computer manufacturer in Massachusetts, working in the late ‘70s to design the next generation 32-bit mini-computer. I’d read the book in college, probably in 1983 or 1984. The first time, it was inspiring, but like a dream. The second time, I wondered “Could I have worked on that team? Could I have done what they did?”
Only one way to find out...
Here's what I'm hoping to achieve:
- Design and build a micro-coded 16-bit CPU that could easily be extended to 32-bits just by widening the data paths
- Include enough support for full-fledged operation of a UNIX-like operating system, meaning
- Virtual memory
- Supervisor / privileged / user modes
- Multi-tasking
- Efficient context / process switching
- Priority interrupts
- DMA
- Hard disk
- Include a few fun features. At the moment, that means:
- Memory cache to keep routine data / instruction fetches off the system bus so the DMA system can do things like file transfer and page swapping efficiently
- As much as possible, stick to early-'80s technology like 74LS series TTL
- No FPGAs
- I do get to use an IDE/ATA disk (ok, that's a cheat)