Inspired by The Soul of a New Machine, I'm designing and building a 16-bit CPU with virtual memory, a CPU cache, and rich addressing modes.
April 18, 2025
Before I dig in:
As I was doing the circuit design, I made some observations that call for more revision of past decisions. Sorry, that’s just the way design goes. You lay something out, later reality points out an impracticality, you revise.
Also, I kind of realized that the Instruction Controller was over-reaching into the domain of the MicroCode lookup and MicroCode Sequencer. I’m going to reduce the boundary of the Instruction Controller so that it sends the MicroCode Start address and MicroController Start signal to a separate block. Here’s the reduced block diagram:
As I was designing the Instruction Group Decoder circuit, I realized there are four cases in which the low 4 bits of the Instruction Decoder Address (IDA) have to be driven by different bit-fields of the IW:
This clearly calls for a 4-to-1 multiplexer to map different parts of the IW to the low bits of the IDA. Unfortunately, I had assigned the Instruction Group IDs for these four cases as 001, 010, 011, and 100.
I could build some logic to map these to 00, 01, 10, and 11. Or, I can just redefine IDA6-IDA4 so that just two of the three bits are needed to drive a multiplexer directly. To do that, I need to re-map the Instruction Group for “Fetch Operand” to a different Instruction Group ID. So, I have to change some of the tables in previous posts. Sorry if this generates confusion.
Here’s the result:
Lookup Type | IDA6-IDA4 | IDA3-IDA0 |
Special 0-Operand | 000 | (Logic Mapping) |
2-Operand Instruction | 001 | IR15, IR14, IR13, IR12 |
1-Operand Instruction | 010 | IR9, IR8, IR7, IR6 |
0-Operand Instruction | 011 | IR3, IR2, IR1, IR0 |
Operand Fetch | 100 | 0, MLB2, MLB1, MLB0 |
Result Save | 101 | 0, MLB2, MLB1, MLB0 |
Unused | 110 | n/a |
Reserved | 111 | (Logic Mapping) |
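As a rough behavioral model of the table above, here is a minimal Python sketch of how the 7-bit IDA might be assembled. The function names `ida_low_bits` and `build_ida` are made up for illustration, IR is assumed to hold the same bits as the IW, and `mlb` is simply a 3-bit value passed in by the caller. The two "Logic Mapping" rows and the unused group are not modeled.

```python
def ida_low_bits(group, ir, mlb=0):
    """Return IDA3-IDA0 for the table rows that are driven directly by a bit-field.

    group: 3-bit Instruction Group ID (IDA6-IDA4)
    ir:    16-bit instruction word (assumed identical to the IW)
    mlb:   3-bit value MLB2-MLB0 (hypothetical parameter for this sketch)
    """
    if group == 0b001:              # 2-operand: IR15..IR12
        return (ir >> 12) & 0xF
    if group == 0b010:              # 1-operand: IR9..IR6
        return (ir >> 6) & 0xF
    if group == 0b011:              # 0-operand: IR3..IR0
        return ir & 0xF
    if group in (0b100, 0b101):     # operand fetch / result save: 0, MLB2..MLB0
        return mlb & 0x7
    raise ValueError("group uses logic mapping or is unused; not modeled here")


def build_ida(group, ir, mlb=0):
    """Concatenate the group ID (high 3 bits) with the selected low 4 bits."""
    return ((group & 0x7) << 4) | ida_low_bits(group, ir, mlb)
```

For instance, `build_ida(0b001, 0x3ABC)` selects the top nibble of the word (3) and returns `0x13`.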
Looking more closely at how the IW influences ID6-ID4:
Instruction Word | Instruction | ID6-ID4 | Instruction / Instruction Group |
1111 0 dddddddd ccc | BRA | 000 | Special |
1111 100000 mmmmmm to 1111 100000 mmmmmm | IM | 000 | Special |
1111 100001 mmmmmm to 1111 100001 mmmmmm | SWI | 000 | Special |
1111 100010 000ccc | JMP | 000 | Special |
1111 100011 000ccc | JSR | 000 | Special |
0000 aaaaaa bbbbbb to 1110 aaaaaa bbbbbb | LD, ADC, ADD, AND, CMP, OR, SUB, SBC, XOR | 001 | 2-operand instructions |
1111 110000 bbbbbb to 1111 110111 bbbbbb | NOT, NEG, INC, DEC, ROT*, SHIFT* | 010 | 1-operand instructions |
1111 111111 00 0000 to 1111 111111 00 1111 | RTS, SWI, RTI, NOP, STC, CLC, etc. | 011 | 0-operand instructions |
Using a Karnaugh map, I derive these equations to produce ID6-ID4:
ID6 = 0
ID5 = AND(IW15, IW14, IW13, IW12, IW11, IW10, NOT IW9) OR
AND(IW15, IW14, IW13, IW12, IW11, IW10, IW9, IW8, IW7, IW6, NOT IW5)
ID4 = NOT(AND(IW15, IW14, IW13, IW12)) OR
AND(IW15, IW14, IW13, IW12, IW11, IW10, IW9, IW8, IW7, IW6, NOT IW5)
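One way to sanity-check these equations is to model them in software and compare the result against the instruction table. The sketch below is only that, a quick check in Python rather than part of the circuit. The helper names `bit` and `group_bits` are invented for this example, and the third equation is read as defining ID4.

```python
def bit(word, n):
    """Return bit n of a 16-bit word as 0 or 1."""
    return (word >> n) & 1


def group_bits(iw):
    """Return ID6-ID4 as a 3-bit integer, following the equations above."""
    top_all_ones = ((iw >> 12) & 0xF) == 0xF           # AND(IW15..IW12)

    # 1111 110xxx bbbbbb: IW11 and IW10 set, IW9 clear
    one_operand = bool(top_all_ones and bit(iw, 11) and bit(iw, 10)
                       and not bit(iw, 9))

    # 1111 111111 0x xxxx: IW11..IW6 all set, IW5 clear
    zero_operand = bool(top_all_ones and ((iw >> 6) & 0x3F) == 0x3F
                        and not bit(iw, 5))

    id6 = 0
    id5 = int(one_operand or zero_operand)
    id4 = int((not top_all_ones) or zero_operand)
    return (id6 << 2) | (id5 << 1) | id4
```

Feeding in one representative word per table row (for example `0xF000` for BRA, `0x1234` for a 2-operand instruction, `0xFC00` for a 1-operand instruction, and `0xFFC0` for a 0-operand instruction) should give 000, 001, 010, and 011 respectively. Encodings the table does not define are treated as don't-cares.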
If I reassign the mapping for OP1 and OP0 as follows, I can even recycle the ID5 and ID4 bits to drive OP1 and OP0, saving some logic:
Instruction Group | OP1, OP0 |
Special | 00 |
2-operand | 01 |
1-operand | 10 |
0-operand | 11 |
OP1 = ID5
OP0 = ID4
Finally, I realized I made a minor mistake when mapping the ID3-ID0 bits for Special instructions: I left the bits undefined for the BRA instruction. Here’s the fixed table.
Instruction Word | Instruction | ID3-ID0 |
1111 0 dddddddd ccc | BRA | 0100 |
1111 100000 mmmmmm to 1111 100000 mmmmmm | IM | 0000 |
1111 100001 mmmmmm to 1111 100001 mmmmmm | SWI | 0001 |
1111 100010 000ccc | JMP | 0010 |
1111 100011 000ccc | JSR | 0011 |
Here’s the logic for Special Instructions when ID6-4 = 000:
ID3 = 0
ID2 = AND(IW15, IW14, IW13, IW12, NOT IW11)
ID1 = AND(IW15, IW14, IW13, IW12) AND AND(IW11, NOT IW10, NOT IW9, NOT IW8) AND IW7
ID0 = AND(IW15, IW14, IW13, IW12) AND AND(IW11, NOT IW10, NOT IW9, NOT IW8) AND IW6
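The same kind of software check works for the Special-instruction logic. This is a minimal Python sketch of the four equations, with `special_low_bits` being a made-up name. It assumes the word has already been classified as Special (ID6-ID4 = 000).

```python
def special_low_bits(iw):
    """Return ID3-ID0 for a Special instruction word, per the equations above."""
    def b(n):
        return (iw >> n) & 1

    top_all_ones = ((iw >> 12) & 0xF) == 0xF            # AND(IW15..IW12)
    # Shared term: IW11 set, IW10..IW8 clear (the 1111 1000xx encodings)
    prefix_1000 = top_all_ones and b(11) == 1 and (b(10), b(9), b(8)) == (0, 0, 0)

    id3 = 0
    id2 = int(top_all_ones and b(11) == 0)              # BRA
    id1 = int(prefix_1000 and b(7) == 1)                # JMP, JSR
    id0 = int(prefix_1000 and b(6) == 1)                # SWI, JSR
    return (id3 << 3) | (id2 << 2) | (id1 << 1) | id0
```

Checking the five rows of the table with zeroed operand fields (`0xF000`, `0xF800`, `0xF840`, `0xF880`, `0xF8C0`) should reproduce 0100, 0000, 0001, 0010, and 0011.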
I used Logisim-evolution to design and simulate the circuitry, not based on extensive research but because it was free and came up near the top of my Google search.
To keep the circuit diagram from becoming too complex, I used Logisim’s “Subcircuits” ability to design each of...
Today, I’m going to focus on the CPU Instruction Controller. The Instruction Controller will orchestrate its subsystems to fetch an instruction, get any needed operands, perform the instruction itself, potentially store a result, and queue up the next instruction.
My initial vision is a sequencer that will:
Since the MicroCode for the actual instruction may take several cycles, I thought I might try to pipeline operations a little and start the instruction fetch and Addressing Mode Decoder steps as soon as the Controller turns control over to the MicroCode Sequencer for the instruction itself. Whoever finishes first would have to wait.
Hmm. It sounds nice, but as I type this I see a potential problem. If the result of the current instruction alters one of the operands of the subsequent instruction, that would be a problem. In theory, I could check to see if the addresses are the same and then pause the pre-fetch logic if they are, or even detect if the operand >changed<. Or it could be more trouble than it’s worth. I’m voting for the latter.
Maybe the Controller could at least pre-fetch the next instruction. Ah, but what if there’s a branch? I could pre-fetch the instruction as long as the current instruction isn’t a branch, jump, RTS, RTI, etc. OK, KISS for now.
All right, in any case, I see the Instruction Sequencer (IS) using a shift register to sequence the steps as:
The Addressing Mode Decoder (AMD) unit needs to do a few computations:
I think my first tasks are to define my instruction set and addressing modes, i.e., the "Instruction Architecture".
I’m going to steer clear of a full CISC instruction set. Forget polynomial evaluation, floating point, and single-instruction block transfers or string lookups.
After reviewing manuals for the Motorola 6800, Motorola 68020, Zilog Z-80, Digital PDP-11, and MIPS RISC designs, I’ve settled on these instructions as sufficient to do anything I would need:
I also feel my CPU needs to support these addressing modes:
From my research, there are some addressing modes the M68020 has (and I think the VAX too) that I thought I might like, such as "Address Register Indirect with Index" where the address of an operand is a register, plus a constant times a second register, plus a fixed displacement. This is really convenient for accessing arrays of objects, where the base register is the start of the array, the “constant” is the size of the object, the register that multiplies the constant is the index in the array, and the final displacement is the offset within the object of the member value you want to access. I’ll have to live without it.
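To make the arithmetic behind that addressing mode concrete, here is a small Python illustration. It is not tied to any real encoding. The function name, the array base address, the record size, and the field offset are all invented for the example.

```python
def indexed_address(base, index, scale, displacement):
    """Effective address = base register + (index register * scale) + displacement."""
    return base + index * scale + displacement


# Hypothetical layout: an array of 8-byte records starting at 0x2000,
# where the field of interest sits 4 bytes into each record.
ARRAY_BASE = 0x2000
RECORD_SIZE = 8
FIELD_OFFSET = 4

# Address of that field in record number 3
addr = indexed_address(ARRAY_BASE, 3, RECORD_SIZE, FIELD_OFFSET)
```

Here `addr` works out to `0x201C`: 0x2000 for the array, plus 3 × 8 = 24 bytes to reach the record, plus 4 bytes to reach the field.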
I was unsure initially about how many registers I wanted to support. My experience with the M6800 left me feeling that two registers was not enough. The Z-80’s accumulator plus six 8-bit or three 16-bit registers seemed to be the bare minimum. I left the question open, but came to a conclusion from another angle.
I didn’t want to deal with a combination of single-word and multi-word instructions, so I preferred that the instruction word should be able to contain the entire instruction, addressing mode, and source and destination register information. For immediate mode addressing, I was going to have to live with fetching the operand in the word after the instruction. I may be able to squeeze the (8-bit) relative offset of a branch instruction into the 16-bit instruction word if I’m creative. I’ll leave that for later.
My first thought was to have a certain bit-field in the instruction word for the instruction itself, then other bit fields for the addressing mode and register ID of each operand. If I allowed 3-bits for addressing mode and 3-bits for register ID for each of two operands, that would occupy 2 x (3 + 3) = 12 bits of my 16-bit word. That left just 4 bits to encode my 22-plus instructions.
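The bit budget is easy to see with a quick pack/unpack sketch in Python. The field order here (opcode in the top 4 bits, then mode and register for each operand) is only an assumption for illustration, as are the names `pack` and `unpack`.

```python
FIELD_WIDTHS = (4, 3, 3, 3, 3)   # opcode, mode A, reg A, mode B, reg B
assert sum(FIELD_WIDTHS) == 16   # 4 + 2 x (3 + 3) fills the word exactly


def pack(opcode, mode_a, reg_a, mode_b, reg_b):
    """Pack the five fields into one 16-bit instruction word."""
    fields = (opcode, mode_a, reg_a, mode_b, reg_b)
    for value, width in zip(fields, FIELD_WIDTHS):
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
    return ((opcode << 12) | (mode_a << 9) | (reg_a << 6)
            | (mode_b << 3) | reg_b)


def unpack(word):
    """Split a 16-bit instruction word back into its five fields."""
    return ((word >> 12) & 0xF, (word >> 9) & 0x7, (word >> 6) & 0x7,
            (word >> 3) & 0x7, word & 0x7)
```

With only 4 bits for the opcode, `pack` rejects any opcode above 15, which is exactly the squeeze described above: 16 slots for 22-plus instructions.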
At first, I saw my 8 registers melting away to two, but then I had an insight (OK, "insight" might just have been remembering it from someone else's architecture). Only a few instructions require two operands: LD, ADD, ADC, SUB, SBC, CMP, AND, OR, XOR. That’s 9. I can allow 4 bits to encode those, and if all 4 bits are 1s, say, that could indicate that the instruction was a one-operand instruction, and that the next 6-bits of the instruction word, no longer needed to encode the...
When I was 14, I built an Altair 680b computer from a kit. This was in the earliest days of personal computing. Earlier than that, even. There were instructions. There was a technical manual. There was a CPU manual. But it came with no software, no operating system, no applications. There was not even a hard disk. For the first six months I had the machine, there was no way to save a program; everything was lost the instant I turned the computer off.
Generic Altair 680b (https://www.retrotechnology.com/restore/altair680.html)
Later, I bought a 16K memory card (for $600 IIRC). It came with a text editor, an assembler program, and a debugger program. But, they were on paper tape, and I did not have a paper tape reader. I spent a lot of time looking at the paper tapes I had, longing for a tape reader or an assembler program I could use.
Eventually, I purchased a cassette tape interface, so I could store programs. As it happened, the tape interface came with a 4K Basic interpreter, on cassette tape. So, I finally had a programming language I could use, even if it did take 10 minutes to boot up. For a while, I forgot about an assembler. I wrote programs in Basic.
Years later, I decided to build a Z-80 based S-100 computer. I bought a CPU card, a 64K RAM card, a disk controller and two eight-inch floppy disk drives. But I never got the pieces to work together, and never wrote any code on that machine. By that time I was in college, writing code in C on a VAX-11/750. I was in heaven. I put the Z-80 parts in a box in my parents' basement and forgot about them.
Recently, forty years later, I found the box of Z-80 parts in my own attic. I thought about putting them back together. And I realized: I had no operating system, no interpreter or compiler, no software. What would I do with my Z-80?
So, as I began assembling the parts to reconstitute the computer, I decided to write my own Z-80 assembler. From scratch. From first principles. Not copying anyone else’s design or code.
I studied the Z-80 CPU databook. I thought about what I had learned while earning a bachelor's and then a master’s degree in computer science. (If I have time, I'll try to weave my development notes into a separate blog.) After I finished the assembler and tested it on a Z-80 emulator (I haven’t finished the Z-80 hardware yet), I happened to re-read a book, The Soul of a New Machine.
The Soul of a New Machine, by Tracy Kidder, chronicles a small team of engineers at Data General, a mini-computer manufacturer in Massachusetts, working in the late ‘70s to design the next generation 32-bit mini-computer. I’d read the book in college, probably in 1983 or 1984. The first time, it was inspiring, but like a dream. The second time, I wondered “Could I have worked on that team? Could I have done what they did?”
Only one way to find out...
Here's what I'm hoping to achieve:
Peabody1929 Thanks for the advice and for following!
Microprogrammed is definitely the path I'm taking. I'm currently working through the instruction sequencer and I should be posting that in a day or two.
It's funny you mention writing a microcode emulator. This was one of the big debates they had in the Data General team during Eclipse development: whether to invest the time and effort. But, I agree.
I have also been thinking about using ALS both for power/fanout as well as speed.
Interesting project! I suggest a microprogrammed machine. It is far simpler to change microcode than hardware. First, design the micromachine. Second, write the microcode to implement the instruction set. Third, write a simulator for the micromachine to run the microcode. This will simulate the macromachine as well. Fourth, write macrocode for an application. An assembler would be good. Then run the application on the simulator. Gather instruction counts during execution. Usually, the Load and Store instructions are used the most. Calculating how long these instructions take to execute will tell you the performance of the machine.
Last, don't use 74LS logic. I suggest 74ALS or one of the CMOS logic families.
Good Luck!
Hello Anthony, you might get inspiration by looking at my project #Isetta TTL computer !