• Designing and Implementing the MicroCode Sequencer

    Anthony Faulise, 05/06/2025 at 21:21

    It’s time to translate my intent for the MicroCode Sequencer into design decisions.

    Narratively, when the CPU executes an instruction:

    1. The Instruction Controller presents the MC Program Counter where execution should start (MC_PC_START) and sets the MC_START bit
    2. The MC Sequencer outputs the control signals from the MC ROM at the address specified by the MC_PC to the ALU and other parts of the CPU
    3. On every clock cycle while an instruction is executing*, the MC Sequencer either increments the MC_PC or, in the event of a Jump or Branch, loads an entirely new value; this step repeats until the instruction ends
    4. At the end of an instruction, the MC ROM sets a signal (MC_DONE) that prevents the MC_PC from loading a new value and instead stops execution and signals the Instruction Controller that the MicroCode Sequencer is done.

    *Note (from step 3 above): whenever the MicroCode Sequencer requests a memory read or write from the CPU bus controller, it should wait as many clock cycles as it takes for the memory operation to complete. So, in a memory-wait condition, the MC_PC won’t actually advance on every clock cycle.
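
    The four steps and the memory-wait note above can be sketched as a small cycle-level model. This is only an illustration under assumptions: the microcode word fields (`done`, `mem_request`), the ROM contents, the fixed `bus_latency`, and the name `run_instruction` are all hypothetical, and jumps and branches are left out.

```python
from dataclasses import dataclass

@dataclass
class MicroWord:
    done: bool = False         # MC_DONE: last micro-instruction of the routine
    mem_request: bool = False  # word asks the bus controller for a read/write

def run_instruction(rom, mc_pc_start, bus_latency=2):
    """Return the MC_PC value seen on each clock cycle of one instruction."""
    mc_pc = mc_pc_start              # step 1: MC_START loads MC_PC_START
    trace = []
    while True:
        word = rom[mc_pc]            # step 2: ROM word drives the control lines
        trace.append(mc_pc)
        if word.mem_request:
            # Memory wait: MC_PC holds until BUS_CONTROLLER_DONE arrives
            # (modelled here as a fixed number of extra cycles).
            trace.extend([mc_pc] * bus_latency)
        if word.done:                # step 4: stop and signal the controller
            return trace
        mc_pc += 1                   # step 3: advance (jumps/branches omitted)

# Hypothetical three-word routine; the middle word performs a memory access.
rom = {
    0x10: MicroWord(),
    0x11: MicroWord(mem_request=True),
    0x12: MicroWord(done=True),
}
trace = run_instruction(rom, 0x10)
print([hex(pc) for pc in trace])
```

    In this model the address 0x11 is held for three cycles (one to issue the request, two waiting), which is the "won't advance on every clock cycle" behaviour described in the note.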

    The inputs to the MicroCode Sequencer will be:

    • MC_START - Instruction Controller calls the MicroSequencer to start
    • MC_PC_START - Starting MicroSequencer Program Counter, supplied by Instruction Controller
    • BUS_CONTROLLER_DONE - Memory has been read and data is available, or data has been written
    • COND - Branch condition bits extracted from the Instruction Word by the Instruction Controller
    • BRA_OFF - Branch condition relative offset extracted from the Instruction Word by the Instruction Controller

    The outputs of the MicroCode Sequencer are:

    • To the ALU
      • A bunch of signals that I now realize I can’t define until I design the ALU
    • Other
      • BUS_CONTROLLER_READ_REQUEST - to read instructions and data
      • BUS_CONTROLLER_WRITE_REQUEST - to write data to RAM
      • MC_DONE - Signal to Instruction Controller that the MicroSequencer is done executing the current instruction
      • MC_PC_SEL - Select which data to load into the MC_ProgramCounter on the next clock cycle:
        • 00 = 0, used for RESET
        • 01 = BRANCH_REGISTER
        • 10 = INSTRUCTION CONTROLLER START ADDR
        • 11 = NEXT = MC_PC+1

    Let’s address the implementation itself.

    We’ll use a simple S-R flip-flop to track whether the MC_Seq is busy or idle.

    When the MC_Seq is done with an instruction, we’ll use a second S-R flip-flop to generate a one-clock-cycle DONE pulse to the Instruction Controller.

    We’ll use a register to capture the MC_START_ADDRESS on every clock cycle when the MC_Seq is idle, so that when we receive MC_START, the start address is already captured.

    We’ll need some logic to examine MC_Seq control signals like ready/busy, branch, jump, done, and RESET to determine which value to load into the MC_PC. Conceptually:

    • If RESET, load the value 0x0000
    • If the MCSeq is not busy, load from MC_PC_START
    • If the MCSeq is busy and done, don’t care
    • If the MCSeq is busy, not doing a branch or jump, and not done, load from MC_PC_INC
    • If the MCSeq is busy and doing a jump, load from MC_ROM_LOW_BITS
    • If the MCSeq is busy and doing a branch, and the CCR_Condition matches, load from MC_ROM_LOW_BITS
    • If the MCSeq is busy, and doing a branch, and the CCR_Condition does not match, load from MC_PC_INC
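
    The rules above can be written as a short priority-ordered function. This is a sketch, not the circuit: the function name `mc_pc_select` and its boolean arguments are hypothetical, and the 2-bit codes are the MC_PC_SEL values listed earlier (00 = reset, 01 = branch/jump target, 10 = start address, 11 = MC_PC + 1).

```python
def mc_pc_select(reset, busy, done, jump, branch, ccr_match):
    """Return the 2-bit MC_PC source select, or None for a don't-care case."""
    if reset:
        return 0b00                      # load 0x0000
    if not busy:
        return 0b10                      # load MC_PC_START
    if done:
        return None                      # busy and done: don't care
    if jump or (branch and ccr_match):
        return 0b01                      # load MC_ROM_LOW_BITS
    return 0b11                          # load MC_PC_INC (MC_PC + 1)

# A taken branch and a not-taken branch while busy:
print(mc_pc_select(False, True, False, False, True, True))   # 1  (binary 01)
print(mc_pc_select(False, True, False, False, True, False))  # 3  (binary 11)
```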

    I’ll use a 4-1 multiplexer for the routing of the MC_PC address.

    Select mapping for the multiplexer is:

    Select   Input
    00       0x00
    01       MC_ROM output low-bits
    10       MC_START_ADDR
    11       MC_PC + 1

    To implement the logic of selecting which source to use for the next MC_PC, I’ll use this Karnaugh map:

    Rows: Reset, Busy, Done. Columns: Jump, Branch, CCR Match.

    R,B,D \ J,Br,M   000   001   011   010   110   111   101   100
    000              10    10    10    10    10    10    10    10
    001              d/c   d/c   d/c   d/c   d/c   d/c   d/c   d/c
    011              d/c   d/c   d/c   d/c   d/c   d/c   d/c   d/c
    010              11    11    01    11    n/a   n/a   01    01
    110              00    00    00    00    00    00    00    00
    111              00    00    00    00    00    00    00    00
    101              00    00    00    00    00    00    00    00
    100              00    00    00    00    00    00    00    00

    n/a - not applicable, condition should not arise

    d/c - don’t care

    Select_Bit_1 = AND(Not_RESET, Not_Busy) OR AND(Not_RESET, Not_Jump, Not_Branch) OR AND(Not_RESET, Branch, Not_Match)

    Select_Bit_0 = AND(Not_RESET, Busy, Not_Done)
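
    As a sanity check, the two select equations can be evaluated for all 64 input combinations and compared with the Karnaugh map. The sketch below is only a software check, with hypothetical function names; `expected` re-encodes the map, returning None for the d/c and n/a cells.

```python
from itertools import product

def select_bits(reset, busy, done, jump, branch, match):
    """Evaluate Select_Bit_1 and Select_Bit_0 as written above."""
    bit1 = ((not reset and not busy)
            or (not reset and not jump and not branch)
            or (not reset and branch and not match))
    bit0 = (not reset) and busy and (not done)
    return (int(bit1) << 1) | int(bit0)

def expected(reset, busy, done, jump, branch, match):
    """Karnaugh-map entry, or None where the map says d/c or n/a."""
    if reset:
        return 0b00
    if done:
        return None          # rows 001 and 011 are don't-care
    if not busy:
        return 0b10
    if jump and branch:
        return None          # n/a: jump and branch together should not arise
    if jump or (branch and match):
        return 0b01
    return 0b11

mismatches = [
    bits for bits in product([False, True], repeat=6)
    if expected(*bits) is not None and select_bits(*bits) != expected(*bits)
]
print(len(mismatches))
```

    An empty `mismatches` list means the equations reproduce every defined cell of the map.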

    I’ll route the output...


  • Planning the MicroCode Sequencer

    Anthony Faulise, 05/05/2025 at 17:06

    The purpose of the MicroCode Sequencer is to fetch MicroCode instructions in order and direct them to the ALU and other CPU components, manipulating data to accomplish the intended machine instructions. The MicroCode Sequencer needs to be able to jump and branch within the MicroCode, either to effectuate branch instructions or to loop (potentially in the case of multiply or divide instructions). The MicroSequencer needs to be able to pause its operation while it waits for the Bus Controller to fetch memory contents (and maybe write to memory, though I think that could potentially happen asynchronously, with the MicroSequencer carrying on before the Bus Controller confirms a write is complete). Finally, the MicroSequencer output will control the elements of the ALU and CPU without further intermediation, so the control outputs of the MicroSequencer will be enabling and disabling buffers, adders, and multiplexers directly. As a result, we expect the MicroSequencer to have a very wide data output word: dozens of bits, potentially as many as 50.

    Narratively, I need the MicroSequencer to:

    1. Receive a starting address from the Instruction Controller
    2. Receive a request to start from the Instruction Controller
    3. Fetch the requested MicroCode Instruction
    4. Direct the MC Instruction outputs to the ALU or other elements of the CPU
    5. When necessary, evaluate a branch condition and proceed to a non-sequential next MicroCode instruction
    6. Otherwise, fetch the next sequential MicroCode Instruction, and repeat from 4
    7. Detect the last MicroCode instruction in a machine instruction and halt operation
    8. Signal the Instruction Controller that the current machine instruction has been completed

    As a reminder, the Instruction Controller will call “subroutines” within the MicroSequencer to fetch and store Machine Instruction operands according to the addressing mode specified in the operand fields within an Instruction Word. These subroutines will not be accessible directly to the Machine Instruction programmer, but only indirectly by specifying a particular addressing mode in an instruction. Thus, when the MicroController is called to perform a Machine Instruction (like ADD or SHIFT LEFT), all necessary operands will already have been latched into the ALU operand latches. Likewise, storage of the result of a Machine Instruction happens invisibly to the MicroCode of that Machine Instruction.

    For the most part, I expect the code for a Machine Instruction will be relatively simple. For example, for ADC (add with carry), in the first clock cycle, the MicroController would simultaneously output control signals to:

    • Enable gating the Condition Code Register’s Carry bit to the Adder Carry-In
    • Select ALU operand 2 to be routed to the Adder directly
    • Enable gating of the Adder’s carry-out bit to the latch for the Condition Code Register’s Carry bit
    • Select the Adder output to be routed to the ALU result latch
    • Enable the ALU result register to latch its input
    • Increment the MicroCode Program Counter

    In the second clock cycle, the MicroController would simultaneously output control signals to:

    • Latch the ALU Adder carry bit into the CCR
    • Latch the ALU result register
    • Enable output of the ALU result register to the Internal Data Bus
    • Signal the Instruction Controller that the MicroController is done
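
    To make the "wide control word" idea concrete, the two ADC cycles above can be written as two microcode words in which each control signal is one bit. This is a sketch only: the flag names and bit positions are hypothetical, since the real signal list depends on the ALU design.

```python
from enum import IntFlag, auto

class Ctl(IntFlag):
    CARRY_TO_ADDER_CIN = auto()   # gate CCR carry bit to the adder carry-in
    OP2_TO_ADDER = auto()         # route ALU operand 2 straight to the adder
    COUT_TO_CCR_LATCH = auto()    # gate adder carry-out to the CCR carry latch
    ADDER_TO_RESULT = auto()      # route adder output to the ALU result latch
    RESULT_LATCH_EN = auto()      # enable the ALU result register to latch
    MC_PC_INC = auto()            # advance the MicroCode Program Counter
    LATCH_CCR_CARRY = auto()      # latch the adder carry bit into the CCR
    LATCH_RESULT = auto()         # latch the ALU result register
    RESULT_TO_BUS = auto()        # drive the result onto the Internal Data Bus
    MC_DONE = auto()              # tell the Instruction Controller we are done

# One word per clock cycle, each the OR of the signals asserted in that cycle.
ADC_MICROCODE = [
    Ctl.CARRY_TO_ADDER_CIN | Ctl.OP2_TO_ADDER | Ctl.COUT_TO_CCR_LATCH
    | Ctl.ADDER_TO_RESULT | Ctl.RESULT_LATCH_EN | Ctl.MC_PC_INC,
    Ctl.LATCH_CCR_CARRY | Ctl.LATCH_RESULT | Ctl.RESULT_TO_BUS | Ctl.MC_DONE,
]

for cycle, word in enumerate(ADC_MICROCODE, start=1):
    print(f"cycle {cycle}: {int(word):010b}")
```

    Even this toy instruction needs ten distinct control bits, which hints at why the full control word could approach 50 bits.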

    Branching and jumping seem to present some complexity. When we branch, we load a new value into the MicroCode Program Counter register, potentially based on the value of the CCR. We need to represent the destination address somewhere, and that has to be in a MicroCode instruction. The destination address field has to be the full width of the MicroCode ROM address word. The MicroCode ROM data word will already be quite wide. Do I really want to dedicate another 8-12 bits to hold branch and jump destination addresses?

    One solution I see is to allow the bottom 8-12 bits of the MC ROM to hold the destination address when we are branching or jumping,...


  • Implementing the Instruction Controller

    Anthony Faulise, 04/18/2025 at 21:20

    April 18, 2025

    Before I dig in:

    As I was doing the circuit design, I made some observations that call for more revision of past decisions. Sorry, that’s just the way design goes. You lay something out, later reality points out an impracticality, you revise.

    Also, I kind of realized that the Instruction Controller was over-reaching into the domain of the MicroCode lookup and MicroCode Sequencer. I’m going to reduce the boundary of the Instruction Controller so that it sends the MicroCode Start address and MicroController Start signal to a separate block.  Here’s the reduced block diagram:

    Instruction Group Mapping - ReDo

    As I was designing the Instruction Group Decoder circuit, I realized there are four cases when the low 4-bits of the Instruction Decoder Address have to be driven by different bit-fields of the IW:

    • 2-operand
    • 1-operand
    • 0-operand
    • Special-group instructions. 

    This clearly calls for a 4-1 multiplexer to map different parts of the IW to the low bits of IDA. Unfortunately, I assigned the Instruction Group IDs for these four cases as 001, 010, 011, and 100. 

    I could build some logic to map these to 00, 01, 10, and 11. Or, I can just redefine IDA6-IDA4 so that just two of the three bits are needed to drive a multiplexer directly. To do that, I need to re-map the Instruction Group for “Fetch Operand” to a different Instruction Group ID. So, I have to change some of the tables in previous posts. Sorry if this generates confusion.

    Here’s the result:

    Lookup Type             IDA6-IDA4   IDA3-IDA0
    Special 0-Operand       000         (Logic Mapping)
    2-Operand Instruction   001         IR15, IR14, IR13, IR12
    1-Operand Instruction   010         IR9, IR8, IR7, IR6
    0-Operand Instruction   011         IR3, IR2, IR1, IR0
    Operand Fetch           100         0, MLB2, MLB1, MLB0
    Result Save             101         0, MLB2, MLB1, MLB0
    Unused                  110         n/a
    Reserved                111         (Logic Mapping)
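
    The table above amounts to a multiplexer that picks a different 4-bit slice of the instruction word for each group. A minimal software sketch of that selection follows; the function name `ida_low_bits` is hypothetical, and `mlb` stands for the 3-bit MLB2-MLB0 value named in the table, passed in as a plain integer.

```python
def ida_low_bits(group, ir, mlb=0):
    """Pick IDA3-IDA0 for a given IDA6-IDA4 group code and 16-bit IR value."""
    if group == 0b001:                 # 2-operand: IR15..IR12
        return (ir >> 12) & 0xF
    if group == 0b010:                 # 1-operand: IR9..IR6
        return (ir >> 6) & 0xF
    if group == 0b011:                 # 0-operand: IR3..IR0
        return ir & 0xF
    if group in (0b100, 0b101):        # operand fetch / result save: 0, MLB2..MLB0
        return mlb & 0x7
    # Groups 000 and 111 use separate logic mapping; 110 is unused.
    raise ValueError("no direct bit-field mapping for this group")

print(ida_low_bits(0b010, 0b1111_110101_000000))  # IR9..IR6 of a 1-operand word
```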

    Looking more closely at how the IW influences ID6-ID4:

    Instruction Word                                  Instruction                                   ID6-ID4   Instruction Group
    1111 0 dddddddd ccc                               BRA                                           000       Special
    1111 100000 mmmmmm to 1111 100000 mmmmmm          IM                                            000       Special
    1111 100001 mmmmmm to 1111 100001 mmmmmm          SWI                                           000       Special
    1111 100010 000ccc                                JMP                                           000       Special
    1111 100011 000ccc                                JSR                                           000       Special
    0000 aaaaaa bbbbbb to 1110 aaaaaa bbbbbb          LD, ADC, ADD, AND, CMP, OR, SUB, SBC, XOR     001       2-operand instructions
    1111 110000 bbbbbb to 1111 110111 bbbbbb          NOT, NEG, INC, DEC, ROT*, SHIFT*              010       1-operand instructions
    1111 111111 00 0000 to 1111 111111 00 1111        RTS, SWI, RTI, NOP, STC, CLC, etc.            011       0-operand instructions

    Using a Karnaugh map, I derive these equations to produce ID6-ID4:

    ID6 = 0

    ID5 = (AND(IW15-IW12) and IW11 and IW10 and NOT IW9) OR (AND(IW15-IW12) and IW11 and IW10 and IW9 and IW8 and IW7 and IW6 and NOT IW5)

    ID4 = NOT(AND(IW15-IW12)) OR AND(IW15, IW14, IW13, IW12, IW11, IW10, IW9, IW8, IW7, IW6, NOT IW5)
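
    These equations are easy to check in software against the encodings in the table. The sketch below is an assumption-labelled model, not the circuit: `bit` and `group_id` are hypothetical helper names, and the sample words simply fill the variable fields with arbitrary bits.

```python
def bit(iw, n):
    """Return bit n (0 or 1) of the 16-bit instruction word."""
    return (iw >> n) & 1

def group_id(iw):
    """Return ID6-ID4 as a 3-bit integer, following the equations above."""
    top_all_ones = all(bit(iw, n) for n in (15, 14, 13, 12))
    # 1111 110xxx ...: IW11 and IW10 set, IW9 clear
    one_operand = top_all_ones and bit(iw, 11) and bit(iw, 10) and not bit(iw, 9)
    # 1111 111111 0...: IW11..IW6 all set, IW5 clear
    zero_operand = (top_all_ones
                    and all(bit(iw, n) for n in range(6, 12))
                    and not bit(iw, 5))
    id6 = 0
    id5 = int(bool(one_operand or zero_operand))
    id4 = int(bool(not top_all_ones or zero_operand))
    return (id6 << 2) | (id5 << 1) | id4

samples = {
    "BRA":       0b1111_0_00000101_010,
    "JMP":       0b1111_100010_000101,
    "2-operand": 0b0011_000001_000010,
    "1-operand": 0b1111_110011_000001,
    "0-operand": 0b1111_111111_00_0011,
}
for name, iw in samples.items():
    print(f"{name}: {group_id(iw):03b}")
```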

    If I reassign the mapping for OP1 and OP0 as follows, I can even recycle the ID5 and ID4 bits to drive OP1 and OP0, saving some logic:

    Instruction Group   OP1, OP0
    Special             00
    2-operand           01
    1-operand           10
    0-operand           11

    OP1 = ID5

    OP0 = ID4

    Finally, I realized I made a minor mistake in mapping the ID3-0 bits for Special instructions: I left the bits undefined for the BRA instruction. Here’s the fixed table.

    Instruction Word                             Instruction   ID3-ID0
    1111 0 dddddddd ccc                          BRA           0100
    1111 100000 mmmmmm to 1111 100000 mmmmmm     IM            0000
    1111 100001 mmmmmm to 1111 100001 mmmmmm     SWI           0001
    1111 100010 000ccc                           JMP           0010
    1111 100011 000ccc                           JSR           0011

    Here’s the logic for Special Instructions when ID6-4 = 000:

    ID3 = 0

    ID2 = AND(IW15, IW14, IW13, IW12, not IW11)

    ID1 = AND(IW15, IW14, IW13, IW12, IW11, not IW10, not IW9, not IW8, IW7)

    ID0 = AND(IW15, IW14, IW13, IW12, IW11, not IW10, not IW9, not IW8, IW6)
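
    The Special-group equations can be checked the same way against the fixed table. Again this is only a sketch with a hypothetical function name, and the sample words fill the d, c, and m fields with arbitrary bits.

```python
def special_low_bits(iw):
    """Return ID3-ID0 for a Special-group IW, following the equations above."""
    def bit(n):
        return (iw >> n) & 1

    top = bit(15) and bit(14) and bit(13) and bit(12)
    # Common prefix of IM, SWI, JMP, JSR: 1111 1000xx
    misc = top and bit(11) and not bit(10) and not bit(9) and not bit(8)
    id3 = 0
    id2 = int(bool(top and not bit(11)))   # BRA: 1111 0...
    id1 = int(bool(misc and bit(7)))
    id0 = int(bool(misc and bit(6)))
    return (id3 << 3) | (id2 << 2) | (id1 << 1) | id0

special_samples = {
    "BRA": 0b1111_0_00000101_010,
    "IM":  0b1111_100000_000011,
    "SWI": 0b1111_100001_000011,
    "JMP": 0b1111_100010_000101,
    "JSR": 0b1111_100011_000101,
}
for name, iw in special_samples.items():
    print(f"{name}: {special_low_bits(iw):04b}")
```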

    Back to the Instruction Controller Design

    I used Logisim-evolution to design and simulate the circuitry, not based on extensive research but because it was free and came up near the top of my Google search.

    To keep the circuit diagram from becoming too complex, I used Logisim’s “Subcircuits” ability to design each of...


  • Planning the Instruction Controller

    Anthony Faulise, 04/01/2025 at 04:00

    Today, I’m going to focus on the CPU Instruction Controller. The Instruction Controller will orchestrate its subsystems to fetch an instruction, get any needed operands, perform the instruction itself, potentially store a result, and queue up the next instruction.

    Overview

    My initial vision is a sequencer that will:

    1. Load the PC to the Address Latch
    2. Ask the Bus Controller to fetch an instruction and store it in the Instruction Register
    3. Have the Addressing Mode Decoder examine the Instruction Register to see if we need an operand, or skip to 7.
    4. Load the MicroCode PC with the start address of the routine to fetch the first operand according to the Addressing Mode. The MicroCode Sequencer will execute until the first operand is stored in the appropriate ALU latch.
    5. Have the Addressing Mode Decoder examine the Instruction Register again to see if we need an additional operand, or skip to 7, leaving any computed or fetched indirect address in the Address Scratch register.
    6. Load the MicroCode PC with the start address of the routine to fetch the second operand according to the Addressing Mode. The MicroCode Sequencer will execute until the second operand is stored in the appropriate ALU latch. This step will potentially overwrite the computed or fetched indirect address in the Address Scratch register with a new address, which will serve as the destination of the result, if appropriate.
    7. Have the Addressing Mode Decoder examine the Instruction Register again and route the appropriate bits to the Instruction Decoder.
    8. The Instruction Decoder looks up the start address of the MicroCode routine for the subject instruction.
    9. The Controller turns control over to the MicroCode Sequencer, which executes the necessary instructions to perform the actual data manipulation. The MicroCode will store the result of any computation, possibly making use of the address in the Address Scratch register or the Register Address latch, before returning.
    10. When the MicroCode signals it is done, the Controller will then return to step 1 above.

    Since the MicroCode for the actual instruction may take several cycles, I thought I might try to pipeline operations a little and start the instruction fetch and Addressing Mode Decoder steps as soon as the Controller turns control over to the MicroCode Sequencer for the instruction itself. Whoever finishes first would have to wait.

    Hmm. It sounds nice, but as I type this I see a potential problem. If the result of the current instruction alters one of the operands of the subsequent instruction, that would be a problem. In theory, I could check to see if the addresses are the same and then pause the pre-fetch logic if they are, or even detect if the operand >changed<. Or it could be more trouble than it’s worth.  I’m voting for the latter.

    Maybe the Controller could at least pre-fetch the next instruction. Ah, but what if there’s a branch? I could pre-fetch the instruction as long as the current instruction isn’t a branch, jump, RTS, RTI, etc. OK, KISS for now.

    All right, in any case, I see the Instruction Sequencer (IS) using a shift register to sequence the steps as:

    1. Route PC to AR, fetch instruction to IR, evaluate IR with AMD logic on the fly, preload the IS if needed to jump ahead to 3 or 4
    2. Load Operand 2 (src) to ALU-Left, evaluate IR with AMD logic on the fly
    3. Load Operand 1 (dest / src+dest) to ALU-RIGHT
      1. Start here for 1-operand instructions
    4. Lookup Instruction MC Start Address
      1. Start here for 0-operand instructions
    5. Start MC for instruction processing
      1. Instruction processing is responsible for writing result of any instruction to dest
      2. Instruction processing is responsible for incrementing PC
    6. When instruction MC is done, signal IS to jump back to 1
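
    The "preload the IS to jump ahead" idea in the list above can be illustrated by listing which steps are visited for each operand count. This is a sketch of the intent only; the function name is hypothetical and it says nothing about how the shift register itself is wired.

```python
def sequencer_steps(operand_count):
    """List the Instruction Sequencer steps visited for one instruction."""
    if operand_count not in (0, 1, 2):
        raise ValueError("operand_count must be 0, 1, or 2")
    # After the fetch in step 1, the IS is preloaded to skip the operand
    # loads that the instruction does not need.
    first_step_after_fetch = {2: 2, 1: 3, 0: 4}[operand_count]
    return [1] + list(range(first_step_after_fetch, 7))

for n in (2, 1, 0):
    print(n, sequencer_steps(n))
```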

    Addressing Mode Decoder

    The Addressing Mode Decoder (AMD) unit needs to do a few computations:

    • Examine the Instruction Register and determine the number of operands, this decides when to turn control over to the instruction MicroCode and which ALU input latch each operand is routed...

  • Instruction Set and Addressing Modes

    Anthony Faulise, 03/21/2025 at 13:01

    I think my first tasks are to define my instruction set and addressing modes, i.e., the "Instruction Architecture".


    I’m going to steer clear of a full CISC instruction set. Forget polynomial evaluation, floating point, and single-instruction block transfers or string lookups. 


    After reviewing manuals for the Motorola 6800, Motorola 68020, Zilog Z-80, Digital PDP-11, and MIPS RISC designs, I’ve settled on these instructions as sufficient to do anything I would need:

    • LD (load)
    • ADD, ADC - add with and without carry
    • SUB, SBC - subtract with and without carry
    • CMP (compare)
    • AND
    • OR
    • NOT
    • XOR
    • NEG (negate)
    • INC
    • DEC
    • Rotate (right/left with/without carry)
      • RR, RRC
      • RL, RLC
    • Shift (right/left arithmetic/logical)
      • SRA (arithmetic new MSB = old MSB), SRL (logical new MSB = 0)
      • SL
      • SLB, SRB (shift left/right 1 byte)
    • BRA (relative jump on a variety of conditions, 1 word instruction)
    • JMP (absolute jump on a variety of conditions)
    • JSR (jump to subroutine)
    • IM (set interrupt mask)
    • SETC (set CCR carry bit)
    • CLRC (clear CCR carry bit)
    • RTS (return from subroutine)
    • SWI (software interrupt)
    • RTI (return from interrupt)
    • NOP (no operation)


    I also feel my CPU needs to support these addressing modes:

    • Implicit (no operand)
    • Immediate (operand follows instruction in program memory)
    • Register
    • Indexed (address of operand is immediate address plus offset contained in a register)
    • Indirect (content of register is address of operand)
    • Indirect with pre-decrement of register (nice to have, but negotiable)
    • Indirect with post-increment of register (nice to have, but negotiable)
    • Doubly Indirect (content of register is the address of the address of the operand)


    From my research, there are some addressing modes the M68020 has (and I think the VAX too) that I thought I might like, such as "Address Register Indirect with Index" where the address of an operand is a register, plus a constant times a second register, plus a fixed displacement. This is really convenient for accessing arrays of objects, where the base register is the start of the array, the “constant” is the size of the object, the register that multiplies the constant is the index in the array, and the final displacement is the offset within the object of the member value you want to access.  I’ll have to live without it.


    I was unsure initially about how many registers I wanted to support. My experience with the M6800 left me feeling that two registers was not enough. The Z-80’s accumulator plus six 8-bit or three 16-bit registers seemed to be the bare minimum. I left the question open, but came to a conclusion from another angle.


    I didn’t want to deal with a combination of single-word and multi-word instructions, so I preferred that the instruction word should be able to contain the entire instruction, addressing mode, and source and destination register information. For immediate mode addressing, I was going to have to live with fetching the operand in the word after the instruction. I may be able to squeeze the (8-bit) relative offset of a branch instruction into the 16-bit instruction word if I’m creative. I’ll leave that for later.


    My first thought was to have a certain bit-field in the instruction word for the instruction itself, then other bit fields for the addressing mode and register ID of each operand. If I allowed 3-bits for addressing mode and 3-bits for register ID for each of two operands, that would occupy 2 x (3 + 3) = 12 bits of my 16-bit word. That left just 4 bits to encode my 22-plus instructions.
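
    The bit budget in the paragraph above can be worked through in a few lines. This is a sketch under assumptions: the ordering of the fields inside the word (opcode in the top bits, then two 6-bit operand fields, mode before register) and the name `pack` are hypothetical choices made for illustration.

```python
MODE_BITS = 3      # addressing mode per operand
REG_BITS = 3       # register ID per operand
WORD_BITS = 16

operand_bits = 2 * (MODE_BITS + REG_BITS)   # two operands
opcode_bits = WORD_BITS - operand_bits      # what is left for the opcode
max_opcodes = 1 << opcode_bits              # distinct opcodes that fit

def pack(opcode, mode_a, reg_a, mode_b, reg_b):
    """Pack one word: opcode, then two (mode, register) operand fields."""
    field_a = (mode_a << REG_BITS) | reg_a
    field_b = (mode_b << REG_BITS) | reg_b
    return (opcode << operand_bits) | (field_a << (MODE_BITS + REG_BITS)) | field_b

print(operand_bits, opcode_bits, max_opcodes)
print(f"{pack(0b0010, 1, 3, 0, 5):016b}")
```

    With only 4 opcode bits, at most 16 encodings are available, which is why 22-plus instructions cannot all use this two-operand layout.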


    At first, I saw my 8 registers melting away to two, but then I had an insight (OK, "insight" might just have been remembering it from someone else's architecture). Only a few instructions require two operands: LD, ADD, ADC, SUB, SBC, CMP, AND, OR, XOR. That’s 9. I can allow 4 bits to encode those, and if all 4 bits are 1s, say, that could indicate that the instruction was a one-operand instruction, and that the next 6-bits of the instruction word, no longer needed to encode the...


  • Inception

    Anthony Faulise, 03/19/2025 at 19:26

    When I was 14, I built an Altair 680b computer from a kit. This was in the earliest days of personal computing. Earlier than that, even. There were instructions. There was a technical manual. There was a CPU manual. But it came with no software, no operating system, no applications. There was not even a hard disk. For the first six months I had the machine, there was no way to save a program; everything was lost the instant I turned the computer off.


    Generic Altair 680b (https://www.retrotechnology.com/restore/altair680.html)

    Later, I bought a 16K memory card (for $600 IIRC). It came with a text editor, an assembler program, and a debugger program. But, they were on paper tape, and I did not have a paper tape reader. I spent a lot of time looking at the paper tapes I had, longing for a tape reader or an assembler program I could use.


    Eventually, I purchased a cassette tape interface, so I could store programs. As it happened, the tape interface came with a 4K Basic interpreter, on cassette tape. So, I finally had a programming language I could use, even if it did take 10 minutes to boot up. For a while, I forgot about an assembler. I wrote programs in Basic.


    Years later, I decided to build a Z-80 based S-100 computer. I bought a CPU card, a 64K RAM card, a disk controller and two eight-inch floppy disk drives. But I never got the pieces to work together, and never wrote any code on that machine. By that time I was in college, writing code in C on a VAX-11/750. I was in heaven. I put the Z-80 parts in a box in my parents’ basement and forgot about them.


    Recently, forty years later, I found the box of Z-80 parts in my own attic. I thought about putting them back together. And I realized: I had no operating system, no interpreter or compiler, no software. What would I do with my Z-80?


    So, as I began assembling the parts to reconstitute the computer, I decided to write my own Z-80 assembler. From scratch. From first principles. Not copying anyone else’s design or code. 


    I studied the Z-80 CPU databook. I thought about what I had learned while earning a bachelor’s and then a master’s degree in computer science. (If I have time, I'll try to weave my development notes into a separate blog.) After I finished the assembler, and tested it on a Z-80 emulator (I haven’t finished the Z-80 hardware yet), I happened to re-read a book, The Soul of a New Machine.


    The Soul of a New Machine, by Tracy Kidder, chronicles a small team of engineers at Data General, a mini-computer manufacturer in Massachusetts, working in the late ’70s to design the next generation 32-bit mini-computer. I’d read the book in college, probably in 1983 or 1984. The first time, it was inspiring, but like a dream. The second time, I wondered “Could I have worked on that team? Could I have done what they did?”


    Only one way to find out...


    Here's what I'm hoping to achieve:

    • Design and build a micro-coded 16-bit CPU that could easily be extended to 32-bits just by widening the data paths
    • Include enough support for full-fledged operation of a UNIX-like operating system, meaning
      • Virtual memory
      • Supervisor / privileged / user modes
      • Multi-tasking
      • Efficient context / process switching
      • Priority interrupts
      • DMA
      • Hard disk
    • Include a few fun features. At the moment, that means:
      • Memory cache to keep routine data / instruction fetches off the system bus so the DMA system can do things like file transfer and page swapping efficiently
    • As much as possible, stick to early-'80s technology like 74LS series TTL
      • No FPGAs
      • I do get to use an IDE/ATA disk (ok, that's a cheat)