Patterns for creating a microcode-driven core in VHDL

A project log for TMS0800 FPGA implementation in VHDL

Inspired by and I set out to do similar in VHDL, for MicroNova Mercury board.

zpekic, 09/05/2019 at 03:19

There are various well-known and documented patterns and best practices for creating FSM (finite state machine) designs that work well on FPGAs. However, from what I have seen, there is not much guidance on a simple, good pattern for microcode. One could of course adapt the methodology (and even the tooling) popular in the bit-slice era, when microcoding was the most popular way to create custom processors and logic (which I have also done for the Am9080 project using Am2901 slices), but that approach is not very streamlined either.

My first attempt was to write a separate microcode compiler in C# which would "spit out" a file in a format that could be used directly to initialize the read-only memory (microcode or mapping ROM) in the VHDL. After starting work on that, I realized it might be too heavyweight for my needs, and it also had the disadvantage of adding another proprietary tool to the toolchain and an extra step on the way to the .bit file.

A better alternative seemed to be to do this right in VHDL, "synthesizing" the contents of the microcode ROM while the VHDL code is being compiled. Eventually I settled on the approach described below, which I used in this project for both the calculator core and the VGA tracer components.

That simpler solution turned out to be just a combination of a few tricks in VHDL. Here is the pattern:

Most of this can be seen in a single file:

For example, look at how the ALU function is defined:

-- 3 BITS 13..11
-- alias alu_fun: std_logic_vector(2 downto 0) is u_code(13 downto 11);
impure function uc_alu(alu_fun: in std_logic_vector(2 downto 0)) return std_logic_vector is
begin
     -- 20 + 18 zero bits above the field, 11 zero bits below it: 52 bits in total
     return X"00000" & "000000000000000000" & alu_fun & "00000000000";
end uc_alu;

As you can see, it just places the 3 bits defining the ALU function at the right position in the microinstruction word; everything else it returns is zero. The possible function codes are defined as constants:

-- ALU functions
constant fun_zero   : std_logic_vector(2 downto 0) := "000";
constant fun_s      : std_logic_vector(2 downto 0) := "001";
constant fun_r      : std_logic_vector(2 downto 0) := "010";
constant fun_xor    : std_logic_vector(2 downto 0) := "011";
constant fun_adchex : std_logic_vector(2 downto 0) := "100";
constant fun_adcbcd : std_logic_vector(2 downto 0) := "101";
constant fun_sbchex : std_logic_vector(2 downto 0) := "110";
constant fun_sbcbcd : std_logic_vector(2 downto 0) := "111";

Obviously, the actual ALU can now consume the same definition and implement the functionality accordingly:

with fun select
    y <= s                when fun_s,
         r                when fun_r,
         (s xor r)        when fun_xor,
         sum0(3 downto 0) when fun_adchex,
         sum2(3 downto 0) when fun_adcbcd,
         dif0(3 downto 0) when fun_sbchex,
         dif2(3 downto 0) when fun_sbcbcd,
         "0000"           when others;

Given that the helper returns zeros for everything outside its own field, which means NOP for all other components driven by the microinstruction, it won't affect them. So one can simply "or" it together with other similar functions ("helpers") to create a microinstruction that drives the other components as needed:

110 =>
     uc_ss(ss_off) or
     uc_sam(sam_update) or
     uc_alu(fun_sbcbcd) or
     uc_reg(bcd_fromalu) or

It should be pretty easy to read the microinstruction above and figure out what it is trying to do, which is critical for keeping the otherwise extremely error-prone microcoding under control. (Note: using "and" instead of "or" may be even more intuitive, but in that case the NOP microinstruction would have to be defined as all ones ("1111...."), and every component driven by it would have to interpret an all-ones field, such as "111", as "do nothing".)
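The composition can be tried in isolation. The following is a minimal simulation-only sketch, not code from the project. It builds each helper by assigning into a slice of an all-zero word, which is an alternative to counting zeros in a concatenation. The `uc_demo` helper, its field position at bits 10..9 and the `demo_load` code are hypothetical; only the ALU field position (bits 13..11) and the 52-bit word width are taken from the code shown above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity uc_compose_demo is
end entity uc_compose_demo;

architecture sim of uc_compose_demo is
    subtype uword is std_logic_vector(51 downto 0);
    constant uc_nop : uword := (others => '0');

    constant fun_sbcbcd : std_logic_vector(2 downto 0) := "111";
    -- Hypothetical code for a hypothetical 2-bit field (not from the project).
    constant demo_load  : std_logic_vector(1 downto 0) := "10";

    -- ALU function field at bits 13..11, as in the helper shown earlier.
    function uc_alu(alu_fun : std_logic_vector(2 downto 0)) return uword is
        variable w : uword := uc_nop;
    begin
        w(13 downto 11) := alu_fun;
        return w;
    end function uc_alu;

    -- Hypothetical field at bits 10..9, for illustration only.
    function uc_demo(sel : std_logic_vector(1 downto 0)) return uword is
        variable w : uword := uc_nop;
    begin
        w(10 downto 9) := sel;
        return w;
    end function uc_demo;

    -- One microinstruction built by OR-ing the helpers together.
    constant demo_word : uword := uc_alu(fun_sbcbcd) or uc_demo(demo_load);
begin
    check : process
    begin
        assert demo_word(13 downto 11) = fun_sbcbcd
            report "ALU field not where expected" severity failure;
        assert demo_word(10 downto 9) = demo_load
            report "demo field not where expected" severity failure;
        -- Everything outside the two fields stays at the NOP value.
        assert demo_word(51 downto 14) = uc_nop(51 downto 14)
            report "unexpected bits set above the ALU field" severity failure;
        assert demo_word(8 downto 0) = uc_nop(8 downto 0)
            report "unexpected bits set below the demo field" severity failure;
        report "composition check passed";
        wait;
    end process check;
end architecture sim;
```

Because every helper returns a full-width word, the order in which they are OR-ed does not matter, and leaving a helper out leaves its field at NOP.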

The microinstruction controller is just another component driven by the microinstruction. In my implementation it needs a code to select the condition (4 bits = 16 conditions) and the destinations to go to when the condition is true or false. With that, one can write convenient "high-level language" branch statements:

     uc_ss(ss_off) or
     uc_alu(fun_s) or
     uc_if(cond_e11, upc_next, uc_label(CONTINUE)),

If cond_e11 is true, execution continues with the next microinstruction; otherwise it jumps to the label CONTINUE. The trick here is that while the "then" and "else" destinations are real locations in the microcode, some special values are reserved: the value 0x00 does not jump to location 0 but means "next" (again, remember that zero means NOP for the microinstruction controller):

-- special microcode "goto" codes (all others will be jump to that location)
constant upc_next:   std_logic_vector(7 downto 0) := X"00"; -- means we can't jump to location 0!
constant upc_return: std_logic_vector(7 downto 0) := X"01"; -- means we can't jump to location 1!
constant upc_repeat: std_logic_vector(7 downto 0) := X"FF"; -- means we can't jump to location 255!
constant upc_fork:   std_logic_vector(7 downto 0) := X"FE"; -- means we can't jump to location 254!
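As a rough illustration of how such a branch helper might pack its arguments, here is a small simulation-only sketch. It is not the project's `uc_if`. The field order within the top 20 bits (condition in 51..48, "then" target in 47..40, "else" target in 39..32), the `select_next` function and the `cond_demo` code are all assumptions made for this example. The sizes (4-bit condition, 8-bit targets) follow the description above.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity uc_if_demo is
end entity uc_if_demo;

architecture sim of uc_if_demo is
    subtype uword is std_logic_vector(51 downto 0);
    subtype uaddr is std_logic_vector(7 downto 0);

    constant upc_next  : uaddr := X"00";
    -- Hypothetical condition code, for illustration only.
    constant cond_demo : std_logic_vector(3 downto 0) := "0101";

    -- Assumed layout: condition 51..48, "then" 47..40, "else" 39..32.
    function uc_if(cond        : std_logic_vector(3 downto 0);
                   target_then : uaddr;
                   target_else : uaddr) return uword is
        variable w : uword := (others => '0');
    begin
        w(51 downto 48) := cond;
        w(47 downto 40) := target_then;
        w(39 downto 32) := target_else;
        return w;
    end function uc_if;

    -- Pick the value handed to the microprogram counter logic.
    function select_next(w : uword; cond_is_true : boolean) return uaddr is
    begin
        if cond_is_true then
            return w(47 downto 40);
        else
            return w(39 downto 32);
        end if;
    end function select_next;

    constant branch : uword := uc_if(cond_demo, upc_next, X"2A");
    -- A microinstruction with no uc_if(...) term at all.
    constant plain  : uword := (others => '0');
begin
    check : process
    begin
        assert select_next(branch, true) = upc_next
            report "then-target wrong" severity failure;
        assert select_next(branch, false) = X"2A"
            report "else-target wrong" severity failure;
        -- Both targets of an all-zero word are upc_next: simply continue.
        assert select_next(plain, true) = upc_next
            report "default then-target wrong" severity failure;
        assert select_next(plain, false) = upc_next
            report "default else-target wrong" severity failure;
        report "uc_if check passed";
        wait;
    end process check;
end architecture sim;
```

The last two assertions show why reserving 0x00 for "next" is convenient: a microinstruction that never mentions the sequencer falls through to the following location whichever way the condition evaluates.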

With this, any microinstruction can define not just the behavior of all the driven components, but also a simple yet rather powerful "if" which can jump, return, fork or repeat based on a condition. Note that the condition is the one set by the execution of the previous microinstruction, which is another common source of bugs. There is no separate "call": every jump saves the return address in a single register (a 1-deep stack). This could be extended, but it was sufficient for my calculator project. Remember, if "if(cond, then_destination, else_destination)" is missing, 0x00000 will be in that field of the microinstruction, which means "if(true, upc_next, upc_next)", so execution simply continues. Now we just need to drive the microinstruction pointer register accordingly:

-- update microcode program counter
update_upc: process(clk, reset, u_next)
begin
    if (reset = '1') then
        -- start execution at location 0, microinstructions 0 - 127 can be shared by any instruction
        u_pc <= X"00";
        u_ra <= X"00";
    else
        if (rising_edge(clk)) then
            case u_next is
                -- if condition(0) = '1' then X"00000" (default) will cause simple u_pc advance
                when upc_next =>
                    u_pc <= std_logic_vector(unsigned(u_pc) + 1);
                -- used to repeat same microinstruction until condition turns true
                when upc_repeat =>
                    u_pc <= u_pc;
                -- start executing macroinstruction routine, which are mapped to 128 - 255
                when upc_fork =>
                    if (instruction(6 downto 5) = "00") then
                        -- if the instruction is JUMP on condition reset, mask out the jump target
                        -- this way 32 microcode locations are freed up!
                        u_pc <= "10000000";
                    else
                        -- map 7 bit instruction directly to upper 128 words of microcode
                        u_pc <= '1' & instruction;
                    end if;
                -- return from "1 level subroutine"
                when upc_return =>
                    u_pc <= u_ra;
                -- any other value is a jump to that microinstruction location, save return address for "1 level stack"
                when others =>
                    u_pc <= u_next;
                    u_ra <= std_logic_vector(unsigned(u_pc) + 1);
            end case;
        end if;
    end if;
end process;

With this I was able to write pretty complex microcode in a standardized fashion. As part of compilation, the generated microcode ROM is also written to a file, so one can compare that output with the input and spot bugs early on (a classic bug is to have overlapping microinstruction fields...):

procedure dump_microcode(out_file_name: in string; temp_mem: in rom256x52; depth: integer; base: integer) is
    file out_file : text; -- open write_mode is out_file_name;
    variable out_line : line;



-- alu_fun
write(out_line, decode8(temp_mem(i)(13 downto 11), "alu_y = 0; ", "alu_y = s(alu_sel); ", "alu_y = r(alu_sel); ", "alu_y = xor(alu_sel); ", "alu_y = adchex(alu_sel); ", "alu_y = adcbcd(alu_sel); ", "alu_y = sbchex(alu_sel); ", "alu_y = sbcbcd(alu_sel); "));  
-- alu_inp


end dump_microcode;

The intent of dump_microcode is to "reverse engineer" the already existing microcode store into readable text. In a way this is test-driven development: it is advisable to write this procedure right after defining the microinstruction format, and then, as the microcode is developed, to check its output after each compile against the intent of the microcode written. Any difference between the two indicates a bug.
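The overlapping-field bug in particular can also be caught mechanically. The sketch below is one possible way to do that and is not part of the project: it describes each field as a bit mask and fails with an assertion if any two masks share a bit. The names `field_mask`, `any_overlap` and the mask tables are hypothetical, and apart from the ALU field at 13..11 the field positions are made up for the example. In a real design the masks could equally be obtained by calling each helper with an all-ones argument.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity uc_overlap_check is
end entity uc_overlap_check;

architecture sim of uc_overlap_check is
    subtype uword is std_logic_vector(51 downto 0);
    type mask_array is array (natural range <>) of uword;
    constant zeros : uword := (others => '0');

    -- Build a mask with ones in bits hi..lo and zeros elsewhere.
    function field_mask(hi, lo : natural) return uword is
        variable m : uword := (others => '0');
    begin
        for i in lo to hi loop
            m(i) := '1';
        end loop;
        return m;
    end function field_mask;

    -- True if any two masks in the table have a bit in common.
    function any_overlap(masks : mask_array) return boolean is
    begin
        for i in masks'range loop
            for j in masks'range loop
                if i < j and (masks(i) and masks(j)) /= zeros then
                    return true;
                end if;
            end loop;
        end loop;
        return false;
    end function any_overlap;

    -- Assumed layout: only the ALU field position is from the article.
    constant good_masks : mask_array(0 to 2) := (
        field_mask(51, 32),  -- sequencer fields (assumed position)
        field_mask(13, 11),  -- ALU function
        field_mask(10, 9)    -- hypothetical field
    );

    -- Deliberately broken layout: both fields claim bit 11.
    constant bad_masks : mask_array(0 to 1) := (
        field_mask(13, 11),
        field_mask(11, 9)
    );
begin
    assert not any_overlap(good_masks)
        report "microinstruction fields overlap" severity failure;
    assert any_overlap(bad_masks)
        report "overlap in bad_masks was not detected" severity failure;
end architecture sim;
```

A check like this complements the text dump: the dump shows what a microinstruction means, while the assertion guards the field layout itself.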

A note about instruction mapping: 

When implementing microcode-driven CPUs or controllers, a common problem is how to "map" op-codes to the first microcode location of the routine executing that instruction. This can be done in several ways:

- a separate "mapping" ROM is introduced - it has one location per op-code (2^n locations for an n-bit op-code), and each location is as wide as a microcode address. So location 0x76 (op-code for HLT on the 8080/8085/Z80) may contain, for example, 0x3E3, which would be the location of the first microinstruction in some microcode ROM that has at least 1024 locations.

- static logic that cleverly translates op codes into microinstruction entry points. Usually this is possible with CPUs with highly orthogonal and/or reduced instruction sets

- direct mapping. This approach was taken here - the TMS0800 never needs more than 7 bits to define an instruction, so a microcode ROM of 256 locations is sufficient: the lower 128 locations hold microcode that can be shared by any instruction, and the upper 128 are the entry points, with each op-code mapping directly to the first microinstruction to be executed. In other words, op-code 0x3E starts with the microinstruction at location 0xBE, etc.

In all cases above, there is a "fork" microinstruction field whose task is to load the output of whichever method is used into the microinstruction pointer.
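For the direct-mapping case, the fork target is simple enough to check on its own. The following sketch pulls the mapping out of the update_upc process into a function; the name `fork_target` and the testbench around it are hypothetical, and the logic mirrors the upc_fork branch shown earlier, including the special case for instructions whose top two bits are "00".

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity uc_fork_demo is
end entity uc_fork_demo;

architecture sim of uc_fork_demo is
    -- Entry point in the upper half of the microcode ROM for a 7-bit instruction.
    function fork_target(instruction : std_logic_vector(6 downto 0))
        return std_logic_vector is
    begin
        if instruction(6 downto 5) = "00" then
            -- All instructions of this group share one entry point.
            return "10000000";
        else
            -- Set the top address bit, keep the instruction as the low 7 bits.
            return '1' & instruction;
        end if;
    end function fork_target;
begin
    check : process
    begin
        -- Op-code 0x3E ("0111110") enters the microcode at 0xBE.
        assert fork_target("0111110") = X"BE"
            report "0x3E should map to 0xBE" severity failure;
        -- Top two bits "00": the low bits are masked out, entry point is 0x80.
        assert fork_target("0010101") = X"80"
            report "masked group should map to 0x80" severity failure;
        report "fork mapping check passed";
        wait;
    end process check;
end architecture sim;
```

The appeal of direct mapping is visible here: there is no lookup table to keep in sync with the microcode, only a concatenation.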