With the circuit finished I started on the microcode tonight. The first step was to spell out each instruction as a series of microinstructions (in pseudocode). For example:
reg(rs1) -> alu_a reg(rs2) -> alu_b alu_o -> reg(rsd); alu_op(and)
is the AND instruction. That reads as
- Transfer the value of the register specified in the rs1 portion of the instruction to the A input of the ALU
- Do the same with rs2 and the ALU's B input
- Set the ALU operation to AND and transfer the output of the ALU to the register rsd points at
This isn't very interesting for anyone who has done any CPU design, but I'd like to do a series of posts detailing how everything works at that level of detail. Hopefully this project will be simple enough for use when explaining things to a newbie (as long as they don't look too long at the corners I'm cutting here and there).
Now that the microcode is done we can see how many cycles each instruction will take. An instruction fetch will take 6 cycles (ouch!), and the most expensive instruction (LW) takes 18 cycles including the fetch. I will give more details when I go over the memory unit, but the biggest reason that things take so long is the fact that we're doing our reads one byte at a time. Much slower than having a 32-bit external data bus, but it makes loading misaligned values much easier.
What was the mistake that the title mentions? The LH and LB instructions sign-extend the value loaded from memory and I completely overlooked that when I did the memory unit. Luckily we had some free spots in the ALU for an extend halfword and extend byte operation. That will work well enough for now, but if I ever go to actually build risk-vee in hardware I will look at doing that in the memory unit itself.
Up next is writing a program to generate the microcode ROM.