• ### ALU - Shift

The most complex part of our ALU is handling the various shift operations. We need to handle shifting left and right as well as the logical and arithmetic variations. So the design is constructed in the following modular way.

Our 32bit machine allows a shift distance of 0 to 31 places, which means we have 5 bits describing the shift distance. To implement this we need one shifter for each of the 5 bits. The first shifter looks at bit 0 and, if set, outputs the input shifted one place left. The second shifter looks at bit 1 and shifts 2 places. Bit 2 causes a shift of 4 places, bit 3 shifts 8 places and finally the last shifter, controlled by bit 4, moves 16 places left. By feeding the output of each shifter into the input of the next, the final result is shifted the correct distance.
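As a rough software sketch of the staged idea (not the board logic itself), each bit of the distance enables one fixed shift of 1, 2, 4, 8 or 16 places. The name `shift_left_staged` is made up for this illustration.

```python
MASK32 = 0xFFFFFFFF  # keep every intermediate value to 32 bits

def shift_left_staged(value, distance):
    """Shift left by 0..31 places using five fixed stages (1, 2, 4, 8, 16)."""
    result = value & MASK32
    for bit in range(5):
        # Stage `bit` only acts when that bit of the distance is set.
        if (distance >> bit) & 1:
            result = (result << (1 << bit)) & MASK32
    return result

# A distance of 13 is binary 01101, so the 1, 4 and 8 stages are active.
print(hex(shift_left_staged(0x00000001, 13)))  # 0x2000
```

Any distance from 0 to 31 is a sum of distinct powers of two, which is why five stages are enough.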

By default we shift left (to higher bit positions) but we also need to handle a right shift. We definitely want to avoid creating another 5 level shifter for handling right shifting. Instead, we reverse the order of the input bits before the shifter and then reverse them again afterwards, which turns the left shift into a right shift.

The SRA and SRAI operations are arithmetic right shift operations, meaning the most significant bit is replicated to the right as the shift occurs. This is useful if you want to shift a negative number and retain it as still being negative afterwards. To support this we use some extra logic for deciding if zero or one is used when filling blank bits during shifting.
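The reversal trick and the fill bit can be sketched together in software. This is only a model under my own assumptions: the function names are invented, and the fill bit is taken to be the original sign bit for an arithmetic right shift and zero otherwise.

```python
MASK32 = 0xFFFFFFFF

def reverse32(value):
    """Mirror the 32 bits so bit 0 swaps with bit 31, bit 1 with bit 30, etc."""
    result = 0
    for i in range(32):
        if (value >> i) & 1:
            result |= 1 << (31 - i)
    return result

def shift_left_fill(value, distance, fill):
    """Five-stage left shifter that fills vacated low bits with `fill`."""
    result = value & MASK32
    for bit in range(5):
        if (distance >> bit) & 1:
            places = 1 << bit
            result = (result << places) & MASK32
            if fill:
                result |= (1 << places) - 1  # set the vacated bits to one
    return result

def shift(value, distance, right=False, arithmetic=False):
    """Left, logical right or arithmetic right shift using one left shifter."""
    value &= MASK32
    # Only an arithmetic right shift of a negative value fills with ones.
    fill = 1 if (right and arithmetic and (value >> 31) & 1) else 0
    if right:
        value = reverse32(value)
    result = shift_left_fill(value, distance & 31, fill)
    if right:
        result = reverse32(result)
    return result

print(hex(shift(0x80000000, 4, right=True)))                   # 0x8000000
print(hex(shift(0x80000000, 4, right=True, arithmetic=True)))  # 0xf8000000
```

After the first reversal the sign bit sits at bit 0, so filling the low bits during the left shift is the same as replicating the sign bit from the top once the result is reversed back.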

Finally we need to use a tristate buffer so that the entire output can be turned off when the ALU wants to ignore the shift output. Our final design consists of the following boards from top to bottom:-

• Reverse
• Shift 1
• Shift 2
• Shift 4
• Shift 8
• Shift 16
• Reverse
• Buffer

The final tower looks like this:-

To test the stack we need a 32bit input, 10 control lines and then be able to examine the 32bit output. Like this:-

Implementing a shift board needs 8 x 74HC241 and the reverse board also uses 8 x '241. We can also make use of the '241 for the buffer board. So I ended up with a total of 56 of them in the stack.

• ### ALU - Less

The third section of the ALU is the ability to compare two numbers and output either 0 or 1 depending on if the first value is less than the second. It has the extra complication of allowing comparison of signed and unsigned numbers.
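To show why the signed and unsigned cases differ, here is a small software sketch of the expected behaviour. The helper names are my own and this models the result only, not the chip-level logic.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(value):
    """Interpret a 32-bit pattern as a two's complement signed number."""
    value &= MASK32
    return value - (1 << 32) if value & 0x80000000 else value

def less(a, b, unsigned):
    """Return 1 if a < b under the chosen interpretation, else 0."""
    if unsigned:
        return 1 if (a & MASK32) < (b & MASK32) else 0
    return 1 if to_signed32(a) < to_signed32(b) else 0

# 0xFFFFFFFF is the largest unsigned value but is -1 when signed.
print(less(0xFFFFFFFF, 1, unsigned=True))   # 0
print(less(0xFFFFFFFF, 1, unsigned=False))  # 1
```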

Implementation is simple enough as it is just a cut down version of the full compare board I have already created elsewhere. It needs two boards stacked in the usual vertical fashion using long female headers. It uses 4 x 74HC682 8bit comparators and then a little gate logic to determine the result. Unfortunately there is just a little too much logic to fit on a single board.

• ### Immediate Values

Many of the RISC-V instructions encode immediate values within the instruction. Converting these encoded values into actual 32 bit values is dependent on the instruction type. There are 6 instruction formats but only 5 contain an immediate value.

Our design spans two boards and takes as input the full 32 bit instruction and control lines indicating the instruction type. (The type is discovered by the decoder elsewhere). The logic is simple enough but not so simple that I avoided a mistake the first time around. Hence the boards are marked as V1.1, as shown below.
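For reference, the five immediate layouts from the RISC-V specification (I, S, B, U and J) can be sketched as below. The function names and the use of a format letter as the selector are assumptions for this example; the boards use dedicated control lines instead.

```python
MASK32 = 0xFFFFFFFF

def bits(value, hi, lo):
    """Extract bits hi..lo (inclusive) of value."""
    return (value >> lo) & ((1 << (hi - lo + 1)) - 1)

def sign_extend(value, width):
    """Sign-extend a `width`-bit value to a 32-bit pattern."""
    sign = 1 << (width - 1)
    return ((value ^ sign) - sign) & MASK32

def immediate(inst, fmt):
    """Rebuild the 32-bit immediate for format I, S, B, U or J."""
    if fmt == "I":
        return sign_extend(bits(inst, 31, 20), 12)
    if fmt == "S":
        return sign_extend((bits(inst, 31, 25) << 5) | bits(inst, 11, 7), 12)
    if fmt == "B":
        imm = ((bits(inst, 31, 31) << 12) | (bits(inst, 7, 7) << 11)
               | (bits(inst, 30, 25) << 5) | (bits(inst, 11, 8) << 1))
        return sign_extend(imm, 13)
    if fmt == "U":
        return inst & 0xFFFFF000  # upper 20 bits, low 12 bits are zero
    if fmt == "J":
        imm = ((bits(inst, 31, 31) << 20) | (bits(inst, 19, 12) << 12)
               | (bits(inst, 20, 20) << 11) | (bits(inst, 30, 21) << 1))
        return sign_extend(imm, 21)
    raise ValueError("format has no immediate: " + fmt)

print(hex(immediate(0xFFF00093, "I")))  # addi x1, x0, -1  -> 0xffffffff
print(hex(immediate(0x123452B7, "U")))  # lui x5, 0x12345  -> 0x12345000
```

The B and J formats scatter their bits around the instruction word and always have bit 0 of the immediate as zero, which is where most of the wiring complexity comes from.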

• ### ALU - Logical Operations

RISC-V specifies three different logical operations, AND, OR and XOR.

Each operation needs 8 chips for the actual operation, such as 8 x 74HC08 for the logical AND of the 32bit A and B inputs, plus another 4 x 74HC241 for the output buffer. All ALU operations are output to the same bus and so each operation needs an output buffer that ensures only one result is output at a time. Unfortunately, 12 ICs is too many to squeeze onto the 100mm x 80mm boards that I use. So we are forced into splitting each operation over two boards, one that does the lower 16 bits and the other the upper 16 bits.

We end up with a stack of 6 boards, with long female headers carrying the A and B inputs and the output result through the entire stack. Control lines determine which operation, if any, is performed.
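A tiny model of the stack's behaviour, assuming a single `selected` value stands in for the control lines and `None` stands in for a bus that nobody is driving (both are my own conventions for the sketch):

```python
MASK32 = 0xFFFFFFFF

OPERATIONS = {
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
}

def logic_stack(a, b, selected):
    """Return the value driven onto the shared bus, or None if all buffers are off."""
    # Every operation computes its result, but only the selected buffer is enabled.
    driven = [op(a, b) & MASK32 for name, op in OPERATIONS.items() if name == selected]
    return driven[0] if driven else None

print(bin(logic_stack(0b1100, 0b1010, "XOR")))  # 0b110
print(logic_stack(0b1100, 0b1010, None))        # None
```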

• ### ALU - Add and Subtract

The ALU for our CPU needs to perform several types of operation. The first of these is the ability to add or subtract two 32-bit numbers. To implement this I have built a stack of four boards that combine together to give the output.

Adding two numbers is simple: the 74283 is a 4-bit adder, so we use 8 x 74283 and chain the carry output of one to the carry input of the next in sequence. This has been implemented as the 'AddSub Middle' board below.

Implementing the subtraction operation requires a bit of extra work as we need to apply 2's complement to the second of the two inputs. 2's complement is the method used to invert the sign of a value, for example converting a positive number like 5 into the negative version -5.

This approach avoids the need to implement a separate subtraction circuit and instead allows reuse of the existing addition circuit. For example, 10 - 5 can be implemented as 10 + (-5).

2's complement is very simple: you just invert all the bits and then add 1. The purpose of the top two boards is to invert all 32 bits of the second input when the control line indicates a subtraction is needed. Adding the extra 1 can be done in the middle layer by providing an initial carry in value to the first adder.
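The same trick can be sketched in software. Here the adder is modelled as a chain of 4-bit slices in the style of the 74283; the function names are invented for the example.

```python
MASK32 = 0xFFFFFFFF

def add4(a, b, carry_in):
    """One 4-bit adder slice: returns (4-bit sum, carry out)."""
    total = (a & 0xF) + (b & 0xF) + carry_in
    return total & 0xF, total >> 4

def add_sub(a, b, subtract):
    """32-bit add, or subtract by inverting b and injecting a carry in of 1."""
    if subtract:
        b = ~b & MASK32          # invert all 32 bits of the second input
    carry = 1 if subtract else 0  # the "+1" of the two's complement
    result = 0
    for nibble in range(8):
        shift = 4 * nibble
        s, carry = add4(a >> shift, b >> shift, carry)
        result |= s << shift
    return result

print(add_sub(10, 5, subtract=True))       # 5
print(hex(add_sub(5, 10, subtract=True)))  # 0xfffffffb, which is -5
```

Feeding the subtract control line into both the inverter and the first carry in means no separate "add 1" step is needed.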

Finally, the bottom board is used to buffer the output, as only sometimes will the ALU want to perform an add/sub operation. All outputs from the different parts of the ALU are connected to the same output bus and so only the appropriate ALU section should put its output on the bus.

• ### Comparator Complete

I have been delayed recently by a bad batch of 16 pin long female headers. They arrived from China but were sub-standard: inserting a connector at the top caused some of the bottom pins to be pushed out. They simply didn't work. But a new batch has arrived that works and we are back in business again.

The 32bit comparator is split into a top and bottom board because there is not enough space to fit it all on a single 100mm x 80mm board. The top board begins by using 4 x 74682 chips to compare each of the 4 bytes from the first and second values.

Only if all four compares are equal is the 32bit compare also an equal. Discovering if the first value is less than the second is more complicated because there are several combinations that result in the less than result: -

• First byte is less than and the top three higher bytes are equal
• Second byte is less than and the top two bytes are equal
• Third byte is less than and the top byte is equal
• Fourth byte is less than
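The four combinations above can be written out as a short sketch. Byte 0 is the least significant byte ("first byte") and byte 3 the most significant ("fourth byte"); `compare32` is a name made up for this example and it treats both values as unsigned.

```python
def byte_of(value, index):
    """Return byte `index` of a 32-bit value, where 0 is least significant."""
    return (value >> (8 * index)) & 0xFF

def compare32(a, b):
    """Combine four 8-bit compares into 32-bit (equal, less) outputs."""
    eq = [byte_of(a, i) == byte_of(b, i) for i in range(4)]
    lt = [byte_of(a, i) < byte_of(b, i) for i in range(4)]
    equal = eq[0] and eq[1] and eq[2] and eq[3]
    less = (
        (lt[0] and eq[1] and eq[2] and eq[3])  # first byte decides
        or (lt[1] and eq[2] and eq[3])         # second byte decides
        or (lt[2] and eq[3])                   # third byte decides
        or lt[3]                               # fourth (top) byte decides
    )
    return equal, less

print(compare32(0x00FFFFFF, 0x01000000))  # (False, True)
print(compare32(7, 7))                    # (True, False)
```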

Adding the logic for calculating the equal and less than outputs of the top board results in the following logic diagram: -

The bottom board takes the basic output of the top board and applies some extra logic. One of the control inputs is called NEG and is used to negate the output. For example, the branch if equal (BEQ) and branch if not equal (BNE) instructions only differ by setting this NEG control line. Another control line is called UNS and determines if the comparison should be signed or unsigned (BLTU, BGEU instructions). Finally a RES line is used to decide if a result should ever be output. This allows the output to be forced to zero. This is needed because the output of the board will be used as the control line input for a MUX elsewhere. We want to force the value to zero unless we are processing a branch instruction.
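A behavioural sketch of what the bottom board has to achieve, under some assumptions of mine: a hypothetical `less_op` flag picks between the equal and less than outputs of the top board, and the signed result is derived by flipping the unsigned less than output whenever the two sign bits differ. The real gate arrangement may well be different.

```python
MASK32 = 0xFFFFFFFF

def branch_output(a, b, less_op, neg, uns, res):
    """Model the comparator output for the given control lines (1 or 0)."""
    a &= MASK32
    b &= MASK32
    equal = a == b
    less = a < b  # the top board's unsigned comparison
    if not uns:
        # For signed values the unsigned answer is wrong exactly when the signs differ.
        signs_differ = bool(((a ^ b) >> 31) & 1)
        less = less != signs_differ
    result = less if less_op else equal
    if neg:
        result = not result  # e.g. BEQ becomes BNE, BLT becomes BGE
    return 1 if (res and result) else 0  # RES low forces the output to zero

print(branch_output(5, 5, less_op=False, neg=False, uns=False, res=True))          # BEQ  -> 1
print(branch_output(0xFFFFFFFF, 1, less_op=True, neg=False, uns=False, res=True))  # BLT  -> 1
print(branch_output(0xFFFFFFFF, 1, less_op=True, neg=False, uns=True, res=True))   # BLTU -> 0
```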

Here is the logic for the bottom board: -

Testing shows that the logic is correct and the boards work as required, although each board only worked on the third attempt at debugging the logic.

The comparator is going to be one of the slowest paths in the whole design. If you look at the worst path through both boards then it has 18 gates to get through. Yikes, good job I am not designing for speed!

• ### 32bit Comparison

My processor needs the ability to compare 2 x 32bit values and output a simple true or false. In addition to the values it has a few control lines that indicate the operation (=, <, >=), if the comparison should be inverted and if the comparison is between signed or unsigned values.

The design is split between a top board that does the basic comparison assuming the values are unsigned and a bottom board that takes into account the negation and unsigned requirements. Testing the bottom board showed it giving incorrect outputs when the negation is applied. So I checked the circuit design so that I could start debugging. Well it didn't take long to find the issue. Here is the last stage of the circuit where it takes negation into account...

Well that is garbage. Instead of negating the output if the NEG input is true, it simply always outputs zero. So a quick fix solves the problem thus...

Amazingly the cost of boards from PCB Way has gotten even cheaper. It is now just $5 for ten boards, so the fix is a pretty cheap one.

• ### Instruction decoder

Instruction decoding is one of the simpler parts of the system. For example, all the branch instructions can be identified using the same 7 bits of the incoming instruction value:-

All branches have the same '1100011' pattern at positions [0-6]. So I only need a single decoder board that compares and matches this 7 bit sequence in order to generate the correct control lines for the rest of the CPU. The actual comparison of registers rs1 and rs2 is performed by a separate comparison board. This comparison board takes as input the 3 bits at positions [12-14] that uniquely identify the actual branch operation. So '000' will mean BEQ and '001' BNE and so forth.
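In software terms the two-step decode looks roughly like this. The function code values come from the RISC-V base instruction set (the 3 bit field sits at bits 12 to 14); the function and table names are just for illustration.

```python
BRANCH_OPCODE = 0b1100011

# Function codes for the six branch instructions in the base instruction set.
BRANCH_NAMES = {
    0b000: "BEQ",
    0b001: "BNE",
    0b100: "BLT",
    0b101: "BGE",
    0b110: "BLTU",
    0b111: "BGEU",
}

def decode_branch(instruction):
    """Return the branch name, or None if this is not a branch instruction."""
    if (instruction & 0x7F) != BRANCH_OPCODE:  # bits 0..6 select the group
        return None
    funct3 = (instruction >> 12) & 0x7         # bits 12..14 select the branch
    return BRANCH_NAMES.get(funct3)

print(decode_branch(0xFE000EE3))  # beq x0, x0, -4  -> BEQ
print(decode_branch(0x00209463))  # bne x1, x2, 8   -> BNE
print(decode_branch(0xFFF00093))  # addi, not a branch -> None
```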

This same approach works for other major groups as well. All the ALU operations that take an immediate value have the same '0010011' pattern. The ALU operations that take two registers have the '0110011' pattern. In both cases the ALU implementation will perform the correct function by using the function code that is part of the instruction format and passed directly to the ALU.

By grouping this way we only need a decoder board for each instruction group that we need to support. As I am only implementing 37 instructions it works out I will need 9 decoder boards.

A single board looks like this:-

At the bottom right is an 8-DIP switch that is used to match against the incoming first 7 bits. This means the left most switch on this DIP package is not connected and ignored, but it is easier to order and use 8-DIP packages than try to find ones that support exactly 7.

Above that are two more DIP packages that are used to define the 16 control output signals. I am assuming that my final design will not have more than 16 different MUX or other boards that need control lines!

By making the board completely configurable I have the flexibility to implement the rest of the CPU in multiple ways and to change the design if needed. Like my other boards these can be stacked vertically because I am using long female headers for the instruction input, control line outputs and the power connection. So I anticipate a final setup with 9 of these high.

I use a 74HC688 for the bit pattern comparison, 2 x 74HC241 output drivers for the 16 control lines and a 74HC04 to invert the output of the '688 for use with the output drivers.
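The behaviour of a stack of these boards can be modelled as below. The class name and the two control words are placeholders I made up; only the two 7 bit patterns are taken from the text above.

```python
class DecoderBoard:
    """One configurable board: a 7 bit match pattern and 16 control outputs."""

    def __init__(self, pattern, controls):
        self.pattern = pattern & 0x7F     # set on the lower 8-DIP switch
        self.controls = controls & 0xFFFF  # set on the two upper DIP switches

    def drive(self, instruction):
        """Return the control word on a match, or None when the drivers are off."""
        if (instruction & 0x7F) == self.pattern:
            return self.controls
        return None

def decode(stack, instruction):
    """Return the control word from whichever board matches, if any."""
    driven = [c for c in (board.drive(instruction) for board in stack) if c is not None]
    if len(driven) > 1:
        raise ValueError("two boards are configured with the same pattern")
    return driven[0] if driven else None

# Placeholder control words: bit 0 for branches, bit 1 for ALU immediates.
stack = [DecoderBoard(0b1100011, 0x0001), DecoderBoard(0b0010011, 0x0002)]
print(decode(stack, 0xFE000EE3))  # branch instruction -> 1
print(decode(stack, 0xFFF00093))  # ALU immediate      -> 2
print(decode(stack, 0x0000006F))  # no board matches   -> None
```

Because each board is configured with a different pattern, at most one board ever drives the shared control lines.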

• ### Let it fly!

It took a whole weekend but finally the register file is complete. I took a delivery of new boards last week and the first to be built is the register control board. This is used to drive the 16 individual registers. Once finished it looks quite impressive...

The control board takes three 4-bit values: the register to output on port A, the register to output on port B and the register to be written to. Our control board has three output headers that are each 16 lines wide. So register zero takes bit 0 from each of the three headers as its input. Register one takes bit 1 from all three headers and so forth. This results in the monster wiring loom shown above. I had to make it all by hand using DuPont connectors, a crimper and some heat shrink tubing.
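The wiring pattern amounts to three 4-to-16 decodes, with each register picking off its own line from every header. A small sketch, assuming exactly one line per header is active at a time and ignoring any separate write enable (the function names are mine):

```python
def one_hot16(index):
    """Decode a 4-bit register number into 16 lines with a single line active."""
    return [1 if line == (index & 0xF) else 0 for line in range(16)]

def register_controls(port_a, port_b, write):
    """Return, for each of the 16 registers, its (A, B, write) control lines."""
    header_a = one_hot16(port_a)
    header_b = one_hot16(port_b)
    header_w = one_hot16(write)
    # Register n is wired to line n of each of the three headers.
    return list(zip(header_a, header_b, header_w))

controls = register_controls(port_a=2, port_b=5, write=2)
print(controls[2])  # (1, 0, 1): register 2 drives port A and is written
print(controls[5])  # (0, 1, 0): register 5 drives port B
```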

It would have gone much faster except for some errors that took a long time to notice. Each of the two register stacks had misconnected pins in the middle of the stacks. When connecting them together it can be tricky getting all the 100 long headers to match up. Well I hadn't noticed a couple that had missed on insertion and were not connected. Compounding this I found that one of the ribbon cables I was using to make a test connection had two of the wires transposed. Luckily it was a rainbow coloured cable so I eventually noticed that red and blue lines were in the opposite order at the other end of the cable. Otherwise I would still be scratching my head!