-
Blinking action - testing the hardware
02/22/2020 at 17:17 • 1 comment
Above you can see an image of the hardware while it is operating. The red LEDs are part of the low-threshold gates in the latches, while all other gates use green LEDs to set the threshold to ~2.3V.
A size comparison with TTL ICs. The gate density on the PCB in NAND equivalents is slightly lower than what would be achieved with 7400 NAND gates. Something to be addressed in later iterations...
The scope image above shows the A0 and A2 outputs of the address unit with SEL_DATA=0 and SEL_PC=1. In this configuration, the PC is incremented on every clock cycle. Here, a 2 MHz clock was applied. Operation was also successful at 4 MHz. I was not able to test any higher due to limitations on the generator side. Obviously the maximum clock rate would drop when more 4-bit slices of the address path are combined, because the length of the carry chain would increase. One can see from the signal shape that the operating speed is very much limited by the rising edge. Right now, the collector current is limited to 2mA. Increasing the collector current further would allow even higher clock rates at the expense of power. Introducing push-pull output drivers may also be an option.
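For reference, the frequencies to expect on the address lines follow directly from the PC acting as a binary counter that increments once per clock cycle. The Python sketch below only illustrates that arithmetic; the function name is made up for this example and nothing here is measured data.

```python
def address_bit_frequency_hz(clock_hz, bit):
    """Square-wave frequency on address line A<bit> of a binary counter.

    Assumes the counter increments once per clock cycle, so bit n
    completes one full period every 2**(n + 1) clock cycles.
    """
    return clock_hz / (2 ** (bit + 1))


if __name__ == "__main__":
    for bit in (0, 2):
        freq_khz = address_bit_frequency_hz(2_000_000, bit) / 1000
        print(f"A{bit} at a 2 MHz clock: {freq_khz:.0f} kHz")
```

With the 2 MHz clock used here, A0 should toggle as a 1 MHz square wave and A2 as a 250 kHz square wave.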
Finally some blinking action, clocked at 2 Hz! Higher resolution here. The red LEDs on the left side of the PCB indicate the state of the address lines and are not part of any gate. You can see how the PC is counting up.
Btw, only after looking at this animation, I realized that the fast (red) gates actually reflect the output state of the latch, even though they are not connected to the output. So, next time I can omit additional indicator LEDs.
-
The data-datapath design
03/04/2020 at 06:25 • 0 comments
Time for another update. Mind you, at this point I am not documenting work that was already done in the past, but am following my actual development in real time. Therefore updates will be spaced much further apart, depending on how much time I can devote to this project.
After finishing the address-datapath design, it is time for the data-datapath.
The data-datapath encompasses the left side of the system design. The original MCPU ALU was rather simple and only supported the two operations that were needed directly by the ISA: (Y,Cout)=A + B, Y=A NOR B.
However, I changed the datapath a little to allow the use of latches in the design. As a consequence, all reads are directed through the ALU. Storing the accumulator to memory also requires data to pass through the ALU. Therefore, in addition to the operations above, Y=A and Y=B also have to be supported. When looking at the detailed logic, this adds quite some overhead.
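To make the required operation set concrete, here is a small behavioral model in Python. It is only a sketch: the operation names (`ADD`, `NOR`, `PASS_A`, `PASS_B`) are invented for this example, and reporting a carry of 0 for the non-arithmetic operations is an assumption, not something taken from the circuit.

```python
WIDTH = 8                  # data width of the final CPU (two 4-bit slices)
MASK = (1 << WIDTH) - 1


def alu(op, a, b):
    """Return (y, carry_out) for one of the four required operations.

    Operation names are hypothetical labels for this sketch. The carry
    of the non-ADD operations is simply reported as 0 (an assumption).
    """
    if op == "ADD":        # (Y, Cout) = A + B
        total = a + b
        return total & MASK, total >> WIDTH
    if op == "NOR":        # Y = A NOR B
        return ~(a | b) & MASK, 0
    if op == "PASS_A":     # Y = A, e.g. storing the accumulator
        return a & MASK, 0
    if op == "PASS_B":     # Y = B, e.g. loading from memory
        return b & MASK, 0
    raise ValueError(f"unknown operation: {op}")
```

For example, `alu("ADD", 200, 100)` wraps around to 44 with the carry set, which is the behavior the branch-on-carry instruction relies on.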
A lot of deliberation followed, including whether to merge the accu and data latch into one edge-triggered register as done in the original MCPU. The problem is that, in that case, an additional data input latch has to be added to the address-datapath. In the end it turned out to be more part-count efficient to stay with the latch-based architecture.
Two different designs were evaluated in detail.
A first implementation option for a single bit of the data-datapath is shown above. This design implements a full adder consisting of two XNOR2 gates, an inverter and an AOI2 gate. Y=A/Y=B/Y=A OR B is realized with an AOI3-based multiplexer in the first stage. The carry line can be fixed to zero or one by using the Sinv and Sadd control inputs. This allows configuring the second XNOR gate to invert the result, resulting in Y=A NOR B.
The table above shows the individual component usage. The total part count for this option is 80 per bit.
The second option I looked into is a bit more universal and uses an AND-OR-INVERT based multiplexer for the first stage of the ALU. (EDIT: 20/03/28, updated to fixed version.) By using the four control signals, it is possible to generate any boolean function of the two inputs, offering the possibility to extend the instruction set of the CPU at a later time. This design is, of course, heavily inspired by the MT-15 ALU. Since the first stage is now also able to implement NOR, it was possible to remove the Sinv control signal. Instead only Sadd is present, which will inhibit carry generation if set to 0.
Component requirements are shown in the table above. The total number of components is 82, only two more than the less versatile first option. Therefore I elected to go with this option, since it allows more flexibility.
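The idea behind the four control signals can be illustrated with a few lines of Python: the control word acts as a 4-entry truth table that is addressed by the two input bits. This is only a behavioral sketch of the first stage. The bit ordering of the control word is an assumption, the real AOI stage is inverting so the actual polarity may differ, and the adder stage with Sadd is not modeled.

```python
def boolean_unit(ctrl, a, b):
    """One-bit function generator: ctrl is a 4-bit truth table.

    Bit (2*a + b) of ctrl is selected as the output. This bit ordering
    is an assumption of the sketch, not taken from the schematic.
    """
    return (ctrl >> (2 * a + b)) & 1


def truth_table(ctrl):
    """Outputs for (a, b) = (0,0), (0,1), (1,0), (1,1)."""
    return tuple(boolean_unit(ctrl, a, b) for a in (0, 1) for b in (0, 1))


# Control words for a few functions under the assumed bit ordering.
CTRL_NOR = 0b0001     # 1 only for a=0, b=0
CTRL_PASS_A = 0b1100  # 1 whenever a=1
CTRL_PASS_B = 0b1010  # 1 whenever b=1
CTRL_XOR = 0b0110     # 1 when a and b differ

# Every one of the 16 control words yields a different function.
ALL_FUNCTIONS = {truth_table(c) for c in range(16)}
```

Since `ALL_FUNCTIONS` contains 16 distinct truth tables, all 16 possible functions of two inputs are reachable, which is what leaves room for extending the instruction set later.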
The design functionality was tested in the test bench above.
The image above shows carry propagation through all stages. The carry is active low. The carry propagation delay is approximately 11 ns per stage (45 ns total for 4 stages). Although this will probably increase in the real circuit due to additional parasitics, it is a very reasonable result and suggests that an acceptably high clock speed can be achieved even without resorting to a more elaborate carry chain architecture.
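A quick back-of-the-envelope calculation shows what this means for the full 8-bit datapath. The Python sketch below simply scales the simulated 11 ns per stage linearly; it ignores latch, multiplexer and control delays as well as real-world parasitics, so the resulting clock figure is an optimistic upper bound from the carry chain alone, not a prediction. The function names are made up for this example.

```python
STAGE_DELAY_NS = 11  # simulated carry delay per bit, as quoted above


def ripple_delay_ns(bits, stage_delay_ns=STAGE_DELAY_NS):
    """Worst-case ripple-carry delay, assuming it scales linearly."""
    return bits * stage_delay_ns


def carry_limited_clock_mhz(bits, stage_delay_ns=STAGE_DELAY_NS):
    """Clock rate at which one period equals the carry delay alone."""
    return 1000.0 / ripple_delay_ns(bits, stage_delay_ns)


if __name__ == "__main__":
    for bits in (4, 8):
        print(f"{bits} bits: {ripple_delay_ns(bits)} ns, "
              f"carry chain alone allows up to "
              f"{carry_limited_clock_mhz(bits):.1f} MHz")
```

For two combined 4-bit boards this gives 88 ns of carry delay, which still leaves a comfortable margin at the 2 to 4 MHz range tested so far.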
-
ALU board PCB Design
03/18/2020 at 19:22 • 0 comments
The ALU board is quite a bit more complex than the datapath board. To still be able to meet a <10x10cm² footprint, the layout had to be made denser than that of the address-datapath board.
The circuit of one slice of the ALU board is shown above. The number of dual diodes is quite high in this design due to the wide-input AOI gates used as multiplexers for the boolean units. The XNOR gate on the right side is based on a cross-coupled transistor pair. No LED could be used here due to the stacking of several PN junctions in the current path.
Eagle was used for the layout, making extensive use of the block functionality. Since only a two layer PCB is used, a good strategy is needed to make the layout as tight as possible.
As a general rule, I used the front side for local interconnects and the power rails. Each slice was laid out with a fixed height of 0.65", which allows placing two gates between the power rails. Copper fill on both front and rear side is used for the ground plane. The rear side is only used for global routing (I/O and control signals).
After doing some estimations regarding copper line resistivity I noticed that it was possible to go to much smaller line widths.
The image above shows a comparison of the old and new polarity hold latch layout. The new layout is much denser and takes 30% less PCB space. One interesting observation was that going from 0603 to 0402 resistors actually increased the layout area, since it was no longer possible to pass connections below the components. Therefore I stayed with 0603 passives. Using smaller transistor packages was not an option either, since I was not able to source the transistor in a smaller package.
The full layout, excluding ground plane, is shown above. Again, it was possible to fit four bit-slices on one board. In contrast to the previous design, I decided to include buffers for the control signals, located at the top. The fan out of the clock signal on this board is 8 gates. Combining several boards requires a clock tree for proper drive-ability.
The total PCB size is 10x7 cm², only marginally larger than the address board. Due to the denser layout it contains 379 devices (75xT, 95xD, 67xLED, 122xR, 20xC), 41% more than the datapath board.
Board render is shown above. The populated boards should arrive in a few weeks.
-
Testing the ALU Board
03/28/2020 at 17:53 • 1 comment
Thanks to express shipping, the populated PCBs arrived less than a week after ordering them. A top-down photo of one ALU board with 4 bit-slices is shown below.
The dual-input diodes for the boolean unit multiplexer line up neatly in one row.
I tested the board using an ATMega168 to generate stimulus signals. Unfortunately, I found a small mistake I made earlier in the design: The generate term in the carry chain should not have an additional carry input, see below. Luckily it was easily possible to fix this by removing one diode per slice from the board. You can see the unpopulated pads in the center of the top down image.
Removing this signal at the design stage would have allowed a significantly denser layout and left room to add the carry-chain inverter to the output, so that the carry signal is non-inverting. But well, next time...
The list below shows the control signal configurations that were tested. Only the first three cases apply to operations that are used by the MCPU ISA. In addition, Y=B is needed for address loading.
The other operations would allow for a later extension of the instruction set.
Clock speed was tested up to 2 MHz. Even for the worst case (ADD with carry in=1), the signal integrity looked excellent, suggesting that 4 MHz and higher may also pass. One limitation seems to be the pull-up capability of the clock driver with a fan-out of 8. It may be a good idea to reduce the collector resistor on high fan-out drivers or to design a push-pull driver.
Finally, the ALU board in action, repeating a sequence of ADD A,3 and NEG A:
-
Designing the Control Unit
04/27/2020 at 20:30 • 0 comments
Designing the control unit is one of the more complex tasks in MCU design. What used to be a few lines of VHDL turns out to be a tedious task when boiling it down to discrete gates. Or maybe I need a better design flow...
Let's look at the problem at hand. The final LCPU will consist of an 8-bit datapath and an 8-bit address path. The upper two bits of the address path will be used for the opcode. Since both the datapath board and the ALU board are 4 bits wide, two of each have to be used.
To complete the CPU, we need a control unit that outputs the right control signals depending on the opcode and the state machine cycle.
The original MCPU has a rather simple control unit that consists of a state machine that is directly mapped to opcodes. Decoding of the states is localized at the datapath. Since the LCPU uses a bitslice approach, it is more efficient to also include decoding of states into control signal in a centralized control unit. Generally, that is a design choice that allows for extending of the design and easier bugfixing. In terms of robustness it would be better to do localized state decoding to reduce the number of control signals.
There are 8 ALU board control inputs (Carry In, CLK_ACCU, CLK_DAT, Aluctrl[4:0]) and one output (Carry Out). The address board has 4 control inputs (SEL_PC, SEL_DATA, nCLK_PC, nCLK_ADR) and two outputs (OPC1,OPC0 -> A[7:6]).
The control board itself has a few internal registers for carry and states, and inputs for a two-phase clock (phi1,phi2) and reset.
As a first step, I mapped all control signals to states and clock sub-cycles. The LCPU is still able to execute every instruction in a maximum of two clock cycles, despite switching from edge-triggered FFs to latches.
ALU control signal encoding as a function of opcode and state.
Many of the simpler control signal encodings could be directly mapped to single gates. In addition, you can find three latches for the carry signal and state encoding here, in what could best be described as a "sea of gates". Very tedious to lay out on the PCB, but straightforward to implement using the building blocks I designed earlier.
Of much more interest is the AND-OR-INVERT logic array that is used to decode the more complex signals, specifically the opcode into ALU control signals. You can see the circuit below.
Instead of using double diodes I switched to single diodes to allow layout in a regular matrix. Input signals are routed horizontally on the rear side, output terms vertically on the front side. Typically both the inverted and non-inverted input signals are provided by means of the inverters on the left side. Each column represents a multiple-input AND gate (minterm). By leaving a diode unconnected or connecting it to the positive or negative input signal, it is possible to create any combination of AND terms. The open-collector inverters at the bottom perform a "NOR" operation on any number of input terms. By combining these, it is possible to implement any boolean operation with only minimal changes in the layout. I decided to also populate unused diodes in the matrix to allow for later fixes.
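The behavior of such an array is easy to model in software. The Python sketch below treats each column as a list of connections, one per input: 1 for a diode to the non-inverted signal, -1 for a diode to the inverted signal, and 0 for no connection. This encoding and the function names are invented for the example; the actual wiring of the board is not reproduced here.

```python
def minterm(inputs, column):
    """AND of all literals wired into one column of the array.

    column[i] is 1 (diode to the true input), -1 (diode to the inverted
    input) or 0 (no connection). Hypothetical encoding for this sketch.
    """
    for value, link in zip(inputs, column):
        if link == 1 and not value:
            return 0
        if link == -1 and value:
            return 0
    return 1


def aoi_output(inputs, columns):
    """NOR of all minterm columns that share one output line."""
    return 0 if any(minterm(inputs, c) for c in columns) else 1


# Example: XNOR of two inputs as NOR(a AND NOT b, NOT a AND b).
XNOR_COLUMNS = [(1, -1), (-1, 1)]

if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(a, b, aoi_output((a, b), XNOR_COLUMNS))
```

Changing a function then only means changing entries in the column lists, which mirrors how a fix on the board comes down to moving or removing diodes.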
The full layout of the logic array is shown above. As you can see, the schematic represents the arrangement of the layout quite well. The density is quite good, the full layout is just 45x33mm². Even though the array itself does not use LEDs, the input and output inverters are true LTL gates, so that blinkiness is guaranteed!
For the sake of completeness, the full schematics for the board level implementation are shown above.
You can see the full board with a few annotations above. (This feels like annotating an IC...). Total size is 78x50mm². Off to manufacturing...
-
Testing the Control Unit
05/31/2020 at 22:43 • 2 comments
The assembled control unit already arrived a while ago, but I only got around to testing it today.
You can see a photograph of the PCB above. Unfortunately I already noticed a stupid mistake at this level: I forgot to route the reset signal to the I/O header. Since this was a circuit level mistake, no DRC caught this. Well, luckily that's easily fixed with an additional wire directly to the reset-driver.
Due to the construction of the logic array, there is an LED for every row (input) and column (minterm/output), creating a very nice regular structure. Every minterm has an output inverter and uses wired-AND as a combiner. See the previous log on the control unit design.
Verifying the control unit is not that easy due to the high number of outputs with irregular behavior in combination with several internal states. I used an ATMega168 microcontroller to exercise every possible sequence and record the outputs.
Each of the four instructions is tested individually, once for carry high and once for carry low. The state of the carry flag only plays a role for the branch instruction. Each instruction test sequence consists of three cycles: In the first cycle reset is high to reset the state registers, then the instruction is clocked for two cycles to exercise both possible instruction states. (I also tested longer sequences to make sure the state machine jumps back to S0)
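The structure of these test sequences can be written down compactly. The following Python sketch generates the stimulus as (opcode, carry, reset) tuples, one per clock cycle. It is not the actual ATMega168 test firmware; the tuple layout and the order in which opcodes and carry values are visited are assumptions made for illustration.

```python
from itertools import product


def test_vectors():
    """Yield (opcode, carry, reset) tuples, one per clock cycle.

    Four 2-bit opcodes times two carry values, three cycles each.
    The iteration order is an assumption of this sketch.
    """
    for opcode, carry in product(range(4), (0, 1)):
        yield (opcode, carry, 1)  # cycle 1: reset high clears the state registers
        yield (opcode, carry, 0)  # cycle 2: first instruction state
        yield (opcode, carry, 0)  # cycle 3: second instruction state


if __name__ == "__main__":
    for vector in test_vectors():
        print(vector)
```

This gives 24 cycles in total, few enough that checking the recorded waveforms by eye remains practical.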
The full sequences are listed below. The output was verified with the tried-and-tested "visual inspection of output waveforms" method, which is still completely ok for designs of this size.
There is a slight misbehavior of nCLKaddr during the first test sequence, but this is inconsequential and seems to be related to an issue with the test setup. Apart from that, everything was tested to work fine.
You can see the control unit in action above. Due to its irregular pattern, this truly seems to be "peak blinkenlight".
The test of the control unit concludes the design of all major functional units of the LCPU: ALU, address path and control unit.
Now on to a slightly more annoying part: Designing the backplane board. I still have no idea how to do it in a way that does not obscure all the nice LEDs. Apart from that, the backplane needs to hold the memory and I need to find a way to integrate bootloading and output in an efficient manner.