20 hours ago •
The TOM-1 can add two numbers together to get 0x10 (seen in the lower left red LED strip). The 16-pin grey ribbon cable connects to the ALU on the right (breadboards #7 and #8). I'm debating the best strategy to wire circuits together, going back and forth between ribbon cables and repositioning chips to allow better direct wiring.
07/13/2020 at 04:54 •
When I started this project I wasn’t sure whether I would attempt to breadboard this design, but now I’m grateful I tried. It looks like this will span several breadboards so I’ve connected two sets of four together, which should be enough? As a way of debugging the circuit immediately without fussing with EPROM programming and counter logic, I am using an Arduino-like from Adafruit to drive the ROM bus and perform branches. I also put together a step circuit using a 555 where each clock cycle can be controlled by a button press.
The biggest challenge seems to be how to wire up a 16-bit bus, which consumes a lot of wires and makes wiring really tedious. I’m trying to split all logic into 8-bit slices, so the high and low bytes of the CPU bus can be handled separately. On the right module, breadboards 1 and 3 correspond to the high byte, while 2 and 4 control the low byte. The 16-pin IDC interface is the TOS bus, which loops back to the input of the Top of Stack register on the left module. Hopefully this breakdown makes it easy to keep wires under control when I start putting together the bus.
My biggest blocker: I didn’t learn about the distinction between the 373 (silly pin order) and 573 (left to right pin order) before trying to use these chips. Now I’m stuck waiting for them to come in, since the alternative is cutting a lot more custom wire lengths!
07/09/2020 at 04:42 •
A short update: I figured out that Digital supports a fairly flexible graphics integration called "Graphics RAM" that is a great way to test out CPU loops and timing, and also makes demos more interesting. In the interest of keeping chip count small, I wanted to see if the TOM-1 could support just a single bidirectional transceiver (74LS245) for all of its I/O. I tested this out with Graphics RAM by first writing a byte for the address (into an 8x8 grid) and then a second byte for the graphics value. With some additional circuitry, the TOM-1 can draw 256 color values onto an 8x8 Graphics display. The open question is whether it can do so using SPI to draw onto a command-based display, like Adafruit's SPI TFT display.
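A minimal Python model of that two-byte write sequence, for anyone who wants to reason about it without opening the simulator. The class name `GridDisplay`, the row-major layout, and wrapping the address to 0..63 are assumptions made for illustration; this is not how Digital's Graphics RAM component is implemented.

```python
class GridDisplay:
    """Hypothetical model of an 8x8 display written through one 8-bit port."""

    WIDTH = 8
    HEIGHT = 8

    def __init__(self):
        self.pixels = [0] * (self.WIDTH * self.HEIGHT)
        self._pending_address = None  # None means the next byte is an address

    def write(self, byte):
        byte &= 0xFF
        if self._pending_address is None:
            # First byte of a pair: latch the cell address (assumed 0..63).
            self._pending_address = byte % len(self.pixels)
        else:
            # Second byte of a pair: store one of 256 color values.
            self.pixels[self._pending_address] = byte
            self._pending_address = None

    def pixel(self, x, y):
        return self.pixels[y * self.WIDTH + x]


display = GridDisplay()
for byte in (0x09, 0xE0):  # address 9 is column 1, row 1; value 0xE0
    display.write(byte)
```

The point of the alternating address/value scheme is that a single 8-bit port is enough: no separate address lines are needed on the I/O side.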
07/08/2020 at 00:54 •
I finally have enough parts to start breadboarding! I bought a 3742-Contact Point Elenco Breadboard which houses four regular sized breadboards. I think I'll wind up needing two of them. Here's an animation of me testing out the 74LS283 adder chip, with inputs A1 and B1 being controlled by the buttons and Σ1 and Σ2 being wired to the LEDs. I've never done a massive breadboarding project like this before, so I anticipate being pretty slow at it.
07/03/2020 at 19:24 •
06/24/2020 at 05:42 •
I needed a better way to test the branching logic of the CPU and decided to come up with a fairly basic "assembler". Taking inspiration from Forth, we can invent a very simple programming language from scratch by taking a string, splitting it on whitespace into a list of tokens, and then converting each token into a 4-byte instruction in the code binary that the circuit runs. With just a little compiler magic we can throw in some convenience features like named labels and mixed decimal and hex (0x-prefixed) numbers.
Here's an example of a test that verifies subtraction:
from tom1 import *

labels = generate("""
[start]
0x8372 0x35aa -1 ~& 1 + + [check_subtraction]
0 branch0 start
""")

debug()
step_until(pc=labels['check_subtraction'])
validate(tos=0x4dc8)
print("success")
This language isn't actually Forth but something much less powerful. Here's the module "tom1" imported at the top of the file: tom1.py
So in the above script, the call to generate("""...""") takes a TOM-1 "script" as its first argument and compiles it, generating a .hex binary file that the CPU in the simulator can use. generate() also returns a value, labels, that maps each label name to its location in the program so the test can refer to it. debug() and the calls that follow it are plain Python that runs the CPU interactively: they step the CPU until it hits a specific label and then validate that the CPU values are correct. If the script doesn't throw during a call to validate(), then the test succeeded.
Let's look at the first and last lines of the script inside the triple quotation marks ("""):
"""
[start]
...
0 branch0 start
"""
[start] declares a label that we can conditionally jump to. When the compiler sees it, it will treat the token start everywhere else in the script as a reference to the location in the program where you wrote [start].
The token 0 means the CPU will push the number 0 to the stack with its "push_literal" opcode. The next token branch0 is implemented on the CPU as a "jump_if_0" opcode that rewrites the Program Counter with a new address if and only if the value on the top of the stack is equal to 0. So 0 branch0 is a two-opcode way to "always branch". All the tests end with a jump to the start of the program since there is no way to stop the CPU...
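To make the token and label handling concrete, here is a rough two-pass sketch in Python. It is not the project's tom1.py: the function names, the opcode names "add" and "nand", the idea that branch0 takes the following token as its target, and the choice to count label addresses in bytes (4 per instruction) are all assumptions.

```python
INSTRUCTION_BYTES = 4  # the log says each token becomes a 4-byte instruction


def parse_number(token):
    """Return the token as a 16-bit value, or None if it is not a number."""
    try:
        return int(token, 0) & 0xFFFF  # base 0 accepts both "42" and "0x2a"
    except ValueError:
        return None


def assemble(source):
    """Return (program, labels); program is a list of (opcode, operand)."""
    program, labels = [], {}
    tokens = iter(source.split())
    for token in tokens:
        if token.startswith("[") and token.endswith("]"):
            # A label emits no code; it names the address of the next instruction.
            labels[token[1:-1]] = len(program) * INSTRUCTION_BYTES
        elif token == "branch0":
            # Assumed here: the branch target is the token that follows.
            program.append(("jump_if_0", next(tokens)))
        elif token == "+":
            program.append(("add", None))
        elif token == "~&":
            program.append(("nand", None))
        elif parse_number(token) is not None:
            program.append(("push_literal", parse_number(token)))
        else:
            raise ValueError(f"unknown token: {token!r}")

    # Second pass: swap label names for addresses, so forward references work.
    resolved = []
    for opcode, operand in program:
        if isinstance(operand, str):
            value = parse_number(operand)
            operand = labels[operand] if value is None else value
        resolved.append((opcode, operand))
    return resolved, labels


program, labels = assemble("""
[start]
0x8372 0x35aa -1 ~& 1 + + [check_subtraction]
0 branch0 start
""")
```

Resolving labels in a second pass is what lets a script jump forward to a label that has not been seen yet, not just back to [start].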
The middle line in the script is the actual test:
0x8372 0x35aa -1 ~& 1 + + [check_subtraction]
This is actually an executable program that subtracts two numbers! The CPU needs a lot of assistance from a compiler to abstract this away, because the CPU can do very little. For now I'm forced to write this out by hand. Let's break this down in steps:
0x8372 0x35aa => We push these two values onto the stack. The top of the stack is now 0x35aa.
-1 ~& => We push the value -1 (0xFFFF) and then NAND the top two values on the stack. The token for the NAND opcode is arbitrarily ~& so it looks similar to the other arithmetic function +. After this opcode, the top of the stack will be equal to "0x35aa nand 0xFFFF", which is 0xca55. This is a bit hack that inverts 0x35aa without needing a dedicated "invert" opcode.
1 + => We need to add 1 to the inverted number, giving 0xca56. There's a difference between inverting a number and turning it negative, and that difference is two's complement.
+ => We add the top two values on the stack, which are now the first number we pushed (0x8372) and the two's complement of the second (0x35aa). Adding them together performs subtraction: the carry out of the 16-bit sum is discarded, and we get our one result on the stack: 0x4dc8.
And there you have it, arithmetic subtraction on the TOM-1 implemented in a blazing, uh, 14 clock cycles. The final token in the script [check_subtraction] is used for testing but also is a helpful comment, since this code is very hard to follow. Eventually I'll have to stop working on an assembler and start working on an actual compiler, so -1 ~& 1 + + can be tucked away inside a function.
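The arithmetic above is easy to check off-line. This is a small Python sketch of a 16-bit stack evaluator that understands only literals, + and ~&; the function name `run` is made up here, and it models the arithmetic only, not the CPU's timing or its stack hardware.

```python
MASK = 0xFFFF  # 16-bit data path


def run(script):
    """Evaluate a whitespace-separated script; return the final stack."""
    stack = []
    for token in script.split():
        if token == "+":
            b, a = stack.pop(), stack.pop()
            stack.append((a + b) & MASK)        # ADD, carry out discarded
        elif token == "~&":
            b, a = stack.pop(), stack.pop()
            stack.append(~(a & b) & MASK)       # NAND
        else:
            stack.append(int(token, 0) & MASK)  # push literal; -1 becomes 0xFFFF
    return stack


print([hex(v) for v in run("0x8372 0x35aa -1 ~&")])        # ['0x8372', '0xca55']
print([hex(v) for v in run("0x8372 0x35aa -1 ~& 1 + +")])  # ['0x4dc8']
```

Seven tokens at two clock cycles per opcode gives the 14 cycles mentioned above.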
06/23/2020 at 00:43 •
When coming up with requirements for TOM-1, I knew that the opcode space would be limited enough that the system would probably not require microcode, since I had seen Forth implementations which did not require many instructions. This meant I didn't tackle the question of how opcodes actually would be decoded until later. This organically built up into a collection of signals I could embed ad-hoc in a binary generated by Python:
def push_literal(arg):
    writeh(
        TOS_BUS_ROM | CCK | DR_UnD | STACK_W_nR,
        TOS_BUS_ROM,
        arg
    )
This function would write out the TOM-1 "push literal" opcode, by writing the CPU flags for two clock cycles (a full opcode) and a 16-bit operand. Take a look at the TOM-1 system diagram, or read this explanation: On the first clock cycle, we ask for the ROM to be on the TOS (top of stack) bus, for the D register to increment, and for us to write the value of TOS to the data stack. On the second clock cycle, we just leave ROM on the TOS bus. At the end of the second cycle TOS latches the 16-bit new value from ROM.
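For readers without the source handy, here is one way the four bytes of a "push literal" could end up in the ROM image. Everything specific in this sketch is a guess: the bit positions of the four flags, the body of `writeh`, and the big-endian operand order are placeholders, since the log only says that an instruction is one flag byte per clock cycle plus a 16-bit operand.

```python
# Hypothetical bit assignments; the real values live in the project's Python tools.
TOS_BUS_ROM = 0x01
CCK = 0x02
DR_UnD = 0x04
STACK_W_nR = 0x08

rom = bytearray()


def writeh(flags_cycle_1, flags_cycle_2, operand):
    """Append one instruction: a flag byte per clock cycle, then 16 bits."""
    rom.append(flags_cycle_1 & 0xFF)
    rom.append(flags_cycle_2 & 0xFF)
    rom.extend((operand & 0xFFFF).to_bytes(2, "big"))  # byte order is a guess


def push_literal(arg):
    writeh(TOS_BUS_ROM | CCK | DR_UnD | STACK_W_nR, TOS_BUS_ROM, arg)


push_literal(0xCAFE)
print(rom.hex())  # 0f01cafe
```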
Having written this out, it's clear that describing these flags isn't helpful in forming an "intuition" of what is happening in the CPU with each opcode. After struggling with different 2:4 and 3:8 decoder strategies, I actually came across another microcode-less design which seemed to share some design goals, as well as having an 8-bit opcode width. Here is the opcode format used by the #Microcode-less TTL CPU :
Opcode format:
    7        6        5        4        3        2        1        0
+--------+--------+--------+--------+--------+--------+--------+--------+
|Carry fb| On Zero| Src_0  | Src_1  | Src_2  | Dst_0  | Dst_1  | Dst_2  |
+--------+--------+--------+--------+--------+--------+--------+--------+
The instruction decoder is composed of two demultiplexer ICs (2 x 74HC138) and is driven by the instruction register. A CPU instruction word is 8 bits wide, 3 bits select the data source and 3 bits select the destination. Each demultiplexer ICs apply the control signals to the selected destination/source at the execute phase (3. phase). The instruction decoders (2x74HC138) use up 6 bits of an instruction word, I used the remaining two bits for instruction modifications [carry and zero].
Just from looking at this encoding, we can infer a few things about the CPU:
- It can only move a value from a "src" register to a "dst" register each clock cycle
- There are up to eight "src" registers and up to eight "dst" registers, and the two may be distinct sets
- Conditional logic is done via the "on zero" flag
This is confirmed in the project's description which reads "Eventually each CPU instruction is a hardwired MOVE instruction, the instruction code itself determines the component that will be the data source (e.g. accumulator, input port, RAM, program memory, etc...) and the data destination (accumulator, adder, inverter, output port, program counter, etc...)." The circuit is an example of transport-triggered architecture. From taking a look at the schematics, "src" registers appear to include the accumulator and program memory, and destinations include the program counter, accumulator, adder, and inverter. Other chip enable lines appear to be cycle-dependent.
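The field layout of that 8-bit instruction word can be pulled apart in a few lines of Python. This is a sketch based only on the diagram quoted above: the name `Move`, the choice to treat bit 5 and bit 2 as the high bits of their fields, and the absence of any mapping from field values to actual registers are assumptions, not details taken from the Microcode-less TTL CPU project.

```python
from typing import NamedTuple


class Move(NamedTuple):
    carry: bool    # bit 7, "Carry fb" in the diagram
    on_zero: bool  # bit 6
    src: int       # bits 5..3, selects one of eight sources
    dst: int       # bits 2..0, selects one of eight destinations


def decode(opcode):
    """Split an 8-bit instruction word into the fields shown in the diagram."""
    return Move(
        carry=bool(opcode & 0x80),
        on_zero=bool(opcode & 0x40),
        src=(opcode >> 3) & 0b111,
        dst=opcode & 0b111,
    )


print(decode(0b01_010_001))  # Move(carry=False, on_zero=True, src=2, dst=1)
```

In hardware, the two 3-bit fields would each feed one 74HC138, which is why no further decoding logic is needed.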
TOM-1 Opcode Design
We can look back at the TOM-1 system diagram and draw some parallels. While I wouldn't describe the architecture as TTA, it does have three "concerns" each clock cycle:
- Selecting a stack register (D or R) and incrementing or decrementing it
- Computing a new value to be loaded into TOS (the accumulator)
- Performing a stack or RAM load or write
With this in mind, here are the first 4 bits of the TOM-1 opcode which will probably not change:
- DR_SEL — LO selects the D register, HI selects the R register.
- DR_UP — LO decrements, HI increments. This flag does nothing if DR_CCK is LO.
- DR_CCK - HI enables clocking (incrementing/decrementing the register) and LO disables it.
- TOS_DISABLE — When LO, TOS output is enabled. This means that TOS will be available on the BUS as an ADD and NAND operand, as the RAM address input, and an input to a buffer connected to the stack bus. When HI, TOS is disabled.
These signals cover concern #1 and part of concern #2.
After reworking the rest of the signals, I came up with the following design, built around dual 2:4 decoders. The next two bits select from one of four behaviors using the stack bus:
- Stack outputs value to stack bus (as an operand)
- TOS writes to stack
- Stack writes value to RAM
- RAM writes to stack
The last two bits select from one of four behaviors on the TOS bus:
- Enable buffer from ROM to TOS bus
- NAND result is on bus
- ADD result is on bus
- JumpIf0 is performed
This covers the rest of the CPU's concerns. Though it doesn't impact the TOS bus, including our conditional Jump flag here makes sense–we perform a jump if the value in TOS is 0!
Putting this all together as a graph:
Opcode format for the TOM-1:
    7        6        5        4        3        2        1        0
+--------+--------+--------+--------+--------+--------+--------+--------+
| DR_SEL | DR_UP  | DR_CCK | TOS_EN |STACKb_1|STACKb_2| TOSb_1 | TOSb_2 |
+--------+--------+--------+--------+--------+--------+--------+--------+
A new opcode is read each clock cycle, and it cleanly describes the behavior for that cycle of a) both of the stack address counters, b) whether TOS is an operand in ALU calculations, c) the behavior of the stack bus, and d) the behavior of the TOS bus, which is written to TOS on even cycles. I have to go back and clean up the old signals in my Python code, but this should make it easier to know for each opcode a) which part of the circuit is affected by each flag and b) for what reason.
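As a sketch of how the Python tooling could build these opcodes, the fields can be packed with a few shifts. The enum names and the numbering of the two 2-bit selectors (taken from the order of the bullet lists) are assumptions, and because bit 4 is called TOS_DISABLE in the list but TOS_EN in the diagram, the sketch just passes that bit through as `tos_bit` without deciding its polarity.

```python
from enum import IntEnum


class StackBus(IntEnum):  # bits 3..2; numbering follows the list order
    STACK_TO_BUS = 0
    TOS_TO_STACK = 1
    STACK_TO_RAM = 2
    RAM_TO_STACK = 3


class TosBus(IntEnum):  # bits 1..0; numbering follows the list order
    ROM = 0
    NAND = 1
    ADD = 2
    JUMP_IF_0 = 3


def encode(dr_sel, dr_up, dr_cck, tos_bit, stack_bus, tos_bus):
    """Pack the six fields into one 8-bit opcode, bit 7 first."""
    return (
        (dr_sel & 1) << 7
        | (dr_up & 1) << 6
        | (dr_cck & 1) << 5
        | (tos_bit & 1) << 4
        | (stack_bus & 0b11) << 2
        | (tos_bus & 0b11)
    )


# Select D, count up, clock it, write TOS to the stack, put ROM on the TOS bus.
opcode = encode(0, 1, 1, 0, StackBus.TOS_TO_STACK, TosBus.ROM)
print(f"{opcode:08b}")  # 01100100
```

Each 2-bit selector corresponds to one half of a dual 2:4 decoder, so exactly one stack-bus behavior and one TOS-bus behavior is active per cycle.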
Comparing this to the more compact #Microcode-less TTL CPU, here are my takeaways:
- Both small-ALU designs only required small decoders. It looks like the Microcode-less TTL CPU does not need two 3:8 demultiplexers and could also settle for 2:4, but the schematics may be out of date.
- Because the TOM-1 is a stack machine, it has to dedicate a lot more control lines to register management. Aside from the program counter, the Microcode-less design has one register (the accumulator) but TOM-1 has three (TOS, D, and R).
- Both designs use a decoder to select ALU instructions and another decoder to facilitate register transfers.
- The Microcode-less TTL CPU has more room to expand opcodes, but the TOM-1 does not.
A note about expandability
On this last point: once I felt satisfied with the basic TOM-1 design, I went back to investigate how to add I/O to the system. It's fairly trivial to hook up a latch or a bidirectional latch to the TOS bus, and at first I thought about implementing a single 3:8 decoder to select all these different operations: ADD, NAND, IN, OUT, etc. But! Because this is a stack machine, there are some harsh tradeoffs to this design. If you are trying to stream data fast, it must come through TOS by popping off the stack. But the stack is limited to 256 bytes, which is partly consumed by program code already. Whereas RAM is plentiful if the CY62256 is used (32K values).
After rethinking the decoder design, I considered how a 2:4 decoder would work, and realized that the arbitrary TTL gates and ad-hoc wires I'd been using (e.g. "STACK_W_nR") might be more intuitive if they were just a decoding. However, 2:4 is a small number of options to decode to and it still limits what can be done with the RAM device connected to the stack bus. Taking the above encoding and adding onto that an "input" or "output" operation using the available bits seems unlikely. But! Every RAM write and read shares a common input value for its operation: its address. And the maximum address of the CY62256 is only 15 bits. So A16 might become the ninth decoder bit I need. 😂 Similarly, we might be able to implement HALT by jumping to an odd address.
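The spare-address-bit idea could look something like this in Python. It is purely speculative, like the idea itself: the function name `route`, the "io"/"ram" labels, and the reading of the spare line as the one bit of a 16-bit address that a 15-bit RAM does not use are all assumptions.

```python
RAM_ADDRESS_BITS = 15              # CY62256: 32K locations
IO_SELECT = 1 << RAM_ADDRESS_BITS  # first address bit the RAM does not use


def route(address):
    """Return ("io" or "ram", offset) for a 16-bit address taken from TOS."""
    address &= 0xFFFF
    target = "io" if address & IO_SELECT else "ram"
    return target, address & (IO_SELECT - 1)


print(route(0x0010))  # ('ram', 16)
print(route(0x8010))  # ('io', 16)
```

The appeal is that no opcode bits are spent: the same load and store opcodes reach either RAM or an I/O device depending only on the address that was pushed.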
06/20/2020 at 06:06 •
Today is the first day I got unit tests working for the CPU in Digital, so I think I'm finally confident enough to share a status update. Although it's been really fun to play around with the Digital toolkit, it's always nerve-wracking to get a circuit working and then be too afraid to modify it. You might start building other signals around the signals you don't understand. Soon enough you're scribbling arcane boolean nonsense into your diary just to keep track of any signal at all:
June 17: "some comments on pin inputs"
D_RCK is pulse
D_CCK is OR(AND(OR(p4, RAM), clk, pulse) and(~clk, pulse))
R_U/~D is ~D_U/~D
Surprisingly, none of this was correct in the end! Unit tests solve this problem by letting you refactor ugly circuits while confirming that the circuit's results are still correct. Digital's internal "test" component isn't the right model for a CPU-scale simulation; instead, Digital ships an example processor in its source tree that shows how to properly build a remote test harness. That example processor has an accompanying Assembler, which includes a TCP client that can control a running instance of Digital and walk through a circuit, either one clock cycle at a time or until a BREAK occurs.
The only thing missing from this protocol in order to run unit tests is a way to tap into Digital's "measurement" system, which allows you to display signals in real-time as the circuit runs (cool!) and track them in a dedicated "Measurement" pane. I pitched a "measure" command for the TCP connection and added a command like this in a fork on my own GitHub.
I am experimenting with writing Python code to generate .hex files (used to load the program ROM) and also to test the circuit. Here is a test file "t01.py" that writes out a temporary binary, then loads it via the TCP interface "debug()", then asserts values of specific signals at a given clock cycle:
from tools import *

start()
push_literal(0xcafe)
push_literal(0x0010)
store()
push_literal(0x0010)
load()
drop()
push_literal(0x0000)
jump_if_0(0x0000)
push_literal(0x0001)
jump_if_0(0x0000)

debug()
step_until(PC=0x3)
validate(tos_bus=0xCAFE)
step_until(PC=0xc)
validate(TOS=0xCAFE)
print('success')
By running t01.py and several other tests, it's quick to validate whether a circuit change makes a consequential impact on the circuit. One example is that this makes it easy to refactor excess gate logic in a 74xx-based simulation and confirm that with fewer spare gates, your circuit operation isn't impacted. Reducing the total number of spare NOT and AND gates in signal logic has been easy to do with this setup in place.
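The step-then-assert pattern itself is small enough to sketch without a simulator attached. In this Python sketch `FakeSim` replays a made-up trace of signal values and stands in for the TCP connection; it is not Digital's protocol, and unlike the real helpers in the test above, these versions take the simulator as an explicit first argument.

```python
class FakeSim:
    """Stand-in for the simulator connection; not Digital's real protocol."""

    def __init__(self, trace):
        self._trace = iter(trace)  # one dict of signal values per clock cycle
        self.signals = next(self._trace)

    def step(self):
        self.signals = next(self._trace)


def step_until(sim, limit=1000, **expected):
    """Clock the simulator until every named signal has the expected value."""
    for _ in range(limit):
        if all(sim.signals.get(k) == v for k, v in expected.items()):
            return
        sim.step()
    raise TimeoutError(f"never reached {expected}")


def validate(sim, **expected):
    """Raise AssertionError if any named signal differs from its expected value."""
    for name, value in expected.items():
        actual = sim.signals.get(name)
        assert actual == value, f"{name}: expected {value:#x}, got {actual!r}"


# Invented trace, for illustration only.
sim = FakeSim([
    {"PC": 0x0, "tos_bus": 0x0000},
    {"PC": 0x1, "tos_bus": 0x0000},
    {"PC": 0x2, "tos_bus": 0xCAFE},
    {"PC": 0x3, "tos_bus": 0xCAFE},
])
step_until(sim, PC=0x3)
validate(sim, tos_bus=0xCAFE)
print("success")
```

Passing signal names as keyword arguments keeps each test line readable and means new measured signals need no changes to the helpers.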