
TTL Operation Module (TOM-1)

A 16-bit TTL CPU and stack machine built out of 74xx chips.

TOM-1 stands for TTL Operation Module. It is a 16-bit TTL-based CPU built around 74xx chips. It has a "stack machine" memory model similar to Forth, with separate RAM, data, and return stacks. I started this project in May 2020 after learning about and being inspired by the Gigatron and nybbleForth projects. TOM-1 is dedicated to my dad and the hobbies we shared.

TOM-1 is a TTL-based processor with a small instruction set. The idea came from reading about projects like Gigatron and Ben Eater's 8-bit video series, as well as Verilog CPU implementations like nybbleForth. The design goal is to minimize chip count while still offering a 16-bit data bus and being useful for computation.

Status: I'm currently modeling the CPU in Digital and reworking an earlier schematic attempt done in KiCad.

Overview

  • 16-bit data bus and address space
  • Stack machine with separate data and return stacks
  • Supports two 16-bit ALU operations (NAND and ADD)
  • Not microcoded
  • Harvard architecture (ROM and RAM do not share address space)
  • All logic is built using 74xx chips

[Figure: system diagram of the TOM-1]

Addressing

TOM-1 performs one 4-byte operation every two clock cycles. Each operation consists of two control bytes and a 16-bit operand, interleaved in ROM. Because the final design will use a 27C1024, which has a 16-bit data bus, each 16-bit ROM word dedicates its low byte to one half of the 16-bit operand and its high byte to processor control flags:

PUSH_LITERAL(0x4299) encoded in ROM:
   0000:         1c42
   0001:         0c99
   0002:         ....

In this binary, 0x1c and 0x0c are the control bytes for the first and second clock cycles; they determine how values are loaded into the top of the stack and how values are pushed onto the data stack. As for the lower byte, the circuit uses a latch on every other clock cycle so it can access both bytes of the two-byte operand (0x4299) at once. The top of stack register (TOS) is written at the end of the opcode, when all 16 bits of the value are available.
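The interleaving can be sketched in Python. This is only a behavioral illustration of the byte layout described above, not the actual circuit; the function names `split_rom_word` and `decode_operation` are made up for this example.

```python
def split_rom_word(word):
    """Split one 16-bit ROM word into (control byte, operand byte)."""
    return (word >> 8) & 0xFF, word & 0xFF


def decode_operation(word0, word1):
    """Model one two-cycle operation fetched as two consecutive ROM words."""
    ctrl0, operand_hi = split_rom_word(word0)  # cycle 1: low byte is latched
    ctrl1, operand_lo = split_rom_word(word1)  # cycle 2: low byte is read directly
    operand = (operand_hi << 8) | operand_lo   # both halves are now available
    return ctrl0, ctrl1, operand


# The PUSH_LITERAL(0x4299) words from the listing above.
ctrl0, ctrl1, operand = decode_operation(0x1C42, 0x0C99)
print(hex(ctrl0), hex(ctrl1), hex(operand))  # 0x1c 0xc 0x4299
```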

Opcodes

This is a non-exhaustive list of opcodes supported by the TOM-1:

  • halt
  • no_op
  • push_literal — pushes a 16-bit operand to the top of stack
  • branch0 — if the top of stack is equal to 0, jump to a 16-bit address (operand)
  • add — pops values A and B from the data stack, pushes the value A + B to the top of stack
  • nand — pops values A and B from the data stack, pushes the value A nand B to the top of stack
  • load — pops an address from the stack, pushes the value of that address in RAM to the top of stack
  • store — pops an address, then a value, from the stack and sets that address in RAM to the value
  • return_push — pushes the value at the top of the stack onto the return stack
  • return_pop — pops a value off the return stack and pushes it to the top of stack
  • dup — duplicates the value at the top of stack
  • drop — drops the top of the stack
  • in — read a value from the 8-bit I/O port
  • out — output a value on the 8-bit I/O port
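To make the stack effects concrete, here is a small behavioral model of some of these opcodes in Python. It is a sketch under assumptions, not a description of the hardware: the class name `StackModel` is invented, unwritten RAM is assumed to read as 0, and `return_push` is assumed to move the value off the data stack the way Forth's `>R` does. `branch0`, `in`, and `out` are left out because their exact stack effects are not spelled out above.

```python
MASK = 0xFFFF  # all values are 16 bits wide


class StackModel:
    """Behavioral model only; it says nothing about how the CPU is wired."""

    def __init__(self):
        self.data = []  # data stack, top of stack is the last element
        self.ret = []   # return stack
        self.ram = {}   # sparse RAM; unwritten addresses read as 0 (assumption)

    def push_literal(self, value):
        self.data.append(value & MASK)

    def add(self):
        a, b = self.data.pop(), self.data.pop()
        self.data.append((a + b) & MASK)

    def nand(self):
        a, b = self.data.pop(), self.data.pop()
        self.data.append(~(a & b) & MASK)

    def load(self):
        address = self.data.pop()
        self.data.append(self.ram.get(address, 0))

    def store(self):
        address = self.data.pop()  # the address is popped first
        value = self.data.pop()    # then the value
        self.ram[address] = value

    def return_push(self):
        # Assumption: the value is moved, not copied.
        self.ret.append(self.data.pop())

    def return_pop(self):
        self.data.append(self.ret.pop())

    def dup(self):
        self.data.append(self.data[-1])

    def drop(self):
        self.data.pop()


cpu = StackModel()
cpu.push_literal(0xCAFE)
cpu.push_literal(0x0010)
cpu.store()               # RAM[0x0010] = 0xCAFE
cpu.push_literal(0x0010)
cpu.load()
print(hex(cpu.data[-1]))  # 0xcafe
```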

I/O Port

The I/O port on the TOM-1 is a bidirectional transceiver that can read and write to any address in RAM. This port has the following pins:

  • Power
  • GND
  • 8 bits of data (low byte of 2-byte value in RAM)
  • CLK
  • Data Valid

Because the bus can only send or receive 8 bits each clock cycle, and the CPU does not support interrupts, this port is best suited for CPU-driven protocols like parallel clocked output or SPI. The port is also not latched, so external circuitry must respect the "Data Valid" bit, and latch on the rising edge of CLK while this bit is high.
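As an illustration of what "respecting Data Valid" means for an external device, the following Python sketch models a receiver that captures a byte on each rising edge of CLK while Data Valid is high. The function name and the trace format (a list of pin samples) are assumptions made for this example.

```python
def latch_port_bytes(samples):
    """Return the bytes an external device would capture from the port.

    `samples` is an iterable of (clk, data_valid, data) tuples, one per
    observation of the port pins. A byte is captured on each rising edge
    of CLK while Data Valid is high.
    """
    captured = []
    prev_clk = 0
    for clk, data_valid, data in samples:
        rising_edge = prev_clk == 0 and clk == 1
        if rising_edge and data_valid:
            captured.append(data & 0xFF)
        prev_clk = clk
    return captured


trace = [
    (0, 0, 0x00),
    (1, 0, 0x12),  # rising edge, but Data Valid is low: ignored
    (0, 1, 0xA5),
    (1, 1, 0xA5),  # rising edge with Data Valid high: captured
    (1, 1, 0xFF),  # CLK still high, no new edge: ignored
    (0, 1, 0x3C),
    (1, 1, 0x3C),  # captured
]
print([hex(b) for b in latch_port_bytes(trace)])  # ['0xa5', '0x3c']
```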

  • 1 × 27C1024 64K x 16-bit EPROM
  • 2 × CY62256 32K x 8-bit SRAM
  • 2 × CY6116 2K x 8-bit SRAM
  • 4 × 74LS573
  • 2 × 74LS574


  • Breadboarding (take 1)

    Tim Ryan, 2 days ago


    Current breadboard layout.

    When I started this project I wasn’t sure whether I would attempt to breadboard this design, but now I’m grateful I tried. It looks like this will span several breadboards, so I’ve connected two sets of four together, which should be enough? As a way of debugging the circuit immediately without fussing with EPROM programming and counter logic, I am using an Arduino-like board from Adafruit to drive the ROM bus and perform branches. I also put together a step circuit using a 555 where each clock cycle can be controlled by a button press.

    The biggest challenge seems to be how to wire up a 16-bit bus, which consumes a lot of wires and makes wiring really tedious. I’m trying to split all logic into 8-bit slices, so the high and low bytes of the CPU bus can be handled separately. On the right module, breadboards 1 and 3 correspond to the high byte, while 2 and 4 control the low byte. The 16-pin IDC interface is the TOS bus, which loops back to the input of the Top of Stack register on the left module. Hopefully this breakdown makes it easy to keep wires under control when I start putting together the bus.

    My biggest blocker: I didn’t learn about the distinction between the 373 (silly pin order) and 573 (left to right pin order) before trying to use these chips. Now I’m stuck waiting for them to come in, since the alternative is cutting a lot more custom wire lengths!

  • Graphics demo (take 1)

    Tim Ryan, 6 days ago

    A short update: I figured out that Digital supports a fairly flexible graphics integration called "Graphics RAM" that is a great way to test out CPU loops and timing, and also make demos more interesting. In the interest of keeping chip count small, I wanted to see if the TOM-1 could support just a single bidirectional transceiver (74LS245) for all of its I/O. I tested this out with Graphics RAM by first writing a byte for the address (into an 8x8 grid) and a second byte for the graphics value. With some additional circuitry, the TOM-1 can draw 256 color values onto an 8x8 Graphics display. The open question is whether it can do so using SPI to draw onto a command-based display, like Adafruit's SPI TFT display.
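The two-byte scheme (address byte, then value byte) can be modeled in a few lines of Python. This is a sketch of the idea only: the function name `draw_bytes`, the row-major mapping of addresses onto the grid, and the wrap-around of out-of-range addresses are all assumptions, not details of Digital's Graphics RAM component.

```python
def draw_bytes(byte_stream, width=8, height=8):
    """Consume (address, value) byte pairs and return the resulting grid."""
    grid = [[0] * width for _ in range(height)]
    stream = iter(byte_stream)
    for address in stream:
        value = next(stream, None)
        if value is None:
            break  # a trailing address byte with no value is ignored
        address %= width * height          # assumption: addresses wrap
        row, col = divmod(address, width)  # assumption: row-major layout
        grid[row][col] = value & 0xFF      # one of 256 color values
    return grid


grid = draw_bytes([0, 0x1F, 9, 0xE0])
print(hex(grid[0][0]), hex(grid[1][1]))  # 0x1f 0xe0
```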

  • Adding a breadboard

    Tim Ryan, 7 days ago

    I finally have enough parts to start breadboarding! I bought a 3742-Contact Point Elenco Breadboard which houses four regular sized breadboards. I think I'll wind up needing two of them. Here's an animation of me testing out the 74LS283 adder chip, with inputs A1 and B1 being controlled by the buttons and Σ1 and Σ2 being wired to the LEDs. I've never done a massive breadboarding project like this before, so I anticipate being pretty slow at it.

  • Bought an EPROM Programmer + Eraser

    Tim Ryan, 07/03/2020 at 19:24

    Having done as much as I can for the moment in Digital, it's time to switch over to validating in circuitry. I bought an EPROM programmer and UV eraser online for the M27C1024, a 64K x 16-bit EPROM that is now obsolete, but which has the rare property of having a 16-bit wide data bus. Getting one of these EPROMs will probably take another week or two.

  • Writing a TOM-1 program (take 1)

    Tim Ryan, 06/24/2020 at 05:42

    I needed a better way to test the branching logic of the CPU and decided to come up with a fairly basic "assembler". Taking inspiration from Forth, we can invent a very simple programming language from scratch: take a string, split it on whitespace into a list of tokens, and then convert each token into a 4-byte instruction in the binary that the circuit executes. With just a little compiler magic we can throw in some convenience features like named labels and mixed decimal and hex (0x-prefixed) numbers.
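The tokenizing approach can be sketched as follows. This is not the real tom1.py; it is a minimal illustration in which the names `assemble`, `parse_number`, and `WORDS_PER_INSTRUCTION` are hypothetical, label addresses are assumed to count two ROM words per instruction, and `branch0` is assumed to take the following token as its operand.

```python
WORDS_PER_INSTRUCTION = 2  # assumption: one instruction is two 16-bit ROM words

OPERATORS = {"+": "add", "~&": "nand"}


def parse_number(token):
    """Parse a decimal or 0x-prefixed hex token into a 16-bit value."""
    base = 16 if token.lower().lstrip("-").startswith("0x") else 10
    return int(token, base) & 0xFFFF


def assemble(script):
    """Turn a whitespace-separated script into (mnemonic, operand) pairs."""
    tokens = script.split()
    labels = {}
    instructions = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.startswith("[") and token.endswith("]"):
            # A label records the address of the next instruction.
            labels[token[1:-1]] = len(instructions) * WORDS_PER_INSTRUCTION
        elif token == "branch0":
            i += 1  # assumption: the next token is the branch target
            instructions.append(("branch0", tokens[i]))
        elif token in OPERATORS:
            instructions.append((OPERATORS[token], None))
        else:
            instructions.append(("push_literal", parse_number(token)))
        i += 1
    # Second pass: replace label names with their addresses.
    resolved = []
    for mnemonic, operand in instructions:
        if isinstance(operand, str):
            operand = labels[operand] if operand in labels else parse_number(operand)
        resolved.append((mnemonic, operand))
    return resolved, labels


program, labels = assemble("[start] 5 -1 ~& 0x10 + [done] 0 branch0 start")
print(labels)  # {'start': 0, 'done': 10}
```

Resolving labels in a second pass means a script can branch forward to a label that has not been seen yet.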

    Here's an example of a test that verifies subtraction:

    from tom1 import *
    
    labels = generate("""
    
    [start]
    
    0x8372 0x35aa -1 ~& 1 + + [check_subtraction]
    
    0 branch0 start
    
    """)
    
    debug()
    step_until(pc=labels['check_subtraction'])
    validate(tos=0x4dc8)
    print("success")
    

    This language isn't actually Forth but something much less powerful. Here's the module "tom1" imported at the top of the file: tom1.py

    So in the above script, the call to generate("""...""") takes a TOM-1 "script" as its first argument and compiles it, generating a .hex binary file that can be used by the CPU in the simulator. generate() also returns a value, labels, that the test uses to refer to locations in the program. debug() and the calls that follow generate() are ordinary Python code that interactively runs the CPU, steps it until it hits specific labels, and then validates that the CPU values are correct. If the script doesn't throw during a call to validate(), then the test succeeded.

    Let's look at the first and last lines of the script inside the triple quotation marks ("""):

    """
    [start]
    
    ...
    
    0 branch0 start
    """
    

    [start] declares a label that we can conditionally jump to. When the compiler sees it, it will treat the token start everywhere else in the script as a reference to the location in the program where you wrote [start].

    The token 0 means the CPU will push the number 0 to the stack with its "push_literal" opcode. The next token branch0 is implemented on the CPU as a "jump_if_0" opcode that rewrites the Program Counter with a new address if and only if the value on the top of the stack is equal to 0. So 0 branch0 is a two-opcode way to "always branch". All the tests end with a jump to the start of the program since there is no way to stop the CPU...

    The middle line in the script is the actual test:

    0x8372 0x35aa -1 ~& 1 + + [check_subtraction]
    

    This is actually an executable program that subtracts two numbers! The CPU needs a lot of assistance from a compiler to abstract this away, because the CPU can do very little. For now I'm forced to write this out by hand. Let's break this down in steps:

    1. 0x8372 0x35aa => We push these two values onto the stack. The top of the stack is now 0x35aa.
    2. -1 ~& => We push the value -1 and then NAND the top two values on the stack. The token for the NAND opcode is arbitrarily ~& so it looks similar to the other arithmetic function +. After this opcode, the top of the stack will be equal to "0x35aa nand 0xFFFF", which is 0xca55. This is actually a bit hack that inverts 0x35aa without needing a dedicated "invert" opcode.
    3. 1 + => We need to add 1 to the inverted number. There's a difference between inverting a number and turning it negative, and that difference is two's complement.
    4. + => We add the top two values on the stack, which are now the first number we pushed (0x8372) and the two's complement of the second (0x35aa). Adding them together performs subtraction, and we get our one result on the stack: 0x4dc8.

    And there you have it, arithmetic subtraction on the TOM-1 implemented in a blazing, uh, 14 clock cycles. The final token in the script [check_subtraction] is used for testing but also is a helpful comment, since this code is very hard to follow. Eventually I'll have to stop working on an assembler and start working on an actual compiler, so -1 ~& 1 + + can be tucked away inside a function.
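The four steps can be checked with ordinary Python integer arithmetic, masking every result to 16 bits. The helper names `nand16`, `add16`, and `subtract` are made up for this sketch; they only mirror what the NAND and ADD opcodes are described as doing.

```python
MASK = 0xFFFF  # keep every result within 16 bits


def nand16(a, b):
    return ~(a & b) & MASK


def add16(a, b):
    return (a + b) & MASK


def subtract(a, b):
    inverted = nand16(b, 0xFFFF)  # step 2: NAND with -1 inverts b
    negated = add16(inverted, 1)  # step 3: add 1 to get the two's complement
    return add16(a, negated)      # step 4: a + (-b)


print(hex(nand16(0x35AA, 0xFFFF)))    # 0xca55
print(hex(subtract(0x8372, 0x35AA)))  # 0x4dc8
```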

  • Opcode Encoding for a Stack Machine

    Tim Ryan, 06/23/2020 at 00:43

    When coming up with requirements for TOM-1, I knew that the opcode space would be limited enough that the system would probably not require microcode, since I had seen Forth implementations which did not require many instructions. This meant I didn't tackle the question of how opcodes actually would be decoded until later. This organically built up into a collection of signals I could embed ad-hoc in a binary generated by Python:

    def push_literal(arg):
      writeh(
        TOS_BUS_ROM | CCK | DR_UnD | STACK_W_nR,
        TOS_BUS_ROM,
        arg
      )
    

    This function would write out the TOM-1 "push literal" opcode, by writing the CPU flags for two clock cycles (a full opcode) and a 16-bit operand. Take a look at the TOM-1 system diagram, or read this explanation: On the first clock cycle, we ask for the ROM to be on the TOS (top of stack) bus, for the D register to increment, and for us to write the value of TOS to the data stack. On the second clock cycle, we just leave ROM on the TOS bus. At the end of the second cycle TOS latches the 16-bit new value from ROM.
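For readers who want to run something, here is one way a helper like writeh could pack the two flag bytes and the operand into ROM words. Everything concrete in it is an assumption: the flag bit values are placeholders chosen so the example runs (the real assignments are not given here), and this version takes an explicit `rom` list, unlike the three-argument call shown above.

```python
# Placeholder bit assignments; the real TOM-1 values are not given in this log.
TOS_BUS_ROM = 0x01
CCK = 0x02
DR_UnD = 0x04
STACK_W_nR = 0x08


def writeh(flags_cycle1, flags_cycle2, operand, rom):
    """Append one two-cycle opcode to `rom` as two 16-bit words.

    The high byte of each word carries that cycle's control flags and the
    low byte carries one half of the 16-bit operand.
    """
    rom.append(((flags_cycle1 & 0xFF) << 8) | ((operand >> 8) & 0xFF))
    rom.append(((flags_cycle2 & 0xFF) << 8) | (operand & 0xFF))


def push_literal(arg, rom):
    writeh(TOS_BUS_ROM | CCK | DR_UnD | STACK_W_nR, TOS_BUS_ROM, arg, rom)


rom = []
push_literal(0x4299, rom)
print([format(word, "04x") for word in rom])  # ['0f42', '0199']
```

With the real flag values, the high bytes would differ from the placeholders printed here; only the packing of flags and operand halves is the point.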

    Having written this out, it's clear that describing these flags isn't helpful in forming an "intuition" of what is happening in the CPU with each opcode. After struggling with different 2:4 and 3:8 decoder strategies, I actually came across another microcode-less design which seemed to share some design goals, as well as having an 8-bit opcode width. Here is the opcode format used by the Microcode-less TTL CPU project:

    Opcode format:
        7        6        5        4        3        2        1        0
    +--------+--------+--------+--------+--------+--------+--------+--------+
    |Carry fb| On Zero|  Src_0 |  Src_1 |  Src_2 |  Dst_0 |  Dst_1 |  Dst_2 |
    +--------+--------+--------+--------+--------+--------+--------+--------+
    
    The instruction decoder is composed of two demultiplexer ICs (2 x 74HC138) and is driven by the instruction register. A CPU instruction word is 8 bits wide, 3 bits select the data source and 3 bits select the destination. Each demultiplexer ICs apply the control signals to the selected destination/source at the execute phase (3. phase). The instruction decoders (2x74HC138) use up 6 bits of an instruction word, I used the remaining two bits for instruction modifications [carry and zero].

    Just from looking at this encoding, we can infer a few things about the CPU:

    • It can only move a value from a "src" register to a "dst" register each clock cycle
    • There are up to eight "src" registers and eight "dst" registers, and these may be distinct sets
    • Conditional logic is done via the "on zero" flag

    This is confirmed in the project's description which reads "Eventually each CPU instruction is a hardwired MOVE instruction, the instruction code itself determines the component that will be the data source (e.g. accumulator, input port, RAM, program memory, etc...) and the data destination (accumulator, adder, inverter, output port, program counter, etc...)." The circuit is an example of transport-triggered architecture. From taking a look at the schematics, "src" registers appear to include the accumulator and program memory, and destinations include the program counter, accumulator, adder, and inverter. Other chip enable lines appear to be cycle-dependent.
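As a quick illustration of the format in the diagram, the 8-bit opcode can be split into its four fields like this. The names `MoveOpcode` and `decode_move_opcode` are invented for the sketch, and treating bits 5..3 and 2..0 as plain 3-bit numbers (with the higher bit more significant) is an assumption about how the Src and Dst bits are ordered.

```python
from typing import NamedTuple


class MoveOpcode(NamedTuple):
    carry_fb: bool  # bit 7
    on_zero: bool   # bit 6
    src: int        # bits 5..3, selects one of up to eight sources
    dst: int        # bits 2..0, selects one of up to eight destinations


def decode_move_opcode(byte):
    """Split an 8-bit opcode into its modifier flags and src/dst selectors."""
    return MoveOpcode(
        carry_fb=bool(byte & 0x80),
        on_zero=bool(byte & 0x40),
        src=(byte >> 3) & 0b111,
        dst=byte & 0b111,
    )


print(decode_move_opcode(0b01101010))
# MoveOpcode(carry_fb=False, on_zero=True, src=5, dst=2)
```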

    TOM-1 Opcode Design

    We can look back at the TOM-1 system diagram and draw some parallels. While I wouldn't describe the architecture as TTA, it does have three "concerns" each clock cycle:

    1. Selecting a stack register (D or R) and incrementing or decrementing it
    2. Computing a new value to be loaded into TOS (the accumulator)
    3. Performing a stack or RAM load or write

    With this in mind, here are the first 4 bits of the TOM-1 opcode which will probably not change:

    1. DR_SEL — LO selects the D register, HI selects the R register.
    2. DR_UP — LO decrements, HI increments. This flag does nothing if DR_CCK is LO.
    3. DR_CCK - HI enables clocking (incrementing/decrementing the register) and LO disables it.
    4. TOS_DISABLE — When LO, TOS output is enabled. This means that TOS will be available...
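The first three flags can be read as controlling a pair of up/down counters. The sketch below models that reading in Python; the function name, the default counter width of 8 bits, and the wrap-around behavior are assumptions for illustration, not properties of the actual stack registers.

```python
def clock_stack_pointers(d, r, dr_sel, dr_up, dr_cck, width=8):
    """Return the new (D, R) register values after one clock cycle.

    dr_sel: LO (0) selects D, HI (1) selects R.
    dr_up:  LO decrements, HI increments.
    dr_cck: LO disables clocking, so neither register changes.
    """
    if not dr_cck:
        return d, r
    step = 1 if dr_up else -1
    mask = (1 << width) - 1  # assumption: counters wrap at `width` bits
    if dr_sel:
        r = (r + step) & mask
    else:
        d = (d + step) & mask
    return d, r


print(clock_stack_pointers(0, 0, dr_sel=0, dr_up=1, dr_cck=1))  # (1, 0)
print(clock_stack_pointers(0, 0, dr_sel=1, dr_up=0, dr_cck=1))  # (0, 255)
print(clock_stack_pointers(4, 7, dr_sel=0, dr_up=1, dr_cck=0))  # (4, 7)
```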

  • Unit Testing

    Tim Ryan, 06/20/2020 at 06:06

    Today is the first day I got unit tests working for the CPU in Digital, so I think I'm finally confident enough to share a status update. Although it's been really fun to play around with the Digital toolkit, it's always nerve-wracking to get a circuit working and then be too afraid to modify it. You might start building other signals around the signals you don't understand. Soon enough you're scribbling arcane boolean nonsense into your diary just to keep track of any signal at all:

    June 17: "some comments on pin inputs"
    
    D_RCK is pulse
    D_CCK is OR(AND(OR(p4, RAM), clk, pulse) and(~clk, pulse))
    R_U/~D is ~D_U/~D

    Surprisingly, none of this was correct in the end! Unit tests solve this problem by letting you refactor ugly circuits while confirming that the circuit's results are still correct. Digital's internal "test" component isn't the right model for a CPU-scale simulation; instead, the project includes an example emulated CPU in its source tree that shows how to properly build a remote test harness. The Digital example processor has an accompanying Assembler, which includes an example of a TCP client that can control a running instance of Digital and step through a circuit one clock cycle at a time or until a BREAK occurs.

    The only thing missing from this protocol in order to run unit tests is a way to tap into Digital's "measurement" system, which allows you to display signals in real-time as the circuit runs (cool!) and track them in a dedicated "Measurement" pane. I pitched a "measure" command for the TCP connection and added a command like this in a fork on my own GitHub.

    A Measurement component from the Digital toolkit while running a TOM-1 simulation.


    I am experimenting with writing Python code to generate .hex files (used to load the program ROM) and also to test the circuit. Here is a test file "t01.py" that writes out a temporary binary, then loads it via the TCP interface "debug()", then asserts values of specific signals at a given clock cycle:

    from tools import *
    
    start()
    push_literal(0xcafe)
    push_literal(0x0010)
    store()
    push_literal(0x0010)
    load()
    drop()
    push_literal(0x0000)
    jump_if_0(0x0000)
    push_literal(0x0001)
    jump_if_0(0x0000)
    
    debug()
    step_until(PC=0x3)
    validate(tos_bus=0xCAFE)
    step_until(PC=0xc)
    validate(TOS=0xCAFE)
    print('success')

    By running t01.py and several other tests, it's quick to validate whether a circuit change makes a consequential impact on the circuit. One example is that this makes it easy to refactor excess gate logic in a 74xx-based simulation and confirm that with fewer spare gates, your circuit operation isn't impacted. Reducing the total number of spare NOT and AND gates in signal logic has been easy to do with this setup in place.
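The step-and-check pattern behind step_until() and validate() is simple enough to sketch without a running copy of Digital. The code below is a hypothetical stand-in, not the real tools module: `FakeSimulator` replaces the TCP-controlled simulation with a trivial counter, and both helpers take the simulator as an explicit argument instead of using a shared connection.

```python
class FakeSimulator:
    """Stand-in for a running simulation: PC counts up, TOS changes at PC=3."""

    def __init__(self):
        self.pc = 0
        self.tos = 0

    def step(self):
        self.pc += 1
        if self.pc == 3:
            self.tos = 0xCAFE

    def measure(self):
        """Return the current value of every observable signal."""
        return {"PC": self.pc, "TOS": self.tos}


def matches(sim, conditions):
    values = sim.measure()
    return all(values[name] == want for name, want in conditions.items())


def step_until(sim, max_steps=1000, **conditions):
    """Step the simulation until every named signal has the wanted value."""
    for _ in range(max_steps):
        if matches(sim, conditions):
            return
        sim.step()
    if not matches(sim, conditions):
        raise TimeoutError(f"{conditions} not reached in {max_steps} steps")


def validate(sim, **expected):
    """Raise AssertionError if any named signal differs from its expectation."""
    values = sim.measure()
    for name, want in expected.items():
        if values[name] != want:
            raise AssertionError(f"{name}: expected {want:#x}, got {values[name]:#x}")


sim = FakeSimulator()
step_until(sim, PC=0x3)
validate(sim, TOS=0xCAFE)
print("success")
```

The `max_steps` limit is there so that a test which never reaches its target fails instead of hanging.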
