One-instruction TTL Computer

A breadboard-able computer which uses only a single instruction - MOVE

This project is my attempt at making a computer which has only one instruction - the move instruction. Specifically, it moves data from one location to another. Those locations can be registers, RAM, or other special functions. This will be my submission for a neat small breadboard computer built by hand.

The specific type of computer is called a Transport Triggered Architecture, but the triggering will be very simple. Every function has a memory location, and to perform that function you only need to move data to those memory locations. For example, to perform an ADD, just move bytes to the two ADD memory locations, and the result will show up in a third memory location on the next clock. So a program is only a series of source and destination addresses.
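The move-to-trigger idea described above can be sketched in a few lines of Python. This is only a toy model, not the project's VHDL or TTL design: the class name `MoveMachine` and the addresses `ADD_A`, `ADD_B`, and `ADD_RESULT` are made up for illustration, and the "result on the next clock" timing is not modeled.

```python
# Hypothetical address map: these addresses are invented for this sketch.
ADD_A, ADD_B, ADD_RESULT = 0x10, 0x11, 0x12


class MoveMachine:
    """Toy model of a move-only machine with one memory-mapped adder."""

    def __init__(self, size=256):
        self.mem = [0] * size

    def read(self, addr):
        return self.mem[addr]

    def write(self, addr, value):
        self.mem[addr] = value & 0xFF
        # Writing either operand "triggers" the adder: the result location
        # is refreshed from the two operand locations (8-bit wraparound).
        if addr in (ADD_A, ADD_B):
            self.mem[ADD_RESULT] = (self.mem[ADD_A] + self.mem[ADD_B]) & 0xFF

    def run(self, program):
        # A program is nothing but (source, destination) address pairs.
        for src, dst in program:
            self.write(dst, self.read(src))


machine = MoveMachine()
machine.mem[0x80] = 20   # first operand, stored in ordinary memory
machine.mem[0x81] = 22   # second operand, stored in ordinary memory
machine.run([
    (0x80, ADD_A),       # move first operand to the adder
    (0x81, ADD_B),       # move second operand to the adder
    (ADD_RESULT, 0x82),  # move the sum back to ordinary memory
])
print(machine.mem[0x82])  # 42
```

The program itself contains no opcodes at all, which is the point: the destination address decides what operation happens.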

Project status:

I have a working version of the architecture simulated in VHDL. It is synthesizable, and I have downloaded it to an Artix-7 development board - the Digilent Cmod. I have a few simple programs running with a UART interface. I have a bootloader running so I can load programs over the UART. I've written a simple assembler to generate machine code I can send to the bootloader. And I've started my breadboard hardware build.

Primary project goals:

  1. Implements only one instruction - move
  2. Can be built on a breadboard using a small number of simple DIP components (74xx TTL circuitry).
  3. All components are active. No obsolete or hard-to-find components. All components in stock at popular distributors.
  4. Useful - it needs to be able to run programs in a short amount of time.

Secondary project goals:

  1. Useful input mechanism. I expect a set of DIP switches for the input, but I'd prefer something more useful like a UART port or maybe a keyboard.
  2. Useful output mechanism. I could do a simple LED numeric display, but I'd prefer something more useful like a UART or at least a 4x20 character display. Perhaps I'd even do a video memory with a separate display.
  3. Easy and natural loading of programs. I'd prefer not to have a roundabout way to load programs into memory. Perhaps even use the above UART to send programs.
  4. Easily expandable for more memory. I'd like to be able to load very large programs.
  5. Write/adapt a compiler/assembler. I'd like to be able to take existing programs and compile them to run on the computer. This would probably be the last thing I do considering the complexity.

Build Plan:

  1. Clock circuit with single-step mode (DONE!)
  2. Program counter
  3. Instruction ROM
  4. Src/Dst registers
  5. Control state machine - 6-bit shift reg
  6. Decoding ROMs
  7. Load reg
  8. ALU A
  9. ALU chips
  10. ALU result reg
  11. AEB/Carry FFs/buffers
  12. Pointer address registers
  13. ROM data tristate buffer (read byte at PC)
  14. UART
  15. RAM

sheet - 13.83 kB - 06/16/2017 at 17:29



Clock circuit schematic

Adobe Portable Document Format - 119.50 kB - 06/16/2017 at 17:28


  • 1 × Texas Instruments PC16550D UART interface chip
  • 2 × Texas Instruments SN54LS181 4-Bit ALU
  • 4 × Texas Instruments SN54ACT245 8-bit bus transceiver with 3-state outputs
  • 2 × Texas Instruments SN74ALS867A 8-bit counter
  • 9 × Texas Instruments SN54ALS996 8-bit Register with readback and tri-state outputs


  • TTL back on the menu

    Justin Davis - 2 days ago - 0 comments

    Now that I've accomplished a milestone for the software (the BM9 benchmarking), I'm thinking I should swing back to developing the TTL version of the system.  I'm pretty happy with the architecture.  So I've been going back and doing the implementation in TTL.  Eventually I'll make a PCB with it, but I might prototype it on a breadboard first.

    I'd like to split the project into two separate boards: an S100-inspired backplane with the CPU in one slot, and the peripherals in another slot.  Here's an example I found on Google search for illustration:

    Maybe I would have a separate clock board in another slot, but I may integrate that with the CPU.  Maybe I would have a debug board which shows the data/address bus with LEDs.  Of course having a big backplane would be expensive, so I'll have to see how large these boards need to be.  And if I have all my boards stacked up, then they don't show off all their components as well.  Looks go a long way.  Perhaps I'll just split it into two just to illustrate how one board is the CPU, and the other side is flexible with ROM/RAM/UART, etc.  Then I can lay them side-by-side and still be able to show off all the chips.

    But schematic comes first, and I'm only maybe 1/3 of the way there.


  • BM9 benchmarking results

    Justin Davis - 09/14/2017 at 13:53 - 1 comment

    I finally finished the BM9 benchmarking program in assembly code.  The results: it completes in 64 seconds with the clock running at 12MHz, compared with the APOLLO181 finishing in 56 seconds running at 3MHz.  If I lowered my clock speed to match, I would expect it to run four times slower, putting it at a run time of 256 seconds.  I expect this because a one-instruction CPU is going to be less efficient, especially with each instruction taking 4 clock cycles just for instruction fetching.  This is because I have a 16-bit memory bus instead of an 8-bit memory bus like on the APOLLO181.  Also, I made my algorithm operate on 16-bit numbers instead of 12-bit numbers (which makes the divider take 16 loops instead of 12).  So I imagine if I reduced the complexity of my system, it would run in roughly the same amount of time.
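The clock-scaling estimate above is simple arithmetic. A minimal sketch, assuming the cycle count stays the same when the clock changes (the function name `scaled_runtime` is made up for illustration):

```python
def scaled_runtime(seconds, clock_hz, new_clock_hz):
    """Scale a measured run time to another clock, assuming a fixed cycle count."""
    cycles = seconds * clock_hz
    return cycles / new_clock_hz


# 64 seconds measured at 12 MHz, rescaled to the APOLLO181's 3 MHz clock.
at_3mhz = scaled_runtime(64, 12_000_000, 3_000_000)
print(at_3mhz)       # 256.0
print(at_3mhz / 56)  # how many times slower than the APOLLO181's 56 seconds
```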

    As that website points out, the BM9 program was traditionally run in BASIC with an interpreter which drastically slows the operation, so a direct comparison is difficult.  I also output the prime numbers in hexadecimal - I didn't bother to convert to decimal.  Other notes from the APOLLO181 website:

    • CDC CYBER 171, a late-1970s mainframe-class supercomputer, which ran "BM9" (in BASIC) in only 5 seconds (ranked 1st)
    • TRS-80 Pocket Computer, which painfully ran "BM9" in 55830 seconds (ranked last)
    • DEC PDP 11/70, which ran BM9 in 45 seconds (ranked 5th)
    • IBM 3033 (1977), which ran BM9 in 10 seconds (ranked 2nd)
    • The Apple II Plus (1979), equipped with a MOS 6502 microprocessor, ran BM9 in 325 seconds (ranked 15th)
    • APOLLO181 would rank 6th
    • As is, my one-instruction CPU would rank about 11th running at 3MHz (TTL speeds) - still faster than any of the 6502 systems

    So now, I have to think where I want to go next.  I may come back around and work on the TTL design.

    Here is a link to the paper which has all of the results:

  • 16-bit Dividing

    Justin Davis - 09/12/2017 at 17:54 - 0 comments

    I've been programming furiously.  I have 16-bit functions going - Add, Subtract, and Divide.  And I have 8-bit multiply going.  I've changed from putting the operands on the stack to having a dedicated memory location for them.  I have so much memory, it's easier/faster just to hard-code specific locations for them.  Next I'm going to implement the BM9 benchmark program as seen in the APOLLO181 website:

    This is also where I got my divider program, except I adapted it for 16-bit words.  I've added these functions to the ROM as subroutines which I can call and return to my main program.  I still have plenty of room in the ROM and RAM, so I don't think I will run out any time soon.  I'd like to get the BM9 program into the ROM so if I ever get to the TTL version of the CPU, I can compare its run time to the FPGA version.
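For readers unfamiliar with software division, here is a minimal sketch of a 16-bit shift-and-subtract (restoring) divider, which takes one loop per bit. It is an assumption that the adapted routine works along these lines; the actual subroutine is not shown in this log, and the name `divide_u16` is made up.

```python
def divide_u16(dividend, divisor, bits=16):
    """Unsigned shift-and-subtract (restoring) division, one loop per bit."""
    if divisor == 0:
        raise ZeroDivisionError("divisor must be non-zero")
    quotient = 0
    remainder = 0
    for i in range(bits - 1, -1, -1):
        # Shift the next dividend bit (MSB first) into the remainder.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        quotient <<= 1
        if remainder >= divisor:
            # The divisor fits: subtract it and record a 1 in the quotient.
            remainder -= divisor
            quotient |= 1
    return quotient, remainder


print(divide_u16(50000, 7))  # (7142, 6)
```

Passing `bits=12` gives the 12-loop version for 12-bit words, which shows why widening the word size adds loops.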

  • Back up to speed

    Justin Davis - 09/06/2017 at 15:54 - 0 comments

    I've completed the hardware and software updates for the 16-bit register address bus.  I now have the monitor program running (not really a bootloader which I've been calling it).  I have also updated my PUSH/POP commands along with my CALL and RETURN commands.  So my stack is working correctly again.  I also implemented putting variables onto the stack and using them in functions.  The first function I put into the ROM is the BinaryToASCII function.  I push a byte onto the stack, and then call it.  It converts the byte into two ASCII bytes which it then sends over the UART.  I made this function so I can use it in my monitor program to read memory locations and display them.  
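What a BinaryToASCII routine has to do can be sketched as follows. This assumes the two ASCII bytes are uppercase hexadecimal digits with the high nibble first, which the log does not state explicitly; the Python function name is only for illustration.

```python
def binary_to_ascii(byte):
    """Convert one byte into two ASCII hex digit codes, high nibble first."""
    digits = b"0123456789ABCDEF"
    high = digits[(byte >> 4) & 0x0F]  # ASCII code for the upper nibble
    low = digits[byte & 0x0F]          # ASCII code for the lower nibble
    return bytes([high, low])


print(binary_to_ascii(0x3F))  # b'3F'
```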

    I've found coding goes much faster now with the new architecture.  There's a lot less mental gymnastics.  This makes it a lot more fun to code for it as well.

  • Top-level block diagram

    Justin Davis - 09/05/2017 at 14:42 - 0 comments

    I redesigned the top-level block diagram so that it illustrates the new datapath better.  It's much simpler than the previous one, but it doesn't have the TTL chips called out.  This makes it simpler to understand the operation of the different functions.  I'd like to be able to animate it and make a gif out of it so I can show what happens in the 6 clock cycles per instruction, but the tool I'm using doesn't support it.  I would like to have a better flowgraph program, but I do like open web-based tools.

    Breaking down the system like this makes it much easier to understand its simplicity.  The CPU only really controls which of the three 16-bit registers control the address bus, and then decodes the address bus for reading/writing of its own 16-bit registers to/from the data bus.  It starts to look a lot like a normal 1970s CPU interface (like a 6502 for example).  Perhaps I could make two different PCBs - one for the CPU and one for the external peripherals - just to show what parts the CPU could be if put into an ASIC.

  • 16-bit register space implemented

    Justin Davis - 08/31/2017 at 19:29 - 0 comments

    I have updated the VHDL code to implement the change to a single 16-bit address space which includes the register space and the RAM/ROM space.  It seems to be working ok.  

    I'm rewriting the bootloader for this new architecture.  I'm still debugging, but it is definitely easier and more fluid to write code for this.  However, the code is generating almost twice the memory size as expected (about 1.8x) even though it's fewer instructions.  I'm pretty happy with this change.  I notice I do a lot of LOAD-type commands which are now more straight-forward since I'm just copying from the ROM.

    I've found in this iteration the ALU and pointer registers are really more like peripherals.  It's almost as if you took a 6502 and added a co-processor for floating-point operations, except all operations are done in the peripherals.  So outside of the peripherals and memory, it's just a simple state machine, a couple of counters to hold the program counter, and a couple of registers for the source/destination address.  If I were in the early 1970s, I could have made an ASIC with this and then brought out the data and address bus to an external RAM/ROM and peripherals like an ALU or UART, etc.  I may reorganize my block diagram to show this better.  Doing this project makes me realize that an ALU is not necessary for a minimum-viable (Turing-complete) CPU.

  • 16-bit register/memory space

    Justin Davis - 08/29/2017 at 14:41 - 0 comments

    One thing that's been very annoying in coding is how I interface to the memory.  I have a 16-bit page register that allows me to access 128 memory locations with the source/destination bus.  So when I push/pop I have to change my memory page to the stack.  Then I lose access to my other variables.  Copying a variable onto the stack is tough because I have to store it in the few registers I have, change the memory page, and then copy it back without disturbing those registers.  

    I would really like to be able to access the whole 16-bit memory at any time.  The old 6502 allows you to access the whole 16-bit memory space and I'm jealous.  This would let me pop/push without losing access to my other memory space.  I can move variables onto the stack easily.  And I can transfer from anywhere to anywhere in my memory very easily.  


    So the memory map will look something like: 

    0x0000 - 0x00FF    Function Registers
    • Each instruction now goes from 16-bits to 32-bits.  So my programs increase in size pretty quickly. 
    • I'll have to rewrite all the code I previously wrote.
    • The new instruction cycle would be two fetches for the source address, copy the source data to a temporary register, two more fetches for the destination, then write the temporary register to the destination - 6 clock operation.  Boy this went up fast from my original desire for 1 clock per instruction!
    • I won't need a LOAD register anymore.   It will be trivial to put 256 constants in the ROM and do a transfer from there.  This removes all the LOAD register chips.
    • The pointer address register will have to change to point to a 16-bit address, so a HI and LO register.
    • I'll have to change the boot vector to 0x0100.
    • I'll have to decode a 16-bit address now, but maybe I can just check if the upper byte is all zeros.

    This is almost a complete tear-up of the design.  However, it will be much easier and more elegant to code for this CPU (and more fun).  And I don't think it will increase the chip count.
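The six-clock instruction cycle and the upper-byte register decode listed above can be sketched like this. Several details are assumptions made only for the sketch: an 8-bit-wide memory, addresses stored high byte first, and function registers modeled as plain storage with no triggered behavior. The names `step` and `is_function_register` are made up.

```python
def is_function_register(addr):
    """Register space is 0x0000-0x00FF, so only the upper byte needs checking."""
    return (addr >> 8) == 0


def step(mem, pc):
    """Execute one 32-bit move instruction; returns (new_pc, clocks_used)."""
    clocks = 0
    # Clocks 1-2: fetch the 16-bit source address (assumed high byte first).
    src = (mem[pc] << 8) | mem[pc + 1]
    clocks += 2
    # Clock 3: copy the source data into a temporary register.
    temp = mem[src]
    clocks += 1
    # Clocks 4-5: fetch the 16-bit destination address.
    dst = (mem[pc + 2] << 8) | mem[pc + 3]
    clocks += 2
    # Clock 6: write the temporary register to the destination.
    mem[dst] = temp
    clocks += 1
    return pc + 4, clocks


mem = [0] * 0x10000
mem[0x0100:0x0104] = [0x02, 0x00, 0x00, 0x10]  # move [0x0200] -> [0x0010]
mem[0x0200] = 0xAB
pc, clocks = step(mem, 0x0100)
print(hex(pc), clocks, hex(mem[0x0010]), is_function_register(0x0010))
# 0x104 6 0xab True
```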

  • Benchmarks

    Justin Davis - 08/01/2017 at 16:56 - 0 comments

    I completed a simple 8-bit x 8-bit multiply function (iterative adds).  I decided to do a quick benchmark to check its performance.  Since this is an iterative add function, the worst case is FF x FF.  It takes 1.61ms to perform with the clock running at 12MHz.  So it takes 19,397 clock cycles to perform the operation.  That's a lot of moving.  If there are 256 loops, then it takes about 76 clocks per loop.  There are currently 3 clocks per instruction, so about 25 instructions per loop.  That sounds about right.
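The back-of-the-envelope numbers above can be reproduced directly. This sketch uses only the figures quoted in the log (19,397 cycles, 12MHz, 256 loops, 3 clocks per instruction); the variable names are made up.

```python
CLOCK_HZ = 12_000_000
CYCLES = 19_397            # measured clock cycles for FF x FF
LOOPS = 256                # loop count assumed in the log
CLOCKS_PER_INSTRUCTION = 3

elapsed_ms = CYCLES / CLOCK_HZ * 1000
clocks_per_loop = CYCLES / LOOPS
instructions_per_loop = clocks_per_loop / CLOCKS_PER_INSTRUCTION

print(elapsed_ms)                    # a little over 1.6 ms
print(round(clocks_per_loop))        # 76
print(round(instructions_per_loop))  # 25
```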

    A better algorithm, like shift-and-add, would (probably) speed this up.  But it's good to get a baseline.  The number of loops is really the killer.  A shift-and-add algorithm does, I think, 8 loops with an add and a shift right each loop.  Of course, in the best case, my algorithm takes zero loops, so it's possible to be faster, but statistically I'm sure it's slower.
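The difference in loop counts between the two approaches can be seen in a small sketch. Neither function is the ROM routine from this project; both are generic textbook versions with made-up names, and the shift-and-add variant here shifts the multiplicand left while shifting the multiplier right.

```python
def multiply_iterative(a, b):
    """Repeated addition: adds a to the total b times. Returns (product, loops)."""
    total, loops = 0, 0
    for _ in range(b):
        total = (total + a) & 0xFFFF
        loops += 1
    return total, loops


def multiply_shift_add(a, b):
    """Shift-and-add: one loop per multiplier bit. Returns (product, loops)."""
    total, loops = 0, 0
    for _ in range(8):
        if b & 1:
            # Low multiplier bit is set: add the shifted multiplicand.
            total = (total + a) & 0xFFFF
        a <<= 1
        b >>= 1
        loops += 1
    return total, loops


print(multiply_iterative(0xFF, 0xFF))  # (65025, 255)
print(multiply_shift_add(0xFF, 0xFF))  # (65025, 8)
print(multiply_iterative(0xFF, 0))     # (0, 0)
```

The last line is the best case mentioned above: with a zero multiplier the iterative version does no loops at all, while shift-and-add always does 8.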

    But now I can store this function in the ROM and call it whenever I need to do a multiply.  However, I have it pointing at dedicated memory locations.  I need to work on using the stack to pass values to functions instead.

  • Stack code

    Justin Davis - 07/31/2017 at 17:57 - 0 comments

    I've finished the stack code, including pop, push, call, and return.  So that brings me to all of the following instructions implemented:

    • move
    • load (immediate)
    • memload (immediate)
    • memread
    • memwrite
    • jump
    • branchif1
    • pop
    • push
    • call
    • return
    • rotateleft

    That should give me quite a lot to make some real programs now (on top of the inherent functions of the ALU).  I'll need to bring up some better debug tools next.  Memory inspection.  Probably enhance my bootloader.  Maybe start on an emulator.

  • Focus

    Justin Davis - 07/20/2017 at 15:54 - 0 comments

    I realize in looking over everything I'm doing for this project, it's actually several projects that are falling under one umbrella. 

    • A TTL design (which is only at the design phase)
    • An FPGA design (which is fully working)
    • An assembler (which is fully working)
    • Software code development to implement macro-functions which emulate normal architecture instructions (significant development)
    • Software code development to implement more advanced features like a stack and ASCII conversion
    • Emulated hardware on a PC for easier debug (not started)
    • Emulated hardware on a PC as a video game (not started)

    Each of these can be a separate project.  Looking over other projects, some people's whole project is to develop an FPGA microprocessor which is only one component for me.  Considering I have very little time/energy to devote to this project, I need to decide which components to focus on.

    However, there is an underlying commonality which is the architecture.  All of these assume a similar architecture even if they have different implementations.  The FPGA doesn't use tri-state buses, but the TTL circuit does.  It doesn't even matter if the TTL version has a single source/destination bus, and the FPGA has two separate ones, or even if it's emulated hardware.  As long as they can all execute the same code, then I would say they are the same.  I have to admit, I'm pretty happy with the FPGA version because I can change it very quickly and have decent debug capabilities with the simulator.

    I guess what I'm saying is I may delay building the TTL version for now, but keep the design so it can easily be built.  I'd like to focus on developing the software and run it on the FPGA.  The software development has been a lot of fun, which is why I keep thinking about making a video game about it similar to Human Resource Machine or the Zachtronics games.  If only I had more time/energy.  But since my priorities are family->work->hobbies, I'm left with maybe an hour a day for the last one and no energy.




agp.cooper wrote 07/13/2017 at 15:27 point

Hi Justin,
Step back and relook at what a TTA is:
The machine cycles are:
Fetch SRC (address)
Fetch Data from [SRC]
Fetch DST (address)
Deposit Data to [DST]
This is what the timing diagram of the control signal looks like for the above machine cycle:

Sorry, there is no load immediate, but your assembler can store a constant for you to simulate a load immediate.
For indirect addressing (pointer-to-pointer moves, which you will need at some point), you will need self-modifying code.
That is for another time.
If you're happy with the above, the decoder logic is four chips including the clock.
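The load-immediate workaround suggested above (have the assembler store the constant and emit a plain move from it) could look roughly like this. The mnemonics `move` and `loadi`, the tuple syntax, and the pool base address `0xF0` are all hypothetical and not taken from either project's assembler.

```python
def assemble(lines, pool_base=0xF0):
    """Turn ('move', src, dst) and ('loadi', value, dst) into (src, dst) pairs.

    'loadi' is emulated: the constant goes into a pool, and an ordinary
    move from the pool address is emitted in its place.
    """
    pool = {}     # constant value -> pool address
    program = []  # list of (source address, destination address)
    for op, a, dst in lines:
        if op == "move":
            program.append((a, dst))
        elif op == "loadi":
            if a not in pool:
                pool[a] = pool_base + len(pool)  # allocate a new pool slot
            program.append((pool[a], dst))
        else:
            raise ValueError(f"unknown mnemonic: {op}")
    # Memory image for the pool: address -> constant value.
    pool_image = {addr: value for value, addr in pool.items()}
    return program, pool_image


program, pool_image = assemble([
    ("loadi", 5, 0x10),
    ("loadi", 7, 0x11),
    ("loadi", 5, 0x20),   # reuses the pool slot that already holds 5
    ("move", 0x12, 0x30),
])
print(program)     # [(240, 16), (241, 17), (240, 32), (18, 48)]
print(pool_image)  # {240: 5, 241: 7}
```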


Justin Davis wrote 07/13/2017 at 15:46 point

I can see one difference in our design strategies.  I've been using the same clock for all my components, and then using the enable lines to control when I want them to do something.  You send each component its own clock only when you want it to do something.  I've found a lot of TTL components do not have enables, so this makes a lot of sense.


agp.cooper wrote 07/13/2017 at 08:38 point

Hi Justin,

After 56 logs, and frustrated by the load instruction, do you want some help?



Justin Davis wrote 07/13/2017 at 11:48 point

I'm always open to advice.  The load instruction only became a problem when I decided to combine the source and destination buses into one.  It removed like 6 chips, but had to add two back.  So it's not a huge deal - just optimizing the design.


Andrew Starr wrote 04/28/2017 at 00:39 point

Very interesting! TTA has a certain austere simplicity that appeals....


agp.cooper wrote 04/25/2017 at 08:43 point

Hi Justin,

Have a look at my Weird CPU; it is a true move-only CPU built with TTL:

It is a lot more primitive than what you are proposing.

Regards AlanX


Justin Davis wrote 04/25/2017 at 21:53 point

That looks great! And gives me some ideas for my own project.


Justin Davis wrote 04/22/2017 at 11:27 point

Well, that takes the wind out of my sails a bit.  I may have to review my goals based on these projects to keep mine unique.


Yann Guidon / YGDES wrote 04/22/2017 at 13:36 point

no, just continue in your own way, you can only make something unique and discover new ideas if you don't look too much at other things :-)


Justin Davis wrote 04/22/2017 at 17:29 point

Ya, looking over that project and a few others, I still think my direction is unique. I think the one posted today is not a true one-instruction since it decodes the instruction into 4 different functions even taking a different number of clock cycles for each instruction. The others handle branches differently from how I'm planning on doing it. But they did have some good ideas.

