DIP-8 TTL Computer

Digital Information Processor - an 8-bit computer made out of 7400 series logic and some EEPROMs.

An 8-bit computer without a microprocessor - just some 7400 series logic and some EEPROMs. Custom architecture, with custom software to follow - the plan is for a multitasking OS and a compiled language.

CPU

As my first CPU design to leave my notebook, I wanted to keep things simple, so I'm using EEPROMs as simple programmable logic devices. Two 64KB devices make up the ALU, and another three decode instructions into 23 control lines. The design is faster than it might sound, thanks to the existence of the Winbond W27C512-45, a 45 ns EEPROM which is readily available (on eBay!) and I think the fastest EEPROM of its size.

Using ROMs for instruction decoding allows for a featureful instruction set while the rest of the hardware is quite simple. There are some 16-bit operations, a load of different addressing modes (including stack-relative for higher-level languages), and the ALU operations can work on registers or values in memory.

  • 6x general-purpose 8-bit registers: X, Y, BH, BL, CH, CL
  • BH/BL/CH/CL form two 16-bit register pairs (B, C)
  • 16-bit stack pointer and program counter
  • ROM-based ALU can perform add, subtract, and, or, xor, rotate right, signed and unsigned comparisons
  • Carry, zero and negative flags for conditional jumps
  • No interrupts

System

  • 8-bit data bus, 16-bit address bus
  • 4 MHz clock
  • Serial interface

Software

  • Development tools (assembler, emulator) written in Python
  • I have a plan for a cooperatively-multitasked operating system
  • Ultimate plan is to design a high(ish) level language and write a compiler

  • Moving to a single board

    Kyle McInnes, 11/08/2022 at 20:05

    As I said in the last post, I wasn't happy with the multi-board format, so I've merged my KiCad schematics into one project and I'm going to do a new PCB, measuring roughly 225x150mm.

    The PLCC part will be a dual UART (possibly a 68681?). Expansion cards can be plugged into the 40-pin IDC connectors. A couple of things are still missing, mainly a clock source and the address decoding logic.

    This is my current plan for the memory map. It makes full use of the 64K SRAM and the 64K EEPROM, and provides another 64K of address space for expansion - leaving options open for a future graphics card. I'll use the UART's output port as the bank selection register, and it should only take a few extra gates to implement this.

    |---------------| FFFF
    |  I/O devices  |        256 bytes (32 bytes each for 8 I/O devices)
    |---------------| FF00
    |               |
    |               |
    |    48K RAM    |        Upper 48K of the 64K SRAM
    |               |
    |               | 
    |---------------| 4000   Banked area can be: lower 16K of RAM,
    |    16K bank   |         any of the four 16K ROM pages,
    |---------------| 0000    or any of the four 16K I/O pages
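To make the map concrete, here is a minimal Python sketch of the decoding it implies. The log doesn't say how the bank register value is encoded, so the encoding below (0 = lower 16K of RAM, 1-4 = ROM pages, 5-8 = expansion "I/O" pages) is purely hypothetical, as are the `Target`/`decode` names. Only the address ranges come from the diagram.

```python
from typing import NamedTuple


class Target(NamedTuple):
    device: str  # "ram", "rom", "io_page" (expansion space) or "io_dev"
    offset: int  # address within that device's own space


def decode(addr: int, bank: int) -> Target:
    """Map a 16-bit CPU address to a device, given the bank register value.

    HYPOTHETICAL bank encoding: 0 = lower 16K of SRAM, 1-4 = ROM pages 0-3,
    5-8 = expansion (I/O) pages 0-3.
    """
    if not 0 <= addr <= 0xFFFF:
        raise ValueError("address out of range")
    if addr >= 0xFF00:
        # Top 256 bytes: 8 I/O devices x 32 bytes; device number = offset // 32
        return Target("io_dev", addr - 0xFF00)
    if addr >= 0x4000:
        # Fixed area: upper 48K of the SRAM, mapped 1:1
        return Target("ram", addr)
    # 0x0000-0x3FFF: the 16K banked window
    if bank == 0:
        return Target("ram", addr)
    if 1 <= bank <= 4:
        return Target("rom", (bank - 1) * 0x4000 + addr)
    if 5 <= bank <= 8:
        return Target("io_page", (bank - 5) * 0x4000 + addr)
    raise ValueError(f"unused bank value {bank}")
```

One consequence of this layout: anything placed above 0x4000 (the stack, or the code that switches banks) stays visible whatever the bank register holds.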

  • Project update

    Kyle McInnes, 11/03/2022 at 18:08

    With a few extra parts cobbled together on another eurocard, the DIP8 CPU becomes the DIP8 computer.

    From top to bottom we have:

    • Reset button
    • 16550 UART plus its 16 MHz crystal
    • AVR microcontroller - was using this for debugging, now its sole purpose is to generate the computer's 4 MHz clock. Obviously this will be disappearing in the future!
    • UM61512AK-15 - 64K RAM (32K usable: 0x8000 - 0xFEFF)
    • 74LS688 comparator - selects the I/O devices (currently just the UART) when the top 256 bytes of memory are addressed (0xFF00 - 0xFFFF)
    • 64K EEPROM (32K usable: 0x0000 - 0x7FFF)
    • Status LEDs - negative, zero, carry
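The prototype's chip selection can be modelled in a few lines of Python. This is a sketch under assumptions: the log only says the 74LS688 selects I/O for 0xFF00-0xFFFF, so comparing A15..A8 against 0xFF (Q inputs tied high) and splitting ROM/RAM on A15 are my guesses at the wiring, and the function names are hypothetical.

```python
def ls688_equal(p: int, q: int) -> bool:
    """74LS688 model: the active-low P=Q output asserts when the two 8-bit inputs match."""
    return (p & 0xFF) == (q & 0xFF)


def prototype_select(addr: int) -> str:
    """Chip select for the prototype: 32K EEPROM, 32K RAM, top 256 bytes for I/O."""
    if ls688_equal(addr >> 8, 0xFF):  # A15..A8 compared against 0xFF (assumed)
        return "io"                   # I/O must take priority over RAM here
    if addr & 0x8000:                 # A15 high: RAM, 0x8000-0xFEFF
        return "ram"
    return "rom"                      # A15 low: EEPROM, 0x0000-0x7FFF
```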

    This is really just a prototype, the first incarnation of the computer - hence the 32K ROM/32K RAM split, which I intend to improve upon in the future.


    The system runs happily at 4 MHz, so my Mandelbrot test program now takes only 5 seconds. The clock can actually go a bit higher, up to at least 6 MHz, by increasing the duty cycle (only the high part of the clock needs to be 125 ns wide).
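The duty-cycle arithmetic behind that remark, as a small Python sketch. The only input taken from the log is the 125 ns high-phase requirement; how short the low phase is allowed to get is not stated, so the code just reports what is left over.

```python
HIGH_NS = 125.0  # minimum width of the clock's high phase, from the log


def duty_needed(freq_mhz: float, high_ns: float = HIGH_NS) -> float:
    """Fraction of the period that must be high to give the required high phase."""
    period_ns = 1000.0 / freq_mhz
    if high_ns > period_ns:
        raise ValueError("period is shorter than the required high phase")
    return high_ns / period_ns


def low_phase_ns(freq_mhz: float, high_ns: float = HIGH_NS) -> float:
    """Time left for the low phase (register/memory write-back) at this frequency."""
    return 1000.0 / freq_mhz - high_ns
```

At 4 MHz a 50% duty cycle already gives 125 ns high; at 6 MHz the clock needs to be high for 75% of its 166.7 ns period, leaving about 42 ns for the low phase. The theoretical ceiling with this rule is 8 MHz, where the low phase vanishes; the real limit is whatever the write-back on the falling edge needs.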

    So what's next?

    • I want a memory banking system to allow use of the full 64K of RAM.
    • For developing the OS, I could do with some kind of storage, maybe an SD card interface - flashing EEPROMs gets pretty annoying, even with a ZIF socket! Or maybe I could load programs over the serial link, hmm.
    • My main issue at the moment is that I've come to dislike the eurocard/backplane form factor - most of the chips are hidden away :( To its credit, it's a nice way to design something incrementally - being able to knock something up on a protoboard, plug it in, and then replace it with a PCB has worked quite well. But now that I know what the system looks like, I'd prefer it to mostly be on one "motherboard", with some expansion slots for I/O devices.

  • Mandelbrot

    Kyle McInnes, 09/24/2022 at 19:07

    While I wait for more parts to arrive, here's a Mandelbrot program running in real time in the simulator. I had fun optimising this down from over 30 seconds to 10.5 seconds, but couldn't manage to get it below 10.

    asm source: https://github.com/kylesrm/dip8-computer/blob/main/src/mandelbrot.asm

    The terminal is https://github.com/Swordfish90/cool-retro-term. Hopefully one day the real DIP8 will be hooked up to a real amber terminal, I love the look of them.

  • First run

    Kyle McInnes, 09/04/2022 at 20:37

    IT'S ALIVE!

    The boards are not fully populated yet, and I have no memory or I/O - just an AVR microcontroller emulating a ROM. The three LEDs are the carry, zero and negative flags, and the tiny program in the ROM is just doing a chaser pattern on these.

    Projects like this require that you give things fancy names. So here's the "fetch/decode" unit - instruction register, control ROMs, program counter and address register:

    And here's the "execute" unit - register file and ALU.

  • ALU design

    Kyle McInnes, 08/16/2022 at 13:16

    A lot of the hardware design is done now and I've sent a couple of PCBs off, so I can document the various bits. Here's the ALU:

    There are two 64K x 8 EEPROMs, each generating 4 bits of the result. Both ROMs use the same image, but A15 is pulled low on one and high on the other, so they can behave slightly differently. The "A" operand comes from the register file and the "B" operand is hardwired to a temporary register called T. To do x=x+y, y is first moved to t, and then "add x, t" will add t to x.

    There are 16 possible operations, set with the four AluOp bits:

    • 2x pass-through operations, Q=A and Q=B. Needed to store a register into memory for example, as the only way to read a register is through the ALU. Q=B is used by some instructions that use the T register as a temporary location. For example the "stl" instruction (store literal to memory) first writes the literal value to T, and then writes it back to memory.
    • 6x arithmetic: add, sub, adc, sbc, inc, dec. These are exposed in the instruction set.
    • 3x logic: and, or, xor. These are also available as instructions.
    • 2x conditional increment: ci increments if the carry is set, and cd decrements if the carry is clear. These are used by the microcode to (for example) add an 8-bit offset to a 16-bit address.
    • 2x operations (ror1 and ror2) that together perform a rotation right by one bit - explained below!
    • 1x operation (sig) that makes the next comparison signed - also explained below!

    These are all defined in a Python script that generates the ROM image.
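The real generator script isn't shown here, so the following is only a sketch of how such a script could work, with a hypothetical address layout (A0-A3 = A nibble, A4-A7 = B nibble, A8 = carry in, A9-A12 = AluOp, A15 = low/high ROM), a hypothetical op numbering, and only a subset of the 16 operations. It does show the two ideas from the text: one image serves both ROMs because A15 tells each which half it is, and the zero flag is the AND of the two per-nibble zero outputs.

```python
# HYPOTHETICAL layout: A0-A3 A nibble, A4-A7 B nibble, A8 carry in,
# A9-A12 AluOp, A15 = 0 for the low-nibble ROM, 1 for the high-nibble ROM.
# Output byte: D0-D3 result nibble, D4 carry out, D5 "nibble is zero".
OPS = ["pass_a", "pass_b", "add", "adc", "and", "or", "xor"]  # subset, numbering assumed


def nibble_op(op: str, a: int, b: int, cin: int, is_hi: bool) -> tuple[int, int]:
    """Return (result nibble, carry out) for one 4-bit slice."""
    if op == "pass_a":
        return a, 0
    if op == "pass_b":
        return b, 0
    if op in ("add", "adc"):
        # A plain add ignores the flag carry in the low nibble, but the high
        # nibble must still take the carry coming from the low ROM (Cnib).
        c = cin if (op == "adc" or is_hi) else 0
        s = a + b + c
        return s & 0xF, s >> 4
    if op == "and":
        return a & b, 0
    if op == "or":
        return a | b, 0
    if op == "xor":
        return a ^ b, 0
    raise ValueError(op)


def build_rom() -> bytes:
    """Build the single 64K image burned into both ALU ROMs."""
    rom = bytearray(65536)
    for addr in range(65536):
        opcode = (addr >> 9) & 0xF
        if opcode >= len(OPS):
            continue  # unused opcodes left as 0x00 in this sketch
        a, b = addr & 0xF, (addr >> 4) & 0xF
        cin, is_hi = (addr >> 8) & 1, bool(addr >> 15)
        q, cout = nibble_op(OPS[opcode], a, b, cin, is_hi)
        rom[addr] = q | (cout << 4) | ((q == 0) << 5)
    return bytes(rom)


def alu8(rom: bytes, op: str, a: int, b: int, carry: int) -> tuple[int, int, int, int]:
    """Look up both ROMs the way the hardware chains them; returns (Q, C, Z, N)."""
    code = OPS.index(op) << 9
    lo = rom[(a & 0xF) | (b & 0xF) << 4 | carry << 8 | code]
    hi = rom[(a >> 4) | (b >> 4) << 4 | ((lo >> 4) & 1) << 8 | code | 1 << 15]
    q = (hi & 0xF) << 4 | (lo & 0xF)
    zero = (lo >> 5) & (hi >> 5) & 1  # zero outputs of both ROMs ANDed
    return q, (hi >> 4) & 1, zero, q >> 7  # carry from the high ROM, N = bit 7
```

For example, `alu8(build_rom(), "add", 200, 100, 0)` gives a result of 44 with the carry set.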

    Flags

    The three status flags come out of the ALU and are stored in a register. The high ROM outputs the final carry and the negative/sign flag, which is equal to bit 7 of the output. The zero flag is the zero output of both ROMs, ANDed together.

    There is another, hidden, "internal" carry flag, stored in U6. This is used by instructions that need to use the ALU to do 16-bit operations, without disturbing the normal status flags. An example is the push instruction: after storing the given register at the stack pointer address, it has to decrement the SP. It first does a dec on the SP low byte, and then a cd on the high byte, which decrements it if there was an underflow on the low byte. The internal carry stores the carry across these operations, keeping the status flags unchanged. The nSetFlags pin tells the ALU which flags to use and update.
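A Python sketch of that two-step SP decrement, and the matching ci trick for adding an 8-bit offset to a 16-bit address. The carry convention (carry set = no borrow after dec) is inferred from "cd decrements if the carry is clear"; the function names are hypothetical. The status flags never appear because both steps only use the internal carry.

```python
def dec8(v: int) -> tuple[int, int]:
    """dec: (result, carry). Carry = 1 means no borrow (inferred convention)."""
    return (v - 1) & 0xFF, int(v != 0)


def cd8(v: int, carry: int) -> int:
    """cd: decrement only if the carry is clear."""
    return v if carry else (v - 1) & 0xFF


def ci8(v: int, carry: int) -> int:
    """ci: increment only if the carry is set."""
    return (v + 1) & 0xFF if carry else v


def push_decrement_sp(sp: int) -> int:
    """dec on the SP low byte, then cd on the high byte, via the internal carry."""
    lo, internal_carry = dec8(sp & 0xFF)
    hi = cd8(sp >> 8, internal_carry)
    return hi << 8 | lo


def add_offset(addr: int, offset: int) -> int:
    """Add an unsigned 8-bit offset: add on the low byte, ci on the high byte."""
    s = (addr & 0xFF) + offset
    lo, internal_carry = s & 0xFF, s >> 8
    hi = ci8(addr >> 8, internal_carry)
    return hi << 8 | lo
```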

    Rotate

    A ROM-based ALU is theoretically a very powerful thing. You can have lookup tables in there for any function you like: multiply, divide, sine, cosine, shifts by an arbitrary number of bits. Except to make that work you need a single ALU chip, where the full widths of each input are available. My high ROM only has the upper four bits of each input to work with, along with the one-bit carry output from the low ROM - so no fast multiply for me. 

    I realised though that there's also a one-bit communication channel from the high ROM to the low ROM - through the carry flag. And this is enough to allow right shifts or rotations - it just takes an extra cycle:

    input      76543210 C
    output     C7654321 0     input is rotated right through the carry flag
    
    
                Cin  Lo   Cnib Hi   Cout     Cnib = carry from lo to hi ROM
    input       -C-> 3210      7654
    after ror1       C321 -0-> 0765 -4->     Each nibble is shifted right into the carry out. Carry in goes into the hi bit
    after ror2  -4-> 4321 -C-> C765 -0->     Hi bit to carry out, Carry in to hi bit.
    result           4321      C765  0       This is the correct result
    

    The ror instruction just does ror1 and ror2 sequentially, and there you go - rotate right using a 4-bit ROM-based ALU.
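The ror1/ror2 trick from the table can be checked exhaustively in Python. In each cycle, the low ROM sees the external carry and the high ROM sees the low ROM's carry (Cnib). The carry between the two cycles is just passed along here; whether the hardware holds it in the internal or the status carry is not specified, so that detail is left abstract.

```python
def ror1_nibble(n: int, cin: int) -> tuple[int, int]:
    """ror1: shift the nibble right; carry in enters the top bit, bit 0 is carried out."""
    return (cin << 3) | (n >> 1), n & 1


def ror2_nibble(n: int, cin: int) -> tuple[int, int]:
    """ror2: replace the nibble's top bit with carry in; the old top bit is carried out."""
    return (cin << 3) | (n & 0b0111), n >> 3


def alu_pass(nibble_op, value: int, cin: int) -> tuple[int, int]:
    """One ALU cycle: low ROM gets the external carry, high ROM gets Cnib."""
    lo, cnib = nibble_op(value & 0xF, cin)
    hi, cout = nibble_op(value >> 4, cnib)
    return (hi << 4) | lo, cout


def ror(value: int, carry: int) -> tuple[int, int]:
    tmp, c = alu_pass(ror1_nibble, value, carry)  # first cycle
    return alu_pass(ror2_nibble, tmp, c)          # second cycle


def ror_reference(value: int, carry: int) -> tuple[int, int]:
    """9-bit rotate right through carry, computed directly."""
    return (carry << 7) | (value >> 1), value & 1
```

Comparing `ror` against `ror_reference` for all 512 combinations of value and carry confirms the two-cycle version matches.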

    Why is ror a useful instruction to add? I didn't really understand the need for rotate instructions until I started reading about how to do multiplication and division on 8-bit machines. You can think of rotates as the "with carry" version of logical shifts - you can use them to chain shifts together to work on values wider than 8 bits. I can already shift and rotate left,...


  • Log #6: Accidentally wrote a C backend

    Kyle McInnes, 08/08/2022 at 21:21

    Getting C programs running on this machine was never one of my goals, but I was looking idly at the documentation for LLVM and GCC, wondering how hard it would be to write a code generator. It seemed like far too big a task and I was happy to let it go, and then I stumbled across the vbcc compiler. It has many different backends, including some small 8-bit processors and a clean "generic" 32-bit RISC. Each backend is basically a single 1000-2000 line C file. Those examples, coupled with this useful document, were enough to get me hacking together a DIP8 backend. It is quite a complicated and frustrating process, but the nice thing is you can add functionality bit by bit, and before you know it you can compile some fairly interesting C programs.

    I haven't got function calls, structs or arrays working properly yet, but I can compile the C version of the Byte sieve test that I previously wrote in assembly:

    #define size 8191
    
    int main(void) {
    
        unsigned int count;
        unsigned int prime;
        unsigned int k;
        unsigned char *flags = (unsigned char *)0x1000;
    
        count = 0;
        for (unsigned int i=0; i<size; i++) {
            flags[i] = 1;
        }
        for (unsigned int i=0; i<size; i++) {
            if (flags[i]) {
                prime = i + i + 3;
                k = i + prime;
                while (k < size) {
                    flags[k] = 0;
                    k += prime;
                }
                count = count + 1;
            }
        }
    
        return count;
    
    }

    This works, and to my amazement it's actually quite fast - only 20% slower (and 50% bigger in size) than my hand-crafted assembly. 
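As a host-side check of what the DIP8 build should return, here is the same algorithm in Python. It is a straight transcription of the C above, not something from the project repository. With a size of 8191 it counts the odd primes from 3 to 16383, which is the well-known BYTE sieve answer of 1899.

```python
SIZE = 8191


def byte_sieve(size: int = SIZE) -> int:
    """BYTE sieve: flags[i] stands for the odd number 2*i + 3."""
    flags = [True] * size
    count = 0
    for i in range(size):
        if flags[i]:
            prime = i + i + 3
            # Same as the C while loop: k = i + prime, then k += prime while k < size
            for k in range(i + prime, size, prime):
                flags[k] = False
            count += 1
    return count
```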

    In the process of doing this, I added some more instructions that C code uses a lot and that were otherwise a bit difficult: signed comparisons, rotate right, some more 16-bit arithmetic, and a way to do arithmetic on variables in memory without loading them into a register. I'll go through those in another log.

    I would like to at some point write a tutorial on how to write a vbcc backend for a homebrew CPU, as the vbcc code isn't very easy to read and the docs are a bit lacking in places. Actually the most useful backend to look at is for the "Z-machine", a VM used for Infocom text adventure games, as it's well commented - link here.

  • Log #5: Emulation and testing performance

    Kyle McInnes, 07/24/2022 at 21:07

    I had the idea to rewrite my emulator so that it uses the same decoder ROM images that the real machine will. This was quite a good move - the emulator now stays in sync with any changes I make to the instruction set. It also means the emulator is now cycle accurate - it can tell me exactly how long a program will take to run. With that, I thought I'd implement the "Byte sieve" and see how my architecture performs.

    Statistics:
        328882 instructions
        1184625 cycles
        0.5923125 sec at 2.0 MHz
        3.60 cycles/inst
    
    

    I was surprised by this - one iteration of the sieve (of size 8191, the standard size) takes 0.59 seconds. How does that compare to other processors? I found this article from 1983 that lists reader-submitted times for various systems and languages. Those times are for 10 iterations so I need to multiply my number by 10:

    System      Lang  10 iterations (s)
    1 MHz 6502  asm   13.9
    ? MHz Z80   asm    6.8
    5 MHz 8088  asm    4.0
    8 MHz 8086  asm    1.9

    2 MHz DIP8  asm    5.9

    Not bad! It performs about the same as a 6502 at the same clock rate, which I wasn't expecting. Maybe the 6502 code wasn't a very efficient implementation. Anyway, it's just one benchmark. I have some nice addressing modes and 16-bit registers which will have helped here, but modifying variables in memory is quite clunky at the moment. Luckily I have a plan to fix that.
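The arithmetic behind the table, as a short Python sketch using the emulator's statistics. The 6502 normalisation assumes run time scales inversely with clock speed, which is a simplification (memory timing differs between machines). On that basis a 2 MHz 6502 would take about 6.95 s, so the DIP8's 5.9 s is in the same ballpark and slightly ahead.

```python
instructions = 328_882
cycles = 1_184_625
clock_hz = 2.0e6

seconds_per_iteration = cycles / clock_hz       # one pass of the sieve
seconds_for_ten = 10 * seconds_per_iteration    # the 1983 table quotes 10 iterations
cycles_per_instruction = cycles / instructions  # average instruction length in cycles

# Crude normalisation, ASSUMING time scales inversely with clock speed
mos6502_at_2mhz = 13.9 * (1.0 / 2.0)
```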

    The downside to this new emulator is that it's a bit slow - it can't run in real time. Which I think is a bit funny - my Python code running at 3 GHz can't emulate a system running at a puny 2 MHz!

    Byte sieve assembly code is here for anyone interested: https://github.com/kylesrm/dip8-computer/blob/main/src/sieve.asm

  • Log #4: First board

    Kyle McInnes, 07/20/2022 at 17:54

    After some breadboarding and a late night soldering session I have the first bit of working hardware. This board contains the program counter, address buffer, and instruction decoding circuitry. I will eventually replace this board with a proper PCB - this construction method is fine until you get a loose connection, and then it's a huge pain. It's fine for now though and it allows me to work on the other parts of the system.

    For the program counter and address buffer, I'm using 4x 74LS469 - a synchronously presettable 8-bit up/down counter with tristate outputs. Pretty much the ideal counter IC - I'm not aware of any other counter that has all the features I want in one IC. None of the common 4-bit counters have tristate outputs, so you need double the number of chips plus a couple of octal buffers. The '590 is 8 bits wide with an output enable, but it's not presettable. The '593 is presettable, but only through the single input/output port.

    The '469 does have a couple of issues though. Firstly, it's obsolete and quite hard to find - there are a few on eBay. Secondly, for some reason I don't understand, they each consume about 100 mA (the datasheet says 120 mA typ) and get pretty hot! So maybe I'll replace them with something else when I do the PCB.


  • Log #3: Timing

    Kyle McInnes, 07/16/2022 at 12:54

    Thinking about timing and whether I'm happy with this aspect of the design.


    Each instruction consists of some number of operations (phases? cycles?), where the 24 control lines are set appropriately. That information is held in a control store of three 64K x 8 EEPROMs, which are addressed by the 8-bit instruction register, a 4-bit sequence counter, and the three status flags (for conditional jumps).

    Here's an example of how an instruction is defined:

    add x, #L
        pcinc memrd twr                         ;write literal to T
        regoe alu opadd regwr selx setflags     ;X = ALU(X, T, op=add, setflags=yes)
        pcinc memrd irwr                        ;write next opcode into IR (fetch - common to all instructions)
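A sketch of how a Python tool could turn step lines like these into control-store contents. Everything specific here is hypothetical: the bit position of each signal, treating `opadd` and `selx` as values in multi-bit fields, the opcode number, and the order in which flags, step and opcode are packed into the address. What comes from the log is the shape: 8 + 4 + 3 = 15 address bits, and a 24-bit control word split across three 8-bit ROMs.

```python
# HYPOTHETICAL bit assignments; the real mapping of the 24 lines is not given.
SIGNALS = {
    "pcinc": 0, "memrd": 1, "twr": 2, "irwr": 3,
    "regoe": 4, "regwr": 5, "alu": 6, "setflags": 7,
}
# Multi-bit fields as (shift, value), also hypothetical.
FIELDS = {
    "opadd": (8, 0x2),  # 4-bit AluOp field at bits 8-11
    "selx": (12, 0x1),  # register select field at bits 12-14
}


def assemble_step(line: str) -> int:
    """Turn one step line (comment after ';' ignored) into a 24-bit control word."""
    word = 0
    for name in line.split(";")[0].split():
        if name in SIGNALS:
            word |= 1 << SIGNALS[name]
        elif name in FIELDS:
            shift, value = FIELDS[name]
            word |= value << shift
        else:
            raise ValueError(f"unknown control signal {name!r}")
    return word


def control_address(opcode: int, step: int, flags: int) -> int:
    """8-bit IR + 4-bit sequence counter + 3 flags = 15 address bits (order assumed)."""
    return (flags & 0x7) << 12 | (step & 0xF) << 8 | (opcode & 0xFF)


def burn(roms: list, opcode: int, steps: list, flags_values=range(8)) -> None:
    """Write each step into all three ROMs, one byte of the control word per ROM."""
    for step, line in enumerate(steps):
        word = assemble_step(line)
        for flags in flags_values:  # unconditional: same word for every flag state
            addr = control_address(opcode, step, flags)
            for i, rom in enumerate(roms):
                rom[addr] = (word >> (8 * i)) & 0xFF
```

A conditional jump would simply pass a different step list for the flag combinations where the jump is taken.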

     So far I've been thinking of the timing like this:

    • On the clock's rising edge, clock the program counter, instruction register and sequence counter.
    • During the high half of the clock, instruction decoding and execution (i.e. ALU computation) is underway.
    • On the clock's falling edge (rising edge of ¬CLK), write back any values to registers, or to memory.

    While this has its problems, it seems to be a popular method with simple homebrew CPUs. For example the very well documented CSCvon8: An 8-bit TTL CPU and the SPAM-1 - 8 Bit CPU.

    Interestingly, many people describe this as "fetch/decode" on the high part of the clock, and "execute" on the low part. Is that correct? Maybe you can argue either way, but to me it makes sense to consider the time it takes for the ALU to produce a valid result as execution time, just as decoding is what is happening while the control store ROMs are producing a valid output. Writing back values to RAM or to registers takes little time in comparison (in the case of registers, no time at all).

    And that's one of this method's weaknesses - pretty much everything happens during only one half of the clock cycle, limiting performance. In my case, it looks like 2 MHz will be the maximum clock rate.


    A design where things only happen on the clock's rising edge would be more pleasing to me - and more performant. But for the sake of simplicity I'll probably keep it the way it is. Things can always be improved in the next design!

  • Log #2: Stacks and subroutines

    Kyle McInnes, 07/13/2022 at 13:54

    When writing assembly, there are no rules about how to pass data to and from subroutines. Each subroutine can do it in a different way, for maximum efficiency. If there are just a few parameters, it probably makes sense to pass them in registers. Extra ones can go on the stack, although this can require a bit of juggling if the same stack is used for return addresses. Alternatively, data can be passed via a "parameter block" in-line with the code - check out this useful resource to see how that was done on the 6502.

    Likewise, return values (and there can be more than one) can go in registers, on the stack, in fixed memory locations, or even in the form of the carry and zero status flags in the case of boolean values.

    Here's an example of a hand-written subroutine in assembly. As the comment says, the inputs are the x and y registers, and the output is returned in b. It modifies the c and x registers, so the caller would need to save their values on the stack if they were important.

    The "call" instruction is actually a macro that pushes the return address (the one after the call) and jumps to the given address. So it's really two instructions - 6 bytes in total. The "ret" instruction is a real instruction that pops a 16-bit value into the program counter.

    The routine uses a "local" variable, mulbit, which is stored in a fixed location in memory.

                mov x, #56
                mov y, #39
                call mulxy
                ; b == 2184
    
    
    ;multiply (8bit x 8bit = 16bit result)
    ; xy  inputs
    ; b   output
    ; c,x clobbered
    
    mulxy       mov b, #0
                mov cl, x
                mov ch, #0
                stl #1, mulbit
    mloop       mov x, y        ; add if y & bit
                and x, [mulbit]
                jz  mnoadd
                add bl, cl      ; b += c
                adc bh, ch
    mnoadd      add cl, cl      ; c *= 2
                adc ch, ch
                adr mulbit      ; bit *= 2
                ldx
                add x, x
                stx
                jcc mloop
                ret
    
    mulbit      .byte 1
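To check the routine's logic off-target, here is a Python model that follows the same shift-and-add structure, one statement per group of instructions. It is my transcription for testing, not part of the project's tools; registers are modelled as plain integers masked to 16 bits.

```python
def mulxy(x: int, y: int) -> int:
    """8x8 -> 16-bit shift-and-add multiply, mirroring the mulxy routine."""
    b = 0        # result (bh:bl)
    c = x        # shifted multiplicand (ch:cl), ch starts at 0
    mulbit = 1   # mask for the current bit of y (stl #1, mulbit)
    while True:
        if y & mulbit:            # mov x, y / and x, [mulbit] / jz mnoadd
            b = (b + c) & 0xFFFF  # add bl, cl / adc bh, ch
        c = (c + c) & 0xFFFF      # add cl, cl / adc ch, ch
        mulbit += mulbit          # ldx / add x, x / stx
        if mulbit > 0xFF:         # carry out of bit 7: jcc falls through to ret
            return b
```

The loop runs exactly eight times, once per bit of y, because the mask carries out of bit 7 on the eighth doubling.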

    From assembly to a higher level

    Eventually I'd like to implement a high-level language though (at least higher-level than assembly), and then we do need some rules - a calling convention. This will make heavy use of the stack, for arguments, return values, and each function's local variables. A stack-relative addressing mode, with which we can read and write values on the stack without pushing and popping them, is key, and that's why I have sp+imm8 and sp+imm16 addressing modes in the ISA.

    Because the modes can only add a positive offset to the stack pointer, it makes more sense for the stack to grow down. If it grew up, you would need a negative offset to access a function's parameters.


    So here's what the output might look like from a high-level compiler:

                ; call myfunc with two 1-byte parameters
                push #12
                push #34
                push #return_addr   ; call
                jmp myfunc          ; call
                ...
    
    
    myfunc      push x          ; callee saves x and y
                push y          
                sub sp, #32     ; allocate space for locals
                ...
                ldx sp+37       ; load parameter
                stx sp+3        ; store local variable
                ...
                call func2      ; call child function
                ...
                add sp, #32     ; deallocate locals
                pop y           ; restore x/y
                pop x    
                pop b           ; return address into b
                add sp, #2      ; deallocate parameters
                jmp b           ; return

     This is all very preliminary, but it shows that a high level language could be implemented fairly efficiently.
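A small Python model of myfunc's stack frame confirms the offsets used above. It assumes push stores at SP and then decrements, as described for the push instruction in the ALU log. The 16-bit push byte order, the initial SP value and the register values are made up for illustration and don't affect the offsets.

```python
class Stack:
    """Downward-growing byte stack: push stores at SP, then decrements SP."""

    def __init__(self, sp: int = 0xFEFF):  # initial SP is hypothetical
        self.mem = bytearray(65536)
        self.sp = sp

    def push8(self, value: int) -> None:
        self.mem[self.sp] = value & 0xFF
        self.sp = (self.sp - 1) & 0xFFFF

    def push16(self, value: int) -> None:
        self.push8(value >> 8)  # byte order is an assumption
        self.push8(value)

    def load(self, offset: int) -> int:
        """ldx sp+offset"""
        return self.mem[(self.sp + offset) & 0xFFFF]


def build_myfunc_frame() -> Stack:
    s = Stack()
    s.push8(12)                  # push #12 (first parameter)
    s.push8(34)                  # push #34 (second parameter)
    s.push16(0x1234)             # push #return_addr (address made up)
    s.push8(0xAA)                # push x (value made up)
    s.push8(0xBB)                # push y (value made up)
    s.sp = (s.sp - 32) & 0xFFFF  # sub sp, #32: locals at sp+1..sp+32
    return s
```

With this layout, sp+33 and sp+34 hold the saved y and x, sp+35..36 the return address, sp+37 the last-pushed parameter (34) and sp+38 the first (12). All of them are reached with positive offsets, which is the point of growing the stack downwards.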

Discussions

edsjac wrote 6 days ago:

Nice project!
