
DIP8 TTL Computer

Digital Information Processor - an 8-bit computer made out of 7400 series logic and some EEPROMs. Under active development!

An 8-bit computer without a microprocessor - just some 7400 series logic and some EEPROMs. Custom architecture, with custom software to follow - the plan is for a multitasking OS and a compiled language.

CPU

  • 6x general-purpose 8-bit registers: X, Y, BH, BL, CH, CL
  • 2x 16-bit registers B (BH:BL) and C (CH:CL)
  • 16-bit stack pointer and program counter
  • Flags: Carry, Zero, Negative. Flags are tested with conditional jumps
  • Instruction set is designed to be nice to write assembly for, but to also support higher level/compiled languages (with features like stack-relative addressing)
  • ROM-based ALU: add, sub, adc, sbc, cmp, and, or, xor, ror (8 bit); addw, subw (16 bit)
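As a rough illustration of how the 16-bit registers overlay the 8-bit ones, here is a minimal C model that an emulator might use. The struct layout, the field names and the flag bit positions are my own assumptions for the sketch, not DIP8's actual encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical emulator-side model of the DIP8 register file.
   B and C are not separate storage: they are views of BH:BL and CH:CL. */
typedef struct {
    uint8_t x, y;
    uint8_t bh, bl, ch, cl;
    uint16_t sp, pc;
    uint8_t flags;          /* bit layout below is an assumption */
} dip8_regs;

enum { FLAG_C = 1u << 0, FLAG_Z = 1u << 1, FLAG_N = 1u << 2 };

static uint16_t get_pair(uint8_t hi, uint8_t lo) {
    return (uint16_t)((hi << 8) | lo);
}

static void set_pair(uint8_t *hi, uint8_t *lo, uint16_t value) {
    *hi = (uint8_t)(value >> 8);
    *lo = (uint8_t)(value & 0xFF);
}

static uint16_t get_b(const dip8_regs *r) { return get_pair(r->bh, r->bl); }
static uint16_t get_c(const dip8_regs *r) { return get_pair(r->ch, r->cl); }
static void set_b(dip8_regs *r, uint16_t v) { set_pair(&r->bh, &r->bl, v); }
static void set_c(dip8_regs *r, uint16_t v) { set_pair(&r->ch, &r->cl, v); }
```

Because B and C are just views, writing BL and then reading B sees the update, which is what lets sequences like `add bl, cl` / `adc bh, ch` treat each pair as one 16-bit value.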

System

  • 8-bit data bus, 16-bit address bus
  • ~2 MHz clock
  • Text interface via UART for now
  • I/O system still being designed

Software

  • Development tools (assembler, emulator) written in Python
  • I have a plan for a cooperatively-multitasked operating system
  • Ultimate plan is to design a high(ish) level language and write a compiler

  • Log #6: Accidentally wrote a C backend

    Kyle McInnes · 6 days ago · 0 comments

    Getting C programs running on this machine was never one of my goals, but I was idly looking at the documentation for LLVM and GCC, wondering how hard it would be to write a code generator. It seemed like far too big a task and I was happy to let it go, but then I stumbled across the vbcc compiler. It has many different backends, including some for small 8-bit processors and a clean "generic" 32-bit RISC. Each backend is basically a single C file of 1000-2000 lines. Those examples, coupled with this useful document, were enough to get me hacking together a DIP8 backend. It is quite a complicated and frustrating process, but the nice thing is that you can add functionality bit by bit, and before you know it you can compile some fairly interesting C programs.

    I haven't got function calls, structs or arrays working properly yet, but I can compile the C version of the Byte sieve test that I previously wrote in assembly:

    #define size 8191
    
    int main(void) {
    
        unsigned int count;
        unsigned int prime;
        unsigned int k;
        unsigned char *flags = (unsigned char *)0x1000;
    
        count = 0;
        for (unsigned int i=0; i<size; i++) {
            flags[i] = 1;
        }
        for (unsigned int i=0; i<size; i++) {
            if (flags[i]) {
                prime = i + i + 3;
                k = i + prime;
                while (k < size) {
                    flags[k] = 0;
                    k += prime;
                }
                count = count + 1;
            }
        }
    
        return count;
    
    }

    This works, and to my amazement it's actually quite fast - only 20% slower (and 50% bigger in size) than my hand-crafted assembly. 
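    One way to sanity-check the compiled output is to build the same algorithm natively on a PC and compare results. The sketch below swaps the fixed address 0x1000 for an ordinary array (my change, since a hosted program can't use that address) and should return 1899, the well-known result for the standard-size Byte sieve.

```c
#include <assert.h>
#include <string.h>

#define SIEVE_SIZE 8191

/* Host-side version of the Byte sieve above. The flags live in a static
   array instead of at address 0x1000 so it can run on a PC. */
static unsigned int byte_sieve(void) {
    static unsigned char flags[SIEVE_SIZE];
    unsigned int count = 0;

    memset(flags, 1, sizeof flags);
    for (unsigned int i = 0; i < SIEVE_SIZE; i++) {
        if (flags[i]) {
            unsigned int prime = i + i + 3;   /* flags[i] represents 2i+3 */
            for (unsigned int k = i + prime; k < SIEVE_SIZE; k += prime) {
                flags[k] = 0;                 /* strike its odd multiples */
            }
            count++;
        }
    }
    return count;
}
```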

    In the process of doing this, I added some more instructions that C uses a lot and that were otherwise awkward to express: signed comparisons, rotate right, more 16-bit arithmetic, and a way to do arithmetic on variables in memory without loading them into a register. I'll go through those in another log.

    At some point I'd like to write a tutorial on how to write a vbcc backend for a homebrew CPU, as the vbcc code isn't very easy to read and the docs are a bit lacking in places. The most useful backend to look at is actually the one for the "Z-machine" (the VM used for Infocom text adventure games), as it's well commented - link here.

  • Log #5: Emulation and testing performance

    Kyle McInnes · 07/24/2022 at 21:07 · 0 comments

    I had the idea to rewrite my emulator so that it uses the same decoder ROM images that the real machine will. This was quite a good move - the emulator now stays in sync with any changes I make to the instruction set. It also means the emulator is now cycle accurate - it can tell me exactly how long a program will take to run. With that, I thought I'd implement the "Byte sieve" and see how my architecture performs.
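    Cycle accuracy falls out almost for free when the emulator reads the same control store as the hardware: each micro-step is one clock, and every instruction ends with a step that writes the next opcode into IR (the `irwr` step in the instruction definitions in Log #3). The sketch below counts clocks that way. The bit position of `irwr`, the lookup callback and the toy control store are hypothetical stand-ins for the real ROM images, and the flags input is left out for brevity.

```c
#include <assert.h>
#include <stdint.h>

#define CTRL_IRWR (1u << 0)   /* hypothetical bit position of "irwr" */
#define MAX_STEPS 16          /* 4-bit sequence counter */

/* Hypothetical control-store lookup: one 24-bit control word per
   (opcode, step). A real emulator would read the three ROM images. */
typedef uint32_t (*ctrl_lookup)(uint8_t opcode, uint8_t step);

/* Returns the number of clock cycles an instruction takes: one per
   micro-step, stopping after the step that asserts irwr (the fetch). */
static unsigned cycles_for(ctrl_lookup lookup, uint8_t opcode) {
    for (uint8_t step = 0; step < MAX_STEPS; step++) {
        if (lookup(opcode, step) & CTRL_IRWR) {
            return step + 1u;
        }
    }
    assert(!"instruction never fetched the next opcode");
    return MAX_STEPS;
}

/* Toy control store: every opcode behaves like "add x, #L" from Log #3,
   with three steps, and only the last one asserts irwr. */
static uint32_t toy_lookup(uint8_t opcode, uint8_t step) {
    (void)opcode;
    return step == 2 ? CTRL_IRWR : 0;
}
```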

    Statistics:
        328882 instructions
        1184625 cycles
        0.5923125 sec at 2.0 MHz
        3.60 cycles/inst
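    The derived figures follow directly from the two raw counts: run time is cycles divided by the clock frequency, and cycles per instruction is cycles divided by instructions. A quick check in C (the names are mine):

```c
#include <assert.h>

/* Derived benchmark figures from the emulator's raw counts. */
typedef struct {
    double seconds;          /* time for one sieve iteration */
    double cycles_per_inst;  /* average clocks per instruction */
} bench_stats;

static bench_stats derive_stats(unsigned long instructions,
                                unsigned long cycles, double clock_hz) {
    assert(instructions > 0 && clock_hz > 0.0);
    bench_stats s;
    s.seconds = (double)cycles / clock_hz;
    s.cycles_per_inst = (double)cycles / (double)instructions;
    return s;
}
```

    With 328882 instructions, 1184625 cycles and a 2 MHz clock this gives 0.5923125 s and about 3.60 cycles per instruction; ten iterations therefore take about 5.92 s.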
    
    

    I was surprised by this - one iteration of the sieve (of size 8191, the standard size) takes 0.59 seconds. How does that compare to other processors? I found this article from 1983 that lists reader-submitted times for various systems and languages. Those times are for 10 iterations so I need to multiply my number by 10:

    Clock  CPU   Language  Time for 10 iterations (s)
    1 MHz  6502  asm       13.9
    ? MHz  Z80   asm        6.8
    5 MHz  8088  asm        4.0
    8 MHz  8086  asm        1.9
    
    2 MHz  DIP8  asm        5.9

    Not bad! Halving the 6502's 1 MHz time gives roughly 7 seconds at 2 MHz, so DIP8 performs about the same as (slightly better than) a 6502 at the same clock rate, which I wasn't expecting. Maybe the 6502 code wasn't a very efficient implementation. Anyway, it's just one benchmark. I have some nice addressing modes and 16-bit registers, which will have helped here, but modifying variables in memory is quite clunky at the moment. Luckily I have a plan to fix that.

    The downside of this new emulator is that it's slow: it can't run in real time. I find that a bit funny - my Python code running on a 3 GHz machine can't emulate a system running at a puny 2 MHz!

    Byte sieve assembly code is here for anyone interested: https://github.com/kylesrm/dip8-computer/blob/main/src/sieve.asm

  • Log #4: First board

    Kyle McInnes · 07/20/2022 at 17:54 · 0 comments

    After some breadboarding and a late night soldering session I have the first bit of working hardware. This board contains the program counter, address buffer, and instruction decoding circuitry. I will eventually replace this board with a proper PCB - this construction method is fine until you get a loose connection, and then it's a huge pain. It's fine for now though and it allows me to work on the other parts of the system.

    For the program counter and address buffer, I'm using 4x 74LS469 - a synchronously presettable 8-bit up/down counter with tristate outputs. Pretty much the ideal counter IC - I'm not aware of any other counter that has all the features I want in one IC. None of the 4-bit counters have tristate outputs, so you need double the number of chips plus a couple of octal buffers. The '590 is 8 bits wide with an output enable, but it's not presettable. The '593 is presettable, but only through the single input/output port.

    The '469 does have a couple of issues though. Firstly, it's obsolete and quite hard to find - there are a few on eBay. Secondly, for some reason I don't understand, they each draw about 100 mA (the datasheet says 120 mA typ), and get pretty hot! So maybe I'll replace them with something else when I do the PCB.


  • Log #3: Timing

    Kyle McInnes · 07/16/2022 at 12:54 · 1 comment

    Thinking about timing and whether I'm happy with this aspect of the design.


    Each instruction consists of some number of operations (phases? cycles?), in each of which the 24 control lines are set appropriately. That information is held in a control store of three 64K×8 EEPROMs, addressed by the 8-bit instruction register, a 4-bit sequence counter, and the three status flags (for conditional jumps).
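    To make the addressing concrete, here is a sketch of how an emulator (or a ROM-generation script) might form the control-store address and assemble the 24-bit control word. The order of opcode, step and flags within the address, and which EEPROM supplies which byte, are assumptions for illustration. Note that 8 + 4 + 3 = 15 address bits, so a 64K×8 part has room to spare.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical address layout for the control store:
   A14..A7 = opcode, A6..A3 = sequence step, A2..A0 = flags (C, Z, N). */
static uint16_t ctrl_address(uint8_t opcode, uint8_t step, uint8_t flags) {
    assert(step < 16 && flags < 8);
    return (uint16_t)((opcode << 7) | (step << 3) | flags);
}

/* The same address goes to all three EEPROMs; their 8-bit outputs are
   concatenated into the 24 control lines. */
static uint32_t ctrl_word(const uint8_t *rom0, const uint8_t *rom1,
                          const uint8_t *rom2, uint16_t addr) {
    return (uint32_t)rom0[addr]
         | ((uint32_t)rom1[addr] << 8)
         | ((uint32_t)rom2[addr] << 16);
}
```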

    Here's an example of how an instruction is defined:

    add x, #L
        pcinc memrd twr                         ;write literal to T
        regoe alu opadd regwr selx setflags     ;X = ALU(X, T, op=add, setflags=yes)
        pcinc memrd irwr                        ;write next opcode into IR (fetch - common to all instructions)

     So far I've been thinking of the timing like this:

    • On the clock's rising edge, clock the program counter, instruction register and sequence counter.
    • During the high half of the clock, instruction decoding and execution (i.e. ALU computation) is underway.
    • On the clock's falling edge (rising edge of ¬CLK), write back any values to registers, or to memory.

    While this has its problems, it seems to be a popular method with simple homebrew CPUs. For example the very well documented CSCvon8: An 8-bit TTL CPU and the SPAM-1 - 8 Bit CPU.

    Interestingly, many people describe this as "fetch/decode" on the high part of the clock, and "execute" on the low part. Is that correct? Maybe you can argue either way, but to me it makes sense to consider the time it takes for the ALU to produce a valid result as execution time, just as decoding is what is happening while the control store ROMs are producing a valid output. Writing back values to RAM or to registers takes little time in comparison (in the case of registers, no time at all).

    And that's one of this method's weaknesses - pretty much everything happens during only one half of the clock cycle, limiting performance. In my case, it looks like 2 MHz will be the maximum clock rate.
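    The cost of doing all the work in one half of the clock is easy to quantify: if the slowest decode-plus-ALU path takes t, this scheme needs a period of at least 2t, versus roughly t if that work could use the whole cycle. A small helper makes the comparison; it ignores setup times and write-back, and the 250 ns path in the usage note is simply the figure that a 2 MHz limit implies, not a measured delay.

```c
#include <assert.h>

/* Maximum clock frequency (Hz) for a critical path of path_ns nanoseconds.
   half_cycle != 0 models this design, where decode + ALU must settle
   within the high half of the clock; otherwise the whole period is used.
   Setup/hold margins and write-back time are ignored. */
static double max_clock_hz(double path_ns, int half_cycle) {
    assert(path_ns > 0.0);
    double period_ns = half_cycle ? 2.0 * path_ns : path_ns;
    return 1e9 / period_ns;
}
```

    A 250 ns path gives 2 MHz when it must fit in half a cycle, and about 4 MHz if it could use the full cycle, which is the performance a rising-edge-only design could in principle recover.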


    A design where things only happen on the clock's rising edge would be more pleasing to me - and more performant. But for the sake of simplicity I'll probably keep it the way it is. Things can always be improved in the next design!

  • Log #2: Stacks and subroutines

    Kyle McInnes · 07/13/2022 at 13:54 · 0 comments

    When writing assembly, there are no rules about how to pass data to and from subroutines. Each subroutine can do it in a different way, for maximum efficiency. If there are just a few parameters, it probably makes sense to pass them in registers. Extra ones can go on the stack, although this can require a bit of juggling if the same stack is used for return addresses. Alternatively, data can be passed via a "parameter block" in-line with the code - check out this useful resource to see how that was done on the 6502.

    Likewise, return values (and there can be more than one) can go in registers, on the stack, in fixed memory locations, or even in the form of the carry and zero status flags in the case of boolean values.

    Here's an example of a hand-written subroutine in assembly. As the comment says, the inputs are the x and y registers, and the output is returned in b. It modifies the c register, so the caller would need to save its value on the stack, if it was important.

    The "call" instruction is actually a macro that pushes the return address (the one after the call) and jumps to the given address. So it's really two instructions - 6 bytes in total. The "ret" instruction is a real instruction that pops a 16-bit value into the program counter.

    The routine uses a "local" variable, mulbit, which is stored in a fixed location in memory.

                mov x, #56
                mov y, #39
                call mulxy
                ; b == 2184
    
    
    ;multiply (8-bit x 8-bit = 16-bit result)
    ; xy  inputs
    ; b   output
    ; c   clobbered
    
    mulxy       mov b, #0
                mov cl, x
                mov ch, #0
                stl #1, mulbit
    mloop       mov x, y        ; add if y & bit
                and x, [mulbit]
                jz  mnoadd
                add bl, cl      ; b += c
                adc bh, ch
    mnoadd      add cl, cl      ; c *= 2
                adc ch, ch
                adr mulbit      ; bit *= 2
                ldx
                add x, x
                stx
                jcc mloop
                ret
    
    mulbit      .byte 1
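    When testing a routine like this in the emulator, it can help to have a reference model to compare against. This C version follows the same shift-and-add steps as mulxy (walk a one-bit mask across y, adding the doubled multiplicand whenever the bit is set); the function name is mine.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model of mulxy: 8-bit x 8-bit -> 16-bit, shift-and-add. */
static uint16_t mulxy_ref(uint8_t x, uint8_t y) {
    uint16_t b = 0;          /* result (B register) */
    uint16_t c = x;          /* multiplicand, doubled each pass (C) */
    unsigned bit = 1;        /* mask, like mulbit */

    while (bit <= 0x80) {    /* 8 passes; the asm loops until mulbit carries out */
        if (y & bit) {
            b = (uint16_t)(b + c);
        }
        c = (uint16_t)(c << 1);
        bit <<= 1;
    }
    return b;
}
```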

    From assembly to a higher level

    Eventually I'd like to implement a high-level language though (at least higher-level than assembly), and then we do need some rules - a calling convention. This will make heavy use of the stack, for arguments, return values, and each function's local variables. A stack-relative addressing mode, with which we can read and write values on the stack without pushing and popping them, is key, and that's why I have sp+imm8 and sp+imm16 addressing modes in the ISA.

    Because the modes can only add a positive offset to the stack pointer, it makes more sense for the stack to grow down. If it grew up, you would need a negative offset to access a function's parameters.


    So here's what the output might look like from a high-level compiler:

                ; call myfunc with 2 1-byte parameters
                push #12
                push #34
                push #return_addr   ; call
                jmp myfunc          ; call
                ...
    
    
    myfunc      push x          ; callee saves x and y
                push y          
                sub sp, #32     ; allocate space for locals
                ...
                ldx sp+37       ; load parameter
                stx sp+3        ; store local variable
                ...
                call func2      ; call child function
                ...
                add sp, #32     ; deallocate locals
                pop y           ; restore x/y
                pop x    
                pop b           ; pop return address into b
                add sp, #2      ; deallocate parameters
                jmp b           ; return
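    The sp+37 offset can be checked by modelling the frame. This sketch assumes that a push pre-decrements SP and then stores, so SP always points at the last byte pushed, and that the return address takes 2 bytes and each saved register 1 byte, as in the listing. The byte order of the pushed return address is also an assumption. Under those assumptions the first parameter lands exactly at sp+37.

```c
#include <assert.h>
#include <stdint.h>

/* Toy 64 KB memory and stack pointer; the stack grows down. */
typedef struct {
    uint8_t mem[1 << 16];
    uint16_t sp;
} machine;

/* Assumed push semantics: decrement SP, then store at the new SP. */
static void push8(machine *m, uint8_t v) {
    m->sp--;
    m->mem[m->sp] = v;
}

static void push16(machine *m, uint16_t v) {
    push8(m, (uint8_t)(v & 0xFF));   /* byte order is an assumption */
    push8(m, (uint8_t)(v >> 8));
}

/* Builds myfunc's frame as in the listing and returns SP afterwards. */
static uint16_t build_frame(machine *m, uint16_t return_addr,
                            uint8_t x, uint8_t y) {
    push8(m, 12);            /* first parameter  */
    push8(m, 34);            /* second parameter */
    push16(m, return_addr);  /* push #return_addr */
    push8(m, x);             /* callee saves x and y */
    push8(m, y);
    m->sp -= 32;             /* sub sp, #32: space for locals */
    return m->sp;
}
```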

     This is all very preliminary, but it shows that a high level language could be implemented fairly efficiently.

  • Log #1: New start

    Kyle McInnes · 07/06/2022 at 19:33 · 0 comments

    I redesigned everything so here's another overview.

    • 8-bit computer with a 16-bit address bus, probably built on Eurocards
    • Simplicity > performance, but I want it to be a target for a high-level language - it's not "minimal"
    • Starting off with just a UART interface - would like to add other peripherals later (floppy disk, maybe graphics)
    • Clock speed in the few MHz range


    There are eight 8-bit registers, six of which can be combined into three logical 16-bit registers - B, C and SP (stack pointer). T is a temporary register containing the second operand for any ALU operations.

    Instruction set

    To keep instruction decoding simple, all instructions consist of 1 byte of opcode, followed by 0, 1 or 2 bytes of immediate data. This forces some trade-offs in the ISA. Some simple operations require multiple short instructions, but this fact is largely hidden by the assembler.

    For example, an ALU operation such as add takes its second operand either from an immediate value or from T:

    add x, #32     ; this is a single instruction
    add x, y       ; this is accepted by the assembler
                   ; and turned into:
    mov t, y       ; 69
    add x, t       ; 70

    Memory access requires a separate instruction to load the address buffer A:

    ldx sp+31      ; becomes:
    
    adr sp+31      ; 06 1f
    ldx            ; 20
    

    There are no interrupts to worry about on this system, so the values in A and T don't need to be preserved; they can be treated as scratch registers. A smart assembler/compiler (or human) might be able to keep track of them to avoid redundant writes.
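    As a sketch of that idea, an expander for `add x, y` can remember which register T was last loaded from and skip the `mov t, y` when it would be redundant. The opcodes 0x69 and 0x70 are taken from the example above (assuming that listing is in hex); the `emitter` type, the function names and the rule that any write to Y or T invalidates the cache are my own assumptions, not the author's assembler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define OP_MOV_T_Y 0x69  /* mov t, y */
#define OP_ADD_X_T 0x70  /* add x, t */

/* Tracks what T currently mirrors so redundant loads can be dropped. */
typedef enum { T_UNKNOWN, T_HOLDS_Y } t_state;

typedef struct {
    uint8_t out[64];
    size_t len;
    t_state t;
} emitter;

static void emit(emitter *e, uint8_t byte) {
    assert(e->len < sizeof e->out);
    e->out[e->len++] = byte;
}

/* Expand the pseudo-instruction "add x, y". */
static void add_x_y(emitter *e) {
    if (e->t != T_HOLDS_Y) {
        emit(e, OP_MOV_T_Y);
        e->t = T_HOLDS_Y;
    }
    emit(e, OP_ADD_X_T);
}

/* Call whenever an instruction writes Y (or T): the cached state is stale. */
static void invalidate_t(emitter *e) {
    e->t = T_UNKNOWN;
}
```

    Labels would also need to reset the state, since T's contents at a branch target depend on which path was taken to reach it.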

    Not super performant, but I quite like the simplicity of it.

    Addressing modes

    The ISA design allows for many addressing modes. The adr instruction loads the address buffer, and there is one version of adr for each addressing mode - 20 in total:

    adr ADDR            ; absolute 16-bit address
    adr b               ; addr = contents of B register
    adr c               ; addr = C
    adr sp              ; addr = SP
    adr b/c/sp + #L     ; B or C or SP, plus 8-bit literal offset
    adr b/c/sp + #LL    ; B/C/SP plus 16-bit offset
    adr b/c/sp + x/y    ; B/C/SP plus X or Y
    adr sp + b/c        ; SP plus B or C
    adr b + c           ; B plus C
    adra                ; indirect: A = mem[A]

    As above, when writing assembly the adr instruction would not normally be used directly - instead these addresses are used as operands in other instructions:

    ldb sp+4       ; load b from stack pointer + 4    (3 bytes: adr sp+4, ldb)
    add x, [c+y]   ; x = x + mem[c+y]                 (3 bytes: adr c+y, ldt, add x, t)

    ALU

    The usual stuff - add and subtract with or without carry, inc/dec, compare, and three logic operations (and, or, xor). The ALU will be implemented with two EEPROMs.
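    Whatever the ROM layout ends up being, the ALU's behaviour can be pinned down with a reference model to generate or check the EEPROM contents against. This sketch covers add/adc and sub/sbc with the three flags. The flag bit positions and the choice that carry means "a borrow occurred" on subtraction are assumptions, as the log doesn't say which convention DIP8 uses.

```c
#include <assert.h>
#include <stdint.h>

enum { F_C = 1u << 0, F_Z = 1u << 1, F_N = 1u << 2 };  /* assumed layout */

typedef struct {
    uint8_t result;
    uint8_t flags;
} alu_out;

static uint8_t zn_flags(uint8_t r) {
    uint8_t f = 0;
    if (r == 0) f |= F_Z;
    if (r & 0x80) f |= F_N;
    return f;
}

/* add / adc: carry_in is 0 for add, the C flag for adc. */
static alu_out alu_add(uint8_t a, uint8_t b, unsigned carry_in) {
    assert(carry_in <= 1);
    unsigned wide = (unsigned)a + b + carry_in;
    alu_out o = { (uint8_t)wide, 0 };
    o.flags = zn_flags(o.result) | (wide > 0xFF ? F_C : 0);
    return o;
}

/* sub / sbc: here C means "a borrow occurred" (an assumption).
   cmp would compute the same flags and discard the result. */
static alu_out alu_sub(uint8_t a, uint8_t b, unsigned borrow_in) {
    assert(borrow_in <= 1);
    int wide = (int)a - (int)b - (int)borrow_in;
    alu_out o = { (uint8_t)wide, 0 };
    o.flags = zn_flags(o.result) | (wide < 0 ? F_C : 0);
    return o;
}
```

    Chaining an add on the low bytes with an adc on the high bytes, feeding the first carry into the second, gives the 16-bit addition used by pairs like `add bl, cl` / `adc bh, ch`.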

    Assembly example

    The assembler has native support for Pascal-like strings - no null-termination here!

                mov b, #string1
                call prints
                halt
    string1     .string "Hello, world!"
    
    
    UART = $f000
    ;print a length-prefixed string (length 1-255; a zero length wraps X and prints 256 bytes)
    ; b   string to print
    ; bxyc clobbered
    
    prints      mov c, #UART
                ldx b           ; x = string length (8 bits)
    psloop      inc b
                ldy b           ; read char from string
                sty c           ; write char to uart
                dec x
                jnz psloop
                ret
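    For comparison, here is the same length-prefixed print in C, writing to a byte sink instead of the UART at $f000. The sink callback and the capture buffer are hypothetical test scaffolding. Unlike the assembly loop, which decrements X before testing it, this version checks the length up front, so an empty string prints nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical output sink standing in for the UART data register. */
typedef void (*byte_sink)(uint8_t byte, void *ctx);

/* Print a length-prefixed string: s[0] is the length (0-255) and
   s[1..len] are the characters. No terminator is needed. */
static void print_pstring(const uint8_t *s, byte_sink out, void *ctx) {
    assert(s != NULL && out != NULL);
    uint8_t len = s[0];
    for (unsigned i = 1; i <= len; i++) {   /* len == 0 prints nothing */
        out(s[i], ctx);
    }
}

/* A sink that captures output in memory, handy for testing. */
typedef struct {
    char buf[256];
    size_t len;
} capture;

static void capture_byte(uint8_t byte, void *ctx) {
    capture *c = ctx;
    assert(c->len < sizeof c->buf);
    c->buf[c->len++] = (char)byte;
}
```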
