DIP-8 TTL Computer

Project Logs

Quick update
kaimac • 01/16/2023 at 23:48 • 0 comments

Since I last posted I did the PCB, got it back and got it working. Here it is:

I'll post a longer update later. I also have a thread going at anycpu.org: http://anycpu.org/forum/viewtopic.php?f=23&t=978
Moving to a single board
kaimac • 11/08/2022 at 20:05 • 1 comment
As I said in the last post, I wasn't happy with the multi-board format, so I've merged my KiCad schematics into one project and I'm going to do a new PCB. It should look roughly like this, measuring 225 x 150 mm:
The PLCC part will be a dual UART (possibly a 68681?). Expansion cards can be plugged into the 40-pin IDC connectors. A couple of things are still missing, mainly a clock source and the address decoding logic.
This is my current plan for the memory map. It makes full use of the 64K SRAM and the 64K EEPROM, and provides another 64K of address space for expansion - leaving options open for a future graphics card. I'll use the UART's output port as the bank selection register, and it should only take a few extra gates to implement this.
```
|---------------| FFFF
|  I/O devices  |        256 bytes (32 bytes each for 8 I/O devices)
|---------------| FF00
|               |
|               |
|    48K RAM    |        Upper 48K of the 64K SRAM
|               |
|               | 
|---------------| 4000   Banked area can be: lower 16K of RAM,
|    16K bank   |         any of the four 16K ROM pages,
|---------------| 0000    or any of the four 16K I/O pages
```
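As a rough illustration of the map above, here is a minimal Python sketch of the address decoding. The bank numbering (0 to 8) and the device names are assumptions made up for this example; the actual encoding of the bank selection register isn't decided in this log.
```python
# Hypothetical bank numbering, not part of the actual design:
#   0      -> lower 16K of RAM
#   1..4   -> ROM pages 0..3
#   5..8   -> expansion I/O pages 0..3

def decode(addr, bank):
    """Return (device, offset) for a 16-bit CPU address and a bank number."""
    assert 0 <= addr <= 0xFFFF and 0 <= bank <= 8
    if addr >= 0xFF00:
        # Top 256 bytes: 8 I/O devices, 32 bytes each
        return ("io%d" % ((addr >> 5) & 7), addr & 0x1F)
    if addr >= 0x4000:
        # Upper 48K of the SRAM, mapped straight through
        return ("ram", addr)
    # Banked 16K window at 0x0000-0x3FFF
    if bank == 0:
        return ("ram", addr)
    if bank <= 4:
        return ("rom", (bank - 1) * 0x4000 + addr)
    return ("exp", (bank - 5) * 0x4000 + addr)
```
With this numbering, `decode(0x3FFF, 4)` lands on the last byte of the EEPROM and `decode(0xFF20, 0)` on the first register of I/O device 1.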
Project update
kaimac • 11/03/2022 at 18:08 • 2 comments
With a few extra parts cobbled together on another eurocard, the DIP8 CPU becomes the DIP8 computer.
From top to bottom we have:
- Reset button
- 16550 UART plus its 16 MHz crystal
- AVR microcontroller - was using this for debugging, now its sole purpose is to generate the computer's 4 MHz clock. Obviously this will be disappearing in the future!
- UM61512AK-15 - 64K RAM (32K usable: 0x8000 - 0xFEFF)
- 74LS688 comparator - selects the I/O devices (currently just the UART) when the top 256 bytes of memory are addressed (0xFF00 - 0xFFFF)
- 64K EEPROM (32K usable: 0x0000 - 0x7FFF)
- Status LEDs - negative, zero, carry
This is really just a prototype, the first incarnation of the computer - hence the 32K ROM/32K RAM split, which I intend to improve upon in the future.
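The prototype's decoding can be modelled in a few lines of Python. This is only a sketch: it assumes the '688 compares A15..A8 against a hardwired 0xFF, and that A15 alone picks between ROM and RAM everywhere else. The function name is made up for the example.
```python
def select(addr):
    """Chip select for the prototype's 32K ROM / 32K RAM split."""
    # The '688 asserts its output when its two 8-bit inputs are equal;
    # here it is assumed to compare A15..A8 against a hardwired 0xFF.
    if (addr >> 8) == 0xFF:
        return "io"
    # Assumption: A15 alone picks between ROM (low half) and RAM (high half)
    return "ram" if addr & 0x8000 else "rom"
```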

The system runs happily at 4 MHz, so my Mandelbrot test program now only takes 5 seconds. The clock can actually go a bit higher, up to at least 6 MHz, by increasing the duty cycle (only the high part of the clock needs to be 125 ns wide).
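The duty cycle arithmetic can be sketched as follows. The 125 ns figure is from the text; the helper names are made up, and the model assumes the high phase is the only constraint (it ignores any minimum width for the low phase).
```python
T_HIGH_MIN = 125e-9  # minimum width of the clock's high phase, from the text

def max_clock_hz(duty):
    """Highest clock frequency whose high phase is still T_HIGH_MIN wide."""
    return duty / T_HIGH_MIN

def duty_needed(freq_hz):
    """Duty cycle needed to keep the high phase at T_HIGH_MIN."""
    return T_HIGH_MIN * freq_hz
```
A 50% duty cycle gives 4 MHz, and running at 6 MHz needs the clock to be high for about 75% of each period.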
So what's next?
- I want a memory banking system to allow use of the full 64K of RAM.
- For developing the OS, I could do with some kind of storage, maybe an SD card interface - flashing EEPROMs gets pretty annoying, even with a ZIF socket! Or maybe I could load programs over the serial link, hmm.
- My main issue at the moment is that I've come to dislike the eurocard/backplane form factor - most of the chips are hidden away :( To its credit, it's a nice way to design something incrementally - being able to knock something up on a protoboard, plug it in, and then replace it with a PCB has worked quite well. But now that I know what the system looks like, I'd prefer it to mostly be on one "motherboard", with some expansion slots for I/O devices.
Mandelbrot
kaimac • 09/24/2022 at 19:07 • 2 comments

While I wait for more parts to arrive, here's a Mandelbrot program running in real time in the simulator. I had fun optimising this down from over 30 seconds to 10.5 - couldn't manage to get it below 10 though.
asm source: https://github.com/kylesrm/dip8-computer/blob/main/src/mandelbrot.asm
The terminal is https://github.com/Swordfish90/cool-retro-term. Hopefully one day the real DIP8 will be hooked up to a real amber terminal, I love the look of them.
First run
kaimac • 09/04/2022 at 20:37 • 0 comments

IT'S ALIVE!
The boards are not fully populated yet, and I have no memory or I/O - just an AVR microcontroller emulating a ROM. The three LEDs are the carry, zero and negative flags, and the tiny program in the ROM is just doing a chaser pattern on these.
Projects like this require that you give things fancy names. So here's the "fetch/decode" unit - instruction register, control ROMs, program counter and address register:
And here's the "execute" unit - register file and ALU.
ALU design
kaimac • 08/16/2022 at 13:16 • 0 comments
A lot of the hardware design is done now and I've sent a couple of PCBs off, so I can document the various bits. Here's the ALU:

There are two 64K x 8 EEPROMs, each generating 4 bits of the result. Both ROMs use the same image, but A15 is pulled low on one and high on the other, so they can behave slightly differently. The "A" operand comes from the register file and the "B" operand is hardwired to a temporary register called T. To do x=x+y, y is first moved to T, and then "add x, t" will add T to x.

There are 16 possible operations, set with the four AluOp bits:
- 2x pass-through operations, Q=A and Q=B. Needed to store a register into memory for example, as the only way to read a register is through the ALU. Q=B is used by some instructions that use the T register as a temporary location. For example the "stl" instruction (store literal to memory) first writes the literal value to T, and then writes it back to memory.
- 6x arithmetic: add, sub, adc, sbc, inc, dec. These are exposed in the instruction set.
- 3x logic: and, or, xor. These are also available as instructions.
- 2x conditional increment: ci increments if the carry is set, and cd decrements if the carry is clear. These are used by the microcode to (for example) add an 8-bit offset to a 16-bit address.
- 2x operations (ror1 and ror2) that together perform a rotation right by one bit - explained below!
- 1x operation (sig) that makes the next comparison signed - also explained below!
These are all defined in a Python script that generates the ROM image.
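
To give a feel for how such a generator might look, here is a minimal sketch that fills in just the add operation and then performs an 8-bit add through two lookups. It is not the actual script: the address layout, the output bit layout and the opcode number are all assumptions made for the example.
```python
OP_ADD = 0  # hypothetical AluOp number

def address(hi, op, cin, b, a):
    # Assumed layout: A15 = hi/lo select, A12..A9 = AluOp, A8 = carry in,
    # A7..A4 = B nibble, A3..A0 = A nibble
    return (hi << 15) | (op << 9) | (cin << 8) | (b << 4) | a

def build_image():
    image = bytearray(0x10000)
    for hi in (0, 1):
        for cin in (0, 1):
            for b in range(16):
                for a in range(16):
                    # For a plain add the low ROM ignores its carry input;
                    # the high ROM always adds the carry from the low ROM.
                    total = a + b + (cin if hi else 0)
                    nibble = total & 0xF
                    cout = total >> 4
                    zero = int(nibble == 0)
                    # Assumed output layout: D3..D0 result, D4 carry, D5 zero
                    image[address(hi, OP_ADD, cin, b, a)] = (
                        nibble | (cout << 4) | (zero << 5))
    return image

def alu_add(image, a, b, cin=0):
    """Run an 8-bit add through two lookups, the way the two chips would."""
    lo = image[address(0, OP_ADD, cin, b & 0xF, a & 0xF)]
    hi = image[address(1, OP_ADD, (lo >> 4) & 1, b >> 4, a >> 4)]
    result = ((hi & 0xF) << 4) | (lo & 0xF)
    carry = (hi >> 4) & 1
    zero = (lo >> 5) & (hi >> 5) & 1   # zero outputs of both ROMs, ANDed
    negative = result >> 7             # bit 7 of the output
    return result, carry, zero, negative

image = build_image()
```
The `hi` branch in `build_image` is where the A15 trick shows up: the same image holds two slightly different tables, and each chip only ever sees its own half.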

Flags

The three status flags come out of the ALU and are stored in a register. The high ROM outputs the final carry and the negative/sign flag, which is equal to bit 7 of the output. The zero flag is the zero output of both ROMs, ANDed together.

There is another, hidden, "internal" carry flag, stored in U6. This is used by instructions that need to use the ALU to do 16-bit operations, without disturbing the normal status flags. An example is the push instruction: after storing the given register at the stack pointer address, it has to decrement the SP. It first does a dec on the SP low byte, and then a cd on the high byte, which decrements it if there was an underflow on the low byte. The internal carry stores the carry across these operations, keeping the status flags unchanged. The nSetFlags pin tells the ALU which flags to use and update.
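
The push example can be sketched like this. It assumes that dec leaves the carry clear when the byte underflows (which is what cd's "decrement if the carry is clear" implies); the function names are made up for the illustration.
```python
def dec(byte):
    """Decrement a byte. Assumed convention: carry is clear on underflow."""
    result = (byte - 1) & 0xFF
    carry = 0 if byte == 0 else 1
    return result, carry

def cd(byte, carry):
    """Conditional decrement: decrement only if the carry is clear."""
    return byte if carry else (byte - 1) & 0xFF

def decrement_sp(sp_hi, sp_lo):
    # The carry between the two steps is the hidden internal carry,
    # so the visible status flags would be left untouched.
    sp_lo, internal_carry = dec(sp_lo)
    sp_hi = cd(sp_hi, internal_carry)
    return sp_hi, sp_lo
```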

Rotate

A ROM-based ALU is theoretically a very powerful thing. You can have lookup tables in there for any function you like: multiply, divide, sine, cosine, shifts by an arbitrary number of bits. Except to make that work you need a single ALU chip, where the full width of each input is available. My high ROM only has the upper four bits of each input to work with, along with the one-bit carry output from the low ROM - so no fast multiply for me.

I realised though that there's also a one-bit communication channel from the high ROM to the low ROM - through the carry flag. And this is enough to allow right shifts or rotations - it just takes an extra cycle:
```
input      76543210 C
output     C7654321 0     input is rotated right through the carry flag


            Cin  Lo   Cnib Hi   Cout     Cnib = carry from lo to hi ROM
input       -C-> 3210      7654
after ror1       C321 -0-> 0765 -4->     Each nibble is shifted right into the carry out. Carry in goes into the hi bit
after ror2  -4-> 4321 -C-> C765 -0->     Hi bit to carry out, Carry in to hi bit.
result           4321      C765  0       This is the correct result
```
The ror instruction just does ror1 and ror2 sequentially, and there you go - rotate right using a 4-bit ROM-based ALU.
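
The two steps in the diagram can be modelled nibble by nibble to check that they really add up to a rotate right through carry. This is a sketch of the data flow only; the function signatures are invented for the example.
```python
def ror1(lo, hi, cin):
    """Shift each nibble right; carry in goes into the low nibble's top bit."""
    cnib = lo & 1                     # bit 0 travels from the low to the high ROM
    new_lo = (cin << 3) | (lo >> 1)
    cout = hi & 1                     # bit 4 ends up in the carry flag
    new_hi = (cnib << 3) | (hi >> 1)
    return new_lo, new_hi, cout

def ror2(lo, hi, cin):
    """Swap each nibble's top bit with its carry input."""
    cnib = lo >> 3                    # the original carry, parked here by ror1
    new_lo = (cin << 3) | (lo & 7)    # bit 4 arrives from the carry flag
    cout = hi >> 3                    # the original bit 0 becomes the final carry
    new_hi = (cnib << 3) | (hi & 7)
    return new_lo, new_hi, cout

def ror(value, carry):
    lo, hi, c = ror1(value & 0xF, value >> 4, carry)
    lo, hi, c = ror2(lo, hi, c)
    return (hi << 4) | lo, c
```
Comparing `ror(value, carry)` against `((carry << 7) | (value >> 1), value & 1)` for all 512 input combinations is a quick way to confirm the scheme.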

Why is ror a useful instruction to add? I didn't really understand the need for rotate instructions until I started reading about how to do multiplication and division on 8-bit machines. You can think of rotates as the "with carry" version of logical shifts - you can use them to chain shifts together to work on values wider than 8 bits. I can already shift and rotate left,...
Log #6: Accidentally wrote a C backend
kaimac • 08/08/2022 at 21:21 • 0 comments
Getting C programs running on this machine was never one of my goals, but I was looking idly at the documentation for LLVM and GCC, wondering how hard it would be to write a code generator. It seemed like far too big a task and I was happy to let it go, and then I stumbled across the vbcc compiler. It has many different backends, including some small 8-bit processors and a clean "generic" 32-bit RISC. Each backend is basically a single 1000-2000 line C file. Those examples, coupled with this useful document, were enough to get me hacking together a DIP8 backend. It is quite a complicated and frustrating process, but the nice thing is you can add functionality bit by bit, and before you know it you can compile some fairly interesting C programs.
I haven't got function calls, structs or arrays working properly yet, but I can compile the C version of the Byte sieve test that I previously wrote in assembly:
```
#define size 8191

int main(void) {

    unsigned int count;
    unsigned int prime;
    unsigned int k;
    unsigned char *flags = (unsigned char *)0x1000;

    count = 0;
    for (unsigned int i=0; i<size; i++) {
        flags[i] = 1;
    }
    for (unsigned int i=0; i<size; i++) {
        if (flags[i]) {
            prime = i + i + 3;
            k = i + prime;
            while (k < size) {
                flags[k] = 0;
                k += prime;
            }
            count = count + 1;
        }
    }

    return count;

}
```
This works, and to my amazement it's actually quite fast - only 20% slower (and 50% bigger in size) than my hand-crafted assembly.
In the process of doing this, I added some more instructions that C likes to use a lot and were otherwise a bit difficult - signed comparisons, rotate right, some more 16-bit arithmetic and a way to do arithmetic on variables in memory, without loading into a register. I'll go through those in another log.
I would like to at some point write a tutorial on how to write a vbcc backend for a homebrew CPU, as the vbcc code isn't very easy to read and the docs are a bit lacking in places. Actually the most useful backend to look at is for the "Z-machine", a VM used for Infocom text adventure games, as it's well commented - link here.
Log #5: Emulation and testing performance
kaimac • 07/24/2022 at 21:07 • 0 comments
I had the idea to rewrite my emulator so that it uses the same decoder ROM images that the real machine will. This was quite a good move - the emulator now stays in sync with any changes I make to the instruction set. It also means the emulator is now cycle accurate - it can tell me exactly how long a program will take to run. With that, I thought I'd implement the "Byte sieve" and see how my architecture performs.
```
Statistics:
    328882 instructions
    1184625 cycles
    0.5923125 sec at 2.0 MHz
    3.60 cycles/inst
```
I was surprised by this - one iteration of the sieve (of size 8191, the standard size) takes 0.59 seconds. How does that compare to other processors? I found this article from 1983 that lists reader-submitted times for various systems and languages. Those times are for 10 iterations so I need to multiply my number by 10:
```
1 MHz 6502  asm   13.9
? MHz Z80   asm    6.8
5 MHz 8088  asm    4.0
8 MHz 8086  asm    1.9

2 MHz DIP8  asm    5.9
```
Not bad! It performs about the same as a 6502 at the same clock rate, which I wasn't expecting. Maybe the 6502 code wasn't a very efficient implementation. Anyway, it's just one benchmark. I have some nice addressing modes and 16-bit registers which will have helped here, but modifying variables in memory is quite clunky at the moment. Luckily I have a plan to fix that.
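
The comparison can be reproduced with a few lines of Python. The cycle count and the 6502 time come from the figures above; the rescaling helper is a naive assumption that run time is inversely proportional to clock rate.
```python
CYCLES = 1184625          # one sieve iteration, from the emulator statistics
CLOCK_MHZ = 2.0

def seconds(cycles, clock_mhz, iterations=1):
    """Run time in seconds for a cycle count at a given clock rate."""
    return iterations * cycles / (clock_mhz * 1e6)

def scaled_to(seconds_at, from_mhz, to_mhz):
    """Naively rescale a run time to another clock rate (assumes time is
    inversely proportional to clock speed)."""
    return seconds_at * from_mhz / to_mhz

dip8_10_iter = seconds(CYCLES, CLOCK_MHZ, iterations=10)   # about 5.9 s
m6502_at_2mhz = scaled_to(13.9, 1.0, 2.0)                  # about 6.95 s
```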

The downside to this new emulator is that it's a bit slow - it can't run in real time. Which I think is a bit funny - my Python code running at 3 GHz can't emulate a system running at a puny 2 MHz!

Byte sieve assembly code is here for anyone interested: https://github.com/kylesrm/dip8-computer/blob/main/src/sieve.asm
Log #4: First board
kaimac • 07/20/2022 at 17:54 • 0 comments

After some breadboarding and a late night soldering session I have the first bit of working hardware. This board contains the program counter, address buffer, and instruction decoding circuitry. I will eventually replace this board with a proper PCB - this construction method is fine until you get a loose connection, and then it's a huge pain. It's fine for now though and it allows me to work on the other parts of the system.

For the program counter and address buffer, I'm using 4x 74LS469 - a synchronously presettable 8-bit up/down counter with tristate outputs. Pretty much the ideal counter IC - I'm not aware of any other counter that has all the features I want in one IC. None of the 4-bit counters have tristate outputs, so you need double the number of chips plus a couple of octal buffers. The '590 is 8 bits wide with an output enable, but it's not presettable. The '593 is presettable, but only through the single input/output port.

The '469 does have a couple of issues though. Firstly it's obsolete and quite hard to find - there's a few on eBay. Secondly, for some reason I don't understand, they each consume about 100 mA (the datasheet says 120 mA typ), and get pretty hot! So maybe I'll replace them with something else when I do the PCB.
Log #3: Timing
kaimac • 07/16/2022 at 12:54 • 1 comment
Thinking about timing and whether I'm happy with this aspect of the design.

Each instruction consists of some number of operations (phases? cycles?), where the 24 control lines are set appropriately. That information is held in a control store of three 64Kx8 EEPROMs, which are addressed by the 8-bit instruction register, a 4-bit sequence counter, and the three status flags (for conditional jumps).

Here's an example of how an instruction is defined:
```
add x, #L
    pcinc memrd twr                         ;write literal to T
    regoe alu opadd regwr selx setflags     ;X = ALU(X, T, op=add, setflags=yes)
    pcinc memrd irwr                        ;write next opcode into IR (fetch - common to all instructions)
```
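As an illustration of how a definition like this could end up in the three control ROMs, here is a minimal Python sketch. The bit positions of the control lines, the address layout and the opcode value are all hypothetical, and it treats every name (including opadd) as a single active-high bit, which the real hardware may well not do.
```python
# Hypothetical bit positions for a few of the 24 control lines
CONTROL_BITS = {name: 1 << i for i, name in enumerate(
    ["pcinc", "memrd", "twr", "regoe", "alu", "opadd",
     "regwr", "selx", "setflags", "irwr"])}

def control_word(step):
    """Turn a line such as 'pcinc memrd twr' into a 24-bit control word."""
    word = 0
    for name in step.split():
        word |= CONTROL_BITS[name]
    return word

def rom_address(flags, opcode, seq):
    # Assumed layout: 3 flag bits, then the 8-bit IR, then the 4-bit sequence counter
    return (flags << 12) | (opcode << 4) | seq

def assemble(opcode, steps, roms):
    """Write one instruction's steps into three byte-wide ROM images."""
    for seq, step in enumerate(steps):
        word = control_word(step)
        for flags in range(8):          # unconditional: same for every flag state
            addr = rom_address(flags, opcode, seq)
            for i in range(3):          # one byte of the word per ROM
                roms[i][addr] = (word >> (8 * i)) & 0xFF

roms = [bytearray(0x10000) for _ in range(3)]
ADD_X_LIT = 0x10                        # hypothetical opcode value
assemble(ADD_X_LIT, [
    "pcinc memrd twr",
    "regoe alu opadd regwr selx setflags",
    "pcinc memrd irwr",
], roms)
```
A conditional jump would differ only in the inner loop: instead of writing the same word for all eight flag states, it would write different words depending on `flags`.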
So far I've been thinking of the timing like this:
- On the clock's rising edge, clock the program counter, instruction register and sequence counter.
- During the high half of the clock, instruction decoding and execution (i.e. ALU computation) is underway.
- On the clock's falling edge (rising edge of ¬CLK), write back any values to registers, or to memory.
While this has its problems, it seems to be a popular method with simple homebrew CPUs, for example the very well documented CSCvon8: An 8-bit TTL CPU and the SPAM-1 - 8 Bit CPU.

Interestingly, many people describe this as "fetch/decode" on the high part of the clock, and "execute" on the low part. Is that correct? Maybe you can argue either way, but to me it makes sense to consider the time it takes for the ALU to produce a valid result as execution time, just as decoding is what is happening while the control store ROMs are producing a valid output. Writing back values to RAM or to registers takes little time in comparison (in the case of registers, no time at all).

And that's one of this method's weaknesses - pretty much everything happens during only one half of the clock cycle, limiting performance. In my case, it looks like 2 MHz will be the maximum clock rate.

A design where things only happen on the clock's rising edge would be more pleasing to me - and more performant. But for the sake of simplicity I'll probably keep it the way it is. Things can always be improved in the next design!