YATAC78 - The WWW TTL Computer

Retro computer built from 1978-era TTL logic chips. Internet-capable, with a built-in web browser and server.

Can you browse the Web using pre-1980 TTL logic and memory speeds? The goal of this project is to demonstrate how. Internet connectivity is via an era-appropriate RS232 interface. The machine is upward compatible by a decade to support currently available keyboard and video interfaces (PS/2 and VGA). The video includes a native text mode capable of displaying 80-columns and two bitmapped color graphics modes for retro gaming.

YATAC78 - Yet Another TTL Archaic Computer (1978)

  • Dual Processor CPU/GPU (Harvard Architecture).
  • 32 MHz dot clock, 16 MHz machine clock, 8 MHz (4 MIPS) per processor.
  • 256k ROM: 128k program, 60k ALU, 44k relocatable code, 24k fonts.
  • 512k RAM: 8 banks of 64k, including 44k display.
  • 55 ALU functions including multiply/divide
  • Bitmapped Graphics 2 resolutions (75Hz refresh): 8 color hi-res mode at 320x240 (4 dithering patterns) or 256 color low-res mode at 160x120 (double buffered).
  • Text Mode 640x480 native resolution (75Hz refresh). 4 fonts, 80x30 using 8x16 glyphs or 80x60 using 8x8 glyphs. 256 line buffer for 10-page smooth scroll.
  • 8-bit PCM Audio with 20 Hz to 18 kHz bandwidth.
  • PS/2 Keyboard interface built in.
  • RS232 Serial Port for host/client and network connectivity (up to 38400 baud).
  • Parallel Port for expansion (up to 8 additional registers out, 1 register in).
  • Chip Count 37 TTL, 3 analog, plus a single PAL, ROM, and RAM.
  • Target PCB size single 8" x 5" (203 x 127mm) 4-layer board.

The system bus is clocked at 16MHz and a typical CPU instruction spans 4 clock cycles as follows:

  1. Load Instruction from ROM.
  2. Read data from source register or RAM.
  3. Perform an ALU function using the ROM as lookup table.
  4. Write data to accumulator, a register, and optionally to RAM.

The alternating use of both ROM and RAM allows a second processor to be added to the system. Both processors use dedicated pipelines to cache data between the alternate program and data address spaces. One processor handles serial communications and general computational tasks (CPU) while the other is dedicated to the display and audio (GPU).

The following sequence of diagrams demonstrates the multiplexing of the CPU (shown in blue) and GPU (shown in red). In this example the GPU is operating in text mode and the CPU is executing the sequence described in the numbered list above.

In the first cycle the GPU reads the ASCII code point of a character from the RAM and stores the result in the GPU Cache (gc). The CPU addresses the ROM using the Program Counter (PC) and Page Register (Pg) to load the Instruction Register (I).

In the next cycle the context switches over and the CPU X and Y registers are used to address the RAM and load the CPU Cache (cc). Meanwhile the gc, along with the Scan Counter (SC), is used to address the ROM and load a character bitmap line into the Glyph Register (G).

The GPU returns to the RAM where the H counter was moved to the next byte and loads the gc with text color values. The ROM is now configured as an ALU with a function specified in the instruction. The cc is combined with one half of the HL register and the result is stored in the Accumulator (A).
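The interleaving described above can be summarised as a small table. This is only a sketch: cycles 1 to 3 follow the text, while the GPU entry for cycle 4 is an assumption inferred from the alternating pattern, and the `SCHEDULE` and `conflicts` names are made up for illustration.

```python
# Which shared memory each processor touches in each of the four machine
# cycles of one instruction. Cycles 1-3 follow the description above; the
# GPU entry for cycle 4 is an assumption based on the alternating pattern.
SCHEDULE = [
    # (cycle, CPU resource, CPU action, GPU resource, GPU action)
    (1, "ROM", "load instruction register I", "RAM", "read character code into gc"),
    (2, "RAM", "read operand into cc", "ROM", "load glyph line into G"),
    (3, "ROM", "ALU lookup, result to A", "RAM", "read text colors into gc"),
    (4, "RAM", "optional write-back", "ROM", "assumed: no RAM access"),
]


def conflicts(schedule):
    """Return the cycles in which both processors want the same memory."""
    return [cycle for cycle, cpu_res, _, gpu_res, _ in schedule if cpu_res == gpu_res]


for cycle, cpu_res, cpu_act, gpu_res, gpu_act in SCHEDULE:
    print(f"cycle {cycle}: CPU -> {cpu_res} ({cpu_act}); GPU -> {gpu_res} ({gpu_act})")
```

If the assumption for cycle 4 holds, `conflicts(SCHEDULE)` is empty, which is the property that lets two processors share one ROM and one RAM.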



Memory map of RAM and ROM address layout.

image/png - 56.12 kB - 07/08/2019 at 04:02



Simulation of CPU State Machine from WinCUPL.

Adobe Portable Document Format - 146.81 kB - 07/04/2019 at 19:22



Schematic of complete system (CPU, ECU, GPU, I/O)

application/pdf - 940.18 kB - 07/04/2019 at 14:29



Bitmap image of Font Rom

Portable Network Graphics (PNG) - 10.29 kB - 06/17/2019 at 20:52



(Obsolete) Schematic of initial design

application/pdf - 1.02 MB - 05/31/2019 at 23:04


  • 14 × 74F574 Octal D-type Flip-flop with Tri-state Outputs
  • 4 × 74F163 Synchronous 4-Bit Binary Counter
  • 1 × 74F257 Quad 2-line to 1-line Multiplexers with Tri-State Outputs
  • 2 × 74LS139 Dual 2-line to 4-line Decoders
  • 3 × 74F32 Quad 2-Input OR Gates


  • 16-bit Instructions

    Alastair Hewitt, 06/24/2019 at 05:16, 0 comments

    The redesign continued to the ECU section. The changes are fairly significant, so much so that the breadboard build needs to start over. It was almost back to the drawing board, but the CPU section remains fairly intact. The result is another reduction in chip count to bring the TTL count down to 37 chips.

    The original design packed the instructions into just 8 bits. The instructions need to define the instruction type, ALU function, data source, and destination register. Packing so much into so little space put a lot of limitations on the available instructions.

    The instructions do provide everything needed to write code, but actual programs were quickly exposing the instruction set limitations. Quite often additional instructions are needed to move the result from the ALU to the desired register. There is no specific move instruction, so this requires the ALU identity function to target the final register. This results in a lot of wasted cycles.

    The solution is to expand the size of the instructions. The ROM is still 8-bits wide, so this requires an additional cycle to load another byte to get to 16 bits. However, one bit of the first instruction byte can be used to tell the state machine if a second instruction byte should be loaded. This allows variable-length instructions, with a limited set of 7-bit instructions and a full set of 16-bit instructions.

    The new design is using instructions of this format:

    Destination is one of the 8 possible registers (Pg, PC, SC, V, HL, E, X, Y). Source is one of 4 data sources (A, X, E, RAM). There are 8 possible opcodes, with the following 6 defined:

    • LD - load operand (source not used)
    • LDC - load conditional (source defines condition)
    • MV - move source to destination
    • FNH - ALU unary function defined by H - destination = FNH(source)
    • FN4 - nibble-wide ALU binary function
    • FN8 - byte-wide ALU binary function

    The first 4 opcodes define 7-bit instructions. The most-significant bit of these is high and this stops the state machine from loading the second byte. The most-significant bit is also the output enable (active low) of the second instruction register, so the second register is tri-stated and the value pulled high to 0xFF. The ALU binary functions require all 16-bits, so the most-significant bit of the last two opcodes is low. Along with the 4-bit ALU function there are some additional bits as follows:

    The /WE bit (active low) enables write enable on the RAM cycle and stores the result in memory. The ZP bit (active high) specifies the zero page when the RAM is addressed. In a similar way, the ZB bit (active high) specifies the zero bank (the memory bank that contains the display). The EXT bit (active low) specifies a set of extended registers and will switch the destination from the internal 8 registers to 8 possible external registers.

    The default values of the second instruction byte are therefore: internal registers, zero bank and page, memory read only, FNH ALU function. This means the RAM source comes from the zero page in the zero bank for the 7-bit instructions.
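The variable-length fetch can be sketched in a few lines of Python. The MSB test and the 0xFF default come from the description above; the bit positions inside the second byte are hypothetical, since the format diagram is not reproduced here, and the function names are illustrative only.

```python
def fetch(rom, pc):
    """Fetch one instruction starting at rom[pc].

    Returns (first_byte, second_byte, bytes_consumed).
    """
    first = rom[pc]
    if first & 0x80:
        # MSB high: 7-bit instruction. The second instruction register is
        # tri-stated and its outputs are pulled high, so it reads as 0xFF.
        return first, 0xFF, 1
    # MSB low: the state machine spends an extra cycle loading byte two.
    return first, rom[pc + 1], 2


def decode_second(second):
    """Split the second byte into fields (bit positions are hypothetical)."""
    return {
        "alu_fn": second & 0x0F,      # 4-bit ALU function
        "we_n": (second >> 4) & 1,    # /WE, active low: 1 means read only
        "zp": (second >> 5) & 1,      # ZP, active high: 1 means zero page
        "zb": (second >> 6) & 1,      # ZB, active high: 1 means zero bank
        "ext_n": (second >> 7) & 1,   # EXT, active low: 1 means internal registers
    }
```

Whatever the real bit order is, an all-ones second byte decodes to the defaults listed above: no write, zero page, zero bank, internal registers, and ALU function 0xF.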

    This instruction format now defines almost everything needed in a single instruction. This improves performance, even though some instructions require an additional cycle to load the second instruction byte. The new format is using about 60% of the cycles to run the same code. The simpler encoding also reduces the amount of logic in the ECU, so a net reduction in chips, even though a second instruction register was added.

    Next up is a new schematic, a complete redesign of the PAL, and a rebuild of the entire breadboard... so basically, back to square one :(

  • Yet Another Redesign

    Alastair Hewitt, 06/20/2019 at 02:34, 0 comments

    In keeping with a lot of the previous logs, the recent GPU writeup is now obsolete. As predicted, better insight gained through the software design is driving hardware changes. In this case the GPU can be simplified by moving the slower counters to software.

    The original GPU design takes care of almost all video timing, so the CPU interpreter loop can synchronize with a standard baud rate. If the baud rate can be aligned with the display then the same timing overhead can be used for both serial and video. This can be achieved by using a horizontal scan frequency of 38.4 kHz and a baud rate of 38400 (or dividing down to 19200, 9600, etc).

    Running the VESA GTF against this for a standard vertical frequency gives the following mode line:

    # 640x490 @ 75.00 Hz (GTF) hsync: 38.40 kHz; pclk: 31.95 MHz
    Modeline "640x490_75.00"  31.95  640 672 736 832  490 491 494 512  -HSync +Vsync

    The exact dot clock is 31.9488 MHz and this is available in a 31.95 MHz crystal. The mode defines 490 lines, but only 480 would be displayed with 5 additional vertical blanking lines at the top and bottom of the screen.
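The numbers in the mode line can be checked with a little arithmetic. The totals (832 dots per line, 512 lines per frame) are read from the mode line above; `lines_per_bit` is a made-up helper showing how the baud rates divide into the scan rate.

```python
H_TOTAL = 832      # dots per line, from the mode line
V_TOTAL = 512      # lines per frame, from the mode line
H_FREQ = 38_400    # horizontal scan frequency in Hz

dot_clock_hz = H_TOTAL * H_FREQ    # exact dot clock in Hz
v_refresh_hz = H_FREQ / V_TOTAL    # vertical refresh in Hz


def lines_per_bit(baud):
    """Horizontal scan lines that elapse during one serial bit."""
    return H_FREQ // baud


print(dot_clock_hz, v_refresh_hz, lines_per_bit(9600))
```

At 38400 baud one serial bit lasts exactly one scan line, so the same per-line timing loop can service both the display and the serial port.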

    With this change both the scan and vertical counters can be eliminated and replaced with registers that are updated every horizontal scan. An additional chip of gates is also eliminated to reduce the design by a total of 4 chips.

  • ROM

    Alastair Hewitt, 06/17/2019 at 03:37, 0 comments

      Focus shifted to building the ROM this weekend. Here's a quick overview of the contents:

      1. Native Program (128k) - code executed natively by machine.
      2. ALU (56k) - lookup tables containing ALU results.
      3. Relocatable Code (40k) - code executed by interpreter.
      4. Fonts (32k) - binary fonts used in text mode and dithering patterns used in hi-res graphics mode.

      The machine uses an 8-bit program counter (PC) and 8-bit page register (Pg). An additional bit of state (bank) is held by the CPU state machine to define two banks of 64k, providing the 128k of address space.
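As a rough illustration, the 17-bit native program address could be formed as follows. Placing the bank bit above Pg is an assumption, and `rom_address` is a hypothetical helper rather than anything from the design files.

```python
def rom_address(bank, pg, pc):
    """Combine bank (1 bit), Pg (8 bits) and PC (8 bits) into a 17-bit address.

    The ordering bank:Pg:PC is an assumption made for this sketch.
    """
    if bank not in (0, 1) or not 0 <= pg <= 0xFF or not 0 <= pc <= 0xFF:
        raise ValueError("bank is 1 bit, Pg and PC are 8 bits each")
    return (bank << 16) | (pg << 8) | pc


print(hex(rom_address(1, 0xFF, 0xFF)))  # last byte of the 128k program area
```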

      The upper 128k contains various lookup tables. These can be split into two general sections: 96k of ALU functions and 32k of fonts. The ALU contains 56k of math and logic functions, including four 8-bit wide binary functions (ADD, SUB, AND, OR), three 4-bit wide binary functions (MUL, DIV, MOD), and 48 unary functions (discussed in earlier logs).

      The remaining 40k of ALU functions are reserved for relocatable code. These act like the other functions but return a byte of code as the result of the function. This may sound odd, but it is the most efficient method of reading data from the ROM using the Harvard Architecture. The alternative is to write a program that would load a byte of code as an operand, write it to the RAM, then increment a pointer to the next memory location. It would take at least 3 bytes of native code to write each operand to memory, requiring almost all 128k of program memory to load 40k of interpreted code.

      The final part of the ROM is the fonts. There are two sets of four fonts. The first set uses 8x8 glyphs and the second uses 8x16 glyphs. The initial plan was to have bold and italic fonts, but this really isn't possible at 8 pixels wide! There are some other options though and these can be broken down as follows:

      1. Thick Serif
      2. Thin Serif
      3. Thick Sans-serif
      4. Thin Sans-serif

      Two sets of these fonts were selected from The Ultimate Oldschool PC Font Pack. It was quite challenging to process the old bitmap files, but this excellent resource was able to pull out the data and even render it as simple text files. From there a script packs the fonts into the 32k font area of the ROM. A test script was used to verify the ROM and generate the following PNG:

      All the fonts can be seen one after another. The first 4 fonts only take up 8 lines and the other 8 lines are used for dithering patterns in the hi-res graphics mode (discussed in the last log).

  • GPU - part 2

    Alastair Hewitt, 06/12/2019 at 15:49, 0 comments

    The GPU uses three 4-bit counters (scan counter and V register) to control the vertical resolution of the display. This combined 12 bits can render up to 4096 vertical lines. Only part of this range can be displayed at standard video refresh rates using the 30.875 MHz dot clock though. The display is limited to 512 lines at 60 Hz or 400 lines at 75 Hz. Overclocking allows more lines with the potential for 1024 lines using a 64 MHz dot clock.
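The line limits quoted above can be sanity-checked with a short calculation. It assumes the 244-cycle horizontal count described in the GPU part 1 log and a horizontal counter clock of one quarter of the dot clock; the result is the total number of lines per frame, including vertical blanking.

```python
CYCLES_PER_LINE = 244   # H counter sequence length (see GPU part 1)
DOTS_PER_CYCLE = 4      # assumes the H counter clock is dot clock / 4


def total_lines(dot_clock_hz, refresh_hz):
    """Total scan lines per frame (visible plus blanking) at a given refresh."""
    line_rate_hz = dot_clock_hz / DOTS_PER_CYCLE / CYCLES_PER_LINE
    return line_rate_hz / refresh_hz


for clock, refresh in [(30.875e6, 60), (30.875e6, 75), (64e6, 60)]:
    print(f"{clock / 1e6:.3f} MHz at {refresh} Hz: {total_lines(clock, refresh):.0f} lines")
```

Under these assumptions the totals come out a little above 512, 400, and 1024 lines respectively, which leaves a small margin for vertical blanking in each case.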

    As well as the counters, the GPU uses two additional registers: the Color Register (C) and the Glyph Register (G). The C register stores the current color(s) being displayed and can be used in two ways: it can represent a single 8-bit color as a 3:3:2 RGB value, or two 3-bit RGB colors and a 2-bit font value. Note: The pipeline timing dictates this 2-bit value will select the font of the next character.

    The G register acts as a pipeline to hold the next glyph pattern while the current one is being rendered. The value of the G register is loaded into a shift register (SR) that is clocked at the 30.875 MHz dot clock. The output of this shift register is used to select one of the two 3-bit colors stored in the C register. The C register and this multiplexer can be thought of as an extremely simple RAMDAC.
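A minimal sketch of the two interpretations of the C register and of the shift-register color select follows. The bit order within C (RRRGGGBB, and font in the top two bits for text mode) is an assumption, as are all the function names.

```python
def rgb332(c):
    """Split an 8-bit color into (r, g, b); RRRGGGBB order is assumed."""
    return (c >> 5) & 0x7, (c >> 2) & 0x7, c & 0x3


def text_fields(c):
    """Split C into (font, foreground, background); field order is assumed."""
    return (c >> 6) & 0x3, (c >> 3) & 0x7, c & 0x7


def render_glyph_row(g, c):
    """Shift out the 8 bits of G, MSB first, picking one 3-bit color per dot."""
    _, fg, bg = text_fields(c)
    return [fg if (g >> bit) & 1 else bg for bit in range(7, -1, -1)]


print(render_glyph_row(0b11110000, 0b00_111_001))
```

Each set bit in the glyph row selects the foreground color and each clear bit selects the background, which is all the "RAMDAC" has to do.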

    In text mode the GPU uses two process cycles per character: The first to load the G register and the second to load the C register. Both must be synchronized so the shift register load happens at the same time as the C register load. The exact timing is shown in the diagram below.

    The GPU repeats the same line from the video memory either 8 times in hi-res text mode, or 16 times in lo-res text mode. The scan counter counts to either 8 or 16 to select the specific line from the font ROM to render for the character's glyph.

    The hi-res graphics mode uses all the same logic as the text mode, but operates on a single process cycle. The C and G/SR load signals now happen at the same time, so the same byte is loaded by the C register and passed through the font ROM to load the G register.

    The GPU reads the same line only twice in the hi-res graphics mode and the shift register will output only 4 bits before the next value is loaded. The rendered character is therefore 4x2 pixels rather than the hi-res 8x8 or lo-res 8x16 text mode characters. This mode uses a special font where all the characters have the same glyph. There are still 4 fonts available though, since only 6 bits are used for the two colors.

    The first font (dither 0) consists of a 2x2 block of foreground pixels followed by a 2x2 block of background pixels. The other 3 fonts provide dithering patterns that blend the two colors to provide a wider pseudo palette of up to 32 colors. The patterns also alternate on odd/even lines as shown below repeated 4 times.

    The example below shows how the dithering (on the left) is used to represent an intermediate color (shown on the right). Note: The hi-res graphics mode defines a resolution of 384x256 pixels, but the dithering is rendered at the native 768x512 resolution.

    Finally, in lo-res graphics mode all the glyph logic is bypassed and the C register is treated as a single 8-bit color value via its own video DAC. The GPU reads the same line four times to define a resolution of 192x128 with a simple one-byte-one-pixel format. Only half the video memory is used for one screen in this mode, so it provides room for double buffering. This is essential to prevent flicker when updating sprites on the screen, making it the preferred video mode for retro games.
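The memory footprint of the two graphics modes follows from the cell sizes given in this log. The sketch below assumes one byte per 4x2 block of native pixels in hi-res mode and one byte per pixel in lo-res mode; the variable names are illustrative.

```python
NATIVE_W, NATIVE_H = 768, 512

# Hi-res graphics: one byte covers a 4x2 block of native pixels.
hires_bytes = (NATIVE_W // 4) * (NATIVE_H // 2)   # 192 bytes per row, 256 rows

# Lo-res graphics: one byte per pixel at 192x128.
lores_bytes = 192 * 128

print(hires_bytes // 1024, "k hi-res,", lores_bytes // 1024, "k lo-res")
```

A lo-res frame is exactly half the size of a hi-res frame, which is where the room for a second buffer comes from.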

  • GPU - part 1

    Alastair Hewitt, 06/09/2019 at 01:21, 0 comments

    The GPU consists of five 74F163 4-bit synchronous counters. The system is fully synchronous, so all counters receive the same 7.72 MHz clock regardless of the rate at which they ultimately count. The count is controlled by cascading the RCO/ENT (TC/CET) signals between the counters.

    The first two counters form the Horizontal (H) register and connect to the lower half of the RAM address bus. The RCO signal from the second counter of the H register is used to reload the lower counter with a value of 12. This creates a count cycle that rolls over to 12 rather than 0 for a total count sequence of 244.

    The H register is used to generate the horizontal timing signals and these are based on the VESA display timing formula (DTF). Generally this specifies a horizontal blanking (H-blank) period of 20% (48.8 cycles) and a horizontal sync (H-sync) period of 8% (19.52 cycles).

    The H-sync is active when the upper 3 bits of the H register are low, that is, when the count is less than 32. Since the count starts at 12 the H-sync period lasts for 20 cycles, or 8.2% of the horizontal scan. The H-blank is active when the upper 2 bits of the H register are low and the next most significant 3 bits are not all high. This is the case when the count is less than 56 (64 - 8). Again, since the count starts at 12, this actually translates to 44 cycles. The RCO signal is also added to the H-blank period, so the total blanking period is 45 cycles. This is slightly short to allow two cycles of overscan at the start and end of the blanking period to reach the required 49 cycles, or 20.08% of the horizontal scan.

    The H-blank signal is used to inhibit the video DAC switch and effectively turn off the video signal for the 45 cycles of the blanking period. The video signal will be output for the other 199 cycles, where at least 4 of these cycles are still in the DTF blanking period. The nominal width of the display is 192, so the first 4 and last 3 cycles are considered to be overscan and would typically be blank pixels. However, the option exists to shift the screen slightly to the left or right to accommodate different monitors.
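The cycle counts above can be reproduced by stepping an 8-bit count from 12 to 255 and applying the decode rules as described. This is a behavioural sketch only; `h_signals` is a made-up name and the real logic is built from gates, not Python.

```python
def h_signals(count):
    """Return (hsync, hblank) for one value of the 8-bit H register."""
    hsync = (count >> 5) == 0                        # upper 3 bits low
    decode = (count >> 6) == 0 and ((count >> 3) & 0x7) != 0x7
    rco = count == 0xFF                              # ripple carry on the last count
    return hsync, decode or rco


counts = range(12, 256)                              # counter reloads to 12
total_cycles = len(counts)
hsync_cycles = sum(h_signals(c)[0] for c in counts)
hblank_cycles = sum(h_signals(c)[1] for c in counts)
visible_cycles = total_cycles - hblank_cycles

print(total_cycles, hsync_cycles, hblank_cycles, visible_cycles)
```

The 199 unblanked cycles are the nominal 192 plus the 7 cycles of overscan mentioned above.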

    The RCO signal of the H register is used to enable the clock of the Scan Counter (SC). This is a single 4-bit counter that can count up to 16 horizontal lines. The outputs of the SC are combined, typically with AND gates, to create four additional RCO signals. These signals go high when the count reaches 1, 3, 7, or 15. The first of these signals is just the lower bit of scan counter and the last signal is the actual RCO of the counter.

    These RCO signals allow the next counter to clock on every 2, 4, 8, or 16 horizontal lines. The specific signal depends on the video mode selected and will be described in detail in the next log. The selected signal is used to enable the clock of the remaining two counters that form the Vertical (V) register. This register connects to the upper half of the RAM address bus allowing the combined H and V registers access to 61k of RAM (remember, the H register starts at 12).

    The vertical count does not reset and will continue until it wraps around and returns to zero. The vertical register is accessible to the CPU though as one of its 8 register targets. It is the responsibility of the CPU to reload the V register at the vertical scan rate of the video display. The CPU is also responsible for generating the vertical blank and sync signals by setting the appropriate bits on the extended (E) register.
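The four derived enable signals can be modelled as masks on the 4-bit scan count. The sketch assumes each signal is simply the AND of the low 1, 2, 3, or 4 bits, which matches the description of the first and last signals; the helper names are hypothetical.

```python
def enables(sc):
    """Enable signals derived from the 4-bit scan counter value sc."""
    return {
        2: (sc & 0x1) == 0x1,    # lowest bit alone
        4: (sc & 0x3) == 0x3,    # AND of the low two bits
        8: (sc & 0x7) == 0x7,    # AND of the low three bits
        16: (sc & 0xF) == 0xF,   # the counter's own RCO
    }


def v_steps(lines, divide):
    """How many times the V register advances over a number of scan lines."""
    return sum(enables(line & 0xF)[divide] for line in range(lines))


print([v_steps(16, d) for d in (2, 4, 8, 16)])
```

Over 16 scan lines the V register advances 8, 4, 2, or 1 times, so each row of video memory is repeated 2, 4, 8, or 16 times.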

  • Single Board?

    Alastair Hewitt, 06/03/2019 at 04:50, 0 comments

    A few more minor updates were made to the circuit. The original design used a diode ROM to select the RAM or register that is output to the data bus of the data space. It started out fairly complex, but there were ways to rationalize it by optimizing the instruction encoding. This ROM eventually boiled down to just two 4-input AND gates and it was worth the extra chip to just replace the ROM with a 74F21.

    I don't have a 74F21 on hand, so the prototype build got blocked. This provided some time to start on the PCB layout. The original plan was to use two boards with a riser and the schematic showed two pairs of 2x20 headers to join the boards. The board size requirements are starting to drop with the elimination of the diode ROM and a switch to using resistor networks for the video DACs. Switching to a single board would eliminate the risers and save even more room.

    The current enclosure will fit a board up to 5" x 8". It now looks possible to squeeze everything on to a single board of this size. An example layout is shown below:

    Not a lot of thought was put into the layout of the chips, other than to see if they could be placed around the mounting holes and leave a central gap for the decoupling capacitors. The autorouter gave up with this and a lot more thought will be needed to see if it is even possible to configure the board to route at this density. I'll continue along this path though and see if I can make it work.

  • Virtual CPU 2

    Alastair Hewitt, 05/28/2019 at 15:52, 4 comments

      An initial sketch of the interpreter code has been completed. This was the first time a real program was created using the native machine code. This process exposed some limitations and an optimization in the current design. The schematic has been updated to reflect the following changes:

      1. The accumulator is always loaded after an ALU function.
      2. The program counter replaces the accumulator in the set of 8 register targets.
      3. The dual 4-bit buffer is eliminated from the ECU (-1 chip).
      4. Additional logic added to support banked RAM (+1 chip).

      The hope was to produce something that runs at close to the native speed of the emulated CPU. This will not be possible though. Interpreters are not very efficient and the final implementation will probably operate at around 1/4 of the emulated CPU speed. However, the native machine code can be used to add efficient system calls for accessing and controlling the peripherals (audio/video/serial).

      The interpreter uses the zero page to store the virtual CPU registers. These include things like a virtual program counter and stack pointer. Many of these virtual registers are 16 bits and need to be loaded, incremented or decremented, then saved back to the zero page. Additional conditional checking is required to determine if the most significant byte needs to change when the least significant is updated.
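The carry check for a 16-bit virtual register can be sketched as follows. The zero-page offset `VPC` and the low-byte-first layout are hypothetical choices made for the example; the native code would do the same thing with ALU lookups and a conditional load.

```python
VPC = 0x00   # hypothetical zero-page offset of the virtual program counter


def inc16(zp, offset):
    """Increment a 16-bit value stored low byte first in the zero page."""
    low = (zp[offset] + 1) & 0xFF
    zp[offset] = low
    if low == 0:
        # The low byte wrapped, so the high byte must change as well.
        zp[offset + 1] = (zp[offset + 1] + 1) & 0xFF


def read16(zp, offset):
    """Read back a 16-bit value stored low byte first."""
    return zp[offset] | (zp[offset + 1] << 8)


zero_page = bytearray(256)
zero_page[VPC], zero_page[VPC + 1] = 0xFF, 0x12
inc16(zero_page, VPC)
print(hex(read16(zero_page, VPC)))
```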

      Once the program counter is updated the instruction it points to can be read. This will then drive a switch statement to select the code that implements the instruction. There are various ways to make this switch. The most efficient is to use the opcode as an offset to the native program counter. This was the rationale behind the first change above. A custom ALU function can be added to define this offset, but even then, there isn't enough space in a single page to implement all the instruction emulation code.

      The current design will use three jumps to select the instruction code. The first will jump within the page to one of several fork points. Each fork then jumps to a new page that branches within that page to specific code that implements the instruction. There is one additional page jump at the end to return to the start of the interpreter loop. The first jump could define up to 64 pages, each of which could contain code for 16 instructions. This would provide room to support 1024 opcodes.

      The total overhead for just this instruction decode is around 30 process cycles. The actual instruction implementation would probably require a similar amount of cycles to complete. A total of around 60 cycles per instruction translates to around 0.125 MIPS. This is about 1/4 of an original 1 MHz 68xx processor that could perform around 0.425 MIPS.
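The estimates above can be worked through numerically. The sketch assumes one process cycle is four dot clocks at the 30.875 MHz dot clock used in the logs of this period; that clock rate is an assumption here, and the variable names are illustrative.

```python
FORK_PAGES = 64
INSTRUCTIONS_PER_PAGE = 16
opcode_slots = FORK_PAGES * INSTRUCTIONS_PER_PAGE

PROCESS_CLOCK_HZ = 30.875e6 / 4        # assumed: one process cycle = 4 dot clocks
CYCLES_PER_VIRTUAL_INSTRUCTION = 60    # about 30 for decode plus about 30 to execute

virtual_mips = PROCESS_CLOCK_HZ / CYCLES_PER_VIRTUAL_INSTRUCTION / 1e6
ratio_to_68xx = virtual_mips / 0.425   # 0.425 MIPS figure quoted above

print(opcode_slots, round(virtual_mips, 3), round(ratio_to_68xx, 2))
```

Under this assumption the interpreter lands near 0.13 MIPS, or roughly 0.3 of the quoted 68xx figure, which is in the same range as the estimate in the text.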

      The last change listed above is aimed at supporting FUZIX. This is designed for 8-bit CPUs, but requires more than just 64k of RAM. More memory requires banked RAM to switch between different address spaces. This can be achieved on the YATAC by using the extended register to define additional address bits for the RAM. Two bits are used to support four address spaces, with the GPU automatically switching to the highest bank to access the display RAM.
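A minimal model of the banked addressing, assuming the two bank bits are the low two bits of the E register (the actual bit assignment is not stated here) and using a hypothetical `physical_address` helper:

```python
BANK_BITS = 2
DISPLAY_BANK = (1 << BANK_BITS) - 1   # highest bank holds the display RAM


def physical_address(addr16, e_reg, gpu_access=False):
    """Form an 18-bit RAM address from a 16-bit address and the bank bits.

    Taking the bank from the low two bits of E is an assumption.
    """
    bank = DISPLAY_BANK if gpu_access else e_reg & DISPLAY_BANK
    return (bank << 16) | (addr16 & 0xFFFF)


print(hex(physical_address(0x1234, 0b01)))
print(hex(physical_address(0x1234, 0b01, gpu_access=True)))
```

The GPU ignores whatever bank the CPU has selected, so a program running in another bank cannot disturb the display fetches.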

  • Virtual CPU

    Alastair Hewitt, 05/23/2019 at 02:37, 0 comments

    The project is 3 months old. The CPU is running and boots up correctly. All the timing has been optimized and operates at up to 32 MHz (requiring a 64 MHz dot clock). The project even made it to the front page.

    So far about 40% of the circuit has been built and tested on breadboard. The design is now finalized enough to commit to a proper CAD drawing of the schematic. This took several days to complete, which explains why the breadboard has stalled at 40%. It will take at least another month to finish the build and get the GPU generating a video signal. The initial PCB routing will start in parallel, so that may be completed around the same time.

    The software design is also starting to pick up pace. This machine follows a similar design to the Gigatron in using the Harvard Architecture. There are two separate address spaces for both the ROM and the RAM. The ROM owns the Program Space identified in the schematic using the PA[0..15] and PD[0..7] bus labels. The RAM owns the Data Space identified in the schematic using the DA[0..15] and DD[0..7] bus labels. Only code in ROM can be executed and only data in the RAM can be accessed.

    The machine code is designed to be more like a microcode rather than something you would use to write general purpose programs with. These low-level instructions can be combined to build up a more expressive instruction set in the context of a virtual machine. This machine would have the more familiar Von Neumann Architecture where both the code and data exist in the same Data Space of the underlying Harvard Architecture.

    The plan is to take an existing CPU and build a virtual machine to be binary compatible with it, rather than design a new instruction set from scratch. The main advantage with this approach is to leverage existing tool chains and software for that processor. In theory, an existing C compiler for that processor can be used to build executable code, which would greatly reduce the software development overhead.

    My inclination is towards something in the 68XX family. These have a simple elegance that should translate well to the VM. The current target CPU is the 6809 and might include some or all of the 6811. There is also a lot of retro love for the 6502 and it may be possible to have a reduced 6502 mode. All the CPUs in the 8-bit Motorola lineage share a lot in common, so it's just a variation on a theme.

  • ALU and Instruction Set

    Alastair Hewitt, 05/13/2019 at 15:19, 0 comments

      Hardware testing is complete on all the jump and branch instructions. So that's the first 8 out of 256 instructions tested! The next 24 are loading operands. These should work fine since the jump/branch instructions are also loading operands in order to update the PC and Pg register. All the rest are ALU instructions, so it's time to work on the build script to generate the 96k of lookup tables.

      First off is to define the functions. There's room for 8 full byte-wide, 4 half nibble-wide, and 64 unary functions. One of the unary functions has to be the identity (do nothing) so I don't count that in the total of 75 functions.

      The 8 byte-wide functions are the classics:

      1. ADD - Addition
      2. DAD - BCD addition
      3. SUB - Subtraction
      4. DSB - BCD subtraction
      5. AND - Logical AND
      6. OR - Logical OR
      7. XOR - Logical Exclusive-OR
      8. CMP - Compare (returns 0 if equal, else -1)
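For reference, the eight functions can be written as plain Python on unsigned bytes. This sketch treats both operands as full bytes, ignores carry in and out, and assumes valid packed-BCD inputs, so it glosses over how the hardware actually splits the lookup; the names are illustrative.

```python
def _bcd_to_int(x):
    """Packed BCD byte (0x00-0x99) to an integer 0-99."""
    return (x >> 4) * 10 + (x & 0x0F)


def _int_to_bcd(n):
    """Integer 0-99 to a packed BCD byte."""
    return ((n // 10) << 4) | (n % 10)


BYTE_FUNCTIONS = {
    "ADD": lambda a, b: (a + b) & 0xFF,
    "DAD": lambda a, b: _int_to_bcd((_bcd_to_int(a) + _bcd_to_int(b)) % 100),
    "SUB": lambda a, b: (a - b) & 0xFF,
    "DSB": lambda a, b: _int_to_bcd((_bcd_to_int(a) - _bcd_to_int(b)) % 100),
    "AND": lambda a, b: a & b,
    "OR": lambda a, b: a | b,
    "XOR": lambda a, b: a ^ b,
    "CMP": lambda a, b: 0x00 if a == b else 0xFF,   # -1 as an unsigned byte
}

print(hex(BYTE_FUNCTIONS["DAD"](0x19, 0x28)))
```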

      The 4 nibble-wide functions are used for multiplication, division, and modulo. These would be used to multiply two nibbles to get a byte, or divide a byte by a nibble to get a nibble.
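As an illustration of how a nibble-wide table can serve wider arithmetic, the sketch below builds a 16x16 multiplication table and composes a full 8-bit by 8-bit product from four lookups. The table layout and the `mul8` helper are assumptions for the example, not the ROM's actual organisation.

```python
# Nibble x nibble -> byte lookup table (16 x 16 entries).
MUL = [[a * b for b in range(16)] for a in range(16)]


def mul8(x, y):
    """8-bit x 8-bit -> 16-bit product built from four nibble lookups."""
    xl, xh = x & 0xF, x >> 4
    yl, yh = y & 0xF, y >> 4
    return (MUL[xl][yl]
            + ((MUL[xh][yl] + MUL[xl][yh]) << 4)
            + (MUL[xh][yh] << 8))


print(mul8(200, 123))
```

This is ordinary long multiplication in base 16: the partial products are shifted by 0, 4, and 8 bits and summed.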

      The 64 unary functions are contained in 4 sets of 16. One of these could contain math related functions consisting of the following:

      1. SQRT - Square Root
      2. POW2 - Square (x**2)
      3. POW3 - Cube (x**3)
      4. INV - 1/x
      5. SIN - sin(x)
      6. ASIN - arc sin(x)
      7. COS - cos(x)
      8. ACOS - arc cos(x)
      9. TAN - tan(x)
      10. ATAN - arc tan(x)
      11. EXP - e**x
      12. LN - natural log(x)
      13. LOG - base 10 log(x)
      14. LOG2 - base 2 log(x)
      15. ABS - absolute (remove sign)
      16. ?? - ran out of ideas :(

      These math functions may look impressive, but they have a very limited dynamic range at only 8 bits wide. These can not be used directly to build a real floating-point library, but they can provide short cuts in making a real library faster. They could be used directly for demo-grade things like a Mandelbrot program, or to draw a circle on the screen. The circle should be clean if the radius is kept below 128, which is realistic in both the low and hires graphics modes.

      The other sets would contain functions related to graphics, serial communication, keyboard scan codes, interpreter jump offsets etc. I have some ideas, but not worth finalizing at this point. The last set (FN3/FNH) are the most used and contain the typical unary functions you would see on other processors:

      1. INC - Increment (x+1)
      2. DEC - Decrement (x-1)
      3. INC2 - Double Increment (x+2)
      4. DEC2 - Double Decrement (x-2)
      5. 1COM - One's complement (invert bits)
      6. 2COM - Two's complement (invert bits + 1)
      7. ROR - Rotate Right
      8. ROL - Rotate Left
      9. LSR - Logical Shift Right
      10. LSL - Logical Shift Left
      11. ASR - Arithmetic Shift Right
      12. ASR4 - Arithmetic Shift Right by 4 (move upper nibble to lower, preserving sign)
      13. SR4 - Shift Right by 4 (move upper nibble to lower)
      14. SL4 - Shift Left by 4 (move lower nibble to upper)
      15. SWAP - Swap nibbles
      16. IDEN - Identity function (x = x)
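A build script for the unary tables might look roughly like the sketch below, which generates one 256-byte table per function for a subset of the list. It assumes plain 8-bit rotates with no carry bit involved, and it says nothing about where the tables sit in the ROM; the `UNARY` and `TABLES` names are made up.

```python
def _signed(x):
    """Interpret an unsigned byte as a two's complement value."""
    return x - 256 if x & 0x80 else x


UNARY = {
    "INC": lambda x: (x + 1) & 0xFF,
    "DEC": lambda x: (x - 1) & 0xFF,
    "1COM": lambda x: x ^ 0xFF,
    "2COM": lambda x: (-x) & 0xFF,
    "ROR": lambda x: ((x >> 1) | (x << 7)) & 0xFF,
    "ROL": lambda x: ((x << 1) | (x >> 7)) & 0xFF,
    "LSR": lambda x: x >> 1,
    "LSL": lambda x: (x << 1) & 0xFF,
    "ASR": lambda x: (_signed(x) >> 1) & 0xFF,
    "SWAP": lambda x: ((x << 4) | (x >> 4)) & 0xFF,
    "IDEN": lambda x: x,
}

# One 256-byte lookup table per function, indexed by the input byte.
TABLES = {name: bytes(fn(x) for x in range(256)) for name, fn in UNARY.items()}

print(hex(TABLES["SWAP"][0x12]))
```

With tables like these, "executing" a unary function is a single ROM read addressed by the function number and the operand.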

      The instruction set can now be derived based on the functions defined above. The following lists all 256 instructions:

      00: NOP
      01: JMP D
      02: BPZ D
      03: BN D
      04: PJ D
      05: PJT D
      06: PBPZ D
      07: PBNT D
      08: LD PC, D
      09: LD V, D
      0A: LD X, D
      0B: LD Y, D
      0C: LD HL, D
      0D: LD E, D
      0E: LD EX, D
      0F: LD EY, D
      10: LDZ PC, D, RAM1
      11: LDZ V, D, RAM1
      12: LDZ X, D, RAM1
      13: LDZ Y, D, RAM1
      14: LDZ HL, D, RAM1
      15: LDZ E, D, RAM1
      16: LDZ EX, D, RAM1
      17: LDZ EY, D, RAM1
      18: LDZ PC, D
      19: LDZ V, D
      1A: LDZ X, D
      1B: LDZ Y, D
      1C: LDZ HL, D
      1D: LDZ E, D
      1E: LDZ EX, D
      1F: LDZ EY, D
      20: ADD A, HL, RAM0
      21: DAD A, HL, RAM0
      22: SUB A, HL, RAM0
      23: DSB A, HL, RAM0
      24: AND A, HL, RAM0
      25: OR A, HL, RAM0
      26: XOR A, HL, RAM0
      27: CMP A, HL, RAM0
      28: ADD RAM0, HL
      29: DAD RAM0, HL
      2A: SUB RAM0, HL
      2B: DSB RAM0, HL
      2C: AND RAM0, HL
      2D: OR RAM0, HL
      2E: XOR RAM0, HL
      2F: CMP RAM0, HL
      30: ADD A, HL, RAM1
      31: DAD A, HL, RAM1
      32: SUB A, HL, RAM1
      33: DSB A, HL, RAM1
      34: AND A, HL, RAM1
      35: OR A, HL, RAM1
      36: XOR A, HL, RAM1
      37: CMP A, HL, RAM1
      38: ADD A, HL
      39: DAD A, HL
      3A: SUB A, HL
      3B: DSB A, HL
      3C: AND A, HL
      3D: OR A, HL
      3E: XOR A, HL

  • Clock Circuit

    Alastair Hewitt, 05/09/2019 at 02:35, 0 comments

    The clock circuit consists of a Pierce Oscillator running at 30.875 MHz and two Johnson Ring Counters. The first ring counter (mclk/nclk) consists of a single flip-flop to divide the 30.875 MHz dot clock (dclk) down by a factor of 2. The second (pclk/qclk) consists of two flip-flops to divide the dot clock down by a factor of 4. The last stage of this counter is duplicated to provide an additional set of identical clocks.

    Some considerations in this design:

    • A single clock source is used to derive all the other clocks, rather than feeding the clock of one flip-flop with the output on another. This keeps all the clock edges aligned.
    • The rising edges of the mclk and the pclk/qclk must be synchronized. This requires the flip-flops to be reset on initialization or after a power fluctuation.
    • The pclk and qclk control complementary bus contexts. The complementary outputs of a single flip-flop keep these perfectly symmetrical and avoid bus contention, especially when held in reset.
    • The pclk and qclk are both used 11 times throughout the circuit and this exceeds the maximum TTL fanout for a single output. Two sets of these clocks are generated and divided evenly so no one clock output serves more than 6 inputs.

    The following shows the clocks generated by the circuit above. The tclk is also included for reference, but not shown in the circuit.
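
The divider behaviour described in this log can also be checked with a small simulation. This is a rough sketch, not the author's timing diagram: it models a Johnson ring counter as a shift register whose inverted last stage feeds the first, starts from the reset state, and uses the 30.875 MHz figure quoted above. The function names are hypothetical.

```python
import math

DOT_CLOCK_MHZ = 30.875  # dclk frequency quoted in the log entry


def johnson_counter(stages, ticks):
    """Return the counter state after each of `ticks` dclk edges, from reset."""
    state = [0] * stages
    history = []
    for _ in range(ticks):
        # Shift by one stage, feeding the inverted last stage into the first.
        state = [1 - state[-1]] + state[:-1]
        history.append(tuple(state))
    return history


def period_in_dclk_cycles(stages):
    """Count dclk edges until the counter returns to its reset state."""
    history = johnson_counter(stages, 4 * stages)
    return history.index(tuple([0] * stages)) + 1


def divided_mhz(stages):
    """Output frequency of a counter with the given number of flip-flops."""
    return DOT_CLOCK_MHZ / period_in_dclk_cycles(stages)


def loads_per_output(total_loads, copies):
    """Worst-case inputs per output when loads are split across copies."""
    return math.ceil(total_loads / copies)


if __name__ == "__main__":
    for name, stages in (("mclk", 1), ("pclk", 2)):
        wave = "".join(str(s[-1]) for s in johnson_counter(stages, 8))
        print(f"{name}: {wave}  {divided_mhz(stages)} MHz")
    print("inputs per pclk output:", loads_per_output(11, 2))
```

One flip-flop repeats every 2 dot-clock cycles and two flip-flops repeat every 4, which matches the divide-by-2 and divide-by-4 clocks described above. Splitting 11 loads across two copies of a clock leaves at most 6 inputs on any one output.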

Shranav Palakurthi wrote 05/15/2019 at 03:05

I want to see a retro computer with 128K RAM run JavaScript. (will it support Javascript?)

Alastair Hewitt wrote 05/15/2019 at 11:48

No plans to go anywhere near JavaScript! It would probably run out of memory just downloading a single JS file from a typical web page. There are some minimal JS engines like Espruino out there, but even those would use up all the ROM and leave no room for anything else.

Scott Devitt wrote 05/07/2019 at 13:12

I have one of those black cases and would love to get a few more. Any clue where from?

Alastair Hewitt wrote 05/07/2019 at 14:32

It's a Polycase ZN-40. You can buy them direct -

Scott Devitt wrote 05/07/2019 at 13:10

Kinda off target, but where did you find that black case? I have one and want a few more but have no clue where to find it.

Marcel van Kervinck wrote 04/05/2019 at 16:23

When I was contemplating the ALU and other random control logic for what later became known as the Gigatron, for quite a while I considered abusing the 74x48 7-segment decoder to build an instruction set around. But it's a slow chip, and also I couldn't get the instruction set quite right. After that phase I realised I really needed a ROM, but ROMs are very slow and it wouldn't fit in the critical path of a 6-8 MHz design. So that's where the diode-ROM came in, because that's fast. Interestingly, that was exactly 2 years ago today. I'm interested in what ROM speed you are planning to use.

Alastair Hewitt wrote 04/05/2019 at 18:58

Hi Marcel, thanks for your interest. The Gigatron is the main inspiration for this project, especially your work on generating VGA with TTL chips.

I read your article on using the diodes a few weeks ago. I was a bit worried discrete diodes wouldn’t switch fast enough, but it looks like this will work. I’m doing most of my instruction decode using discrete logic: This includes 8 chips of gates, 3 decoder chips, and 2 flip flop chips for state machines. There is one area where I decode 8 possible states and I plan to use a "diode ROM" for this.

Both the ROM and RAM are accessed at half the VGA dot clock (12.5875 MHz). I need to switch between three different contexts for the ROM address bus: program, ALU, and font bitmap. I have to determine what state I want next and then latch this so everything changes on a single clock edge. I don’t have time to determine the state after the clock edge because it takes up to 12ns to change the bus tri-state. This leaves me with just 65ns to access the ROM then latch the result before the next context switch.

To deal with this timing issue I have to use memory with 55ns or better access speed. The only ROM with this speed is one-time programmable. I’ll use this when I have code worthy of "shipping", but for now I’ll be doing development using NOR flash. The fastest DIP version is 70ns (e.g. GLS27SF020) so I’ll need to drop my clock speed a little. Worst case is a screen refresh at 50 Hz instead of 60 Hz during development.
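
The timing budget above can be worked through numerically. This is a back-of-the-envelope sketch using only the figures quoted in this comment (12.5875 MHz, 12ns tri-state delay, 65ns window, 55ns and 70ns parts). Two things are assumptions: that the gap between the raw window and the quoted 65ns is a fixed overhead, and that the refresh rate scales linearly with the memory clock. The function names are hypothetical.

```python
HALF_DOT_CLOCK_MHZ = 12.5875  # ROM/RAM access clock quoted above
TRISTATE_NS = 12.0            # worst-case bus tri-state change quoted above
QUOTED_WINDOW_NS = 65.0       # usable ROM window quoted above


def cycle_ns(clock_mhz):
    """Length of one memory cycle in nanoseconds."""
    return 1000.0 / clock_mhz


def rom_window_ns(clock_mhz, tristate_ns=TRISTATE_NS):
    """Time left in a cycle after the bus has changed context."""
    return cycle_ns(clock_mhz) - tristate_ns


def max_clock_mhz(access_ns, overhead_ns):
    """Fastest clock that still fits the access time plus fixed overhead."""
    return 1000.0 / (access_ns + overhead_ns)


if __name__ == "__main__":
    cycle = cycle_ns(HALF_DOT_CLOCK_MHZ)
    print(f"cycle: {cycle:.2f} ns, raw window: {rom_window_ns(HALF_DOT_CLOCK_MHZ):.2f} ns")
    # Assumption: everything outside the quoted 65 ns window is fixed overhead.
    overhead = cycle - QUOTED_WINDOW_NS
    for access in (55.0, 70.0):
        clock = min(HALF_DOT_CLOCK_MHZ, max_clock_mhz(access, overhead))
        # Assumption: refresh rate scales linearly with the memory clock.
        refresh = 60.0 * clock / HALF_DOT_CLOCK_MHZ
        print(f"{access:.0f} ns part: {clock:.3f} MHz, about {refresh:.1f} Hz refresh")
```

Under these assumptions a 55ns part runs at the full clock, while a 70ns part needs a slightly slower clock and lands between the 50 Hz worst case and 60 Hz.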

Marcel van Kervinck wrote 04/05/2019 at 20:57

Ah great. How about the references to a 128K ROM for ALU functions? I also saw a memory map of that, or is that "out" already? Anyway, take your time to reflect and document, if for no other reason than for yourself. I found those "boring documentation cleanup tasks" after a design frenzy helped to improve the end result. [BTW. This is probably a 3-level deep post without Reply button. Threading works best by going back 2 steps and reply from there....]

Alastair Hewitt wrote 04/06/2019 at 01:39

(jumping back 2 steps) The same ROM is used for both the program and the ALU. The CPU instructions take more than one cycle. For example: the first cycle reads the instruction from the ROM, the next cycle reads from the RAM, then the ROM is used as an ALU to perform a function, and finally the RAM can be written to. The ALU only handles one nibble at a time, so the last two cycles would be repeated to do a full 8-bit operation.
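
The idea of using a ROM as a nibble-wide ALU can be illustrated with a small model. This is a sketch of the general technique only: the table layout (carry-in and two 4-bit operands as the address, carry-out and a 4-bit sum as the data) is an assumption made for illustration and is not the actual ROM map of this machine. The names `build_add_table` and `add8` are hypothetical.

```python
def build_add_table():
    """Build a ROM-style lookup table for 4-bit addition.

    Address = carry_in << 8 | a << 4 | b  (hypothetical layout)
    Data    = carry_out << 4 | sum nibble
    """
    table = bytearray(512)
    for carry_in in range(2):
        for a in range(16):
            for b in range(16):
                total = a + b + carry_in  # at most 31, so bit 4 is the carry
                table[(carry_in << 8) | (a << 4) | b] = total
    return bytes(table)


ADD_TABLE = build_add_table()


def add8(x, y):
    """Add two 8-bit values with two table lookups, low nibble first."""
    low = ADD_TABLE[((x & 0x0F) << 4) | (y & 0x0F)]
    carry = low >> 4
    high = ADD_TABLE[(carry << 8) | ((x >> 4) << 4) | (y >> 4)]
    return ((high & 0x0F) << 4) | (low & 0x0F), high >> 4


if __name__ == "__main__":
    result, carry = add8(0x3C, 0x5A)
    print(f"0x3C + 0x5A = 0x{result:02X}, carry {carry}")
```

Each lookup stands in for one pass through the ROM, so an 8-bit add takes two passes with the carry from the low nibble forming part of the address for the high nibble.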

Marcel van Kervinck wrote 04/06/2019 at 09:47

Got it! Good luck with the build! One or two PCBs: both have their tradeoffs. The Gigatron is very sparsely populated with wide spacing. You might fit your design in a similar size, and the PCB costs aren't really that steep.

Alastair Hewitt wrote 05/31/2019 at 23:13

I finally ditched the diode ROM. I was able to juggle things around a bit and got it down to just 8 diodes configured as two 4-input AND gates. I decided to just add the additional chip and use a 74F21 instead. It's very fast with a Tp of just over 3 ns.

Geri wrote 03/08/2019 at 16:20

Hi, I'm following your projects and I am impressed with your work, especially the SUBLEQ implementation. I suggest you try creating an FPGA-based implementation to run my operating system:

Running this operating system will put you in the next league, as this is a multitasking, multi-windowing, SMP-capable operating system, and creating hardware that's capable of running something like that gives followers an order of magnitude bigger impression. The example emulators are attached in the zip file to guide you in the process. Feel free to contact me by e-mail for information if you don't understand something.

agp.cooper wrote 03/07/2019 at 01:11

Great computer specification! Perhaps you are aiming a little too high for ~30 TTL chips?


Have a look at some of the other TTL designs on Hackaday to get an idea of specifications and chip count. You may be disappointed by what others have achieved.

Have a look at the Apollo181, which has a 65 chip count and uses the 74181 ALU (yuck!), for an example of what can be done in 4 bit.

It's pretty impressive for 65 chips!


If you want something simpler (to get started) have a look at the TD4:

1) Breadboard version:

2) ATMega 328p "ROM" version:

3) And a schematic:

I have built the TD4 and have PCB designs on EasyEDA; you can get them made and posted to you.

Regards AlanX

roelh wrote 03/06/2019 at 08:18

Hi Alastair !  I'm looking forward to your schematics and instruction set....  I have similar plans...
