YATAC78 - The WWW TTL Computer

Retro computer built from 1978-era TTL logic chips. Internet capable, with a built-in web browser and server.

Can you browse the Web using pre-1980 TTL logic and memory speeds? The goal of this project is to demonstrate how. Internet connectivity is via an era-appropriate RS232 interface. The machine is upward compatible by a decade to support currently available keyboard and video interfaces (PS/2 and VGA). The video includes a native text mode capable of displaying 96 columns, plus two bitmapped color graphics modes for retro gaming.

YATAC78 - Yet Another TTL Archaic Computer (1978)

  • Dual Processor CPU/GPU modified Harvard Architecture.
  • 15.44 MHz machine clock, 7.72 MHz per processor.
  • 256k ROM: 128k program, 96k ALU, 32k fonts.
  • 128K RAM: 78k user data, 50k display.
  • 75 ALU functions including: BCD support, multiply/divide, square root, trigonometry
  • Bitmapped Graphics 2 resolutions (60Hz refresh): 8 color hi-res mode at 384x256 (4 dithering patterns) or 256 color low-res mode at 192x128 (double buffered).
  • Text Mode 768x400 native resolution (75Hz refresh). 4 fonts, 96x25 using 8x16 glyphs or 96x50 using 8x8 glyphs. 256 line buffer for 10-page smooth scroll.
  • 8-bit PCM Audio with 4 voices, 20 Hz to 15 kHz bandwidth.
  • PS/2 Keyboard interface built in.
  • RS232 Serial Port for host/client and network connectivity (up to 115200 baud).
  • Parallel Port for expansion (4 bits in, 8 bits out, 2 register strobes).
  • Chip Count 44 TTL, plus single ROM, RAM, PAL, and RS232 driver.
  • Target PCB size 2 stacked 12cm x 16cm 4-layer boards (24 chips per board).

The system is clocked around 15MHz and a typical instruction spans 4 clock cycles as follows:

  1. Load Instruction from ROM.
  2. Read data from source register or RAM.
  3. Perform an ALU function using the ROM as lookup table.
  4. Write data to register, accumulator, and optionally to RAM.

The alternating use of both ROM and RAM allows a second processor to be added to the system. Both processors use dedicated pipelines to cache data between the alternate program and data address spaces. One processor handles serial communications and general computational tasks (CPU) while the other is dedicated to the display and audio (GPU).

The following sequence of diagrams demonstrates the multiplexing of the CPU (shown in blue) and GPU (shown in red). In this example the GPU is operating in text mode and the CPU is executing the sequence described in the numbered list above.

In the first cycle the GPU reads the ASCII code point of a character from the RAM and stores the result in the GPU Cache (gc). The CPU addresses the ROM using the Program Counter (PC) and Page Register (Pg) to load the Instruction Register (I).

In the next cycle the context switches over and the CPU X and Y registers are used to address the RAM and load the CPU Cache (cc). Meanwhile the gc, along with the Scan Counter (SC), is used to address the ROM and load a character bitmap line into the Character Register (C).

The GPU returns to the RAM where the H counter was moved to the next byte and loads the gc with text color values. The ROM is now configured as an ALU with a function specified in the instruction. The cc is combined with one half of the HL register and the result is stored in the Accumulator (A).

In the final cycle the value in A is written to the RAM. The font colors stored in gc are moved to the RAMDAC (P) and the bitmap is loaded into a shift register to start the next character render cycle.
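The four cycles described above can be tabulated to check that the two processors never want the same memory at the same time. This is only a sketch of the schedule as the text describes it; the names `SCHEDULE` and `conflicts` are illustrative and do not come from the design files.

```python
# Memory ownership per machine cycle, following the walkthrough above
# (GPU in text mode, CPU running a typical 4-cycle ALU instruction).
SCHEDULE = [
    {"cycle": 1, "rom": "CPU", "rom_use": "fetch instruction into I",
     "ram": "GPU", "ram_use": "read code point into gc"},
    {"cycle": 2, "rom": "GPU", "rom_use": "read glyph row into C",
     "ram": "CPU", "ram_use": "read operand into cc"},
    {"cycle": 3, "rom": "CPU", "rom_use": "ALU lookup, result into A",
     "ram": "GPU", "ram_use": "read colors into gc"},
    # The text describes no ROM read in the last cycle, only gc moving to P.
    {"cycle": 4, "rom": "GPU", "rom_use": "GPU context (gc moved to P)",
     "ram": "CPU", "ram_use": "write A to RAM"},
]


def conflicts(schedule):
    """Return the cycles in which one processor would own both memories."""
    return [step["cycle"] for step in schedule if step["rom"] == step["ram"]]


if __name__ == "__main__":
    for step in SCHEDULE:
        print(step["cycle"], "ROM:", step["rom"], "|", "RAM:", step["ram"])
    print("conflicts:", conflicts(SCHEDULE))
```

The ROM and RAM owners simply alternate, which is what leaves room for the second processor.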



Mostly finished schematic

Adobe Portable Document Format - 774.82 kB - 05/19/2019 at 01:21



Simulation of CPU State Machine from WinCUPL.

Adobe Portable Document Format - 126.74 kB - 05/12/2019 at 03:49



Memory map showing both RAM and ROM address layout

Portable Network Graphics (PNG) - 103.66 kB - 05/02/2019 at 11:26


  • 12 × 74F574 Octal D-type Flip-flop with Tri-state Outputs
  • 7 × 74F163 Synchronous 4-Bit Binary Counter
  • 1 × 74F08 Quad 2-Input AND Gates
  • 2 × 74F00 Quad 2-Input NAND Gates
  • 5 × 74F541 Octal Buffers/Drivers with Tri-State Outputs

View all 23 components

  • ALU and Instruction Set

    Alastair Hewitt, 6 days ago, 0 comments

      Hardware testing is complete on all the jump and branch instructions. So that's the first 8 out of 256 instructions tested! The next 24 are loading operands. These should work fine since the jump/branch instructions are also loading operands in order to update the PC and Pg register. All the rest are ALU instructions, so it's time to work on the build script to generate the 96k of lookup tables.

      First off is to define the functions. There's room for 8 full (byte-wide), 4 half (nibble-wide), and 64 unary functions. One of the unary functions has to be the identity (do nothing), so I don't count that in the total of 75 functions (8 + 4 + 63).

      The 8 byte-wide functions are the classics:

      1. ADD - Addition
      2. SUB - Subtraction
      3. ADDD - BCD addition
      4. SUBD - BCD subtraction
      5. AND - Logical AND
      6. OR - Logical OR
      7. XOR - Logical Exclusive-OR
      8. CMP - Compare (returns 0 if equal, else -1)

      The 4 nibble-wide functions are used for multiplication and division (one each for binary and BCD). These would be used to multiply two nibbles to get a byte, or divide a byte by a nibble to get a nibble.
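As a rough illustration of what the build script could generate for the nibble-wide multiply, here is a minimal sketch of a binary and a BCD table. The index layout `(a << 4) | b`, the function names, and the choice to leave invalid BCD digits at zero are all assumptions made for this example, not the actual ROM layout.

```python
def build_mul_table():
    """256-entry table: index = (a << 4) | b, value = a * b (max 225, fits a byte)."""
    return bytes(a * b for a in range(16) for b in range(16))


def build_bcd_mul_table():
    """Same indexing, but both operands are BCD digits and the result is packed BCD."""
    table = bytearray(256)
    for a in range(10):
        for b in range(10):
            tens, ones = divmod(a * b, 10)
            table[(a << 4) | b] = (tens << 4) | ones
    # Entries where either digit is above 9 are left at 0 (assumed behavior).
    return bytes(table)


MUL = build_mul_table()
MULD = build_bcd_mul_table()

if __name__ == "__main__":
    print(MUL[(15 << 4) | 15])        # 15 * 15
    print(hex(MULD[(9 << 4) | 9]))    # 9 * 9 as packed BCD
```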

      The 64 unary functions are contained in 4 sets of 16. The first 16 (FN0) are math related and may consist of the following:

      1. SQRT - Square Root
      2. POW2 - Square (x**2)
      3. POW3 - Cube (x**3)
      4. INV - 1/x
      5. SIN - sin(x)
      6. ASIN - arc sin(x)
      7. COS - cos(x)
      8. ACOS - arc cos(x)
      9. TAN - tan(x)
      10. ATAN - arc tan(x)
      11. EXP - e**x
      12. LN - natural log(x)
      13. LOG - base 10 log(x)
      14. LOG2 - base 2 log(x)
      15. ABS - absolute (remove sign)
      16. ?? - ran out of ideas :(

      These math functions may look impressive, but they have a very limited dynamic range at only 8 bits wide. They cannot be used directly to build a real floating-point library, but they can provide shortcuts that make a real library faster. They could be used directly for demo-grade things like a Mandelbrot program, or to draw a circle on the screen. The circle should be clean if the radius is kept below 128, which is realistic in both the lo-res and hi-res graphics modes.

      The next two sets (FN1 and FN2) would contain functions related to graphics, serial communication, keyboard scan codes etc. I have some ideas, but not worth finalizing at this point. The last set (FN3/FNH) are the most used and contain the typical unary functions you would see on other processors:

      1. INC - Increment (x+1)
      2. DEC - Decrement (x-1)
      3. INC2 - Double Increment (x+2)
      4. DEC2 - Double Decrement (x-2)
      5. 1COM - One's complement (invert bits)
      6. 2COM - Two's complement (invert bits + 1)
      7. ROR - Rotate Right
      8. ROL - Rotate Left
      9. LSR - Logical Shift Right
      10. LSL - Logical Shift Left
      11. ASR - Arithmetic Shift Right
      12. ASR4 - Arithmetic Shift Right by 4 (move upper nibble to lower, preserving sign)
      13. SR4 - Shift Right by 4 (move upper nibble to lower)
      14. SL4 - Shift Left by 4 (move lower nibble to upper)
      15. SWAP - Swap nibbles
      16. IDEN - Identity function (x = x)
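Each unary function is just a 256-byte lookup table, so generating them is straightforward. The sketch below builds a few of the FN3 tables listed above. It assumes the rotates are plain 8-bit rotates with no carry involved, and the dictionary names are illustrative only.

```python
# A handful of the FN3 unary functions, each mapping a byte to a byte.
UNARY = {
    "INC":  lambda x: (x + 1) & 0xFF,
    "DEC":  lambda x: (x - 1) & 0xFF,
    "1COM": lambda x: x ^ 0xFF,
    "2COM": lambda x: (-x) & 0xFF,
    "ROR":  lambda x: ((x >> 1) | (x << 7)) & 0xFF,   # assumed: no carry bit
    "ROL":  lambda x: ((x << 1) | (x >> 7)) & 0xFF,   # assumed: no carry bit
    "LSR":  lambda x: x >> 1,
    "ASR":  lambda x: (x >> 1) | (x & 0x80),          # keep the sign bit
    "ASR4": lambda x: (x >> 4) | (0xF0 if x & 0x80 else 0x00),
    "SWAP": lambda x: ((x << 4) | (x >> 4)) & 0xFF,
    "IDEN": lambda x: x,
}


def build_table(fn):
    """Evaluate a unary function for every possible input byte."""
    return bytes(fn(x) for x in range(256))


TABLES = {name: build_table(fn) for name, fn in UNARY.items()}

if __name__ == "__main__":
    print(hex(TABLES["SWAP"][0xA5]), hex(TABLES["ASR"][0x80]))
```

At 256 bytes per table, the 64 unary functions would occupy 16k of the 96k ALU area.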

      The instruction set can now be derived based on the functions defined above. The following lists all 256 instructions:

      00: NOP
      01: JMP D
      02: BPZ D
      03: BN D
      04: PJ D
      05: PJT D
      06: PBPZ D
      07: PBNT D
      08: LD V, D
      09: LD E, D
      0A: LD EX, D
      0B: LD EY, D
      0C: LD X, D
      0D: LD Y, D
      0E: LD HL, D
      0F: LD A, D
      10: LDZ V, D
      11: LDZ E, D
      12: LDZ EX, D
      13: LDZ EY, D
      14: LDZ X, D
      15: LDZ Y, D
      16: LDZ HL, D
      17: LDZ A, D
      18: LDZ V, D, RAM1
      19: LDZ E, D, RAM1
      1A: LDZ EX, D, RAM1
      1B: LDZ EY, D, RAM1
      1C: LDZ X, D, RAM1
      1D: LDZ Y, D, RAM1
      1E: LDZ HL, D, RAM1
      1F: LDZ A, D, RAM1
      20: ADD A, HL, RAM0
      21: SUB A, HL, RAM0
      22: ADDD A, HL, RAM0
      23: SUBD A, HL, RAM0
      24: AND A, HL, RAM0
      25: OR A, HL, RAM0
      26: XOR A, HL, RAM0
      27: CMP A, HL, RAM0
      28: ADD A, HL
      29: SUB A, HL
      2A: ADDD A, HL
      2B: SUBD A, HL
      2C: AND A, HL
      2D: OR A, HL
      2E: XOR A, HL
      2F: CMP A, HL
      30: ADD RAM0, HL
      31: SUB RAM0, HL
      32: ADDD RAM0, HL
      33: SUBD RAM0, HL
      34: AND RAM0, HL
      35: OR RAM0, HL
      36: XOR RAM0, HL
      37: CMP RAM0, HL
      38: ADD A, HL, RAM1
      39: SUB A, HL, RAM1
      3A: ADDD A, HL, RAM1
      3B: SUBD A, HL, RAM1
      3C: AND A, HL, RAM1
      3D: OR A, HL, RAM1
      3E: XOR A, HL, RAM1
      3F: CMP A, HL, RAM1
      40: FNH A, V, RAM0
      41: FNH A, E, RAM0
      42: FNH A, EX, RAM0
      43: FNH A,...

  • Clock Circuit

    Alastair Hewitt, 05/09/2019 at 02:35, 0 comments

    The clock circuit consists of a Pierce Oscillator running at 30.875 MHz and two Johnson Ring Counters. The first ring counter (mclk/nclk) consists of a single flip-flop to divide the 30.875 MHz dot clock (dclk) down by a factor of 2. The second (pclk/qclk) consists of two flip-flops to divide the dot clock down by a factor of 4. The last stage of this counter is duplicated to provide an additional set of identical clocks.

    Some considerations in this design:

    • A single clock source is used to derive all the other clocks, rather than feeding the clock of one flip-flop with the output on another. This keeps all the clock edges aligned.
    • The rising edges of the mclk and the pclk/qclk must be synchronized. This requires the flip-flops to be reset on initialization or after power fluctuation.
    • The pclk and qclk control complementary bus contexts. The complementary outputs of a single flip-flop keep these perfectly symmetrical and avoid bus contention, especially when held in reset.
    • The pclk and qclk are both used 11 times throughout the circuit and this exceeds the maximum TTL fanout for a single output. Two sets of these clocks are generated and divided evenly so no one clock output serves more than 6 inputs.
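A Johnson ring counter with n flip-flops repeats every 2n clocks, which is where the divide-by-2 and divide-by-4 come from. The following is a small behavioral sketch of that idea only; it does not model the actual wiring, and `johnson_sequence` is a name made up for this example.

```python
def johnson_sequence(n_flops, n_clocks):
    """Return the flip-flop states seen over n_clocks dot-clock edges."""
    state = [0] * n_flops            # all flip-flops cleared by reset
    seen = []
    for _ in range(n_clocks):
        seen.append(tuple(state))
        # The inverted output of the last stage feeds the first stage.
        state = [1 - state[-1]] + state[:-1]
    return seen


DCLK_MHZ = 30.875
MCLK_MHZ = DCLK_MHZ / 2   # one flip-flop: repeats every 2 dot clocks
PCLK_MHZ = DCLK_MHZ / 4   # two flip-flops: repeats every 4 dot clocks

if __name__ == "__main__":
    print(johnson_sequence(1, 4))   # mclk stage
    print(johnson_sequence(2, 8))   # pclk/qclk stages
    print(MCLK_MHZ, PCLK_MHZ)
```

Because both counters start from the same reset state and share the dot clock, their edges stay aligned, which is the property the considerations above rely on.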

    The following shows the clocks generated by the circuit above. The tclk is also included for reference, but not shown in the circuit.

  • Overclocking

    Alastair Hewitt, 05/09/2019 at 02:02, 0 comments

    Overclocking usually happens after a project is complete. This one started with overclocking and the design was adapted from the outset to maximize performance. The results have been impressive.

    The original goal was to use the standard VGA dot clock of 25.175 MHz. This was divided by two to generate a machine clock of around 12.5 MHz. Calculations showed this would work fine for 55ns memory, but not for the 70ns NOR flash being used in development. It looked like that wouldn't make it past 11 MHz.

    However, it appears the memories are capable of significantly better performance than their quoted specs. The current design was able to support a machine clock as high as 19 MHz with NOR flash (two different types were tested). Faster memories (10ns) would easily support a machine clock in the 30-35 MHz range, so a dot clock as high as 70 MHz may be possible. The clock circuit and state machine were tested with a dot clock as high as 100 MHz and both performed well, but the NOR flash was basically a random number generator at that speed!

    The plan is to use era-appropriate memory speeds for the late 70's though. This doesn't mean if a single 1,024 bit memory could do 25ns on its own, then I can use 25ns memory. It needs to be a memory sub-system of equivalent size, which could require up to 1,000 chips to store 128k bytes in 1978. The fastest memory sub-system of that size and era would probably come out of a Cray-1 supercomputer. The Cray-1 memory system was capable of a 50ns access time, so that seems like the appropriate speed limit for this design.

    This works out at around 30 MHz for the dot clock and a machine clock of 15 MHz. Another consideration in selecting the exact frequency is the serial communication. The margin of error in syncing with a monitor is much greater than a high-speed serial link. A UART frequency is therefore more important than a dot-clock frequency. Plus, it's almost impossible to get the old-school VGA crystals these days.

    There are two options in the 30 MHz range. Both divide down to 115,200 baud in a whole number of process cycles (a process cycle is 4 dot clocks):

    • 29.4912 MHz = 64 process cycles * 115,200
    • 30.875 MHz ~ 67 process cycles * 115,200

    The plan is to use the higher frequency, but the slower option may be used depending on stability of the final design. The higher frequency is 22.6% faster than the standard VGA dot clock, so the GPU horizontal scan length is increased from 200 to 244 process cycles. The active screen memory per line is increased from 160 bytes to 192 bytes. This increases the text mode from 80 to 96 columns.
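The numbers above are easy to check. The sketch below works out the process cycles per bit and the resulting baud error for each crystal, plus the speed-up over the standard VGA dot clock. The function and constant names are invented for this example.

```python
BAUD = 115_200
VGA_DCLK_HZ = 25_175_000


def baud_divisor(dot_clock_hz):
    """Return (process cycles per bit, actual baud, error in percent)."""
    process_hz = dot_clock_hz / 4          # process clock is 1/4 of the dot clock
    cycles = round(process_hz / BAUD)
    actual = process_hz / cycles
    error_pct = (actual - BAUD) / BAUD * 100
    return cycles, actual, error_pct


SPEEDUP_PCT = (30_875_000 / VGA_DCLK_HZ - 1) * 100

if __name__ == "__main__":
    for dclk in (29_491_200, 30_875_000):
        cycles, actual, err = baud_divisor(dclk)
        print(f"{dclk / 1e6:.4f} MHz: {cycles} cycles, {actual:.1f} baud, {err:+.4f}%")
    print(f"speed-up over VGA dot clock: {SPEEDUP_PCT:.1f}%")
    print("text columns:", 160 // 2, "->", 192 // 2)   # 2 bytes per text column
```

The 30.875 MHz option is not an exact multiple, but the error is a tiny fraction of a percent, well inside what a UART tolerates.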

  • Progress

    Alastair Hewitt, 04/29/2019 at 19:17, 0 comments

    The redesign is complete and everything has changed. I'll need to go back and update all the logs because most of that information is no longer valid. Things have been rationalized and the timing simplified. This has resulted in an even faster machine that could potentially run on the higher VESA dot clock of 31.5 MHz. More on that after some more testing.

    The big change was getting rid of the clock and strobe timing. The clocks control the bus context and the strobes were used to latch data within the bus timing. A strobe would be 6ns before the end of the bus cycle to allow time to hold the latched data before the context switch.

    This wasn't needed though because the ROM owns its data bus. Data from the previous context can be left on the bus after the switch from the GPU to CPU. The bus tri-state takes at least 3ns to change the address and the ROM is guaranteed to hold the data for at least 7ns after that. Since the data will stick around on the bus for at least 10ns, the same clock used for the bus control can be used to latch the data, even after one layer of logic delay. This effectively overlaps the clock cycles and time can be borrowed from the next cycle to extend the previous one.

    The build is finally starting to take shape. Things are progressing slowly since everything is being validated at every step. Bus contention is also being checked and one issue was resolved that would happen on reset. The new clock circuit needs to be reset to synchronize two independent ring counters. If the clocks are all pulled low then both bus contexts exist at the same time. This effectively shorts any TTL output that is driving a high on to the same bus line as something holding it low. The new circuit uses flip-flops with complementary outputs to prevent this.

    The picture below shows the bare-bones CPU (ROM not shown). This contains the main clock generator, CPU state machine, Program Counter and bus buffer, Instruction register, and a little bit of logic (not all wired up in this picture). There's also an additional register to drive something on the bus during the GPU bus cycle.

    The ROM was programmed with sequential groups of 16 instructions. The instruction sequences demonstrate different execution cycles. The oscilloscope output below shows the least significant bit of the Program Counter. Going from left to right you can see cycles where the PC is incremented every 4 machine cycles, then every 6, every 2, and then every 1 cycle before going back to 2, then 4.

    The instructions executed above are:

    • Short ALU - 4 cycles between PC increments
    • Long ALU - 6 cycles between PC increments
    • Load Zero-page - 2 cycles between PC increments
    • Load Operand - 2 cycles between PC increments
    • Page Jump (unconditional) - 2 cycles between PC increments
    • Jump (condition not met) - 1 cycle PC increment
    • Jump (unconditional) - 2 cycles between PC increments
    • NOP - 2 cycles between PC increments

    Next up is a jump program that will reload the Program Counter and execute a loop. Then add the accumulator and program some ALU functions.

  • CPU State Machine

    Alastair Hewitt, 04/22/2019 at 15:57, 1 comment

    8 weeks in and the CPU is running, although it's not doing anything useful yet. The initial task was to get all the timing circuits in place and measure the performance of the system.

    Only 10 chips are needed to implement the clock circuit, program counter, instruction register, and CPU state machine. This would have been 15 chips if it wasn't for this guy below.

    The initial design was all 74-series chips and it needed 50 in total. A single programmable array logic (PAL) can absorb 6 of the 74-series chips. The first PALs were introduced in 1978 and included the 16R8. The device shown above is the modern equivalent known as a GAL (generic array logic). This chip can emulate a variety of old PAL devices and in this case it is being used as the 1978-era 16R8.

    I could go further though. The table below shows the possible chip count reductions using more PALs:

    Design            PALs   74-series   Other   Total
    Original Design   0      50          5       55
    Current Design    1      44          5       50
    Optional Design   5      30          5       40

    The optional design would replace the 8-bit counters that currently require 3 chips (two 4-bit counters + buffer) with a single chip. There would be 3 of these 8-bit counter PALs. The rest of the instruction decode logic and bus state machine would also fit in a single PAL. Note: Control signals used by the counters would require the larger 24-pin 22R10 devices.

    Consolidating 6 chips in to one was an easy choice. The current plan is to stop there though. The design files for the PAL have been uploaded along with a simulation showing different execution states for the CPU State Machine.

  • Internet

    Alastair Hewitt, 04/18/2019 at 14:07, 2 comments

    This was always a stretch goal, but it looks very doable now the design is complete... and it wouldn't be the first TTL Computer on the Internet.

    What is the scope of Internet-enabled?

    Connectivity is via the RS232 port. This provides an ancient but still supported interface standard. There are plenty of inexpensive options to adapt to more modern serial standards. These include USB with an FTDI cable and Ethernet with a WizNet protocol adapter. Of course, dial-up would be the most authentic method using a standalone modem.

    A TCP/IP stack is a project in its own right. Something like uIP could be ported to the YATAC and would work within the constrained resources. Things like the WizNet adapter can offload some of this stack overhead and will be used to get things up and running quickly.

    The simplest server/client model would be TFTP. I plan to do better than this though and go straight to HTTP with a browser and web server. However, this would be the Tim Berners-Lee 1991 version of the Web: Text only browser supporting a subset of HTML 2.0 and basic web server file handling. This will provide everything needed to upload and download programs via a web interface, so no mass storage is needed.

    The hardware text mode was specifically designed to render basic web pages. The 4 fonts are used with HTML as follows:

    • Standard font used to render text body.
    • Bold font used for <b> tags.
    • Italic font used for <i> tags.
    • Underline font used for <u> and <a> tags.

    There are 8 colors available, so links would be highlighted along with using the underline font. Headers would also be highlighted in a different color using the bold font.
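To make the tag-to-font mapping concrete, here is a minimal sketch using Python's standard `html.parser`. It turns a fragment of HTML into (character, font) cells. The font numbering and the `CellRenderer` class are hypothetical; the log only states that there are four fonts and which tags select them.

```python
from html.parser import HTMLParser

# Font numbering is an assumption made for this example.
FONT_STANDARD, FONT_BOLD, FONT_ITALIC, FONT_UNDERLINE = range(4)
TAG_FONT = {
    "b": FONT_BOLD,
    "i": FONT_ITALIC,
    "u": FONT_UNDERLINE,
    "a": FONT_UNDERLINE,
}


class CellRenderer(HTMLParser):
    """Collect (character, font) pairs, one per text-mode cell."""

    def __init__(self):
        super().__init__()
        self.stack = [FONT_STANDARD]   # innermost tag decides the font
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag in TAG_FONT:
            self.stack.append(TAG_FONT[tag])

    def handle_endtag(self, tag):
        if tag in TAG_FONT and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        font = self.stack[-1]
        self.cells.extend((ch, font) for ch in data)


def render(html):
    renderer = CellRenderer()
    renderer.feed(html)
    renderer.close()
    return renderer.cells


if __name__ == "__main__":
    print(render("See <b>bold</b> and <a href='#'>link</a>"))
```

Colors for links and headers would be chosen in the same pass, since each text cell carries its own foreground and background color.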

    Pages can be up to 256 lines long and are rendered directly to the screen memory. Scrolling is achieved by updating one register on every screen refresh. Almost no additional CPU resources are required to display a page once loaded.
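Scrolling by changing a single register can be pictured as moving a window over the 256-line buffer. The sketch below works in whole text rows for simplicity and assumes the window wraps around at the end of the buffer; neither detail is confirmed by the log, and `visible_rows` is an illustrative name.

```python
BUFFER_ROWS = 256          # lines held in display RAM
SCREEN_ROWS = 25           # 96x25 text mode


def visible_rows(top_row, screen_rows=SCREEN_ROWS):
    """Buffer rows shown on screen when the scroll register holds top_row."""
    # Wrap-around at the end of the buffer is an assumption.
    return [(top_row + i) % BUFFER_ROWS for i in range(screen_rows)]


PAGES = BUFFER_ROWS // SCREEN_ROWS   # full screens that fit in the buffer

if __name__ == "__main__":
    print(visible_rows(0)[:3], "...", visible_rows(0)[-1])
    print(visible_rows(250, 8))
    print("pages:", PAGES)
```

Nothing in display RAM moves during a scroll; only the starting row changes, which is why the CPU cost is close to zero.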

  • Redesign

    Alastair Hewitt, 04/14/2019 at 23:44, 4 comments

    So there's the easy problem (designing a computer out of TTL) and then there's the hard problem (make it work at 12.5 MHz).

    I was close, but not quite there. The RAMDAC was hitting the wall with a dot clock above 21 MHz. I have to simplify the logic that switched between the 16 and 8 colors in text mode. I have to take the ZX Spectrum approach and drop to 15 colors with a subtractive intensity rather than additive.

    The instruction decode looked good on paper but I forgot about the 10 ns propagation from the clock to output of the instruction register. I was able to find a solution but the entire decode path barely fits in the 80 ns machine cycle. Too much parasitic capacitance in a PCB layout and things will get glitchy once the chips reach a toasty 70 C.

    So I've been shuffling things around and doing some consolidation. I'm back to a solid design, but I will need to redraw the schematic. I'm not spending another weekend doing that by hand, so the next schematic will be a proper CAD drawing. But saying that, it's time to start building and verifying this thing will work at these speeds. Only then is it worth documenting the verified design.

    Not surprisingly the first thing to test is the clock and bus control state machine. After that the instruction decode and CPU state machine. I'll then be able to run a simple jump/loop program. It won't be Turing Complete, but if that program runs reliably at 12.5 MHz then everything else will work. I can then start on the fun stuff like adding RAM and the video output. Then the really hard problem (software).

  • Clocks, Scans, and Refresh Rates

    Alastair Hewitt, 04/10/2019 at 22:22, 0 comments

    The previous log discussed the display columns. This one will cover the rows, but first the clocks:

    • Dot Clock (dclk) - 25.175 MHz
    • Machine Clock (mclk) - 12.588 MHz (1/2 dclk)
    • Processor Clocks (pclk/qclk) - 6.294 MHz (1/2 mclk, qclk is pclk inverted)
    • Extended Clocks (rclk/sclk) - 6.294 MHz (pclk/qclk shifted 90-degrees)
    • Text Clock (tclk) - 3.147 MHz (1/2 pclk)

    The dclk is the standard VGA dot clock and is used to latch the output of the first video DAC (VDAC1) and to shift the bits of the character buffer in text mode. This renders the text at the full 640 horizontal graphics resolution of VGA by displaying 80 columns of text using 8-bit wide characters (not the 9-bits of the 720 horizontal resolution VGA-400 text mode).

    The mclk is the native speed at which the hardware is clocked. This is divided down again to generate the pclk at which each processor operates. Therefore each processor cycle includes two machine cycles. One machine cycle is used to access the ROM and the other to access the RAM. The GPU and CPU operate on opposite clocks to access both memories concurrently.

    The rclk is a delayed version of the pclk used by the CPU state machine. This provides a 40ns delay in which to perform the instruction decode and maintain the state across the edge of the 80ns machine cycle.

    The final clock is the tclk and is used by the GPU in text mode. Each column of text uses two bytes, so the GPU divides down the pclk to alternate between reading the ASCII code point and font/color bytes on each processor cycle.
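The clock tree above is a chain of divide-by-two stages, so the frequencies and periods can be derived from the dot clock alone. This short sketch does that arithmetic; the function names are made up for the example.

```python
DCLK_HZ = 25_175_000   # standard VGA dot clock


def derived_clocks(dclk_hz):
    """Divide the dot clock down through mclk, pclk and tclk."""
    mclk = dclk_hz / 2
    pclk = mclk / 2        # qclk is the same frequency, inverted
    tclk = pclk / 2
    return {"dclk": dclk_hz, "mclk": mclk, "pclk": pclk, "tclk": tclk}


def period_ns(freq_hz):
    return 1e9 / freq_hz


CLOCKS = derived_clocks(DCLK_HZ)
# rclk is pclk shifted by 90 degrees, i.e. a quarter of the pclk period.
RCLK_DELAY_NS = period_ns(CLOCKS["pclk"]) / 4

if __name__ == "__main__":
    for name, hz in CLOCKS.items():
        print(f"{name}: {hz / 1e6:.3f} MHz, {period_ns(hz):.1f} ns")
    print(f"rclk delay: {RCLK_DELAY_NS:.1f} ns")
```

The quarter-period shift comes out just under 40ns and the machine cycle just under 80ns, matching the rounded figures used in the text.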

    Each line of the display requires 200 process cycles as discussed in the previous log. This results in a horizontal scan frequency of 31.47 kHz. This is fixed in hardware and is the same for every video mode. At the end of each line the Scan Counter (SC) is incremented. This is a 4-bit counter that can count up to 16 lines. This serves two purposes:

    1. Index the character bitmap row to be rendered in text mode
    2. Divide down the line count

    The lo-res text mode uses 16 lines for each character, so the whole of the scan counter is required for this. The hi-res text only needs 8 lines, so just the first 3 bits are used. The bits are also combined to create an ent signal for the vertical counter depending on the video mode as follows:

    • divide by 2 - hi-res graphics mode (repeat lines twice)
    • divide by 4 - lo-res graphics mode (repeat lines 4-times)
    • divide by 8 - hi-res text mode (8-line character)
    • divide by 16 - lo-res text mode (16-line character)

    This is what drives the vertical scan count, but the software must blank the display at the bottom of the screen, generate the V-Sync pulse, and then reload the V register to reset the counter to the top of the screen. The value loaded in to the V register can be moved up and down to perform a smooth scroll of a larger text area within the video RAM.
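The relationship between scan lines, the 4-bit Scan Counter, and the vertical count can be modeled in a few lines. This is a behavioral sketch only: it assumes both counters start at zero at the top of the frame and ignores the software reload of the V register. The mode keys and the `scan` function are names invented here.

```python
# Scan lines per vertical step for each video mode (from the list above).
LINES_PER_ROW = {
    "hires_gfx": 2,
    "lores_gfx": 4,
    "hires_text": 8,
    "lores_text": 16,
}


def scan(mode, n_lines):
    """Return (sub_line, vertical_row) for each of the first n_lines scan lines."""
    div = LINES_PER_ROW[mode]
    result = []
    for line in range(n_lines):
        sc = line & 0xF               # 4-bit Scan Counter wraps at 16
        sub_line = sc & (div - 1)     # glyph row in text mode, repeat index in graphics
        result.append((sub_line, line // div))
    return result


if __name__ == "__main__":
    print(scan("hires_gfx", 4))
    print(scan("hires_text", 10)[-1])
    print("rows on a 400-line screen:", 400 // 16, "or", 400 // 8)
```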

    Since the vertical sync timing is software defined it is possible to support different screen sizes. The YATAC defines 4 screen heights and when combined with the 4 other video modes provide a total of 16 configurations for the display. Both the standard VGA-400 and VGA-480 modes are supported, allowing full compatibility with even the most ancient PS/2 CRT monitors. The other two modes are YATAC-specific custom defined: The YATAC-9x4 is a 419 line mode at 75 Hz and will map directly to HD/UHD 9:4 ratio displays (but at a much lower resolution). The YATAC-MAX is a 561 line mode at 56 Hz and renders the maximum number of lines in RAM (256) when using the hi-res graphics mode.

    The mode name, total lines and how they are made up are defined as follows:

    Mode Name    Total Lines   Front Porch   V-Sync   Back Porch   Refresh Rate
    YATAC-9x4    419           17            2        40           75.10 Hz
    VGA-400      449           12            2        35           70.09 Hz
    VGA-480      525           10            2        33           59.94 Hz
    YATAC-MAX    561           12            2        35           56.09 Hz
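The refresh rates in the table follow directly from the fixed horizontal frequency divided by the total line count. A quick check, with names chosen only for this example:

```python
# Horizontal frequency: dot clock / 4 dot clocks per process cycle / 200 cycles per line.
H_FREQ_HZ = 25_175_000 / 4 / 200

TOTAL_LINES = {
    "YATAC-9x4": 419,
    "VGA-400": 449,
    "VGA-480": 525,
    "YATAC-MAX": 561,
}


def refresh_hz(total_lines):
    """Vertical refresh rate for a frame of total_lines scan lines."""
    return H_FREQ_HZ / total_lines


if __name__ == "__main__":
    print(f"horizontal: {H_FREQ_HZ / 1000:.2f} kHz")
    for mode, lines in TOTAL_LINES.items():
        print(f"{mode}: {refresh_hz(lines):.2f} Hz")
```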

    The screen refresh rate drops as the lines increase since the horizontal frequency is fixed. However,...


  • Counting to 200

    Alastair Hewitt, 04/10/2019 at 18:16, 0 comments

    The GPU does the same thing over and over again: Count to 200. It does this regardless of the graphics mode. Each horizontal scan will read 200 bytes of the RAM at the processor clock rate of 6.25 MHz. What does change is what each byte represents.

    Even though the horizontal scan is 200 bytes long, it must also contain the border, overscan area, and sync timing. This extra stuff takes up 20% of the scan line, so only 160 bytes are typically displayed per line.

    The 160 bytes is mapped to columns as follows:

    Mode              Bits per Column   Columns
    Hi-Res Graphics   4                 320
    Lo-Res Graphics   8                 160

    Each column in the graphics mode is directly mapped to a DAC. The hi-res 4-bit encoding is RGBI and the lo-res 8-bit encoding is RGB 3:3:2 (3-bits red and green, 2-bits blue). The text column contains two colors (foreground and background) and both use a 3-bit encoding of just RGB.
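For illustration, here is how a display byte could be unpacked in the two graphics encodings. The bit order is an assumption: red in the top bits for RGB 3:3:2, and the upper nibble as the first of the two hi-res pixels, with each nibble read as R, G, B, I from the top bit down. The function names are hypothetical.

```python
def decode_rgb332(byte):
    """Lo-res pixel: 3 bits red, 3 bits green, 2 bits blue (assumed RRRGGGBB)."""
    red = (byte >> 5) & 0x7
    green = (byte >> 2) & 0x7
    blue = byte & 0x3
    return red, green, blue


def decode_rgbi(nibble):
    """Hi-res pixel: one bit each for red, green, blue and intensity (assumed order)."""
    return (nibble >> 3) & 1, (nibble >> 2) & 1, (nibble >> 1) & 1, nibble & 1


def decode_hires_byte(byte):
    """A hi-res byte holds two 4-bit pixels; upper nibble first is an assumption."""
    return decode_rgbi(byte >> 4), decode_rgbi(byte & 0xF)


if __name__ == "__main__":
    print(decode_rgb332(0b11100011))
    print(decode_hires_byte(0x9F))
    print("hi-res columns per line:", 160 * 2)
```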

    The 16 bits of a text column span 3 bytes and consist of the font, code point of the character, and two colors. The following table shows how these are mapped given a column index of C:

    Font                 2 bits
    Code Point (ASCII)   8 bits
    Foreground Color     3 bits
    Background Color     3 bits

    In addition to the 160 bytes per line for the conventional display area, an additional 4 bytes are added to the start and end of each line. These bytes would normally be set to a solid border color and are rendered along with the normal 160 bytes. These can be used to display content in the border, but this would only be visible on a CRT and wrap around the edge of the glass.

    The display RAM provides up to 256 rows for the display. With 168 bytes reserved for each line, the total RAM required is 42k bytes (256 x 168 = 43,008). The way this is mapped may seem a bit odd until you see the reasoning behind it.

    The first column of the display (including the border) has an index of 56 (0x38 in hex). Remember we need to count to 200. A naive approach would be to start at 0 and count to 199 before returning to zero on the next clock pulse. If we start at 56 then the last index before resetting would be 255. The synchronous counter chips provide a signal (rco) that is generated on 255 and this can be used to reload the counter to 56. Therefore we count to 200, but without needing any additional logic gates (actually, one inverter).
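The reload trick is easy to see in a small behavioral model of the 8-bit horizontal counter. This sketch only models the counting behavior described above, not the counter chips themselves, and `h_counter` is an illustrative name.

```python
H_START = 0x38   # 56: value loaded into the counter
H_MAX = 0xFF     # 255: where the ripple carry output (rco) asserts


def h_counter(n_clocks):
    """Return the counter value on each of n_clocks process cycles."""
    h = H_START
    values = []
    for _ in range(n_clocks):
        values.append(h)
        rco = (h == H_MAX)
        # rco triggers a synchronous reload instead of rolling over to 0.
        h = H_START if rco else h + 1
    return values


if __name__ == "__main__":
    line = h_counter(200)
    print(line[0], line[-1], len(set(line)))
    print(h_counter(201)[-1])   # first value of the next line
```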

    So why place the border where it is? And why is it the size it is? The (VESA) VGA spec does specify a border, but this would only be 2 bytes (8 VGA pixels). The extra 2 bytes are added so the screen border ends at 224 (0xE0). This is when the H-Sync pulse begins. A single 3-input AND gate can be used to define the start of this pulse. This is the H-Blank signal and defines when the horizontal output should be turned off. The H-sync pulse lasts for 24 bytes, so a pair of 2-input NAND gates can fully define this when combined with the H-Blank signal.

    Here's the detailed memory map of a line of video RAM:

    Binary      Hex    Decimal   Description
    0011 1000   0x38   56        video RAM start
    0011 1001   0x39   57        back porch end
    0011 1010   0x3A   58        left border start
    0011 1011   0x3B   59        left border end
    0011 1100   0x3C   60        display start
    1101 1011   0xDB   219       display end
    1101 1100   0xDC   220       right border start
    1101 1101   0xDD   221       right border end
    1101 1110   0xDE   222       front porch start
    1101 1111   0xDF   223       front porch & video RAM end
    1110 0000   0xE0   224       H-Blank & H-Sync start
    1111 0111   0xF7   247       H-Sync end
    1111 1000   0xF8   248       back porch start
    1111 1111   0xFF   255       H rco, H-Blank end
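The gate counts quoted above can be checked against the memory map. In this sketch H-Blank is the AND of the top three counter bits, and H-Sync is built from two NAND gates using bits 4 and 3. That particular gate arrangement is my own guess at one way to satisfy the description, not something taken from the schematic.

```python
def h_blank(h):
    """3-input AND of counter bits 7, 6 and 5: true from 224 to 255."""
    return bool((h >> 7) & (h >> 6) & (h >> 5) & 1)


def nand(a, b):
    return not (a and b)


def h_sync_n(h):
    """Active-low H-Sync from two 2-input NANDs (assumed wiring)."""
    bit4 = bool(h & 0x10)
    bit3 = bool(h & 0x08)
    # Low while blanking is active and bits 4 and 3 are not both set.
    return nand(h_blank(h), nand(bit4, bit3))


BLANK_RANGE = [h for h in range(56, 256) if h_blank(h)]
SYNC_RANGE = [h for h in range(56, 256) if not h_sync_n(h)]

if __name__ == "__main__":
    print("H-Blank:", BLANK_RANGE[0], "to", BLANK_RANGE[-1])
    print("H-Sync: ", SYNC_RANGE[0], "to", SYNC_RANGE[-1], f"({len(SYNC_RANGE)} bytes)")
```

With this wiring the sync pulse covers 224 to 247, which is the 24-byte pulse listed in the memory map.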

  • Control Signals

    Alastair Hewitt, 04/10/2019 at 02:17, 0 comments

    The following gives an overview of the control signals shown in the schematic. There are a lot of them! The table below details the 20 registers/buffers/counters used in the YATAC. Each requires one or both of an output enable (~OE) and latch enable (~LE). Listed are the names, machine context, data source (SRC), destination (DST), and the control signals used by each register.

    Register Name    Context   SRC   DST   ~OE   ~LE
    qclk +
    qclk + H-blank   ~VLE
    Scan Counter
    pclk +
    GPU Cache        gc   GPU   RAM
    Color Register   C    GPU   ROM
    X Index
    pclk +
    X Index
    Y Index
    Ei               CPU   serial
    CPU Cache        cc   CPU   RAM
    ALU Function     fn   CPU   I0-2, I6
    Accumulator      A    CPU   ROM
    HL Register      HL   CPU
    X Register
    Y Register

View all 13 project logs


Shranav Palakurthi wrote 4 days ago point

I want to see a retro computer with 128K RAM run JavaScript. (will it support Javascript?)


Alastair Hewitt wrote 4 days ago point

No plans to go anywhere near Javascript! It would probably run out of memory just downloading a single JS file from a typical web page. There are some minimal JS engines like Espruino out there, but even those would use up all ROM and leave no room for anything else.


Scott Devitt wrote 05/07/2019 at 13:12 point

I have one of those black cases and would love to get a few more. Any clue from where?


Alastair Hewitt wrote 05/07/2019 at 14:32 point

It's a Polycase ZN-40. You can buy them direct -


Scott Devitt wrote 05/07/2019 at 13:10 point

Kinda off target, but where did you find that black case? I have one and want a few more but no clue where to find it.


Marcel van Kervinck wrote 04/05/2019 at 16:23 point

When I was contemplating the ALU and other random control logic for what later became known as the Gigatron, for quite a while I considered abusing the 74x48 7-segment decoder to build an instruction set around. But it's a slow chip, and also I couldn't get the instruction set quite right. After that phase I realised I really needed a ROM, but ROMs are very slow and it wouldn't fit in the critical path of a 6-8 MHz design. So that's where the diode-ROM came in, because that's fast. Interestingly, that was today exactly 2 years ago. What ROM speed are you planning to use?


Alastair Hewitt wrote 04/05/2019 at 18:58 point

Hi Marcel, thanks for your interest. The Gigatron is the main inspiration for this project, especially your work on generating VGA with TTL chips.

I read your article on using the diodes a few weeks ago. I was a bit worried discrete diodes wouldn’t switch fast enough, but it looks like this will work. I’m doing most of my instruction decode using discrete logic: This includes 8 chips of gates, 3 decoder chips, and 2 flip flop chips for state machines. There is one area where I decode 8 possible states and I plan to use a "diode ROM" for this.

Both the ROM and RAM are accessed at half the VGA dot clock (12.5875 MHz). I need to switch between three different contexts for the ROM address bus: program, ALU, and font bitmap. I have to determine what state I want next and then latch this so everything changes on a single clock edge. I don’t have time to determine the state after the clock edge because it takes up to 12ns to change the bus tri-state. This leaves me with just 65ns to access the ROM then latch the result before the next context switch.

To deal with this timing issue I have to use memory with 55ns or better access speed. The only ROM with this speed is one-time programmable. I’ll use this when I have code worthy of "shipping", but for now I’ll be doing development using NOR flash. The fastest DIP version is 70ns (e.g. GLS27SF020), so I’ll need to drop my clock speed a little. Worst case is a screen refresh at 50 Hz instead of 60 Hz during development.
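
The timing budget described in this reply can be checked with a little arithmetic. The sketch below is only an illustration: the function names are made up, the 25.175 MHz figure is the standard VGA dot clock implied by the 12.5875 MHz half-rate quoted above, and it assumes the memory clock scales linearly with the refresh rate. It ignores latch setup time, which is presumably why the reply quotes about 65ns rather than the slightly larger raw figure.

```python
# Rough timing-budget sketch for the shared ROM bus.
# Numbers come from the comment above; helper names are hypothetical.

VGA_DOT_CLOCK_MHZ = 25.175   # standard VGA dot clock (assumed from 12.5875 MHz x 2)
BUS_TURNAROUND_NS = 12.0     # worst-case tri-state change quoted in the comment


def cycle_ns(clock_mhz):
    """Length of one clock period in nanoseconds."""
    return 1000.0 / clock_mhz


def access_window_ns(clock_mhz, turnaround_ns=BUS_TURNAROUND_NS):
    """Time left for a memory access after the bus has changed context."""
    return cycle_ns(clock_mhz) - turnaround_ns


def scaled_clock_mhz(refresh_hz, base_clock_mhz, base_refresh_hz=60.0):
    """Clock needed for a given refresh rate, assuming linear scaling."""
    return base_clock_mhz * refresh_hz / base_refresh_hz


memory_clock = VGA_DOT_CLOCK_MHZ / 2                      # ROM/RAM access rate
window_60 = access_window_ns(memory_clock)                # budget at 60 Hz
window_50 = access_window_ns(scaled_clock_mhz(50, memory_clock))  # budget at 50 Hz

print(f"memory clock: {memory_clock:.4f} MHz")
print(f"access window at 60 Hz: {window_60:.1f} ns")
print(f"access window at 50 Hz: {window_50:.1f} ns")
```

Under these assumptions a 70ns part misses the 60 Hz window by a few nanoseconds but fits comfortably at 50 Hz, which matches the worst case mentioned above.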


Marcel van Kervinck wrote 04/05/2019 at 20:57

Ah great. How about the references to a 128K ROM for ALU functions? I also saw a memory map of that, or is that "out" already? Anyway, take your time to reflect and document, if for no other reason than for yourself. I found those "boring documentation cleanup tasks" after a design frenzy helped to improve the end result. [BTW. This is probably a 3-level deep post without a Reply button. Threading works best by going back 2 steps and replying from there...]


Alastair Hewitt wrote 04/06/2019 at 01:39

(jumping back 2 steps) The same ROM is used for both the program and the ALU. The CPU instructions take more than one cycle. For example: the first cycle reads the instruction from the ROM, the next cycle reads from the RAM, then the ROM is used as an ALU to perform a function, and finally the RAM can be written to. The ALU only handles one nibble at a time, so the last two cycles would be repeated to do a full 8-bit operation.
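
As a rough illustration of the "ROM as ALU" idea, the sketch below does an 8-bit add as two 4-bit table lookups. The table layout (function code, carry in, two nibbles) and the way the carry is passed between the two passes are assumptions made for this example; the reply does not describe how the real ROM is addressed.

```python
# Sketch of a lookup-table ALU that works one nibble at a time.
# The table key (fn, carry_in, a, b) is a hypothetical layout, not the
# project's actual ROM addressing.

FN_ADD = 0  # made-up function code for addition


def build_alu_rom():
    """Precompute (result_nibble, carry_out) for every 4-bit add."""
    rom = {}
    for carry_in in (0, 1):
        for a in range(16):
            for b in range(16):
                total = a + b + carry_in
                rom[(FN_ADD, carry_in, a, b)] = (total & 0xF, total >> 4)
    return rom


ALU_ROM = build_alu_rom()


def add8(x, y):
    """8-bit add built from two table lookups: low nibble, then high nibble."""
    lo, carry = ALU_ROM[(FN_ADD, 0, x & 0xF, y & 0xF)]
    hi, carry = ALU_ROM[(FN_ADD, carry, x >> 4, y >> 4)]
    return (hi << 4) | lo, carry


result, carry_out = add8(0x3C, 0x2F)
print(hex(result), carry_out)
```

Each call to `add8` corresponds to repeating the lookup step once per nibble, which is the reason a full 8-bit operation takes the extra cycles mentioned above.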


Marcel van Kervinck wrote 04/06/2019 at 09:47

Got it! Good luck with the build! One or two PCBs: both have their tradeoffs. The Gigatron is very sparsely populated with wide spacing. You might fit your design in a similar size, and the PCB costs aren't really that steep.


Geri wrote 03/08/2019 at 16:20

Hi, I am following your projects and I am impressed with your work, especially the SUBLEQ implementation. I suggest you try creating an FPGA-based implementation to run my operating system:

Running this operating system will put you in the next league, as this is a multitasking, multiwindowing, SMP-capable operating system, and creating hardware that is capable of running something like that makes an order of magnitude bigger impression on followers. The example emulators are attached in the zip file to guide you in the process. Feel free to contact me by e-mail for information if you don't understand something.




agp.cooper wrote 03/07/2019 at 01:11

Great computer specification! Perhaps you are aiming a little too high for ~30 TTL chips?

Have a look at some of the other TTL designs on Hackaday to get an idea of specifications and chip count. You may be disappointed by what others have achieved.

Have a look at the Apollo181, which has a 65-chip count and uses the 74181 ALU (yuck!), for an example of what can be done in 4 bits.

It's pretty impressive for 65 chips!


If you want something simpler (to get started) have a look at the TD4:

1) Breadboard version:

2) ATMega 328p "ROM" version:

3) And a schematic:

I have built the TD4 and have PCB designs on EasyEDA; you can get them made and posted to you.

Regards AlanX


roelh wrote 03/06/2019 at 08:18

Hi Alastair! I'm looking forward to your schematics and instruction set... I have similar plans...
