Novasaur Retrocomputer

Retrocomputer with serial and video built from only 1978-era TTL logic

Can you browse the Web using pre-1980 TTL logic and memory speeds? The goal of this project is to demonstrate how. Internet connectivity is via an era-appropriate RS232 interface. The machine is upward compatible by a decade to support currently available keyboard and video interfaces (PS/2 and VGA). The video includes a native text mode capable of displaying 80-columns and two bitmapped color graphics modes for retro gaming.

Novasaur TTL Minicomputer

  • Dual Processor CPU/GPU (Harvard Architecture).
  • 33 MHz dot clock, 16.5 MHz memory clock, 8.25 MHz per processor (3.5 CPU MIPS)
  • 256k ROM: 96k ALU, 64k native program, 64k relocatable code, 32k fonts.
  • 128/512k RAM: 1-7 banks of 64k user, 50k display, 14k shared.
  • 50+ ALU functions including multiply/divide, system and math functions.
  • Bitmapped Graphics: Up to 320x256 Hi-res mode with 8 colors and 4 dithering patterns. Lo-res mode up to 160x256 with 256 colors, or 160x128 double buffered.
  • Text Mode: 8 colors FG/BG, 256 line buffer, up to 80x75 using 8x8 glyph text, up to 80x48 using 8x16 glyph text.
  • Audio: up to 4 voices, 6 waveforms, ADSR, 8-bit DAC, 20Hz-4.8kHz.
  • PS2 Keyboard interface built in.
  • RS232 Serial Port for host/client and network connectivity (9600 baud).
  • Expansion Port: 7 addressable 8-bit registers in/out, 4 input flags
  • Chip Count: 34 TTL (22 CPU, 12 GPU), 1 ROM, 1 RAM, 1 PAL, 4 analog.
  • Gate Count: 1,425 (935 CPU, 490 GPU)
  • PCB size: 8" x 5" (200 x 125mm) double-sided board.
  • Power: 6v DC @ 2A (9W)

The Novasaur consists of two processing units (CPU/GPU) operating on the alternating cycles of a 4-phase clock. The 4-phase clock is driven by a 33MHz oscillator to generate a processor clock of 8.25MHz. Each processor accesses one of the two address spaces (ROM/RAM) concurrently on a memory access cycle of 60ns (16.5MHz).
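The clock relationships above can be checked with a few lines of arithmetic. This is only a sketch of the figures quoted in the text; the variable names are illustrative and do not come from the project's firmware.

```python
# Clock tree as described above: a 33 MHz oscillator drives a 4-phase clock.
DOT_CLOCK_HZ = 33_000_000

# The memory clock quoted in the text is half the dot clock (16.5 MHz).
memory_clock_hz = DOT_CLOCK_HZ / 2

# Each processor (CPU or GPU) runs at a quarter of the dot clock (8.25 MHz).
processor_clock_hz = DOT_CLOCK_HZ / 4

# Duration of one memory access cycle in nanoseconds (roughly 60 ns).
memory_cycle_ns = 1e9 / memory_clock_hz

print(f"memory clock:    {memory_clock_hz / 1e6:.2f} MHz")
print(f"processor clock: {processor_clock_hz / 1e6:.2f} MHz")
print(f"memory cycle:    {memory_cycle_ns:.1f} ns")
```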

The GPU functions as a DMA controller operating in transparent mode to read the video memory and output to one of two video DACs. The first DAC generates 256 colors using three bits for the red/green, and two bits for the blue. This DAC is used for low res graphics mode where each byte of the video memory represents a single pixel.
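As an illustration of the 3:3:2 format, the sketch below expands one pixel byte into 8-bit red, green, and blue values. The bit order (red in the top three bits, blue in the bottom two) is an assumption; the text only gives the number of bits per channel. The function name is hypothetical.

```python
def rgb332_to_rgb888(pixel: int) -> tuple:
    """Expand one 3:3:2 pixel byte to 8 bits per channel.

    Assumed bit layout (not confirmed by the text): RRRGGGBB.
    """
    if not 0 <= pixel <= 0xFF:
        raise ValueError("pixel must fit in one byte")
    red = (pixel >> 5) & 0b111
    green = (pixel >> 2) & 0b111
    blue = pixel & 0b11
    # Scale each field so that full-scale input maps to 255.
    return (red * 255 // 7, green * 255 // 7, blue * 255 // 3)


print(rgb332_to_rgb888(0xE0))  # pure red under the assumed layout
```

Three bits give 8 levels each for red and green and two bits give 4 levels for blue, which is where the 8 x 8 x 4 = 256 colors come from.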

The GPU also supports a text mode where the bytes of video memory alternate between a color byte and a code point representing a character. The color byte is used with the second video DAC to represent two 8 color values for foreground and background. The text mode can also support a high res graphics mode with two pixels per byte of video memory.

The CPU instructions use a 4-cycle sequence consisting of: fetch, read, execute, write. The fetch cycle uses a program counter to access the machine code instruction in the ROM. The read cycle provides access to the RAM in the indexed addressing mode. The execute cycle returns to the ROM to access the next byte in the program memory for immediate addressing, or to a lookup table for an ALU operation. The final cycle is the write cycle where a register is loaded with the execute result and optionally the RAM in the indexed addressing mode.

Instructions take from one to four process cycles to complete. The instructions are either 8 or 16 bits, so the fetch cycle takes either one or two process cycles. The ALU operations can only handle one nibble per cycle, so two process cycles are required to handle an entire byte. The NOP instruction and conditional loads where the condition is not met take only one cycle (no execute). The average instruction takes 2.35 process cycles, for a typical CPU speed of 3.5 MIPS.
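The 3.5 MIPS figure follows directly from the processor clock and the average cycle count. A minimal sketch of that arithmetic, using only the numbers quoted above (the constant names are illustrative):

```python
PROCESSOR_CLOCK_HZ = 8_250_000       # one process cycle per clock tick
AVG_CYCLES_PER_INSTRUCTION = 2.35    # average quoted in the text

mips = PROCESSOR_CLOCK_HZ / AVG_CYCLES_PER_INSTRUCTION / 1e6

# Bounds implied by the one-to-four cycle range.
best_case_mips = PROCESSOR_CLOCK_HZ / 1 / 1e6
worst_case_mips = PROCESSOR_CLOCK_HZ / 4 / 1e6

print(f"typical: {mips:.2f} MIPS")
print(f"range:   {worst_case_mips:.2f} to {best_case_mips:.2f} MIPS")
```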

The base firmware implements a hardware abstraction layer (HAL) to support a video system with up to 112 modes, a multi-voice audio source, and a dual-port UART providing a full-duplex RS232 interface and a single PS/2 port. The operating system and user programs are executed via an interpreter offering binary compatibility with the RCA 1802 (COSMAC).


Memory map of RAM and ROM address layout.

image/png - 118.36 kB - 03/14/2020 at 23:23



Schematic of main board for rev. 5 PCB

Adobe Portable Document Format - 844.30 kB - 03/14/2020 at 01:32



Font ROM rendered as a bitmapped image.

Portable Network Graphics (PNG) - 8.48 kB - 01/17/2020 at 04:49



Big-endian instruction encoding.

Portable Network Graphics (PNG) - 16.72 kB - 12/29/2019 at 07:16



Schematic of expansion board prototype

Adobe Portable Document Format - 106.41 kB - 12/28/2019 at 00:48



  • 14 × 74F574 Octal D-type Flip-flop with Tri-state Outputs
  • 4 × 74F541 Octal Buffers/Drivers with Tri-State Outputs
  • 1 × 74F138 3-line to 8-line Decoders
  • 1 × 74F244 Octal Buffers with Tri-State Outputs
  • 1 × 74F175 Quad D-type Flip-flop with Clear


  • Rev. 5

    Alastair Hewitt • 03/14/2020 at 01:31 • 1 comment

    The final board was supposed to be Rev. 4. The order was placed the day China went on Covid-19 lockdown and what should have been a week turned into a month. This provided some time to reflect on the current design and see if it was possible to squeeze any additional functionality out of the already limited chip count... so by the time the Rev. 4 board showed up there was a Rev. 5 board ready to ship!

    The original design uses 128k RAM divided into two banks: user and video. The bank is selected by a bit in the instruction, allowing fast switching between these two memory banks. The video memory is the only bank read by the GPU and must always be selected during the GPU RAM cycle. However, the other bank could be further selected and there are up to 3 bits available from the E register to do this.

    It was possible to implement this change by swapping a quad 2-input OR gate (74F32) with a quad 2:1 mux (74F157). A lot of things had to get moved around and the board went through a three-week long revision. The layout didn't change much, but almost all the ECU traces had to be rerouted. A jumper (shown below) was added to select between the original 128k and the new 512k memory chip.

    Another change was the power supply. Unfortunately, the high voltage buck converter didn't make it through the final round of Rev. 4 testing. There is over 3 watts being generated in just one cubic inch of space and things were overheating. The temperature is stable if air can flow over the area, but it starts to get too hot once the enclosure is sealed up.

    The alternative was a linear regulator, specifically the LDO variety. There are 1.5A versions available that can operate within 0.4V of the input and accommodate a 6V power supply. Like a lot of power supplies, the one currently under test outputs 5% over the rated voltage. There's about 366m ohms between the supply and the regulator dropping this to about 5.75V. The remaining drop across the regulator results in 1.125W being dissipated as heat, almost 1/3rd of the buck regulator design.
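The figures in this paragraph can be reproduced as follows. This is a sketch under one assumption: a load current of 1.5A, which is the core system draw quoted in the earlier power supply log and is consistent with the 1.125W result. The constant names are illustrative.

```python
RATED_SUPPLY_V = 6.0
SUPPLY_OVERSHOOT = 0.05       # supply under test outputs 5% over its rating
LEAD_RESISTANCE_OHM = 0.366   # between the supply and the regulator
LOAD_CURRENT_A = 1.5          # assumed: core system draw from the earlier log
OUTPUT_V = 5.0

supply_v = RATED_SUPPLY_V * (1 + SUPPLY_OVERSHOOT)   # about 6.3 V
regulator_input_v = supply_v - LOAD_CURRENT_A * LEAD_RESISTANCE_OHM
regulator_heat_w = (regulator_input_v - OUTPUT_V) * LOAD_CURRENT_A

print(f"regulator input: {regulator_input_v:.3f} V")
print(f"heat dissipated: {regulator_heat_w:.3f} W")
```

Rounding the regulator input to 5.75V before multiplying gives the 1.125W quoted in the entry; carrying the extra digit moves the result by only a couple of milliwatts.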

    The thermal properties of the board are much improved. The minimal board was assembled and tested up to 50C. Not only was it stable, it could run this hot at 35MHz with no decoupling capacitors fitted! This is a first and it appears the changes in the design and PCB layout have all been positive.


    Alastair Hewitt • 03/08/2020 at 22:32 • 0 comments

    The hardware abstraction layer contains a virtual CPU for executing application code. The plan has always been to emulate an existing CPU, but which one? 8-bits for sure, and since the Intel-derived CPUs (8085/Z80) were too complex, the initial approach was towards a Motorola-derived CPU (68XX/6502). There is a third option though: the RCA 1802 COSMAC.

    This chip is often overlooked because it didn't gain the same visibility as the other 8-bit micros. RCA started its precipitous decline soon after the CPU was launched and their commercial products were all flops. The CPU did find success in the embedded market, from GM's first ECU to space probes.

    The COSMAC is closer to RISC than the typical CISC processors of the era. This minimalistic design makes it even easier to implement than the Motorola-derived chips. The chip contains a total of 16 index registers, each with 16 bits, a single 8-bit accumulator, and a few other status registers.

    One issue with chips like the 6502 is their register-constrained design (they rely heavily on a zero page to expand a limited set of registers). The emulator has access to its own zero page and can implement up to 256 bytes of registers at no additional cost. There is no benefit to implementing a register-constrained design. In fact, a lot of 6502 code would be working around this limitation for no reason. COSMAC code tends to work within its own set of registers, which means it works within the emulator's zero page, so it is a far more efficient virtual CPU.

    The COSMAC uses a fetch and execute sequence, with a single fetch and typically one, or sometimes two, execute cycles. The 1976 COSMAC CPU would run with a 4-5 µs machine cycle, comparable to the 5.2 µs machine cycle of the hardware abstraction layer (HAL).

    The fetch code of the HAL just fits in the 43 clock cycles of the virtual machine cycle. There is one caveat: the program counter is pre-incremented. This is the only way to make it fit, so this will break binary compatibility of assembled machine code. However, the code can be easily fixed via static analysis - absolute jump locations need to be reduced by one.

    For the fetch cycle, any one of the index registers can be assigned to the program counter (the PC is essentially an indirect address). This address is used to reference the two bytes of the PC in the emulator's zero page. The lower byte is incremented and the upper byte is either loaded or incremented if the lower byte overflows. The memory location at this address is read and copied to an instruction cache in the zero page. This instruction is then used, along with the virtual machine state, to decode the next page jump.
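The fetch sequence just described can be modelled in a few lines. This is a behavioural sketch, not the HAL code: the zero-page layout mirrors the comments in the long branch listing further down this entry ($PREG at location 10 pointing at a big-endian register pair), while `ICACHE`, `increment_pc`, and `fetch` are hypothetical names introduced for illustration.

```python
PREG = 10     # zero-page slot holding the address of the active PC register
ICACHE = 11   # hypothetical zero-page slot used as the instruction cache


def increment_pc(zpage: bytearray) -> int:
    """Pre-increment the 16-bit PC held big-endian in the zero page."""
    hi_addr = zpage[PREG]          # indirect: which register pair is the PC
    lo_addr = hi_addr + 1
    zpage[lo_addr] = (zpage[lo_addr] + 1) & 0xFF
    if zpage[lo_addr] == 0:        # lower byte overflowed, carry upward
        zpage[hi_addr] = (zpage[hi_addr] + 1) & 0xFF
    return (zpage[hi_addr] << 8) | zpage[lo_addr]


def fetch(zpage: bytearray, memory: bytearray) -> int:
    """Advance the PC, then copy the addressed byte into the cache."""
    zpage[ICACHE] = memory[increment_pc(zpage)]
    return zpage[ICACHE]


zpage = bytearray(256)
memory = bytearray(0x10000)
zpage[PREG] = 100                      # PC lives at zero-page 100/101
zpage[100], zpage[101] = 0x01, 0xFF    # PC = 0x01FF, one below the target
memory[0x0200] = 0xC4                  # NOP opcode on the 1802

opcode = fetch(zpage, memory)
print(hex(opcode), hex(zpage[100]), hex(zpage[101]))
```

The example also shows why absolute jump targets have to be reduced by one: with a pre-incremented PC, the register must hold 0x01FF for the next fetch to read the opcode at 0x0200.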

    Most of the execution will fit in a single virtual machine cycle, but there are a few exceptions. One is the long branch, where the next two bytes referenced by the PC need to be read and then used to update the PC. This requires a double length virtual machine cycle (86 clock cycles).

    This is where the indirect location of the PC is used to find and then increment the PC (two step process with conditional jump), load the value at that memory location and then save it in a temporary location. It has to be cached because the PC needs to be incremented again to load the second byte. Both bytes are then used to update the two bytes of the PC (indirectly). This is a lot of work with such limited hardware, as can be seen in the assembly code below:

    # Long Branch (LBR)
    INCLUDE ../inc/unary.nsa
    INCLUDE ../inc/zpage.nsa
    INCLUDE ../inc/pages.nsa
    # $PREG - zero page location of the P register
    # $PREG:  10> 100
    # $REG0H: 100> 222 Big-endian
    # $REG0L: 101> 254
    # assume: Y = $VMS
    FNH DZ, HLD        # double inc state
    LDZ Y, $PREG       # zero page address of PC (Y=10, [10]->100)
    FNH DZ, Y          # Y = lower byte address (y=101)
    FNFH DZ, XD        # inc value of lower byte put in X ([101]->254->255->X)
    FNEL A, PC         # fork based on X
    ADDR 0x40          # if X=0xFF : iden Y, inc X, inc...
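For readers who do not follow the native assembly, the same long branch sequence can be sketched at a higher level. This is a behavioural model under assumptions, not a translation of the listing: `TEMP`, `increment_pc`, and `long_branch` are hypothetical names, and the starting register values are the ones given in the listing's comments.

```python
PREG = 10   # zero-page slot holding the address of the active PC register
TEMP = 12   # hypothetical zero-page slot used to cache the first operand


def increment_pc(zpage: bytearray) -> int:
    """Pre-increment the 16-bit PC held big-endian in the zero page."""
    hi_addr = zpage[PREG]
    lo_addr = hi_addr + 1
    zpage[lo_addr] = (zpage[lo_addr] + 1) & 0xFF
    if zpage[lo_addr] == 0:        # lower byte overflowed, carry upward
        zpage[hi_addr] = (zpage[hi_addr] + 1) & 0xFF
    return (zpage[hi_addr] << 8) | zpage[lo_addr]


def long_branch(zpage: bytearray, memory: bytearray) -> None:
    """Read two operand bytes through the PC, then overwrite the PC."""
    hi_addr = zpage[PREG]
    lo_addr = hi_addr + 1
    # The first operand (new upper byte) is cached because the PC is
    # still needed to locate the second operand.
    zpage[TEMP] = memory[increment_pc(zpage)]
    new_low = memory[increment_pc(zpage)]
    zpage[hi_addr] = zpage[TEMP]
    zpage[lo_addr] = new_low


zpage = bytearray(256)
memory = bytearray(0x10000)
zpage[PREG] = 100
zpage[100], zpage[101] = 222, 254   # PC = 0xDEFE, as in the listing comments
memory[0xDEFF] = 0x12               # upper operand byte
memory[0xDF00] = 0x33               # lower operand byte, across a page boundary

long_branch(zpage, memory)
pc_after = (zpage[100] << 8) | zpage[101]
print(hex(pc_after))
```

The operand here is 0x1233 rather than 0x1234 because the pre-incremented PC means the next fetch reads from one address above the stored value.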

  • One Year Later

    Alastair Hewitt • 02/27/2020 at 20:14 • 3 comments

    It was a year ago when I stumbled across the infamous 8-bit Guy video demoing the Gigatron. I was working on a retro arcade cabinet at the time, but building a video game system from scratch was a much more interesting challenge. It wouldn't be the first time either. I built a Racer game out of TTL chips using a 7x7 LED matrix as a senior project at school. I then spent that summer working on a Harvard Architecture CPU with a ROM-based ALU. I never thought about generating VGA (it was still a couple of years away at the time) but seeing the Gigatron achieve this with so little has re-inspired me!

    I'm essentially at the same place I was almost 12 weeks ago: I can copy an image from the ROM to the video RAM and generate the video timing. What has changed is the way the video timing is generated and how this code is built.

    The initial code was developed old skool by assembling the machine code by hand and then typing the hex code into the Windows app that came with the EPROM programmer. It was nostalgic, but not very productive (not to mention frustrating when you typo '6' instead of 'b').

    The project now has an assembler and a build script to compile the code, calculate the ALU lookup tables, and generate fonts. The final step of the build process is to flash the ROM image using minipro. There is no simulator though, so testing must be done on real hardware and debugging still requires an oscilloscope.

    The oscilloscope trace above shows the Page Register clock pulse occurring every 52uS. This represents the virtual machine clock of a hardware abstraction layer developed over the last few weeks. This is the foundation of the system going forward and will be providing video, a virtual UART, "sound chip", and CPU for an operating system and user applications.

    Hardware Abstraction Layer

    There are multiple systems on the board with timing critical requirements like the video, audio, and serial ports. A user program cannot take control of the hardware without significant insight into the various timing constraints and requirements of these systems. The solution is to put an abstraction layer between the hardware and user program.

    Even though this has drifted up and down a bit, the final dot clock (until it changes again!) is 33 MHz. This drives a 4-phase clock for the hardware process clock of 8.25MHz. The hardware abstraction layer divides this clock down to a 43-cycle fixed virtual machine cycle running at 191.86kHz. This is further divided down to 9.593kHz by using 20 machine cycles to create a virtual process cycle consisting of either 4 lines of 5 cycles, or 5 lines of 4 cycles.

    Each line in the process cycle ends with a single machine cycle dedicated to timing. This cycle updates the scan register to generate the video sync pulses, updates the V register to select the next line for the GPU to render, samples the serial ports, and decides what additional cycles are needed to handle features (audio and serial communication).

    The remaining cycles are available to execute user code on a virtual CPU. So the 4-line process cycle has 16 machine cycles (153,488 per second) and the 5-line process cycle has 15 machine cycles (143,895 per second) to execute user code. The virtual CPU uses a fetch/execute cycle, where the execute would need at least one and sometimes two machine cycles. The average would be around 2.3 cycles per instruction, which equates to a virtual CPU speed of around 66k instructions per second.
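The cycle budget above can be reproduced from the 8.25MHz process clock. The sketch below uses only numbers quoted in this entry; the function and constant names are illustrative.

```python
PROCESS_CLOCK_HZ = 8_250_000
CLOCKS_PER_MACHINE_CYCLE = 43
MACHINE_CYCLES_PER_PROCESS_CYCLE = 20
AVG_CYCLES_PER_INSTRUCTION = 2.3

machine_hz = PROCESS_CLOCK_HZ / CLOCKS_PER_MACHINE_CYCLE      # ~191.86 kHz
process_hz = machine_hz / MACHINE_CYCLES_PER_PROCESS_CYCLE    # ~9.593 kHz


def user_cycles_per_second(lines: int) -> float:
    """Machine cycles left for user code; each line spends one on timing."""
    user_cycles = MACHINE_CYCLES_PER_PROCESS_CYCLE - lines
    return process_hz * user_cycles


for lines in (4, 5):
    cycles = user_cycles_per_second(lines)
    ips = cycles / AVG_CYCLES_PER_INSTRUCTION
    print(f"{lines}-line: {cycles:,.0f} cycles/s, about {ips:,.0f} instructions/s")
```

The 4-line scheme gives the roughly 66k instructions per second quoted above; the 5-line scheme comes out a little lower because it gives up one more machine cycle to timing.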


    The ALU now contains a video timing function to support four video timing schemes. The first two use the 4-line virtual process cycle with a horizontal frequency of 38.372kHz. The first of these uses 128 process cycles per field to generate VGA at 75Hz (VESA DMT ID: 06h). The second uses 160 process cycles per field to generate SVGA at 60Hz (VESA DMT ID: 09h). The last two timing schemes use the 5-line virtual process cycle with a horizontal...
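The two timing schemes described so far can be checked against the virtual process clock. A minimal sketch, assuming only the figures given in this entry (names are illustrative):

```python
PROCESS_HZ = 8_250_000 / 43 / 20    # virtual process cycle rate, ~9.593 kHz
LINES_PER_PROCESS_CYCLE = 4         # the 4-line scheme

horizontal_hz = PROCESS_HZ * LINES_PER_PROCESS_CYCLE


def field_rate_hz(process_cycles_per_field: int) -> float:
    """Vertical refresh rate for a given number of process cycles per field."""
    return PROCESS_HZ / process_cycles_per_field


print(f"horizontal: {horizontal_hz / 1000:.3f} kHz")
for cycles in (128, 160):
    lines = cycles * LINES_PER_PROCESS_CYCLE
    print(f"{cycles} cycles = {lines} lines per field, {field_rate_hz(cycles):.2f} Hz")
```

Both results land within a fraction of a hertz of the nominal 75Hz and 60Hz refresh rates.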


  • Happy New Year!

    Alastair Hewitt • 01/25/2020 at 02:51 • 0 comments

    As in the Chinese Lunar New Year! It's remarkably cheap to run prototype PCBs since the design works on a 2-layer board. I decided to ship the Rev. 3 board design last week to get it here before things shut down in China.

    There are now four populated boards (2x Rev. 1 boards on top, Rev. 3 and Rev. 2 on the bottom of the picture).

    Rev. 3 included a few minor updates to improve the ground planes and power distribution. A bridge rectifier was added and the filter capacitors were increased to handle AC power input. There was also an update to the horizontal control circuit to allow switching between 2 and 3 micro-second H-sync pulses.

    The good news is the board worked first time. The bad news was the updated power supply generates too much noise when the components heat up. This is not a surprise though. Trying to put the entire power supply circuit on the same board was really pushing it!

    The output filter cap was moved away from the main switching circuit due to space constraints. This adds inductance to the ground return path and increases the switching transients on the buck regulator. This causes sharp 20ns pulses riding on the power lines and some pretty horrifying EMC implications I would imagine. The board starts ok, but then becomes unstable as the thermal drift kicks in.

    The Rev. 2 buck converter is working fine though, so the power circuit will be rolled back for the Rev. 4 board. Further testing seems to indicate a 33MHz dot clock is going to be stable and the hardware abstraction layer is being designed around this (more on that in a later log). The true color video output is greatly improved in terms of supply noise. Not only that, but the video signal gets cleaner as things warm up. There's something strangely satisfying about that in a vacuum tube sort of way.

  • If It Ain't Broke, Don't Fix It

    Alastair Hewitt • 01/05/2020 at 18:19 • 0 comments

    There was one final delta between Rev. 1 and 2: The H-sync was put through the bus control latch to align it with the dot clock. This wasn't really necessary, so was rerouted to be a straight connection on the Rev. 2 board. This frees up a flip-flop for use elsewhere and was used to resample the output enable of the X register. However, this required an additional shift in the clock phase that was not made. The result was bus contention on the lower part of the RAM address.

    It's surprising the board was able to run at all with this problem. The OTP ROM did not work because it contained the text fill code and this was crashing before the video loop could start. The problem was resolved by cutting a pin and using a patch wire to select the correct clock phase.

    The Rev. 2 board is working and was able to run with the 35MHz dot clock. The quality of the video signal is greatly improved with the cleaner supply lines. The assumption was the cleaner supply would also improve the stability at 35MHz, but things are starting to glitch as they warm up. Dropping to 32MHz resolves any remaining stability issues and this is likely to be the final dot clock speed.

    There's not much more to test on the computer side of things, so testing is focusing on the new power supply design. A trip to the local electronic store to pick up more solder led to a chance discovery: they had inexpensive linear power supplies. I don't need a regulated input and a big hunk of iron in the power supply has additional retro appeal. Another option is to add a bridge rectifier to the board to support an AC input. This would only require a simple iron core transformer for the power source.

  • One Step Forward, Two Steps Back

    Alastair Hewitt • 01/04/2020 at 05:17 • 0 comments

    The Rev. 2 PCB came in yesterday and the first sample has been built and tested. The image below shows the Rev. 1 (left) and Rev. 2 (right) boards. The plan is to keep one of the previous revisions of the board on hand for comparison in case the new revision is a step backwards... which unfortunately appears to be the case here...

    The biggest delta between the two revisions was a new power distribution layout including a copper pour on the back for a ground plane. The Rev. 1 board needed a few patch wires to add additional ground return paths. These problems should be eliminated on the new board and so far the power lines do look a lot cleaner.

    The new board booted up first time and flashed the blinkenlight on and off at the correct 1Hz frequency. The V-sync signal was glitchy and prevented the video from syncing. Dropping the dot clock to 32MHz fixed the issue and the board appears to be completely stable at this speed. This was with the slower NOR flash ROM though, so the faster OTP ROM was tested. This didn't work at all... at any frequency, so something is definitely not right.

    The other major change was the new on-board buck converter. This is working well and provides a clean and stable 5v supply with up to 2A from a lower current 500mA 24v input. The PCB has the wrong footprint for the regulator with the pins staggered the other way (there will definitely be a Rev. 3 board!). The pins were re-bent and everything was able to fit within the one-cubic inch of available space. The regulator and inductor run hot, as expected. The design calculations indicated a 50C rise above ambient under load and this appears to be the case.

    The final delta was some updates to the audio circuit. The op-amp was tested under load during the Rev. 1 phase and can easily supply up to 150mA. This is enough to drive a small internal speaker, similar to the old UK home computers from the early 80's (ZX Spectrum, Jupiter Ace, BBC Micro, etc). This is optional and would be supported with a speaker connector and trimmer pot to control the volume (once the correct vertical mounted POT is installed). A fixed resistor can be added in place of the trimmer for a line-level output from the audio jack.

    So there's some debugging ahead to work out why the faster clock is not working and why the OTP ROM doesn't work. It's possible that both issues are related, but figuring out the root cause is likely to burn up the entire weekend.

  • Expansion Board

    Alastair Hewitt • 12/26/2019 at 21:07 • 0 comments

    An expansion method was an important feature of the design and provisions were put in place to allow a parallel data path in and out of the system. The current design is aligned to support the RCA 1802 (COSMAC) with up to eight input and output registers, four external flag inputs (EF), and a single flip-flop output (Q).

    Two 16-pin headers are installed on the main PCB to expose the two data busses and a minimal set of control signals. From here only two 3-to-8 decoders and a 4-bit buffer are needed to complete the expansion interface. These components are not included on the main PCB since the expansion is optional and not needed for normal operation.

    A simple expander card has been designed in order to test this interface and will be manufactured alongside the Rev. 2 main board. The expander is typically 10cm x 10cm in size (the threshold for the lowest price tier for most PCBs) and mounts over the main PCB like a shield.

    The design includes a socket for a single 8-bit expansion register. Two sets of headers with jumpers allow this register to appear as any one of the eight possible input and/or output registers. An additional set of four flip-flops is included in the 7th register position and can be used to scan a 4x4 keypad matrix with the return 4-bits going to the expansion flags (EF).
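Scanning a 4x4 matrix with four drive lines and four return flags usually follows the pattern sketched below. Everything here is illustrative: `write_rows` and `read_flags` are hypothetical stand-ins for writing the four flip-flops and reading the EF inputs, active-high signalling is assumed, and `FakeKeypad` exists only so the sketch can run without hardware.

```python
from typing import Callable, Optional, Tuple


def scan_keypad(write_rows: Callable[[int], None],
                read_flags: Callable[[], int]) -> Optional[Tuple[int, int]]:
    """Return (row, column) of the first pressed key, or None if idle."""
    found = None
    for row in range(4):
        write_rows(1 << row)            # drive exactly one row line
        flags = read_flags() & 0x0F     # four flag inputs, one per column
        for col in range(4):
            if flags & (1 << col):
                found = (row, col)
                break
        if found is not None:
            break
    write_rows(0)                       # release all rows when done
    return found


class FakeKeypad:
    """Software stand-in for the matrix: a set of pressed (row, col) keys."""

    def __init__(self, pressed):
        self.pressed = set(pressed)
        self.rows = 0

    def write_rows(self, value: int) -> None:
        self.rows = value & 0x0F

    def read_flags(self) -> int:
        flags = 0
        for row, col in self.pressed:
            if self.rows & (1 << row):  # key conducts only if its row is driven
                flags |= 1 << col
        return flags


pad = FakeKeypad({(2, 1)})
print(scan_keypad(pad.write_rows, pad.read_flags))
```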

    The keypad is just a temporary measure for testing and will not be needed once the PS/2 keyboard serial code is built and working. That could be up to 6 months away though.

  • Audio

    Alastair Hewitt • 12/19/2019 at 05:00 • 0 comments

    Not much has been said about the audio yet, but it's definitely a feature and currently being tested.

    First a quick overview of the zero page to better understand how the audio system works. A zero page is typically the first page of the memory and only requires a single byte to address one of 256 possible values. In this design the zero page is put in the upper bank of memory along with the display. The display uses all the pages of this memory, but only the first 209 bytes of each. To accommodate this the zero page is oriented to be the last byte of each page. So rather than setting the Y index to 0 and using the X index to address the zero page location, this design sets the X index to 0xFF and uses the Y index to address the location.

    The 0xFF value for the X register is created by adding pull-up resistors to the address bus and leaving the X register in tri-state during the zero page access. A similar approach is used with the GPU where both the H and V registers are left in tri-state during the horizontal blanking period. This selects not only the zero page, but the very last byte at the top memory address of 0x1FFFF. This last byte of the zero page is used to store an 8-bit audio sample.
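The addressing scheme can be summarised as a small address calculation. This is a sketch under assumptions: it treats Y as the page (bits 8-15), X as the byte within the page (bits 0-7), and bit 16 as the upper bank select, which is consistent with the 0x1FFFF top address but is not spelled out in the text. The names are hypothetical.

```python
UPPER_BANK = 1 << 16   # upper 64k of the 128k RAM, shared with the display
ZERO_PAGE_X = 0xFF     # X lines pulled high while the register is tri-stated


def zero_page_address(y: int) -> int:
    """Physical address of zero-page location y (last byte of page y)."""
    if not 0 <= y <= 0xFF:
        raise ValueError("zero-page index must fit in one byte")
    return UPPER_BANK | (y << 8) | ZERO_PAGE_X


# With H and V also tri-stated, every address line is pulled high.
AUDIO_SAMPLE_ADDRESS = zero_page_address(0xFF)
print(hex(zero_page_address(0)), hex(AUDIO_SAMPLE_ADDRESS))
```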

    So during the horizontal blanking period the GPU reads the audio sample and puts it on the lower 8-bits of the ROM address bus. Normally the GPU context selects the font area of the ROM, but in the horizontal blanking period the ALU context is used. Not only that, the upper ROM address is also left in tri-state and pull-up resistors select 0xFF of the ALU. This selects the unary identity function and passes the value of the audio sample through the ROM unaffected to the glyph register.

    The glyph register does double duty: It acts as a pipeline for the glyph line while colors are loaded, but during the blanking period it holds the audio sample read from the zero page. An audio DAC is added in the form of an R2R resistor network to output the analog version of the audio sample during the blanking period. The audio DAC output is only sampled during the blanking period to reject the video signal during the non-blanking period. The sampled signal is then filtered to remove the high-frequency and DC components.

    To test the audio a sine wave was added to the ROM and addressed by the video vertical line address. This results in a sine-wave at the video field rate of 60 Hz and sounds exactly like electrical hum :) One issue identified is with the sample and hold circuit. This currently uses a BS170 MOSFET with a threshold voltage of only 0.8v and this is not completely turning off on the bottom half of the cycle. The image below shows the sine wave transposed up but still experiencing some breakthrough of the video signal during the bottom part of the cycle. Switching to a BS270 may fix this, but further investigation is ongoing.
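A test table of this kind could be generated as shown below. This is a sketch only: the 256-entry length and the unsigned, mid-scale-centred sample format are assumptions, and `sine_table` is a hypothetical helper rather than part of the project's build script.

```python
import math


def sine_table(length: int = 256) -> bytes:
    """One period of a sine wave as unsigned 8-bit samples centred near 128."""
    return bytes(
        round(127.5 + 127.5 * math.sin(2 * math.pi * i / length))
        for i in range(length)
    )


table = sine_table()
print(len(table), min(table), max(table))
```

Stepping through one full period per video field, indexed by the vertical line address, is what puts the tone at the field rate.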

  • Power Supply Design

    Alastair Hewitt • 12/11/2019 at 22:51 • 0 comments

    The remaining instability in the Rev. 1 board is down to power issues. The power distribution was deliberately underspecified to highlight problem areas and test different designs. There are two main areas of concern: inductance and current consumption.

    Inductance is the biggest enemy for maintaining a clean power supply. The decoupling capacitors help, but adding multiple return paths to ground seems to be the most reliable strategy. The inductance of each path is placed in parallel, so two paths will halve the inductance of the single path.

    Inspiration was taken from 70's video game boards (an example shown above). These were designed before the introduction of microprocessors and typically required well over a hundred (non-LS) TTL chips. These are arranged in columns of several chips with one or more decoupling capacitors per column and a dual return power path. This is the approach taken for the power distribution on the Rev. 2 board.

    The other issue is the relatively large current consumption at 1.5A. This is just for the core system and doesn't include additional power to things like a WiFi dongle (2W) or expansion board (300-500mA). This pushes the maximum current consumption closer to 2.5A and poses some major challenges in maintaining the supply voltage between 4.75v and 5.25v.

    The initial plan was to use a 5v power adapter and there are plenty of inexpensive options to meet the current requirements. The problem with these is the resistance between the power supply and the power distribution on the board. The leads from the supply and the resistance of the barrel jack connector come in at over 300 milli-ohms. This would drop the voltage by 0.75v at 2.5A, resulting in only 4.25v getting to the board power rails. Some 5v supplies output 5.25v to compensate, but this still means the supply would only reach 4.5v on the board.

    One idea was to start with a higher voltage like 6v and add some additional resistance to drop the voltage down to 5v. The 6v supplies also tend to compensate and typically output 6.3v, so adding a 0.22 ohm power resistor to the supply line would drop the voltage to 5v (assuming a total resistance of 520 milli-ohms). This assumes a current consumption of 2.5A, but the base system consumption of 1.5A would result in a supply of over 5.5v if the additional components were not used with this approach.
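The voltage figures in the last two paragraphs are plain Ohm's law. The sketch below reproduces them from the quoted resistances and currents; `board_voltage` is a hypothetical helper name.

```python
def board_voltage(supply_v: float, resistance_ohm: float, current_a: float) -> float:
    """Voltage reaching the board after the drop across the supply path."""
    return supply_v - resistance_ohm * current_a


# 5v adapters through about 300 milli-ohms of lead and connector resistance.
print(f"{board_voltage(5.00, 0.30, 2.5):.2f} V")   # nominal 5v supply, full load
print(f"{board_voltage(5.25, 0.30, 2.5):.2f} V")   # supply that compensates to 5.25v

# 6v supply (6.3v actual) with the extra 0.22 ohm resistor, 520 milli-ohms total.
print(f"{board_voltage(6.30, 0.52, 2.5):.2f} V")   # full load
print(f"{board_voltage(6.30, 0.52, 1.5):.2f} V")   # base system only
```

The last case is the problem with a fixed series resistor: the drop scales with current, so a lightly loaded board sees more than the 5.25v upper limit.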

    What's really needed is a regulator on the board. One option is a linear regulator to take a 9v or 7.5v supply and drop it down to 5v. Again the current consumption poses a problem here resulting in up to 2.5W of heat dissipation for each volt dropped. Low-dropout regulators are available, but these would still result in over 4W of dissipation and require a large heatsink. The other option is a buck converter and this is the current plan for the Rev. 2 board.

    There are inexpensive SMD modules that can fit in the available space, but these don't have the best thermal design or reliability. The components are available in through-hole however and the buck converter can be added directly to the board. A small heatsink is required and this can be wrapped around one of the mounting holes to maximize the available space as shown below.

    This design can use a much higher voltage and avoid the large input current and voltage drop getting the supply to the board. The current power supply design would only need 600mA using a 24v supply. This includes 2W of direct power to the WiFi dongle and provides up to 2A at 5v via the buck converter for the main system and optional expansion board (assuming 80% efficiency). Heat dissipation is also a more manageable 2W with this design.
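The 600mA input figure can be checked from the numbers in this paragraph. A minimal sketch with illustrative constant names:

```python
INPUT_V = 24.0
OUTPUT_V = 5.0
OUTPUT_A = 2.0        # main system plus optional expansion board
EFFICIENCY = 0.80     # assumed converter efficiency from the text
DONGLE_W = 2.0        # WiFi dongle powered directly

output_w = OUTPUT_V * OUTPUT_A                    # 10 W delivered at 5v
converter_input_w = output_w / EFFICIENCY         # 12.5 W drawn by the converter
converter_loss_w = converter_input_w - output_w   # lost as heat in the converter
input_current_a = (converter_input_w + DONGLE_W) / INPUT_V

print(f"input current:  {input_current_a * 1000:.0f} mA")
print(f"converter loss: {converter_loss_w:.1f} W")
```

At exactly 80% efficiency the converter loss works out to 2.5W; a loss of 2W corresponds to an efficiency of about 83%.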

  • Parrot!

    Alastair Hewitt • 12/08/2019 at 05:04 • 1 comment

    One significant milestone in any home brew VGA project is to generate a parrot image... so here it is!

    This is in the lo-res video mode (160x120) using the 3:3:2 video DAC. To do some baseline testing the board generating this does not have any decoupling capacitors and it looks pretty terrible. The video is being oversampled by a factor of 5 and a lot of the supply noise shows up in the image (the white area below should be the same color).

    Hopefully this will be greatly improved with the Rev 2 board, which is where most of the time over the last 2 weeks has been spent. Also worth noting is how this image is generated. The image is stored in the ROM, but the Harvard Architecture prevents data from being read from the ROM. The image is actually the product of several ALU functions, each one acting as a lookup table to return parts of the bitmap image.





Marcel van Kervinck wrote 5 days ago

Great name change!


monsonite wrote 11/05/2019 at 15:04

Hi Alastair, I stumbled across your project following on from a message from Marcel. Excellent work and very inspirational. I'm planning a 16-bit design based on a 4-bit bitslice design and video and sound will not be a high priority. I noticed that you mentioned overclocking the ROM. I hope to be using an AT7C1024-45 - have you any estimate of how fast that might clock?


Alastair Hewitt wrote 11/05/2019 at 18:14

Thanks for the follow! I've become less certain about overclocking... I'm routinely seeing the 55ns OTP ROM perform as fast as 12ns. That's actually causing issues because the pull up resistors on the bus are jumping high for 6ns during the CPU/GPU context switch. The ROM is so fast it sees that as a valid address (0xFFFF) and returns a value before then doing the actual look up. That means it's doing twice the work in a time window that was barely long enough to do one. This is slowing things down a bit and I need to solve that problem before I can get an idea about actual performance.

Having said that, this is what I found with the 70ns NOR flash. That was responding within 32ns, so more than twice as fast. But there are certain addresses, or sequences, that take up to 50ns. You have to design around the worst case, so that would be the actual limit. Since then I've seen it slow down a little more and that number is closer to 55ns. I suspect that may have been caused by repeated flashing of the chip. The chip also slows down when it heats up and you can expect another 5ns at 50C. That brings it down to 60ns. That's still better than the 70, but not by much.

So you should do better than 45ns and may see actual speeds in the 10-20ns range. I wouldn't get too carried away though, since the worst case may be closer to 40ns for reliable operation in all conditions.
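The derating arithmetic above can be laid out as a short sketch. The flash figures are the ones quoted in this comment. Applying the same ratio to a 45ns part is only an assumption about how the 40ns estimate might be reached, and `scaled_worst_case` is a hypothetical helper.

```python
# Figures quoted in the comment above for the 70ns NOR flash.
RATED_NS = 70
STAGES_NS = {
    "typical response": 32,
    "slow addresses or sequences": 50,
    "after repeated flashing": 55,
}
HEAT_PENALTY_NS = 5  # extra delay quoted for 50C

worst_case_ns = STAGES_NS["after repeated flashing"] + HEAT_PENALTY_NS
worst_case_ratio = worst_case_ns / RATED_NS

def scaled_worst_case(rated_ns, ratio=worst_case_ratio):
    """Hypothetical helper: apply the flash chip's ratio to another part's rating."""
    return rated_ns * ratio

estimate_45 = scaled_worst_case(45)

for label, ns in STAGES_NS.items():
    print(f"{label}: {ns}ns")
print(f"worst case with heat: {worst_case_ns}ns against a {RATED_NS}ns rating")
print(f"45ns part at the same ratio: {estimate_45:.1f}ns")
```

Scaled this way, a 45ns part comes out a little under 40ns, which is in line with the cautious figure given above. Real parts may not scale linearly.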


Marcel van Kervinck wrote 08/18/2019 at 08:14 point

I wonder if your architecture would be classified as a barrel processor. Any thoughts on that?


Alastair Hewitt wrote 08/18/2019 at 13:24 point

I was a bit generous when using the term "GPU". That part of the circuit is really a DMA controller running in transparent mode.

The Harvard Architecture makes it fairly simple to implement since there are two address/data spaces. I'm able to use both concurrently with some pipelining. The same technique could be used to build a 2-core barrel processor. I assume you would have to replicate the CPU registers though.
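As a rough illustration of the alternating-cycle idea, the sketch below hands each memory cycle to the next context in turn. The 16.5 MHz memory clock is the figure from the project summary. The `interleave` helper and the context labels are hypothetical, and the model ignores the pipelining mentioned above.

```python
from itertools import cycle

MEMORY_CLOCK_MHZ = 16.5  # memory clock from the project summary

def interleave(contexts, cycles):
    """Round-robin: each memory cycle is handed to the next context in turn."""
    order = cycle(contexts)
    return [next(order) for _ in range(cycles)]

contexts = ["CPU", "GPU"]
schedule = interleave(contexts, 8)
per_context_mhz = MEMORY_CLOCK_MHZ / len(contexts)

print(schedule)
print(f"each context runs at {per_context_mhz} MHz")
```

A barrel processor would use the same schedule, but each slot would need its own register set, which is the replication mentioned above.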


Shranav Palakurthi wrote 05/15/2019 at 03:05 point

I want to see a retro computer with 128K RAM run JavaScript. (will it support Javascript?)


Alastair Hewitt wrote 05/15/2019 at 11:48 point

No plans to go anywhere near Javascript! It would probably run out of memory just downloading a single JS file from a typical web page. There are some minimal JS engines like Espruino out there, but even those would use up all ROM and leave no room for anything else.


Scott Devitt wrote 05/07/2019 at 13:12 point

I have one of those black cases and would love to get a few more. Any clue where to get them?


Alastair Hewitt wrote 05/07/2019 at 14:32 point

It's a Polycase ZN-40. You can buy them direct -


Scott Devitt wrote 05/07/2019 at 13:10 point

Kinda off target, but where did you find that black case? I have one and want a few more but have no clue where to find it.


Marcel van Kervinck wrote 04/05/2019 at 16:23 point

When I was contemplating the ALU and other random control logic for what later became known as the Gigatron, for quite a while I considered abusing the 74x48 7-segment decoder to build an instruction set around. But it's a slow chip, and also I couldn't get the instruction set quite right. After that phase I realised I really needed a ROM, but ROMs are very slow and it wouldn't fit in the critical path of a 6-8 MHz design. So that's where the diode-ROM came in, because that's fast. Interestingly, that was exactly 2 years ago today. What ROM speed are you planning to use?


Alastair Hewitt wrote 04/05/2019 at 18:58 point

Hi Marcel, thanks for your interest. The Gigatron is the main inspiration for this project, especially your work on generating VGA with TTL chips.

I read your article on using the diodes a few weeks ago. I was a bit worried discrete diodes wouldn’t switch fast enough, but it looks like this will work. I’m doing most of my instruction decode using discrete logic: This includes 8 chips of gates, 3 decoder chips, and 2 flip flop chips for state machines. There is one area where I decode 8 possible states and I plan to use a "diode ROM" for this.

Both the ROM and RAM are accessed at half the VGA dot clock (12.5875 MHz). I need to switch between three different contexts for the ROM address bus: program, ALU, and font bitmap. I have to determine what state I want next and then latch this so everything changes on a single clock edge. I don’t have time to determine the state after the clock edge because it takes up to 12ns to change the bus tri-state. This leaves me with just 65ns to access the ROM then latch the result before the next context switch.

To deal with this timing issue I have to use memory with 55ns or better access speed. The only ROM with this speed is one-time programmable. I’ll use this when I have code worthy of "shipping", but for now I’ll be doing development using NOR flash. The fastest DIP version is 70ns (e.g. GLS27SF020) so I’ll need to drop my clock speed a little. Worst case is a screen refresh at 50 Hz instead of 60 Hz during development.
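The timing budget described above can be checked with a little arithmetic. This is a sketch under stated assumptions: the standard VGA dot clock of 25.175 MHz, the 12ns tri-state delay from the comment, and a memory clock that scales in proportion to the refresh rate when dropping from 60 Hz to 50 Hz. The helper `access_window_ns` is hypothetical.

```python
VGA_DOT_CLOCK_MHZ = 25.175  # standard VGA dot clock; half of it is 12.5875 MHz
TRISTATE_NS = 12.0          # time quoted for the bus tri-state to change

def access_window_ns(memory_clock_mhz, tristate_ns=TRISTATE_NS):
    """Time left in one memory cycle once the bus has switched context."""
    period_ns = 1000.0 / memory_clock_mhz
    return period_ns - tristate_ns

memory_clock_mhz = VGA_DOT_CLOCK_MHZ / 2
window_60hz = access_window_ns(memory_clock_mhz)

# Assumption: slowing the refresh from 60 Hz to 50 Hz scales the clock by 50/60.
slow_clock_mhz = memory_clock_mhz * 50 / 60
window_50hz = access_window_ns(slow_clock_mhz)

print(f"60 Hz: {memory_clock_mhz:.4f} MHz, window {window_60hz:.1f}ns")
print(f"50 Hz: {slow_clock_mhz:.4f} MHz, window {window_50hz:.1f}ns")
print("70ns flash fits at 60 Hz:", window_60hz >= 70)
print("70ns flash fits at 50 Hz:", window_50hz >= 70)
```

The raw window at 60 Hz comes out a couple of nanoseconds above the 65ns quoted, which presumably leaves room to latch the result. Either way a 70ns part does not fit at 60 Hz but does at the slower refresh.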


Marcel van Kervinck wrote 04/05/2019 at 20:57 point

Ah great. How about the references to a 128K ROM for ALU functions? I also saw a memory map of that, or is that "out" already? Anyway, take your time to reflect and document, if for no other reason than for yourself. I found those "boring documentation cleanup tasks" after a design frenzy helped to improve the end result. [BTW. This is probably a 3-level deep post without Reply button. Threading works best by going back 2 steps and reply from there....]


Alastair Hewitt wrote 04/06/2019 at 01:39 point

(jumping back 2 steps) The same ROM is used for both the program and the ALU. The CPU instructions take more than one cycle. For example: the first cycle reads the instruction from the ROM, the next cycle reads from the RAM, then the ROM is used as an ALU to perform a function, and finally the RAM can be written to. The ALU only handles one nibble at a time, so the last two cycles would be repeated to do a full 8-bit operation.
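A minimal sketch of a ROM acting as a nibble-wide ALU, assuming a table addressed by a carry bit and two 4-bit operands. The address layout, `ADD_TABLE`, `rom_alu_add`, and `add8` are all illustrative choices, not the project's actual ROM format.

```python
# Lookup table standing in for one ALU function held in ROM.
# Assumed address layout: carry-in (bit 8), operand A nibble (bits 7-4),
# operand B nibble (bits 3-0). Each entry is a 5-bit value: carry-out plus sum nibble.
ADD_TABLE = [a + b + c for c in range(2) for a in range(16) for b in range(16)]

def rom_alu_add(a_nibble, b_nibble, carry_in):
    """One ALU cycle: form the address and read the result from the table."""
    return ADD_TABLE[(carry_in << 8) | (a_nibble << 4) | b_nibble]

def add8(a, b):
    """Full 8-bit add as two nibble passes, low nibble first."""
    lo = rom_alu_add(a & 0xF, b & 0xF, 0)
    hi = rom_alu_add(a >> 4, b >> 4, lo >> 4)  # carry out of the low nibble feeds in
    result = ((hi & 0xF) << 4) | (lo & 0xF)
    carry_out = hi >> 4
    return result, carry_out

result, carry = add8(0x3C, 0x5A)
print(f"0x3C + 0x5A = 0x{result:02X}, carry {carry}")
```

Swapping in a different table gives a different function at the same cost per nibble, which is the appeal of doing the ALU as a lookup.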


Marcel van Kervinck wrote 04/06/2019 at 09:47 point

Got it! Good luck with the build! One or two PCBs: both have their tradeoffs. The Gigatron is very sparsely populated with wide spacing. You might fit your design in a similar size, and the PCB costs aren't really that steep.


Alastair Hewitt wrote 05/31/2019 at 23:13 point

I finally ditched the diode ROM. I was able to juggle things around a bit and got it down to just 8 diodes configured as two 4-input AND gates. I decided to just add the additional chip and use a 74F21 instead. It's very fast with a Tp of just over 3 ns.


Geri wrote 03/08/2019 at 16:20 point

Hi, I'm following your projects and I am impressed with your work, especially the SUBLEQ implementation. I suggest you try creating an FPGA-based implementation to run my operating system: 

Running this operating system will put you in the next league, as this is a multitasking, multi-window, SMP-capable operating system, and creating hardware capable of running something like that makes a much bigger impression on followers. The example emulators are attached in the zip file to guide you through the process. Feel free to contact me by e-mail if there is anything you don't understand. 




agp.cooper wrote 03/07/2019 at 01:11 point

Great computer specification! Perhaps you are aiming a little too high for ~30 TTL chips?


Have a look at some of the other TTL designs on Hackaday to get an idea of specifications and chip count. You may be disappointed by what others have achieved.

Have a look at the Apollo181, which has a 65 chip count and uses the 74181 ALU (yuck!), for an example of what can be done in 4 bit.

It's pretty impressive for 65 chips!


If you want something simpler (to get started) have a look at the TD4:

1) Breadboard version:

2) ATMega 328p "ROM" version:

3) And a schematic:

I have built the TD4 and have PCB designs on EasyEDA; you can get them made and posted to you.

Regards AlanX


roelh wrote 03/06/2019 at 08:18 point

Hi Alastair !  I'm looking forward to your schematics and instruction set....  I have similar plans...

