Novasaur Retrocomputer

Retrocomputer with serial and video built from late 70's TTL logic

Can you browse the Web using pre-1980 TTL logic and memory speeds? The goal of this project is to demonstrate how. Internet connectivity is via an era-appropriate RS232 interface. The machine is upward compatible by a decade to support currently available keyboard and video interfaces (PS/2 and VGA). The video includes a native text mode capable of displaying 80-columns and bitmapped color graphics for retro gaming.

Novasaur TTL Retrocomputer

  • Dual Processor CPU/GPU (Harvard Architecture).
  • 33 MHz dot clock, 16.5 MHz data path, 8.25 MHz per processor (~3.5 CPU MIPS)
  • 256k ROM: 96k ALU, 64k native program, 64k cold storage, 32k fonts.
  • 128/512k RAM: 1-7 banks of 64k user, 60k display, 4k system.
  • 76 ALU functions including multiply/divide, system and math functions.
  • Bitmapped Graphics: Hi-res mode up to 416x240 with 8 colors and 4 dithering patterns. Lo-res mode up to 208x160 with 256 colors, double buffered.
  • Text Mode: 8 colors FG/BG, 256 line buffer, up to 104x60 using 8x8 glyphs, 80x36 and 64x48 rows using 8x16 glyphs.
  • Audio: 4 voice wavetable synthesis, ADSR, 8-bit DAC, 8Hz-4.8kHz.
  • PS2 Keyboard: Native interface built in.
  • RS232 Serial Port: Full duplex, RTS/CTS flow control, 9600 baud.
  • Expansion Port: 7 addressable 8-bit register ports, 4 interrupt flags
  • Chip Count: 34 TTL (22 CPU, 12 GPU), 1 ROM, 1 RAM, 1 PAL, 4 analog.
  • Gate Count: 1,425 (935 CPU, 490 GPU)
  • PCB size: 8" x 5" (200 x 125mm) double-sided board.
  • Power: 10W

The Novasaur consists of two processing units (CPU/GPU) operating on the alternating cycles of a 4-phase clock. The 4-phase clock is driven by a 33MHz oscillator to generate a processor clock of 8.25MHz. Each processor accesses one of the two address spaces (ROM/RAM) concurrently on a memory access cycle of 60ns (16.5MHz).

The GPU functions as a DMA controller operating in transparent mode to read the video memory and output to one of two video DACs. The first DAC generates 256 colors using three bits for the red/green, and two bits for the blue. This DAC is used for low res graphics mode where each byte of the video memory represents a single pixel.
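As a rough illustration of the 3-3-2 split described above, here is a minimal sketch that expands one video byte into red, green, and blue levels. The bit order (red in the top three bits, green in the middle three, blue in the bottom two) and the function name `decode_rgb332` are assumptions for the example; the actual wiring of the Novasaur DAC is not stated here.

```python
def decode_rgb332(pixel: int) -> tuple[int, int, int]:
    """Expand one 8-bit pixel into 8-bit-per-channel (r, g, b).

    Assumed layout (not confirmed by the source): RRRGGGBB.
    """
    if not 0 <= pixel <= 0xFF:
        raise ValueError("pixel must fit in one byte")
    r3 = (pixel >> 5) & 0b111   # three bits of red
    g3 = (pixel >> 2) & 0b111   # three bits of green
    b2 = pixel & 0b11           # two bits of blue
    # Scale each field so that its maximum value maps to 255.
    return (r3 * 255 // 7, g3 * 255 // 7, b2 * 255 // 3)


if __name__ == "__main__":
    print(decode_rgb332(0b11100000))  # full red under the assumed layout
```

Every one of the 256 byte values maps to a distinct color, which is where the 256-color figure comes from.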

The GPU also supports a text mode where the bytes of video memory alternate between a color byte and a code point representing a text character. The color byte is used with the second video DAC to represent two 8-color values for foreground and background. The text mode can also support a high res graphics mode with two pixels per byte of video memory.

The CPU instructions use a 4-cycle sequence consisting of: fetch, read, execute, write. The fetch cycle uses a program counter to access the machine code instruction in the ROM. The read cycle provides access to the RAM in the indexed addressing mode. The execute cycle returns to the ROM to access the program memory for immediate addressing, or a set of lookup tables for an ALU operation. The final cycle is the write cycle where a register is updated with the execution result and optionally the RAM in the indexed addressing mode.

Instructions take from one to four process cycles to complete. The instructions are either 8 or 16 bits, so the fetch cycle takes either one or two process cycles. The ALU operations can only handle one nibble per cycle, so two process cycles are required to handle an entire byte. The NOP instruction and conditional loads where the condition is not met take only one cycle (no execute). On average the instructions take 2.35 process cycles to execute for a nominal CPU speed of 3.5MIPS.
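The clock and throughput figures quoted above follow from simple division. The short sketch below reproduces them from the numbers in the text; the variable names are only illustrative.

```python
OSC_HZ = 33_000_000            # oscillator / dot clock
memory_hz = OSC_HZ / 2         # 16.5 MHz memory access rate
processor_hz = OSC_HZ / 4      # 8.25 MHz per processor (4-phase clock)
AVG_CYCLES_PER_INSTR = 2.35    # average quoted in the text

memory_cycle_ns = 1e9 / memory_hz
mips = processor_hz / AVG_CYCLES_PER_INSTR / 1e6

print(f"memory cycle: {memory_cycle_ns:.1f} ns")   # about 60 ns
print(f"nominal speed: {mips:.2f} MIPS")           # about 3.5 MIPS
```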

The base firmware implements a hardware abstraction layer (HAL) to support a video system with up to 46 addressable video modes, a multi-voice sound synthesizer, and a dual-port UART providing a full-duplex RS232 and PS/2 interface. The operating system and user programs are executed via a byte-code interpreter providing binary compatibility with the Intel 8080/5.


Mnemonics and hex codes for all 17,000 usable instructions.

JavaScript Object Notation (JSON) - 418.26 kB - 02/15/2021 at 23:22



Schematic of main board for rev. 9 PCB

application/pdf - 869.55 kB - 02/15/2021 at 23:21



Memory map of RAM and ROM address layout.

Portable Network Graphics (PNG) - 124.94 kB - 10/21/2020 at 03:07



Font ROM rendered as a bitmapped image.

Portable Network Graphics (PNG) - 8.48 kB - 10/21/2020 at 02:02



Big-endian instruction encoding.

Portable Network Graphics (PNG) - 16.72 kB - 12/29/2019 at 07:16



  • Two Years Later

    Alastair Hewitt, 02/22/2021 at 03:45 (0 comments)

    It's been a couple of months since the last update and more like three since anything meaningful changed. There has been (yet) another board revision and Rev. 8 is now good enough to actually solder the chips in place!

    Just like last year, the project is coming out of a design phase and beginning the next stage of development. The past year focused on the firmware (hardware abstraction layer) and this year will focus on the operating system. This primarily involves bringing up CP/M, but there's a bit more to it than that...

    Preemptive Multitasking

    One advantage of the byte-code interpreter is the CPU state is already in RAM. This makes it easy to switch the CPU context and have more than one CPU running on the machine. The banked memory provides up to 8 banks of 64k and each bank can be assigned to a separate CPU instance.

    A counter is incremented at the end of each virtual process block (every 4 lines in SVGA) and the context is switched every 75 blocks. The context is determined by a sequence of 256 that can be set up to prioritize how often each CPU runs. This sequence takes up to 2 seconds to complete, but would typically repeat faster since each CPU can yield before the block count gets to 75.
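    The 2-second figure can be checked with a little arithmetic. This sketch assumes the 9.6kHz process block rate mentioned in other logs for the SVGA timing; the helper `share_of_time` and the short example sequence are illustrative only (the real sequence has 256 entries).

```python
BLOCK_RATE_HZ = 9_600      # process blocks per second (assumed, SVGA timing)
BLOCKS_PER_SLICE = 75      # blocks before a forced context switch
SEQUENCE_LENGTH = 256      # entries in the context sequence

slice_ms = BLOCKS_PER_SLICE / BLOCK_RATE_HZ * 1000
full_sequence_s = SEQUENCE_LENGTH * BLOCKS_PER_SLICE / BLOCK_RATE_HZ


def share_of_time(sequence, context):
    """Fraction of slots in the sequence given to one CPU context."""
    return sequence.count(context) / len(sequence)


print(f"time slice: {slice_ms} ms")              # 7.8125 ms
print(f"full sequence: {full_sequence_s} s")     # 2.0 s if nobody yields early
print(share_of_time([2, 1, 4, 5, 6, 7, 1], 1))   # context 1 gets 2 of 7 slots
```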

    The context switch takes advantage of the 2-cycle identity function to read/write from the zero page to an adjacent memory location in a single instruction. This allows an entire context switch to be completed in under 80us. The context switch is also the only time the memory bank can be changed, which prevents one process from accessing or modifying another's memory.

    This memory segmentation is very important since half the memory banks are used as a disk drive. Without segmentation a crashed user program could write to the memory and damage the file system.

    Shared Memory

    Bank 0 contains the display and state of the hardware abstraction layer. This state is in a protected area above 0xF0 in the memory and also contains the context for each CPU. There is no context for bank 0, so this is used to hold the context sequence to determine the next CPU context.

    0xF0: Context Sequence
    0xFn: Context n (1-7)
    0xF8: Keyboard Scan Code Buffer
    0xF9: Keyboard Character Buffer
    0xFA: Serial Receive Buffer
    0xFB: Serial Transmit Buffer
    0xFC: TBD
    0xFD: TBD
    0xFE: TBD
    0xFF: Zero Page (HAL state)

    Each CPU context is broken down as follows:

    [0x00 ... 0x7F] [0x80 .... 0xE7] [0xE8 .. 0xEB] [0xEC . 0xFE] [0xFF]
    <-record body->|<-message body->|<-msg header->|<-CPU state->| flag

    The first 128 bytes are a fixed buffer used for transferring records. The next two sections can contain a message used for inter-process communication, consisting of a variable body up to 104 bytes in length and a header containing message metadata. The next 19 bytes contain the CPU state. The final byte is a binary semaphore to signal (0) or wait (-1).
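    The field sizes quoted above can be derived directly from the address ranges in the diagram. The sketch below does that; the dictionary keys are illustrative names, not identifiers from the firmware.

```python
# Inclusive (start, end) offsets within one 256-byte CPU context page.
CONTEXT_LAYOUT = {
    "record_body":  (0x00, 0x7F),
    "message_body": (0x80, 0xE7),
    "msg_header":   (0xE8, 0xEB),
    "cpu_state":    (0xEC, 0xFE),
    "flag":         (0xFF, 0xFF),
}


def field_sizes(layout):
    """Return the size in bytes of each field."""
    return {name: end - start + 1 for name, (start, end) in layout.items()}


sizes = field_sizes(CONTEXT_LAYOUT)
print(sizes)                 # 128, 104, 4, 19 and 1 bytes
print(sum(sizes.values()))   # the fields fill the page exactly: 256
```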


    Each CPU can only access its own context. However, the first CPU (bank 1) has an additional privilege to access the context of the other CPUs (2-7). This first CPU runs a kernel to manage and coordinate inter-process communication between the other CPUs (master/slave configuration).

    One bank (2) is configured to run the CP/M operating system and the last four banks (4-7) run a process to manage the memory as a RAM disk (designated as the A: drive). The following diagram shows how CP/M would request a record from the RAM disk using a context sequence of 2:1:4:5:6:7:1.

    The CP/M context would publish a message to request a record and then yield. Yielding involves timing out the context block count and setting the semaphore flag to -1 (wait). The CPU is now halted and blocked in the wait state until a signal (0). The context switch would then happen at the end of the current process block. 

    The next context is the kernel. The kernel operates in an event loop checking the messages from each of the other CPUs (2-7). The kernel sees the message from context 2 (CP/M) and...


  • Internet Connection

    Alastair Hewitt, 12/13/2020 at 19:20 (0 comments)

    Thanks to @Al Williams' recent writeup, a few questions came up about the Internet connection: "does this have ethernet? Or does it use PPP over that serial line". Well basically, all of the above.

    The physical data connection to the board is RS-232-C running at 9600 baud (8-N-1) with RTS/CTS flow control. There are a couple of options from here to get to the Internet. The classical method is via a serial line protocol like SLIP or PPP to a dialup modem. This requires a TCP/IP stack on the machine to handle the rest of the layer-2 and layer-3 network protocol. This would involve porting a stack like uIP and is still some way off in terms of development.

    An easier way to connect is via an IoT Wifi/Ethernet-to-UART module. Shown below is the Novasaur with one of these modules to support an Ethernet network connection (also shown with HDMI).

    These modules are a bit of a cheat though. They not only adapt the physical Wifi/Ethernet interface but also contain a micro-controller to handle the TCP/IP connections. The payload is pulled out of the protocol and then sent over the RS-232 like a simple UART serial connection.

    In fact, the current serial terminal program can already display protocols such as HTTP. The (blurry) image below shows a browser connecting to the Novasaur and asking for a web page. The HTTP protocol is just echoed to the screen, but a client program could interpret this and serve up a web page in response.

    A web server is also some way off. The good news is the 8080 CPU is partially tested and running. There's still a lot more to test and plenty of bugs to chase down over the next few weeks. After that a simple monitor program can be added and the work to bring up CP/M can begin.

  • Serial Terminal

    Alastair Hewitt, 11/25/2020 at 00:04 (0 comments)

    The first step in the serial terminal development was to echo characters typed on the keyboard to the screen. The new receive code is now integrated and echoes text received over the RS232 serial interface to the screen as well.

    The animated GIF below shows text being received over the serial connection at 9,600 baud, or 960 bytes per second. The text is 2.4k bytes and takes about 2.5 seconds to transfer (shown in real time).
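    The throughput numbers follow from the 8-N-1 framing: each byte costs ten bit times (one start, eight data, one stop). A small sketch of that calculation, with illustrative function names:

```python
def bytes_per_second(baud, bits_per_frame=10):
    """Payload rate for a framing of start + 8 data + stop bits."""
    return baud / bits_per_frame


def transfer_seconds(num_bytes, baud):
    """Time to move num_bytes at the given baud rate."""
    return num_bytes / bytes_per_second(baud)


print(bytes_per_second(9600))         # 960.0 bytes per second
print(transfer_seconds(2400, 9600))   # 2.5 seconds for 2.4k bytes
```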

    The connection is made via a USB-to-RS232 null-modem cable containing an FTDI chip. The cable includes a transmit and receive LED that can be seen below as both lit. This full duplex communication is possible by using two threads to handle both transmit and receive concurrently.

    Each byte typed on the keyboard or received over the serial link is echoed back over the serial connection. The terminal program shown below is displaying the same text being transmitted after it is echoed back.

    This was not a serious attempt to build a functional terminal program, but just a convenient way of testing the keyboard and serial interfaces. Next up is the virtual CPU testing, which should be a lot easier with a keyboard and a way to transfer code to/from the machine.

  • Bit Banged

    Alastair Hewitt, 11/22/2020 at 18:01 (0 comments)

    Just completed testing of the new serial receive code and confirmed it can remain synchronized with inputs between 9300 and 9800 baud. It took about two weeks to figure out the new algorithm and code it. The best part was the final solution required no more resources than the overly-simple original. Like the transmit, the receive thread only consumes one virtual machine cycle per bit and only needed one additional (repurposed) unary function.

    The diagram below is a little complex to explain in detail here, but might be of interest in showing some of the analysis behind the algorithm.

    The problem being solved here is the synchronization between the transmitter and receiver. Sure, they both run at "9600 baud", but the reality is the clocks are going to drift. This results in the clock slipping one bit ahead or behind periodically. The sampling point also needs adjustment to keep away from the clock edge and prevent spurious data caused by jitter.

    The new algorithm examines six sample points over two bit periods. The two bits in question are the stop then start bit. This is guaranteed to be a high-to-low transition regardless of the data being received. The position of this transition is monitored and the data bit sample point is adjusted to avoid any clock jitter/slippage. In addition, the timing is also adjusted when the transition gets too close to either edge of the sampling window.

    The state machine has a 10-bit cycle to match the start, the 8 data, and stop bits. If the clock drifts too far then one cycle is either added or removed. If the sample position has moved such that the next data bit sample would align with the start bit then an additional empty skip bit is added. This ignores the start bit and creates an 11-bit cycle to realign the timing of the next 10-bit cycle correctly.

    A similar thing is done for the other direction when an additional double cycle is added. This cycle samples two bits in the one cycle and then jumps ahead by two bits. The result is a 9-bit cycle and a timing adjustment in the other direction.

    These adjustments can compensate for a slip of up to one sample period per byte. The serial ports are sampled on every line, so either 4 or 5 lines per bit, or 40 or 50 lines per byte. This translates to an error of 2.5% (1/40) or 2% (1/50) and provides a window of 9400-9800 baud for the serial connection.
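    The tolerance estimate above can be reproduced as follows. This is only a sketch of the arithmetic, assuming the window is symmetric around the nominal 9600 baud; `baud_window` is an illustrative name.

```python
NOMINAL_BAUD = 9600
BITS_PER_FRAME = 10   # start + 8 data + stop


def baud_window(lines_per_bit):
    """Return (low, high) baud limits for one sample of slip per byte."""
    samples_per_byte = lines_per_bit * BITS_PER_FRAME
    tolerance = 1 / samples_per_byte
    return (NOMINAL_BAUD * (1 - tolerance), NOMINAL_BAUD * (1 + tolerance))


for lines in (4, 5):
    low, high = baud_window(lines)
    print(f"{lines} lines per bit: {low:.0f} to {high:.0f} baud")
```

    With 5 lines per bit the result is roughly 9408 to 9792 baud, which is close to the 9400-9800 window quoted above; with 4 lines per bit the computed window is slightly wider.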

  • TV Typewriter

    Alastair Hewitt, 11/14/2020 at 19:24 (0 comments)

    Testing moved to the serial interfaces last month with the development of a simple terminal program. This will display text typed on the keyboard and echo it over the RS232 interface. The serial interface is full-duplex, so data sent back over the RS232 interface is displayed on the screen.

    The first step was to get to a TV Typewriter. The PS/2 interface clock and data bits are sampled during the horizontal sync period. This then drives a state machine that deserializes the data to recover the scan code. Each scan code is added to a buffer and then decoded via another state machine to track things like shift/control key state. Special combinations of ctrl-alt are mapped to system calls with ctrl-alt-del calling the system restart.
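    For readers unfamiliar with PS/2, the deserializing step can be sketched as below. This uses the standard device-to-host frame (start bit 0, eight data bits LSB first, odd parity, stop bit 1) and is not the Novasaur state machine itself; both function names are hypothetical.

```python
def decode_ps2_frame(bits):
    """Decode one 11-bit device-to-host PS/2 frame into a scan code.

    bits: sequence of 11 ints (0 or 1) in arrival order.
    Raises ValueError on a framing or parity error.
    """
    if len(bits) != 11:
        raise ValueError("a PS/2 frame is 11 bits long")
    start, data, parity, stop = bits[0], bits[1:9], bits[9], bits[10]
    if start != 0 or stop != 1:
        raise ValueError("framing error")
    # Odd parity: data bits plus parity bit contain an odd number of ones.
    if (sum(data) + parity) % 2 != 1:
        raise ValueError("parity error")
    # Data arrives least significant bit first.
    return sum(bit << i for i, bit in enumerate(data))


def encode_ps2_frame(code):
    """Build the 11-bit frame for a scan code (handy for testing)."""
    data = [(code >> i) & 1 for i in range(8)]
    parity = 1 - (sum(data) % 2)
    return [0] + data + [parity, 1]


print(hex(decode_ps2_frame(encode_ps2_frame(0x1C))))  # round-trips to 0x1c
```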

    The keyboard buffer is sampled by the serial terminal code and any new characters are displayed on the screen and echoed over RS232 at ~9600 baud. There are no plans to develop this terminal code beyond a testing tool, so the terminal only handles lower/upper case characters, carriage return/line feed, and backspace.

    The transmit code is working fine, but there was a major design flaw in the receive code. I identified and solved part of the problem with the asynchronous clock recovery but missed the bigger picture with the clock slipping over process cycles. This results in an extra bit arriving in some cycles, or conversely no bits arriving. The Novasaur samples the RS232 data at 9593 baud and will typically miss 7 bits per second if the data is transmitted at exactly 9600 baud. Missing a single bit pushes the stop/start bits out of alignment and the data turns to garbage.

    So it's back to the drawing board. I have a new algorithm that looks promising, but it is significantly more complex. There are a lot of corner cases that need to be addressed and it will likely take the rest of this month to get to working code.

  • Roll-your-own SID Chip

    Alastair Hewitt, 10/07/2020 at 03:30 (0 comments)

    Audio testing is now complete. This includes both hardware updates and the software to generate the sound. Since the sound system is finalized this would be a good point to review all the gory details.


    To keep the hardware minimal, no registers are dedicated to the audio. Instead time is borrowed from the GPU's glyph (G) register during the horizontal blanking period. The GPU address registers (H and V) are left in tristate during blanking and pulled high to generate the address 0x0FFFF. This is the top byte of the zero page and reserved to store the next audio sample as a 7-bit signed number. The blanking period also switches to the ALU instead of the font ROM with a special audio function at 0x3FFXX. This function removes the sign bit to create a DC-biased audio level and reverses the bits, since PCB layout constraints mean the MSB of the audio DAC connects to the LSB of the register.
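    One way to picture the audio function is the sketch below. It is an interpretation, not the contents of the ROM table: it assumes samples range from -64 to 63, that the DC bias is +64 (equivalent to flipping the sign bit of a 7-bit two's complement value), and that the reversal covers all 8 bits of the register. The name `audio_dac_byte` is hypothetical.

```python
def audio_dac_byte(sample: int) -> int:
    """Map a signed 7-bit sample to the byte presented to the DAC.

    Assumptions (not taken from the actual lookup table):
    - samples are -64..63 and the bias is +64, giving 0..127;
    - the bit reversal is over all 8 bits of the register.
    """
    if not -64 <= sample <= 63:
        raise ValueError("sample must be a signed 7-bit value")
    biased = sample + 64                      # unsigned, DC-biased level
    return int(f"{biased:08b}"[::-1], 2)      # swap MSB-first for LSB-first


print(audio_dac_byte(0))    # mid-scale level 64 becomes 0b00000010
```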

    The audio DAC gets the full glyph signal during the active video period and the initial design attempted to use a sample and hold circuit to sample just the audio when blanking. This didn't do a good job of isolating the video signal and led to a lot of noise issues. The circuit was redesigned to the following:

    The new design uses the H-sync signal (blue trace below) to mute the DAC during the active period and then allow the audio signal (yellow trace below) to recover during the blanking. This presents pure PCM pulses to the audio filter stage rather than the typical step function. This isn't a problem since they both contain the same frequency domain information. The power level is a lot lower though, so a 20dB inverting amplifier is needed to bring the level up to the -10dBV line level.

    Prior to the amplifier are two filters: A second-order Sallen-Key low-pass filter followed by a passive high-pass filter. The high-pass cuts frequencies below 16Hz and the low-pass above 4.8kHz. This is the Nyquist corner frequency when generating audio at the standard 9.6kHz virtual process rate. The frequency response is shown below:


    The same method used by the Gigatron was shamelessly copied to generate the audio waveform here: A lookup table is used to map each note to a 16-bit value that is then added to a 16-bit counter register. The addition is done at a fixed sample rate such that the register counts to 65,536 at the frequency of the note being played. The upper 8 bits of this counter are then used to index another lookup table that contains a sample of a waveform. Multiple voices are generated by using additional 16-bit counters for different notes and adding the result of waveform lookups together.
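    The counter-and-table scheme described above is a phase accumulator. Here is a minimal sketch of it, assuming the 9.6kHz sample rate mentioned earlier, a sine wavetable scaled to fit a 7-bit signed sample, and the usual MIDI tuning (note 69 = 440Hz). The function names and table contents are illustrative, not the Novasaur ROM data.

```python
import math

SAMPLE_RATE_HZ = 9_600
TABLE_SIZE = 256
# Assumed waveform: one sine period scaled to the range -63..63.
SINE_TABLE = [round(63 * math.sin(2 * math.pi * i / TABLE_SIZE))
              for i in range(TABLE_SIZE)]


def midi_to_hz(note):
    """Equal-tempered frequency of a MIDI note number."""
    return 440.0 * 2 ** ((note - 69) / 12)


def phase_increment(freq_hz):
    """16-bit value added per sample so the counter wraps freq_hz times/s."""
    return round(freq_hz * 65536 / SAMPLE_RATE_HZ) & 0xFFFF


def render(notes, num_samples):
    """Mix one voice per note by summing wavetable lookups."""
    phases = [0] * len(notes)
    steps = [phase_increment(midi_to_hz(n)) for n in notes]
    out = []
    for _ in range(num_samples):
        total = 0
        for voice, step in enumerate(steps):
            phases[voice] = (phases[voice] + step) & 0xFFFF   # 16-bit counter
            total += SINE_TABLE[phases[voice] >> 8]           # upper 8 bits
        out.append(total)
    return out


print(phase_increment(440.0))     # increment for A4 at 9.6kHz
print(render([69, 72], 8))        # first samples of a two-voice mix
```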

    Two functions are included in the ALU to lookup the note by the MIDI value and return the high and low byte to use for the 16-bit counter register. The table goes from 0 to 127 for use with the non 60Hz VGA video mode, where the full 88-key piano keyboard goes from 21 to 108. For 60Hz VGA the sample frequency is slightly different, so the table is duplicated for this frequency between 128 and 255. In both cases the entire 88-key piano frequency range can be played.


    The Gigatron is able to compute one voice per line during the horizontal sync period. The Novasaur requires up to 48 compute cycles to calculate each voice, which is longer than the entire virtual machine cycle containing the horizontal sync. The audio has to therefore consume additional machine cycles and is treated as an optional feature with the number of voices made configurable.

    The audio is handled by a non-blocking thread scheduled at the end of the first line in the virtual process cycle. At least 2 virtual machine cycles are required if the audio is enabled and this can be extended by an additional cycle per voice up to a total of 4 cycles. The first two cycles provide the first melodic voice and an additional non-melodic voice that would typically generate a random noise signal. Each additional cycle adds...


  • Lo-Fi

    Alastair Hewitt, 09/20/2020 at 22:01 (0 comments)

    After video came the audio testing. There was a known issue with a nasty 60Hz buzz breaking through in the audio channel. The last value in the glyph register shows up in the audio channel when the blanking period starts. This was assumed to be the cause of the buzz and the correct blanking during the front porch should take care of it. It turned out there was more to the issue than this...

    The audio DAC gets the full video signal during the active part of the video line. A sample and hold circuit is used to sample only the audio level during the horizontal blanking. However, there appears to be a significant parasitic capacitance associated with the DAC. This capacitance is charged up during the active video and then takes significantly longer than the front porch time to discharge. The result is an echo of the video signal in the audio channel, resulting in a periodic waveform at the frame rate of 60Hz.

    The solution was to add a muting circuit to the DAC. This is just a transistor that shorts the output of the DAC to ground during the active video.

    This worked so well that the sample and hold circuit was removed. The DAC now feeds the raw PCM pulses directly to the second-order low-pass filter section. The filter will need to be redesigned slightly to help filter out the increased high frequency harmonics and amplify the lower RMS level of the PCM signal. The following shows the current output of the DAC for an 880Hz sine wave:

    Note that the pulses come in groups of 4. The audio value is calculated once per virtual process cycle, but output once per line. The SVGA timing shown here has 4 lines per process cycle. The output wave (shown in blue) is via the existing filter design.

  • Classic VGA

    Alastair Hewitt, 09/14/2020 at 04:19 (0 comments)

    There was a completed version of the video system running a few months ago. Pretty much everything changed during the main software development and that also included the video modes. One change was to add the classic 60Hz VGA mode and allow even the ancient plasma TV rescued last year to understand the video timing.

    The image above shows the old TV displaying 104x60 of random text. This 104 column text is due to the dot clock being 33MHz, or 30% faster than the standard 25MHz VGA clock. This results in an additional 24 characters of text being output per line.

    So how is the classic VGA timing done?

    The original video modes used a process cycle of 4 lines of 5 virtual machine cycles. The virtual machine runs at 192kHz, so the horizontal frequency is 38.4kHz and close enough to support 75Hz VGA and 60Hz SVGA modes.

    The process cycle can be reconfigured to 5 lines of 4 virtual machine cycles. This results in the same block of 20 cycles and the serial compatible process cycle frequency of 9.6kHz. The horizontal frequency is now 48kHz and close enough to support 768-line video modes such as XGA.

    The new 60Hz VGA mode uses a configuration of 3 lines of 6 virtual machine cycles. There are some issues with this though. The process block is now 18 cycles and the process cycle frequency does not support a standard UART frequency, so there is no serial support. The horizontal frequency is also a little high at 32kHz, but this can be fixed by adding a short delay to each line.
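    The three configurations can be compared with a couple of divisions. This sketch uses the nominal 192kHz virtual machine rate from the text, so the 3x6 result is the figure before any per-line delay is added; `timing` is an illustrative name.

```python
VM_RATE_HZ = 192_000   # nominal virtual machine cycle rate


def timing(lines, cycles_per_line):
    """Return (horizontal_hz, process_hz) for one process cycle layout."""
    horizontal_hz = VM_RATE_HZ / cycles_per_line
    process_hz = VM_RATE_HZ / (lines * cycles_per_line)
    return horizontal_hz, process_hz


for lines, cycles in [(4, 5), (5, 4), (3, 6)]:
    h, p = timing(lines, cycles)
    print(f"{lines} lines x {cycles} cycles: "
          f"horizontal {h / 1000:.1f}kHz, process {p / 1000:.3f}kHz")
```

    The first two layouts both give a 9.6kHz process rate, which matches 9600 baud. The 3x6 layout gives about 10.667kHz, which is why it cannot drive the serial port at a standard rate.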

    The 60Hz VGA timing is determined by resetting the horizontal line every 262 cycles. The 6 virtual machine cycles add up to 258 (6 x 43), so an additional delay page is added before the horizontal sync page. The delay burns an additional 4 cycles to result in a horizontal frequency of 31.48855kHz (8.25MHz / 262). This is very close to the exact VGA horizontal frequency of 31.46875kHz.

    The frame is made up of 175 process cycles consisting of 3 lines each. This results in the exact 525 lines of the standard VGA mode and a vertical frequency of 59.98Hz. Again, this is very close to the VGA/NTSC standard of 59.94Hz.
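    The 60Hz VGA numbers above can be verified in a few lines. All inputs come from the text; the variable names are illustrative.

```python
PROCESSOR_HZ = 8_250_000
CYCLES_PER_LINE = 6 * 43 + 4     # six VM cycles plus the 4-cycle delay page
LINES_PER_FRAME = 175 * 3        # 175 process cycles of 3 lines each
VGA_HORIZONTAL_HZ = 31_468.75    # standard VGA line rate quoted in the text

horizontal_hz = PROCESSOR_HZ / CYCLES_PER_LINE
vertical_hz = horizontal_hz / LINES_PER_FRAME
h_error_pct = (horizontal_hz / VGA_HORIZONTAL_HZ - 1) * 100

print(f"horizontal: {horizontal_hz / 1000:.5f}kHz")
print(f"vertical:   {vertical_hz:.2f}Hz over {LINES_PER_FRAME} lines")
print(f"line rate error vs VGA: {h_error_pct:.3f}%")
```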

  • Coding Complete

    Alastair Hewitt, 08/30/2020 at 01:18 (0 comments)

    Complete doesn't mean finished though! This was not like a modern iterative development process with small incremental changes as features were added and enhanced. The entire system had to be coded before all the dependencies were resolved. This took an entire year and there are still several weeks of testing ahead (and some inevitable updates).

    So what has been coded? Essentially the only program that will ever be written to run on this hardware. This is the firmware that forms the hardware abstraction layer that all other programs will use to access the functions of the machine.

    There are two reasons for this approach. The first (and least significant) is the limited implementation of the Harvard Architecture: The system uses only one ROM and one RAM chip and there are two data paths, one for program and the other for data. It makes most sense to put the program in the ROM, so this means a new machine code program cannot be loaded without reprogramming the ROM.

    It is easy to add another RAM chip to the system and configure this as an additional bank of program memory. This solves the issue of not being able to load new machine code at run time, but there is a far more significant issue to consider: The main reason for not allowing a user to add their own machine code is to prevent a user's program from taking control of the CPU execution.

    If the system yields to a user's program then that program needs to be aware and responsible for all the critical real-time activities required to make the hardware work. The hardware provides the bare minimum to support the electrical interfaces for things like audio, video, and serial communications. The software is responsible for all the timing and state for these interfaces. An interrupt mechanism could be employed, but this is impractical with horizontal video timings running as fast as 48kHz.

    The way to keep the system simple is to use a byte code interpreter to execute the user's program. This does have a significant performance impact but there is plenty of room to extend the interpreter with fast native functions for common activities. A big advantage of the interpreter is the ability to provide binary compatibility with an existing processor like the 8080. This makes it easy to port things like CP/M to the platform.

    A lot of the firmware features have been discussed in previous logs during their development. A few things have changed as the final pieces came together, so these will be expanded on in later logs. For now this is a quick summary of the final firmware: The base system consists of 120 pages containing over 5,000 assembly instructions (not including 900 NOP instructions to pad timing). This code operates 10 non-blocking threads to control: horizontal video timing, vertical video timing, PS/2 keyboard scan, realtime clock, serial I/O sampling, RS232 transmit, RS232 receive, wavetable synthesizer, maskable interrupts, and byte-code interpreter.

  • Rev. 6

    Alastair Hewitt, 08/13/2020 at 19:44 (0 comments)

    The hardware abstraction layer development has taken an entire year and is finally nearing completion (final details in an upcoming log). In the meantime there were some minor updates to the hardware, some of which were discussed in the last log. This resulted in the rev. 6 board shown below:

    It was time to do a more detailed thermal analysis to make sure this hungry beast will not overheat. The F-series TTL chips consume about 5mW per gate and the 1,425 gates making up the Novasaur dissipate close to 7.5W. The regulator also dissipates up to 1.5W for a total power consumption of 9W.

    The plan is to have a sealed enclosure, so no ventilation holes. The components are cooled by radiating heat that is absorbed by the case. The case then radiates this heat to the environment until it reaches a thermal equilibrium. At this point the temperature is stable and the best way to measure this is via a thermal imaging camera.

    A budget camera was obtained and some initial measurements made. The picture on the left is the external case temperature after running for about an hour at 21C ambient. The picture on the right is with the cover removed showing the circuit board.

    There is a hot spot reading around 41.5C from the outside of the case and 62.5C from the inside. This area is centered around the B, X, and Y registers. The B register is the pipeline between the data and program address space and clocked at 16.5MHz. Both the X and Y register have pull-up resistors and are cycled at a similar rate. Together these three chips represent the highest heat density on the board.

    There is a second hot spot over the regulator shown in the left picture below (the H shape towards the bottom is the heatsink). The spot above that is the PAL, which also runs quite hot. It's interesting to note that the regulator is slightly cooler at 60C than the hottest chips.

    For comparison the picture on the right shows a hot spot on the 6V power adapter case. This had a temperature of 49C, which was several degrees above the Novasaur case.

    Note: These were just some initial pictures and more accurate and detailed images are planned. The emissivity was the default 0.95, but it is probably more accurate at around 0.90.


Marcel van Kervinck wrote 03/25/2020 at 17:11 point

Great name change!

monsonite wrote 11/05/2019 at 15:04 point

Hi Alastair, I stumbled across your project following on from a message from Marcel. Excellent work and very inspirational. I'm planning a 16-bit design based on a 4-bit bitslice design and video and sound will not be a high priority. I noticed that you mentioned overclocking the ROM. I hope to be using a AT7C1024-45 - have you any estimate of how fast that might clock?

Alastair Hewitt wrote 11/05/2019 at 18:14 point

Thanks for the follow! I've become less certain about overclocking... I'm routinely seeing the 55ns OTP ROM perform as fast as 12ns. That's actually causing issues because the pull up resistors on the bus are jumping high for 6ns during the CPU/GPU context switch. The ROM is so fast it sees that as a valid address (0xFFFF) and returns a value before then doing the actual look up. That means it's doing twice the work in a time window that was barely long enough to do one. This is slowing things down a bit and I need to solve that problem before I can get an idea about actual performance.

Saying that, this is what I found with the 70ns NOR flash. That was responding within 32ns, so more than twice as fast. But, there are certain addresses, or sequences, that take up to 50ns. You have to design around the worst case, so that would be the actual limit. Since then I've seen it slow down a little more and that number is closer to 55ns. I suspect that may have been caused by repeated flashing of the chip. The chip also slows down when it heats up and you can expect another 5ns at 50C. That brings it down to 60ns. That's still better than the 70, but not by much.

So you should do better than 45ns and may see actual speeds in the 10-20ns range. I wouldn't get too carried away though since the worst case may be closer to 40ns for reliable operation in all conditions.

Shrad wrote 12/14/2020 at 19:37 point

Just my two cents... why not use two interleaved ROM to double the rate? would be easier than any other solution and there should be a leftover flip-flop somewhere to clock them each at a turn...

Marcel van Kervinck wrote 08/18/2019 at 08:14 point

I wonder if your architecture would be classified as a barrel processor. Any thoughts on that?

Alastair Hewitt wrote 08/18/2019 at 13:24 point

I was a bit generous when using the term "GPU". That part of the circuit is really a DMA controller running in transparent mode.

The Harvard Architecture makes it fairly simple to implement since there's two address/data spaces. I'm able to use both concurrently with some pipelining. The same technique could be used to build a 2-core barrel processor. I assume you would have to replicate the CPU registers though.

Shranav Palakurthi wrote 05/15/2019 at 03:05 point

I want to see a retro computer with 128K RAM run JavaScript. (will it support Javascript?)

Alastair Hewitt wrote 05/15/2019 at 11:48 point

No plans to go anywhere near Javascript! It would probably run out of memory just downloading a single JS file from a typical web page. There are some minimal JS engines like Espruino out there, but even those would use up all ROM and leave no room for anything else.

Scott Devitt wrote 05/07/2019 at 13:12 point

I have one of those black cases and would love to get a few more. Any clue where from?

Alastair Hewitt wrote 05/07/2019 at 14:32 point

It's a Polycase ZN-40. You can buy them direct -

Scott Devitt wrote 05/07/2019 at 13:10 point

Kinda off target, but where did you find that black case? I have one and want a few more, but have no clue where to find it.

Marcel van Kervinck wrote 04/05/2019 at 16:23 point

When I was contemplating the ALU and other random control logic for what later became known as the Gigatron, for quite a while I considered abusing the 74x48 7-segment decoder to build an instruction set around. But it's a slow chip, and I also couldn't get the instruction set quite right. After that phase I realised I really needed a ROM, but ROMs are very slow and one wouldn't fit in the critical path of a 6-8 MHz design. So that's where the diode ROM came in, because that's fast. Interestingly, that was exactly 2 years ago today. What ROM speed are you planning to use?

Alastair Hewitt wrote 04/05/2019 at 18:58 point

Hi Marcel, thanks for your interest. The Gigatron is the main inspiration for this project, especially your work on generating VGA with TTL chips.

I read your article on using the diodes a few weeks ago. I was a bit worried discrete diodes wouldn’t switch fast enough, but it looks like this will work. I’m doing most of my instruction decode using discrete logic: This includes 8 chips of gates, 3 decoder chips, and 2 flip flop chips for state machines. There is one area where I decode 8 possible states and I plan to use a "diode ROM" for this.

Both the ROM and RAM are accessed at half the VGA dot clock (12.5875 MHz). I need to switch between three different contexts for the ROM address bus: program, ALU, and font bitmap. I have to determine what state I want next and then latch this so everything changes on a single clock edge. I don’t have time to determine the state after the clock edge because it takes up to 12ns to change the bus tri-state. This leaves me with just 65ns to access the ROM then latch the result before the next context switch.

To deal with this timing issue I have to use memory with 55ns or better access speed. The only ROM with this speed is one-time programmable. I’ll use this when I have code worthy of "shipping", but for now I’ll be doing development using NOR flash. The fastest DIP version is 70ns (e.g. GLS27SF020) so I’ll need to drop my clock speed a little. Worst case is a screen refresh at 50 Hz instead of 60 Hz during development.
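
The timing budget above can be checked with a few lines of arithmetic. This is a rough sketch: the 25.175 MHz figure is the standard VGA dot clock implied by the 12.5875 MHz half rate, and the gap between the computed window and the quoted 65ns is assumed to be latch setup and other margin.

```python
VGA_DOT_CLOCK_MHZ = 25.175                  # standard 640x480 VGA dot clock
memory_clock_mhz = VGA_DOT_CLOCK_MHZ / 2    # 12.5875 MHz, as quoted above

cycle_ns = 1000 / memory_clock_mhz          # one memory access cycle
BUS_SWITCH_NS = 12                          # worst-case bus tri-state change
window_ns = cycle_ns - BUS_SWITCH_NS        # time left after the context switch

QUOTED_BUDGET_NS = 65                       # figure given in the comment

def fits(access_ns, budget_ns=QUOTED_BUDGET_NS):
    """True if a memory with this access time meets the budget."""
    return access_ns <= budget_ns

print(f"cycle {cycle_ns:.1f}ns, window {window_ns:.1f}ns")
print("55ns OTP ROM fits:", fits(55))
print("70ns NOR flash fits:", fits(70))
```

The 55ns part fits with about 10ns to spare, while the 70ns flash does not, which is why the clock has to be lowered during development.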

Marcel van Kervinck wrote 04/05/2019 at 20:57 point

Ah great. How about the references to a 128K ROM for ALU functions? I also saw a memory map of that, or is that "out" already? Anyway, take your time to reflect and document, if for no other reason than for yourself. I found those "boring documentation cleanup tasks" after a design frenzy helped to improve the end result. [BTW. This is probably a 3-level deep post without a Reply button. Threading works best by going back 2 steps and replying from there....]

Alastair Hewitt wrote 04/06/2019 at 01:39 point

(jumping back 2 steps) The same ROM is used for both the program and the ALU. The CPU instructions take more than one cycle. For example: the first cycle reads the instruction from the ROM, the next cycle reads from the RAM, then the ROM is used as an ALU to perform a function, and finally the RAM can be written to. The ALU only handles one nibble at a time, so the last two cycles would be repeated to do a full 8-bit operation.
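
To illustrate the idea of a ROM acting as a nibble-wide ALU, here is a minimal sketch of an 8-bit add done as two table lookups. The table layout (carry, A nibble, B nibble packed into the address, with carry-out in bit 4 of the data) is a hypothetical choice for this example and is not the Novasaur's actual ROM map.

```python
def build_add_table():
    """Lookup table indexed by (carry_in << 8) | (a << 4) | b, for 4-bit a and b."""
    table = bytearray(512)
    for carry in range(2):
        for a in range(16):
            for b in range(16):
                # bits 0-3: sum nibble, bit 4: carry out
                table[(carry << 8) | (a << 4) | b] = a + b + carry
    return table

ADD_ROM = build_add_table()

def add8(x, y):
    """8-bit add using two nibble lookups; returns (result, carry_out)."""
    lo = ADD_ROM[((x & 0xF) << 4) | (y & 0xF)]              # low nibbles, no carry in
    carry = lo >> 4
    hi = ADD_ROM[(carry << 8) | ((x >> 4) << 4) | (y >> 4)]  # high nibbles plus carry
    return ((hi & 0xF) << 4) | (lo & 0xF), hi >> 4

print(add8(0x3C, 0x5A))   # low and high nibble handled in separate lookups
```

Each lookup corresponds to one "ROM as ALU" cycle in the description, so a full byte needs the cycle pair to run twice.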

Marcel van Kervinck wrote 04/06/2019 at 09:47 point

Got it! Good luck with the build! One or two PCBs, both have their tradeoffs. The Gigatron is very sparsely populated with wide spacing. You might fit your design in a similar size, and the PCB costs aren't really that steep.

Alastair Hewitt wrote 05/31/2019 at 23:13 point

I finally ditched the diode ROM. I was able to juggle things around a bit and got it down to just 8 diodes configured as two 4-input AND gates. I decided to just add the additional chip and use a 74F21 instead. It's very fast with a Tp of just over 3 ns.

Geri wrote 03/08/2019 at 16:20 point

Hi, I'm following your projects and I'm impressed with your work, especially the SUBLEQ implementation. I suggest you try creating an FPGA-based implementation to run my operating system: 

Running this operating system will put you in the next league, as it is a multitasking, multi-window, SMP-capable operating system, and creating hardware that's capable of running something like that makes an impression on followers that is an order of magnitude bigger. The example emulators are attached in the zip file to guide you through the process. Feel free to contact me by e-mail if there is something you don't understand. 



agp.cooper wrote 03/07/2019 at 01:11 point

Great computer specification! Perhaps you are aiming a little too high for ~30 TTL chips?


Have a look at some of the other TTL designs on Hackaday to get an idea of specifications and chip count. You may be disappointed by what others have achieved.

Have a look at the Apollo181, which has a 65 chip count and uses the 74181 ALU (yuck!), for an example of what can be done in 4 bit.

It's pretty impressive for 65 chips!


If you want something simpler (to get started) have a look at the TD4:

1) Breadboard version:

2) ATMega 328p "ROM" version:

3) And a schematic:

I have built the TD4 and have PCB designs on EasyEDA; you can get them made and posted to you.

Regards AlanX

roelh wrote 03/06/2019 at 08:18 point

Hi Alastair !  I'm looking forward to your schematics and instruction set....  I have similar plans...
