Novasaur CP/M TTL Retrocomputer

Retrocomputer built from TTL logic running CP/M with no CPU or ALU

The Novasaur is a full-featured personal computer built from fewer than three dozen Advanced Schottky TTL chips (circa 1979). It supports an 80-column VGA text display, PS/2 keyboard interface, programmable sound generator, RS232 serial, and an Intel 8080 byte-code interpreter. The machine is capable of running early '80s computer games and even CP/M using a built-in 250k RAM disk.

Novasaur TTL Retrocomputer

  • Dual Processor CPU/GPU (Harvard Architecture).
  • 33 MHz dot clock, 16.5 MHz data path, 8.25 MHz per processor (~3.5 CPU MIPs)
  • 256k ROM: 96k ALU, 64k native program, 64k cold storage, 32k fonts.
  • 128/512k RAM: 1-7 banks of 64k user, 60k display, 4k system.
  • 76 ALU functions including multiply/divide, system and math functions.
  • Bitmapped Graphics: Hi-res mode up to 416x240 with 8 colors and 4 dithering patterns. Lo-res mode up to 208x160 with 256 colors, double buffered.
  • Text Mode: 8 colors FG/BG, 256 line buffer, up to 104x60 using 8x8 glyphs, 80x36 and 64x48 rows using 8x16 glyphs.
  • Audio: 4 voice wavetable synthesis, ADSR, 8-bit DAC, 8Hz-4.8kHz.
  • PS2 Keyboard: Native interface built in.
  • RS232 Serial Port: Full duplex, RTS/CTS flow control, 9600 baud.
  • Expansion Port: 7 addressable 8-bit register ports, 4 interrupt flags
  • Chip Count: 34 TTL (22 CPU, 12 GPU), 1 ROM, 1 RAM, 1 PAL, 4 analog.
  • Gate Count: 1,425 (935 CPU, 490 GPU)
  • PCB size: 8" x 5" (200 x 125mm) double-sided board.
  • Power: 10W

The Novasaur consists of two processing units (CPU/GPU) operating on the alternating cycles of a 4-phase clock. The 4-phase clock is driven by a 33MHz oscillator to generate a processor clock of 8.25MHz. Each processor accesses one of the two address spaces (ROM/RAM) concurrently on a memory access cycle of 60ns (16.5MHz).

The GPU functions as a DMA controller operating in transparent mode to read the video memory and output to one of two video DACs. The first DAC generates 256 colors using three bits for the red/green, and two bits for the blue. This DAC is used for low res graphics mode where each byte of the video memory represents a single pixel.
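The 3-3-2 color split described above can be illustrated with a minimal sketch. The RRRGGGBB bit order and the name `unpack_rgb332` are assumptions for illustration only; the source does not say how the bits are arranged within the byte.

```python
def unpack_rgb332(pixel):
    """Split one video byte into (red, green, blue) DAC levels.

    Assumes an RRRGGGBB bit order, which the source does not specify.
    """
    if not 0 <= pixel <= 0xFF:
        raise ValueError("pixel must be a single byte")
    red = (pixel >> 5) & 0b111    # 3 bits -> 8 levels
    green = (pixel >> 2) & 0b111  # 3 bits -> 8 levels
    blue = pixel & 0b11           # 2 bits -> 4 levels
    return red, green, blue


# 8 red levels * 8 green levels * 4 blue levels = 256 distinct colors
palette = {unpack_rgb332(p) for p in range(256)}
```

Whatever the real bit order is, the count of colors is the same: every byte value maps to a different color.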

The GPU also supports a text mode where the bytes of video memory alternate between a color byte and a code point representing a text character. The color byte is used with the second video DAC to represent two 8-color values for foreground and background. The text mode can also support a high res graphics mode with two pixels per byte of video memory.

The CPU instructions use a 4-cycle sequence consisting of: fetch, read, execute, write. The fetch cycle uses a program counter to access the machine code instruction in the ROM. The read cycle provides access to the RAM in the indexed addressing mode. The execute cycle returns to the ROM to access the program memory for immediate addressing, or a set of lookup tables for an ALU operation. The final cycle is the write cycle where a register is updated with the execution result and optionally the RAM in the indexed addressing mode.

Instructions take from one to four process cycles to complete. The instructions are either 8 or 16 bits, so the fetch cycle takes either one or two process cycles. The ALU operations can only handle one nibble per cycle, so two process cycles are required to handle an entire byte. The NOP instruction and conditional loads where the condition is not met take only one cycle (no execute). On average the instructions take 2.35 process cycles to execute, for a nominal CPU speed of 3.5 MIPS.
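The clock and throughput figures quoted so far follow from simple arithmetic. The sketch below only restates numbers from the text; the variable names are illustrative.

```python
OSC_HZ = 33_000_000          # dot clock from the spec list
PHASES = 4                   # 4-phase clock shared by CPU and GPU
AVG_CYCLES_PER_INSTR = 2.35  # average quoted in the text

memory_hz = OSC_HZ / 2            # 16.5 MHz memory access rate
processor_hz = OSC_HZ / PHASES    # 8.25 MHz per processor
access_ns = 1e9 / memory_hz       # roughly 60 ns per memory access
mips = processor_hz / AVG_CYCLES_PER_INSTR / 1e6

print(f"{processor_hz / 1e6:.2f} MHz, {access_ns:.1f} ns, {mips:.2f} MIPS")
```

The result comes out at about 3.51 MIPS, which matches the nominal 3.5 MIPS figure.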

The base firmware implements a hardware abstraction layer (HAL) to support a video system with up to 46 addressable video modes, a multi-voice sound synthesizer, and a dual-port UART providing a full-duplex RS232 and PS/2 interface. The operating system and user programs are executed via a byte-code interpreter providing binary compatibility with the Intel 8080/5.

inst.json

Mnemonics and hex codes for all 17,000 usable instructions.

JavaScript Object Notation (JSON) - 418.26 kB - 02/15/2021 at 23:22

schematic.v1.9.pdf

Schematic of main board for rev. 9 PCB

application/pdf - 873.54 kB - 02/15/2021 at 23:21

memory-map.v1.7.png

Memory map of RAM and ROM address layout.

Portable Network Graphics (PNG) - 124.94 kB - 10/21/2020 at 03:07

font_rom_v1.3.png

Font ROM rendered as a bitmapped image.

Portable Network Graphics (PNG) - 8.48 kB - 10/21/2020 at 02:02

instruction_encoding.v1.1.png

Big-endian instruction encoding.

Portable Network Graphics (PNG) - 16.72 kB - 12/29/2019 at 07:16

  • Rev. 9

    Alastair Hewitt, 2 days ago, 0 comments

    Rev. 8 was supposed to be the final "pre-release" board, but a few minor updates turned into a fairly major refactor of the power distribution. The result has been impressive...

    Note: these photos greatly exaggerate the background power supply ground noise. This isn't exactly what it looks like under normal viewing, but it is very noticeable on this monitor when viewed from above (the photos also have a filter to increase the contrast).

    The first photo is from the Rev. 8 board and the periodic noise from the ground plane is very noticeable (even this was an improvement from the Rev. 7 and earlier boards). The second photo is the Rev. 9 and the periodic noise is almost gone.

    There is still a high frequency component modulated with a period matching the horizontal frequency. It rises to a peak in the center of the screen and looks like a CRT phosphor burn... an unexpected but very cool retro effect!

    There were three component changes: the volume control and reset button have been added back. These aren't really necessary, but it's more fun to have extra knobs and buttons to play with. The other change was ditching the super caps and switching to a battery. The super caps could back up the memory for almost a month, but a CR2032 can last up to 5 years.

  • Quad Core Demo

    Alastair Hewitt, 05/30/2021 at 02:59, 0 comments

    The ability to do preemptive multitasking was discussed in a previous log. The code was checked in almost 6 months ago, but it's taken until this week to finally debug and test. The following demo image shows the kernel executing the memory dump command and three other CPU instances each updating the colors in a single column on the screen.

    It took the development of several additional features to set up the context switching and initialize the various CPU instances (note, each CPU has its own RAM bank and there's no way for one CPU to access another's RAM bank).

    A Boot Loader is used to initiate the CPU instances and each CPU context will copy a different section of code to initiate that CPU and memory bank. CPU 1 is the kernel and the only context that can issue a BOOT command. On start up the kernel issues this command to each of the other CPUs and then updates the context switching table to set the sequence and priority for the other CPUs. Each CPU will then boot as the context is switched to that CPU instance. The boot loader then copies the code related to that CPU and starts execution.

    The example above gave each CPU an equal weighting. This results in about 15 KIPs to each CPU and is the reason why the memory dump is running fairly slow.

  • RTC and KIPs

    Alastair Hewitt, 05/18/2021 at 04:52, 0 comments

    One feature of the Hardware Abstraction Layer that hasn't been discussed yet is the Real-Time Clock. This isn't some super low-power CMOS chip keeping track of time using a button cell, but an extension of the video timing to keep track of seconds, minutes, hours, and days. It runs as part of the block sync thread and needs all 10 watts to keep track of time!

    The frame rate is either 60 or 75 Hz and this is divided by either 4 or 5 to generate a 15 Hz reference. This is used to trigger the PS/2 keyboard scan and increment the counter TIME0. This counter starts at -90 and counts up to zero, overflowing every 6 seconds. This overflow increments the TIME1 counter, which in turn counts up from -120 to zero and overflows every 12 minutes. TIME2 is then incremented and also counts from -120 to zero to overflow every 24 hours. The final TIME3 counter is then used to track the number of days.

    This may seem like an odd design, but it's based on efficient 7-bit arithmetic to keep the code compact in terms of both space and time. There are custom instructions to read these registers and return the time in the more conventional second, minute, and hour format. There is also a provision in this design to adjust TIME0 by one count every 16 counts of TIME1. This adjustment corrects the RTC to within several PPM, or losing less than 5 seconds per week.
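The counter cascade can be modeled in a few lines. This is a sketch of the arithmetic only, not the firmware: the class name `Rtc` and its methods are hypothetical, and the 1-in-16 TIME0 correction is left out.

```python
class Rtc:
    """Cascade of negative counters that each count up to zero."""

    RELOAD = (-90, -120, -120)  # TIME0, TIME1, TIME2 start values

    def __init__(self):
        self.time = list(self.RELOAD)
        self.days = 0  # TIME3

    def tick(self):
        """Advance by one 15 Hz reference tick."""
        for i, reload in enumerate(self.RELOAD):
            self.time[i] += 1
            if self.time[i] < 0:
                return             # no overflow, nothing to carry
            self.time[i] = reload  # overflow: reload and carry upward
        self.days += 1

    def hms(self):
        """Convert the counters to (hours, minutes, seconds)."""
        ticks = self.time[0] + 90     # 0..89 ticks of 1/15 s
        sixes = self.time[1] + 120    # 0..119 units of 6 s
        twelves = self.time[2] + 120  # 0..119 units of 12 min
        seconds = twelves * 720 + sixes * 6 + ticks // 15
        return seconds // 3600, seconds // 60 % 60, seconds % 60
```

Each counter fits in 7 bits of magnitude (90 and 120 are both below 128), which is consistent with the 7-bit arithmetic mentioned above.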

    One of the first uses of the RTC is the K command in the system monitor and is used to measure the speed of the byte-code interpreter. The image above shows the command running and returning a value every 6 seconds (after the first incomplete run). The values shown are the BCD counts for a 60-instruction loop of 8080 machine code. Inserting a decimal point in the middle of this 4-digit number represents the interpreter speed in kilo-instructions per second (KIPs).

    The monitor starts up with serial support turned on, so the Rx and Tx threads are running and the speed comes in around 56.5 KIPs. The T command toggles the serial mode off and this increases the speed to the maximum 58.25 KIPs, or around 1/5th of the original 2 MHz 8080 rated 290 KIPs. The final example shows everything turned on: The serial mode is toggled back on and the audio thread is started with all three melodic voices enabled. This drops the speed to 39.6 KIPs, or between 1/7th and 1/8th the speed of the original 8080.
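The decimal-point trick works because a 60-instruction loop timed over 6 seconds gives 10 instructions per second per loop. A minimal sketch of the conversion follows; the helper names are illustrative and packed BCD is an assumption about how the count is stored.

```python
LOOP_INSTRUCTIONS = 60    # instructions per benchmark loop (from the text)
WINDOW_SECONDS = 6        # one TIME0 overflow period
ORIGINAL_8080_KIPS = 290  # figure quoted in the text for a 2 MHz 8080


def bcd_to_int(bcd):
    """Decode a 4-digit packed BCD value such as 0x5825 to 5825."""
    return sum(((bcd >> (4 * i)) & 0xF) * 10 ** i for i in range(4))


def kips_from_count(loops):
    """Convert a loop count per 6-second window to kilo-instructions/s."""
    return loops * LOOP_INSTRUCTIONS / WINDOW_SECONDS / 1000


fastest = kips_from_count(bcd_to_int(0x5825))  # 58.25 KIPs
slowdown = ORIGINAL_8080_KIPS / fastest        # just under 5x
```

A displayed count of 5825 therefore reads directly as 58.25 KIPs.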

  • System Monitor

    Alastair Hewitt, 04/30/2021 at 03:39, 0 comments

    I got a suitably dog-eared copy of 8080/Z80 Assembly Language Techniques for Improved Programming that covers the development of a system monitor in chapter 6.

    The code is also available here, but the book breaks it down into stages so you can build up and debug the functionality step by step. This is invaluable since my 8080 byte-code interpreter is riddled with bugs!

    There was some additional work needed before even getting through the first exercise in attaching the console. I needed a way to interface the virtual UART to the 8080 and the most elegant way of doing this was via the input/output ports. The first 8 were assigned to the expansion board, but the rest have now been assigned as follows:

    Port#    Input               Output
    0-7      Expansion In        Expansion Out
    8        Serial Rx           Serial Tx
    9        Console (KBD)       Console (CRT)
    10       KBD Scan Codes      Set Audio Mode
    11       Cursor Character    Disable Rx
    12-63    Zero Page Read      Zero Page Write

    The system's zero page is not addressable by the 8080, so 52 ports (12-63) are mapped to this memory space. The console provides a decoded keyboard input and a simple text terminal output to make interfacing easy for the system monitor.
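The port assignments can be expressed as a small lookup, shown here as a sketch. `PORT_MAP` and `describe_port` are hypothetical names; only the port ranges and labels come from the table.

```python
PORT_MAP = [
    (range(0, 8), "Expansion In", "Expansion Out"),
    (range(8, 9), "Serial Rx", "Serial Tx"),
    (range(9, 10), "Console (KBD)", "Console (CRT)"),
    (range(10, 11), "KBD Scan Codes", "Set Audio Mode"),
    (range(11, 12), "Cursor Character", "Disable Rx"),
    (range(12, 64), "Zero Page Read", "Zero Page Write"),
]


def describe_port(port, direction):
    """Return the function of an 8080 I/O port; direction is 'in' or 'out'."""
    if direction not in ("in", "out"):
        raise ValueError("direction must be 'in' or 'out'")
    for ports, on_input, on_output in PORT_MAP:
        if port in ports:
            return on_input if direction == "in" else on_output
    raise ValueError(f"port {port} is not assigned in the table")
```

For example, `describe_port(8, "out")` returns `"Serial Tx"`, and the last range covers the 52 zero page ports.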

    The second exercise in the monitor development was the memory dump command. This is now working after debugging the associated 8080 instructions and arithmetic functions. The following animated GIF demonstrates dumping memory locations 0-300 in real time.

  • Two Years Later

    Alastair Hewitt, 02/22/2021 at 03:45, 0 comments

    It's been a couple of months since the last update and more like three since anything meaningful changed. There has been (yet) another board revision and Rev. 8 is now good enough to actually solder the chips in place!

    Just like last year, the project is coming out of a design phase and beginning the next stage of development. The past year focused on the firmware (hardware abstraction layer) and this year will focus on the operating system. This primarily involves bringing up CP/M, but there's a bit more to it than that...

    Preemptive Multitasking

    One advantage of the byte-code interpreter is the CPU state is already in RAM. This makes it easy to switch the CPU context and have more than one CPU running on the machine. The banked memory provides up to 8 banks of 64k and each bank can be assigned to a separate CPU instance.

    A counter is incremented at the end of each virtual process block (every 4 lines in SVGA) and the context is switched every 75 blocks. The context is determined by a sequence of 256 that can be set up to prioritize how often each CPU runs. This sequence takes up to 2 seconds to complete, but would typically repeat faster since each CPU can yield before the block count gets to 75.
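The 2-second figure can be checked with a short calculation. The 9.6 kHz block rate is an assumption taken from the "standard 9.6kHz virtual process rate" mentioned in the audio log, and the example sequence is made up for illustration.

```python
from collections import Counter

BLOCK_RATE_HZ = 9600   # assumed: the "standard 9.6kHz" process rate
BLOCKS_PER_SLICE = 75  # context switch interval from the text
SEQUENCE_LENGTH = 256  # entries in the context sequence

slice_seconds = BLOCKS_PER_SLICE / BLOCK_RATE_HZ         # 7.8125 ms
full_sequence_seconds = SEQUENCE_LENGTH * slice_seconds  # 2.0 s


def cpu_shares(sequence):
    """Fraction of time slices each CPU gets in a context sequence."""
    counts = Counter(sequence)
    return {cpu: n / len(sequence) for cpu, n in counts.items()}


# Hypothetical sequence: the kernel (CPU 1) runs twice as often as CPU 2 or 3.
shares = cpu_shares([1, 2, 1, 3] * 64)
```

Listing a CPU more often in the sequence is what raises its priority; here CPU 1 gets half of all slices.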

    The context switch takes advantage of the 2-cycle identity function to read/write from the zero page to an adjacent memory location in a single instruction. This allows an entire context switch to be completed in under 80us. The context switch is also the only time the memory bank can be changed and will prevent another process from accessing or modifying another's memory.

    This memory segmentation is very important since half the memory banks are used as a disk drive. Without segmentation a crashed user program could write to the memory and damage the file system.

    Shared Memory

    Bank 0 contains the display and state of the hardware abstraction layer. This state is in a protected area above 0xF0 in the memory and also contains the context for each CPU. There is no context for bank 0, so this is used to hold the context sequence to determine the next CPU context.

    0xF0: Context Sequence
    0xFn: Context n (1-7)
    0xF8: Keyboard Scan Code Buffer
    0xF9: Keyboard Character Buffer
    0xFA: Serial Receive Buffer
    0xFB: Serial Transmit Buffer
    0xFC: TBD
    0xFD: TBD
    0xFE: TBD
    0xFF: Zero Page (HAL state)

    Each CPU context is broken down as follows:

    [0x00 ... 0x7F] [0x80 .... 0xE7] [0xE8 .. 0xEB] [0xEC . 0xFE] [0xFF]
    <-record body->|<-message body->|<-msg header->|<-CPU state->| flag

    The top 128 bytes are a fixed buffer used for transferring records. The next two sections can contain a message used for inter-process communication, consisting of a variable body up to 104 bytes in length and a header containing message metadata. The next 19 bytes contain the CPU state. The final byte is a binary semaphore to signal (0) or wait (-1).
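The region sizes quoted above can be checked against the offsets in the layout diagram. This sketch only encodes those offsets; the dictionary keys and the constants `SIGNAL` and `WAIT` are illustrative names.

```python
CONTEXT_LAYOUT = {
    "record_body": (0x00, 0x7F),
    "message_body": (0x80, 0xE7),
    "message_header": (0xE8, 0xEB),
    "cpu_state": (0xEC, 0xFE),
    "flag": (0xFF, 0xFF),
}

SIGNAL, WAIT = 0, -1  # semaphore values from the text


def region_sizes(layout):
    """Size in bytes of each region, from inclusive (first, last) offsets."""
    return {name: last - first + 1 for name, (first, last) in layout.items()}


sizes = region_sizes(CONTEXT_LAYOUT)
```

The sizes come out as 128, 104, 4, 19, and 1 bytes, filling one 256-byte page per context.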

    Kernel

    Each CPU can only access its own context. However, the first CPU (bank 1) has an additional privilege to access the context of the other CPUs (2-7). This first CPU runs a kernel to manage and coordinate inter-process communication between the other CPUs (master/slave configuration).

    One bank (2) is configured to run the CP/M operating system and the last four banks (4-7) run a process to manage the memory as a RAM disk (designated as the A: drive). The following diagram shows how CP/M would request a record from the RAM disk using a context sequence of 2:1:4:5:6:7:1.

    The CP/M context would publish a message to request a record and then yield. Yielding involves timing out the context block count and setting the semaphore flag to -1 (wait). The CPU is now halted and blocked in the wait state until a signal (0). The context switch would then happen at the end of the current process block. 

    The next context is the kernel. The kernel operates in an event loop checking the messages from each of the other CPUs (2-7). The kernel sees the message from context 2 (CP/M) and...

  • Internet Connection

    Alastair Hewitt, 12/13/2020 at 19:20, 0 comments

    Thanks to @Al Williams' recent writeup, a few questions came up about the Internet connection: "does this have ethernet? Or does it use PPP over that serial line". Well basically, all of the above.

    The physical data connection to the board is RS-232-C running at 9600 baud (8-N-1) with RTS/CTS flow control. There's a couple of options from here to get to the Internet. The classical method is via a serial line protocol like SLIP or PPP to a dialup modem. This requires a TCP/IP stack on the machine to handle the rest of the layer-2 and layer-3 network protocol. This would involve porting a stack like uIP and is still some way off in terms of development.

    An easier way to connect is via an IoT Wifi/Ethernet-to-UART module. Shown below is the Novasaur with one of these modules to support an Ethernet network connection (also shown with HDMI).

    These modules are a bit of a cheat though. They not only adapt the physical Wifi/Ethernet interface but also contain a micro-controller to handle the TCP/IP connections. The payload is pulled out of the protocol and then sent over the RS-232 like a simple UART serial connection.

    In fact, the current serial terminal program can already display protocols such as HTTP. The (blurry) image below shows a browser connecting to the Novasaur and asking for a web page. The HTTP protocol is just echoed to the screen, but a client program could interpret this and serve up a web page in response.

    A web server is also some way off. The good news is the 8080 CPU is partially tested and running. There's still a lot more to test and plenty of bugs to chase down over the next few weeks. After that a simple monitor program can be added and the work to bring up CP/M can begin.

  • Serial Terminal

    Alastair Hewitt, 11/25/2020 at 00:04, 0 comments

    The first step in the serial terminal development was to echo characters typed on the keyboard to the screen. The new receive code is now integrated and echoes text received over the RS232 serial interface to the screen as well.

    The animated GIF below shows text being received over the serial connection at 9,600 baud, or 960 bytes per second. The text is 2.4k bytes and takes about 2.5 seconds to transfer (shown in real time).
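The 960 bytes per second figure follows from the 8-N-1 framing, where each byte costs 10 bits on the wire (one start bit, eight data bits, one stop bit). A minimal sketch, with `transfer_seconds` as an illustrative helper name:

```python
def transfer_seconds(num_bytes, baud=9600, data_bits=8, parity_bits=0, stop_bits=1):
    """Time to send num_bytes over an asynchronous serial link."""
    bits_per_frame = 1 + data_bits + parity_bits + stop_bits  # 1 start bit
    bytes_per_second = baud / bits_per_frame
    return num_bytes / bytes_per_second


seconds = transfer_seconds(2400)  # the 2.4k byte text from the demo
```

With the defaults this gives 2.5 seconds, matching the transfer shown in the GIF.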

    The connection is made via a USB-to-RS232 null-modem cable containing an FTDI chip. The cable includes a transmit and receive LED that can be seen below as both lit. This full duplex communication is possible by using two threads to handle both transmit and receive concurrently.

    Each byte typed on the keyboard or received over the serial link is echoed back over the serial connection. The terminal program shown below is displaying the same text being transmitted after it is echoed back.

    This was not a serious attempt to build a functional terminal program, but just a convenient way of testing the keyboard and serial interfaces. Next up is the virtual CPU testing, which should be a lot easier with a keyboard and a way to transfer code to/from the machine.

  • Bit Banged

    Alastair Hewitt, 11/22/2020 at 18:01, 0 comments

    Just completed testing of the new serial receive code and confirmed it can remain synchronized with inputs from 9300 to 9800 baud. It took about two weeks to figure out the new algorithm and code it. The best part was the final solution required no more resources than the overly-simple original. Like the transmit, the receive thread only consumes one virtual machine cycle per bit and only needed one additional (repurposed) unary function.

    The diagram below is a little complex to explain in detail here, but might be of interest in showing some of the analysis behind the algorithm.

    The problem being solved here is the synchronization between the transmitter and receiver. Sure, they both run at "9600 baud", but the reality is the clocks are going to drift. This results in the clock slipping one bit ahead or behind periodically. The sampling point also needs adjustment to keep away from the clock edge and prevent spurious data caused by jitter.

    The new algorithm examines six sample points over two bit periods. The two bits in question are the stop then start bit. This is guaranteed to be a high-to-low transition regardless of the data being received. The position of this transition is monitored and the data bit sample point is adjusted to avoid any clock jitter/slippage. In addition, the timing is also adjusted when the transition gets too close to either edge of the sampling window.

    The state machine has a 10-bit cycle to match the start, the 8 data, and stop bits. If the clock drifts too far then one cycle is either added or removed. If the sample position has moved such that the next data bit sample would align with the start bit then an additional empty skip bit is added. This ignores the start bit and creates an 11-bit cycle to realign the timing of the next 10-bit cycle correctly.

    A similar thing is done for the other direction when an additional double cycle is added. This cycle samples two bits in the one cycle and then jumps ahead by two bits. The result is a 9-bit cycle and a timing adjustment in the other direction.

    These adjustments can compensate for a slip of up to one sample period per byte. The serial ports are sampled on every line, so either 4 or 5 lines per bit, or 40 or 50 lines per byte. This translates to an error of 2.5% (1/40) or 2% (1/50) and provides a window of 9400-9800 baud for the serial connection.
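The tolerance arithmetic above can be reproduced directly. This is a sketch around a nominal 9600 baud; `baud_window` is an illustrative name and the 10-bit frame is the 8-N-1 framing described earlier.

```python
NOMINAL_BAUD = 9600


def baud_window(lines_per_bit, bits_per_frame=10, nominal=NOMINAL_BAUD):
    """Baud range tolerated when one sample period can slip per byte."""
    samples_per_byte = lines_per_bit * bits_per_frame
    tolerance = 1 / samples_per_byte
    return nominal * (1 - tolerance), nominal * (1 + tolerance)


for lines in (4, 5):
    low, high = baud_window(lines)
    print(f"{lines} lines/bit: {low:.0f}-{high:.0f} baud")
```

The 5-line case (2%) is the tighter of the two and gives roughly the 9400-9800 baud window quoted above.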

  • TV Typewriter

    Alastair Hewitt, 11/14/2020 at 19:24, 0 comments

    Testing moved to the serial interfaces last month with the development of a simple terminal program. This will display text typed on the keyboard and echo it over the RS232 interface. The serial interface is full-duplex, so data sent back over the RS232 interface is displayed on the screen.

    The first step was to get to a TV Typewriter. The PS/2 interface clock and data bits are sampled during the horizontal sync period. This then drives a state machine that deserializes the data to recover the scan code. Each scan code is added to a buffer and then decoded via another state machine to track things like shift/control key state. Special combinations of ctrl-alt are mapped to system calls with ctrl-alt-del calling the system restart.
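The deserializing step can be sketched as follows. The frame format used here (start bit, 8 data bits LSB first, odd parity, stop bit) is the standard PS/2 device-to-host framing rather than anything taken from the Novasaur firmware, and `decode_ps2_frame` is a hypothetical name.

```python
def decode_ps2_frame(bits):
    """Recover a scan code from one 11-bit PS/2 frame.

    bits: sequence of 11 ints in arrival order:
    start (0), 8 data bits LSB first, odd parity, stop (1).
    Returns the scan code, or None if the frame is malformed.
    """
    if len(bits) != 11 or bits[0] != 0 or bits[10] != 1:
        return None
    data = bits[1:9]
    if (sum(data) + bits[9]) % 2 != 1:  # odd parity check
        return None
    return sum(bit << i for i, bit in enumerate(data))


# Frame carrying scan code 0x1C (three data bits set, so parity bit is 0).
frame = [0, 0, 0, 1, 1, 1, 0, 0, 0, 0, 1]
code = decode_ps2_frame(frame)
```

The real firmware spreads this work across horizontal sync periods, sampling one clock/data pair at a time, whereas this sketch processes a whole frame at once.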

    The keyboard buffer is sampled by the serial terminal code and any new characters are displayed on the screen and echoed over RS232 at ~9600 baud. There are no plans to develop this terminal code beyond a testing tool, so the terminal only handles lower/upper case characters, carriage return/line feed, and backspace.

    The transmit code is working fine, but there was a major design flaw in the receive code. I identified and solved part of the problem with the asynchronous clock recovery but missed the bigger picture with the clock slipping over process cycles. This results in an extra bit arriving in some cycles, or conversely no bits arriving. The Novasaur samples the RS232 data at 9593 baud and will typically miss 7 bits per second if the data is transmitted at exactly 9600 baud. Missing a single bit pushes the stop/start bits out of alignment and the data turns to garbage.

    So it's back to the drawing board. I have a new algorithm that looks promising, but it is significantly more complex. There are a lot of corner cases that need to be addressed and it will likely take the rest of this month to get to working code.

  • Roll-your-own SID Chip

    Alastair Hewitt, 10/07/2020 at 03:30, 0 comments

    Audio testing is now complete. This includes both hardware updates and the software to generate the sound. Since the sound system is finalized this would be a good point to review all the gory details.

    Hardware

    To keep the hardware minimal, no registers are dedicated to the audio. Instead time is borrowed from the GPU's glyph (G) register during the horizontal blanking period. The GPU address registers (H and V) are left in tristate during blanking and pulled high to generate the address 0x0FFFF. This is the top byte of the zero page and reserved to store the next audio sample as a 7-bit signed number. The blanking period also switches to the ALU instead of the font ROM with a special audio function at 0x3FFXX. This function removes the sign bit to create a DC-biased audio level and reverses the bits, since PCB layout constraints mean the MSB of the audio DAC connects to the LSB of the register.

    The audio DAC gets the full glyph signal during the active video period and the initial design attempted to use a sample and hold circuit to sample just the audio when blanking. This didn't do a good job of isolating the video signal and led to a lot of noise issues. The circuit was redesigned to the following:

    The new design uses the H-sync signal (blue trace below) to mute the DAC during the active period and then allow the audio signal (yellow trace below) to recover during the blanking. This presents pure PCM pulses to the audio filter stage rather than the typical step function. This isn't a problem since they both contain the same frequency domain information. The power level is a lot lower though, so a 20dB inverting amplifier is needed to bring the level up to the -10dBv line level.

    Prior to the amplifier are two filters: A second-order Sallen-Key low-pass filter followed by a passive high-pass filter. The high-pass cuts frequencies below 16Hz and the low-pass above 4.8kHz. This is the Nyquist corner frequency when generating audio at the standard 9.6kHz virtual process rate. The frequency response is shown below:

    Software

    The same method used by the Gigatron was shamelessly copied to generate the audio waveform here: A lookup table is used to map each note to a 16-bit value that is then added to a 16-bit counter register. The addition is done at a fixed sample rate such that the register counts to 65,536 at the frequency of the note being played. The upper 8 bits of this counter are then used to index another lookup table that contains a sample of a waveform. Multiple voices are generated by using additional 16-bit counters for different notes and adding the result of waveform lookups together.

    Two functions are included in the ALU to look up the note by the MIDI value and return the high and low byte to use for the 16-bit counter register. The table goes from 0 to 127 for use with the non 60Hz VGA video mode, where the full 88-key piano keyboard goes from 21 to 108. For 60Hz VGA the sample frequency is slightly different, so the table is duplicated for this frequency between 128 and 255. In both cases the entire 88-key piano frequency range can be played.
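The counter-and-wavetable scheme can be sketched in software. This illustrates the general technique rather than the Novasaur's ALU tables: the 9.6 kHz sample rate is assumed from the process rate mentioned earlier, the sine wavetable and its amplitude are made up, and the note frequencies use the usual equal-tempered formula with MIDI note 69 at 440 Hz.

```python
import math

SAMPLE_RATE_HZ = 9600  # assumed: the "standard 9.6kHz" process rate

# One cycle of a sine wave as 256 signed samples (hypothetical wavetable).
WAVETABLE = [round(31 * math.sin(2 * math.pi * i / 256)) for i in range(256)]


def midi_to_hz(note):
    """Equal-tempered frequency, with MIDI note 69 = A4 = 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)


def phase_increment(note, sample_rate=SAMPLE_RATE_HZ):
    """16-bit step so the counter wraps 65,536 once per note period."""
    return round(midi_to_hz(note) * 65536 / sample_rate)


def render(notes, num_samples):
    """Mix one wavetable voice per note by summing the lookups."""
    counters = [0] * len(notes)
    steps = [phase_increment(n) for n in notes]
    out = []
    for _ in range(num_samples):
        sample = 0
        for v, step in enumerate(steps):
            counters[v] = (counters[v] + step) & 0xFFFF
            sample += WAVETABLE[counters[v] >> 8]  # upper 8 bits index the table
        out.append(sample)
    return out


high, low = divmod(phase_increment(69), 256)  # bytes a lookup would return
```

For A4 the step is 3004, which splits into a high byte of 11 and a low byte of 188. Notes above the 4.8 kHz Nyquist limit would alias, which is why only the piano range matters in practice.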

    Voices

    The Gigatron is able to compute one voice per line during the horizontal sync period. The Novasaur requires up to 48 compute cycles to calculate each voice, which is longer than the entire virtual machine cycle containing the horizontal sync. The audio has to therefore consume additional machine cycles and is treated as an optional feature with the number of voices made configurable.

    The audio is handled by a non-blocking thread scheduled at the end of the first line in the virtual process cycle. At least 2 virtual machine cycles are required if the audio is enabled and this can be extended by an additional cycle per voice up to a total of 4 cycles. The first two cycles provide the first melodic voice and an additional non-melodic voice that would typically generate a random noise signal. Each additional cycle adds...

Discussions

Alexander wrote 05/28/2021 at 13:45 point

2 CPUs and a GPU in TTL?!?! Is there more than that board? Seems literally incredible. Either I'm misunderstanding what you did, or what you did is really amazing.

Alastair Hewitt wrote 05/28/2021 at 20:43 point

Thanks! That's the only board... what you don't see is the software. That's where most of the magic happens :) The CPU is very minimal and uses a byte-code interpreter to run 8080 machine code. The GPU is really just a DMA controller. It uses the same pipelining as the CPU, but only one "ALU" operation to do a font lookup.

aldolo wrote 05/31/2021 at 12:01 point

I'm trying to understand also, but the schematic itself is not useful at all to grasp the soul of this project...

Alastair Hewitt wrote 05/31/2021 at 16:52 point

The CPU and interfaces are all software defined so the schematic doesn't give much insight into how things actually work. The logs cover most of the development details though. I'm planning on going back and explaining the virtual machine in more detail now I've finished that part of the development.

aldolo wrote 05/19/2021 at 19:49 point

an 8080 emulation in 3 dozens of ttl chips seems a bit optimistic

Alastair Hewitt wrote 05/20/2021 at 05:54 point

That number includes the video and audio hardware. The CPU proper is only 22 TTL chips, so the 8080 emulation is done in less than 2 dozen.

Marcel van Kervinck wrote 03/25/2020 at 17:11 point

Great name change!

monsonite wrote 11/05/2019 at 15:04 point

Hi Alastair, I stumbled across your project following on from a message from Marcel. Excellent work and very inspirational. I'm planning a 16-bit design based on a 4-bit bitslice design and video and sound will not be a high priority. I noticed that you mentioned overclocking the ROM. I hope to be using a AT7C1024-45 - have you any estimate of how fast that might clock?

Alastair Hewitt wrote 11/05/2019 at 18:14 point

Thanks for the follow! I've become less certain about overclocking... I'm routinely seeing the 55ns OTP ROM perform as fast as 12ns. That's actually causing issues because the pull up resistors on the bus are jumping high for 6ns during the CPU/GPU context switch. The ROM is so fast it sees that as a valid address (0xFFFF) and returns a value before then doing the actual look up. That means it's doing twice the work in a time window that was barely long enough to do one. This is slowing things down a bit and I need to solve that problem before I can get an idea about actual performance.

Saying that, this is what I found with the 70ns NOR flash. That was responding within 32ns, so more than twice as fast. But, there are certain addresses, or sequences, that take up to 50ns. You have to design around the worst case, so that would be the actual limit. Since then I've seen it slow down a little more and that number is closer to 55ns. I suspect that may have been caused by repeated flashing of the chip. The chip also slows down when it heats up and you can expect another 5ns at 50C. That brings it down to 60ns. That's still better than the 70, but not by much.

So you should do better than 45ns and may see actual speeds in the 10-20ns range. I wouldn't get too carried away though, since the worst case may be closer to 40ns for reliable operation in all conditions.

Shrad wrote 12/14/2020 at 19:37 point

Just my two cents... why not use two interleaved ROM to double the rate? would be easier than any other solution and there should be a leftover flip-flop somewhere to clock them each at a turn...

Marcel van Kervinck wrote 08/18/2019 at 08:14 point

I wonder if your architecture would be classified as a barrel processor. Any thoughts on that? https://en.wikipedia.org/wiki/Barrel_processor

Alastair Hewitt wrote 08/18/2019 at 13:24 point

I was a bit generous when using the term "GPU". That part of the circuit is really a DMA controller running in transparent mode.

https://en.wikipedia.org/wiki/Direct_memory_access#Transparent_mode

The Harvard Architecture makes it fairly simple to implement since there's two address/data spaces. I'm able to use both concurrently with some pipelining. The same technique could be used to build a 2-core barrel processor. I assume you would have to replicate the CPU registers though.

Shranav Palakurthi wrote 05/15/2019 at 03:05 point

I want to see a retro computer with 128K RAM run JavaScript. (will it support Javascript?)

Alastair Hewitt wrote 05/15/2019 at 11:48 point

No plans to go anywhere near Javascript! It would probably run out of memory just downloading a single JS file from a typical web page. There are some minimal JS engines like Espruino out there, but even those would use up all ROM and leave no room for anything else.

Scott Devitt wrote 05/07/2019 at 13:12 point

I have one of those black cases and would love to get a few more. Any clue where from?

Alastair Hewitt wrote 05/07/2019 at 14:32 point

It's a Polycase ZN-40. You can buy them direct - https://www.polycase.com/zn-40

Scott Devitt wrote 05/07/2019 at 13:10 point

Kinda off target, but where did you find that black case? I have one and want a few more but have no clue where to find it.

Marcel van Kervinck wrote 04/05/2019 at 16:23 point

When I was contemplating the ALU and other random control logic for what later became known as the Gigatron, for quite a while I considered abusing the 74x48 7-segment decoder to build an instruction set around. But it's a slow chip, and also I couldn't get the instruction set quite right. After that phase I realised I really needed a ROM, but ROMs are very slow and one wouldn't fit in the critical path of a 6-8 MHz design. So that's where the diode-ROM came in, because that's fast. Interestingly, that was exactly 2 years ago today: https://hackaday.io/project/20781-gigatron-ttl-microcomputer/log/56640-testing-a-bunch-of-diodes . What ROM speed are you planning to use?

Alastair Hewitt wrote 04/05/2019 at 18:58 point

Hi Marcel, thanks for your interest. The Gigatron is the main inspiration for this project, especially your work on generating VGA with TTL chips.

I read your article on using the diodes a few weeks ago. I was a bit worried discrete diodes wouldn’t switch fast enough, but it looks like this will work. I’m doing most of my instruction decode using discrete logic: This includes 8 chips of gates, 3 decoder chips, and 2 flip flop chips for state machines. There is one area where I decode 8 possible states and I plan to use a "diode ROM" for this.

Both the ROM and RAM are accessed at half the VGA dot clock (12.5875 MHz). I need to switch between three different contexts for the ROM address bus: program, ALU, and font bitmap. I have to determine what state I want next and then latch this so everything changes on a single clock edge. I don’t have time to determine the state after the clock edge because it takes up to 12ns to change the bus tri-state. This leaves me with just 65ns to access the ROM then latch the result before the next context switch.

To deal with this timing issue I have to use memory with 55ns or better access speed. The only ROM with this speed is one-time programmable. I’ll use this when I have code worthy of "shipping", but for now I’ll be doing development using NOR flash. The fastest DIP version is 70ns (e.g. GLS27SF020) so I’ll need to drop my clock speed a little. Worst case is a screen refresh at 50 Hz instead of 60 Hz during development.
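
As a rough check on the timing budget described above, here is a minimal Python sketch. It assumes the standard 25.175 MHz VGA dot clock (twice the 12.5875 MHz quoted), treats the 12ns tri-state change as the only overhead, and assumes the memory clock scales linearly with the refresh rate. The helper name `access_window_ns` is hypothetical and not part of the project.

```python
# Assumption: standard 640x480 VGA dot clock; the memory runs at half of it.
VGA_DOT_CLOCK_HZ = 25.175e6
# Time to change the bus tri-state, as quoted in the comment.
BUS_TURNAROUND_NS = 12.0


def access_window_ns(refresh_hz, nominal_refresh_hz=60.0):
    """Time left in one memory cycle after the bus has changed state.

    Assumption: slowing the refresh rate slows the memory clock in proportion.
    """
    memory_clock_hz = (VGA_DOT_CLOCK_HZ / 2) * (refresh_hz / nominal_refresh_hz)
    period_ns = 1e9 / memory_clock_hz
    return period_ns - BUS_TURNAROUND_NS


for refresh_hz in (60, 50):
    window = access_window_ns(refresh_hz)
    print(f"{refresh_hz} Hz refresh: {window:.1f} ns window, "
          f"55ns ROM fits: {window >= 55}, 70ns flash fits: {window >= 70}")
```

Under these assumptions the window is about 67ns at 60 Hz, slightly more than the 65ns quoted (the thread does not say what accounts for the difference), and about 83ns at 50 Hz, which is enough for the 70ns flash.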

Marcel van Kervinck wrote 04/05/2019 at 20:57 point

Ah great. How about the references to a 128K ROM for ALU functions? I also saw a memory map of that, or is that "out" already? Anyway, take your time to reflect and document, if for no other reason than for yourself. I found those "boring documentation cleanup tasks" after a design frenzy helped to improve the end result. [BTW. This is probably a 3-level deep post without a Reply button. Threading works best by going back 2 steps and replying from there....]

Alastair Hewitt wrote 04/06/2019 at 01:39 point

(jumping back 2 steps) The same ROM is used for both the program and the ALU. The CPU instructions take more than one cycle. For example: the first cycle reads the instruction from the ROM, the next cycle reads from the RAM, then the ROM is used as an ALU to perform a function, and finally the RAM can be written to. The ALU only handles one nibble at a time, so the last two cycles would be repeated to do a full 8-bit operation.
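
The ROM-as-ALU idea can be sketched in Python as a lookup table that handles one nibble per access. This is only an illustration: the address layout (carry, nibble A, nibble B) and the names `build_add_rom` and `add8` are assumptions made for the example, not the actual Novasaur ROM map or instruction sequence.

```python
def build_add_rom():
    """Build a 512-entry table: address = carry_in:a:b, data = carry_out:sum."""
    rom = [0] * 512
    for carry_in in range(2):
        for a in range(16):
            for b in range(16):
                total = a + b + carry_in      # at most 31, so it fits in 5 bits
                rom[(carry_in << 8) | (a << 4) | b] = total
    return rom


ADD_ROM = build_add_rom()


def add8(x, y):
    """Add two bytes (0..255) using two nibble-wide table lookups."""
    # First lookup: low nibbles, no carry in.
    lo = ADD_ROM[((x & 0xF) << 4) | (y & 0xF)]
    carry = lo >> 4
    # Second lookup: high nibbles plus the carry from the first lookup.
    hi = ADD_ROM[(carry << 8) | ((x >> 4) << 4) | (y >> 4)]
    return ((hi & 0xF) << 4) | (lo & 0xF), hi >> 4


result, carry_out = add8(0x3C, 0x5A)
print(hex(result), carry_out)
```

Each call to `add8` performs two table reads, mirroring the point that the ALU step is repeated once per nibble for a full 8-bit operation.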

Marcel van Kervinck wrote 04/06/2019 at 09:47 point

Got it! Good luck with the build! One or two PCBs: both have their tradeoffs. The Gigatron is very sparsely populated with wide spacing. You might fit your design in a similar size, and the PCB costs aren't really that steep.

Alastair Hewitt wrote 05/31/2019 at 23:13 point

I finally ditched the diode ROM. I was able to juggle things around a bit and got it down to just 8 diodes configured as two 4-input AND gates. I decided to just add the additional chip and use a 74F21 instead. It's very fast with a Tp of just over 3 ns.

Geri wrote 03/08/2019 at 16:20 point

Hi, I've been following your projects and I'm impressed with your work, especially the SUBLEQ implementation. I suggest you try creating an FPGA-based implementation to run my operating system:

https://hackaday.io/project/158329-dawn-the-subleq-operating-system-by-geri 

Running this operating system will put you in the next league, as this is a multitasking, multi-windowing, SMP-capable operating system, and creating hardware that's capable of running something like that makes an order-of-magnitude bigger impression on followers. The example emulators are attached in the zip file to guide you through the process. Feel free to contact me by e-mail for information if you don't understand something.

greetings

Geri

agp.cooper wrote 03/07/2019 at 01:11 point

Great computer specification! Perhaps you are aiming a little too high for ~30 TTL chips?

---

Have a look at some of the other TTL designs on Hackaday to get an idea of specifications and chip count. You may be disappointed by what others have achieved.

Have a look at the Apollo181 (http://apollo181.wixsite.com/apollo181/index), which has a 65 chip count and uses the 74181 ALU (yuck!), for an example of what can be done in 4 bits.

It's pretty impressive for 65 chips!

---

If you want something simpler (to get started) have a look at the TD4:

1) Breadboard version: https://www.youtube.com/watch?v=e0QCErIIOWA

2) ATMega 328p "ROM" version: https://www.youtube.com/watch?v=tKO3O2UY_7s

3) And a schematic: http://xyama.sakura.ne.jp/hp/4bitCPU_TD4.html

I have built the TD4 and have PCB designs on EasyEDA (https://easyeda.com/search?wd=td4b&indextype=projects); you can get them made and posted to you.

Regards AlanX

roelh wrote 03/06/2019 at 08:18 point

Hi Alastair !  I'm looking forward to your schematics and instruction set....  I have similar plans...
