
YATAC78 - The WWW TTL Computer

Retro computer built from 1978-era TTL logic chips. Internet capable, with a built-in web browser and server.

Can you browse the Web using pre-1980 TTL logic and memory speeds? The goal of this project is to demonstrate how. Internet connectivity is via an era-appropriate RS232 interface. The machine is upward compatible by a decade to support currently available keyboard and video interfaces (PS/2 and VGA). The video includes a native text mode capable of displaying 80-columns and two bitmapped color graphics modes for retro gaming.

https://github.com/ajhewitt/YATAC78

YATAC78 - Yet Another TTL Archaic Computer (1978)

  • Dual Processor CPU/GPU (Harvard Architecture).
  • 32 MHz dot clock, 16 MHz memory clock, 8 MHz per processor (2-4 CPU MIPS)
  • 256k ROM: 96k ALU, 64k native program, 64k relocatable code, 32k fonts.
  • 128k RAM: 64k user, 43k display, 16k files, 5k buffers.
  • 50+ ALU functions including multiply/divide and math functions.
  • Bitmapped Graphics: Hi-res mode with 8 colors, 4 dithering patterns (320x240, 320x160 @ 75Hz, 320x256, 320x200 @ 60 Hz). Lo-res mode with 256 colors, double buffered (160x120, 160x96, 160x80 @ 75 Hz, 160x150, 160x120, 160x100 @ 60 Hz)
  • Text Mode: 80 columns using code page 437, 8 colors FG/BG, 256 line buffer, up to 8 1/2 page smooth scroll. 8x8 glyph text (80x60, 80x48 @ 75 Hz, 80x75, 80x60 @ 60 Hz), 8x16 glyph text (80x30 @ 75 Hz, 80x36 @ 60 Hz)
  • Audio: 2 melodic voices, 1 noise channel, 10 waveforms, ADSR, 9.6 kHz/8-bit DAC.
  • PS2 Keyboard interface built in.
  • RS232 Serial Port for host/client and network connectivity (9600 baud).
  • Expansion Port: 7 addressable 8-bit registers in/out, 4 input flags, 1 flip-flop.
  • Blinkenlights: 1
  • Chip Count: 40 Total (34 TTL, 3 analog, 1 ROM, 1 RAM, and 1 PAL).
  • Target PCB size: 8" x 5" (200 x 125mm) 4-layer board.

The system bus is clocked at 16 MHz and a typical CPU instruction spans 4 clock cycles as follows:

  1. Fetch Instruction from ROM.
  2. Read data from source register or RAM.
  3. Execute ALU function using the ROM as lookup table.
  4. Write data to register and optionally accumulator and/or RAM.
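Step 3 uses the ROM as a lookup table. The Python sketch below illustrates the idea. The address layout is hypothetical: a 4-bit function select, then one 4-bit half of HL, then the 8-bit operand read from RAM. The four functions are also hypothetical, because the real ALU spans 96k of ROM and provides 50+ functions. The sketch also shows one way to build an 8-bit result from two passes, first with the L half and then with the H half. This matches the note on 8-bit ALU functions further down.

```python
# Hypothetical function indices and address layout, for illustration only.
ADD_LO, ADD_HI, AND_LO, AND_HI = range(4)

FUNCTIONS = {
    ADD_LO: lambda byte, nib: byte + nib,                  # add low half of HL
    ADD_HI: lambda byte, nib: byte + (nib << 4),           # add high half of HL
    AND_LO: lambda byte, nib: byte & (0xF0 | nib),         # mask bits 3..0
    AND_HI: lambda byte, nib: byte & ((nib << 4) | 0x0F),  # mask bits 7..4
}

def address(fn: int, nib: int, byte: int) -> int:
    """Assumed ROM address: function | HL nibble | 8-bit operand."""
    return (fn << 12) | (nib << 8) | byte

def build_alu_rom() -> bytearray:
    """Precompute every result so each ALU operation is a single ROM read."""
    rom = bytearray(len(FUNCTIONS) << 12)
    for fn, op in FUNCTIONS.items():
        for nib in range(16):
            for byte in range(256):
                rom[address(fn, nib, byte)] = op(byte, nib) & 0xFF
    return rom

def alu8(rom: bytearray, fn_lo: int, fn_hi: int, operand: int, hl: int) -> int:
    """8-bit operation as two lookups: first with L, then with H."""
    partial = rom[address(fn_lo, hl & 0x0F, operand)]
    return rom[address(fn_hi, hl >> 4, partial)]

if __name__ == "__main__":
    rom = build_alu_rom()
    print(hex(alu8(rom, ADD_LO, ADD_HI, 0x3C, 0x5A)))  # 0x96
    print(hex(alu8(rom, AND_LO, AND_HI, 0x3C, 0x5A)))  # 0x18
```

Every result is precomputed, so an ALU operation costs one ROM read no matter how complex the function is. The price is ROM space, which is why the ALU takes the largest share of the 256k ROM.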

The alternating use of both ROM and RAM allows a second processor to be added to the system. Both processors use a pipeline to cache data between the two address spaces. One processor handles serial communications and general computational tasks (CPU) while the other is dedicated to the display and audio (GPU).

The following sequence of diagrams demonstrates the multiplexing of the CPU (shown in blue) and GPU (shown in red). In this example the GPU is operating in text mode and the CPU is executing the sequence described in the numbered list above.

In the first cycle the GPU reads the ASCII code point of a character from the RAM and stores the result in the pipeline (gc). The CPU addresses the ROM using the Program Counter (PC) and Page Register (Pg) to fetch an instruction to the Instruction Register (I).

In the next cycle the context switches over and the CPU's X and Y registers are used to address the RAM and load the pipeline (cc). Meanwhile the gc, along with the Scan Register (S), is used to address the ROM and fetch a character bitmap line for the Glyph Register (G).

The GPU returns to the RAM, where the H counter has moved to the next byte, and loads the gc with text color values. The ROM is now configured as an ALU with the function specified in the instruction. The cc is combined with one half of the HL register and the result is stored in a register and the Accumulator (A).

In the final cycle the value in A is written to the RAM. The font colors stored in gc are moved to the RAMDAC (C) and the bitmap is loaded into a shift register to start the next character render cycle.

Note: 8-bit ALU functions repeat the last two cycles to combine both halves of the HL register to form an 8-bit result.
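The multiplexing in this walkthrough can be written down as a schedule. The Python sketch below records who drives the RAM and who drives the ROM on each of the four clocks. It then checks the property that makes the scheme work: the two processors never want the same memory in the same clock. The `Slot` type and the wording of each action are illustrative. Treating the ROM as unused in the fourth clock is an assumption, because the walkthrough only describes the GPU moving data between its own registers at that point.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Slot:
    ram: str  # "OWNER: action" on the RAM side
    rom: str  # "OWNER: action" on the ROM side

# Text-mode example from the walkthrough above, one entry per 16 MHz clock.
TEXT_MODE_SLOTS = [
    Slot("GPU: read code point into gc", "CPU: fetch instruction at Pg:PC into I"),
    Slot("CPU: read RAM at X/Y into cc", "GPU: read glyph line at gc/S into G"),
    Slot("GPU: read color byte into gc", "CPU: ALU lookup of cc with half of HL into A"),
    # Assumption: no ROM access here; the GPU moves gc to C and G to the shift register.
    Slot("CPU: write A to RAM", "none: GPU register transfers only"),
]

def owner(action: str) -> str:
    """Extract the bus owner from an 'OWNER: action' string."""
    return action.split(":", 1)[0]

def check_interleave(slots) -> bool:
    """RAM ownership alternates GPU, CPU, ... and RAM and ROM owners differ."""
    for i, slot in enumerate(slots):
        expected_ram = "GPU" if i % 2 == 0 else "CPU"
        if owner(slot.ram) != expected_ram or owner(slot.ram) == owner(slot.rom):
            return False
    return True

if __name__ == "__main__":
    for n, slot in enumerate(TEXT_MODE_SLOTS, start=1):
        print(f"cycle {n}: RAM <- {slot.ram:32} ROM <- {slot.rom}")
    print("interleave ok:", check_interleave(TEXT_MODE_SLOTS))
```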

inst_set.json

Mnemonics and hex codes for all 17,746 unique instructions.

application/json - 452.37 kB - 10/01/2019 at 02:02


instruction_encoding.v1.0.png

16-bit instruction encoding.

image/png - 16.71 kB - 09/12/2019 at 03:51


font_rom_v0.2.png

Font ROM rendered as a bitmapped image.

image/png - 10.28 kB - 08/11/2019 at 04:58


memory-map.v1.0.png

Memory map of RAM and ROM address layout.

image/png - 57.50 kB - 07/08/2019 at 04:02


CPU-state-machine-test-vectors.v1.0.pdf

Simulation of CPU State Machine from WinCUPL.

application/pdf - 161.02 kB - 07/04/2019 at 19:22



  • 14 × 74F574 Octal D-type Flip-flop with Tri-state Outputs
  • 4 × 74F163 Synchronous 4-Bit Binary Counter
  • 1 × 74F08 Quad 2-Input AND Gates
  • 1 × 74F257 Quad 2-line to 1-line Multiplexers with Tri-State Outputs
  • 2 × 74F139 Dual 2-line to 4-line Decoders


  • Success

    Alastair Hewitt, 2 days ago

    Quick update after a long weekend. The final version of the CPU has been built and tested. It's not the prettiest thing in the world!

    There's not much to demo until the GPU is installed. For now, the most exciting thing it has done is generate a 1 Hz pulse. That may sound simple, but this was using a version of the planned RTC code (accurate to 8.5 ppm). It requires 12 bits to divide down the 8 MHz process clock and would normally only use three bytes of the zero page. The version tested used both the zero page and the full RAM address space of bank 0. The ALU operations were also expanded to test the full 2-cycle ALU addition/subtraction instead of just doing increment/decrement.

    A couple of notes on the picture: The 70ns NOR flash was having a hard time meeting the 50ns access cycle of the 16 MHz machine clock, so a couple of slower oscillators are being used for testing (the actual OTP ROM is 55ns and should be fine). There are patch wires on the ROM address and data buses that can be moved to add/remove bus drivers. The current design exceeds the recommended fanout on the data bus, but it doesn't appear to be an issue. In fact, the circuit is a lot more stable without the drivers.

  • Video Modes

    Alastair Hewitt, 09/25/2019 at 05:12

    An early log talked about 16 possible video modes. This is still the case, but a lot has changed since then. The following should clarify what the current modes are and how they are supported.

    The 16 modes are defined by 4 bits with the following states:

    • Mode0 - Text (0) or Graphics (1)
    • Mode1 - Low (0) or High (1) resolution.
    • Mode2 - VGA (0) or SVGA (1)
    • Mode3 - Mod 16 (0) or Mod 15 (1) timing.

    Mode0 is a hardware state (bit 4 of the Eo register) and selects whether the GPU executes one (graphics mode) or two (text mode) machine cycles per process cycle. The two-machine-cycle mode completes 80 active process cycles per line, representing 80 characters, each composed of a code point and a color byte. The one-machine-cycle mode completes 160 active cycles, either as 160 single color values (lo-res graphics) or 160 nibbles (hi-res graphics).

    Mode1 is also a hardware state (bit 5 of the Eo register) and selects whether 8x8 or 8x16 glyphs are read from the font ROM. This bit also defines the high/low resolution setting for the graphics modes.

    Mode2 is used to control the number of lines per frame in software. A low value selects a VGA mode (640x480) at a field rate of 75 Hz using 512 lines per field. A high value selects an SVGA mode (800x600) at a field rate of 60 Hz using 640 lines per field.

    Mode3 is also used to control the video timing in software. The number of lines is divided down depending on the video mode, and there are two ways to do this: a low value selects Mod16 timing, allowing the lines to be divided down by 2, 4, 8, or 16. A high value selects Mod15 timing, allowing them to be divided down by 3 or 5. Multiples of 2 are also available to divide down by 6 or 10.

    The following tables show all the resolutions available by combining the Mode0 and Mode1 bits for the columns and the Mode2 and Mode3 bits for the rows. The value of the modulo is shown in brackets next to the resolution (%n).


    Mode       Graphics (hi-res)  Graphics (lo-res)  Text (8x8)    Text (8x16)
    VGA %16    320x240 (%2)       160x120 (%4)       80x60 (%8)    80x30 (%16)
    VGA %15    320x160 (%3)       160x96 (%5)        80x48 (%10)   *160x80 (%6)
    SVGA %16   320x256 (%2)       160x150 (%4)       80x75 (%8)    80x36 (%16)
    SVGA %15   320x200 (%3)       160x120 (%5)       80x60 (%10)   *160x100 (%6)


    *Note: Mod15 is not used for the 8x16 glyph text mode, so an additional lo-res graphics mode is defined using a modulo of 6.
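    The mode bits and the table above can be combined into a small lookup, sketched in Python below. The bit order follows the Mode0 to Mode3 list. The log doesn't say which value of Mode1 selects 8x8 rather than 8x16 glyphs, so the sketch assumes that high selects 8x8. `source_row` shows what the modulo means: each stored row (pixel or character row) is repeated for that many scan lines. That is why each VGA row count is simply 480 divided by the modulo.

```python
# Resolutions and modulos from the table above.
# Column order: hi-res graphics, lo-res graphics, 8x8 text, 8x16 text
# (the Mod15 rows put the extra lo-res %6 mode in the last column).
RESOLUTION = {
    # (svga, mod15)
    (False, False): ("320x240", "160x120", "80x60", "80x30"),
    (False, True):  ("320x160", "160x96", "80x48", "160x80"),
    (True, False):  ("320x256", "160x150", "80x75", "80x36"),
    (True, True):   ("320x200", "160x120", "80x60", "160x100"),
}
MODULO = {False: (2, 4, 8, 16), True: (3, 5, 10, 6)}
FIELD_RATE_HZ = {False: 75, True: 60}

def decode_mode(mode: int) -> tuple:
    """Map a 4-bit mode value to (resolution, modulo, field rate in Hz)."""
    graphics = bool(mode & 0b0001)  # Mode0: text (0) or graphics (1)
    high = bool(mode & 0b0010)      # Mode1: low (0) or high (1)
    svga = bool(mode & 0b0100)      # Mode2: VGA (0) or SVGA (1)
    mod15 = bool(mode & 0b1000)     # Mode3: Mod16 (0) or Mod15 (1)
    if graphics:
        column = 0 if high else 1
    else:
        # Assumption: Mode1 high selects the 8x8 glyphs (more text rows).
        column = 2 if high else 3
    return RESOLUTION[(svga, mod15)][column], MODULO[mod15][column], FIELD_RATE_HZ[svga]

def source_row(scan_line: int, modulo: int) -> int:
    """Each stored row is repeated for `modulo` consecutive scan lines."""
    return scan_line // modulo

if __name__ == "__main__":
    for mode in range(16):
        print(f"mode {mode:2d}: {decode_mode(mode)}")
```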

  • Firmware - part 2

    Alastair Hewitt, 09/14/2019 at 06:08

    The following shows a breakdown of the firmware process cycle described in the last log. Each cycle spans 4 lines and consists of 5 machine cycles per line:

    The firmware machine cycle consists of 34 hardware process cycles for either the fetch, execute, or horizontal sync. Each machine cycle ends in a decode page jump (DPG) driven by the process cycle state and instruction. This decode takes 6 hardware process cycles, giving a total length of 40, or 5 µs. Once the fetch is performed, each instruction requires one or two execution cycles. If the instruction is a NOP, then the next instruction is fetched. At the end of the execution cycle the instruction value is set to NOP so that the DPG will jump to fetch.
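    These counts can be checked with some quick arithmetic, sketched in Python below. The sketch makes two assumptions. First, one scan line holds exactly four 40-cycle machine cycles plus the horizontal-sync machine cycle with its 8 extra PS2 sampling cycles (described in the next paragraph). Second, the dot clock runs at four times the processor clock (32 MHz vs 8 MHz). Under those assumptions a line is 208 process cycles, or 832 dots. That matches the horizontal total in the GTF modeline and the 31.9488 MHz dot clock given in the Yet Another Redesign log.

```python
from fractions import Fraction

LINE_RATE_HZ = 38_400        # horizontal scan rate
DOTS_PER_PROCESS_CYCLE = 4   # assumed: 32 MHz dot clock / 8 MHz processor clock
BODY, DPG, PS2_EXTRA = 34, 6, 8
MACHINE_CYCLES_PER_LINE = 5

def machine_cycle() -> int:
    """Hardware process cycles in one firmware machine cycle (body + DPG)."""
    return BODY + DPG

def cycles_per_line() -> int:
    """Four ordinary machine cycles plus the longer horizontal-sync cycle."""
    return (MACHINE_CYCLES_PER_LINE - 1) * machine_cycle() + machine_cycle() + PS2_EXTRA

if __name__ == "__main__":
    per_line = cycles_per_line()                                # 208
    dots_per_line = per_line * DOTS_PER_PROCESS_CYCLE           # 832
    process_clock_hz = per_line * LINE_RATE_HZ                  # 7,987,200
    dot_clock_hz = process_clock_hz * DOTS_PER_PROCESS_CYCLE    # 31,948,800
    machine_cycle_us = Fraction(machine_cycle() * 1_000_000, process_clock_hz)
    print(per_line, dots_per_line, process_clock_hz, dot_clock_hz, float(machine_cycle_us))
```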

    The 4th machine cycle is reserved for horizontal sync handling. This also takes 34 hardware process cycles, plus the DPG, and includes an additional 8 cycles for sampling the PS2 port. This is a simple record-and-shift operation performed at the full 38.4 kHz line rate. The PS2 clock and data lines are sampled into two nibbles, with the previous samples shifted along. The result after 4 lines is a byte containing 4 bits of the sampled clock and 4 bits of the sampled data. This can be processed to determine what data was received via the port; however, this data is only processed occasionally, as described below.
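    The record-and-shift sampling and the later processing step can be sketched in Python. The packing below is an assumption: clock history goes in the high nibble, data history in the low nibble, and the newest sample sits in the lowest bit of each. The log only says that each byte ends up holding 4 clock and 4 data samples. The decoder uses the standard PS/2 device-to-host frame, with data read on each falling clock edge:

    • a start bit (0)
    • 8 data bits, LSB first
    • an odd parity bit
    • a stop bit (1)

    At the 38.4 kHz line rate the samples are about 26 µs apart. A PS/2 clock in its usual 10 to 16.7 kHz range has a half period of roughly 30 to 50 µs, so each clock low phase should be seen at least once. `waveform` is a hypothetical test fixture, not part of the firmware.

```python
def record(history: int, clk: int, data: int) -> int:
    """Shift one (clock, data) sample into the packed history byte.

    Assumed layout: bits 7..4 hold clock samples, bits 3..0 hold data
    samples, and the newest sample lands in bit 4 (clock) and bit 0 (data).
    """
    return ((history << 1) & 0xEE) | (clk << 4) | data

def unpack(history: int) -> list:
    """The four (clock, data) samples held in one byte, oldest first."""
    return [((history >> (4 + k)) & 1, (history >> k) & 1) for k in (3, 2, 1, 0)]

def decode_frames(samples) -> list:
    """Decode PS/2 device-to-host bytes from time-ordered (clock, data) samples."""
    frames, bits, prev_clk = [], [], 1
    for clk, data in samples:
        if prev_clk == 1 and clk == 0:        # falling edge: data is valid
            bits.append(data)
            if len(bits) == 11:
                start, payload, parity, stop = bits[0], bits[1:9], bits[9], bits[10]
                if start == 0 and stop == 1 and (sum(payload) + parity) % 2 == 1:
                    frames.append(sum(bit << i for i, bit in enumerate(payload)))
                bits = []
        prev_clk = clk
    return frames

def waveform(value: int) -> list:
    """Hypothetical test fixture: four samples per bit, clock high then low."""
    payload = [(value >> i) & 1 for i in range(8)]
    parity = 1 - sum(payload) % 2             # odd parity over data + parity
    samples = []
    for bit in [0, *payload, parity, 1]:
        samples += [(1, bit), (1, bit), (0, bit), (0, bit)]
    return samples + [(1, 1)] * 4             # idle line after the stop bit

if __name__ == "__main__":
    history, packed = 0, []
    for n, (clk, data) in enumerate(waveform(0x1C), start=1):
        history = record(history, clk, data)
        if n % 4 == 0:                        # one byte every 4 lines
            packed.append(history)
    samples = [s for byte in packed for s in unpack(byte)]
    print([hex(b) for b in decode_frames(samples)])  # ['0x1c']
```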

    Each firmware process cycle begins with the RST cycle to reset the process state and decide which feature to handle in the following machine cycles. The feature takes up the next one, two, or three machine cycles and can consist of the following:

    • Serial communication
    • Audio generation
    • Keyboard input

    The first two are exclusive, so audio cannot be generated while serial communication is being handled (sorry, no streaming audio on this machine!). Serial may be full duplex, but could also be handled as half duplex, in which case one of the machine cycles can be given back to the interpreter. The audio takes up two machine cycles and will handle at least two melodic voices and one noise channel. More voices, or ADSR, will be added if there is room when the implementation is finalized.

    The keyboard is handled as an additional feature so that serial or audio can be processed concurrently with keyboard input (the latter being required for games). All the serial ports are implemented with hardware flow control, so the keyboard can be suppressed until a keyboard feature cycle is used. The plan is to sample the keyboard 15 times per second, or every 4th refresh at the 60 Hz field rate, or 5th refresh at the 75 Hz field rate. The keyboard input is processed for at least 128 lines, which should allow up to 3 bytes to be read. PS2 devices are required to buffer when the clock is inhibited, so this shouldn't be a problem as long as the user doesn't sustain 15 key presses per second.
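    The keyboard numbers can be checked the same way. This Python sketch assumes an 11-bit PS/2 frame (start, 8 data bits, parity, stop). It also assumes a keyboard clock at the slow end of the usual 10 to 16.7 kHz range. Both figures come from the PS/2 protocol rather than from this log.

```python
from fractions import Fraction

LINE_RATE_HZ = 38_400
TARGET_POLLS_PER_S = 15
FRAME_BITS = 11  # PS/2 frame: start, 8 data, parity, stop

def refresh_interval(field_rate_hz: int) -> int:
    """Fields between keyboard feature cycles for a 15 Hz polling rate."""
    return field_rate_hz // TARGET_POLLS_PER_S

def bytes_per_window(lines: int, ps2_clock_hz: int) -> int:
    """Whole PS/2 frames that fit in a keyboard window of `lines` scan lines."""
    window_s = Fraction(lines, LINE_RATE_HZ)
    frame_s = Fraction(FRAME_BITS, ps2_clock_hz)
    return int(window_s // frame_s)

if __name__ == "__main__":
    print(refresh_interval(60), refresh_interval(75))         # 4 5
    print(bytes_per_window(128, 10_000))                      # 3
```

    At 10 kHz a 128-line window (about 3.3 ms) fits three whole frames. A faster keyboard clock would fit more.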

    PS2 interfaces are also bi-directional and the keyboard requires things like a reset command on power up. These are atypical events and are handled by specialized functions rather than during the standard firmware process cycle. The keyboard data transmit function includes the horizontal sync timing but does not run the interpreter. This allows data transmission at the keyboard's clock rate, which is faster than, and asynchronous to, the 9600 Hz process cycle. This will be fairly rare though (reset, caps lock, setting change) and should only last about 2 milliseconds.

  • Firmware

    Alastair Hewitt, 09/02/2019 at 20:08

    A lot of progress was made over the past month on the software side of things. This is a very significant part of the overall system and has also been driving changes in the underlying hardware. A workable design is now in place and some details are covered in the following log.

    First, some terminology:

    • Firmware is used to describe the code that is executed by the machine. This code resides in the ROM and is executed directly by the hardware.
    • Software is used to describe the interpreted code that resides in the RAM. This code is read by the firmware and executed to emulate the operation of a virtual CPU.

    The firmware is technically running on a RISC machine, except instructions take more than one clock cycle to execute. This is due to the limitation of 8-bit wide memory and putting both the firmware and ALU in the same ROM. The machine could operate at one instruction per clock by using a 16-bit wide memory for the firmware and a separate ROM for the ALU. However, the current design requires up to four cycles per instruction:

    • Fetch - one or two process cycles
    • Execute - one or two process cycles

    Each process cycle consists of two machine cycles which alternate between access to the ROM and RAM. The fetch cycle provides a read access to the RAM and the execute cycle provides an optional write. This way every instruction can take the form: fetch, read, execute, write (the same read and write cycle is repeated when more than one cycle is needed to fetch and/or execute).

    The software is interpreted by an emulator in firmware that also controls both the video display and serial communications. These systems are timing critical and maintaining synchronization is a significant part of the design challenge. One approach is to track the number of cycles per instruction before the synchronized event and break when not enough cycles are left to guarantee the next instruction will execute in time. The other is to break the virtual instructions down into fetch and execute cycles and use fixed cycle timing for each.

    Cycle counting makes sense only if the complete instruction fetch/execution time is fairly small. After implementing instructions from various CPUs, it appears cycle counting could result in only one instruction being executed between horizontal sync events. This happens when the first instruction loaded is on the longer side and there isn't enough time left to execute another long instruction. The fixed fetch/execute timing can split instructions across the horizontal sync event and avoids wasting unusable cycles when longer instructions are executed.
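    A toy model in Python makes the comparison concrete. Each line between horizontal syncs gets a fixed budget of interpreter machine cycles, and each virtual instruction needs a whole number of them. The budget and the instruction lengths here are made-up numbers, not measurements from the interpreter. The real budget also isn't uniform per line, since three overhead cycles are spread across each process cycle.

```python
def lines_with_cycle_counting(lengths, budget):
    """Lines needed when an instruction may only start if it fits before the next sync."""
    lines, left = 1, budget
    for n in lengths:
        if n > budget:
            raise ValueError("instruction longer than one line's budget")
        if n > left:          # not enough room: idle until the next sync
            lines, left = lines + 1, budget
        left -= n
    return lines

def lines_with_fixed_slots(lengths, budget):
    """Lines needed when fetch/execute pieces may straddle a sync."""
    total = sum(lengths)
    return -(-total // budget)   # ceiling division

if __name__ == "__main__":
    program = [3, 3, 2, 3]   # hypothetical machine cycles per virtual instruction
    budget = 4               # hypothetical interpreter machine cycles per line
    print(lines_with_cycle_counting(program, budget),
          lines_with_fixed_slots(program, budget))   # 4 3
```

    With cycle counting, the leftover cycles at the end of three of the four lines are wasted. Splitting instructions across the sync recovers them.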

    The fixed fetch/execute timing is being used with a processing rate of 9600 cycles per second. This leads to the following counts for vertical video synchronization:

    • VGA - 128 process cycles per field at 75 fields per second
    • SVGA - 160 process cycles per field at 60 fields per second

    Each process cycle consists of 4 lines and each line consists of 5 machine cycles (20 machine cycles per process cycle). One machine cycle is needed per line to perform the horizontal sync at a rate of 38.4 kHz. Three additional machine cycles are used every process cycle for other overheads, including serial communication at 9600 baud. This leaves 13 machine cycles for the interpreter, which equates to around 125k machine cycles per second (an average of 8 µs per machine cycle). This is equivalent to an RCA CDP1802 running at 1 MHz, or about half the processor speed of the COSMAC Elf.
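    The arithmetic in this paragraph is sketched below in Python for checking. The CDP1802 comparison relies on that chip taking 8 clock periods per machine cycle.

```python
from fractions import Fraction

PROCESS_RATE_HZ = 9_600
LINES_PER_PROCESS_CYCLE = 4
MACHINE_CYCLES_PER_LINE = 5
OTHER_OVERHEAD = 3            # serial and other per-process-cycle work

def field_rate_hz(process_cycles_per_field: int) -> Fraction:
    """Field rate for a given number of process cycles per field."""
    return Fraction(PROCESS_RATE_HZ, process_cycles_per_field)

def interpreter_cycles_per_second() -> int:
    per_process = LINES_PER_PROCESS_CYCLE * MACHINE_CYCLES_PER_LINE   # 20
    hsync = LINES_PER_PROCESS_CYCLE                                   # one per line
    free = per_process - hsync - OTHER_OVERHEAD                       # 13
    return free * PROCESS_RATE_HZ

# A CDP1802 machine cycle is 8 clock periods, so 1 MHz gives 125k machine cycles/s.
CDP1802_AT_1MHZ = 1_000_000 // 8

if __name__ == "__main__":
    print(field_rate_hz(128), field_rate_hz(160))   # 75 60 (VGA, SVGA)
    print(interpreter_cycles_per_second(), CDP1802_AT_1MHZ)  # 124800 125000
```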

    The machine cycle is tracked using an 8-bit state machine in firmware. This state is combined with the fetched instruction to determine the code page to load via a decode function in the ALU. The state is also combined with the process cycle count via another ALU function to set the Scan Counter. Other custom ALU functions are used by the interpreter such as the Arithmetic Status (AS). This function returns two nibbles, one consisting of the status flags on addition and the other on...


  • CPU State Machine - part 2

    Alastair Hewitt, 08/22/2019 at 04:31

    The CPU state machine was introduced early in the project and has undergone some evolution as the design progressed. Hopefully things are stable enough to expand on the details.

    The CPU essentially does the same thing on every instruction cycle: it fetches an instruction and then latches data from the ROM into a register. The instruction always comes from the program bank of the ROM; the data can come from either the program bank or the ALU banks. In the case of the program bank, the data would be an operand. The ALU banks return the result of an arithmetic or logical operation.

    Every instruction cycle begins with an instruction fetch, which can be either one or two bytes in length. The upper bit (i7) of the first instruction byte determines if another instruction byte should be fetched. Once the instruction is fetched an operand or ALU result will follow. In the case of the ALU, either the H or L register can be selected as the input to the ALU, or in the case of 8-bit operations, both the H and L are used sequentially. A conditional option also exists where the operand fetch can be skipped based on the sign of the accumulator. If the condition is not met then the CPU will skip the second part of the cycle and fetch the next instruction immediately.

    If any of that made sense then you'll be aware of 4 possible sequences:

    • Single Cycle - conditional operand fetch where condition is not met (NOP)
    • 2-Cycle - Single byte instruction, fetch operand, or 4-bit ALU operation.
    • 3-Cycle - Two byte instruction, 4-bit ALU operation.
    • 4-Cycle - Two byte instruction, 8-bit ALU operation.

    This state can be handled with a 2-bit finite state machine. The op code is encoded using the top 3-bits of the first instruction byte (i7-5). Only the H register is used for the 2-cycle ALU operations and is also used when i6 is high in the 3-cycle operations. If i6 is low then the L register is used, and if i5 is low then the state machine will progress to a 4th cycle and use the H register as well. The NOP will keep the state machine in the initial instruction fetch state, which is also the reset vector.

    Even though this may sound complex, the state-transition diagram should be easy to follow:

    It takes two D flip flops and about a dozen gates to implement this state machine. There are two other simple state machines used as well: one is used to cache the sign of the accumulator (1-bit) and the other selects the boot page (1-bit). The boot page is needed to clear the page register after a reset. The page register is left in tri-state and pulled high to 0xFF on reset and left that way until a negative conditional register load is completed. This way a boot sequence can be executed on start up and cleared when an initialization loop ends. These require another two D flip flops and about a dozen more gates.
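    The sequencing described above can be modeled as a next-state function, sketched in Python below. The four named states stand in for the two flip-flops' encodings, which aren't given here. `skip` models a conditional load whose condition (the cached accumulator sign) isn't met. The sketch assumes the conditional load is a single-byte instruction, as LDC is in the 16-bit Instructions log.

```python
from enum import Enum

class State(Enum):
    FETCH1 = 0  # first instruction byte; also the reset state
    FETCH2 = 1  # second instruction byte (only when i7 is low)
    DATA1 = 2   # operand fetch or first ALU pass
    DATA2 = 3   # second ALU pass for 8-bit operations

def next_state(state: State, first_byte: int, skip: bool = False) -> State:
    """Next state given the first instruction byte; `skip` models an unmet condition."""
    i7 = (first_byte >> 7) & 1
    i6 = (first_byte >> 6) & 1
    i5 = (first_byte >> 5) & 1
    if state is State.FETCH1:
        if skip:
            return State.FETCH1              # condition not met: behaves as a NOP
        return State.DATA1 if i7 else State.FETCH2
    if state is State.FETCH2:
        return State.DATA1                   # ALU pass uses H if i6 else L
    if state is State.DATA1 and not (i7 or i6 or i5):
        return State.DATA2                   # 8-bit operation: second pass with H
    return State.FETCH1

def trace(first_byte: int, skip: bool = False) -> list:
    """States visited for one instruction, starting from FETCH1."""
    states = [State.FETCH1]
    while (s := next_state(states[-1], first_byte, skip)) is not State.FETCH1:
        states.append(s)
    return states

if __name__ == "__main__":
    for byte, skip in [(0x80, True), (0x80, False), (0x40, False), (0x00, False)]:
        print(hex(byte), skip, [s.name for s in trace(byte, skip)])
```

    The trace lengths reproduce the 1-, 2-, 3- and 4-cycle sequences listed above.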

    All this can be implemented with TTL chips and the initial design was able to fit this in about 6 chips. Another option was to use Programmable Array Logic (PAL). The requirements are simple enough to fit one of the original (circa 1978) PAL chips, the PAL16R4. This provides 8 inputs, 4 combinational logic outputs, and 4 registered outputs for the state machines.

    The additional combinational outputs are used to provide control signals derived from the state machines:

    • PA17 - (pin 19) A17 for the ROM
    • PA16 - (pin 18) A16 for the ROM
    • !prog - (pin 13) select program/page registers (active low)
    • pcent - (pin 12) Program Counter clock enable (active high)

    The registered outputs are assigned as follows:

    • sign - (pin 17)  sign of the accumulator after the last ALU operation
    • boot - (pin 16) boot mode (active high)
    • hsel - (pin...

  • Scan Lines and Flicker

    Alastair Hewitt, 08/13/2019 at 03:38

    Video testing would never be complete until a CRT was put into the mix. This whole exercise started with the assumption that LCDs would handle arbitrary video timing. How hard is it to do a simple linear interpolation and resample at the native resolution of the LCD display? Too hard apparently, or at least no one wanted to deal with the potential Nyquist-gone-bad corner case.

    Luckily my custom modes are close enough to existing standards to be supported by most reasonable LCD monitors. Older HD or newer UHD TVs not so much, but there is a relatively inexpensive converter box to cover these as well.

    It still left a gap though. There once was multi-sync magic where no one cared what resolution you wanted. You just generated as many or as few lines as you wanted. Just pick two frequencies for H and V and bingo, the mode would just work. Were CRTs just a myth?

    My last CRT monitor was jettisoned over a decade ago, so the hunt was on. I was able to find a retired graphics designer who was downsizing and letting go of a LaCie electron 19blue. This circa-2001 display uses a Mitsubishi Diamondtron CRT and supports a horizontal frequency range of 29-110 kHz, a vertical range of 49-140 Hz, and has a video bandwidth of at least 200 MHz.

    These specs are way beyond my requirements, so no issues handling whatever I can throw at it. The example below is the 38.4 kHz horizontal timing, but with a vertical timing of 120 Hz. The result is a 300-line display that shows off the scan lines nicely.

    120 Hz looks great, but going the other way exposes one long forgotten side-effect of CRTs: Flicker. The advantage, and in some cases disadvantage, of LCDs is the relatively long persistence of the image. CRT phosphors have almost no persistence and they rely on the brain's persistence of vision. The SVGA 60 Hz mode looked fine on the LCD but was almost painful to watch on the CRT.

    The VGA 75 Hz mode is a lot more tolerable and would definitely be the preferred choice for any text modes when displayed on a CRT. However, I'm leaning towards supporting both VGA and SVGA for all graphics modes. The 60 Hz SVGA provides the most lines of text for an LCD. If refresh is an issue then the 75 Hz VGA is available at the expense of just a few lines of text.

  • More Video Testing

    Alastair Hewitt, 08/10/2019 at 04:12

    The net was cast wide to capture additional problematic video displays. This chance discovery was abandoned by the side of the road, probably because the main digital board was faulty and the owner didn't want to pay to dispose of it correctly. The panel was still good even though it had survived at least one rain storm. Replacement parts are readily available on eBay and the repair was made.

    The result was a working 720p plasma TV from 2006 with both VGA and an original HDMI 1.0 input. It proved to be the most problematic of all displays so far. Not only was the 38.4 kHz horizontal scan frequency rejected (invalid input) but it could not accept a valid signal with a vertical frequency above 60 Hz (out of range).

    Meanwhile I ordered a VGA to HDMI converter/scaler. This is the more powerful and expensive option at $16 vs $7. This device (on the left) consumes the VGA signal and regenerates it at a selectable 720p or 1080p from the HDMI output. The cheaper version (on the right) just passes the signal through, converting the analog VGA to a digital HDMI signal with the original timing and resolution intact.

    This scaler had no problems with the 38.4 kHz horizontal and 75 Hz vertical timing. It can output the 720p HDMI signal at 60 Hz that the ancient recycled TV can accept. Not only that, it can output 1080p and provide a way to display 640x480 on the very latest UHD TV.

    So this provides an inexpensive get-out-of-jail-free option for displays that can't handle the 38.4 kHz and/or 75 Hz frequencies. This has only been an issue with older TVs though (pre-2010). All the computer monitors tested so far seem to be OK with the timing. They do have limitations, but sending an extra 12 lines per field is not causing problems.

    Shown below is a monitor that displays 640x480 or 800x600 when given the correct timings. It will switch to display the H and V frequencies when given the non-standard timing, but still displays it correctly at those resolutions.

    Now that all the bases are covered on the display front, work on the current design can continue.

  • Video Testing

    Alastair Hewitt, 08/05/2019 at 03:14

    Things inevitably slow down during the summer. Not much happened in July and it will probably stay that way until well into September. The project is still active though and the plan is to get a working PCB built by year end.

    The last two logs detailed some significant design changes. The GPU change is based on the assumption that both the horizontal scan and serial communication timing can be combined. The only frequency where this is possible is 38.4 kHz. Unfortunately, there are no defined video modes that use this frequency. The hope is that monitors will support some form of generalized timing.

    You would think a simple Google search would yield some answers to this question, but alas no. There is a standards body that was formed to define this stuff, the Video Electronics Standards Association (VESA). They created four standards of interest:

    1. Display Monitor Timing (DMT)
    2. Generalized Timing Formula (GTF)
    3. Coordinated Video Timings (CVT)
    4. EDID Timing - defined by the Extended Display Identification Data (EDID)

    The first document detailed a set of standard video modes. But this was during the era of multisync and multiscan monitors that could support an arbitrary range of video timings. So how arbitrary? VESA came up with the GTF to define these generalized timings. But they were too general and no one could support them. Then came CVT to put some constraints on things as well as introduce reduced blanking. Both GTF and CVT define generalized video timing that would support a video mode of interest.

    So do monitors support these generalized standards? Apparently not. The final standard (EDID) provides a way for the monitor to tell the graphics card that it does not support generalized timing and to list the few standard timings it does. This appears to be how everything works these days and only defined standard video modes are available unless you can dig up a 20-year old multisync CRT monitor.

    So how standard does standard have to be? There are a couple of standard video modes that come close:

    • 640 x 480 - 75 Hz vertical, 37.5 kHz horizontal, 31.5 MHz pixel frequency
    • 800 x 600 - 60 Hz vertical, 37.9 kHz horizontal, 40 MHz pixel frequency
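    The gap between these modes and the 38.4 kHz timing can be quantified. The Python sketch below uses the VESA DMT totals for the two modes (840 x 500 and 1056 x 628), which are brought in from outside this page. The 512 and 640 lines per field are the YATAC78 VGA and SVGA field lengths. The result matches the "extra 12 lines per field" mentioned in the More Video Testing log.

```python
from dataclasses import dataclass

LINE_RATE_HZ = 38_400

@dataclass(frozen=True)
class Mode:
    name: str
    pixel_clock_hz: int
    h_total: int       # pixels per line, including blanking (VESA DMT)
    v_total: int       # lines per field, including blanking (VESA DMT)
    yatac_lines: int   # lines per field used at 38.4 kHz

    @property
    def h_freq_hz(self) -> float:
        return self.pixel_clock_hz / self.h_total

    @property
    def extra_lines(self) -> int:
        return self.yatac_lines - self.v_total

    @property
    def h_deviation_pct(self) -> float:
        return (LINE_RATE_HZ / self.h_freq_hz - 1) * 100

MODES = [
    Mode("640x480 @ 75 Hz", 31_500_000, 840, 500, 512),
    Mode("800x600 @ 60 Hz", 40_000_000, 1056, 628, 640),
]

if __name__ == "__main__":
    for m in MODES:
        print(f"{m.name}: {m.h_freq_hz / 1e3:.2f} kHz standard, "
              f"+{m.h_deviation_pct:.2f}% at 38.4 kHz, {m.extra_lines} extra lines")
```

    In both cases the horizontal rate is only 1.4 to 2.4% above the standard, with the same field rate, which may help explain why most monitors accept it.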

    The ability to generate a usable video signal is a gating factor for the continuation of the new design, so this blocker had to be tested. The experiment modified Nick Gammon's Arduino sketch for generating a VGA signal. The timer settings were changed to generate the 38.4 kHz horizontal frequency, and both the 60 and 75 Hz modes were tested.

    The results were slightly mixed, but promising. Two older LCD TVs had very little tolerance for non-standard timings and would not sync. Newer LCD monitors and TVs had no trouble handling the higher frequency as can be seen below.

    [Note: the timing is slightly fast and the numbers are rounded up in the above images]

    The acid test was to see if a cheap VGA to HDMI converter could handle this timing. This is important since VGA support is in terminal decline. Almost no monitors and TVs now ship with a VGA input, so a VGA to HDMI converter may be required as part of the standard configuration.

    [Update] After additional testing it appears the cheaper VGA to HDMI converters are just simple analog-to-digital converters with a buffer and data serializer. This means whatever frequency you give them is just passed through from the VGA input to the HDMI output. The hope was that a device like this would convert the non-standard timing and support the more sensitive monitors. Additional testing confirmed that if the monitor does not support the non-standard timing via the VGA input, then it will not accept the timing via the HDMI converter.

    A panacea to this timing issue may still be available in the form of a VGA to HDMI converter/scaler. This is a more expensive device, but will actually consume the VGA input signal and then generate a 720p or 1080p...


  • 16-bit Instructions

    Alastair Hewitt, 06/24/2019 at 05:16

    The redesign continued to the ECU section. The changes are fairly significant, so much so that the breadboard build needs to start over. It was almost back to the drawing board, but the CPU section remains fairly intact. The result is another reduction in chip count to bring the TTL count down to 37 chips.

    The original design packed the instructions into just 8 bits. The instructions need to define the instruction type, ALU function, data source, and destination register. Packing so much into so little space placed a lot of limitations on the available instructions.

    The instructions do provide everything needed to write code, but actual programs were quickly exposing the instruction set limitations. Quite often additional instructions are needed to move the result from the ALU to the desired register. There is no specific move instruction, so this requires the ALU identity function to target the final register. This results in a lot of wasted cycles.

    The solution is to expand the size of the instructions. The ROM is still 8-bits wide, so this requires an additional cycle to load another byte to get to 16 bits. However, one bit of the first instruction byte can be used to tell the state machine if a second instruction byte should be loaded. This allows variable-length instructions, with a limited set of 7-bit instructions and a full set of 16-bit instructions.

    The new design is using instructions of this format:

    Destination is one of the 8 possible registers (Pg, PC, SC, V, HL, E, X, Y). Source is one of 4 data sources (A, X, E, RAM). There are 8 possible opcodes, with the following 6 defined:

    • LD - load operand (source not used)
    • LDC - load conditional (source defines condition)
    • MV - move source to destination
    • FNH - ALU unary function defined by H - destination = FNH(source)
    • FN4 - nibble-wide ALU binary function
    • FN8 - byte-wide ALU binary function

    The first 4 opcodes define 7-bit instructions. The most-significant bit of these is high and this stops the state machine from loading the second byte. The most-significant bit is also the output enable (active low) of the second instruction register, so the second register is tri-stated and the value pulled high to 0xFF. The ALU binary functions require all 16-bits, so the most-significant bit of the last two opcodes is low. Along with the 4-bit ALU function there are some additional bits as follows:

    The /WE bit (active low) enables a write on the RAM cycle and stores the result in memory. The ZP bit (active high) specifies the zero page when the RAM is addressed. In a similar way, the ZB bit (active high) specifies the zero bank (the memory bank that contains the display). The EXT bit (active low) specifies a set of extended registers and will switch the destination from the internal 8 registers to 8 possible external registers.

    The default values of the second instruction byte are therefore: internal registers, zero bank and page, memory read only, FNH ALU function. This means the RAM source comes from the zero page in the zero bank for the 7-bit instructions.
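    A decoder for this format might look like the Python sketch below. Only two things come from the text. The opcode sits in bits 7 to 5 of the first byte (per the CPU State Machine log), and the flag bits have the active levels given above. The format diagram isn't reproduced here, so everything else is an assumption:

    • the positions of the destination, source, ALU-function and flag fields
    • the index order of the register and source names

    The check on the defaults mirrors the paragraph above: a tri-stated second byte reads as 0xFF.

```python
from dataclasses import dataclass

DEST = ("Pg", "PC", "SC", "V", "HL", "E", "X", "Y")   # assumed index order
SOURCE = ("A", "X", "E", "RAM")                       # assumed index order

@dataclass(frozen=True)
class Instruction:
    opcode: int      # bits 7..5 of the first byte
    dest: str
    source: str
    short: bool      # MSB high: 7-bit instruction, no second byte fetched
    alu_fn: int
    write: bool      # /WE is active low
    zero_page: bool  # ZP is active high
    zero_bank: bool  # ZB is active high
    external: bool   # EXT is active low

def decode(first: int, second: int = 0xFF) -> Instruction:
    short = bool(first & 0x80)
    if short:
        second = 0xFF   # second register tri-stated and pulled high
    return Instruction(
        opcode=first >> 5,
        dest=DEST[(first >> 2) & 0x7],      # assumed: bits 4..2
        source=SOURCE[first & 0x3],         # assumed: bits 1..0
        short=short,
        alu_fn=second & 0x0F,               # assumed: low nibble (all ones = FNH default)
        write=not (second & 0x80),          # assumed flag positions: /WE, ZP, ZB, EXT
        zero_page=bool(second & 0x40),
        zero_bank=bool(second & 0x20),
        external=not (second & 0x10),
    )

if __name__ == "__main__":
    print(decode(0x93))        # 7-bit instruction: defaults from the pulled-up byte
    print(decode(0x18, 0x05))  # 16-bit instruction with explicit second byte
```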

    This instruction format now defines almost everything needed in a single instruction. This improves performance, even though some instructions require an additional cycle to load the second instruction byte. The new format needs about 60% of the cycles to run the same code. The simpler encoding also reduces the amount of logic in the ECU, giving a net reduction in chips even though a second instruction register was added.

    Next up is a new schematic, a complete redesign of the PAL, and a rebuild of the entire breadboard... so basically, back to square one :(

  • Yet Another Redesign

    Alastair Hewitt 06/20/2019 at 02:34

    In keeping with a lot of the previous logs, the recent GPU writeup is now obsolete. As predicted, better insight gained through the software design is driving hardware changes. In this case the GPU can be simplified by moving the slower counters to software.

    The original GPU design takes care of almost all video timing, so the CPU interpreter loop can synchronize with a standard baud rate. If the baud rate can be aligned with the display, then the same timing overhead can be used for both serial and video. This can be achieved by using a horizontal scan frequency of 38.4 kHz and a baud rate of 38400 (or dividing down to 19200, 9600, etc.).
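
    With the scan rate equal to the fastest baud rate, each serial bit spans a whole number of scan lines, so a per-line loop can presumably service one bit at a fixed point in its timing. A quick arithmetic check (scanlines_per_bit is an illustrative name, not from the project):

```python
H_FREQ = 38_400  # horizontal scan frequency in Hz, from the log


def scanlines_per_bit(baud: int) -> int:
    """How many scan lines one serial bit lasts when the baud rate divides H_FREQ."""
    if H_FREQ % baud:
        raise ValueError(f"{baud} baud is not a divisor of {H_FREQ} Hz")
    return H_FREQ // baud


if __name__ == "__main__":
    for baud in (38_400, 19_200, 9_600):
        print(baud, "baud ->", scanlines_per_bit(baud), "scan line(s) per bit")
```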

    Running the VESA GTF against this for a standard vertical frequency gives the following mode line:

    # 640x490 @ 75.00 Hz (GTF) hsync: 38.40 kHz; pclk: 31.95 MHz
    Modeline "640x490_75.00"  31.95  640 672 736 832  490 491 494 512  -HSync +Vsync

    The exact dot clock is 31.9488 MHz, and this is available as a 31.95 MHz crystal. The mode defines 490 lines, but only 480 would be displayed, with 5 additional vertical blanking lines at the top and bottom of the screen.
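
    The modeline values are consistent with each other, which a few lines of arithmetic confirm: the dot clock is the horizontal frequency times the 832-dot line total, and the vertical rate is the horizontal frequency divided by the 512-line frame total.

```python
# Values from the GTF modeline above
H_TOTAL = 832        # total dots per scan line
V_DISPLAY = 490      # lines defined by the mode
V_TOTAL = 512        # total lines per frame
H_FREQ = 38_400      # target horizontal frequency, Hz
VISIBLE_LINES = 480

dot_clock_hz = H_FREQ * H_TOTAL                    # 31,948,800 Hz = 31.9488 MHz
v_freq_hz = H_FREQ / V_TOTAL                       # 75.0 Hz
border_lines = (V_DISPLAY - VISIBLE_LINES) // 2    # blank lines top and bottom

if __name__ == "__main__":
    print(f"dot clock {dot_clock_hz / 1e6:.4f} MHz, vertical {v_freq_hz:.2f} Hz, "
          f"{border_lines} blank lines top and bottom")
```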

    With this change, both the scan and vertical counters can be eliminated and replaced with registers that are updated on every horizontal scan line. An additional chip of gates is also eliminated, reducing the design by a total of 4 chips.
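
    A rough model of what software has to do instead: keep a line register, advance it once per scan line, and derive blanking and vertical sync from it. This is a Python sketch of the logic under stated assumptions, not the project's interpreter code. next_line and line_state are hypothetical names, and the sync window follows the usual modeline convention (active from the sync-start line up to, but not including, the sync-end line).

```python
from typing import Tuple

# Vertical timing from the modeline: display 490, sync 491-494, total 512
V_DISPLAY, V_SYNC_START, V_SYNC_END, V_TOTAL = 490, 491, 494, 512
BORDER = 5  # blank lines above and below the 480 visible lines


def next_line(line: int) -> int:
    """Software stand-in for the vertical counter, run once per horizontal scan."""
    return (line + 1) % V_TOTAL


def line_state(line: int) -> Tuple[bool, bool]:
    """Return (visible, vsync) for a scan line; +Vsync means sync is active high."""
    visible = BORDER <= line < V_DISPLAY - BORDER
    vsync = V_SYNC_START <= line < V_SYNC_END
    return visible, vsync


if __name__ == "__main__":
    line, visible_count, sync_count = 0, 0, 0
    for _ in range(V_TOTAL):           # one full frame
        visible, vsync = line_state(line)
        visible_count += visible
        sync_count += vsync
        line = next_line(line)
    print(visible_count, sync_count)   # 480 visible lines, 3 sync lines
```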

Discussions

Marcel van Kervinck wrote 08/18/2019 at 08:14

I wonder if your architecture would be classified as a barrel processor. Any thoughts on that? https://en.wikipedia.org/wiki/Barrel_processor

Alastair Hewitt wrote 08/18/2019 at 13:24

I was a bit generous when using the term "GPU". That part of the circuit is really a DMA controller running in transparent mode.

https://en.wikipedia.org/wiki/Direct_memory_access#Transparent_mode

The Harvard Architecture makes it fairly simple to implement since there are two address/data spaces. I'm able to use both concurrently with some pipelining. The same technique could be used to build a 2-core barrel processor. I assume you would have to replicate the CPU registers though.

Shranav Palakurthi wrote 05/15/2019 at 03:05

I want to see a retro computer with 128K RAM run JavaScript. (Will it support JavaScript?)

Alastair Hewitt wrote 05/15/2019 at 11:48

No plans to go anywhere near JavaScript! It would probably run out of memory just downloading a single JS file from a typical web page. There are some minimal JS engines like Espruino out there, but even those would use up all the ROM and leave no room for anything else.

Scott Devitt wrote 05/07/2019 at 13:12

I have one of those black cases and would love to get a few more. Any clue where from?

Alastair Hewitt wrote 05/07/2019 at 14:32

It's a Polycase ZN-40. You can buy them direct - https://www.polycase.com/zn-40

Scott Devitt wrote 05/07/2019 at 13:10

Kinda off target, but where did you find that black case? I have one and want a few more, but have no clue where to find it.

Marcel van Kervinck wrote 04/05/2019 at 16:23

When I was contemplating the ALU and other random control logic for what later became known as the Gigatron, for quite a while I considered abusing the 74x48 7-segment decoder to build an instruction set around. But it's a slow chip, and also I couldn't get the instruction set quite right. After that phase I realised I really needed a ROM, but ROMs are very slow and it wouldn't fit in the critical path of a 6-8 MHz design. So that's where the diode-ROM came in, because that's fast. Interestingly, that was exactly 2 years ago today: https://hackaday.io/project/20781-gigatron-ttl-microcomputer/log/56640-testing-a-bunch-of-diodes . What ROM speed are you planning to use?

Alastair Hewitt wrote 04/05/2019 at 18:58

Hi Marcel, thanks for your interest. The Gigatron is the main inspiration for this project, especially your work on generating VGA with TTL chips.

I read your article on using the diodes a few weeks ago. I was a bit worried discrete diodes wouldn’t switch fast enough, but it looks like this will work. I’m doing most of my instruction decode using discrete logic: This includes 8 chips of gates, 3 decoder chips, and 2 flip flop chips for state machines. There is one area where I decode 8 possible states and I plan to use a "diode ROM" for this.

Both the ROM and RAM are accessed at half the VGA dot clock (12.5875 MHz). I need to switch between three different contexts for the ROM address bus: program, ALU, and font bitmap. I have to determine what state I want next and then latch this so everything changes on a single clock edge. I don’t have time to determine the state after the clock edge because it takes up to 12ns to change the bus tri-state. This leaves me with just 65ns to access the ROM and then latch the result before the next context switch.

To deal with this timing issue I have to use memory with an access time of 55ns or better. The only ROM with this speed is one-time programmable. I’ll use this when I have code worthy of "shipping", but for now I’ll be doing development using NOR flash. The fastest DIP version is 70ns (e.g. GLS27SF020), so I’ll need to drop my clock speed a little. Worst case is a screen refresh at 50 Hz instead of 60 Hz during development.
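
The timing budget in this reply can be checked with a little arithmetic. This sketch assumes the memory clock scales linearly with the refresh rate and ignores latch setup time, which presumably accounts for the gap between the roughly 67 ns computed here and the 65 ns quoted; rom_window_ns is an illustrative name. At 60 Hz the window is shorter than a 70 ns flash part needs, while at 50 Hz it grows to about 83 ns.

```python
VGA_DOT_CLOCK_HZ = 25.175e6   # standard 640x480 @ 60 Hz dot clock
TRISTATE_NS = 12              # worst-case bus tri-state switch time quoted above


def rom_window_ns(refresh_hz: float) -> float:
    """Time left for ROM access in one memory cycle at half the dot clock.

    Assumes the whole clock scales linearly with the refresh rate.
    """
    mem_clock_hz = VGA_DOT_CLOCK_HZ / 2 * refresh_hz / 60
    return 1e9 / mem_clock_hz - TRISTATE_NS


if __name__ == "__main__":
    for hz in (60, 50):
        print(f"{hz} Hz: {rom_window_ns(hz):.1f} ns available")
```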

Marcel van Kervinck wrote 04/05/2019 at 20:57

Ah great. How about the references to a 128K ROM for ALU functions? I also saw a memory map of that, or is that "out" already? Anyway, take your time to reflect and document, if for no other reason than for yourself. I found those "boring documentation cleanup tasks" after a design frenzy helped to improve the end result. [BTW, this is probably a 3-level-deep post without a Reply button. Threading works best by going back 2 steps and replying from there...]

Alastair Hewitt wrote 04/06/2019 at 01:39

(jumping back 2 steps) The same ROM is used for both the program and the ALU. The CPU instructions take more than one cycle. For example: the first cycle reads the instruction from the ROM, the next cycle reads from the RAM, then the ROM is used as an ALU to perform a function, and finally the RAM can be written to. The ALU only handles one nibble at a time, so the last two cycles would be repeated to do a full 8-bit operation.
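
To illustrate the nibble-at-a-time idea, here is a sketch of a lookup-table adder. The address layout is hypothetical (carry-in in bit 8, operand nibbles in bits 7..4 and 3..0) and only ADD is modeled; a real ALU ROM would also need function-select address bits. build_add_table and add8 are illustrative names.

```python
from typing import Tuple


def build_add_table() -> bytes:
    """Hypothetical ALU ROM page for ADD: address = carry_in<<8 | a<<4 | b.

    Each stored value is a + b + carry_in (at most 31), so bit 4 holds the
    carry-out and bits 3..0 hold the sum nibble.
    """
    table = bytearray(512)
    for cin in range(2):
        for a in range(16):
            for b in range(16):
                table[(cin << 8) | (a << 4) | b] = a + b + cin
    return bytes(table)


ADD_ROM = build_add_table()


def add8(x: int, y: int) -> Tuple[int, int]:
    """8-bit add as two nibble lookups, low nibble first. Returns (result, carry_out)."""
    lo = ADD_ROM[(x & 0xF) << 4 | (y & 0xF)]                  # carry_in = 0
    hi = ADD_ROM[(lo >> 4) << 8 | (x >> 4) << 4 | (y >> 4)]   # carry from low nibble
    return ((hi & 0xF) << 4) | (lo & 0xF), hi >> 4


if __name__ == "__main__":
    print([hex(v) for v in add8(0x3A, 0x29)])
```

The second lookup is the "repeated" ALU cycle: the same table is reused with the high nibbles and the carry produced by the first pass.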

Marcel van Kervinck wrote 04/06/2019 at 09:47

Got it! Good luck with the build! One or two PCBs: both have their tradeoffs. The Gigatron is very sparsely populated with wide spacing. You might fit your design in a similar size, and the PCB costs aren't really that steep.

Alastair Hewitt wrote 05/31/2019 at 23:13

I finally ditched the diode ROM. I was able to juggle things around a bit and got it down to just 8 diodes configured as two 4-input AND gates. I decided to just add the additional chip and use a 74F21 instead. It's very fast, with a propagation delay (Tp) of just over 3 ns.

Geri wrote 03/08/2019 at 16:20

Hi, I'm following your projects and I'm impressed with your work, especially the SUBLEQ implementation. I suggest you try creating an FPGA-based implementation to run my operating system:

https://hackaday.io/project/158329-dawn-the-subleq-operating-system-by-geri

Running this operating system will put you in the next league, as this is a multitasking, multiwindowing, SMP-capable operating system, and building hardware that can run something like that makes a much bigger impression on followers. The example emulators are attached in the zip file to guide you in the process. Feel free to contact me by e-mail for information if you don't understand something.

Greetings

Geri

agp.cooper wrote 03/07/2019 at 01:11

Great computer specification! Perhaps you are aiming a little too high for ~30 TTL chips?

---

Have a look at some of the other TTL designs on Hackaday to get an idea of specifications and chip count. You may be disappointed by what others have achieved.

Have a look at the Apollo181 (http://apollo181.wixsite.com/apollo181/index), which has a 65 chip count and uses the 74181 ALU (yuck!), for an example of what can be done in 4 bit.

It's pretty impressive for 65 chips!

---

If you want something simpler (to get started) have a look at the TD4:

1) Breadboard version: https://www.youtube.com/watch?v=e0QCErIIOWA

2) ATMega 328p "ROM" version: https://www.youtube.com/watch?v=tKO3O2UY_7s

3) And a schematic: http://xyama.sakura.ne.jp/hp/4bitCPU_TD4.html

I have built the TD4 and have PCB designs on EasyEDA (https://easyeda.com/search?wd=td4b&indextype=projects), you can get them made and posted to you.

Regards AlanX

roelh wrote 03/06/2019 at 08:18

Hi Alastair! I'm looking forward to your schematics and instruction set... I have similar plans...
