
Kobold - retro TTL computer

A 16 bit computer with 20 address bits and video display, from just a few TTL and memory chips. Instructions resemble 68000 / PDP-11.

After having designed the One Square Inch TTL CPU, the moment has come for a new
minimal-parts computer that wants to be designed. The Kobold computer has chosen me to design him.

Its main characteristics are:
- fast 16 bit processor, 16 bit bus
- can access one Megabyte of memory
- 80 x 25 characters text, keyboard input
- full color mode with sprites
- sound
- onboard mass storage 32MByte
- I/O connectors with Arduino layout

Constraints are:
- low number of parts (TTL)
- no off-the-shelf processor or microcontroller
- no 74181 ALU

The drawing shows how the Kobold CPU (in SMD version) could be plugged into the mainboard of the Kobold computer system.


The end result will be available to everyone, either as completed product or as a kit.

PARTS

It will consist of the following parts: 

  • processor in the spirit of the #1 Square Inch TTL CPU  
  • a simple ALU (described here)
  • some address registers (described here)
  • at least 256 KByte RAM 
  • flash-ROM for booting 
  • VGA video
  • PS/2 keyboard input 
  • user-accessible I/O pins (footprint of Arduino shield)
  • a sound system
  • serial flash (16 or 32 MByte) for storing all your programs. 

VIDEO MODES

  • Mode 1,  640 x 480 bitmapped pixels, color. Gives at least 80 x 25 text capability.
  • Mode 2, 320 x 240 (or 320 x 480) bitmapped pixels, color, dual layer.

Mode 1 is described in this log.

Mode 2 has two layers, (foreground and background), enabling sprites (see this log).

Video mode 1 or 2 can be selected per line.

Support for fast vertical scrolling. 

number of colors: 128

At this moment, the idea is that the CPU and the video part will operate quite independently. The CPU might even be a separate pcb placed on top of a "motherboard" that holds the memory, video and other I/O.

INSTRUCTION SET

The instructions will be implemented in microcode (defined HERE). There will be two main instruction sets:

  • The native 16-bit Kobold instruction set has highest execution speed and very good code density. The instructions are similar to 68000 and PDP-11 instructions and can handle 8, 16 or 32 bit data size. The instructions can access 1 MByte of memory.
  • The K11 instruction set will be binary compatible with the PDP-11 instructions. It is kept in mind, but will only be given full attention after most other goals are reached. It might become a separate project.

The microcode can be easily reprogrammed with a Raspberry Pi as programmer (just as with the 1 Square Inch TTL CPU). The microcode Flash is large enough to accommodate a lot of microcode.

The CPU will probably be around 35 TTL parts and a microprogram Flash.

PROGRAMMING

A Raspberry Pi can be connected, that can read or write files from or to the serial flash of the Kobold.

When enough supporting software is in place, the programs for Kobold can be developed on the device itself, perhaps first in BASIC and later in C. 

PLAN

  • design the CPU
  • pcb design of CPU with thru-hole components
  • design the motherboard
  • pcb design of motherboard
  • some simulations
  • order boards, and assembly
  • hw fault finding
  • build an assembler and programming software
  • pcb design of a small Kobold CPU in SMD version
  • build or adapt a C compiler
  • make system software. Might adapt an OS.

This is a work in progress, the logs will show the design steps....

kobold.circ

File to use with LOGISIM for simulating the micro-instructions. Version 20190610.

- 819.93 kB - 05/26/2019 at 19:59

  • RESET Sequence

    roelh, 07/15/2019 at 13:20

    One thing in the CPU design was known to be wrong: the RESET function.

    The CPU receives an active-low RESET signal from the main board. It is activated at power-on and when the reset button is pressed.

    The reset signal will reset the UPC (micro-PC), so after a reset, the microcode will be executed from the microprogram store starting from address zero. At address zero, microcode will be present that sets the PC to a starting position (it might also do something more to start the CPU).

    So far so good.

    But the CPU has branch instructions. For forward branches, the 16-bit instruction code is 0000 0000 xxxx xxx0. An 8-bit addition to the PC would be everything that is needed, but in this architecture the lower 16 bits of the PC must be updated all at once (and we also like to have a carry from lsb to msb). The simplest microcode sequence will simply add the full 16-bit instruction code to the PC. This is only 4 cycles of microcode:  

    0000  add r,al,pc  ; lsb of sum to r
    0001  add a,ah,pc  ; msb of sum to b, lsb from r to b
    0002  ld pc,b; move to pc
    0003  ld ayu,(pc); go to next instruction
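
The arithmetic behind this sequence can be sketched in Python. This is only a model of the byte-wise addition with a carry from the low byte to the high byte; the function name `branch_target` is made up for illustration, and the sketch does not say at which point of the instruction the PC is sampled.

```python
def branch_target(pc: int, instr: int) -> int:
    """Add the full 16-bit instruction word to the PC, one byte at a time.

    Illustrative model only: the name and the step split are assumptions,
    not taken from the actual microcode.
    """
    pc &= 0xFFFF
    instr &= 0xFFFF
    # Step 1: add the low bytes; the low byte of the sum is kept aside.
    low_sum = (pc & 0xFF) + (instr & 0xFF)
    low_byte = low_sum & 0xFF
    carry = low_sum >> 8
    # Step 2: add the high bytes plus the carry out of the low byte.
    high_byte = ((pc >> 8) + (instr >> 8) + carry) & 0xFF
    # Step 3: both halves together form the new 16-bit PC.
    return (high_byte << 8) | low_byte


if __name__ == "__main__":
    # Forward branch: upper 8 bits of the instruction are zero.
    print(hex(branch_target(0x12F0, 0x0020)))
    # Backward branch: upper 8 bits are ones, so the sum wraps around.
    print(hex(branch_target(0x1234, 0xFFF0)))
```

Because the whole instruction word is added, a backward branch needs no special handling: the upper byte of ones acts as a sign extension.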

    Not all instruction bits are connected to the microcode storage: the three p and three q bits, that can select two registers for the instruction, are not connected. Instruction structure is: 

    rr--  -qqq  ----  ppp-

    Where only the dashes and r bits are connected to the microstore address.

    You see the problem here: the microcode cannot distinguish a reset from a branch with a distance of 00001110 or less!
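
To make the collision concrete, here is a small Python sketch that derives a mask from the instruction structure above and applies it to a few branch codes. The names `CONNECTED_MASK` and `microstore_bits` are made up for illustration; only the bit pattern itself comes from the text.

```python
# Instruction structure "rr-- -qqq ---- ppp-" written without spaces.
# 'r' and '-' bits reach the microstore address, 'p' and 'q' bits do not.
PATTERN = "rr---qqq----ppp-"

# Build a 16-bit mask with a 1 for every connected bit.
CONNECTED_MASK = int("".join("0" if c in "pq" else "1" for c in PATTERN), 2)


def microstore_bits(instr: int) -> int:
    """Return only the instruction bits that the microstore can see."""
    return instr & CONNECTED_MASK


if __name__ == "__main__":
    print(hex(CONNECTED_MASK))
    # Forward branches with distance 0000 0000 .. 0000 1110 (even values):
    for distance in range(0, 0x10, 2):
        print(f"{distance:08b} -> {microstore_bits(distance):016b}")
    # The first distance that does reach the microstore:
    print(f"{0x10:08b} -> {microstore_bits(0x10):016b}")
```

For every distance up to 00001110 the visible bits are all zero, which is the same microstore address that is used after a reset.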

    We want to fix this, without adding extra parts to the design (the design has already drifted away from the minimal-parts goal).

    A solution was found. We use one of the condition bits, that are also connected to the address of the microprogram. There are three condition bits:

    •  carry
    •  input
    •  interrupt

    The input bit comes from a flipflop that is loaded from a multiplexer on the main board, which can connect the input signal to several sources. For instructions in group 0 (the instruction group of the forward branch), the input flipflop by default gets the value of address line A0 when an ALU instruction is executed. (The microcode uses this to see whether a byte-memory access is for the higher or lower 8 bits in a 16-bit word.)

    We use this input flipflop to solve the problem. In the branch instruction, both ADD instructions will have A0 low (because the PC is even and so is the jump distance), so the input bit will be low.

    When a new instruction is loaded, we will reset the input flipflop. It will stay low during the branch (as just explained).

    But during system reset, the flipflop will be preset to one. The ADD instructions in the branch will be made conditional, to execute only when the input flipflop is low, so there are no ALU operations done, and the flipflop stays one.

    Now the final 'go to next instruction' is made conditional as well: it executes only when the flipflop is low, so a branch completes normally. When the flipflop is one, 'go to next instruction' is not executed, and the following microinstructions perform the reset sequence.

    Sidenote: The 'go to next instruction' was already conditional, because this is the moment in every instruction that the interrupt signal is checked, and special action is taken if the interrupt is active. In our case of the branch instruction, this means that it becomes a kind of three-way jump (during reset, an interrupt cannot occur).
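
The resulting three-way decision can be sketched as a small Python model. This is a behavioural sketch only, not the microcode: the function names and the returned strings are made up, and the order of the checks is my reading of the description above.

```python
def input_flipflop_at_start(after_system_reset: bool) -> int:
    """State of the input flipflop when the microcode at address zero runs.

    System reset presets the flipflop to one; loading a new instruction
    (such as a short forward branch) resets it to zero.
    """
    return 1 if after_system_reset else 0


def end_of_branch(input_ff: int, interrupt: int) -> str:
    """Three-way jump at the final 'go to next instruction' step."""
    if input_ff:
        # Flipflop still one: this was a reset, not a branch.
        return "reset sequence"
    if interrupt:
        # Normal end-of-instruction interrupt check.
        return "interrupt handling"
    return "next instruction"


if __name__ == "__main__":
    print(end_of_branch(input_flipflop_at_start(True), 0))
    print(end_of_branch(input_flipflop_at_start(False), 0))
    print(end_of_branch(input_flipflop_at_start(False), 1))
```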

    The schematics and other logs will soon be updated with this change [done 20190723, also updated schematics].

  • Main board schematic

    roelh, 06/15/2019 at 20:17

    Today I finished the schematic of the main computer board (the first version, I guess...). You can find it in the file section, and the most important parts will be discussed here.

    SYSTEM INPUTS AND OUTPUTS

    The 24 system outputs can be directly controlled by microcode instructions. [updated 20190723] They are used for:

    • video control
    • SPI bus
    • selecting shift modes for the B register
    • enabling upper- or lower byte of video- or main memory
    • selecting a system input, together with OP6 (instruction bit).

    The system inputs can select from 8 input signals. Normally, IN_SEL0 and IN_SEL1 are zero, and the instruction bit OP6 selects between NON-ZERO condition or D0. So, instructions that check for zero must have bit 6 one. Instructions that check the lowest address bit A0, for enabling high or low byte, must have bit 6 zero.

    For selecting keyboard or mouse signals, IN_SEL0 and IN_SEL1 must be written first. It is expected that polling these signals in the video-line interrupt will be just (barely) fast enough to read them.

    VIDEO ADDRESS GENERATION

    Next comes the video address generation. There are two situations:

    1. The video system is in control and reads the pixel data from memory. CPU_ACCESS/ is not active, VDU_ACCESS/ is active (low). Address bits A1 - A8 come from the pixel counter (changing every 160nS). Address bit A0 comes from the VDU_A1 signal from the timing generator (changing every 80nS). Address bits A9-A17 come from the 9-bit register formed by U4 and U19B. The NAND gate will generate an interrupt for the CPU when the count is 160 (128 + 32). The interrupt will end when the count is 192 (128+64). The end of the interrupt is important, because that is used by the CPU to exactly synchronize to the video signal, in order to generate exact horizontal sync signals and resetting the pixel counter on time.
    2. The CPU has control and can write pixel data to the video memory. CPU_ACCESS/ is active, VDU_ACCESS/ is not active. The output of the pixel generator is now disabled, and the address bits A0-A8 are now delivered by the buffers U6 and U17A. Note that the pixel counter keeps running. The video memory output is disabled.
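
As a rough illustration of situation 1, the address composition and the interrupt window can be modelled in Python. The function names are made up, and the interrupt condition assumes that the NAND gate looks only at the counter bits worth 128 and 32, which is what the numbers 160 and 192 suggest but is not confirmed here.

```python
def vdu_address(line: int, pixel_count: int, vdu_a1: int) -> int:
    """Compose the 18-bit video memory address A17..A0.

    A0      <- VDU_A1 from the timing generator
    A1..A8  <- 8-bit pixel counter
    A9..A17 <- 9-bit line register (U4 and U19B)
    """
    return ((line & 0x1FF) << 9) | ((pixel_count & 0xFF) << 1) | (vdu_a1 & 1)


def line_interrupt_active(pixel_count: int) -> bool:
    """ASSUMPTION: the interrupt is active while counter bits 7 and 5 are both set."""
    return (pixel_count & 0xA0) == 0xA0


if __name__ == "__main__":
    print(hex(vdu_address(line=1, pixel_count=0, vdu_a1=0)))
    active = [c for c in range(200) if line_interrupt_active(c)]
    print(active[0], active[-1])
```

With a counter that runs from 0 to 199, this condition is true from count 160 up to and including 191, so the interrupt ends exactly when the count reaches 192.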

    VIDEO DATA HANDLING

    When the VDU is in control (pixels read from memory), two pixels are read every 80nS and delivered in the registers U9 and U10. Seven outputs of the registers are connected to the resistor-based D/A converters that generate the voltages for the RGB signals. The timing section will determine if U9 or U10 delivers its data to the D/A converters. 

    • In HIRES mode, both pixels will be sequenced, so each pixel will be visible for 40 nS.
    • In DUAL LAYER mode, bit 7 of pixel two determines which pixel will be displayed during this 80 nS cycle.
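
The two cases can be sketched in Python as follows. The polarity of bit 7 is an assumption (here: bit 7 of pixel two set means "show pixel one instead"), and the function name is made up for illustration.

```python
def displayed_pixels(pixel_one: int, pixel_two: int, dual_layer: bool):
    """Return the two 40 ns pixel values shown during one 80 ns cycle."""
    if not dual_layer:
        # HIRES: both pixels are shown one after the other.
        return (pixel_one, pixel_two)
    # DUAL LAYER: one pixel is shown for the whole 80 ns cycle.
    # ASSUMPTION: bit 7 of pixel two set selects pixel one.
    chosen = pixel_one if pixel_two & 0x80 else pixel_two
    return (chosen, chosen)


if __name__ == "__main__":
    print(displayed_pixels(0x11, 0x22, dual_layer=False))
    print(displayed_pixels(0x11, 0x22, dual_layer=True))
    print(displayed_pixels(0x11, 0x80, dual_layer=True))
```

Since only seven register outputs go to the D/A converters, bit 7 is free to act as the selector without costing a color bit.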

    When the CPU is in control, it can write its data through the buffers U7 and U8 to the memory. The control section will in this case not enable the PIXEL_CLK, so the video output keeps displaying the same pixel (black).

    TIMING SECTION

    The timing source is at the upper right. This should be a 25.175MHz oscillator, but that is only available in 3V3 version, so I took a 25MHz 5V oscillator and hope the monitor will handle it.

    The two flipflops below the oscillator divide the clock down to 12.5 and 6.25 MHz. The NAND gate generates the 6.25 MHz CPU clock, which is active (low) only 1/4 of the time. This is needed because the CPU contains some latches instead of flipflops, and we must be sure that the correct latch is selected before the clock goes low. The asymmetric clock gives more time to select the correct latch.
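
A quick Python sketch of this divider shows where the 1/4 duty cycle comes from. It assumes that the NAND gate combines the two divided clocks; the schematic may use different (for example inverted) flipflop outputs, and the function names are made up.

```python
OSC_HZ = 25_000_000  # 25 MHz oscillator used on the main board


def cpu_clock_levels(osc_cycles: int):
    """CPU clock level for each oscillator cycle (0 = active low)."""
    levels = []
    for t in range(osc_cycles):
        q0 = t & 1          # first flipflop: 12.5 MHz
        q1 = (t >> 1) & 1   # second flipflop: 6.25 MHz
        # ASSUMPTION: NAND of both divided clocks, low only when both are high.
        levels.append(0 if (q0 and q1) else 1)
    return levels


def period_ns(divide_by: int) -> float:
    """Period in nanoseconds of the oscillator divided by `divide_by`."""
    return divide_by * 1e9 / OSC_HZ


if __name__ == "__main__":
    print(cpu_clock_levels(8))
    print(period_ns(1), period_ns(2), period_ns(4))
```

The clock is low during one oscillator cycle out of four, and one CPU cycle lasts as long as four 40 ns pixels.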

    The two 8-bit registers control the video section. The CPU can write to the video memory when both are disabled. Only one should be enabled at a time.

    When the upper register is enabled, control signals for HIRES mode are generated.

    When the lower register is enabled, control signals for DUAL...


  • Instruction set !

    roelh, 06/10/2019 at 08:32

    This is a first design for the instruction set. In the final computer, the instruction set could be changed any moment by putting a new microcode in the Flash microstore.

    This instruction set borrows from the 68000, PDP-11 and SPARC !

    Here are the registers:

    The register set R0 - R7 has general registers, they can be used as data or address registers, comparable to the registers in the PDP-11, but in this case the registers can also contain 32 bit data. They are located in memory. There can be many of these register sets, and the WP register points to the current register set. A called function does not have to push registers, it simply changes WP to get a fresh set of registers.
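
The effect of WP can be sketched with a toy Python model. Everything concrete in it is hypothetical: the 4-byte spacing per register, the 32-byte window size, the direction in which WP moves on a call and the start address are placeholders chosen only to make the example run.

```python
class RegisterWindows:
    """Toy model of memory-resident registers R0-R7 addressed via WP."""

    REG_BYTES = 4       # hypothetical: room for 32-bit data per register
    WINDOW_BYTES = 32   # hypothetical: 8 registers of 4 bytes

    def __init__(self, wp: int = 0x1000):
        self.mem = {}   # sparse memory: address -> value
        self.wp = wp    # workspace pointer to the current register set

    def _addr(self, n: int) -> int:
        return self.wp + self.REG_BYTES * n

    def read(self, n: int) -> int:
        return self.mem.get(self._addr(n), 0)

    def write(self, n: int, value: int) -> None:
        self.mem[self._addr(n)] = value & 0xFFFFFFFF

    def call(self) -> None:
        # A called function gets a fresh register set by moving WP.
        self.wp += self.WINDOW_BYTES

    def ret(self) -> None:
        # Returning restores the caller's register set, nothing is popped.
        self.wp -= self.WINDOW_BYTES


if __name__ == "__main__":
    regs = RegisterWindows()
    regs.write(3, 42)       # caller puts 42 in R3
    regs.call()
    regs.write(3, 7)        # callee uses its own R3
    regs.ret()
    print(regs.read(3))     # caller's R3 is untouched
```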

    The registers A0 - A7 are global address registers, located at a fixed position in memory. Each of these registers can point to a structure (of 8 words max). The elements of such a structure can be read or written with a single instruction.

    All instructions are 16 bit, as follows:

    And the overview of instructions:

    A few remarks:

    • All addressing is within 32KWords, except for the "MOV far" instructions that address the full 1 MByte.
    • CALLS, JUMPs and RET are within 32KWord for word size, and use full 1 MByte destination for long size.
    • On this level, there are no flags. Conditionals are done with compare-and-skip instructions. The skipped instruction can of course be a branch (with 7 bit value). Skipped instructions should not have immediate values following them.
    • Branches forward have the upper 8 bits zero, and branches backward have the upper 8 bits ones. A branch is done by simply adding the 16-bit instruction to the PC. 
    • Return instructions include a function-return value (an 8-bit immediate or any other value).
    • Moving an immediate value to memory or adding a register directly to memory is supported. For 4-bit values, this moving is a single 16 bit instruction that includes the 4-bit value.
    • The "indexed" addressing mode will add two registers to calculate a source or destination address, then do the requested operation, all in a single 16 bit instruction.
    • In byte-size mode, the (R) or (R+) addressing modes access bytes in the lower or upper half of a memory word (depending on the lowest address bit), to access arrays of bytes.
    • Unlike the 68000 and PDP-11, there are no memory-to-memory instructions. These might be emulated by letting the assembler generate two instructions.

    It is expected that the "pointer + displacement" address mode will be used a lot. Note that the displacement is OR-ed to the address, so the address needs to be properly aligned. The following picture illustrates this addressing method.
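
In code form, the consequence of OR-ing instead of adding can be sketched like this. The alignment size of 16 bytes is an assumption derived from the "8 words max" structure size mentioned earlier, and the function names are made up.

```python
STRUCT_BYTES = 16  # ASSUMPTION: 8 words of 2 bytes, so 16-byte alignment


def is_aligned(pointer: int) -> bool:
    """True when the low address bits are free for the displacement."""
    return pointer % STRUCT_BYTES == 0


def effective_address(pointer: int, displacement: int) -> int:
    """Pointer + displacement as the hardware does it: a bitwise OR."""
    return pointer | displacement


if __name__ == "__main__":
    # Aligned pointer: OR gives the same result as an addition.
    print(hex(effective_address(0x1230, 6)), hex(0x1230 + 6))
    # Misaligned pointer: OR and addition disagree.
    print(hex(effective_address(0x1234, 6)), hex(0x1234 + 6))
```

For an aligned pointer the low bits are zero, so no carry can occur and OR is equivalent to ADD; that is what makes the cheaper OR usable here.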

    This is a rather ambitious instruction set. Perhaps it will be simplified if it is too much work to implement.

    I am open to suggestions !

    PS Several decoding details changed today (20190629). 

  • Have Sprites !

    roelh, 06/06/2019 at 11:30

    The current plan is that Kobold has a bitmapped, multicolor display. It has 640 visible pixels per line, with 256 colors per pixel.

    But we want more.

    Suppose we want to program a game, where the hero (let's call him Mario) has to move in front of a background picture. Making the hero move involves a lot of software actions:

    1. to remove him from the current position, redraw the background at the hero position
    2. the hero is represented by a rectangular bitmap where several positions have 'transparent' color. The transparent positions must be filled with correct background pixels of the new position
    3. the result of step 2 must be copied to the screen memory at the new position.

    The early video games had a nice solution to this. At the positions that the hero could reach, the background was mostly black. This makes it really easy to remove the hero (just overwrite with zeros) and to place it at a new position. To make this realistic, the games were in a dark setting (in a cave or in outer space). Examples are:

    • Nodes of Yesod
    • Prince of Persia  (Apple II 6502 source code available on the www)

    For Kobold, we want to move Mario in front of a real background, without too many software actions.

    This can be done with Sprites (Wikipedia), objects that can be shown on screen, appearing in front of the background. They can be moved without a lot of software work. 

    This is a picture of a sprite in the ancient TMS9918 video processor (used in TI-99/4A and MSX home computers):

    The spirit of the Kobold project does not allow using a TMS9918 or other special video control chip. Simple TTL chips should be used. But real sprites will need a lot of TTL, which is also against the Kobold philosophy.

    A solution (or call it compromise) has been found. There will be two modes (The mode can be set differently for each scan line):

    1. Hires mode, 640 pixels per line. Each memory word in video memory has two consecutive 8-bit (256 color) pixels.
    2. Dual layer mode, 320 pixels per line. Each memory word in video memory has two 8-bit pixels, one for the background layer and one for the foreground layer.

    In dual layer mode, there will be a special bit or bit combination in the foreground pixel byte meaning 'transparent'. If a foreground pixel is transparent, the corresponding background pixel will be shown. If it is not transparent, the foreground pixel will be shown.

    Although moving the foreground objects still involves software, moving Mario is simply painting a rectangle at the old foreground position with 'transparent', and copying the hero's bitmap to the new foreground position.
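
The two software steps can be sketched in Python on a small foreground layer held as a list of rows. The value used for `TRANSPARENT` is a placeholder (the actual bit or bit combination is not fixed here), and `move_sprite` is a made-up name.

```python
TRANSPARENT = 0x80  # hypothetical code for a transparent foreground pixel


def move_sprite(foreground, bitmap, old_xy, new_xy):
    """Erase the sprite at old_xy and draw it at new_xy, in place.

    foreground: list of rows with foreground-layer pixel values
    bitmap:     list of rows with the sprite pixels (may contain TRANSPARENT)
    """
    height, width = len(bitmap), len(bitmap[0])
    old_x, old_y = old_xy
    new_x, new_y = new_xy
    # Step 1: paint the old rectangle with 'transparent'.
    for y in range(height):
        for x in range(width):
            foreground[old_y + y][old_x + x] = TRANSPARENT
    # Step 2: copy the bitmap to the new position.
    for y in range(height):
        for x in range(width):
            foreground[new_y + y][new_x + x] = bitmap[y][x]


if __name__ == "__main__":
    layer = [[TRANSPARENT] * 6 for _ in range(4)]
    hero = [[1, 2], [3, 4]]
    move_sprite(layer, hero, (0, 0), (0, 0))   # initial placement
    move_sprite(layer, hero, (0, 0), (2, 1))   # move right and down
    for row in layer:
        print(row)
```

The background layer is never touched, which is the whole point of the dual layer mode.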

    Perhaps only a single extra TTL chip is needed for this dual layer mode. But I also have to increase the number of available control signals... I found space in the microinstructions to control 8 more output bits, now having 24 of them, and updated the description of the microinstructions.

    Writing pixels to video memory will be quite fast because it will be supported by special microcode.  

  • CPU schematic complete

    roelh, 05/31/2019 at 16:59

    A new option was added to the microinstructions, to enable powerful instructions of the form:

    MOV 6(R4),R2

    (R4 points to a struct. The contents of the 6th field of the struct is copied to R2.) A previous log was updated to show the new microinstruction.

    The CPU schematic is now thought to be complete, the new version is in the files section now. The pcb design is also complete:

    And here is a picture of the traces (Clicking on a picture will give more details) :

    Dimensions of the pcb are 5.2 x 4 inch. 

    The work will now continue with designing a first version of the main board.

  • Microcode simulation

    roelh, 05/26/2019 at 20:15

    This weekend I simulated the microcode in Logisim (V2.7.1). A few things didn't work, so the schematic changed a little. The files in the file section are updated.

    Some things were added in microcode:

    • Micro-instruction to clear Y register (overlaps with bit set/clr). For easy access to a fixed region (at address zero), for constants or for variables for the video interrupt.
    • Micro-instruction to clear upper byte of B register (overlap with other bit set/clr). Now we can move an immediate byte to a word register (Rn) with the upper half of the word set to zero, in a single 16 bit instruction.

    The microinstructions are simple, but a sequence of them will be quite capable. It will be possible to add the 'long' instruction type to our PDP-11 lookalike instructions, so 32 bit MOV, ADD (and more) can be handled ! 

  • New Microcode instructions !

    roelh, 05/21/2019 at 09:45

    The past weeks were spent designing the CPU schematic, and optimizing the available microcode instructions. The CPU schematic is now in the file section.

    As a reminder, here are the hardware registers:

    The registers A and B are always loaded at the same time (with the same data). The B register contents can be stored in memory (and can also be shifted left or right). The A register is connected to the input of the ALU.

    The UPC (micro-programcounter) register will contain the 'user' instruction. Two 3-bit fields <ppp> and <qqq> can be used to select a source- and destination register with base address WP. The 10 remaining bits select an address in the microcode storage. Two of the remaining bits (the <rr> bits) can also select a displacement for memory access with pointer X or Y.

    There are 3 classes of microcode instructions. The first one is register load/store:

    The user registers R0-R7 are addressed as (WP+displacement). For the displacement, the <ppp> or <qqq> register field in UPC can be chosen, these will be at addresses 0,4,6,8,A,C,E. The microinstruction has the option to add 2 to the displacement. This makes it possible to address 32 bit data (in memory or WP-based registers) in two 16-bit chunks, or address 16 word-sized registers instead of 8.

    Note that when the instruction is loaded into UPC, it is also loaded in AB and Y. Why? When the instruction contains immediate data, it is directly available to the ALU to do calculations. If it is an immediate byte load, the byte is already in the B register and can directly be moved to the destination.

    The destination register cannot be the same as the pointer, because the pointer register is a latch and not an edge-clocked flipflop.

    The RL register is used to temporarily store the result of the ALU for low bytes. When the ALU calculates the high byte, the high byte will be combined with the low byte that was stored in RL, and the 16 bit result will be stored at the destination.

    The PC can also be input to the ALU. This is useful for adding a constant to the PC (branch) or for obtaining the return address when doing a CALL instruction.

    All microinstructions can be executed conditionally depending on carry, zero or other conditions (Actually, at each microstep a certain microinstruction is selected depending on conditions).

    There are 24 output signals that can be individually set or cleared by microcode. These are used for:

    • selecting the condition for conditional execution
    • put the B register in a mode for shift left or shift right
    • write only to upper or lower half of a memory word (for byte writes)
    • video signals, like pixel count reset, sync outputs, video mode
    • several other I/O

    [edit: instruction codes updated 20190629]

  • First PCB routing

    roelh, 04/30/2019 at 19:56

    Just a quick test to see how it would fit on a PCB and if it would be difficult to route. This is the CPU only. It is connected to the mainboard by the two 2 x 18 headers. Bypass caps are not yet placed.

    Routing was easy, the autorouter did it within a minute ! That is, after experimenting with placement a few hours. The size is about 100 x 150 mm.

  • Schematic of the new ALU

    roelh, 04/27/2019 at 18:53

    The biggest parts of the schematic of Kobold are almost completed. Here you see:

    • Register B, that holds 16 bit data, and can shift the data left or right, and put it back on the databus
    • Register A, that holds 16 bit data and can put a single 8-bit chunk on the address bus. The other 8 bits of the address bus will be provided by an address register (not shown here). During ALU operation, the address bus will not be used to provide an address to memory.
    • The logic unit performs a NAND on the two 8-bit halves of the address bus
    • The Adder adds the two 8-bit halves of the address bus
    • The function selector selects the NAND or the ADD as result. Microcode bit IR4 is used to select the function.
    • Finally, the 8-bit result goes to the lower or upper 8 bits of the result register
    • The result register can put the 16-bit result on the databus
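
A behavioural Python sketch of this datapath is given below. It models only the 8-bit NAND/ADD selection and the assembly of a 16-bit result from two passes; the polarity of IR4 is not modelled (a plain boolean is used instead), and building NOT, AND and OR out of NAND is standard logic shown for illustration, not necessarily how the microcode does it.

```python
def alu8(a: int, b: int, add_select: bool, carry_in: int = 0):
    """One 8-bit ALU pass: returns (result, carry_out)."""
    if add_select:
        total = (a & 0xFF) + (b & 0xFF) + carry_in
        return total & 0xFF, total >> 8
    # Logic unit: bitwise NAND of the two 8-bit inputs.
    return ~(a & b) & 0xFF, 0


def alu16(a: int, b: int, add_select: bool) -> int:
    """Two passes: low byte first, then high byte with the carry."""
    low, carry = alu8(a & 0xFF, b & 0xFF, add_select)
    high, _ = alu8((a >> 8) & 0xFF, (b >> 8) & 0xFF, add_select, carry)
    return (high << 8) | low


def not16(a: int) -> int:
    return alu16(a, a, add_select=False)


def and16(a: int, b: int) -> int:
    return not16(alu16(a, b, add_select=False))


def or16(a: int, b: int) -> int:
    return alu16(not16(a), not16(b), add_select=False)


if __name__ == "__main__":
    print(hex(alu16(0x12F0, 0x0020, add_select=True)))
    print(hex(not16(0x00FF)), hex(and16(0x0F0F, 0x00FF)), hex(or16(0x0F00, 0x00F0)))
```

With just NAND and ADD, the other common logic operations cost extra passes rather than extra chips, which fits the constraint of not using a 74181.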

  • Plan for video generation

    roelh, 04/26/2019 at 19:05

    Trying to have the hardware simple, and keeping in mind that 'large' SRAMs are easy to obtain, the choice was made to have just a single basic video mode.

    That is, 640 x 480 pixels, with 256 colors per pixel. The pixel time for VGA is 40nSec.

    The hardware needs assistance from software in order to operate correctly.

    The pixel counter will increment every 80nS, so the video RAM will deliver two new pixel values every 80nS. The 8 bit color values go to the 'first pixel' and 'second pixel' registers. During the next 80nS, each of the outputs of these registers will be enabled for 40 nS.

    When the pixel counter has reached a certain value, an interrupt will be given to the CPU. In the interrupt, the CPU will:

    • obtain exact synchronization with the pixel counter
    • when the pixel counter is exactly 199, reset the pixel counter to zero, to obtain 32 uSec line duration
    • start the horizontal sync signal
    • obtain access to the video RAM, disabling the pixel counter and the updating of the two pixel color registers, and enabling the pixel(X) register. It can now, after setting the correct pixel address in the line and pixel registers, write new data to the video RAM. The line and pixel registers may be latches, so the data can be written in the same cycle together with the address. Several locations may be written, depending on the available time during blanking.
    • set the line (Y) register to the starting point of data for the next line
    • when the end of a frame has been reached, do functions for frame synchronisation
    • stop the access to video RAM, and stop the horizontal sync signal
    • end the interrupt

    The sequence of actions might be a little bit different than listed here. 

    The actions might be done by microcode, and the 16 bit processor can deliver 16 bit at a time, so this will be fast enough for most operations. If there are applications (games) that need higher speed, resolution could be dropped to 320 x 240 (to be determined). [ edit: Sprites were added to the video system, see here]

    Since the starting point of a line is under software control, it will be easy to do fast vertical scrolling. A clear screen will also go fast, because only a single cleared line has to be present, and all other lines can point to the same cleared line.
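
This idea can be sketched in Python with a per-line table of start addresses. Keeping such a table in software is an assumption about how the interrupt routine could be organised; the names and the example line size are made up.

```python
LINE_BYTES = 0x400  # hypothetical distance between stored lines


def make_line_table(visible_lines: int, base: int = 0):
    """Start address of the pixel data for every visible scan line."""
    return [base + n * LINE_BYTES for n in range(visible_lines)]


def scroll_up(line_starts, lines: int = 1):
    """Scroll by rotating the table; no pixel data is moved."""
    return line_starts[lines:] + line_starts[:lines]


def clear_screen(line_starts, cleared_line_start: int):
    """Point every scan line at one single cleared line."""
    return [cleared_line_start] * len(line_starts)


if __name__ == "__main__":
    table = make_line_table(4)
    print([hex(a) for a in table])
    print([hex(a) for a in scroll_up(table)])
    print([hex(a) for a in clear_screen(table, 0x8000)])
```

Scrolling or clearing then costs one table update per line instead of rewriting 640 pixels per line.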


Discussions

Marcel van Kervinck wrote 2 days ago

Flipping through my notes of more than 2 years ago, today I discovered that we *did* briefly consider using a 16-bit register file with tri-state outputs: the 74172 !! But then we found a remark about its extinction on http://www.6502.org/users/dieter/tarch/tarch_2.htm and we moved in a different direction. It never occurred to me there might be later 74 numbers with a (somewhat) similar function. Cool find! While I now like that the resulting Gigatron has no state hidden in the TTL chips (everything can be probed), this is a bit subjective and the Kobold looks pretty impressive as well. Looking forward to its operation.


roelh wrote 14 hours ago

When selecting parts, I normally start at a big distributor like Mouser. If they don't list the part or have no stock, it's a no-go. If the part is cheap and they have thousands in stock, it's a green flag. Unfortunately they are quite expensive for very simple things, like connectors.


Marcel van Kervinck wrote 8 hours ago

Yeah. I wasn't aware of Mouser and friends at the time. I had no real idea of what I was getting into.


Marcel van Kervinck wrote 04/28/2019 at 12:21

I like where the Kobold design is going. I wasn't aware of the 74LS670 register file, and I'm positively surprised by its availability. Although it has few pins, I think I would have considered it too complex for the Gigatron objective had I been aware of it. It has a lot more transistors than the '181 for example. But not many pins, so that's cute.


bobricius wrote 04/14/2019 at 17:07

I am excited about this project.


roelh wrote 04/10/2019 at 12:48

Yes, you made it famous, thank you !
