
Kobold K2 - RISC TTL Computer

A 16 bit RISC computer with video display, from just a few TTL and memory chips.

The Kobold K2 CPU will be on a single pcb, constructed from TTL IC's.

Its main characteristics are:
- 16 bit processor, 16 bit databus
- 8 registers
- can access one Megabyte of memory
- no microcode
- every instruction executes in two cycles


To make it a complete computer, the K2 CPU will be connected to a mainboard, that will have:

- memory
- video system, 80 x 25 characters text
- video system, full color graphic mode with two layers
- sound
- onboard mass storage 32MByte
- I/O connectors

Constraints are:
- low number of parts (TTL)
- no off-the-shelf processor or microcontroller
- no 74181 ALU

For the CPU part, around 40 TTL IC's will be used.

MOTIVATION

After having worked several months on the first Kobold CPU, I got the feeling that it was going in the wrong direction. I was working on a Javascript assembler, and got tangled up in the microcode complexity. I also didn't like that so many parts were needed to decode the microcode. So I decided to make a huge change in the design. Here is Kobold K2!

So what will change ?
  - Microcode is not used any more, instructions will be RISC
  - Four new 16-bit data registers in hardware (now total 8 registers)
  - The 8-bit ALU will change to 16-bit ALU
  - All instructions need two cycles (fetch, execute) 

The Kobold K2 will be almost twice as fast, and its operation will be easier to explain.
The video system will stay mostly the same.


STRATEGY

Finding the balance between low number of parts and high functionality is one of the key aspects of TTL CPU design (at least, for me it is). I want to keep the part count low, but not to the extreme as in #1 Square Inch TTL CPU. The CPU part of the computer should fit on a single PCB.

To keep the control system simple, every instruction should need only a single execute cycle (after its fetch cycle). If the ALU were kept 8 bits wide, that would mean 2 instructions for many 16-bit actions (as in the Z80 or 6502), and that would slow down 16-bit operations. Therefore, the ALU is now 16 bit wide. I don't want to use the 74181 ALU, so to keep part count reasonable, the ALU has only a few functions. The small number of functions also simplifies control.

The average performance per clockcycle is expected to be higher than that of a 6502 or Z80 and might come close to the performance of a 68000 in several situations. The performance is mainly due to the RISC strategy, fast access to 4 data registers and 4 address registers, and to having everything 16 bit wide.

PCB IMPRESSION

The pcb of the CPU is now (5 oct 2019) routed. It also gives an impression of the various CPU parts (ALU, Registers, Control), see also this log about the PCB.

LOGS

1.  Operation principle

2. Instruction set

3. Addressing modes

4. Instruction sequencing

5. Subroutines

6. Instruction encoding and conditional branching

7. Schematic of the CPU

8. CPU schematic explained

9. PCB impression of the CPU

10. Changing the memory access model

Adobe Portable Document Format - 79.12 kB - 10/20/2019 at 18:54


kobold K2 20191011.circ

File for Logisim simulation of K2 CPU

circ - 926.71 kB - 10/11/2019 at 19:54


K2 CPU Schematic 20191011.pdf

Schematic for old memory access system

Adobe Portable Document Format - 75.29 kB - 10/11/2019 at 18:18


  • Changing the memory access model

    roelh, 6 days ago, 0 comments

    In the past weeks, I first drew the design in the Logisim simulator. The first few instructions were successfully simulated. Then I started working on a Javascript assembler-simulator combination.

    While I've made assemblers in the past, this one proved quite difficult. You'll remember from one of the previous logs that the instructions can be arranged within 8-instruction blocks in almost arbitrary ways. But the actual sequence becomes important for flow control and optimization of (conditional) jumps. On top of that, each 8-instruction block must end with a jump to the following block, and sometimes a slot in a block stays unused because there is a multi-word instruction or instruction sequence that cannot be distributed over two blocks.

    The goal was that the assembly programmer doesn't have to be concerned with the above subtleties, and that the assembler program handles all of this. In a 'normal' assembler, the instructions have very little interaction with each other. That's totally different now.

    But now that the struggle to do automatic instruction sequencing by the assembler has almost been completed, an inconvenience in the design came to the surface of my mind. That is the memory access system.

    The proposed system has two models, the linear and the object model. This forms a kind of two-dimensional memory system with the A0-A15 address on one axis, and the page-or-displacement value on the other axis. If a language like C would do memory allocation, using the Kobold system, that would mean that memory would be allocated in two dimensions. This would imply a lot of complications.

    So I decided to change to a more common model, while keeping most of the advantages of the 'old' system. The schematic of the old model is still available as version 20191011.

    NEW MEMORY MODEL

    In the new memory model, the object system has become a part of the linear model. It is almost the same as in the Kobold-one.

    The address of an operand is formed by:

    • A0: from address register. Is high for byte-access to the MSB of a word.
    • A1 - A4: from address register, OR'ed with the four displacement bits
    • A5-A15:  from the address register
    • A16-A19: 4 bits from the page register that belongs to the address register

    An instruction can have a 4-bit displacement that is OR'ed to bit A1 - A4 of the above address. The result determines the position of a memory operand.
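
The operand address formation described above can be sketched in a few lines of Python. This is only an illustration of the bit layout from the list; the function name `operand_address` and its parameter names are made up for this example, and the real CPU does this with wiring rather than software.

```python
def operand_address(addr_reg, page, disp=0):
    """Form the 20-bit operand address of the new memory model (sketch)."""
    assert 0 <= addr_reg <= 0xFFFF   # 16-bit address register (A0..A15)
    assert 0 <= page <= 0xF          # 4-bit page register (A16..A19)
    assert 0 <= disp <= 0xF          # 4-bit displacement from the instruction
    low16 = addr_reg | (disp << 1)   # displacement is OR'ed into A1..A4
    return (page << 16) | low16


# Pointer to a 16-word structure that starts on a 32-byte boundary:
print(hex(operand_address(0x1240, page=0x3, disp=5)))  # 0x3124a
```

Because the displacement is OR'ed rather than added, the sketch only selects distinct members when bits A1 - A4 of the pointer are zero; for example `operand_address(0x1242, 0, 1)` gives the same address as a displacement of 0.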

    The address of the next instruction is constructed as follows:

    • A0 is always zero, because an instruction is a word.
    • A1, A2, A3 come from the NNN bits in the current instruction
    • A4 - A15 come from address register A0 (PC, program counter)
    • A16 - A19 come from the page register of the program counter
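
As a minimal sketch of the list above (the function and parameter names are assumptions made for this example only), the address of the next instruction could be modelled like this:

```python
def next_instruction_address(pc, pc_page, nnn):
    """20-bit address of the next instruction (sketch of the list above)."""
    assert 0 <= pc <= 0xFFFF and 0 <= pc_page <= 0xF and 0 <= nnn <= 7
    block = pc & 0xFFF0   # A4..A15 come from the PC; its A0..A3 are ignored
    slot = nnn << 1       # NNN drives A1..A3; A0 stays 0 (word access)
    return (pc_page << 16) | block | slot


print(hex(next_instruction_address(0x2345, pc_page=0x1, nnn=6)))  # 0x1234c
```

Note that the low four bits of `pc` (0x5 in the example) do not influence the result.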

    You see that the lowest four bits of the program counter are not used to address the next instruction. That opens the possibility to store a copy of the program page in those four bits. This gives us the same more-than 64K jump capability as in the old model, for instance for return address storage:

    • The PC is moved to a dataregister, and the subroutine stores it in the stack frame
    • At return, the stored address is written to the program counter, and to the page register of the program counter, at the same time! As before, the return address is always in the same instruction slot.
    • So now the page register has correct contents, because that was previously stored in A0 - A3. And A4 - A15 will be used to fetch the instruction, together with the page (A16 - A19).

    And, also, a jump or call is still able to reach all memory positions without a near-or-far mechanism.
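
The return mechanism can be illustrated with a small Python sketch. All names here are hypothetical, and how exactly the page copy ends up in the low four bits is not spelled out in this log; the sketch simply assumes a 16-bit word whose bits A0 - A3 hold the page. The return slot 7 is the convention from the Subroutines log.

```python
def pack_return_address(pc, pc_page):
    """Hypothetical helper: block address in A4..A15, page copy in A0..A3."""
    return (pc & 0xFFF0) | (pc_page & 0xF)


def restore_on_return(stored):
    """Write the stored word to the PC and, at the same time, its low four
    bits to the page register of the PC. Returns (pc, pc_page)."""
    return stored & 0xFFFF, stored & 0xF


def fetch_address(pc, pc_page, slot):
    """20-bit fetch address: page, bits A4..A15 of the PC, slot in A1..A3."""
    return (pc_page << 16) | (pc & 0xFFF0) | (slot << 1)


saved = pack_return_address(0x2340, 0x1)     # one 16-bit word: 0x2341
pc, page = restore_on_return(saved)
print(hex(fetch_address(pc, page, slot=7)))  # 0x1234e
```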

    The schematic is now updated. The log Addressing modes was also updated.

  • PCB impression of the CPU

    roelh, 10/04/2019 at 18:54, 0 comments

    A colorful log this time.

    The CPU is completely routed now. I used a mix of automatic and manual routing. 

    I chose a DIN41612 connector for the CPU (3 rows of 32 contacts). The middle row of contacts is not used. 

    The PCB size is 4.0 x 5.575 inch (10.16 x 14.16 cm).

    You will see component placement, top layer, bottom layer and their combination here. Clicking on the pictures gives a slightly larger image.

  • CPU schematic explained

    roelh, 10/02/2019 at 19:31, 0 comments

    This will be a quite long log....  luckily, it will naturally stop when all 42 IC's have been explained...

    This will be a detailed description of the schematic. For a good understanding, first read the first log and the logs that follow it.

    The schematic was split into nine sections, that will be discussed:

    • Instruction register
    • ALU
    • Data and Address registers
    • Shift unit
    • Buffer unit
    • Page registers
    • Control
    • Bytewise memory access
    • Specials

    I want to start with something simple, the ALU, but let's first do the instruction register because almost every subcircuit is connected to instruction bits.

    [edit: this describes an older version of the schematics, it needs an update]

    INSTRUCTION REGISTER

    The instructions are 16 bits wide. The register gets written during the FETCH cycle. Its outputs are (from top to bottom):

    • N0, N1, N2 hold the slot number of the next instruction
    • Z, that is active for ZPAGE (and vector) addressing
    • P0, P1 select which of the four registers A0 - A3 puts its contents on the address bus
    • R0, R1 select D0-D3 as source register, and D0-D3 or A0-A3 as destination register
    • D0-D3 select the displacement (0-15) that is put on A16-A19 of the address bus
    • A, L, M, S are the instruction opcodes, mainly connected to:
      • A is 0 for an address register destination, 1 for a data register destination. For writing to memory, it selects between writing a word and writing a byte.
      • L is 1 for forcing the first ALU input (signals OP0-OP15) to zero (by disabling the data register output and using pulldown resistors). This will change ADD to MOV and NOR to MOVC (MOV-complement)
      • M selects (when S=0) between ADD and NOR in the ALU, and (when S=1) between register-to-register operation and a move to memory
      • S is 0 for an ALU operation with memory operand, and S is 1 for register-to-register operation or a move to memory.

    Note that during reset, the instruction is forced to 0x0000. The function of the instruction bits and their combinations can also be found in the Instruction encoding log.

    ALU

    The ALU is quite simple. There are two input busses, D0-D15 that normally come from memory, and OP0-OP15 that come from one of the data registers D0-D3. Outputs are BUS0-BUS15.

    At the left side, you see 16 gates that perform the NOR function on the two inputs. In the middle, four HC283 adders will add both inputs. At the right side, a bunch of multiplexers will choose either the NOR or the ADD as a result on the result signals, called BUS0-BUS15. The IR_M signal comes from the instruction register, and provides the selection signal for the multiplexers. 

    Note that the multiplexers that deliver the BUS0-BUS15 signals can be put in high-impedance state with the FN_OE/ signal, and the adder has a carry-in and carry-out, both to be discussed later. 
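
A behavioural model of this ALU fits in a few lines of Python. It is only a sketch under assumptions: the parameter names are made up, the text does not say which level of IR_M selects NOR, and carry-out is modelled for ADD only.

```python
MASK16 = 0xFFFF


def alu(mem, op, select_nor, force_op_zero=False, carry_in=0):
    """Return (result, carry_out) for the ADD/NOR ALU described above.

    mem           -- value on D0-D15 (normally from memory)
    op            -- value on OP0-OP15 (from a data register)
    select_nor    -- multiplexer choice (polarity of IR_M is assumed here)
    force_op_zero -- models IR_L disabling the register outputs, so the
                     pulldowns put zero on OP0-OP15
    """
    if force_op_zero:
        op = 0
    if select_nor:
        return ~(mem | op) & MASK16, 0   # carry is not modelled for NOR
    total = mem + op + carry_in          # four cascaded 4-bit adders
    return total & MASK16, total >> 16


print(alu(0x1234, 0x0001, select_nor=False))                      # ADD  -> (4661, 0)
print(alu(0x1234, 0xFFFF, select_nor=False, force_op_zero=True))  # MOV  -> (4660, 0)
print(alu(0x1234, 0xFFFF, select_nor=True, force_op_zero=True))   # MOVC -> (60875, 0)
```

With the register input forced to zero, ADD degenerates to MOV and NOR to MOVC, which is the effect attributed to the L bit in the instruction register section.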

    DATA AND ADDRESS REGISTERS

    The data and address registers are built with the 74HC670 (that chip is explained HERE). There are 4 data registers and 4 address registers, these are all 16 bit wide. (These are actually latches instead of registers, to be discussed later).

    The input to the registers is shown at their left side. The data bits come from the ALU (they are connected to BUS0-BUS15), and the R0, R1 bits determine which of the four registers in the IC gets written (when DATA_WE/ or ADDR_WE/ is low).

    The output of the data registers is connected to the OP0-OP15 bus. One of the four registers is selected with the same R0, R1 signals. Note that the IR_L bit, coming from the instruction register, can disable the register output. The outputs will then be pulled low by pulldown resistors (on another part of the schematic). Since the OP0-OP15 bus is one of the inputs to the ALU, this will change an ADD instruction to a MOV and a NOR instruction to MOVC (MOV-complement).

    The output of the address registers is selected by P0 and P1 coming from the control section, but when ZPage addressing is selected, this will always be A1 (the workspace pointer WP),...


  • Schematic of the CPU

    roelh, 09/18/2019 at 19:34, 0 comments

    What you have all been waiting for....

    The first version of the schematic for the CPU was just uploaded to the file section !

    A few things were added:

    • Provision for reading and writing bytes from/to memory. Address bit A0 will select low or high byte (as with most processors). Writing bytes will need software assistance. Single-cycle reading and writing of 16-bit words is of course still possible.
    • The 64-word zero page has got a companion: another 64-word zero page. The second zero page is called the vector page, and is automatically selected whenever zero-page contents are written to the PC or D0 register. This is meant to be used for frequently accessed subroutines. A call can get its address from the zero page, saving space for an immediate value in the program code. The D0 register can also be used to store a value in the vector page.

    The connectors of the CPU will probably change.

    Explanation of schematic will be done in a next log.

    Number of chips grew slightly above 40, it is now 43. 

    [ edit: schematic updated today, 20191020 ]

  • Instruction encoding and conditional branching

    roelh, 09/17/2019 at 18:24, 0 comments

    At the moment, this is the instruction encoding:

    You get a bigger version when you click on it.

    An instruction is 16 bits. The upper 8 bits are in the left part of the picture, and the lower 8 bits are in the right part. The upper 4 bits control almost directly the data path, and the other bits set displacement and register numbers.

    CONDITIONAL BRANCHING

    The CPU only supports branching on carry set. To be more precise, EVERY ADD instruction will do a branch on carry.

    The branch must be within the same instruction block, and the branch target is the next slot as defined in NNN, but with the lowest bit set to one.

    So what if we don't want to branch at all ? Then we put the next instruction in a slot with an odd number. If now the carry gets set, it will make the lowest bit of the next slot number one. But it is already one, so it will go to the same instruction, whatever the carry bit may be.

    And what if we want to branch on NO-carry? We simply change the positions of the 'normal' next instruction and the jump target. This is possible, because instructions can be in any position within a block.

    Note that jump-on-carry can also be used to test for zero. Just add 0xFFFF to the value that you want to test. If there is a carry, the value was not zero. The constant 0xFFFF will not cost us space for an immediate variable if we put it in the zero page space.
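
The branching rule and the zero test can be tried out with a short Python sketch. The function names are hypothetical; the logic follows the description above (carry is OR'ed into the lowest bit of the next slot number).

```python
def next_slot(nnn, carry):
    """Slot of the next instruction after an ADD: carry sets the lowest bit."""
    return nnn | (1 if carry else 0)


def add16(a, b):
    """16-bit add, returning (result, carry_out)."""
    total = a + b
    return total & 0xFFFF, total > 0xFFFF


def is_nonzero(value):
    """Test for zero by adding 0xFFFF: carry means the value was not zero."""
    _, carry = add16(value, 0xFFFF)
    return carry


print(next_slot(4, False), next_slot(4, True))  # 4 5  (branch on carry)
print(next_slot(5, False), next_slot(5, True))  # 5 5  (odd slot: no branch)
print(is_nonzero(0), is_nonzero(1))             # False True
```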

  • Subroutines

    roelh, 09/15/2019 at 10:24, 0 comments

    This is the instruction sequence for subroutines:
    • In slot 2, save the program counter to the D3 register
    • In slot 3, load the program counter with the subroutine address (found in slot 6, shown in GREEN)
    • The subroutine is executed now. The first instruction saves the return address, that was stored in D3, in the workspace at slot 15.
    • When the subroutine has finished, the program counter is loaded with the saved program counter and execution continues at slot 7 in the main program.
    • Back in the main program, instruction 4 is executed, followed by instruction 5 and then a jump to the next block. 

    By convention, the slot to return to is always slot 7.

    In most cases (but not in this example), the subroutine will also create its own "stack frame" by using a new value for the workspace pointer WP. This makes nested and recursive functions easy.

    In this example, the call instruction is in slot 2 and 3. But it can of course also be placed in other slots. Note that this mechanism utilizes the possibility to put instructions in any order that you want. 

    ADDRESSING WITHIN 256K RETURN LOCATIONS WITH ONLY 16 BITS 

    The program counter has 16 bits, where bit zero is always 0 because instructions are only on even addresses. Each value in the PC addresses a sequence of 8 instructions. So the maximum number of instructions that can be addressed is 262144. 

    For the return address, only 16 bits have to be stored. The drawback is that within an 8-instruction block there can only be one subroutine call, because the return point is always in slot 7. 

    So there is no "near or far" madness in CALL and RETURN instructions. Return addresses are just a single 16 bit word (instead of two 16 bit words). This makes calling and jumping faster and saves memory space. 
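
The 262144 figure can be checked with a few lines of arithmetic; the variable names are only for this illustration.

```python
PC_BITS = 16
block_pointers = 2 ** (PC_BITS - 1)   # bit 0 of the PC is always 0
slots_per_block = 8                   # each PC value addresses 8 instructions
instructions = block_pointers * slots_per_block

print(instructions)          # 262144
print(instructions // 1024)  # 256, i.e. 256K instruction locations
```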

  • Instruction sequencing

    roelh, 09/14/2019 at 20:32, 0 comments

    WARNING: The addressing and sequencing of instructions in Kobold K2 is rather unusual.

    In the previous log I stated that the program counter uses the object model. A simple picture will make this clear:

    This shows how the program counter points to a block of eight instructions (This could be a block of sixteen instructions, but for various reasons eight is used). We call these eight instruction positions 'slots'.

    Every instruction is 16 bits wide.

    All the instructions will be executed, and then the CPU will continue with a next block of instructions. How ?

    Within every instruction, the lowest three bits contain the slot number of the following instruction that must be executed (indicated in RED). It acts as a kind of counter. In the seventh slot, the slot number of the following instruction is 0 again. But the instruction in that slot increments the PC, so  the CPU will continue with the next block of instructions. The PC-increment is just a regular CPU instruction. No extra hardware needed for that. (Note that the PC must be incremented by 2 each time, because instructions are at word addresses.)

    This makes it easy to support immediate operands. The instruction that uses an immediate operand, specifies the PC as pointer and uses the slot number (here: 7)  as displacement:

    Several instructions in the same block can have an immediate operand, as long as you use a different slot number for every immediate value.

    Get used to all the tricks that can be done ! For instance, there is no need to put the instructions in sequence:

    In this example, the first four instructions are placed in the first four slot numbers. But at instruction 3, the instruction tells us that the following instruction is in slot 7 ! 

    So instruction number four is in slot 7. The following instructions are placed in preceding slots. The last one is placed in slot 4.

    Actually, you can place instructions in any order that you want, as long as each instruction points to the next one. (But by convention, the first instruction is in slot 0.)
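
To make the slot-chaining concrete, here is a small Python sketch that follows the next-slot numbers through one block. The function name and the dictionary representation are assumptions for this example; it ignores conditional branching and unused slots, and assumes the last instruction points to slot 0 of the next block.

```python
def execution_order(next_slot_of, start=0):
    """Follow the next-slot number of each instruction through one block."""
    order = [start]
    slot = next_slot_of[start]
    while slot != start and len(order) < 8:   # a block has at most 8 slots
        order.append(slot)
        slot = next_slot_of[slot]
    return order


# The out-of-sequence example from the text: instructions 0..3 sit in
# slots 0..3, instruction 4 sits in slot 7, and the last one in slot 4.
block = {0: 1, 1: 2, 2: 3, 3: 7, 7: 6, 6: 5, 5: 4, 4: 0}
print(execution_order(block))  # [0, 1, 2, 3, 7, 6, 5, 4]
```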

  • Addressing modes

    roelh, 09/14/2019 at 18:21, 2 comments

    This 16 bit processor has a 20 bit address bus. How is this address generated ?

    Around 40 years ago, developers of the Intel 8086 were facing the same problem. This time, we will use an easier solution. But I doubt if my solution will be more successful.

    In the Kobold K2, memory can be accessed with the following addressing modes:

    ADDRESS REGISTER INDIRECT

    Each of the four address registers has its own 4-bit page register. A page register can be written with a MOVP instruction. 

    The memory address consists of bit 0-15 coming from the address register and bit 16-19 coming from the corresponding page register.

    In instructions that use this mode, the displacement should be set to zero.

    INDIRECT WITH DISPLACEMENT

    This is considered the main addressing mode. In this mode, the 4-bit displacement value in the instruction is added to bit 1 - 4 of the address (actually it is not added but OR'ed).

    In C terms, the address register can contain a pointer to many different structure instances. Each structure has a maximum of 16 word-sized members. The instruction can specify which member is addressed. This supports the "->" operator in a single instruction.

    Register A1 is intended to be used as 'workspace pointer', pointing to a set of 16 locations that can be used as local variables in a function. When a function is called, the workspace pointer can be set to a new value to get a fresh set of variables, so it is not needed to push the old ones on a stack one by one.

    As you can see in the picture, there can be seven W bits to define the workspace so 128 sets of registers are available.

    ZERO PAGE ADDRESSING

    There is also absolute addressing. Only short addresses, that are a part of the 16-bit instruction, are supported. It is called Zero page addressing because the upper part of the address is always zero. The instruction delivers the bits PPDDDD, for a range of 64 locations. When the destination of the instruction is A0 or D0, the V bit will be set, and that will supply another range of 64 locations. Note that this mode also uses the value of several workspace pointer bits.

    WORD OR BYTE ACCESS

    The K2 is designed as a 16 bit processor that reads or writes 16 bits from/to memory at the same time.

    In order not to exclude languages like C, support for 8-bit characters was added. Therefore, the K2 can address bytes or words in memory. The address in bit 0-15 is a byte address, so for accessing words, addressbit A0 is always zero.

    For 8-bit instructions that read or write memory, the A0 bit determines if the low or high byte in memory is used. The K2 is little-endian.

    However, to keep component count reasonable, a little software effort will be needed to read or write bytes:

    • When reading a byte, the low byte will always be the requested byte, but the high byte will in most cases not be zero. To be more precise, for memory-read actions there is no difference between byte and word instructions. When A0=0, reading the low byte is done in exactly the same way as reading a word. When A0=1, the high byte is copied to the low byte but the high byte is not set to zero. So it might be needed to AND the result with 0x00FF. Since there is no AND instruction, read it into a data register with MOVC (move complement) and then do NOR 0xFF00 to get the same result.
    • There is a special MOVB instruction to write a byte to memory.  The hardware will write only the low or high byte, depending on A0. But when writing, a word should be written with the high byte being equal to the low byte. This could be done with a simple look-up table with 256 locations. 

    Note that the processor registers will always contain words.
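
The software assistance for byte access can be sketched in Python as follows. The function names are made up, and the model of the hardware read (the high byte keeps its memory value when A0=1) is an assumption; the text only says the high byte is not set to zero.

```python
def nor16(a, b):
    return ~(a | b) & 0xFFFF


def hw_read(memory_word, a0):
    """Assumed hardware behaviour: A0=1 copies the high byte to the low byte."""
    if a0:
        return (memory_word & 0xFF00) | (memory_word >> 8)
    return memory_word


def read_byte(memory_word, a0):
    """MOVC (load complement), then NOR 0xFF00: same as AND with 0x00FF."""
    complemented = nor16(hw_read(memory_word, a0), 0)
    return nor16(complemented, 0xFF00)


def write_byte(memory_word, a0, value):
    """Software doubles the byte; hardware writes only the half chosen by A0."""
    doubled = (value & 0xFF) * 0x0101   # stands in for the 256-entry table
    if a0:
        return (doubled & 0xFF00) | (memory_word & 0x00FF)
    return (memory_word & 0xFF00) | (doubled & 0x00FF)


print(hex(read_byte(0xABCD, 0)), hex(read_byte(0xABCD, 1)))  # 0xcd 0xab
print(hex(write_byte(0xABCD, 1, 0x12)))                      # 0x12cd
```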

  • Instruction set

    roelh, 09/13/2019 at 13:46, 0 comments

    This is the envisioned instruction set. It is not complete, several instructions have to be added.

    Several things that are possible, are not in the overview. For instance, there is no INC for a data register but it can be incremented by adding #1 to it. Logical OR and AND are possible by combination of CPL and NOR instructions. 

    For subtracting, one of the operands must be complemented (CPL) and then incremented (INC) to obtain the two's complement, and then both operands must be added (ADD). There is an ADDI (add and increment) available, so the INC and ADD can be a single instruction.

    A jump is a MOV to the program counter (A0). To do a call, you must store the programcounter (in a data register) and then do a jump (exact call system to be discussed later).

  • Operation principle

    roelh, 09/12/2019 at 09:40, 0 comments

    CPU BLOCK DIAGRAM

    Main parts are:

    • 4 data registers D0 - D3 (16 bit)
    • 4 address registers  A0 - A3 (20 bit)
    • 16 bit ALU that can do only ADD, MOV and NOR
    • shift unit that can shift one position to the right or to the left
    • instruction register
    • Single memory for program and data

    Every instruction needs two cycles:

    EXECUTE CYCLE

    This will let the ALU calculate a new value and put this in the shift unit. Or it will store a data register in memory.

    The inputs for the ALU are:

      • a data register and a memory operand, or
      • a data register and an address register

    For memory operands, the address comes from an address register and displacement, or it is a zero-page address.  Immediate operands can be selected by using the program counter as address register (and using a displacement).

    FETCH CYCLE

    The contents of the shift unit is transferred to the destination register. The contents can be either shifted or unshifted. The PC (address register A0) is connected to the memory address. The next instruction is fetched from memory and is put in the  instruction register.

    Incrementing the PC will be discussed later.

    DATA FLOW FOR MAIN INSTRUCTION TYPES

    This shows how data from memory is added to a data register. 

    The ALU can also do a MOV or NOR operation. By making combinations, the following functions can be obtained:

    • LOAD: The ALU can transfer the memory data to a register without change.
    • CPL: A register can be complemented by NOR'ing it with the value #0.
    • SUBTRACT: Complement one of the operands. Then do the ADD and finally ADD #1.
    • OR: Do a NOR followed by complement-register
    • AND: First complement both operands, then do NOR 
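
The combinations in the list above can be checked with a short Python sketch that builds everything from 16-bit ADD and NOR only. The function names are chosen for this example and are not the K2 mnemonics.

```python
MASK = 0xFFFF


def add(a, b):
    return (a + b) & MASK


def nor(a, b):
    return ~(a | b) & MASK


def cpl(a):
    return nor(a, 0)             # CPL: NOR with the value #0


def or_(a, b):
    return cpl(nor(a, b))        # OR: NOR followed by complement


def and_(a, b):
    return nor(cpl(a), cpl(b))   # AND: complement both operands, then NOR


def sub(a, b):
    return add(add(a, cpl(b)), 1)   # SUBTRACT: a + ~b, then ADD #1


print(sub(5, 3), hex(sub(3, 5)))       # 2 0xfffe
print(hex(or_(0x0FF0, 0x00FF)))        # 0xfff
print(hex(and_(0x0FF0, 0x00FF)))       # 0xf0
```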

    The same instructions can be done when the operand comes from an address register instead of from memory. The displacement value (part of instruction word) is not used here. This opens the possibility to use these bits to enable other functions, like shifting or add-with-carry. The register mode can be used to increment an address register, or move address register contents to a data register, or to add a data register to an address register, or add an address register to a data register.

    Finally, storing a data register is straightforward. There are instructions for store-word and store-byte. An address register can be stored by first moving it to a data register.

    Several topics will be discussed later:

    • incrementing the PC
    • conditional branches
    • carry handling
    • subroutines
    • loading the upper 4 bits of address registers

    I can already uncover a bit more by showing my nice drawing:


Discussions

monsonite wrote 10/08/2019 at 10:06 point

Roelh - I like your minimum ALU and the use of the 74xx670 register file. I'm working on something similar, but using a 4-bit wide, bitslice approach in order to keep the logic layout and pcb design simpler. Are you proposing a 12.5MHz clock to keep things synchronous with a VGA output?   I'm looking forward to hearing of your progress.


roelh wrote 10/08/2019 at 10:34 point

Monsonite, I saw your postings on Anycpu/forum. For a CPU there are endless design possibilities...  I'm curious what you will come up with. Yes, clock is synchronous with VGA. And thanks for introducing Kobold on Anycpu: http://anycpu.org/forum/viewtopic.php?f=23&t=623


monsonite wrote 10/08/2019 at 12:37 point

Roelh - I was intrigued by Kobold-1, now there are so many new ideas in Kobold-2. I have spent the morning reading the project logs so that I now have a better idea of your design, and how it works.  BTW - now that you can get very cheap 4 layer pcbs from China, I would recommend the use of separate power and ground planes.  This will improve your signal integrity, with faster edges, and reduce signal distortion and noise. As well as providing a very low impedance ground plane, you get much better power distribution and can eliminate the overhead of the wide power distribution traces on the signal layers.  The slightly increased cost will be justified by much improved performance.


threeme3 wrote 09/14/2019 at 10:38 point

Roel, very interesting development again. Just curious about the PC increment, looking at your drawing something very smart is happening there I think. Just guessing, is it that during the fetch cycle the current PC is incremented by the ALU (with one of the values in the data registers) and written back in A0?


roelh wrote 09/14/2019 at 11:14 point

The PC increment system is very unusual. I will soon write a log about it.


Dave's Dev Lab wrote 09/14/2019 at 02:10 point

how are you planning to implement the VGA support?


roelh wrote 09/14/2019 at 07:28 point

This will be similar to the first Kobold. But interrupts are difficult in the new design, so I plan to use a DMA system where the video system stops the CPU to obtain access to the shared RAM. So the CPU will only run during blanking time.


Chase Rayfield wrote 09/18/2019 at 01:39 point

How about double clocking the RAM and interleaving CPU and VIDEO accesses? I guess it depends on how fast the system clock is and how fast your RAM is... even if only part of your RAM used faster chips this might still make sense.


roelh wrote 09/18/2019 at 07:30 point

Hi Chase, video will have to read two 8-bit pixels from memory every 80nS, and I don't think I will succeed making a 40nS cycle for video and a 40nS cycle for the CPU. But the CPU could run almost continuously if video got its own independent memory, or if only characters are read and an independent character ROM is used.


Dan Maloney wrote 09/12/2019 at 14:54 point

Love these discrete chip CPU builds, especially TTL - I cut my teeth on those chips. Looking forward to seeing more progress!

