Kobold K2 - RISC TTL Computer

A 16 bit RISC computer with video display, from just a few TTL and memory chips.

The Kobold K2 CPU will be on a single PCB, constructed from TTL ICs.

Its main characteristics are:
- 16 bit processor, 16 bit databus
- 8 registers
- can access one Megabyte of memory
- no microcode
- every instruction executes in two cycles


To make it a complete computer, the K2 CPU will be connected to a mainboard, that will have:

- memory
- video system, 80 x 25 characters text
- video system, full color graphic mode with two layers
- sound
- onboard mass storage 32MByte
- I/O connectors

Constraints are:
- low number of parts (TTL)
- no off-the-shelf processor or microcontroller
- no 74181 ALU

For the CPU part, around 40 TTL IC's will be used.

MOTIVATION

After having worked several months on the first Kobold CPU, I got the feeling that it was going in the wrong direction. I was working on a JavaScript assembler and got tangled up in the microcode complexity. I also didn't like that so many parts were needed to decode the microcode. So I decided to make a huge change in the design. Here is the Kobold K2!

So what will change?
  - Microcode is not used any more, instructions will be RISC
  - Four new 16-bit data registers in hardware (now total 8 registers)
  - The 8-bit ALU will change to 16-bit ALU
  - All instructions need two cycles (fetch, execute) 

The Kobold K2 will be almost twice as fast, and its operation will be easier to explain.
The video system will stay mostly the same.


STRATEGY

Finding the balance between low number of parts and high functionality is one of the key aspects of TTL CPU design (at least, for me it is). I want to keep the part count low, but not to the extreme as in #1 Square Inch TTL CPU. The CPU part of the computer should fit on a single PCB.

To keep the control system simple, every instruction should execute in a single cycle (after its fetch cycle). If the ALU was kept 8 bits wide, that would mean 2 instructions for many 16-bit actions (as in the Z80 or 6502), and that would slow down 16-bit operations. Therefore, the ALU is now 16 bits wide. I don't want to use the 74181 ALU, so to keep the part count reasonable, the ALU has only a few functions. The small number of functions also simplifies control.

The average performance per clock cycle is expected to be higher than that of a 6502 or Z80 and might come close to the performance of a 68000 in several situations. The performance is mainly due to the RISC strategy, fast access to 4 data registers and 4 address registers, and to having everything 16 bits wide.

PCB IMPRESSION

The PCB of the CPU is now (5 Oct 2019) routed. The layout also gives an impression of the various CPU parts (ALU, registers, control); see also the log about the PCB.

LOGS

1.  Operation principle

2. Instruction set

3. Addressing modes

4. Instruction sequencing

5. Subroutines

6. Instruction encoding and conditional branching

7. Schematic of the CPU

8. CPU schematic explained

9. PCB impression of the CPU

10. Changing the memory access model

11. More conventional instruction sequencing

12. New instruction set

13. Hello Simulator !

14. Instruction Map

FILES

- ms-excel - 50.50 kB - 11/19/2019 at 10:17
- Adobe Portable Document Format - 79.82 kB - 11/19/2019 at 10:02
- circ - 937.97 kB - 11/19/2019 at 10:02
- hello 20191115.dmp: Hello World binary, to put in the Logisim RAM (dmp - 119.00 bytes - 11/15/2019 at 15:55)

  • Instruction Map

    roelh, 3 days ago, 0 comments

    After many changes, the instruction set finally seems stable, and it is now very orthogonal. An instruction map was made that clearly shows the meaning of the upper 8 bits of the instruction (the Excel file is in the file section).

    There are four addressing modes:

    • (An+d) Register indirect with 4-bit displacement; includes 16-bit immediates. The result can optionally be incremented by 1.
    • (zpage) One of 128 locations in the zero page (lowest bits in instruction).
    • An+d The value of an address register, incremented by a 5-bit constant. Also used for short jumps.
    • Imm-8 An 8-bit constant (lowest bits in instruction).

    Almost all instructions write the result to one of the four 16-bit data registers, or one of the four address registers (this includes the PC).

    The most used instructions have a color in the map, from top to bottom:

    • Green, logical NOR and MOV-Complement.
    • Pink, 16-bit ADD
    • Blue, MOV instructions (including jumps)
    • Yellow, MOV to memory (store)
    • Orange, conditional MOV instructions (including conditional jumps)

    There are several empty positions in the map, so there is room for an extended version that has more instructions.

  • Hello simulator !

    roelh, 7 days ago, 0 comments

    Today a minor milestone was reached.

    The JavaScript assembler is working (for most instructions). I haven't worked on the JavaScript simulator yet.

    But I also have a Logisim simulator. The assembler output for 'Hello World' was loaded in the Logisim simulator, and it worked! It's the first real running program written in Kobold Assembly!

    The program is this:

                0 ; Kobold K2 assembler test
                1 
                2 screen: equ 0xf000
                3 newline: equ 0x0d
                4 
    00000 7C00  5  mov 0,d0
    00002 7A20  6  mov text,a2
    00004 631E  7  mov screen,a3
    00006 7008+ 8 loop:
    00008 6540  9  mov (a2),d1
    0000A 7242  10  add 2,a2
    0000C 9560  11  mov d1,(a3) 
    0000E 4514  12  add 0xffff,d1 ;test for zero
    00010 A092 b13  brc loop ; branch if non zero
    00012 700C+ 14 hlt: jmp hlt
    00014 6084 
                15 
    00016 AAAA  16  data section
    00018 0014-
    0001A 0008-
    0001C FFFF-
    0001E F000-
                17 
                18 text:
    00020 0048  19  dw 'H'
    00022 0065  20  dw 'e'
    00024 006C  21  dw 'l'
    00026 006C  22  dw 'l'
    00028 006F  23  dw 'o'
    0002A 0020  24  dw ' '
    0002C 0057  25  dw 'W'
    0002E 006F  26  dw 'o'
    00030 0072  27  dw 'r'
    00032 006C  28  dw 'l'
    00034 0064  29  dw 'd'
    00036 000D  30  dw newline
    00038 0000  31  dw 0

    Some remarks:

    • In the instructions, source comes before destination: MOV SRC,DST
    • The first instruction (mov 0,d0) is non-functional, because execution after reset skips the first instruction.
    • There is no HLT instruction. At the end of the string, the program just keeps jumping to the same hlt label.
    • Note the use of the ADD 0xffff instruction to test for zero. That's because there is only a carry flag, no zero flag.
    • Immediates that do not fit in 8 bits are 16 bits wide. They are placed at the end of a 16-word chunk and flagged with a '-' for the human reader, so you find the 0xffff for the 'test for zero' at address 0x0001C. The values 0014 and 0008 are for branches to loop and hlt, to be discussed later.
    • The Logisim text screen responds to writes to every address of 0x8000 and higher. The text screen has the address 0xf000 in this example.
    • The Logisim simulator cannot read bytes from memory, so the text has been placed in words. The actual CPU will be able to read bytes.
    • Line 13 is marked 'b' and lines 8 and 14 are marked '+'; both will be explained later.
    • Actually, the page registers of A2 and A3 should have been set to 0 with an instruction, but in the simulator they are already zero by default. However, the reset hardware does set the page of PC to zero.
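
The loop and its carry-based zero test can be mimicked in a few lines of Python. This is only a behavioural sketch, not a model of the hardware: the names add16, is_nonzero_via_carry and run_hello are made up for illustration, and the text screen is assumed to be a plain list.

```python
MASK16 = 0xFFFF


def add16(a, b, carry_in=0):
    """16-bit addition; returns (result, carry_out)."""
    total = (a & MASK16) + (b & MASK16) + carry_in
    return total & MASK16, total >> 16


def is_nonzero_via_carry(value):
    """Adding 0xFFFF produces a carry for every 16-bit value except zero."""
    _, carry = add16(value, 0xFFFF)
    return carry == 1


def run_hello(text_words):
    """Copy words to the (assumed) screen list until a zero word has been copied."""
    screen = []
    index = 0
    while True:
        d1 = text_words[index]            # mov (a2),d1
        index += 1                        # add 2,a2  (advance one word)
        screen.append(d1)                 # mov d1,(a3)
        if not is_nonzero_via_carry(d1):  # add 0xffff,d1 / brc loop
            break
    return screen


if __name__ == "__main__":
    words = [ord(c) for c in "Hello World"] + [0x0D, 0]
    print(run_hello(words) == words)
```

As in the listing, the store comes before the test, so the terminating zero word is written to the screen as well.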

    The Logisim file and the file to be loaded in the Logisim RAM are in the file section. 

    The assembler can be tried here: Kobold K2 Assembler. Just press 'Assemble' to run the assembler. Feel free to try some code changes. The assembler can load/store files from/to your own PC.

  • New instruction set

    roelh, 11/03/2019 at 13:57, 0 comments

    Here is the new instruction set. It is quite conventional (that's my personal view), but has a few quirks because a lot of functionality was pushed into the instruction set while keeping the decoding circuits very simple.

    I will present the new set in three encoding tables, starting with a few simple ones.

    (There is also a colored instruction map available)

    WRITE TO MEMORY

    There are two ways to address memory:

    • Pointer with displacement. WP, A2 and A3 can be used as pointer (PC should be used for reading only).
    • Zero page. Zero page size is 128 words (256 bytes).

    All four data registers can be written to memory, either as word or as byte. To write an address register to memory, first move it to a data register and then write it.

    INCREMENT ADDRESS REGISTER

    Address registers can be incremented by a value from 1 to 31. This can also be done conditionally, which is mostly used in combination with A0 (PC). So the branches can skip 15 instructions at most. For greater distances, 16-bit immediate conditional jumps must be used. Note that the destination of the addition can also be a data register.

    ALL INSTRUCTIONS

    This may look a bit complex. But you just pick an instruction, a source and a destination.

    Note that there are several ways to specify the source operand:

    • Address register indirect with displacement, (includes 16-bit immediate)
    • Zero page location
    • Address register plus small (5 bit) constant
    • short immediate (7 or 8 bit)

    Oddities:

    • The MOV to register instructions can be conditionally executed.
    • A jump or branch is just a move or conditional move to the PC
    • There are MOV-with-increment instructions. One of its uses is incrementing a value in memory with just two instructions
      • MOVI (value),D2 ; get value in D2 and increment it
      • MOV D2,(value) ; store value back
    • ADD-with-increment is just adding with the Carry-input of the ALU set. Useful for subtraction.
    • For MOV (and ADD) with 8-bit immediate, the lowest immediate bit is formed by choosing between the MOV and MOVI instruction. The assembler will handle this. Since this lowest immediate bit is connected to the carry-input of the adder, it is not available for the logical functions NOR and MOVC.
    • SHL and SHR instructions act on the result of the previous instruction.
    • The SHL instruction writes to memory as a side-effect. The assembler will use the highest zero-page word (at 0x00FE) as location to write to.

    Remarks for address register destination:

    • ADD instructions have 3 operands. The source-2 operand is always a data register, and its number must be the same as the number of the destination address register. So ADD A1,D2,A2 is possible but ADD A1,D2,A3 is not.

    [ edit 20191115: I pushed some more functionality in the ISA, schematic in file section has been updated ]

    [ edit 20191119: new schematic uploaded to file section ]

  • More conventional instruction sequencing

    roelh, 11/01/2019 at 22:02, 0 comments

    This week I was working on the assembler again. As said in the previous log, the assembler is quite complex because it also has to place the instructions in the correct sequence. 

    Although the instruction sequencing and the conditional branching can be explained, the inner working of the assembler will be obscure due to its complexity.

    And if we ever come to the point where programs can be built on the machine itself, we also need an assembler that runs on the machine itself, so that must probably also be written in assembler (while the current assembler is written in Javascript). I shivered at the thought of having to code this again.

    So I decided. Design change.

    That's the nice thing about a hobby project. You can keep changing. Your project can even keep changing without ever coming to an end....

    The instructions will be in sequence. There will be a hardware 4-bit program counter that addresses the instructions together with the other bits in the HC670 register, much like the Kobold-one.

    At the end of a 16-instruction block, a jump instruction must be placed to go to the next block.  This will be done automatically by the assembler. Conditional branches will now also be done in a conventional way. 

    The three NNN bits that hold the next slot number can now be used for something else.  This will make the decoding of the special instruction variants easier, and give room to provide more options for some instructions.

    [ edit: In fact, the instruction variants vanished, making the ISA much more orthogonal. There is now also room for 8-bit immediates within the 16 bit instruction, and zero page size expanded to 128 words, see next log. ]

  • Changing the memory access model

    roelh, 10/16/2019 at 17:52, 0 comments

    In the past weeks, I first drew the design in the Logisim simulator. The first few instructions were successfully simulated. Then I started working on a JavaScript assembler-simulator combination.

    While I've been making assemblers in the past, this one proved quite difficult. You'll remember from one of the previous logs that the instructions can be arranged within 8-instruction blocks in almost arbitrary ways. But the actual sequence becomes important for flow control and optimization of (conditional) jumps. On top of that, each 8-instruction block must end with a jump to the following block, and sometimes a slot in a block stays unused because there is a multi-word instruction or instruction sequence that cannot be distributed over two blocks.

    The goal was that the assembly programmer doesn't have to be concerned with the above subtleties, and that the assembler program does all this. Now in a 'normal' assembler, the instructions have very little interaction with each other. That's totally different now.

    But now that the struggle to do automatic instruction sequencing by the assembler has almost been completed, an inconvenience in the design came to the surface of my mind. That is the memory access system.

    The proposed system has two models, the linear and the object model. This forms a kind of two-dimensional memory system with the A0-A15 address on one axis, and the page-or-displacement value on the other axis. If a language like C would do memory allocation, using the Kobold system, that would mean that memory would be allocated in two dimensions. This would imply a lot of complications.

    So I decided to change to a more common model, while keeping most of the advantages of the 'old' system. The schematic of the old model is still available as version 20191011.

    NEW MEMORY MODEL

    In the new memory model, the object system has become a part of the linear model. It is almost the same as in the Kobold-one.

    The address of an operand is formed by:

    • A0: from address register. Is high for byte-access to the MSB of a word.
    • A1 - A4: from address register, OR'ed with the four displacement bits
    • A5-A15:  from the address register
    • A16-A19: 4 bits from the page register that belongs to the address register

    An instruction can have a 4-bit displacement that is OR'ed to bit A1 - A4 of the above address. The result determines the position of a memory operand.
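
The operand address formation can be written down as a small Python sketch. It assumes only the bit layout listed above: a 16-bit address register, a 4-bit page register and a 4-bit displacement. The function name operand_address is hypothetical.

```python
def operand_address(addr_reg, page, displacement):
    """Form the 20-bit operand address from register, page and displacement."""
    if not 0 <= displacement <= 15:
        raise ValueError("displacement must fit in 4 bits")
    low16 = addr_reg & 0xFFFF          # A0-A15 come from the address register
    low16 |= displacement << 1         # displacement is OR'ed into A1-A4
    return ((page & 0xF) << 16) | low16  # A16-A19 come from the page register


if __name__ == "__main__":
    print(hex(operand_address(0x1000, 0x0, 3)))
    print(hex(operand_address(0xF000, 0xF, 15)))
```

Because the displacement is OR'ed rather than added, it only behaves like a plain offset in this sketch when bits A1-A4 of the address register are zero. For example, operand_address(0x1002, 0, 1) yields 0x1002, not 0x1004.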

    The address of the next instruction is constructed as follows:

    • A0 is always zero, because an instruction is a word.
    • A1, A2, A3 come from the NNN bits in the current instruction
    • A4 - A15 come from address register A0 (PC, program counter)
    • A16 - A19 come from the page register of the program counter

    You see that the lowest four bits of the program counter are not used to address the next instruction. That opens the possibility to store a copy of the program page in those four bits. This gives us the same more-than 64K jump capability as in the old model, for instance for return address storage:

    • The PC is moved to a dataregister, and the subroutine stores it in the stack frame
    • At return, the stored address is written to the program counter, and to the page register of the program counter, at the same time! As before, the return address is always in the same instruction slot.
    • So now the page register has correct contents, because that was previously stored in A0 - A3. And A4 - A15 will be used to fetch the instruction, together with the page (A16 - A19).
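
The instruction address formation and the trick of storing the page in the unused low bits of the PC can be sketched in Python as follows. This assumes only the bit layout described above; the function names are hypothetical and the sketch ignores everything else the hardware does during a call or return.

```python
def next_instruction_address(pc, page, nnn):
    """20-bit address of the next instruction: A0 = 0, A1-A3 = NNN,
    A4-A15 from the PC, A16-A19 from the PC's page register."""
    return ((page & 0xF) << 16) | (pc & 0xFFF0) | ((nnn & 0x7) << 1)


def pack_return_value(pc, page):
    """Store a copy of the program page in the four unused low PC bits."""
    return (pc & 0xFFF0) | (page & 0xF)


def unpack_return_value(value):
    """At return, the same 16-bit value is written to the PC, and its low
    four bits are written to the page register of the PC."""
    return value & 0xFFFF, value & 0xF


if __name__ == "__main__":
    saved = pack_return_value(0x1230, 0x5)
    pc, page = unpack_return_value(saved)
    print(hex(saved), hex(next_instruction_address(pc, page, 7)))
```

The saved value fits in one 16-bit word, yet the address rebuilt from it covers the full 20-bit range.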

    And, also, a jump or call is still able to reach all memory positions without a near-or-far mechanism.

    The schematic is now updated. The log Addressing modes was also updated.

  • PCB impression of the CPU

    roelh, 10/04/2019 at 18:54, 0 comments

    A colorful log this time.

    The CPU is completely routed now. I used a mix of automatic and manual routing. 

    I chose a DIN41612 connector for the CPU (3 rows of 32 contacts). The middle row of contacts is not used. 

    The PCB size is 4.0 x 5.575 Inch ( 10.24 x 14.15 cm ). Clicking on the pictures gives a slightly larger image. 

    You will see component placement, top layer, bottom layer and their combination here.

  • CPU schematic explained

    roelh, 10/02/2019 at 19:31, 0 comments

    This will be a quite long log....  luckily, it will naturally stop when all 42 IC's have been explained...

    This will be a detailed description of the schematic. For a good understanding, first read the first log and the logs that follow it.

    The schematic was split into nine sections, that will be discussed:

    • Instruction register
    • ALU
    • Data and Address registers
    • Shift unit
    • Buffer unit
    • Page registers
    • Control
    • Bytewise memory access
    • Specials

    I want to start with something simple, the ALU, but let's first do the instruction register because almost every subcircuit is connected to instruction bits.

    [edit: this describes an older version of the schematics, it needs an update]

    INSTRUCTION REGISTER

    The instructions are 16 bits wide. The register gets written during the FETCH cycle. Its outputs are (from top to bottom):

    • N0, N1, N2 hold the slot number of the next instruction
    • Z, that is active for ZPAGE (and vector) addressing
    • P0, P1 select which of the four registers A0 - A3 puts its contents on the address bus
    • R0, R1 select D0-D3 as source register, and D0-D3 or A0-A3 as destination register
    • D0-D3 select the displacement (0-15) that is put on A16-A19 of the address bus
    • A, L, M, S are the instruction opcodes, mainly connected to:
      • A is 0 for an address register destination, 1 for a data register destination. For writing to memory, it selects between writing a word and writing a byte.
      • L is 1 for forcing the first ALU input (signals OP0-OP15) to zero (by disabling the data register output and using pulldown resistors). This will change ADD to MOV and NOR to MOVC (MOV-complement)
      • M selects (when S=0) between ADD and NOR in the ALU, and (when S=1) between register-to-register operation and a move to memory
      • S is 0 for an ALU operation with memory operand, and S is 1 for register-to-register operation or a move to memory.

    Note that during reset, the instruction is forced to 0x0000. The function of the instruction bits and their combinations can also be found in the Instruction encoding log.

    ALU

    The ALU is quite simple. There are two input busses, D0-D15 that normally come from memory, and OP0-OP15 that come from one of the data registers D0-D3. Outputs are BUS0-BUS15.

    At the left side, you see 16 gates that perform the NOR function on the two inputs. In the middle, four HC283 adders will add both inputs. At the right side, a bunch of multiplexers will choose either the NOR or the ADD as a result on the result signals, called BUS0-BUS15. The IR_M signal comes from the instruction register, and provides the selection signal for the multiplexers. 

    Note that the multiplexers that deliver the BUS0-BUS15 signals can be put in high-impedance state with the FN_OE/ signal, and the adder has a carry-in and carry-out, both to be discussed later. 
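
As a rough behavioural model of this ALU, the following Python sketch computes both the NOR and the ADD result and then selects one of them, as the multiplexers do. The polarity of the select signal, the parameter names and the function name alu are assumptions for illustration. The high-impedance state (FN_OE/) and the handling of the carry flag are not modelled.

```python
MASK16 = 0xFFFF


def alu(d, op, select_nor, force_op_zero=False, carry_in=0):
    """Return (bus, adder_carry_out) for inputs D0-D15 and OP0-OP15.

    select_nor    : stands in for IR_M (polarity assumed here)
    force_op_zero : stands in for IR_L, which pulls the OP input to zero
    """
    d &= MASK16
    op = 0 if force_op_zero else op & MASK16
    nor_result = ~(d | op) & MASK16          # the 16 NOR gates
    total = d + op + carry_in                # the four HC283 adders
    add_result, carry_out = total & MASK16, total >> 16
    bus = nor_result if select_nor else add_result  # output multiplexers
    return bus, carry_out


if __name__ == "__main__":
    print(alu(0x1234, 0x0001, select_nor=False))                      # ADD
    print(alu(0x1234, 0xABCD, select_nor=False, force_op_zero=True))  # MOV
    print(alu(0x1234, 0xABCD, select_nor=True, force_op_zero=True))   # MOVC
```

With the OP input forced to zero, ADD degenerates to MOV and NOR to MOVC, which matches the role of the L bit described for the instruction register.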

    DATA AND ADDRESS REGISTERS

    The data and address registers are built with the 74HC670 (that chip is explained HERE). There are 4 data registers and 4 address registers, these are all 16 bit wide. (These are actually latches instead of registers, to be discussed later).

    The input to the registers is shown at their left side. The data bits come from the ALU (they are connected to BUS0-BUS15), and the R0, R1 bits determine which of the four registers in the IC gets written (when DATA_WE/ or ADDR_WE/ is low).

    The output of the data registers is connected to the OP0-OP15 bus. One of the four registers is selected with the same R0, R1 signals. Note that the IR_L bit, coming from the instruction register, can disable the register output. The outputs will then be pulled low by pulldown resistors (on another part of the schematic). Since the OP0-OP15 bus is one of the inputs to the ALU, this will change an ADD instruction to a MOV and a NOR instruction to MOVC (MOV-complement).

    The output of the address registers is selected by P0 and P1 coming from the control section, but when ZPage addressing is selected, this will always be A1 (the workspace pointer...

  • Schematic of the CPU

    roelh, 09/18/2019 at 19:34, 0 comments

    What you have all been waiting for....

    The first version of the schematic for the CPU was just uploaded to the file section !

    A few things were added:

    • Provision for reading and writing bytes from/to memory. Address bit A0 will select low or high byte (as with most processors). Writing bytes will need software assistance. Single-cycle reading and writing of 16-bit words is of course still possible.
    • The 64-word zero page has gained a companion: another 64-word zero page. The second zero page is called the vector page, and is automatically selected whenever zero-page contents are written to the PC or D0 register. This is meant to be used for frequently accessed subroutines. A call can get its address from the zero page, saving space for an immediate value in the program code. The D0 register can also be used to store a value in the vector page.

    The connectors of the CPU will probably change.

    Explanation of schematic will be done in a next log.

    Number of chips grew slightly above 40, it is now 43. 

    [ edit: schematic updated today, 20191023 ]

  • Instruction encoding and conditional branching

    roelh, 09/17/2019 at 18:24, 0 comments

    [ edit: the instruction set has changed, see New Instruction Set ]

    At the moment, this is the instruction encoding:

    You get a bigger version when you click on it.

    An instruction is 16 bits. The upper 8 bits are in the left part of the picture, and the lower 8 bits are in the right part. The upper 4 bits control almost directly the data path, and the other bits set displacement and register numbers.

    CONDITIONAL BRANCHING

    The CPU only supports branching on carry set. To be more precise, EVERY ADD instruction will do a branch on carry.

    The branch must be within the same instruction block, and the branch target is the next slot as defined in NNN, but with the lowest bit set to one.

    So what if we don't want to branch at all ? Then we put the next instruction in a slot with an odd number. If now the carry gets set, it will make the lowest bit of the next slot number one. But it is already one, so it will go to the same instruction, whatever the carry bit may be.

    And what if we want to branch on NO-carry? We simply swap the positions of the 'normal' next instruction and the jump target. This is possible because instructions can be in any position within a block.

    Note that jump-on-carry can also be used to test for zero. Just add 0xFFFF to the value that you want to test. If there is a carry, the value was not zero. The constant 0xFFFF will not cost us space for an immediate variable if we put it in the zero page space.
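
The slot selection described in this log can be captured in a tiny Python sketch. It assumes nothing beyond the rule stated above: the carry of an ADD is OR'ed into the lowest bit of the NNN next-slot number. The name next_slot is hypothetical.

```python
def next_slot(nnn, carry):
    """Slot (0-7) of the next instruction within the current block."""
    return (nnn & 0x7) | (1 if carry else 0)


if __name__ == "__main__":
    # Even NNN: the carry chooses between slot 4 and slot 5.
    print(next_slot(4, carry=False), next_slot(4, carry=True))
    # Odd NNN: the carry has no effect, so there is no branch.
    print(next_slot(5, carry=False), next_slot(5, carry=True))
```

Branching on no-carry then amounts to placing the fall-through instruction in the odd slot and the branch target in the even slot.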

  • Subroutines

    roelh, 09/15/2019 at 10:24, 0 comments

    [ edit: the contents of this log no longer reflect the operation of the Kobold K2, so it is for amusement only. ]

    This is the instruction sequence for subroutines:
    • In slot 2, save the program counter to the D3 register
    • In slot 3, load the program counter with the subroutine address (found in slot 6, shown in GREEN)
    • The subroutine is executed now. The first instruction saves the return address, that was stored in D3, in the workspace at slot 15.
    • When the subroutine has finished, the program counter is loaded with the saved program counter and execution continues at slot 7 in the main program.
    • Back in the main program, instruction 4 is executed, followed by instruction 5 and then a jump to the next block. 

    By convention, the slot to return to is always slot 7.

    In most cases (but not in this example), the subroutine will also create its own "stack frame" by using a new value for the workspace pointer WP. This makes nested and recursive functions easy.

    In this example, the call instruction is in slot 2 and 3. But it can of course also be placed in other slots. Note that this mechanism utilizes the possibility to put instructions in any order that you want. 

    ADDRESSING WITHIN 256K RETURN LOCATIONS WITH ONLY 16 BITS 

    The program counter has 16 bits, where bit zero is always 0 because instructions are only on even addresses. Each value in the PC addresses a sequence of 8 instructions. So the maximum number of instructions that can be addressed is 262144. 

    For the return address, only 16 bits have to be stored. The drawback is that within an 8-instruction block there can only be one subroutine call, because the return point is always in slot 7. 

Discussions

monsonite wrote 10/08/2019 at 10:06

Roelh - I like your minimum ALU and the use of the 74xx670 register file. I'm working on something similar, but using a 4-bit wide, bitslice approach in order to keep the logic layout and pcb design simpler. Are you proposing a 12.5MHz clock to keep things synchronous with a VGA output?   I'm looking forward to hearing of your progress.

roelh wrote 10/08/2019 at 10:34

Monsonite, I saw your postings on Anycpu/forum. For a CPU there are endless design possibilities...  I'm curious what you will come up with. Yes, clock is synchronous with VGA. And thanks for introducing Kobold on Anycpu: http://anycpu.org/forum/viewtopic.php?f=23&t=623

monsonite wrote 10/08/2019 at 12:37

Roelh - I was intrigued by Kobold-1, now there are so many new ideas in Kobold-2. I have spent the morning reading the project logs so that I now have a better idea of your design, and how it works.  BTW - now that you can get very cheap 4 layer pcbs from China, I would recommend the use of separate power and ground planes.  This will improve your signal integrity, with faster edges, and reduce signal distortion and noise. As well as providing a very low impedance ground plane, you get much better power distribution and can eliminate the overhead of the wide power distribution traces on the signal layers.  The slightly increased cost will be justified by much improved performance.

threeme3 wrote 09/14/2019 at 10:38

Roel, very interesting development again. Just curious about the PC increment, looking at your drawing something very smart is happening there I think. Just guessing, is it that during the fetch cycle the current PC is incremented by the ALU (with one of the values in the data registers) and written back in A0?

roelh wrote 09/14/2019 at 11:14

The PC increment system is very unusual. I will soon write a log about it.

Dave's Dev Lab wrote 09/14/2019 at 02:10

how are you planning to implement the VGA support?

roelh wrote 09/14/2019 at 07:28

This will be similar to the first Kobold. But interrupts are difficult in the new design, so I plan to use a DMA system where the video system stops the CPU to obtain access to the shared RAM. So the CPU will only run during blanking time.

Chase Rayfield wrote 09/18/2019 at 01:39

How about double clocking the ram and interleaving CPU and VIDEO accesses? I guess it depends on how fast the system clock is and how fast your ram is... even if only part of your ram used faster chips this might still make sense.

roelh wrote 09/18/2019 at 07:30

Hi Chase, video will have to read two 8-bit pixels from memory every 80nS, and I don't think I will succeed making a 40nS cycle for video and a 40nS cycle for the CPU. But the CPU could run almost continuously if video got its own independent memory, or if only characters are read and an independent character ROM is used.

Dan Maloney wrote 09/12/2019 at 14:54

Love these discrete chip CPU builds, especially TTL - I cut my teeth on those chips. Looking forward to seeing more progress!
