a day ago •
The past weeks were spent designing the CPU schematic, and optimizing the available microcode instructions. The CPU schematic is now in the file section.
As a reminder, here are the hardware registers:
The registers A and B are always loaded at the same time (with the same data). The B register contents can be stored in memory (and can also be shifted left or right). The A register is connected to the input of the ALU.
The UPC (micro-program counter) register will contain the 'user' instruction. Two 3-bit fields can be used to select a source and a destination register. The remaining 10 bits select an address in the microcode storage.
There are 3 classes of microcode instructions. The first one is register load/store:
The user registers R0-R7 are addressed as (WP+displacement). For the displacement, the source or destination register field in UPC can be chosen; these will be at addresses 0,4,6,8,A,C,E. The microinstruction has the option to add 2 to the displacement. This makes it possible to address 32-bit data (in memory or WP-based registers) in two 16-bit chunks, or to address 16 word-sized registers instead of 8.
Note that when the instruction is loaded into UPC, it is also loaded in AB and Y. Why ? When the instruction contains immediate data, it is directly available to the ALU to do calculations. If it is an immediate byte load, the byte is already in the B register and can directly be moved to the destination.
The destination register cannot be the same as the pointer, because the pointer register is a latch and not an edge-clocked flip-flop.
The RL register is used to temporarily store the result of the ALU for low bytes. When the ALU calculates the high byte, the high byte will be combined with the low byte that was stored in RL, and the 16-bit result will be stored at the destination.
The PC can also be input to the ALU. This is useful for adding a constant to the PC (branch) or for obtaining the return address when doing a CALL instruction.
All microinstructions can be executed conditionally depending on carry, zero or other conditions (Actually, at each microstep a certain microinstruction is selected depending on conditions).
There are 16 output signals that can be individually set or cleared by microcode. These are used for:
- selecting the condition for conditional execution
- put the B register in a mode for shift left or shift right
- write only to upper or lower half of a memory word (for byte writes)
- video signals, like pixel count reset, sync outputs
- several other I/O
04/30/2019 at 19:56 •
Just a quick test to see how it would fit on a PCB and if it would be difficult to route. This is the CPU only. It is connected to the mainboard by the two 2 x 18 headers. Bypass caps are not yet placed.
Routing was easy, the autorouter did it within a minute ! That is, after experimenting with placement a few hours. The size is about 100 x 150 mm.
04/27/2019 at 18:53 •
The biggest parts of the schematic of Kobold are almost completed. Here you see:
- Register B, that holds 16 bit data, and can shift the data left or right, and put it back on the databus
- Register A, that holds 16 bit data and can put a single 8-bit chunk on the address bus. The other 8 bits of the address bus will be provided by an address register (not shown here). During ALU operation, the address bus will not be used to provide an address to memory.
- The logic unit performs a NAND on the two 8-bit halves of the address bus
- The Adder adds the two 8-bit halves of the address bus
- The function selector selects the NAND or the ADD as result. Microcode bit IR4 is used to select the function.
- Finally, the 8-bit result goes to the lower or upper 8 bits of the result register
- The result register can put the 16-bit result on the databus
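The data path in this list can be sketched in a few lines of Python. This is only a behavioral model: the function names are hypothetical, and the polarity of the select bit (IR4 high meaning ADD) is an assumption, since the schematic decides that.

```python
def alu_slice(a_byte, other_byte, select_add):
    """One 8-bit pass through the ALU: NAND or ADD of the two bus halves.

    select_add stands in for microcode bit IR4 (polarity assumed).
    """
    if select_add:
        return (a_byte + other_byte) & 0xFF
    return ~(a_byte & other_byte) & 0xFF


def write_result(result16, byte, upper):
    """Put the 8-bit ALU output in the lower or upper half of the result register."""
    if upper:
        return (result16 & 0x00FF) | (byte << 8)
    return (result16 & 0xFF00) | byte
```

A 16-bit operation then takes two passes, one per half of the result register.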
04/26/2019 at 19:05 •
Trying to have the hardware simple, and keeping in mind that 'large' SRAMs are easy to obtain, the choice was made to have just a single basic video mode.
That is, 640 x 480 pixels, with 256 colors per pixel. The pixel time for VGA is 40 ns.
The hardware needs assistance from software in order to operate correctly.
The pixel counter will increment every 80 ns, so the video RAM will deliver two new pixel values every 80 ns. The 8-bit color values go to the 'first pixel' and 'second pixel' registers. During the next 80 ns, each of the outputs of these registers will be enabled for 40 ns.
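As a rough illustration of this timing, the sketch below turns a sequence of two-pixel fetches into a list of (time, color) pairs. The function name and the choice of time zero are assumptions made for the example only.

```python
FETCH_TIME_NS = 80   # pixel counter period: one 16-bit fetch, two pixels
PIXEL_TIME_NS = 40   # each pixel register drives the output for 40 ns


def pixel_stream(fetches):
    """fetches: list of (first_pixel, second_pixel) pairs, one per 80 ns.

    Returns (start_time_ns, color) pairs. A pair fetched during slot n is
    shown during slot n + 1, as described in the text.
    """
    out = []
    for n, (first, second) in enumerate(fetches):
        start = (n + 1) * FETCH_TIME_NS
        out.append((start, first))
        out.append((start + PIXEL_TIME_NS, second))
    return out
```

For 640 visible pixels this gives 320 fetches, so the visible part of a line takes 320 x 80 ns = 25.6 µs.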
When the pixel counter has reached a certain value, an interrupt will be given to the CPU. In the interrupt, the CPU will:
- obtain exact synchronization with the pixel counter
- when the pixel counter is exactly 199, reset the pixel counter to zero, to obtain 32 µs line duration
- start the horizontal sync signal
- obtain access to the video RAM, disabling the pixel counter and the updating of the two pixel color registers, and enabling the pixel(X) register. It can now, after setting the correct pixel address in the line and pixel registers, write new data to the video RAM. The line and pixel registers may be latches, so the data can be written in the same cycle together with the address. Several locations may be written, depending on the available time during blanking.
- set the line (Y) register to the starting point of data for the next line
- when the end of a frame has been reached, do functions for frame synchronisation
- stop the access to video RAM, and stop the horizontal sync signal
- end the interrupt
The sequence of actions might be a little bit different than listed here.
The actions might be done by microcode, and the 16-bit processor can deliver 16 bits at a time, so this will be fast enough for most operations. If there are applications (games) that need higher speed, the resolution could be dropped to 320 x 240 (TBD).
Since the starting point of a line is under software control, it will be easy to do fast vertical scrolling. A clear screen will also go fast, because only a single cleared line has to be present, and all other lines can point to the same cleared line.
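The scrolling and clear-screen tricks can be sketched as follows. The table of line start addresses is a hypothetical software structure (the hardware only has the line register that is set during each line interrupt), and the linear layout of 640 bytes per line is also an assumption.

```python
VISIBLE_LINES = 480
BYTES_PER_LINE = 640  # one byte per pixel (256 colors)


def make_line_table(first_line=0):
    """Start address in video RAM for each displayed line (assumed layout)."""
    return [(first_line + y) * BYTES_PER_LINE for y in range(VISIBLE_LINES)]


def scroll(first_line, lines):
    """Vertical scroll: only the start addresses change, no pixel data moves."""
    return make_line_table(first_line + lines)


def clear_screen(cleared_line_address):
    """Clear screen: every displayed line points at the same cleared line."""
    return [cleared_line_address] * VISIBLE_LINES
```

In both cases the cost is independent of the amount of pixel data, because only the per-line start address changes.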
04/21/2019 at 09:13 •
This is the new system diagram for the 16 bit bus system.
The program counter is in the set of four registers. However, incrementing the program counter would either cost several extra cycles or extra hardware. In this low-part-count solution, the lowest 4 bits of the PC are in a hardware counter, and the other bits are provided by the register set. When the 4-bit counter overflows, the microcode will add 16 to the part that is in the register set.
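A minimal model of this split program counter is shown below. The class and method names are made up for the sketch, and it does not say whether one counter step is a byte or a word, since that is not fixed here.

```python
class SplitPC:
    """PC split into a 4-bit hardware counter and a part held in the register set."""

    def __init__(self, address=0):
        self.counter = address & 0xF      # lowest 4 bits, hardware counter
        self.register = address & ~0xF    # remaining bits, in the register set

    def value(self):
        return self.register | self.counter

    def increment(self):
        """Step the hardware counter; returns True when it overflows."""
        self.counter = (self.counter + 1) & 0xF
        overflow = self.counter == 0
        if overflow:
            self.register += 16           # the step that microcode has to do
        return overflow
```

Only one increment in sixteen needs the extra microcode step, which is where the saving comes from.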
Register B is a temporary register for MOV instructions. It is loaded with 16 bit data from a source memory location, and then its contents can be stored to a destination memory location. These locations include the R0-R7 because they are also in memory. Register B can also shift data. It is constructed of two 74HC299 devices.
The ALU can perform only two operations: ADD and a logical operation, NOR or NAND (TBD). Other operations are done by microcode. The ALU takes one operand from one of the registers PC, WP, X or Y, and the other operand from A. The outputs of these registers can be tri-stated in 8-bit sections, so adding A to X will first enable the low sections of X and A, add them, and put the result in the low section of R. Then the same will be done for the high byte.
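The two-pass addition can be sketched like this. Passing the carry from the low pass into the high pass is an assumption of the sketch (it is needed for a correct 16-bit sum, but how the hardware carries it over is not described here), and the function names are hypothetical.

```python
def alu8_add(a, b, carry_in=0):
    """8-bit adder: returns (sum byte, carry out)."""
    total = a + b + carry_in
    return total & 0xFF, total >> 8


def add16_in_two_passes(x, a):
    """Add two 16-bit values using the 8-bit ALU twice: low section, then high."""
    low, carry = alu8_add(x & 0xFF, a & 0xFF)        # low sections enabled
    high, carry = alu8_add(x >> 8, a >> 8, carry)    # high sections enabled
    return (high << 8) | low, carry                  # assembled in R
```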
Note that the ALU can also swap the two bytes in a word. This will be used when byte instructions need to store a byte in the upper 8 bits of a memory word. The microcode will look at address bit 0 to determine if a byte must be moved to or from the low or high part of a memory word. For storing a byte to a memory location, the microcode can disable writing to one of the two bytes that are in a memory word, so when a byte is written, the other byte in that memory word is not affected.
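The net effect of a byte access on a 16-bit memory word can be modeled as below. The sketch assumes that an odd address selects the upper byte (the PDP-11 convention); the text only says that address bit 0 makes the choice. In hardware the other half is protected by disabling its write, not by the read-modify-write used here.

```python
def write_byte(memory, byte_address, value):
    """Store one byte without disturbing the other byte in the same word."""
    index = byte_address >> 1
    word = memory[index]
    if byte_address & 1:   # assumed: odd address selects the upper 8 bits
        memory[index] = (word & 0x00FF) | ((value & 0xFF) << 8)
    else:
        memory[index] = (word & 0xFF00) | (value & 0xFF)


def read_byte(memory, byte_address):
    """Fetch one byte; the upper byte is moved down, like the ALU byte swap."""
    word = memory[byte_address >> 1]
    return (word >> 8) & 0xFF if byte_address & 1 else word & 0xFF
```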
04/18/2019 at 11:27 •
The current design of the microcode structure is almost complete, with just a few loose ends that have to be solved.
However, the total design effort for the project will be quite large, and it would be a pity if the result is inferior in certain aspects. It seems that with a little more effort, a better result can be obtained (at the cost of a few more components).
This will focus on two main aspects:
- Most instructions are just moving data around. If we change the bus from 8 to 16 bit, the system will run almost twice as fast. The ALU might stay 8 bit to keep the part count reasonable (this will perhaps only cost a single extra cycle for instructions that do calculations).
- Several aspects of the Kobold are inspired by the PDP-11. With some more effort, the design could be such that the microcode can implement a binary compatible PDP-11 instruction set. (This will only be practical with a 16 bit bus).
So, work to do !
04/15/2019 at 17:17 •
The Kobold is advertised to handle 16 bit instructions, but everything is 8 bit. Even the microcode is only 8 bit wide. How does it work ?
[ What you need to know for the examples: The program counter is in register R7 (as on the PDP11), and is copied into address register PC when needed. The lowest bit of the PC is always 0.]
EXAMPLE: 16-bit ADD
As an example, take an instruction that adds (X+6) to register R4:
The instruction is split into two parts that operate almost independently of each other:
The first byte of the instruction is fetched from (PC) into the micro-program counter. From here, the micro-instructions determine the operation:
- First part:
- load the LSB of the 16bit accumulator with (X+6), (the LSB part)
- load the MSB of the 16bit accumulator with (X+6), (the MSB part)
- fetch an instruction byte from (PC+1) into the micro-program counter (changing flow of the microcode).
This instruction byte tells the microcode to add R4 (16 bits) to the accumulator and store the result back to R4:
- Second part:
- add the LSB of (WP+4) to the LSB of the accumulator
- add the MSB of (WP+4) to the MSB of the accumulator
- store the LSB of the accumulator back into (WP+4) (the LSB part of R4)
- store the MSB of the accumulator back into (WP+4) (the MSB part of R4)
The next section of the microcode will increment the PC and start the next instruction:
- connect the LSB of the PC (from the address registers) to the B-input of the adder of the ALU. The value 2 will be provided to the A-input of the ALU, so the byte value PC+2 will be put in the accumulator.
- fetch the MSB of the PC from R7 [in the workspace] and put it into the accumulator, to make the 16 bits complete.
- move the 16-bit accumulator contents to the PC in the address register set.
- fetch the next instruction from (PC) into the micro-program counter.
( from here, the micro-instructions for the next instruction are executed )
EXAMPLE: 8 bit immediate load
There are also instructions that have a single opcode byte, followed by an 8-bit immediate operand or z-page location. Branch instructions are an example of this.
As an example, load register R3 with value 0x80:
- load the LSB of the accumulator from (PC+1)
- load the MSB of the accumulator with zero
- store the LSB of the accumulator in (WP+3) (the LSB part of R3)
- store the MSB of the accumulator in (WP+3) (the MSB part of R3)
Finally, the PC is incremented and the next instruction is fetched, as in the previous example.
04/14/2019 at 18:17 •
Here you see the schematic of the address registers, with the five HC670 chips.
Loading the address register
The inputs (REG0-REG15) come from the accumulator. The address register to write to is selected with the WA and WB inputs (that connect to bits IR4 and IR5 of the microcode byte). When ADDR_WE is active (low), the data is written into the selected address register.
The upper four bits (16-19) of the address register are written when ADDR_PAGE_WE is active.
Output of the address registers
The five 670's always put an address on the address bus (A0-A19), because the GR signal is always active. When the USE_XY signal is low, the PC or WP is connected to the bus (selected with bit IR4 from the microcode). When USE_XY is high, the X or Y register is connected to the bus, also selected with IR4.
Adding the displacement
At the upper right you see five OR-gates that "add" a displacement to the address. The lower four bits of the displacement come from the microcode byte. The fifth bit (DISPL4) comes from the control section. Since the displacement is not really added, it only works if the address in the address register is properly aligned. Although a four-bit adder chip could have been used for A1-A4, this only moves the problem to the point where that adder produces a carry. And having more than one adder chip here is against the minimum parts philosophy of the project.
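The difference between OR-ing and really adding a displacement is easy to show in a small sketch. The function names are hypothetical, and the displacement is simply placed in the lowest five bits here; the exact address lines it lands on follow from the schematic.

```python
DISPLACEMENT_MASK = 0x1F  # five OR gates, so five displacement bits


def or_displacement(base, displacement):
    """What the OR gates produce: base address OR displacement."""
    return base | (displacement & DISPLACEMENT_MASK)


def is_aligned_for(base, displacement):
    """OR equals ADD only when base and displacement share no set bits."""
    return (base & displacement & DISPLACEMENT_MASK) == 0
```

With an aligned base such as 0x1000, a displacement of 6 gives 0x1006, the same as addition. With base 0x1004 the OR result is again 0x1006, while a real adder would give 0x100A.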
The upper signals AP0-AP4, together with A5-A8 can be connected to the adder in the ALU. This connection is used for incrementing the PC. Only the lower byte of the PC is incremented. To cross a 256-byte boundary, an explicit instruction will be needed.
04/12/2019 at 20:01 •
This is the current state of the ALU design. There are still several loose ends. Carry signals for addition and shifting are not yet present. The ALU is 8 bits wide, but for clarity I only show 4 bits. (Clicking it will show a more readable version).
The ALU functions are LOAD, ADD, BIS, BIC, SHR and SHL.
The ALU is intended to work on 16 bit words, in two sequential cycles that each handle a byte.
The ADD function is the easiest to explain. The upper MPX (multiplexer) connects the output of the AL accumulator byte to the upper inputs of the adder chip, 74HC283. The lower MPX is disabled (by ALU_F0 signal), so all its outputs are high. That means that the input from the databus (D0-D7) is flowing through the AND-gates to the lower inputs of the adder chip. So, the adder will add the databus byte to the AL byte and deliver the result in AH.
Ooops... what is that lower byte doing in the high part of the accumulator ? And the accumulator has been clocked, so the high byte of the accumulator is now in the low byte ?
After the next cycle, it will be all right. The high byte coming from the databus will be added to the high byte of the accumulator, which is in AL now. The result will go to the AH register, and the previous result of the low byte will at the same time go to the AL register.
How do we put something in the accumulator ? We set the ALU_F1 signal (connected to the upper MPX) to 1. This disables the upper MPX, so its output will be zero. We now do the ADD operation. The databus contents will be added to zero, and the result will be put in the accumulator.
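The byte rotation through AH and AL can be checked with a small behavioral model. The class below is a sketch with hypothetical names. Passing a carry from the first cycle to the second is an assumption, since the carry signals are not yet in the design.

```python
class Accumulator:
    """16-bit accumulator built from two 8-bit registers, AH and AL."""

    def __init__(self, value=0):
        self.ah = (value >> 8) & 0xFF
        self.al = value & 0xFF

    def cycle(self, data_byte, load=False, carry_in=0):
        """One ALU cycle: AL (or zero when loading) plus the databus byte."""
        upper = 0 if load else self.al        # disabled upper MPX outputs zero
        total = upper + (data_byte & 0xFF) + carry_in
        self.ah, self.al = total & 0xFF, self.ah   # result to AH, old AH to AL
        return total >> 8                     # carry out (assumed to exist)

    def add16(self, word):
        carry = self.cycle(word & 0xFF)               # low byte first
        return self.cycle(word >> 8, carry_in=carry)  # then high byte

    def load16(self, word):
        self.cycle(word & 0xFF, load=True)
        self.cycle(word >> 8, load=True)

    def value(self):
        return (self.ah << 8) | self.al
```

After the first cycle the bytes are swapped, and after the second cycle both halves are back in place.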
For the BIC (bit clear) function (that is a logical AND where one of the operands is inverted), the upper MPX is disabled but the lower MPX is now enabled. The lower MPX inverts the accumulator bits (it is a 74HC158). The following AND gates will give the result: DATA and (not ACCU). The adder will add zero to this, so that does not change the result. So the result is that any bit that was set in the accumulator will cause the same bit from the databus input to be cleared.
The BIS (bit set) function (PDP11 parlance for logical OR) is similar to BIC, but now the upper MPX is also enabled again, so both adder inputs can receive data. When an accumulator bit is 0, the AND gate will pass the databus bit, and the adder will add the 0 bit to it, so the result for this bit equals the databus bit (if there was no carry from the previous bit). And when an accumulator bit is 1, it is inverted by the lower MPX so it delivers 0 to its AND gate, so the lower input of the adder will be 0. The adder output will be 1 because the upper input is 1 and the lower input is 0. The key idea is that, at the adder, the two inputs for a certain bit are never both 1. In this case, the adder will never generate an (internal or external) carry, and will behave as four OR gates.
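The claim that the adder behaves as OR gates can be verified exhaustively with a byte-wide model of the data path. The function names and the two enable flags are illustrative stand-ins for the ALU_F0/ALU_F1 control signals, not the actual signal polarities.

```python
def alu_byte(accu, data, upper_enabled, lower_enabled):
    """Model of one byte pass: two multiplexers, AND gates and the adder."""
    upper = accu if upper_enabled else 0x00                # disabled '157: zeros
    lower = (~accu & 0xFF) if lower_enabled else 0xFF      # disabled '158: ones
    gated = data & lower                                   # the AND gates
    return (upper + gated) & 0xFF                          # the adder


def add(accu, data):
    return alu_byte(accu, data, upper_enabled=True, lower_enabled=False)


def load(data):
    return alu_byte(0, data, upper_enabled=False, lower_enabled=False)


def bic(accu, data):
    return alu_byte(accu, data, upper_enabled=False, lower_enabled=True)


def bis(accu, data):
    return alu_byte(accu, data, upper_enabled=True, lower_enabled=True)
```

Because `accu` and `data & ~accu` never have a 1 in the same bit position, the sum in `bis` never carries and equals `accu | data` for every input pair.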
Finally, the SHR (shift right) function can be done by the upper MPX, because it has a shifted version of the accumulator bits connected to one of its input groups. The databus input should be zero (or not, in that case it will be added to the result).
SHL (shift left) is the same as adding a value to itself.
A few functions are missing, but they can be composed from the functions that are available. This can be done in microcode, so the instruction set can still contain these missing functions. It will only have a small impact on performance. The functions that I'm talking about are SUB, XOR and NOT.
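One possible way to compose the missing functions from ADD, BIC and BIS is sketched below on 16-bit values. This is only an illustration of the idea; the actual microcode sequences may be different.

```python
MASK = 0xFFFF


def add(a, b):
    return (a + b) & MASK


def bic(accu, data):
    """Clear in data every bit that is set in accu."""
    return data & ~accu & MASK


def bis(accu, data):
    return (data | accu) & MASK


def not_(x):
    """NOT: start from all ones and clear the bits that are set in x."""
    return bic(x, MASK)


def sub(a, b):
    """SUB: a - b as a + (not b) + 1 (two's complement)."""
    return add(add(a, not_(b)), 1)


def xor(a, b):
    """XOR: bits only in b, OR-ed with bits only in a."""
    return bis(bic(a, b), bic(b, a))
```

Each composed function costs a few extra ALU passes, which matches the small performance impact mentioned above.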
04/10/2019 at 14:15 •
For generating the memory address, the square inch processor had only a H-L register pair, that had to be reloaded each time when another address was needed. For the new processor, I want to have several addresses on standby, ready to be connected to the address bus when needed. I also want 20 bit addresses.
The 74HC670 seems to be very suitable for this. It has 4 latches of 4 bits each (and only 16 pins). If we use five of them, we have four 20-bit registers. 1-to-4 decoders for read and write are built-in ! Here are the internals:
Bits 1 to 4 will get OR gates or an adder to add the 4-bit word displacement to the address. The lowest address bit, bit 0, is used to select the upper or lower byte in a word.
So, the address generation takes only 6 chips !
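How five of these chips combine into four 20-bit registers can be sketched as follows. The class names are hypothetical, and the model ignores the separate read and write enables of the real part.

```python
class HC670:
    """Behavioral model of one 74HC670: four words of four bits each."""

    def __init__(self):
        self.words = [0, 0, 0, 0]

    def write(self, write_address, nibble):
        self.words[write_address] = nibble & 0xF

    def read(self, read_address):
        return self.words[read_address]


class AddressRegisters:
    """Five chips side by side: each holds one nibble of every 20-bit register."""

    def __init__(self):
        self.chips = [HC670() for _ in range(5)]

    def write(self, index, address):
        for n, chip in enumerate(self.chips):
            chip.write(index, (address >> (4 * n)) & 0xF)

    def read(self, index):
        return sum(chip.read(index) << (4 * n) for n, chip in enumerate(self.chips))
```

The register index is decoded inside each chip, which is why no external 1-to-4 decoders are needed.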