07/15/2019 at 13:20 •
Just one thing in the CPU design was known to be wrong: the RESET function.
The CPU receives an active-low RESET signal from the main board. It is activated at power-on and when the reset button is pressed.
The reset signal will reset the UPC (micro-PC), so after a reset, the microcode will be executed from the microprogram store starting from address zero. At address zero, microcode will be present that sets the PC to a starting position (it might also do something more to start the CPU).
So far so good.
But the CPU has branch instructions. For forward branches, the 16-bit instruction code is 0000 0000 xxxx xxx0. An 8-bit addition to the PC would be all that is needed, but in this architecture the lower 16 bits of the PC must be updated all at once (and we would also like a carry from LSB to MSB). The simplest microcode sequence simply adds the full 16-bit instruction code to the PC. This takes only 4 cycles of microcode:
0000  add r,al,pc   ; lsb of sum to r
0001  add a,ah,pc   ; msb of sum to b, lsb from r to b
0002  ld pc,b       ; move to pc
0003  ld ayu,(pc)   ; go to next instruction
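As a rough illustration of what this sequence computes, here is a minimal Python sketch that adds the 16-bit instruction word to the lower 16 bits of the PC in two 8-bit steps, passing the carry from the low byte to the high byte. The function name and the way the carry is handed over are assumptions made for illustration; this is not a model of the real microcode or datapath.

```python
def branch_add(pc: int, instr: int) -> int:
    """Add a 16-bit instruction word to the lower 16 bits of the PC, 8 bits at a time."""
    # Step 1: add the low bytes; the low byte of the sum is kept aside (like register R).
    lo = (pc & 0xFF) + (instr & 0xFF)
    carry = lo >> 8                      # carry from LSB to MSB
    # Step 2: add the high bytes plus the carry, then combine with the stored low byte.
    hi = ((pc >> 8) & 0xFF) + ((instr >> 8) & 0xFF) + carry
    return ((hi & 0xFF) << 8) | (lo & 0xFF)


# Forward branch: upper 8 bits of the instruction are zero.
print(hex(branch_add(0x10F8, 0x0010)))   # 0x1108
# Backward branch: upper 8 bits are ones, so the addition wraps around.
print(hex(branch_add(0x1000, 0xFFFE)))   # 0xffe
```

Because the whole instruction word is added, a backward branch needs no separate subtract: the upper byte of ones makes the 16-bit sum wrap to a lower address.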
Not all instruction bits are connected to the microcode storage: the three p and three q bits, that can select two registers for the instruction, are not connected. Instruction structure is:
rr-- -qqq ---- ppp-
Only the dashes and the r bits are connected to the microstore address.
You see the problem here: the microcode cannot distinguish a reset from a forward branch with a distance of 00001110 or less!
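The collision can be checked with a few lines of Python. This is only a sketch: the mask below is derived from the pattern rr-- -qqq ---- ppp- (bits marked r or - are kept, the p and q bits are dropped), and it assumes that the reset entry point corresponds to all connected bits being zero. How the connected bits are actually wired to microstore address pins is not modelled.

```python
# Bits of the instruction that reach the microstore address: rr-- -qqq ---- ppp-
# r and '-' positions are connected, the q (bits 10..8) and p (bits 3..1) fields are not.
ADDR_MASK = 0b1111_1000_1111_0001   # 0xF8F1


def microstore_bits(instr: int) -> int:
    """Return only the instruction bits that the microstore can see."""
    return instr & ADDR_MASK


# Forward branches are 0000 0000 xxxx xxx0, i.e. even values 0x00..0xFE.
colliding = [d for d in range(0, 0x100, 2) if microstore_bits(d) == 0]
print([format(d, "08b") for d in colliding])
```

Under these assumptions the colliding distances are exactly the even values from 00000000 up to 00001110, because those differ from zero only in the unconnected p bits.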
We want to fix this, without adding extra parts to the design (the design has already drifted away from the minimal-parts goal).
A solution was found. We use one of the condition bits, which are also connected to the microprogram address. There are three condition bits:
The input bit comes from a flipflop that is loaded from a multiplexer on the main board (which can connect the input signal to several sources). For instructions in group 0 (the instruction group of the forward branch), the input flipflop by default gets the value of address line A0 when an ALU instruction is executed. (The microcode uses this to see whether a byte-memory access is for the higher or lower 8 bits of a 16-bit word.)
We use this input flipflop to solve the problem. In the branch instruction, both ADD instructions will have A0 low (because both the PC and the jump distance are even), so the input bit will be low.
When a new instruction is loaded, we will reset the input flipflop. It will stay low during the branch (as just explained).
But during system reset, the flipflop will be preset to one. The ADD instructions in the branch will be made conditional, to execute only when the input flipflop is low, so there are no ALU operations done, and the flipflop stays one.
Now, the final 'go to next instruction' will be made conditional, to execute only when the flipflop is low, so a branch will be executed. But when the flipflop is one, 'go to next instruction' will not be executed, and the following microinstructions will do the reset sequence.
Sidenote: The 'go to next instruction' was already conditional, because this is the moment in every instruction at which the interrupt signal is checked, and special action is taken if the interrupt is active. In our case of the branch instruction, this means that it becomes a kind of three-way jump (during reset, an interrupt cannot occur).
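The decision logic described above can be summarised in a small Python model. It is a sketch under stated assumptions: the function name, the returned strings and the idea of counting ALU operations are all made up for illustration, and the flipflop is modelled simply as preset to 1 by a system reset and cleared to 0 when a new instruction is loaded.

```python
def address_zero_sequence(reset: bool, interrupt: bool = False):
    """Hypothetical model of the microcode shared by reset and short forward branches."""
    # Assumption: preset to 1 on system reset, cleared to 0 on instruction load.
    input_ff = 1 if reset else 0
    alu_ops = 0
    for _ in range(2):            # the two conditional ADD steps of the branch
        if input_ff == 0:
            alu_ops += 1
            input_ff = 0          # the ALU step reloads the flipflop from A0, which is low
    # Final step: the three-way jump.
    if input_ff == 1:
        return "reset sequence", alu_ops
    if interrupt:
        return "interrupt handling", alu_ops
    return "next instruction", alu_ops


print(address_zero_sequence(reset=True))                    # ('reset sequence', 0)
print(address_zero_sequence(reset=False))                   # ('next instruction', 2)
print(address_zero_sequence(reset=False, interrupt=True))   # ('interrupt handling', 2)
```

The point of the model is that no ALU operation happens on the reset path, so nothing can clear the flipflop before the final conditional step is reached.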
The schematics and other logs will soon be updated with this change [done 20190723, also updated schematics].
06/15/2019 at 20:17 •
Today I finished the schematic of the main computer board (the first version, I guess...). You can find it in the file section, and the most important parts will be discussed here.
SYSTEM INPUTS AND OUTPUTS
The 24 system outputs can be directly controlled by microcode instructions. [updated 20190723] They are used for:
- video control
- SPI bus
- selecting shift modes for the B register
- enabling the upper or lower byte of video or main memory
- selecting a system input, together with OP6 (instruction bit).
The system inputs can select from 8 input signals. Normally, IN_SEL0 and IN_SEL1 are zero, and the instruction bit OP6 selects between NON-ZERO condition or D0. So, instructions that check for zero must have bit 6 one. Instructions that check the lowest address bit A0, for enabling high or low byte, must have bit 6 zero.
For selecting keyboard or mouse signals, IN_SEL0 and IN_SEL1 must be written first. It is expected that polling these signals in the video-line interrupt will be just (barely) fast enough to read them.
VIDEO ADDRESS GENERATION
Next comes the video address generation. There are two situations:
- The video system is in control and reads the pixel data from memory. CPU_ACCESS/ is not active, VDU_ACCESS/ is active (low). Address bits A1 - A8 come from the pixel counter (changing every 160 ns). Address bit A0 comes from the VDU_A1 signal from the timing generator (changing every 80 ns). Address bits A9-A17 come from the 9-bit register formed by U4 and U19B. The NAND gate will generate an interrupt for the CPU when the count is 160 (128 + 32). The interrupt will end when the count is 192 (128 + 64). The end of the interrupt is important, because the CPU uses it to synchronize exactly to the video signal, in order to generate exact horizontal sync signals and to reset the pixel counter on time.
- The CPU has control and can write pixel data to the video memory. CPU_ACCESS/ is active, VDU_ACCESS/ is not active. The output of the pixel generator is now disabled, and the address bits A0-A8 are now delivered by the buffers U6 and U17A. Note that the pixel counter keeps running. The video memory output is disabled.
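The interrupt window from 160 to 192 can be reproduced with a two-input NAND on the count. The sketch below assumes that the gate looks at bit 7 (128) and bit 5 (32) of the pixel counter, that its output is active low, and that the counter runs from 0 to 199 (200 steps of 160 ns give a 32 µs line). None of these details are confirmed by the text above, so treat it as a plausibility check only.

```python
def line_interrupt_active(count: int) -> bool:
    """Assumed decoding: interrupt is active while bit 7 and bit 5 of the count are both set."""
    nand_out = not (bool(count & 0x80) and bool(count & 0x20))
    return not nand_out          # assumption: the NAND output is an active-low interrupt


COUNTS_PER_LINE = 200            # assumption: 200 x 160 ns = 32 us per line
active = [c for c in range(COUNTS_PER_LINE) if line_interrupt_active(c)]
print(active[0], active[-1])     # 160 191
```

With this decoding the interrupt starts at 160 and ends when the count reaches 192, because at 192 bit 6 is set and bit 5 is cleared.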
VIDEO DATA HANDLING
When the VDU is in control (pixels read from memory), two pixels are read every 80 ns and latched into the registers U9 and U10. Seven outputs of the registers are connected to the resistor-based D/A converters that generate the voltages for the RGB signals. The timing section determines whether U9 or U10 delivers its data to the D/A converters.
- In HIRES mode, both pixels will be sequenced, so each pixel will be visible for 40 ns.
- In DUAL LAYER mode, bit 7 of pixel two determines which pixel will be displayed during this 80 ns cycle.
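The two modes can be sketched as a function that returns what is shown during the two 40 ns halves of one 80 ns cycle. The polarity of bit 7 (set means "show pixel two") and the mode names as strings are assumptions; the text only says that bit 7 of pixel two makes the choice.

```python
def displayed_pixels(pixel_one: int, pixel_two: int, mode: str) -> list:
    """Return the two values shown during one 80 ns cycle (one per 40 ns slot)."""
    if mode == "HIRES":
        # Both pixels are sequenced, 40 ns each.
        return [pixel_one, pixel_two]
    if mode == "DUAL_LAYER":
        # Assumption: bit 7 of pixel two set -> pixel two is shown, else pixel one.
        chosen = pixel_two if (pixel_two & 0x80) else pixel_one
        return [chosen, chosen]      # the same pixel is visible for the full 80 ns
    raise ValueError("unknown mode: " + mode)


print(displayed_pixels(0x12, 0x34, "HIRES"))        # [18, 52]
print(displayed_pixels(0x12, 0x85, "DUAL_LAYER"))   # [133, 133]
print(displayed_pixels(0x12, 0x05, "DUAL_LAYER"))   # [18, 18]
```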
When the CPU is in control, it can write its data through the buffers U7 and U8 to the memory. The control section will in this case not enable the PIXEL_CLK, so the video output keeps displaying the same pixel (black).
The timing source is at the upper right. This should be a 25.175 MHz oscillator, but that is only available in a 3.3 V version, so I took a 25 MHz 5 V oscillator and hope the monitor will handle it.
The two flipflops below the oscillator divide the clock down to 12.5 and 6.25 MHz. The NAND gate generates the 6.25 MHz CPU clock. It is only active (low) 1/4 of the time. This is needed because the CPU contains some latches instead of flipflops, and we must be sure that the correct latch is selected before the clock goes low. This asymmetric clock gives more time to select the correct latch.
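To see where the 1/4 duty cycle comes from, here is a small Python sketch of the divider chain. It assumes that the NAND gate combines the 12.5 MHz and 6.25 MHz divider outputs and steps in units of one 25 MHz period (40 ns); the real phase relationship on the board may differ.

```python
def cpu_clock(n_periods: int) -> list:
    """CPU clock level for each 40 ns step (one period of the 25 MHz oscillator)."""
    levels = []
    for n in range(n_periods):
        q1 = n & 1            # first flipflop: 12.5 MHz
        q2 = (n >> 1) & 1     # second flipflop: 6.25 MHz
        # Assumed NAND of both divider outputs: low only when both are high.
        levels.append(0 if (q1 and q2) else 1)
    return levels


levels = cpu_clock(8)
print(levels)                                   # [1, 1, 1, 0, 1, 1, 1, 0]
print(levels.count(0) / len(levels))            # 0.25
```

Each 160 ns CPU cycle therefore has the clock high for 120 ns and low for 40 ns.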
The two 8-bit registers control the video section. The CPU can write to the video memory when both are disabled. Only one should be enabled at a time.
When the upper register is enabled, control signals for HIRES mode are generated.
When the lower register is enabled, control signals for DUAL LAYER are generated.
The next thing to do is design a PCB for the main board.
06/10/2019 at 08:32 •
This is a first design for the instruction set. In the final computer, the instruction set could be changed at any moment by putting new microcode in the Flash microstore.
This instruction set borrows from the 68000, PDP-11 and SPARC !
Here are the registers:
The register set R0 - R7 consists of general registers that can be used as data or address registers, comparable to the registers in the PDP-11, but in this case the registers can also contain 32-bit data. They are located in memory. There can be many of these register sets, and the WP register points to the current register set. A called function does not have to push registers; it simply changes WP to get a fresh set of registers.
The registers A0 - A7 are global address registers, located at a fixed position in memory. Each of these registers can point to a structure (of 8 words max). The elements of such a structure can be read or written with a single instruction.
All instructions are 16 bit, as follows:
And the overview of instructions:
A few remarks:
- All addressing is within 32KWords, except for the "MOV far" instructions that address the full 1 MByte.
- CALLS, JUMPs and RET are within 32KWord for word size, and use full 1 MByte destination for long size.
- On this level, there are no flags. Conditionals are done with compare-and-skip instructions. The skipped instruction can of course be a branch (with 7 bit value). Skipped instructions should not have immediate values following them, of course.
- Branches forward have the upper 8 bits zero, and branches backward have the upper 8 bits ones. A branch is done by simply adding the 16-bit instruction to the PC.
- Return instructions include a function-return value (an 8-bit immediate or any other value).
- Moving an immediate value to memory or adding a register directly to memory is supported. For 4-bit values, this moving is a single 16 bit instruction that includes the 4-bit value.
- The "indexed" addressing mode will add two registers to calculate a source or destination address, then do the requested operation, all in a single 16 bit instruction.
- In byte-size mode, the (R) or (R+) addressing modes access bytes in the lower or upper half of a memory word (depending on the lowest address bit), to access arrays of bytes.
- There are no memory-to-memory instructions (as in the 68000 and PDP-11). These might be emulated by letting the assembler generate two instructions.
It is expected that the "pointer + displacement" address mode will be used a lot. Note that the displacement is OR-ed to the address, so the address needs to be properly aligned. The following picture illustrates this addressing method:
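A short Python sketch shows why the alignment matters when the displacement is OR-ed instead of added. The function names are hypothetical, and the 8-word structure size (so a 3-bit displacement) is taken from the description of the A0 - A7 registers; addresses are treated as plain integers.

```python
STRUCT_WORDS = 8     # a structure has at most 8 words, so the displacement is 0..7


def element_address(pointer: int, displacement: int) -> int:
    """Pointer + displacement addressing where the displacement is OR-ed in."""
    return pointer | displacement


def is_aligned(pointer: int) -> bool:
    """OR equals ADD only if the low displacement bits of the pointer are zero."""
    return pointer & (STRUCT_WORDS - 1) == 0


print(hex(element_address(0x1230, 5)), is_aligned(0x1230))   # 0x1235 True
print(hex(element_address(0x1234, 5)), is_aligned(0x1234))   # 0x1235 False
```

For the aligned pointer 0x1230 the OR gives the same result as an addition. For the unaligned pointer 0x1234 an addition would give 0x1239, but the OR gives 0x1235, which is the wrong element.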
This is a rather ambitious instruction set. Perhaps it will be simplified if it is too much work to implement.
I am open to suggestions !
PS Several decoding details changed today (20190629).
06/06/2019 at 11:30 •
The current plan is that Kobold has a bitmapped, multicolor display. It has 640 visible pixels per line, with 256 colors per pixel.
But we want more.
Suppose we want to program a game, where the hero (let's call him Mario) has to move in front of a background picture. Making the hero move involves a lot of software actions:
- to remove him from the current position, redraw the background at the hero position
- the hero is represented by a rectangular bitmap where several positions have 'transparent' color. The transparent positions must be filled with correct background pixels of the new position
- the result of step 2 must be copied to the screen memory at the new position.
The early video games had a nice solution to this. At the positions that the hero could reach, the background was mostly black. This makes it really easy to remove the hero (just overwrite with zeros) and to place it at a new position. To make this realistic, the games were in a dark setting (in a cave or in outer space). Examples are:
- Nodes of Yesod
- Prince of Persia (Apple II 6502 source code available on the www)
For Kobold, we want to move Mario in front of a real background, without too many software actions.
This can be done with sprites: objects that can be shown on screen, appearing in front of the background. They can be moved without a lot of software work.
The spirit of the Kobold project does not allow using a TMS9918 or other special video control chip. Simple TTL chips should be used. But real sprites would need a lot of TTL, which also goes against the Kobold philosophy.
A solution (or call it compromise) has been found. There will be two modes (The mode can be set differently for each scan line):
- Hires mode, 640 pixels per line. Each memory word in video memory has two consecutive 8-bit (256 color) pixels.
- Dual layer mode, 320 pixels per line. Each memory word in video memory has two 8-bit pixels, one for the background layer and one for the foreground layer.
In dual layer mode, there will be a special bit or bit combination in the foreground pixel byte meaning 'transparent'. If a foreground pixel is transparent, the corresponding background pixel will be shown. If it is not transparent, the foreground pixel will be shown.
Although moving the foreground objects still involves software, moving Mario is simply painting a rectangle at the old foreground position with 'transparent', and copying the hero's bitmap to the new foreground position.
Perhaps only a single extra TTL chip is needed for this dual layer mode. But I also have to increase the number of available control signals... I found space in the microinstructions to control 8 more output bits, now having 24 of them, and updated the description of the microinstructions.
Writing pixels to video memory will be quite fast because it will be supported by special microcode.
05/31/2019 at 16:59 •
A new option was added to the microinstructions, to enable powerful instructions of the form:
(R4 points to a struct. The contents of the 6th field of the struct are copied to R2.) A previous log was updated to show these new microinstructions.
The CPU schematic is now thought to be complete, the new version is in the files section now. The pcb design is also complete:
And here is a picture of the traces:
Dimensions of the pcb are 5.2 x 4 inch.
The work will now continue with designing a first version of the main board.
05/26/2019 at 20:15 •
This weekend I simulated the microcode in Logisim (V2.7.1). A few things didn't work, so the schematic changed a little. The files in the file section are updated.
Some things were added in microcode:
- Micro-instruction to clear Y register (overlaps with bit set/clr). For easy access to a fixed region (at address zero), for constants or for variables for the video interrupt.
- Micro-instruction to clear upper byte of B register (overlap with other bit set/clr). Now we can move an immediate byte to a word register (Rn) with the upper half of the word set to zero, in a single 16 bit instruction.
The microinstructions are simple, but a sequence of them will be quite capable. It will be possible to add the 'long' instruction type to our PDP-11 lookalike instructions, so 32 bit MOV, ADD (and more) can be handled !
05/21/2019 at 09:45 •
The past weeks were spent designing the CPU schematic, and optimizing the available microcode instructions. The CPU schematic is now in the file section.
As a reminder, here are the hardware registers:
The registers A and B are always loaded at the same time (with the same data). The B register contents can be stored in memory (and can also be shifted left or right). The A register is connected to the input of the ALU.
The UPC (micro-programcounter) register will contain the 'user' instruction. Two 3-bit fields <ppp> and <qqq> can be used to select a source- and destination register with base address WP. The 10 remaining bits select an address in the microcode storage. Two of the remaining bits (the <rr> bits) can also select a displacement for memory access with pointer X or Y.
There are 3 classes of microcode instructions. The first one is register load/store:
The user registers R0-R7 are addressed as (WP+displacement). For the displacement, the <ppp> or <qqq> register field in UPC can be chosen, these will be at addresses 0,4,6,8,A,C,E. The microinstruction has the option to add 2 to the displacement. This makes it possible to address 32 bit data (in memory or WP-based registers) in two 16-bit chunks, or address 16 word-sized registers instead of 8.
Note that when the instruction is loaded into UPC, it is also loaded in AB and Y. Why ? When the instruction contains immediate data, it is directly available to the ALU to do calculations. If it is an immediate byte load, the byte is already in the B register and can directly be moved to the destination.
The destination register cannot be the same as the pointer, because the pointer register is a latch and not an edge-clocked flipflop.
The RL register is used to temporarily store the result of the ALU for low bytes. When the ALU calculates the high byte, the high byte will be combined with the low byte that was stored in RL, and the 16-bit result will be stored at the destination.
The PC can also be input to the ALU. This is useful for adding a constant to the PC (branch) or for obtaining the return address when doing a CALL instruction.
All microinstructions can be executed conditionally depending on carry, zero or other conditions (Actually, at each microstep a certain microinstruction is selected depending on conditions).
There are 24 output signals that can be individually set or cleared by microcode. These are used for:
- selecting the condition for conditional execution
- put the B register in a mode for shift left or shift right
- write only to upper or lower half of a memory word (for byte writes)
- video signals, like pixel count reset, sync outputs, video mode
- several other I/O
[edit: instruction codes updated 20190629]
04/30/2019 at 19:56 •
Just a quick test to see how it would fit on a PCB and if it would be difficult to route. This is the CPU only. It is connected to the mainboard by the two 2 x 18 headers. Bypass caps are not yet placed.
Routing was easy, the autorouter did it within a minute! That is, after experimenting with placement for a few hours. The size is about 100 x 150 mm.
04/27/2019 at 18:53 •
The biggest parts of the schematic of Kobold are almost completed. Here you see:
- Register B, that holds 16 bit data, and can shift the data left or right, and put it back on the databus
- Register A, that holds 16 bit data and can put a single 8-bit chunk on the address bus. The other 8 bits of the address bus will be provided by an address register (not shown here). During ALU operation, the address bus will not be used to provide an address to memory.
- The logic unit performs a NAND on the two 8-bit halves of the address bus
- The Adder adds the two 8-bit halves of the address bus
- The function selector selects the NAND or the ADD as result. Microcode bit IR4 is used to select the function.
- Finally, the 8-bit result goes to the lower or upper 8 bits of the result register
- The result register can put the 16-bit result on the databus
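The 8-bit ALU slice described in this list can be sketched in Python as follows. The polarity of IR4 (1 selects NAND, 0 selects ADD) and the presence of a carry input are assumptions made for illustration; the list above only states that IR4 selects the function.

```python
def alu8(x: int, y: int, ir4: int, carry_in: int = 0):
    """One 8-bit ALU step on the two halves of the address bus; returns (result, carry_out)."""
    x &= 0xFF
    y &= 0xFF
    if ir4:                                  # assumption: IR4 = 1 selects the logic unit
        return (~(x & y)) & 0xFF, 0          # bitwise NAND, no carry
    total = x + y + carry_in                 # assumption: IR4 = 0 selects the adder
    return total & 0xFF, total >> 8


print(alu8(0xF0, 0x3C, ir4=0))    # (44, 1)   -> 0x2C with carry out
print(alu8(0xF0, 0x3C, ir4=1))    # (207, 0)  -> 0xCF
print(alu8(0x0F, 0x0F, ir4=1))    # (240, 0)  -> NAND of a value with itself is NOT
```

NAND alone is enough to build the other logic functions in several steps, for example NOT as shown in the last line.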
04/26/2019 at 19:05 •
Trying to have the hardware simple, and keeping in mind that 'large' SRAMs are easy to obtain, the choice was made to have just a single basic video mode.
That is, 640 x 480 pixels, with 256 colors per pixel. The pixel time for VGA is 40 ns.
The hardware needs assistance from software in order to operate correctly.
The pixel counter will increment every 80 ns, so the video RAM will deliver two new pixel values every 80 ns. The 8-bit color values go to the 'first pixel' and 'second pixel' registers. During the next 80 ns, each of the outputs of these registers will be enabled for 40 ns.
When the pixel counter has reached a certain value, an interrupt will be given to the CPU. In the interrupt, the CPU will:
- obtain exact synchronization with the pixel counter
- when the pixel counter is exactly 199, reset the pixel counter to zero, to obtain a 32 µs line duration
- start the horizontal sync signal
- obtain access to the video RAM, disabling the pixel counter and the updating of the two pixel color registers, and enabling the pixel(X) register. It can now, after setting the correct pixel address in the line and pixel registers, write new data to the video RAM. The line and pixel registers may be latches, so the data can be written in the same cycle together with the address. Several locations may be written, depending on the available time during blanking.
- set the line (Y) register to the starting point of data for the next line
- when the end of a frame has been reached, do functions for frame synchronisation
- stop the access to video RAM, and stop the horizontal sync signal
- end the interrupt
The sequence of actions might be a little bit different than listed here.
The actions might be done by microcode, and the 16-bit processor can deliver 16 bits at a time, so this will be fast enough for most operations. If there are applications (games) that need higher speed, resolution could be dropped to 320 x 240 (TBD). [edit: Sprites were added to the video system, see the log of 06/06/2019]
Since the starting point of a line is under software control, it will be easy to do fast vertical scrolling. A clear screen will also go fast, because only a single cleared line has to be present, and all other lines can point to the same cleared line.
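The idea can be sketched with a per-line table of values for the line (Y) register, which the interrupt code would load for each scan line. The table and the helper names are hypothetical; the sketch only shows that scrolling and clearing change the table, not the pixel data.

```python
LINES = 480


def identity_table() -> list:
    """Screen line y shows memory line y."""
    return list(range(LINES))


def scroll_up(table: list, n: int) -> list:
    """Scroll by n lines by rotating the line pointers; no pixel data is moved."""
    n %= len(table)
    return table[n:] + table[:n]


def clear_screen(blank_line: int) -> list:
    """Every screen line points to the same single cleared line in video memory."""
    return [blank_line] * LINES


table = scroll_up(identity_table(), 1)
print(table[0], table[-1])            # 1 0
print(set(clear_screen(0)))           # {0}
```

Only one line of video memory has to be filled with the background color for a clear screen; the cost of both operations is independent of the line width.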