• Looking at porting the 4 Bit CPU to an FPGA

    agp.cooper, 02/11/2023 at 12:08, 0 comments

    A Virtual PCB

    At least that is how I explained it to my partner!

    It seems as if FPGA boards are pretty rare, almost all the main suppliers are out of stock, but I did find a Tang Nano 9k on ebay in Australia. So that is what I will use.

    Gowin (the FPGA manufacturer) has a free educational version of the IDE for the Tang Nano that needs no licence (only 382 MB), which works except for the bitstream uploader. Checking the Internet, for Linux there is no solution except for the third-party programmer openFPGALoader. I found instructions to compile openFPGALoader and it works fine. Of note, openFPGALoader's command line arguments are human readable.

    So next is to learn Verilog, but I will cheat and download a 74xxx library. But one thing I noted is that the list of implemented logic gates avoids those with tri-state outputs?

    Checking the Internet, it is strongly recommended not to use tri-state outputs as FPGA capacity to model them is quite limited. So okay I will redesign the 4 Bit CPU to use multiplexers:

    I changed the opcodes to suit the multiplexer, here is the control unit:

    I will have to swap the 74173 with a 74377 reduced to 4 bits.
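
    As a rough behavioural model of the multiplexer idea (not the schematic and not Verilog), the shared bus becomes a selector that picks exactly one source. The source names below are assumptions for illustration only:

    ```c
    #include <stdint.h>

    /* Hypothetical bus sources; the real design may use different ones. */
    enum bus_source { SRC_ACC, SRC_RAM, SRC_ROM, SRC_INPUT };

    /* 4-to-1 multiplexer on a 4 bit bus: exactly one source is selected,
       so no tri-state outputs are needed. */
    uint8_t bus_mux4(enum bus_source sel, uint8_t acc, uint8_t ram,
                     uint8_t rom, uint8_t input)
    {
        uint8_t value = 0;
        switch (sel) {
        case SRC_ACC:   value = acc;   break;
        case SRC_RAM:   value = ram;   break;
        case SRC_ROM:   value = rom;   break;
        case SRC_INPUT: value = input; break;
        }
        return value & 0x0F;  /* keep the result to 4 bits */
    }
    ```

    For example, bus_mux4(SRC_RAM, 1, 2, 3, 4) puts the RAM value (2) on the bus and ignores the other three inputs.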

    I will have to study the Gowin IP with regard to BROM and BRAM.


    74xxx Verilog Code

    Although I am not familiar with the syntax of Verilog, the 74xxx code is very easy to understand.

    There appear to be options with regard to ROM/RAM: you can roll your own or use the specialised RAM on chip. I have to review:

    • Shadow SRAM (17280 bits)
    • Block SRAM (468k bits in 26 blocks)
    • PSRAM (64M bits)
    • Flip-Flops (6480, roll your own RAM)
    • and set up ROM

    TBC ...

    AlanX   

  • Waiting on a Part

    agp.cooper, 01/28/2023 at 01:10, 0 comments

    PCB Assembly

    I have started assembly of the PCBs but I am waiting on a part.

    Instruction Set

    I have had time to think about improving the instruction set.

    The idea came from the number of steps required to swap out the accumulator to memory. Better if I have a register and a swap opcode.

    The idea actually frees up the opcode space as I can use the opcode "data" to specify the register, and the registers can serve other purposes (e.g. the page register).

    I need to look at how the memory data flow works as well.

    Rebuilding the Micro-Architecture

    I took the well published academic RISC micro-architecture:

    And derived the CHUMP micro-architecture:

    But along the way I saw an alternative memory arrangement:

    This configuration has three benefits:

    • The SRAM write logic is simplified (perhaps I over designed the write logic in the first place).
    • The opcode logic is unchanged.
    • The accumulator data can be written to SRAM in the same instruction cycle (CHUMP does it on the next instruction cycle).

    I have tested the new micro-architecture in LogiSim and it works fine.

    Adding Registers to the CPU

    This allows the inclusion of register read/write logic:

    In the above drawing I will probably keep the old PC and JNC logic.

    Now I can replace the Page opcode with a register(s) read/write opcode.

    Working LogiSim Version 7

    Have been working on the Version 7, it now has register read/write opcodes. Added four registers of the eight available slots:

    • Write Page/(no page read)
    • Write Output/Read Input
    • Write Reg A/Read Reg A
    • Write Reg B/Read Reg B

    Swapped the Page opcodes (Ex/Fx) with JNC opcodes (8x/9x).

    Replaced the new 8x/9x opcodes with 8r/9r, where r is a register constant or memory reference:

    • Page:  W/X = 0/4
    • I/O:   W/R = 1/5
    • Reg A: W/R = 2/6
    • Reg B: W/R = 3/7
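
    Going by the register list, bit 2 of r appears to select read versus write and the low two bits select the slot (Page, I/O, Reg A, Reg B). A small decode sketch under that assumption; the type and function names are mine, not from the design:

    ```c
    #include <stdint.h>
    #include <stdbool.h>

    /* Slot numbers taken from the W/R table; names are illustrative. */
    enum reg_slot { REG_PAGE = 0, REG_IO = 1, REG_A = 2, REG_B = 3 };

    struct reg_op {
        enum reg_slot slot;  /* which register slot is addressed */
        bool is_read;        /* true for r = 4..7, false for r = 0..3 */
    };

    /* Decode the low three bits of the operand r of an 8r/9r op code.
       Note that r = 4 (Page read) is listed as unused in the log. */
    struct reg_op decode_reg_operand(uint8_t r)
    {
        struct reg_op op;
        op.slot = (enum reg_slot)(r & 0x03);
        op.is_read = (r & 0x04) != 0;
        return op;
    }
    ```

    With this reading, r = 5 decodes to a read of the I/O slot (Read Input) and r = 2 to a write of Reg A.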

    Here is the Top Level:

    The Control Unit:

    The ALU:

    And the PC:

    Overall a pretty significant improvement on CHUMP V5 and the 4 Bit CPU V6.

    Here is the test code (the multiply algorithm):

    This algorithm uses the new JNC and REGs opcodes, and multiplies F x E (15 x 14); the result is D2 (210).

    Parts have Arrived

    The parts have arrived. Finished off one of the Diode ROM boards, next is the CPU board:

    TBC ...

    AlanX

  • Version 6

    agp.cooper, 12/26/2022 at 10:28, 0 comments

    Version 6

    Version 6 is like version 5 except the expanded ALU is not used. Here are the op codes:

    I wrote up an 8 bit multiplication routine, first in C:

    #include <stdio.h>
    #include <stdlib.h>
    #include <stdint.h>
    
    unsigned short mul(unsigned char A,unsigned char B)
    {
      // Returns:
      //   A = A * B
      unsigned short res=0;
      unsigned char i=8;  // 8 bit
    
      LOOP:
        res=res+res;
        if (A>=0x80) {
          res=res+B;
        }
        A=A+A;
        i=i-1;
      if (i>0) goto LOOP;
      return res;
    }
    
    int main(void)
    {
      unsigned char A,B;
      unsigned short M;
      int i,j;
    
    
      for (i=0;i<=255;i++) {
        for (j=0;j<=255;j++) {
          A=(unsigned char)i;
          B=(unsigned char)j;
          M=mul(A,B);
          if (i*j!=M) printf("%6d %6d\n",i*j,M);
        }
      }
    
      return 0;
    }
    

    I tested all cases so I know it works. I then dumbed it down to 4 bits:

    #include <stdio.h>

    unsigned short mult(unsigned char A,unsigned char B)
    {
      // Returns:
      //   C = A * B
      unsigned short C=0;
      unsigned char D=4;  // 4 bit
    
      LOOP:
        C=(C+C)&0X0F;     // 4 bit adjustment
        if (A>=8) {       // MSB of 4 bit
          C=(C+B)&0X0F;   // 4 bit adjustment
        }
        A=(A+A)&0X0F;     // 4 bit adjustment
        D=(D-1)&0X0F;     // 4 bit adjustment 
      if (D>0) goto LOOP; // 4 bit adjustment
      return C;
    }
    
    int main(void)
    {
      unsigned char A,B;
      unsigned short C;
    
      A=(unsigned char)5;
      B=(unsigned char)3;
      C=mult(A,B);
      printf("C=5*3 %d\n",C);
      return 0;
    }

    Although the code handles overflow into the high order bit of the result variable (C), in this implementation I have not considered overflow of C. Therefore 3x5 (15) is as big as the algorithm can handle:

    Here is the code running:

    In the RAM window: A, B, C & D are updated as the program runs. At the end, the output port displays the answer.

    The run starts with: 

    • A=5; Operand
    • B=3; Operand
    • C=0; Result
    • D=4; Bit count

    At the end:

    • A=0
    • B=3
    • C=15 ; Correct!
    • D=0 
    • Output=15

    Here is a version that handles larger numbers:

    If you used A=13 (D) and B=15 (F), then the result would be C=3 and D=12 (C) or 195.
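
    The listing for this version is not reproduced here, so the following is only a C sketch of one way to carry the result into a second nibble. It uses the same shift-and-add loop as above; the names mult4x4, high and low are my own and do not mirror the CPU's memory layout:

    ```c
    #include <stdint.h>

    /* 8 bit product held as two 4 bit nibbles. */
    struct nibble_pair { uint8_t high; uint8_t low; };

    struct nibble_pair mult4x4(uint8_t a, uint8_t b)
    {
        struct nibble_pair res = { 0, 0 };
        a &= 0x0F;
        b &= 0x0F;
        for (int count = 4; count > 0; count--) {
            /* Double the two-nibble result, carrying low into high. */
            uint8_t carry = (res.low >> 3) & 1;
            res.low  = (res.low + res.low) & 0x0F;
            res.high = (res.high + res.high + carry) & 0x0F;
            if (a >= 8) {                    /* MSB of the 4 bit operand */
                uint8_t sum = res.low + b;   /* may exceed 4 bits */
                res.low  = sum & 0x0F;
                res.high = (res.high + (sum >> 4)) & 0x0F;
            }
            a = (a + a) & 0x0F;              /* shift operand left */
        }
        return res;
    }
    ```

    Calling mult4x4(13, 15) returns high = 12 (C) and low = 3, i.e. C3 hex or 195, matching the figures quoted above.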

    Refer to the Simulation below:

    Unfortunately both of these programs are too big for my 32 byte PROM design.

    I will check the schematic tomorrow for any missed errors.

    Yeah, found two errors, fixed and forwarded for manufacture.

    AlanX

  • Schematic and PCB

    agp.cooper, 12/22/2022 at 12:40, 2 comments

    Schematic and PCB

    Started the schematic design, the ALU is pretty well all new, so it will take time.

    The layout will be important as the auto-router will struggle with this many chips.

    ---

    The 16 byte diode PROM boards arrived today. Two boards will have nearly 400 components, so they will take a while to solder.

    ---

    Some updates to the simulation model.

    ---

    Some progress on the schematic, trying to group the chips:

    Instruction Set Again

    The minimum instruction set is:

    1. LOAD
    2. ADD
    3. NAND
    4. ?
    5. JNC
    6. STORE
    7. READ
    8. PAGE

    Missing are instructions like LEA, CALL and RTN, etc, but these require structural changes. 

    Subtraction is pretty easy to do, I will use reference variables here.

    C = A SUB B:

    • READ A
    • LOAD M
    • NAND F
    • READ B
    • ADD M
    • NAND F
    • STORE C
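
    The sequence works because NAND F is a 4 bit NOT, and NOT(NOT A + B) equals A - B modulo 16. A C sketch of the same steps (the helper names nand4 and add4 are mine):

    ```c
    #include <stdint.h>

    /* 4 bit NAND and 4 bit ADD, as the ALU would compute them. */
    uint8_t nand4(uint8_t a, uint8_t b) { return (uint8_t)(~(a & b) & 0x0F); }
    uint8_t add4(uint8_t a, uint8_t b)  { return (uint8_t)((a + b) & 0x0F); }

    uint8_t sub_via_nand(uint8_t a, uint8_t b)
    {
        uint8_t acc = a & 0x0F;   /* READ A, LOAD M */
        acc = nand4(acc, 0x0F);   /* NAND F: acc = NOT A */
        acc = add4(acc, b);       /* READ B, ADD M: acc = NOT A + B */
        acc = nand4(acc, 0x0F);   /* NAND F: acc = A - B (mod 16) */
        return acc;               /* STORE C */
    }
    ```

    For example, sub_via_nand(9, 3) gives 6, and sub_via_nand(3, 9) wraps around to 10 (A hex).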

    Test if equal:

    • READ A
    • LOAD M
    • NAND F
    • READ B
    • ADD M
    • NAND F
    • ADD F
    • JNC [A == B]
    • ...  [A != B]
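
    The final ADD F produces a carry for every non-zero difference, so no carry means the two operands were equal. A C sketch of the test, assuming a plain ADD whose carry out is the only flag (function names are mine):

    ```c
    #include <stdint.h>
    #include <stdbool.h>

    /* Add two nibbles; return the 4 bit sum and report the carry out. */
    uint8_t add4_carry(uint8_t a, uint8_t b, bool *carry)
    {
        uint8_t sum = (uint8_t)((a & 0x0F) + (b & 0x0F));
        *carry = sum > 0x0F;
        return sum & 0x0F;
    }

    bool nibbles_equal(uint8_t a, uint8_t b)
    {
        bool carry;
        uint8_t acc = (uint8_t)(~a & 0x0F);   /* NAND F: NOT A */
        acc = add4_carry(acc, b, &carry);     /* ADD M: NOT A + B */
        acc = (uint8_t)(~acc & 0x0F);         /* NAND F: A - B */
        (void)add4_carry(acc, 0x0F, &carry);  /* ADD F */
        return !carry;                        /* no carry only when A - B == 0 */
    }
    ```

    Here nibbles_equal(7, 7) is true because 0 + F does not carry, while nibbles_equal(7, 8) is false.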

    Test if bits are HIGH:

    • READ A
    • LOAD M
    • NAND F
    • NAND MASK
    • ADD 1
    • JNC [false]
    • ...  [true]

    Test if bits are LOW:

    • READ A
    • LOAD M
    • NAND MASK
    • ADD 1
    • JNC [false]
    • ...  [true]
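
    Both NAND-based bit tests rely on ADD 1 carrying only when the accumulator is F. A C sketch of the two sequences (helper names are mine); note that in this form they test whether all of the masked bits are high, or all are low:

    ```c
    #include <stdint.h>
    #include <stdbool.h>

    static uint8_t nand4(uint8_t a, uint8_t b) { return (uint8_t)(~(a & b) & 0x0F); }

    /* True when adding two nibbles overflows 4 bits (the carry flag). */
    static bool add4_carries(uint8_t a, uint8_t b)
    {
        return ((a & 0x0F) + (b & 0x0F)) > 0x0F;
    }

    bool all_masked_bits_high(uint8_t a, uint8_t mask)
    {
        uint8_t acc = nand4(a, 0x0F);   /* NAND F: NOT A */
        acc = nand4(acc, mask);         /* NAND MASK: A OR NOT MASK */
        return add4_carries(acc, 1);    /* ADD 1: carry only if acc == F */
    }

    bool all_masked_bits_low(uint8_t a, uint8_t mask)
    {
        uint8_t acc = nand4(a, mask);   /* NAND MASK: NOT (A AND MASK) */
        return add4_carries(acc, 1);    /* ADD 1: carry only if acc == F */
    }
    ```

    With A = B hex (1011) and MASK = 3, all_masked_bits_high is true; with A = 9 (1001) it is false because bit 1 is low.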

    Other logic gates can be derived from the NAND gate, but may require memory to store intermediate results. Although NAND can replace XOR in many cases, XOR is "necessary" for efficient toggling of bits.

    An alternate instruction set is:

    1. LOAD
    2. ADD
    3. AND
    4. XOR
    5. JNC
    6. STORE
    7. READ
    8. PAGE

    Subtraction uses XOR.

    C = A SUB B:

    • READ A
    • LOAD M
    • XOR F
    • READ B
    • ADD M
    • XOR F
    • STORE C

    Test if equal:

    • READ A
    • LOAD M
    • XOR F
    • READ B
    • ADD M
    • XOR F
    • ADD F
    • JNC [A == B]
    • ...  [A != B]

    Test if bits are HIGH:

    • READ A
    • LOAD M
    • AND MASK
    • ADD F
    • JNC [false]
    • ...  [true]

    Test if bits are LOW:

    • READ A
    • LOAD M
    • XOR F
    • AND MASK
    • ADD F
    • JNC [false]
    • ...  [true]

    OR is awkward, using memory reference here again:

    • READ A   ; Set memory reference A
    • LOAD M  ; Using memory reference
    • XOR F
    • STORE C ; Save intermediate result
    • READ B   ; Set memory reference B
    • LOAD M  ; Using memory reference
    • XOR F
    • READ C   ; Set memory reference C
    • AND M    ; Uses address from previous READ
    • XOR F
    • STORE C ; Save result to C
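
    This is De Morgan's rule: A OR B = NOT(NOT A AND NOT B), with XOR F standing in for NOT. A C sketch of the same steps (the function name is mine):

    ```c
    #include <stdint.h>

    uint8_t or_via_and_xor(uint8_t a, uint8_t b)
    {
        uint8_t c   = (a ^ 0x0F) & 0x0F;  /* LOAD A, XOR F, STORE C: NOT A */
        uint8_t acc = (b ^ 0x0F) & 0x0F;  /* LOAD B, XOR F: NOT B */
        acc &= c;                         /* READ C, AND M */
        return (acc ^ 0x0F) & 0x0F;       /* XOR F: A OR B */
    }
    ```

    For example, or_via_and_xor(0x8, 0x1) returns 9.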

    Compared to AND:

    • READ A   ; Set memory reference
    • LOAD M  ; Using memory reference
    • READ B   ; Set memory reference
    • AND M    ; Uses address from previous READ
    • STORE C ; Save result to C

    Compared to XOR using NANDs:

    • READ A   ; Operand A
    • LOAD M
    • READ B   ; Operand B
    • NAND M
    • STORE C ; Intermediate result
    • READ B
    • NAND M
    • STORE D ; Intermediate result
    • READ C
    • LOAD M  
    • READ A
    • NAND M
    • READ D
    • NAND M
    • STORE C ; Save result in C
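
    The listing is the classic four-NAND XOR. A C sketch that follows it step by step, with c and d standing in for the two intermediate memory cells (names are mine):

    ```c
    #include <stdint.h>

    static uint8_t nand4(uint8_t a, uint8_t b) { return (uint8_t)(~(a & b) & 0x0F); }

    uint8_t xor_via_nand(uint8_t a, uint8_t b)
    {
        uint8_t c   = nand4(a, b);   /* LOAD A, NAND B, STORE C */
        uint8_t d   = nand4(c, b);   /* NAND B, STORE D */
        uint8_t acc = nand4(c, a);   /* LOAD C, NAND A */
        return nand4(acc, d);        /* NAND D: A XOR B */
    }
    ```

    For example, xor_via_nand(0xC, 0xA) returns 6.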

    AND versus NAND

    AND has the advantage that the logic in the general use case is slightly simpler.

    However, if I need to free up an op code slot (i.e. the XOR slot), the NAND op code is the way to go.

    Second Thoughts

    Not a lot to gain from the second op code page. I think I should have spent my time looking at structural changes.

    There are efficient algorithms for multiplication and division that only use ADD and NAND, so SAR and SAL are not required op codes.

    Where to Next?

    Maybe a stack to push/pop return addresses and intermediate results?

    Eventually I want to look at a single cycle Von Neumann architecture.

    AlanX

  • Instruction Set Shuffle

    agp.cooper, 12/21/2022 at 13:21, 0 comments

    Instruction Set Shuffle

    Having ADD and SUB in different op code pages seems wrong, as SUB (ACC = Value - 1) can be coded as:

    • LOAD Value
    • XOR F
    • ADD 1
    • XOR F

    Yes, the carry flag works.
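
    To see why the carry works as a borrow: NOT Value + 1 only overflows when NOT Value is F, i.e. when Value is 0, which is exactly when Value - 1 underflows. A C sketch of the four steps (the struct and function names are mine):

    ```c
    #include <stdint.h>
    #include <stdbool.h>

    struct alu_result { uint8_t acc; bool carry; };

    struct alu_result decrement_via_xor(uint8_t value)
    {
        struct alu_result r;
        uint8_t acc = (value ^ 0x0F) & 0x0F;  /* LOAD Value, XOR F */
        unsigned sum = acc + 1u;              /* ADD 1 */
        r.carry = sum > 0x0Fu;                /* carry out acts as borrow */
        acc = (uint8_t)(sum & 0x0Fu);
        r.acc = (acc ^ 0x0F) & 0x0F;          /* XOR F: Value - 1 */
        return r;
    }
    ```

    decrement_via_xor(5) gives 4 with no carry, while decrement_via_xor(0) gives F with the carry set.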

    The current op code set would be:

    • LOAD Value
    • PAGE 8
    • SUB 1
    • PAGE 0

    No saving!

    To test for a value you could use:

    • LOAD Value
    • XOR F
    • ADD Test
    • XOR F
    • ADD F
    • JNC [Value == Test]
    • [Value != Test]

    Or:

    • LOAD Value
    • SUB Test
    • ADD F
    • JNC [Value == Test]
    • [Value != Test]

    A better set of op codes would be:

    • 0-1  LOAD/LOAD
    • 2-3  ADD/SHR using Carry
    • 4-5  SUB/XOR
    • 6-7  AND/OR 
    • 8-9  JNC
    • A-B  STORE
    • C-D  READ
    • E-F  PAGE

    AND has been promoted over NAND as it can test for bit states:

    • LOAD Value
    • AND Bit Mask
    • ADD F
    • JNC [False == 0]
    • [True != 0]

    Also:

    • ADD 0 clears the Carry flag
    • SUB 0 sets the Carry flag.

    This will be version 5.

    Here is the simulation of up counting followed by down counting, then repeat:

    The code for the animation is:

    E0  PAGE 0           ; Select Op Code Set 0
    20  ADD 0            ; Clear Carry
    Repeat:
    00  LOAD 0           ; Clear ACC
    Loop1:
    AF  SAVE F           ; Output ACC
    A0  SAVE MEM[0]      ; Save to RAM
    21  ADD 1            ; Increment
    83  JNC 3            ; Loop1
    20  ADD 0            ; Clear Carry
    0F  LOAD F           ; Set F
    Loop2:  
    AF  SAVE F           ; Output ACC
    A0  SAVE MEM[0]      ; Save to RAM
    41  SUB 1            ; Decrement
    89  JNC 9            ; Loop2
    20  ADD 0            ; Clear Carry
    82  JNC 2            ; Repeat
    

    The top level:

    The ALU:

    And control:

    AlanX

  • Subtraction

    agp.cooper, 12/21/2022 at 00:52, 0 comments

    Subtraction

    I have had a bit of a problem getting my head around subtraction and the borrow/carry flag. With some CPUs (such as the 6502) the carry flag is set before the subtraction, and when underflow occurs the carry flag is cleared. This works of course but makes JNC not so useful.

    The 8086 works the other way. The carry flag is cleared and when underflow occurs the carry flag is set. Since the 8085 and 8086 were my first microprocessors, I will go with this system, and JNC works better here.
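
    The two conventions can be compared side by side in a 4 bit C sketch. The function names are mine, and the 6502-style version assumes the carry was set before the subtraction (modelled by the + 1):

    ```c
    #include <stdint.h>
    #include <stdbool.h>

    struct sub_result { uint8_t value; bool carry; };

    /* 8086-style: carry set means a borrow occurred (a < b). */
    struct sub_result sub_carry_is_borrow(uint8_t a, uint8_t b)
    {
        struct sub_result r;
        a &= 0x0F;
        b &= 0x0F;
        r.value = (uint8_t)((a + 16 - b) & 0x0F);
        r.carry = a < b;
        return r;
    }

    /* 6502-style: add the complement plus the preset carry, so the
       carry out is set when there was NO borrow (a >= b). */
    struct sub_result sub_carry_is_not_borrow(uint8_t a, uint8_t b)
    {
        struct sub_result r;
        unsigned sum = (a & 0x0Fu) + (~b & 0x0Fu) + 1u;
        r.value = (uint8_t)(sum & 0x0Fu);
        r.carry = sum > 0x0Fu;
        return r;
    }
    ```

    For 3 - 5 both give E, but the first sets the carry (so JNC falls through on underflow) and the second clears it.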

    For the time being I am going to "jumper out" ADC and SBB, for ADD and SUB, as these instructions are more trouble than they are worth, at the moment.

    I have used the most significant bit of the page register as an OPCODE flag.

    Here is the top level view:

    Here is the PC:

    And the ALU:

    The PAGE op code E0 sets ADD and op code E8 set SUB.

    Note E0 00 clears carry while E8 00 sets carry.

    It is clear we can add other instructions to the ALU if desired.

    Here is some code to count up and then count down. JNC is tripped on overflow/underflow:

    E0    Set ADD
    00    Clear Carry
    20    Clear ACC
    AF    Output ACC
    A0    Store to Mem[0]
    01    ADD 1
    83    JNC 3
    00    Clear Carry
    E8    Set SUB
    2F    Set ACC to F
    AF    Output ACC
    A0    Store to Mem[0]
    01    SUB 1
    8A    JNC A
    E0    Set ADD
    00    Clear Carry
    82    JMP 2
    

    AlanX

  • Jump Logic

    agp.cooper, 12/18/2022 at 00:26, 0 comments

    Jump Logic

    The current jump logic uses JNC (Jump on Not Carry). It is a complete system but not that intuitive. For multi-nibble/byte arithmetic it is straightforward: if no carry adjustment is required (i.e. not carry), skip the add carry code.

    A JGE (i.e. Jump on Greater or Equal) can be constructed by complementing one of the operands and adding the other operand. Following is an example of testing the input for a program number 1 to 6:

    Address    OpCode    Const    Comment    CODE
    #    READ PORT    
    0    C    F    READ F
    1    3    F    LOAD M (PORT)    
    2    0    0    CLEAR CARRY
    #    TEST PROGRAM 6
    3    0    A    ADD 10         # NOT 5
    4    8    7    JNC 7          # JGE 5
    5    E    7    SET PAGE       # JUMP PROGRAM 6
    6    8    0    JMP ADDR
    #    TEST PROGRAM 5
    7    0    1    ADD 1          # NOT 4
    8    8    B    JNC B          # JGE 4
    9    E    6    SET PAGE       # JUMP PROGRAM 5
    A    8    0    JMP ADDR    
    #    TEST PROGRAM 4    
    B    0    1    ADD 1          # NOT 3
    C    8    F    JNC F          # JGE 3
    D    E    5    SET PAGE       # JUMP PROGRAM 4
    E    8    0    JMP ADDR    
    #    SET NEW PAGE 1
    F    E    1    PAGE 1
    #     TEST PROGRAM 3
    10    0    1    ADD 1         # NOT 2
    11    8    4    JNC 4         # JGE 2
    12    E    4    SET PAGE      # JUMP PROGRAM 3
    13    8    0    JMP ADDR    
    #     TEST PROGRAM 2    
    14    0    1    ADD 1         # NOT 1
    15    8    8    JNC 8         # JGE 1
    16    E    3    SET PAGE      # JUMP PROGRAM 2
    17    8    0    JMP ADDR    
    #     TEST PROGRAM 1    
    18    0    1    ADD 1         # NOT 0
    19    8    C    JNC C         # JGE 0
    1A    E    2    PAGE 2        # JUMP PROGRAM 1
    1B    8    0    ADDR 0
    #     RETURN - NO PROGRAM SELECTED
    1C    E    0    SET PAGE 0
    1D    8    0    JMP ADDR 0
    1E    0    0    NOP
    1F    0    0    NOP
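
    The comparison chain in the table can be modelled in C as repeated 4 bit adds, stopping at the first one that carries. This sketch only models the carry chain (ADD 10, then ADD 1 five times); the PAGE and jump mechanics are left out, and the function name is mine:

    ```c
    #include <stdint.h>

    /* Returns the selected program number (6 down to 1), or 0 if none. */
    int select_program(uint8_t input)
    {
        uint8_t acc = input & 0x0F;
        uint8_t addend = 0x0A;              /* ADD 10 is NOT 5 in 4 bits */
        for (int step = 0; step < 6; step++) {
            unsigned sum = acc + addend;
            if (sum > 0x0Fu)                /* carry: the JNC is not taken */
                return 6 - step;
            acc = (uint8_t)sum;
            addend = 1;                     /* later tests use ADD 1 */
        }
        return 0;                           /* no program selected */
    }
    ```

    An input of 6 carries on the first add (6 + 10 = 16), an input of 3 carries on the fourth, and an input of 0 never carries. Inputs above 6 also carry on the first add in this model.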
    

    One option is to add a comparator to the ALU and to use one of the flags (i.e. A<B, A=B and A>B) or its complement to trigger the jump. It would have the advantage of not altering the accumulator.

    ---

    I have been thinking more about this. It would be efficient and useful to have a "TEST" register, but it requires freeing up an op code, XOR being the one to use:

    But rather than doing that I am thinking of an op code to select an alternate op code set.

    PAGE and READ are similar so they could share the same op code, and the PAGE slot used to set alternate op codes:

    • ADC/SBB
    • LOAD/LOAD
    • NAND/NOR
    • XOR/XOR?
    • JNC/JGE?
    • STORE/STORE?
    • READ/PAGE
    • OPCODE?/OPCODE?

    Another option is to use the most significant bit of the PAGE register.

    ADC/SBB, the ADC requires an inverter on the input and on the output for SBB (have to check this). Does JNC become JNB which is JGE? No need for a dedicated comparator?

    AlanX

  • Add with Carry

    agp.cooper, 12/17/2022 at 00:18, 0 comments

    Add with Carry

    To date I have used plain and simple ADD with my DIY CPUs. ADC (i.e. add with carry) is useful for multi byte (nibble) addition as the carry is automatic. The downside is that the humble counter does not work, it skips 0. This can be fixed of course.

    Here is the old counter using ADD:

    20    LOAD 0          ; CLEAR ACC
    A0    STORE MEM[0]    ; SAVE TO MEM[0] 
    AF    STORE MEM[F]    ; OUTPUT
    01    ADD 1           ; INCREMENT ACC
    82    JNC 2           ; JUMP ON NOT CARRY
    00    ADD 0           ; CLEAR CARRY
    82    JNC 2           ; JUMP UNCONDITIONALLY 2

    On overflow you need to clear the carry so that the next JNC is an unconditional jump.

    Here is the new counter using ADC:

    00    ADC 0           ; CLEAR CARRY
    20    LOAD 0          ; CLEAR ACC
    A0    STORE MEM[0]    ; SAVE TO MEM[0]
    AF    STORE MEM[F]    ; OUTPUT
    01    ADC 1           ; INCREMENT ACC
    82    JNC 3           ; REPEAT
    00    ADC 0           ; CLEAR CARRY
    82    JNC 2           ; JUMP UNCONDITIONAL 2

     So on overflow you need to both clear the carry and clear the accumulator.
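
    The skipped 0 can be reproduced with a small C model of a 4 bit ADC. This is a sketch of the arithmetic only, with names of my own choosing:

    ```c
    #include <stdint.h>
    #include <stdbool.h>

    struct adc_result { uint8_t acc; bool carry; };

    /* 4 bit add with carry in and carry out. */
    struct adc_result adc4(uint8_t acc, uint8_t operand, bool carry_in)
    {
        struct adc_result r;
        unsigned sum = (acc & 0x0Fu) + (operand & 0x0Fu) + (carry_in ? 1u : 0u);
        r.acc = (uint8_t)(sum & 0x0Fu);
        r.carry = sum > 0x0Fu;
        return r;
    }
    ```

    Counting up from F, adc4(0xF, 1, false) gives 0 with the carry set. The following ADC 0 to clear the carry is adc4(0, 0, true), which gives 1: the carry is cleared but the accumulator has already moved past 0.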

    Converting the ALU from ADD to ADC involves linking the Carry signal from the Control unit to the Carry In on the Adder in the ALU unit:

    For completeness here is the Control unit:

    The carry logic holds the carry (CY) until the next ADC instruction.

    Multi-Nibble Arithmetic

    Before moving on I tested an 8 bit counter using ADC:

    00  ADC 0     ; CLEAR CARRY
    20  LOAD 0    ; CLEAR ACC
    A0  STORE 0   ; CLEAR MEM[0] // LOW NIBBLE
    A1  STORE 1   ; CLEAR MEM[1] // HIGH NIBBLE
    AF  STORE F   ; CLEAR OUTPUT // LOW NIBBLE
    LOOP:
      C0  READ 0  ; PRESET ADDR TO MEM[0]
      30  LOAD 0  ; LOAD LOW NIBBLE
      01  ADC 1   ; INCR LOW NIBBLE
      A0  STORE 0 ; SAVE LOW NIBBLE
      AF  STORE F ; OUTPUT LOW NIBBLE
      C1  READ 1  ; PRESET ADDR TO MEM[1]
      31  LOAD 1  ; LOAD HIGH NIBBLE
      00  ADC 0   ; ADD CARRY TO HIGH NIBBLE
      A1  STORE 1 ; SAVE HIGH NIBBLE
    JUMP LOOP:
    00 ADC 0      ; CLEAR CARRY
    85 JNC 5      ; UNCONDITIONAL JUMP

    The ADC does not specifically need the JNC instruction.
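
    One pass of the loop can be sketched in C as two chained adds, which shows why no conditional jump is needed between the nibbles. The function name and the packed return value are my own choices:

    ```c
    #include <stdint.h>

    /* One counter step: increment the low nibble, then let ADC 0 fold the
       carry into the high nibble. Returns the two nibbles packed as a byte. */
    uint8_t counter8_step(uint8_t low, uint8_t high)
    {
        unsigned sum = (low & 0x0Fu) + 1u;     /* ADC 1 (carry clear on entry) */
        unsigned carry = sum >> 4;             /* carry out of the low nibble */
        low = (uint8_t)(sum & 0x0Fu);
        sum = (high & 0x0Fu) + 0u + carry;     /* ADC 0 adds only the carry */
        high = (uint8_t)(sum & 0x0Fu);
        return (uint8_t)((high << 4) | low);
    }
    ```

    Starting from low = F and high = 2, one step gives 30 hex; the carry left after a wrap from FF to 00 is what the trailing ADC 0 in the listing clears.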

    The above code is 16 bytes long. A pretty strong justification for a 32 (or more) byte PROM system.

    AlanX

  • Reverting Back to H/W CPU V1

    agp.cooper, 12/15/2022 at 11:22, 0 comments

    Reverting Back to H/W CPU V1

    H/W CPU V1 has the paged Program Counter (PC) and most of the control hardware of H/W CPU V2/V3, I just have to delete the ALU tristate buffer, and edit the memory model.

    I will call this version V4:

    If you want to see meaningless "computer lights" flashing, here is the LogiSim animation: 

    The code being run is slightly more complicated than the minimum, just to check some other instructions:

    20    LOAD  0            ; Clear ACC
    AF    STORE MEM[F]       ; Output ACC
    01    ADD   1            ; ACC<=ACC+1
    A0    STORE MEM[0]       ; MEM[0]<=ACC 
    C0    READ  0            ; ADDR<=0, preset memory fetch address
    3X    LOAD  MEM[ADDR]    ; ACC<=MEM[ADDR]
    AF    STORE MEM[F]       ; Output ACC
    82    JNC   X2           ; Jump if no carry
    00    ADD   0            ; Clear carry
    81    JNC   X1           ; Unconditional jump
    

    The op codes are:


    Missing Op Codes?

    You may think I am missing op codes such as SHR etc. But no, ADD and NAND are all you need. The rest can be synthesized from ADD and NAND. XOR is included because synthesizing it from NAND needs 14 instructions to read two operands from memory and write the result back to memory. The XOR op code needs 5 instructions to do the same task.

    Still, XOR could be swapped out for another instruction.

    AlanX

  • 32 Byte Diode PROM

    agp.cooper, 12/15/2022 at 01:50, 0 comments

    32 Byte Diode PROM

    I felt that a 16 byte diode PROM was too small. You cannot fit a program with a subroutine in 16 bytes. So I designed a 32 byte diode PROM, but unfortunately I could not get the auto-router to work. So the answer is two 16 byte daughter boards:

    Here is the schematic:

    One nice thing is that the boards are stackable.

    I will have to put back the page circuitry on the mother board.

    ---

    I have sent the PCB off for fabrication. Usually takes 5 or 6 days.

    ---

    I have made a mistake with regard to stacking the boards. You cannot get 20 pin long headers, typically they are 10 pin headers, and you need a gap between them. Oh well!

    I designed a PCB for the 10 long pin headers:

    But I am not getting them made until I use up the previous design.

    AlanX