Homebrew soft core

A project log for nedoPC-5

DIY personal computer built around 32-bit version of RISC-V processor

shaosSHAOS 11/27/2018 at 09:070 Comments

In the last week I tried to create my own RISC-V soft core for this contest, but now contest is over and I was not able to achieve minimum requirements (100% passing RV32I compliance tests and ability to run RTOS Zephyr), but I've got really close - my current version is passing 54 out of 55 tests (through Verilator), including misaligned load/store exceptions and a number of control and status registers with atomic reading/writing (and all of that takes 3.4K LUTs) - see

For now I made a decision that for this particular project exceptions and extra registers are overkill, so I rolled back a little and stayed with straightforward RISC-V implementation (only 2.2K LUTs of iCE40UP5K FPGA plus some BRAMs for 32 registers) that covers most of user level instructions and passing most relevant RV32I tests:

Check         I-ADD-01 ... OK
Check        I-ADDI-01 ... OK
Check         I-AND-01 ... OK
Check        I-ANDI-01 ... OK
Check       I-AUIPC-01 ... OK
Check         I-BEQ-01 ... OK
Check         I-BGE-01 ... OK
Check        I-BGEU-01 ... OK
Check         I-BLT-01 ... OK
Check        I-BLTU-01 ... OK
Check         I-BNE-01 ... OK
Check       I-CSRRC-01 ... FAIL
Check      I-CSRRCI-01 ... FAIL
Check       I-CSRRS-01 ... FAIL
Check      I-CSRRSI-01 ... FAIL
Check       I-CSRRW-01 ... FAIL
Check      I-CSRRWI-01 ... FAIL
Check I-DELAY_SLOTS-01 ... OK
Check      I-EBREAK-01 ... FAIL
Check       I-ECALL-01 ... FAIL
Check   I-ENDIANESS-01 ... OK
Check     I-FENCE.I-01 ... OK
Check             I-IO ... OK
Check         I-JAL-01 ... OK
Check        I-JALR-01 ... OK
Check          I-LB-01 ... OK
Check         I-LBU-01 ... OK
Check          I-LH-01 ... OK
Check         I-LHU-01 ... OK
Check         I-LUI-01 ... OK
Check          I-LW-01 ... OK
Check         I-NOP-01 ... OK
Check          I-OR-01 ... OK
Check         I-ORI-01 ... OK
Check     I-RF_size-01 ... OK
Check    I-RF_width-01 ... OK
Check       I-RF_x0-01 ... OK
Check          I-SB-01 ... OK
Check          I-SH-01 ... OK
Check         I-SLL-01 ... OK
Check        I-SLLI-01 ... OK
Check         I-SLT-01 ... OK
Check        I-SLTI-01 ... OK
Check       I-SLTIU-01 ... OK
Check        I-SLTU-01 ... OK
Check         I-SRA-01 ... OK
Check        I-SRAI-01 ... OK
Check         I-SRL-01 ... OK
Check        I-SRLI-01 ... OK
Check         I-SUB-01 ... OK
Check          I-SW-01 ... OK
Check         I-XOR-01 ... OK
Check        I-XORI-01 ... OK
FAIL: 10/55

About actual design - my idea was to get a standalone 32-bit CPU kind of thing (small FPGA board with flashed in soft core) that will use EXTERNAL memory with 8-bit data bus to look like some kind of RETRO, but with GCC support. 8-bit data bus means that every 32-bit instruction will be loaded at least in 4 steps and I figured out how to decode and execute those instructions in the same time with loading. I called this design Retro-V and it's got version number 1.0.0. Now more details.

Retro-V soft core has 2-stage pipeline ( or more precisely 1.5-stage pipeline ; ) with 4 cycles per stage, so on average every instruction takes 4 cycles (with 40 MHz clock it will be 10 millions instructions per sec max):

As you can see Retro-V core reads from register file in cycles 3 and 4 and write to register file in cycles 1 and 2 (the same as 5 and 6 for 2nd stage of pipeline). The fact that reading and writing are always performed in different moments in time allows us to implement register file by block memory inside FPGA. Also it is obvious that this design doesn't have hazard problem if the same register is written in one instruction and we have read in the next because instruction reads 1st argument in cycle 3 and write back from previous instruction is already happened in previous cycle. In case of jump (JALJALR or BRANCH instructions) next instruction from pipeline alread performed 1st cycle, so it stops right there and next cycle is 1st one from new address effectively re-initing the pipeline (so branch penalty is only 1 cycle). In case of memory access (LOAD or STORE instructions) state machine stays in cycle 4 for a while (to load or store bytes from/to memory one by one wasting from 1 to 5 extra cycles) and next instruction in pipeline is kind of frozen between cycle 1 and cycle 2 in the same time.

If we count only "visible" cycles (from the beginning of one instructions to the beginning of the next one) then:

Address bus is 16-bit wide (eventhough internally it's still 32-bit), so external memory could be up to 64KB (technically speaking it's configurable, so if FPGA has extra signal lines then address bus could be wider - up to all possible 32 bits).