Polaris CPU Almost at First Milestone

Polaris CPU now supports all RV64I instructions, with the following exceptions:

RDTIME, RDINSTRET, and friends, because these are actually specializations of the CSRRS instruction. I don't yet have CSR support implemented.
ECALL and EBREAK, because I don't yet have traps implemented. I need access to CSRs in order to configure and respond to traps.
WFI and MRET, because, again, I lack CSR support.
FENCE.VM, because this doesn't make sense on a CPU lacking an MMU.
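To make the first exception concrete, here is a small Python sketch (not part of Polaris) that assembles RDCYCLE, RDTIME, and RDINSTRET as plain CSRRS instructions with rs1 = x0. The counter CSR addresses (0xC00 to 0xC02) and the SYSTEM opcode come from the RISC-V specification; the helper names are hypothetical.

```python
# Sketch: the counter-read pseudo-instructions are just CSRRS with rs1 = x0.
SYSTEM_OPCODE = 0b1110011   # opcode shared by ECALL, EBREAK, and the CSR instructions
CSRRS_FUNCT3 = 0b010

# Counter CSR addresses from the RISC-V specification.
COUNTER_CSRS = {"rdcycle": 0xC00, "rdtime": 0xC01, "rdinstret": 0xC02}


def encode_csrrs(rd: int, csr: int, rs1: int) -> int:
    """Assemble CSRRS: csr[31:20] rs1[19:15] funct3[14:12] rd[11:7] opcode[6:0]."""
    return (csr << 20) | (rs1 << 15) | (CSRRS_FUNCT3 << 12) | (rd << 7) | SYSTEM_OPCODE


def encode_counter_read(name: str, rd: int) -> int:
    """RDCYCLE/RDTIME/RDINSTRET rd is shorthand for CSRRS rd, <counter CSR>, x0."""
    return encode_csrrs(rd, COUNTER_CSRS[name], rs1=0)


print(hex(encode_counter_read("rdtime", 10)))  # rdtime a0 -> 0xc0102573
```

Since the decoder only ever sees a CSRRS, supporting these "instructions" comes for free once CSRRS and the counter CSRs exist.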

As long as you do not need interrupts, can guarantee you'll never execute an illegal instruction, and can guarantee that all memory accesses will terminate, you can theoretically use Polaris now to run real RISC-V user-level code.

The CPU synthesizes and is small enough to fit on an iCE40HX4K part (!!), albeit without a Wishbone interface to memory or I/O.

I know I had expected to overlap instruction fetch with as many execution cycles as I could, which is why I compared the design to the 6502. In practice, however, it was easier for debugging purposes not to do this: fetch and execution occur on distinct clock cycles. As a result, performance is even slower than I had originally wanted; still, given fast enough memory, it compares well with a Z80. Sorry about that; I figured it'd be better to get something running first and optimize the microarchitecture later.

As it stands, the CPU weighs in at under 2400 logic cells, which is what lets it fit on the iCE40HX4K. As far as I'm aware, this makes it the world's smallest 64-bit RISC-V processor.

Native Buses

The Polaris native bus is a distortion of the Wishbone bus, optimized for the needs of the RISC execution engine. The address and data buses, as well as ACK_I, ERR_I, etc., all work the same; where they differ is in the control signals.

Instead of using SEL_O to select which byte lanes carry data, it uses SIZ_O. The two-bit SIZ_O signal indicates how big the data transfer is for the current cycle:

00 = byte (8-bits)
01 = half-word (16-bits)
10 = word (32-bits)
11 = double-word (64-bits)
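Because each step up in SIZ_O doubles the transfer width, the byte count is simply 2^SIZ_O. A Wishbone bridge would eventually have to turn SIZ_O back into a SEL_O byte-lane mask; the Python sketch below shows one way, assuming little-endian lane order (RISC-V is little-endian) and a bridge that rejects misaligned accesses. The name siz_to_sel is hypothetical, not part of Polaris.

```python
# Hypothetical bridge helper: turn SIZ_O plus the low address bits into the
# eight-bit SEL_O byte-lane mask that a 64-bit Wishbone bus expects.
SIZ_BYTES = {0b00: 1, 0b01: 2, 0b10: 4, 0b11: 8}   # from the SIZ_O table


def siz_to_sel(siz: int, adr: int) -> int:
    nbytes = SIZ_BYTES[siz]
    offset = adr & 7                 # byte position within the 64-bit word
    if offset % nbytes:
        raise ValueError("misaligned access: bridge should assert ERR_I")
    # nbytes consecutive lanes starting at lane `offset` (little-endian order)
    return ((1 << nbytes) - 1) << offset


print(f"{siz_to_sel(0b01, 0x1006):08b}")   # half-word at offset 6 -> 11000000
```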

When reading a value smaller than 64 bits, external logic is expected to align it and sign- or zero-extend it properly. All bytes read must appear on DAT_I[7:0], all half-words on DAT_I[15:0], and all words on DAT_I[31:0]. The upper bits (DAT_I[63:8], DAT_I[63:16], and DAT_I[63:32], respectively) must hold the proper sign- or zero-extension, depending on the state of another bus control signal, SIGNED_O. Note that SIGNED_O is defined only when WE_O is negated. For any of this to work, the low address bits must obviously be exposed as well. It's entirely possible to perform a 64-bit fetch on an odd address; external logic must raise ERR_I if it doesn't want to bother handling misaligned accesses.
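The read-side duty of that external logic can be modeled in a few lines of Python. This is a sketch under stated assumptions, not Polaris code: mem_word stands in for a hypothetical naturally aligned 64-bit memory word containing the target address, lanes are little-endian, and misaligned accesses are answered with ERR_I (modeled as an exception).

```python
MASK64 = (1 << 64) - 1


def bridge_read(mem_word: int, adr: int, siz: int, signed: bool) -> int:
    """Return what the bridge drives onto DAT_I for a read.

    mem_word: aligned 64-bit word holding the addressed bytes (hypothetical memory model)
    adr:      full byte address from ADR_O, low bits included
    siz:      SIZ_O (0 = byte ... 3 = double-word)
    signed:   SIGNED_O (only meaningful while WE_O is negated)
    """
    nbytes = 1 << siz
    offset = adr & 7                       # byte offset within the 64-bit word
    if offset % nbytes:
        raise ValueError("misaligned access: assert ERR_I")
    nbits = 8 * nbytes
    # Shift the addressed lane down to DAT_I[nbits-1:0].
    value = (mem_word >> (8 * offset)) & ((1 << nbits) - 1)
    if signed and nbits < 64 and value >> (nbits - 1):
        # Replicate the sign bit into DAT_I[63:nbits].
        value |= MASK64 ^ ((1 << nbits) - 1)
    return value


print(hex(bridge_read(0x0123456789ABCDEF, 0x1002, 1, True)))  # LH -> 0xffffffffffff89ab
```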

Put another way, the RISC core only reads and writes whole 64-bit words, even when executing an LB or SH instruction. It's up to the external bus bridge to make those instructions behave appropriately. As you can imagine, this bus is not intended for general consumption. Eventually, I'll include a trusted Wishbone bridge with the core so users don't have to deal with this ultra-low-level stuff.
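For stores, the bridge has to do the reverse: merge the narrow value into the right lanes of the 64-bit word. The source only spells out the read-side alignment, so this Python sketch assumes, by symmetry, that the core presents store data on the low bits of DAT_O (e.g. DAT_O[15:0] for SH). The function name and memory model are hypothetical.

```python
MASK64 = (1 << 64) - 1


def bridge_write(mem_word: int, adr: int, siz: int, dat_o: int) -> int:
    """Return the updated 64-bit memory word after a store.

    ASSUMPTION: store data arrives right-aligned on DAT_O (low bits), mirroring reads.
    """
    nbytes = 1 << siz
    offset = adr & 7
    if offset % nbytes:
        raise ValueError("misaligned access: assert ERR_I")
    lane_mask = ((1 << (8 * nbytes)) - 1) << (8 * offset)
    data = (dat_o << (8 * offset)) & lane_mask     # ignore DAT_O bits above the size
    return (mem_word & ~lane_mask & MASK64) | data


print(hex(bridge_write(0x0123456789ABCDEF, 0x1002, 1, 0xBEEF)))  # SH -> 0x1234567beefcdef
```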

Polaris also implements two buses: a reduced bus for instruction fetch and a full bus for general data I/O. They differ in width as well: the instruction fetch bus is 32 bits wide, while the data I/O bus is 64 bits wide. Both support 64-bit address spaces, though. This lets you implement a Harvard architecture design if you want; however, you'll need a mixed-size, dual-port Wishbone bus arbiter if you want to interface the CPU to a common von Neumann bus.
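The two jobs such an arbiter has are choosing which port gets the shared bus and narrowing 64-bit memory data for the 32-bit instruction port. Here is a behavioral Python sketch; the fixed D-over-I priority and the little-endian half selection by ADR[2] are assumptions for illustration, and the function names are hypothetical.

```python
from typing import Optional


def arbitrate(i_req: bool, d_req: bool) -> Optional[str]:
    """Fixed-priority grant for the shared memory port.

    Giving the D-bus priority is an assumption made for this sketch.
    """
    if d_req:
        return "D"
    if i_req:
        return "I"
    return None


def ibus_fetch(mem_word: int, adr: int) -> int:
    """Pick the 32-bit instruction out of a 64-bit memory word.

    Little-endian: ADR[2] = 0 selects bits [31:0], ADR[2] = 1 selects [63:32].
    """
    if adr & 0b11:
        raise ValueError("misaligned fetch: assert ERR_I")
    return (mem_word >> (32 * ((adr >> 2) & 1))) & 0xFFFF_FFFF
```

A fixed priority is the simplest choice; because fetch and execution already happen on separate cycles, the two ports should rarely contend anyway.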

Next Steps

CSRs. My immediate problem is figuring out how I want to handle CSR support. According to the privileged specification, I must support no fewer than 29 (!!) M-mode CSRs. Most are read-only and constant; some are read-write. However, they are sparsely placed in a CSR address space of 4096 registers, and decoding that sparse layout will consume a lot more FPGA logic cells than I'd like. I estimate another 300 or so DFFs for storage alone.
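A behavioral model of a sparse CSR decoder makes the cost visible: every implemented address is one entry in the lookup, which in hardware turns into a 12-bit comparator. The Python sketch below uses a hypothetical subset of M-mode CSRs; the addresses follow the privileged spec, but check them against the spec version you target. It also leans on the spec's convention that csr[11:10] = 0b11 marks a read-only register, so writability needs no stored flag.

```python
from typing import Tuple

CSR_SPACE = 1 << 12  # CSR addresses are 12 bits: 4096 possible registers

# Hypothetical subset of implemented M-mode CSRs.
IMPLEMENTED = {
    0x300: "mstatus", 0x304: "mie", 0x305: "mtvec",
    0x340: "mscratch", 0x341: "mepc", 0x342: "mcause", 0x344: "mip",
    0xF11: "mvendorid", 0xF12: "marchid", 0xF13: "mimpid", 0xF14: "mhartid",
}


def decode_csr(addr: int) -> Tuple[bool, bool]:
    """Return (present, writable) for a CSR address."""
    if not 0 <= addr < CSR_SPACE:
        raise ValueError("CSR address out of range")
    if addr not in IMPLEMENTED:
        return (False, False)        # access would raise an illegal-instruction trap
    # csr[11:10] == 0b11 means read-only, so writability comes from the address bits.
    return (True, ((addr >> 10) & 0b11) != 0b11)
```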

All of the CSR accessors are read-modify-write instructions, and they have inconvenient edge cases when X0 is used as either the source or the destination. This adds a fair amount of complexity to the instruction decoding logic. Thankfully, none of the M-mode CSRs have read- or write-triggered behaviors, so I might be able to get away without this edge-case logic.
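The X0 edge cases are easiest to see in a behavioral model of all six CSR instructions. Per the RISC-V spec, CSRRW/CSRRWI with rd = x0 must not read the CSR, and CSRRS/CSRRC with rs1 = x0 (or CSRRSI/CSRRCI with uimm = 0) must not write it; note that the test is on the register index, not its value. The Python below is a sketch, and the names are hypothetical.

```python
from dataclasses import dataclass

MASK64 = (1 << 64) - 1

# funct3 values of the six CSR instructions (SYSTEM opcode)
CSRRW, CSRRS, CSRRC = 0b001, 0b010, 0b011
CSRRWI, CSRRSI, CSRRCI = 0b101, 0b110, 0b111


@dataclass
class CsrEffect:
    reads_csr: bool   # is the CSR read at all?
    writes_csr: bool  # is the CSR written at all?
    new_csr: int      # CSR contents afterwards
    rd_value: int     # value written back to rd (meaningless when rd is x0)


def csr_op(funct3: int, rd: int, rs1: int, rs1_value: int, csr_value: int) -> CsrEffect:
    """Model one CSR instruction.

    rs1 is the 5-bit rs1 field: a register index for CSRRW/S/C, or the
    zero-extended immediate (uimm) for CSRRWI/SI/CI. rs1_value is x[rs1]
    and is ignored for the immediate forms.
    """
    immediate = (funct3 & 0b100) != 0
    src = rs1 if immediate else rs1_value
    kind = funct3 & 0b011
    if kind == 0b01:            # CSRRW(I): rd = x0 means "do not read the CSR"
        reads, writes, new = rd != 0, True, src
    elif kind == 0b10:          # CSRRS(I): rs1 = x0 / uimm = 0 means "do not write"
        reads, writes, new = True, rs1 != 0, csr_value | src
    elif kind == 0b11:          # CSRRC(I): same rule as CSRRS(I)
        reads, writes, new = True, rs1 != 0, csr_value & ~src
    else:
        raise ValueError("funct3 does not encode a CSR instruction")
    return CsrEffect(
        reads_csr=reads,
        writes_csr=writes,
        new_csr=(new if writes else csr_value) & MASK64,
        rd_value=csr_value if reads else 0,
    )
```

If no CSR has read or write side effects, the reads_csr and writes_csr flags stop mattering for correctness, which is exactly the shortcut hoped for above.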

I was hoping to have CSR support finished today, but this turned out to be a much bigger problem than I had anticipated.

Interrupts. Once CSR support is in place, I need to add support for illegal instruction traps and for external interrupts. These ought to be fairly easy to get working once I've completed the heavy lifting for the CSRs.
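Trap entry itself is mostly CSR bookkeeping, which is why it has to wait for the CSRs. The Python sketch below models M-mode trap entry as the privileged spec describes it (save the PC in mepc, record the cause in mcause, stack MIE into MPIE, clear MIE, set MPP to M-mode, jump to mtvec). It assumes direct-mode mtvec, and the field positions and cause codes should be checked against the spec version you target; take_trap is a hypothetical name.

```python
MIE_BIT, MPIE_BIT, MPP_SHIFT = 3, 7, 11      # mstatus field positions
INTERRUPT_FLAG = 1 << 63                     # mcause MSB marks interrupts on RV64
ILLEGAL_INSTRUCTION = 2                      # exception code
MACHINE_EXTERNAL_INTERRUPT = 11              # interrupt code


def take_trap(csrs: dict, pc: int, cause: int) -> int:
    """Perform M-mode trap entry on a dict of CSR values and return the new PC.

    Assumes direct-mode mtvec (all traps go to one handler address).
    """
    mstatus = csrs["mstatus"]
    old_mie = (mstatus >> MIE_BIT) & 1
    # Clear MIE, MPIE, and MPP, then set MPIE <- old MIE and MPP <- 0b11 (M-mode).
    mstatus &= ~((1 << MIE_BIT) | (1 << MPIE_BIT) | (0b11 << MPP_SHIFT))
    mstatus |= (old_mie << MPIE_BIT) | (0b11 << MPP_SHIFT)
    csrs["mstatus"] = mstatus
    csrs["mepc"] = pc          # MRET returns here unless the handler changes it
    csrs["mcause"] = cause
    return csrs["mtvec"] & ~0b11
```

An illegal instruction would use take_trap(csrs, pc, ILLEGAL_INSTRUCTION); an external interrupt would use INTERRUPT_FLAG | MACHINE_EXTERNAL_INTERRUPT as the cause.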

Bus Bridges. After I finish support for interrupts, the next step is to couple the CPU to actual memory and get it running a real program, hopefully on real hardware. To do this, I'll need a Wishbone bus bridge as discussed above. A mixed-size, quad-port bridge will be needed to make anything even resembling the functionality of a Kestrel-2: a 32-bit I-bus and a 64-bit D-bus to the CPU, a 16-bit IO-bus to talk to devices like KIA and GPIA, and a 16-bit X-bus to talk to eXternal asynchronous RAM.
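Part of what makes that bridge "mixed-size" is width adaptation: a 64-bit D-bus access to a 16-bit port has to be broken into several narrower beats. A minimal Python sketch of that splitting, assuming aligned accesses and lowest-address-first (little-endian) beat order; split_access and its port_bytes parameter are hypothetical.

```python
from typing import List, Tuple


def split_access(adr: int, nbytes: int, port_bytes: int = 2) -> List[Tuple[int, int]]:
    """Break one aligned CPU access into (address, size) beats for a narrower port.

    port_bytes=2 models a 16-bit port such as the IO-bus or X-bus. Beats go out
    lowest address first; a read bridge would place each returned beat at bit
    8 * (beat_address - adr) of the assembled result (little-endian).
    """
    if nbytes not in (1, 2, 4, 8) or adr % nbytes:
        raise ValueError("unsupported or misaligned access: assert ERR_I")
    beat = min(nbytes, port_bytes)
    return [(adr + i, beat) for i in range(0, nbytes, beat)]


print(split_access(0x1000, 8))   # 64-bit D-bus access over a 16-bit port: four beats
```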

Documentation. I need to document this core thoroughly, along with any support cores, since I've decided to submit the microarchitecture for a poster session at the 5th RISC-V Workshop in November. I'm not sure if I'll make it in; however, if I do, I need to be ready with at least a preliminary data sheet and a poster with its architectural block diagram.
