Something about this project
Arise comes from a complex story which started four years ago, when I decided to design my own softcore. It was recently renamed "arise-v2r5", which I believe is misleading: it gives the impression of being the fifth revision of the second attempt, while its (git) history goes back further and it builds on the legacy of three predecessors with a lot of commits each. They were (git) branches of experimentation that I explored and then abandoned; just a few ideas have survived and are still under development, and I am still changing the ISA as well as the implementation.
One of the surviving ideas is hardware context, and now I know why at Saint Sun/Oracle they didn't want to implement the feature in their SPARC: it's damn resource consuming, since it eats a lot of silicon area, and the propagation delays become critical when you have a clock speed above half a GHz.
Arise-v2r5 is physically implemented on a little Spartan 6 LX9 FPGA with a 33 MHz oscillator and a PLL which internally triples the clock to 99 MHz in order to handle the external SDRAM. Therefore I can relax about the GHz problem: I will never have to solve it, since I will never buy a ~Keuro FPGA :D
In the last legacy branch, along the implementation, I came to need write access to three registers at the same time, which makes the register file TOO complex!
The solution to this was resource replication, a trick used in superscalar processors. It gets too damn complex and consumes too much area.
I abandoned it almost immediately, since I can simplify the datapath with a compromise which seems to allow the hardware-context feature without needing to write three registers at the same time.
However, there is a price to pay: Arise-v2r5 now needs 9 clock edges to complete an instruction, from S0:fetch to S8:writeback (1), while its predecessors needed just 5 clock edges. But I added stack instructions (push, pop) and more interesting features.
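To give an idea of why replication eats area, here is a toy Python model of one common way to get several write ports out of single-write-port memories: one full copy of the register file per write port, plus a small "live value" table that remembers which copy holds the newest value of each register. This is only a sketch under assumptions, not the circuit used in the abandoned branch; the class name, the 16-register size and the table scheme are all hypothetical.

```python
class ReplicatedRegFile:
    """Toy model: one single-write-port bank per write port plus a live-value table."""

    def __init__(self, num_regs=16, write_ports=3):
        # Assumption: 16 registers, 3 write ports. One full bank per write port,
        # which is exactly where the extra area goes.
        self.banks = [[0] * num_regs for _ in range(write_ports)]
        # For each register, the index of the bank holding its most recent value.
        self.live = [0] * num_regs

    def write_cycle(self, writes):
        """Apply the writes of one clock cycle.

        `writes` is a list of (port, reg, value) tuples; every port and every
        register may appear at most once per cycle.
        """
        ports = [p for p, _, _ in writes]
        regs = [r for _, r, _ in writes]
        assert len(set(ports)) == len(ports), "a write port is used twice"
        assert len(set(regs)) == len(regs), "two ports target the same register"
        for port, reg, value in writes:
            self.banks[port][reg] = value  # each port only ever writes its own bank
            self.live[reg] = port          # remember where the newest value lives

    def read(self, reg):
        # A read first looks up the live-value table, then the selected bank.
        return self.banks[self.live[reg]][reg]

    def storage_words(self):
        # Total register storage, ignoring the live-value table itself.
        return sum(len(bank) for bank in self.banks)


rf = ReplicatedRegFile()
rf.write_cycle([(0, 1, 10), (1, 2, 20), (2, 3, 30)])  # three writes in one cycle
print(rf.read(1), rf.read(2), rf.read(3), rf.storage_words())
```

In this model the storage grows linearly with the number of write ports, and every read needs an extra lookup plus a multiplexer, which adds to the propagation delay.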
(The picture shows an NVRAMWING board which I made for my FPGA; it comes with two asynchronous 120 ns NVRAM devices.)
(1) Well, arithmetic instructions like MUL32, DIV32, MAD32 take up to 40 clock edges to complete, and IO instructions like mem_load, mem_store, mem_atomic_tas can take longer than expected if the external RAM is a slow asynchronous device. Arise also supports special COPs like a DSP (fixed-point) engine, therefore instructions like CORDIC64 or BKM64 can consume up to 75 clock edges processing 64-bit data.
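To put those clock-edge counts into wall-clock terms, here is a small Python sketch. It assumes that the core itself runs from the 99 MHz PLL output and that one "clock edge" corresponds to one full clock period; neither is stated above, so treat the numbers as rough estimates. The function name is hypothetical.

```python
CLOCK_HZ = 99_000_000  # assumption: the core is clocked at the 99 MHz PLL output


def latency_ns(clock_edges, clock_hz=CLOCK_HZ):
    """Time in nanoseconds for `clock_edges` edges, assuming one edge per clock period."""
    return clock_edges * 1e9 / clock_hz


cases = [
    ("predecessor instruction", 5),
    ("arise-v2r5 S0..S8", 9),
    ("MUL32/DIV32/MAD32 worst case", 40),
    ("CORDIC64/BKM64 worst case", 75),
]
for name, edges in cases:
    print(f"{name}: {edges} edges, about {latency_ns(edges):.1f} ns")
```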
The first picture shows the state flow, as well as instruction format and cpu-context.
In the second picture, you can see the text editor I am currently focused on. It will also be used as code-viewer, following the assembly code (like gdb + ddd, but … it will be only assembly code)
Pipelined vs Non-Pipelined
There are two basic approaches through which an LC3 can be designed: pipelined and non-pipelined. The pipelined approach means that the processor works on multiple instructions in a single cycle, each one in a different stage. The instructions are processed one after another, as in a pipeline. The non-pipelined approach, on the other hand, waits for an instruction to complete all four stages of the LC3 before the next instruction is fetched and decoded.
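The difference in throughput can be sketched with a bit of arithmetic. The Python snippet below assumes an idealized model that is not spelled out in the text: four stages, one clock cycle per stage, and no stalls or flushes in the pipelined case. The function names are hypothetical.

```python
def non_pipelined_cycles(n_instructions, stages=4):
    """Each instruction occupies the whole datapath for `stages` cycles."""
    return n_instructions * stages


def pipelined_cycles(n_instructions, stages=4):
    """Ideal pipeline: `stages` cycles to fill, then one instruction completes per cycle."""
    if n_instructions == 0:
        return 0
    return stages + (n_instructions - 1)


for n in (1, 10, 100):
    print(n, non_pipelined_cycles(n), pipelined_cycles(n))
```

Under these assumptions a single instruction takes the same time either way; the pipelined design only pulls ahead once several instructions overlap.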
- A non-pipelined processor executes a single instruction at a time. This avoids branch delays as well as delays between dependent serial instructions.
- A non-pipelined processor executes a definite sequence of instructions. Its output can be predicted to a certain degree. The behavior of a pipelined processor, on the other hand, varies from program to program.
- The output of the Fetch module depends on the program counter, which is updated in the Writeback stage. In a pipelined design, however, all the modules execute simultaneously, so Fetch may run before the program counter has been updated. Therefore, the program might behave incorrectly.
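The last point can be illustrated with a toy fetch simulation in Python. It is a sketch under assumptions: a made-up six-instruction program, four stages, a branch that only updates the program counter when it reaches Writeback, and a naive pipeline with no flushing or stalling. None of the names come from an actual LC3 implementation.

```python
# Toy program: each entry is (name, branch_target_or_None). Hypothetical contents.
PROGRAM = [("ADD", None), ("BR", 5), ("W1", None), ("W2", None), ("W3", None), ("TARGET", None)]
STAGES = 4  # assumption: Fetch, Decode, Execute, Writeback, one cycle each


def fetch_trace_non_pipelined(program):
    """The next fetch waits until the current instruction has updated the PC."""
    pc, trace = 0, []
    while pc < len(program):
        name, target = program[pc]
        trace.append(name)
        pc = target if target is not None else pc + 1
    return trace


def fetch_trace_pipelined_naive(program, stages=STAGES):
    """Fetch every cycle from pc+1; a branch redirects the PC only at its Writeback."""
    assert stages >= 2
    pc, trace, pending = 0, [], []  # pending: [cycles_until_writeback, target]
    while pc < len(program) or pending:
        new_branch = None
        if pc < len(program):
            name, target = program[pc]
            trace.append(name)
            if target is not None:
                new_branch = [stages - 1, target]
            pc += 1  # naive guess: the next instruction is the sequential one
        redirect = None
        for entry in pending:  # older branches move one stage forward
            entry[0] -= 1
            if entry[0] == 0:
                redirect = entry[1]
        pending = [e for e in pending if e[0] > 0]
        if new_branch is not None:
            pending.append(new_branch)
        if redirect is not None:
            pc = redirect  # the PC is corrected only now, at Writeback
    return trace


print(fetch_trace_non_pipelined(PROGRAM))
print(fetch_trace_pipelined_naive(PROGRAM))
```

In this model the non-pipelined trace skips straight from BR to TARGET, while the naive pipelined trace also fetches W1, W2 and W3 before the branch takes effect. A real pipelined design has to flush or stall those wrong-path instructions.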