12/11/2015 at 23:57 •
The ISA for the CPU is pretty low density. With a word size of 32 bits, there's a fair amount of room to do everything except absolute addresses and some large constants. As I've experimented with the ISA, I've left gaps, extra bits, etc., and it's a bit messy. I'm starting to clean it up and make things a little more orthogonal now, with the idea that this will also allow the CPU core to become more efficient.
The ISA is defined in my Opcode worksheet, and I try to make sure this is up to date as I make changes to the core and the assembler.
After some review, it turns out that it's a little more compact than I originally thought. Here's a summary of the types of opcodes. Most of them share a common structure.
| Form | Encoding |
| --- | --- |
| REG | 00oo oooo oxxx aaaa bbbb cccc xxxx xxxx |
| REGIND | 01oo oooo vvvv aaaa bbbb vvvv vvvv vvvv |
| IMM | 10oo oooo vvvv aaaa xxxx vvvv vvvv vvvv |
| DIR (32 bit) | 11oo oooo vvvv aaaa bbbb vvvv vvvv vvvv |
| DIR (64 bit) | 11oo oooo xxxx aaaa xxxx xxxx xxxx xxxx AAAA AAAA AAAA AAAA AAAA AAAA AAAA AAAA |
The key for the above is:
- First two bits are the form value. The different forms used to have very different structures, but they have since merged together quite a bit, so in many ways the form is now just an extension of the opcode number.
- o = opcode number, unique within the form.
- a, b, c = the 4 bit register identifiers. The specific opcode may not use all of these.
- v = a signed offset of some kind. Used for PC offset addressing, register offset addressing, as well as for constant values for various functions.
- A = the 32 bits of the second word. In most cases it is used as an absolute address, but for the ldi instruction it is a 32 bit constant.
- x = unused bits. Maybe an opportunity to build more compact forms. For example, instead of having the ALU command be part of the opcode number, maybe I could have a single opcode number for the ALU (or floating point, etc), and use some of these bits to indicate the type of operation. Unfortunately, these bits aren't always available in all addressing modes, so it's not always an easy choice.
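To make the layout concrete, here is a minimal Python sketch of a field decoder for the first instruction word. It is not taken from the core or the assembler. It assumes the leftmost bit in the table is bit 31, and that the two `v` groups form one 16-bit signed value with the upper nibble as the high bits; the names `decode` and `sign_extend` are made up for illustration.

```python
# Hypothetical decoder for the first 32-bit instruction word.
# Assumptions (not confirmed by the worksheet): leftmost bit is bit 31,
# and the split v field is {bits 23..20, bits 11..0} with bits 23..20 on top.

FORMS = ("REG", "REGIND", "IMM", "DIR")


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def decode(word):
    """Split an instruction word into its form, opcode and operand fields."""
    form = FORMS[(word >> 30) & 0x3]
    a = (word >> 16) & 0xF
    b = (word >> 12) & 0xF  # unused (x) in the IMM and DIR (64 bit) forms
    if form == "REG":
        # REG has a 7-bit opcode and a third register field instead of v.
        return {"form": form, "opcode": (word >> 23) & 0x7F,
                "a": a, "b": b, "c": (word >> 8) & 0xF}
    # The other forms share a 6-bit opcode and the split 16-bit v field.
    raw_v = (((word >> 20) & 0xF) << 12) | (word & 0xFFF)
    return {"form": form, "opcode": (word >> 24) & 0x3F,
            "a": a, "b": b, "v": sign_extend(raw_v, 16)}
```

Under those assumptions, `decode(0x45F12FFC)` gives form `REGIND`, opcode 5, a = 1, b = 2 and v = -4. For the 64-bit DIR form the second word would be fetched separately and the `v` bits ignored.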
12/09/2015 at 00:34 •
This is in the project, but it's helpful to have an external guide as well. I should build a small breakout board for most of this stuff to avoid a bunch of wire harnesses.
| Function | GPIO | Pin |
| --- | --- | --- |
| Serial1 TX | GPIO1 | G16 |
| PS/2 Mouse - Clock | GPIO1 | F17 |
| PS/2 Mouse - Data | GPIO1 | D18 |
| LED Matrix - R0 | GPIO1 | F18 |
| LED Matrix - G0 | GPIO1 | E14 |
| LED Matrix - B0 | GPIO1 | E15 |
| LED Matrix - R1 | GPIO1 | F15 |
| LED Matrix - G1 | GPIO1 | G16 |
| LED Matrix - B1 | GPIO1 | F12 |
| LED Matrix - A | GPIO1 | F13 |
| LED Matrix - B | GPIO1 | C14 |
| LED Matrix - C | GPIO1 | D14 |
| LED Matrix - OE | GPIO1 | D15 |
| LED Matrix - STB | GPIO1 | D16 |
| LED Matrix - Clock | GPIO1 | C17 |
| PS/2 Keyboard - Clock | GPIO1 | C25 |
| PS/2 Keyboard - Data | GPIO1 | C26 |
| | GPIO1 | D28 |
| | GPIO1 | D25 |
| | GPIO1 | F20 |
| | GPIO1 | E21 |
| | GPIO1 | F23 |
| SPI - RTC SCLK | GPIO1 | G20 |
| SPI - RTC MISO | GPIO1 | F22 |
| SPI - RTC SS | GPIO1 | G22 |
| SPI - RTC MOSI | GPIO1 | G24 |
| RST OUT | GPIO1 | G23 |
| SPI - Codec XDCS | GPIO1 | A25 |
| SPI - Codec CS | GPIO1 | A26 |
| Codec - Data Request Intr | GPIO1 | A19 |
| | GPIO1 | A28 |
| | GPIO1 | A27 |
| | GPIO1 | B30 |
| SPI - General SCLK | GPIO1 | AG28 |
| SPI - General MOSI | GPIO1 | AG26 |
| SPI - General MISO | GPIO1 | Y21 |
12/09/2015 at 00:26 •
I've gone through a few iterations on the design, both as part of the trial and error learning process, and as I wanted to add new options. Here are the current key features:
- 32-bit address bus (byte addressable)
- 32-bit data bus
- Big endian (little endian is actually supported in both the GNU utils and the CPU core, but I really haven't tested it much)
- 32-bit opcodes, with an optional 32-bit arg
- Addressing modes: inherent, direct, PC indexed, register indexed
- Single precision floating point
At the moment, the majority of the opcodes use registers as source and destination, alongside simple load and store operations. There are 16 general purpose 32-bit registers. Currently, %14 is used as the frame pointer, %15 is the stack pointer, and %13 and sometimes %12 are used for return values for function calls. All arguments are pushed onto the stack, though I may change that to registers eventually. 4-byte alignment has to be handled for memory fetches and stores. The CPU supports byte addressable memory, and so there are byte enable signals and byte lane flow.
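As an illustration of the byte enable idea, here is a small Python sketch of how a 4-bit enable mask could be derived from an address and access size on a 32-bit big endian bus. This is a guess at the general technique, not a description of the actual core: it assumes sub-word accesses are naturally aligned, and that enable bit 3 covers data bits 31..24, the lane that holds the lowest address in big endian order. The function name `byte_enables` is hypothetical.

```python
def byte_enables(addr, size):
    """Return a 4-bit byte enable mask for an access on a 32-bit big endian bus.

    Assumption: bit 3 of the mask enables data bits 31..24, which hold the
    byte at the lowest address within the aligned 32-bit word.
    """
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2 or 4 bytes")
    offset = addr & 0x3
    if offset % size:
        raise ValueError("access is not naturally aligned")
    lanes = (1 << size) - 1               # `size` adjacent lanes
    return lanes << (4 - size - offset)   # slide them to the addressed bytes
```

With these assumptions, a byte write to offset 0 of a word enables only the top lane (`0b1000`), a halfword at offset 2 enables the bottom two lanes (`0b0011`), and a full word enables all four (`0b1111`).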
Unlike a few hardcore folks, I'm implementing my design in an FPGA. I believe it will allow me to focus on the parts of the design I find most interesting, and allow me to skip the "where is that broken wire?" drudgery. I started the work on a Terasic DE1 board, and while the CPU itself can still run on that system (as well as something far smaller most likely), I've moved to the Terasic DE2i-150 board and the Altera Cyclone IV family.
Exception handling is quite primitive at this point. I have a simple vector block defined for interrupts, and currently only the reset address is defined. There is support to remap the base of the vector table, and so I anticipate remapping this into RAM in most system configs. No other interrupts are defined right now; the main tricks left are to implement the stack push before jumping into the ISR and to define a few new opcodes to enable and disable interrupts.
A protection bit/supervisor mode is defined, with a separate stack pointer. It is effectively unused at the moment, but I can build on the stub if I need it.
There is no memory protection at the moment. In theory, an external exception from an MMU could do this, but I haven't spent time building out those features yet.
Caching and Pipelining
No caching or pipelining at the moment. Soon though.
I currently have IP cores for the following devices:
- SDRAM (used for general purpose memory)
- SSRAM (used as frame buffer memory)
- Flash (unused at the moment)
- Async serial (used for the console, but could add as many as needed)
- SPI master (several busses)
- RTC module
- Micro SD card
- LCD touchscreen
- Dual A/D for joystick
- GPIO (switch and button interfacing)
- 640x480x256 VGA
On my list to do:
- PS/2 keyboard (I have the core written, just not integrated)
- Ethernet PHY that's part of the DE2i-150
- External MIDI interface (this is really a version of async serial for basic audio out)
I am also building some true custom hardware. One of my tests of the floating point core was an implementation of a Mandelbrot set renderer. The algorithm is a classic parallel algorithm, however, and should be implementable as a hardware module. You feed the matrices in, and after a short time, you read out the results. That should allow for a very fast block rendering method. On my list.
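For reference, the per-pixel work such a module would parallelize is the standard escape-time iteration. The Python sketch below shows that textbook kernel plus a block wrapper; it is not the test program that ran on the CPU, the names `escape_time` and `render_block` and the default of 64 iterations are illustrative, and Python floats are double precision where the core uses single precision.

```python
def escape_time(cr, ci, max_iter=64):
    """Count iterations of z = z*z + c before |z| exceeds 2 (capped at max_iter)."""
    zr = zi = 0.0
    for n in range(max_iter):
        zr2, zi2 = zr * zr, zi * zi
        if zr2 + zi2 > 4.0:      # |z|^2 > 4 means the point has escaped
            return n
        zr, zi = zr2 - zi2 + cr, 2.0 * zr * zi + ci
    return max_iter


def render_block(x0, y0, step, width, height, max_iter=64):
    """Compute a height x width block of iteration counts, starting at (x0, y0)."""
    return [[escape_time(x0 + col * step, y0 + row * step, max_iter)
             for col in range(width)]
            for row in range(height)]
```

Each pixel depends only on its own coordinate, which is what makes the block a good fit for parallel hardware: every call to `escape_time` could run in its own unit, with only the iteration counts read back.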