Bexkat1 CPU

A custom 32-bit CPU core with GCC toolchain

This is a synthesizable 32-bit CPU core written in Verilog. I've also ported GCC, binutils, and newlib to produce machine code for this system. In addition to the CPU core, the project has a pretty wide selection of peripheral cores that I've developed or adapted from other open designs. The current project is configured for the Terasic DE2i-150 board and MAX10-lite (in progress), but should be synthesizable for many of the smaller Cyclone boards with appropriate adjustments.

Some core features:

32-bit data and address buses
16 general-purpose 32-bit registers
Absolute, direct, and relative addressing modes
Single-precision floating point support in hardware
Interrupt/exception handling
Wishbone compatibility
Supervisor/protected mode
(Incomplete) Port to MAX10-lite board
  • CPU Architecture Video

    Matt Stock, 02/13/2017 at 02:20

    Here's the next video, which goes into more detail about the CPU design and walks through the state transitions for a simple add operation.

  • System Overview

    Matt Stock, 02/13/2017 at 00:36

    I'm trying to get more documentation in place, in the form of some YouTube videos. This one will give you a sense of the overall system architecture and how the CPU interacts with other devices. Let me know if you have any questions or comments.

  • Supervisor mode

    Matt Stock, 01/02/2017 at 00:09

    I've been working on fleshing out a supervisor mode, with the goal of being able to do multiprocessing in the Unix way. The basic work is complete (protected opcodes, hardware and software interrupts that execute in supervisor mode, etc.), but I'm working on the nuances now. In particular, I'm testing different ways to pass information from user space into kernel space. My current method of parameter passing is solely via the stack, and the stack pointer is swapped out as part of the move to supervisor mode (there is a separate supervisor stack pointer), so this is mostly an exercise in C semantics now. The exception entry code pushes the original stack pointer onto the supervisor stack before jumping to the exception handler, so now I'm just working through the most sane way to reference that element (which isn't an argument to the interrupt handler!) and then use it as a base address to pull out the other info on the user stack that I care about.
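    To make the lookup described above concrete, here is a small Python model of it. This is only a sketch, not the actual handler: it assumes a descending stack, 32-bit big-endian words, that the saved user stack pointer is the top word of the supervisor stack on entry, and that the user stack pointer points directly at the first argument. The names `Memory`, `push`, `enter_supervisor`, and `user_arg` are made up for illustration.

```python
import struct


class Memory:
    """Flat byte-addressable memory with 32-bit big-endian word access."""

    def __init__(self, size):
        self.data = bytearray(size)

    def read32(self, addr):
        return struct.unpack_from(">I", self.data, addr)[0]

    def write32(self, addr, value):
        struct.pack_into(">I", self.data, addr, value & 0xFFFFFFFF)


def push(mem, sp, value):
    """Descending stack: decrement the pointer, then store the word."""
    sp -= 4
    mem.write32(sp, value)
    return sp


def enter_supervisor(mem, user_sp, ssp):
    """Model of the mode switch: save the user SP on the supervisor stack."""
    return push(mem, ssp, user_sp)


def user_arg(mem, ssp, index):
    """Fetch the index-th word at or above the saved user SP."""
    saved_user_sp = mem.read32(ssp)
    return mem.read32(saved_user_sp + 4 * index)


mem = Memory(0x1000)
user_sp = 0x800
# User code pushes its arguments last-to-first, so argument 0 ends up on top.
for arg in (0xCAFE, 0x2A):
    user_sp = push(mem, user_sp, arg)

ssp = enter_supervisor(mem, user_sp, 0x1000)
print(hex(user_arg(mem, ssp, 0)), hex(user_arg(mem, ssp, 1)))
```

    In this model the handler never receives the user arguments directly; it only knows where the saved stack pointer lives and indexes from there, which is the same indirection the C handler has to express.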

  • ISA Rework

    Matt Stock, 01/03/2016 at 20:36

    My first cut at an ISA was focused on getting the functions right, and leaving room to add more options later. Now that I've got most of the functionality I want, I can go back and look at ways to reduce the complexity, with a goal of improving performance.

  • Testing Part 2

    Matt Stock, 01/03/2016 at 20:25

    As I mentioned earlier, I've been looking at pushing to the next round of project improvements, and that meant a better testing process. I tried using a "control" CPU whose output would be compared to that of the CPU under test, but that assumes that the number of clock cycles required for each operation won't change. While useful in a few cases, a lot of the changes I'm interested in involve timing, and so that wouldn't work.

    I decided instead to make two ROM modes. The default one runs the monitor code, which allows for basic memory interaction as well as parsing of ELF binaries on the microSD to bootstrap other programs. The new ROM module is a set of POST routines written to progressively test the CPU as well as IO functions to check for functional regressions. This method has already paid for itself, since I found a small bug in a couple of the floating point opcodes.

    The test method is fairly simple. I need to assume that some basic operations work, otherwise it won't even run the POST: immediate load of a register, immediate add, integer compare, and branch if not equal. The first tests evaluate register operations, the ALU, and the FPU. Then we test stack operations, branches, and all of the load and store operations. For the math and branch operations, we can compute the expected results ahead of time, store them in the code, and generate an error when a result isn't as expected.
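    The "precompute the expected result and compare" idea can be sketched in Python as a table of test vectors. This is an illustration only: the real POST is ROM code running on the CPU itself, and the `alu` function, the operation names, and `run_post` here are hypothetical stand-ins.

```python
MASK32 = 0xFFFFFFFF


def alu(op, a, b):
    """Stand-in for the unit under test: a tiny 32-bit ALU model."""
    if op == "add":
        return (a + b) & MASK32
    if op == "sub":
        return (a - b) & MASK32
    if op == "and":
        return a & b
    raise ValueError(f"unknown op {op!r}")


# (operation, operand a, operand b, precomputed expected result)
TEST_VECTORS = [
    ("add", 1, 2, 3),
    ("add", 0xFFFFFFFF, 1, 0),           # wraps around at 32 bits
    ("sub", 0, 1, 0xFFFFFFFF),           # borrows through every bit
    ("and", 0xF0F0F0F0, 0x0FF00FF0, 0x00F000F0),
]


def run_post(unit=alu):
    """Return the index of the first failing vector, or None if all pass."""
    for index, (op, a, b, expected) in enumerate(TEST_VECTORS):
        if unit(op, a, b) != expected:
            return index
    return None
```

    Returning the index of the first failing vector mirrors the progressive ordering of the tests: everything before that index is known to work, which narrows down where a regression was introduced.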

    In addition to the basic CPU tests, I'm also implementing a set of memory tests. This will allow me to better test the cache module, which I'll describe in the Doom project update.

  • Regression Testing

    Matt Stock, 12/12/2015 at 15:01

    So far in these projects, I've been able to build iteratively and not run into too many nasty bugs. There are many layers of abstraction though (libraries, compiler, assembler, machine, CPU), and so when a bug does crop up, it can be really challenging to find.

    Most recently, I found that I had misunderstood some subtleties of transferring data between registers. The fix was simple: an opcode that zero fills the upper bits when you copy an object smaller than the register size. The way the bug manifested itself, though, was that printf() sometimes printed the wrong character when printing a number. Eventually, I was able to isolate this to 33 % 10 resulting in 9 (not 3), which meant I didn't have to debug libc. After narrowing the issue down further to a very small test case, I was able to see why the CPU was generating the incorrect value. That probably took me 4 days to debug.
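    As an illustration of why zero filling matters, the Python sketch below models a 32-bit register that receives an 8-bit value. The stale register contents are a made-up example chosen so that the arithmetic is easy to follow; this is not a reconstruction of the actual failing instruction sequence, and both function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def copy_low_bits(dest_reg, value, width_bits):
    """Buggy copy: replace only the low bits, keep stale upper bits of dest."""
    low_mask = (1 << width_bits) - 1
    return ((dest_reg & ~low_mask) | (value & low_mask)) & MASK32


def copy_zero_extend(value, width_bits):
    """Fixed copy: upper bits of the 32-bit register are filled with zeros."""
    return value & ((1 << width_bits) - 1)


stale = 0x00000100            # hypothetical leftover contents of the register
buggy = copy_low_bits(stale, 33, 8)
fixed = copy_zero_extend(33, 8)
print(buggy, buggy % 10)      # 289 9
print(fixed, fixed % 10)      # 33 3
```

    With a single stale bit above the byte, the register holds 289 rather than 33, so a later 32-bit modulo returns 9 even though the low byte is correct.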

    As I plan on making some radical changes that could break things, I need to consider how best to avoid introducing more of these kinds of issues, and if it happens, how to quickly determine the issue.

  • Instruction Set Architecture

    Matt Stock, 12/11/2015 at 23:57

    The ISA for the CPU is pretty low density. With a word size of 32 bits, there's a fair amount of room to do everything... except for absolute addresses and some large constants. As I've experimented with the ISA, I've left gaps, extra bits, etc and it's a bit messy. I'm starting to clean up and make things a little more orthogonal now, with the idea that this will also allow the CPU core to become more efficient.

    The ISA is defined in my Opcode worksheet, and I try to make sure this is up to date as I make changes to the core and the assembler.

  • FPGA Pin Assignments

    Matt Stock, 12/09/2015 at 00:34

    This is in the project, but it's helpful to have an external guide as well. I should build a small breakout board for most of this stuff to avoid a bunch of wire harnesses.

  • Architecture notes

    Matt Stock, 12/09/2015 at 00:26

    CPU Design

    I've gone through a few iterations on the design, both as part of the trial and error learning process, and as I wanted to add new options. Here are the current key features:

    • 32-bit address bus (byte addressable)
    • 32-bit data bus
    • Big endian (actually has little endian support in both GNU utils and CPU core, but I really haven't tested much)
    • 32-bit opcodes, with an optional 32-bit arg
    • Addressing modes: inherent, direct, PC indexed, register indexed
    • Single precision floating point

    At the moment, the majority of the opcodes use registers as source and destination, alongside simple load and store operations. There are 16 general purpose 32-bit registers. Currently, %14 is used as the frame pointer, %15 is the stack pointer, and %13 (and sometimes %12) holds return values for function calls. All arguments are pushed onto the stack, though I may change that to registers eventually. Memory fetches and stores take place on 4-byte aligned words. Because the CPU supports byte addressable memory, there are byte enable signals and byte lane steering for accesses smaller than a word.
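    The relationship between a byte address and the byte enable signals can be sketched in Python as follows. This is a model under stated assumptions rather than a description of the actual Verilog: it assumes a 4-bit select mask where bit 3 covers data bits 31:24, big-endian lane ordering (the lowest byte address lands in the most significant lane), and naturally aligned halfword accesses. The function name `byte_enables` is hypothetical.

```python
def byte_enables(addr, size):
    """Return a 4-bit select mask for an access of 1, 2, or 4 bytes.

    Bit 3 selects data[31:24] and bit 0 selects data[7:0]. On a big-endian
    bus the lowest byte address maps to the most significant lane.
    """
    if size not in (1, 2, 4):
        raise ValueError("size must be 1, 2, or 4 bytes")
    offset = addr & 0x3
    if offset % size != 0:
        raise ValueError("access is not naturally aligned")
    lanes = (1 << size) - 1             # 0b0001, 0b0011, or 0b1111
    return lanes << (4 - size - offset)


for addr, size in [(0x1000, 1), (0x1003, 1), (0x1002, 2), (0x1000, 4)]:
    print(hex(addr), size, format(byte_enables(addr, size), "04b"))
```

    The word address on the bus is always `addr` with the low two bits cleared; the mask alone tells the memory or peripheral which bytes of that word take part in the transfer.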


  • 1
    Step 1

    Clone the three source repos.

  • 2
    Step 2
    mkdir bexkat1
    cd bexkat1
    mkdir gcc binutils newlib
  • 3
    Step 3
    cd binutils
    $(BINUTILSREPOPATH)/configure --target=bexkat1-elf
    sudo make install


edmund.humenberger wrote 07/06/2016 at 09:39

Would be interesting to know if your Verilog project would run on  FPGA hardware.  Has 8 kLUT and max of 1 MBYTE of SRAM  supported by open source FPGA toolchain Yosys and Arachne PnR.  Toolchain even running on RaspberryPi.

Matt Stock wrote 07/06/2016 at 13:23

I'll take a look. The open toolchain has some appeal. The main concern I would have is the use of hardware multipliers in the current logic. I'd need to see if the Lattice unit has something similar and with sufficient quantity.

My current plan is to possibly replace GCC with LLVM/clang, and to see what it would take to create an 8-bit variant of the CPU.

Yann Guidon / YGDES wrote 12/09/2015 at 02:31

Please write and publish more documentation :-)

Matt Stock wrote 12/09/2015 at 03:10

Yes, I'll be adding more over the next few days.  I also have a companion project I'll be adding to demonstrate the Doom port I made to this architecture.

Yann Guidon / YGDES wrote 12/09/2015 at 03:16

Yay ! Welcome to the DIY CPU club :-)
