16-bit 'modern' homebrew CPU

DME is a 16-bit custom CPU, complete with a full software stack and a multitasking, multi-user operating system.

DME CPU is a 16-bit processor realised in an FPGA. It is fairly complete, with user and supervisor modes, paging and interrupts. It currently has access to 512 KB of total memory (the amount of SRAM on my FPGA board). Each process has up to 64 KB of address space, broken down into 32 pages of 2 KB each.
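To make the paging numbers concrete, here is a minimal sketch of how a 16-bit virtual address could be split and mapped under those figures. The page-table representation (a plain list of frame numbers) and the name `translate` are assumptions for illustration; the actual page-table entry format is not described here.

```python
PAGE_SIZE = 2048            # 2 KB pages, as stated above
PAGES_PER_PROCESS = 32      # 32 * 2 KB = 64 KB of virtual address space
PHYS_MEM = 512 * 1024       # 512 KB of SRAM
OFFSET_BITS = 11            # log2(2048)


def translate(vaddr, page_table):
    """Map a 16-bit virtual address to a physical address.

    page_table is a hypothetical list of 32 physical frame numbers.
    """
    if not 0 <= vaddr < PAGE_SIZE * PAGES_PER_PROCESS:
        raise ValueError("virtual address does not fit in 16 bits")
    page = vaddr >> OFFSET_BITS           # top 5 bits select the page
    offset = vaddr & (PAGE_SIZE - 1)      # low 11 bits are the offset
    frame = page_table[page]
    paddr = (frame << OFFSET_BITS) | offset
    if paddr >= PHYS_MEM:
        raise ValueError("frame lies outside physical memory")
    return paddr


# Example: virtual page 1 mapped to physical frame 200.
table = [0] * PAGES_PER_PROCESS
table[1] = 200
print(hex(translate(0x0804, table)))  # 0x64004
```

With 2 KB frames, 512 KB of memory holds 256 frames, so a frame number fits in 8 bits and a physical address in 19 bits.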

The project consists of:
- the CPU realised in Verilog
- a micro-code assembler
- an assembler
- a retargeted C compiler (LCC)
- a gate-level(ish) simulator
- a functional simulator
- a pre-emptive multitasking OS (based on xv6)
- a Quartus II bdf file that ties my implementation to my FPGA board

I run the CPU on a Terasic Cyclone V GX Starter Kit. It's not the best board for this project (when I ordered it, I had no idea what to look for). It gives DME access to a serial port, an SD card reader that serves as its 'HD', and lots of lights and switches.

Computers have always fascinated me. I programmed in various languages when I was younger for fun. At work, people come to me with computer-related problems and issues (alas). Sometime last year, one of my colleagues casually asked me 'what is a kernel anyway?'. I did my best to give him a (partly made up) explanation. The fact was, I didn't fully know the answer myself (I think he saw through it).

So when I got home, I started reading up on kernels. Still being interested in programming, I soon decided I would program one, just for the heck of it. I started on one (learning C in the process) using the Bochs simulator, and learned about paging and context switching. But this site is not about that project.

As I googled terms such as 'stack frame' and 'trap handler', I hit Bill Buzbee's website. This guy didn't write an OS, he designed a whole CPU! This was one (or several!) levels deeper down the rabbit hole. I found it very interesting, but couldn't conceive of doing anything similar. I kept reading up on CPU designs and the history of computers.

This led me to the book 'The Elements of Computing Systems' (Nisan & Schocken). As soon as I read what the book was about I ordered it online. Then, because I couldn't wait, I found a PDF version and got started right away (I still received the 'legal' copy several days later). This brilliant book guided me to building, and more importantly, understanding a very simple CPU. I could now see how simple 1's and 0's got combined into more and more complex building blocks until you had a working CPU. As I worked through the chapters a voice in my mind started saying: 'now you could build that CPU, a real one'.

The final piece of the puzzle fell into place when I revisited Bill Buzbee's website and read that he had plans to build another CPU. But instead of building it out of actual hardware components (the coolest way by far), this time he would use an FPGA. I had never heard of FPGAs. But once I read up, I realized that if I used an FPGA, building a CPU might just be within the realm of the possible for me.

Thus started DME CPU.


DME's Instruction Set Architecture

Adobe Portable Document Format - 29.64 kB - 10/03/2017 at 19:59


  • This is DME CPU's ISA

    ErwinM, 10/03/2017 at 19:38, 0 comments

    This is DME's ISA:

    It's heavily inspired by (copied from?) Bill Buzbee, who designed an ISA for his planned Magic1.6 project. When I started my project I had no idea what to look for in an ISA, so I took his concept as a starting point. I took the load/store and the arithmetic instructions pretty much as I understood them. The 'architecture specific' instructions (dealing with page tables and interrupts) I added myself. I also deviated from the pure RISC pattern by adding PUSH and POP instructions. As I was doing a design with a stack, these were just too good not to have and not too hard to implement.

    When I started my decoder in Verilog the plan was to not use microcode. At the time, I only had about half the instructions finalized and was implementing them in the decoder with 'case' and 'if-then' clauses in Verilog. Two reasons made me switch to using a microcode approach:

    1. It was too 'programmy' for my liking

    I started this project because I was intrigued by how simple logic blocks turn into a full-blown CPU. For this reason I have tried to keep my Verilog as simple as possible, going for an "I should be able to implement it in hardware" feeling. The complex decoding logic I was typing into Verilog felt too much like programming and not like I was designing anything to do with hardware. If I had to do this in hardware, I would have to move to a microcode approach.

    2. Microcode is a lot easier to change

    In the early days of my CPU my ISA changed a lot. As I wrote the assembler and later the compiler, I constantly ran into instructions that couldn't target a certain register, or instructions I thought would be handy but which the compiler in practice would never use. Being able to change the microcode just by editing a text file has been such a good solution to deal with all these changes.

    So I ended up with a microcoded decoder. The decoder uses the OPCODE number as an index into a table of 48-bit microcode words. The bits in the microcode drive the control lines in the CPU. The microword looks like this:

    It uses 46 out of 48 bits. I could get that down by quite a few bits if required, as a lot of duplication has crept in over time. But there is no real need to do so: the FPGA has plenty of room and it's not really interesting to me at this time.
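As a rough software analogy of the table-driven decoder described above, the sketch below indexes a list of microwords by opcode and slices each word into named control signals. The field names, bit positions and opcodes are all made up for illustration; DME's real microword assigns its 46 used bits differently.

```python
MICROWORD_BITS = 48

# Hypothetical field layout: (name, lowest bit, width in bits).
FIELDS = [
    ("alu_op",    0, 4),
    ("reg_write", 4, 1),
    ("mem_read",  5, 1),
    ("mem_write", 6, 1),
]


def pack(**values):
    """Build one microword from named control-signal values."""
    word = 0
    for name, low, width in FIELDS:
        value = values.get(name, 0)
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= value << low
    return word


# The "microcode ROM": one word per opcode (opcodes invented here).
MICROCODE = [
    pack(),                          # opcode 0: a NOP
    pack(alu_op=1, reg_write=1),     # opcode 1: an ALU operation
    pack(mem_read=1, reg_write=1),   # opcode 2: a load
]


def decode(opcode):
    """Look up the microword and split it into control-line values."""
    word = MICROCODE[opcode]
    return {name: (word >> low) & ((1 << width) - 1)
            for name, low, width in FIELDS}


print(decode(1))
```

Changing an instruction's behaviour then means editing one table entry rather than touching the decoding logic, which is the property that made microcode attractive here.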

    Looking at the ISA today, I can see how, with some sorting and clever grouping, it could probably be realised in a less complex set of Verilog statements without needing any microcode. Still, I think the microcode approach has worked out well. It was something I could definitely see myself implementing in hardware, it is a breeze to change and, this being an FPGA, I have a suspicion that at the LUT level both approaches come down to pretty much the same thing.

  • DME is flawed

    ErwinM, 09/24/2017 at 13:53, 0 comments

    The trap logic 

    When I started DME, I decided I'd do a RISC kind of architecture (actually Bill Buzbee decided it for me, as I took his Magic1.6 ISA as a starting point). So for the trap mechanism, I opted for a RISC-like approach as well. DME has two sets of registers: USER and IRQ. User programs run in the user registers until a trap happens; the CPU then immediately switches register banks and continues operating using the IRQ registers. The upside of this is that during most traps the OS does not need to spend time saving user registers, as they are preserved and can just be switched back to once the OS returns to user space.
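The bank-switching idea can be modelled in a few lines. This is only a toy sketch: the register count and the class and method names are assumptions, not taken from the Verilog.

```python
class BankedCPU:
    """Toy model of a CPU whose mode selects one of two register banks."""

    NUM_REGS = 8  # assumption: the real register count is not given above

    def __init__(self):
        self.banks = {
            "user": [0] * self.NUM_REGS,
            "irq": [0] * self.NUM_REGS,
        }
        self.mode = "user"

    @property
    def regs(self):
        # Every register access goes to the bank of the current mode.
        return self.banks[self.mode]

    def trap(self):
        # A trap only flips the mode; nothing is copied or saved.
        self.mode = "irq"

    def return_to_user(self):
        self.mode = "user"


cpu = BankedCPU()
cpu.regs[1] = 42        # user program puts a value in r1
cpu.trap()
cpu.regs[1] = 7         # the handler uses its own r1
cpu.return_to_user()
print(cpu.regs[1])      # 42: the user value survived untouched
```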

    The flaw

    DME does not service IRQs when it is in IRQ mode. Initially, I thought that would be no problem: it would finish servicing the current IRQ and then, after it returned to user space, it would immediately trap again and service the missed IRQ (it does get saved). This thinking is still mostly valid, with one exception: syscalls. As I started fleshing out the OS it occurred to me that often a syscall, for example 'read', would trigger DME into IRQ mode only to kick off some operation, e.g. a read from the SD card, and then go to sleep (in IRQ mode!) waiting for the operation to finish. This 'finish' is signaled by an IRQ! Now, if every process is in IRQ mode waiting for an IRQ, the system will deadlock, as the IRQ will never get serviced.

    Possible workarounds

    1. Don't sleep waiting for an IRQ; wait (poll) for the operation to finish
    I could simply make a process wait for whatever operation needs to happen (keyboard input, reading a disk block) instead of putting it to sleep. In practice, since DME will never have multiple users working on it at the same time, the user experience of both approaches would probably be much the same. However, it means I don't get to play with IRQs as much in my OS. Just as important, this does not work for the timer IRQ.

    2. I could tell my scheduler that 'if there aren't any processes to run' (all are sleeping), it should run each IRQ handler. This works, and is actually what my OS does now (simple to implement). However, again this does not work for the timer_irq: I cannot run that each time there is nothing to run (it would accomplish nothing, as there is nothing to run!). It means timer_irq is only triggered when a process returns to user space.

    3. I could have a special user space process that always operates in user space, never does a syscall, and thus is always runnable to catch IRQs. Although this works, it's an expensive solution. The process would eat CPU slices just to catch IRQs. No good.
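Workaround (2) could be sketched roughly as follows. This is not the OS's actual scheduler: the process records, the `sd_done` flag and the handler are hypothetical stand-ins that show the idea of polling the handlers only when nothing is runnable.

```python
def pick_next(processes, irq_handlers):
    """Return a runnable process, polling the IRQ handlers if none is."""

    def first_runnable():
        return next((p for p in processes if p["state"] == "runnable"), None)

    proc = first_runnable()
    if proc is None:
        # Everyone is asleep: run each handler by hand, since the
        # hardware IRQ cannot be delivered while in IRQ mode.
        for handler in irq_handlers:
            handler()
        proc = first_runnable()
    return proc


# Hypothetical device state: the SD card has finished its read.
sd_done = {"flag": True}
procs = [{"name": "sh", "state": "sleeping", "waiting_on": "sd"}]


def sd_irq_handler():
    """Wake every process sleeping on the SD card once it is done."""
    if sd_done["flag"]:
        sd_done["flag"] = False
        for p in procs:
            if p["state"] == "sleeping" and p["waiting_on"] == "sd":
                p["state"] = "runnable"


print(pick_next(procs, [sd_irq_handler])["name"])  # sh
```

The sketch also shows why this does not help the timer: a handler is only polled when nothing is runnable, which is exactly when a timer tick has nothing to pre-empt.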

    A better design

    The real solution is of course to add another bank of registers: user -> system -> irq. User would always trap to system. When operating in system we could then still catch (real, hardware) irqs which would push us into irq mode. Only when in irq mode are all irqs disabled.
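The difference between the two designs comes down to which mode a syscall lands in and whether IRQs are masked there. A minimal sketch, with the dictionaries and function name invented for illustration:

```python
# Each design: the mode a syscall traps into, and the modes in which
# hardware IRQs are not serviced.
TWO_BANK = {"syscall_target": "irq", "masked_in": {"irq"}}
THREE_BANK = {"syscall_target": "system", "masked_in": {"irq"}}


def irq_serviced_during_syscall(design):
    """Can a hardware IRQ be taken while the kernel handles a syscall?"""
    mode = design["syscall_target"]
    return mode not in design["masked_in"]


print(irq_serviced_during_syscall(TWO_BANK))    # False: sleepers deadlock
print(irq_serviced_during_syscall(THREE_BANK))  # True: the IRQ can wake them
```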

    DME is defined in Verilog and the addition of a third register bank would not be too complex, I think (ha!). However, I've decided against tackling that task at the moment as I want to keep the momentum going. I may decide to do it later, but let's first see how big of a problem this turns out to be. Meanwhile, I'm going with workaround (2).

  • Dhrystone scores are in!

    ErwinM, 09/24/2017 at 13:38, 1 comment

    Yesterday and today I have been trying to get the Dhrystone benchmark program running on DME. It has turned out to be the kind of typical homebrew CPU/OS experience that makes this project so interesting to me.

    After I downloaded the C code I tried to compile it. I got an error about missing malloc (which I have not implemented), but after replacing these calls with two static defines the code compiled, assembled and linked without errors. I tried to run it directly (without the OS): no go, the program gets stuck. Since I am running it without the OS I cannot write to the screen, so I decide that debugging this way is a no go.

    Next, I try to run it through my (work in progress) OS. It loads the code, runs it and... gets stuck (even at this point in the project, my first reaction is still 'yeah, well, can't expect random C code to work like that'). After some googling I tracked down a newer version of Dhrystone (version 2.1; I was working with 1.1 before). This version is much nicer for two reasons: it has more verbose output telling the user what SHOULD be happening, and its source is annotated to explain even more.

    I add some debugging printf statements and run the code again. It is only then that I notice that *nothing* is actually getting printed to the terminal (this is not always immediately obvious as my OS is still very noisy with lots of debugging info being printed). I poke at the problem a bit but soon give up and call it a day.

    When I got back today I decided there should be no reason printf should not be working; it has been working for the last few weeks. I put in a halt() just after the first printf, run it and... nothing. The Dhrystone code looks like an ancient dialect of C to me and has fairly long and complex strings in the printf statements. I decide to comment out most of them. I run the code and now the single printf statement is being printed to the screen. Aha! So now I only need to find the offending string and then figure out how to fix it! As I am commenting in 5 or so statements at a time and testing the code, another realisation dawns on me: each added statement makes the code longer, so is all the code getting loaded in by the OS? A quick look with the simulator reveals that the answer is no, it's not. The last part of the program (where the strings are) does not get loaded.

    My OS uses a Unix-like inode scheme for its FS (based on xv6) and it uses direct blocks and indirect blocks to load files. Indirect blocks are used for larger files. It turns out the Dhrystone program is the first program that utilised my indirect block code, and it didn't work. After some probing at the code, I realised a 'greater than' should be replaced with a 'greater than or equal'. Solved! The Dhrystone program is now being loaded in its entirety and each printf statement is working as it should!
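The exact line that was wrong is not shown here, but an off-by-one of this kind at the direct/indirect boundary would look roughly like the sketch below. `NDIRECT = 12` is the value xv6 uses; the function name and the `inclusive` switch are invented to show both behaviours side by side.

```python
NDIRECT = 12  # number of direct block slots in xv6; DME's OS may differ


def block_slot(bn, inclusive=True):
    """Say where file block number bn lives: a direct or an indirect slot.

    inclusive=True uses 'bn >= NDIRECT' (correct);
    inclusive=False uses 'bn > NDIRECT' (the off-by-one).
    """
    uses_indirect = bn >= NDIRECT if inclusive else bn > NDIRECT
    if uses_indirect:
        return ("indirect", bn - NDIRECT)
    return ("direct", bn)


print(block_slot(11))                    # ('direct', 11): last direct slot
print(block_slot(12))                    # ('indirect', 0): first indirect slot
print(block_slot(12, inclusive=False))   # ('direct', 12): past the direct array
```

A file only hits this boundary once it grows beyond the direct blocks, which would explain why smaller programs loaded fine.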

    Except that the program still gets stuck. Lucky for me, it gets stuck in a VERY basic routine (Proc_5), which only assigns a character to a pre-declared memory location. And this is not working. I isolate the code and sure enough it does not work: it seems my assembler/linker is not correctly assigning the addresses for the pre-defined char location. So I dive back into the assembler. Now, this thing has been working for the last couple of months, so that code is only a little familiar to me. I put in some breakpoints (yay for debugging scripting languages) and find out that no memory is being reserved at all for statements with a size of 1 byte! The assembler essentially skips them. A few lines of code later, the assembler now handles odd byte sizes (any odd number of bytes would have tanked) correctly in its bss section.
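The assembler source is not shown here, so the following is only a guess at the class of bug: if bss sizes are converted to 16-bit words with truncating division, a 1-byte symbol reserves nothing and the next symbol lands on the same address. All names below are hypothetical.

```python
WORD_BYTES = 2  # DME is a 16-bit CPU


def words_truncating(size_bytes):
    return size_bytes // WORD_BYTES                      # 1 byte -> 0 words


def words_rounded_up(size_bytes):
    return (size_bytes + WORD_BYTES - 1) // WORD_BYTES   # 1 byte -> 1 word


def layout_bss(symbols, words_for):
    """Assign a byte offset to each (name, size_in_bytes) pair."""
    offsets = {}
    cursor = 0
    for name, size in symbols:
        offsets[name] = cursor
        cursor += words_for(size) * WORD_BYTES
    return offsets, cursor


symbols = [("ch", 1), ("count", 2)]
print(layout_bss(symbols, words_truncating))  # ({'ch': 0, 'count': 0}, 2)
print(layout_bss(symbols, words_rounded_up))  # ({'ch': 0, 'count': 2}, 4)
```

In the truncating layout `ch` and `count` share offset 0, so a store to the char silently clobbers its neighbour.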

    I recompile Dhrystone and presto, it no longer gets stuck. Operation success. Dhrystone 1.1 runs fine as well with the assembler fix. So in trying to get a piece of C code to run, I had to fix a bug in my OS, fix a bug in my assembler/linker and had to implement a missing instruction (unsigned greater than or equal) in my compiler, that I...


f4hdk wrote 10/01/2017 at 04:55

I like the project. Retargeting a compiler to a new CPU architecture is not an easy task, at least for me. I would really like to know how to do it.

And the OS seems also quite complete. Congrats!

Have you seen my FPGA computer project? It is much less complex and performant than yours, but it has a VGA graphic output.

ErwinM wrote 10/03/2017 at 16:57

Retargeting LCC is hard at first, then suddenly you are done. At least that is how it was for me. Although I must admit that when I started reading the book I hardly understood anything.

I really like your project! It involves actual wires, which impresses me immediately. Also, I would have liked to play with VGA, but my board didn't come with a connector.

You did all the design yourself. Did you let yourself at least be inspired by existing designs? 

f4hdk wrote 10/03/2017 at 18:55

Thanks for the “like”. For my A2Z CPU, my goal was to invent all the design from scratch, my own way. The design mainly comes from my own thoughts. I took the project like a high-school exam/problem: “you have to build a computer from scratch, you have 2 years, it starts now”… So I have implemented my own ideas, and I rarely looked at existing architectures. The result is of course not optimal, but it works fine!
My project started in my mind exactly like yours: after the discovery of the 2 TTL-CPU projects Magic-1 and Big-Mess-O-wires, and the webring. I was immediately fascinated.

About retargeting a compiler, maybe I will try that later. Sure, I will read the book that you mention. But my list of future projects is already full for many years. 

About VGA with your FPGA-board, this is clearly feasible! You just need to add a few components on an additional board (veroboard) connected to the GPIO port. If you know how to solder, I can explain how to do it with simple schematics. Or you can buy a VGA-board like this and connect it with jumper wires.

Dr. Cockroach wrote 09/28/2017 at 11:02

Wow :-)

BigEd wrote 09/27/2017 at 17:11

You ported LCC - an excellent step up for a homebrew CPU. About "Elements of computer science" - do you mean the 1973 book (Estes and Ellis) or the 1996 one (Jha, P. K. Mahanti, Sahoo)?

ErwinM wrote 09/28/2017 at 07:49

Turns out, I was actually talking about "The Elements of Computing Systems" by Noam Nisan and Shimon Schocken. It's a teach-yourself-by-doing book. I highly recommend it.

BigEd wrote 09/28/2017 at 09:04

Ah yes - that's the one associated with Nand2Tetris, always seemed like a great idea although I haven't read it.
