Kestrel Computer Project

The Kestrel project is all about freedom of computing and the freedom of learning using a completely open hardware and software design.

With each passing day, technically capable consumers of computing technology increasingly lose their rights with computer hardware. While some look to prominent Linux suppliers as an escape from the Intel/Microsoft/Hollywood oligarchy, I have taken a different route -- I decided to build my own computer completely from scratch. My computer architecture is fully open; anyone can review the source, learn from it, and hack it to suit their needs.

From the main project website:

  • No back doors. No hardware locks or encryption. Open hardware means you can completely understand the hardware.
  • No memberships in expensive special interest groups or trade organizations required to contribute peripherals.
  • No fear of bricking your computer trying to install the OS of your choice. Bootstrap process is fully disclosed.
  • Designed to empower and encourage the owner to learn about and even tweak the software and the hardware for their own benefit.
  • Built on 64-bit RISC-V-compatible processor technology.

More precisely, the Kestrel-3, my third generation design, aims to be a computer just about on par with an Atari ST or Amiga 1200 computer in terms of overall performance and capability, but comparable to a Commodore 64 in terms of getting things to work.


This block diagram illustrates my vision of a Furcula-to-Wishbone bus bridge. The KCP53000 CPU exposes a Furcula bus for both its instruction and data ports. Once these buses are arbitrated to a single interconnect, the KCP53001 is used to talk to Wishbone peripherals and memory.

JPEG Image - 205.76 kB - 11/13/2016 at 15:59



This block diagram illustrates how the pieces of the CGIA fit together to serialize graphics data to the VGA port.

JPEG Image - 1.10 MB - 06/16/2016 at 18:57



Here, I draw a GEOS-inspired dialog box-like thing, interactively as you can see.

Portable Network Graphics (PNG) - 22.93 kB - 04/11/2016 at 20:23



Here, I'm writing software to draw simple boxes to the screen using the XOR operator directly on the framebuffer bitmap.

Portable Network Graphics (PNG) - 54.16 kB - 04/11/2016 at 20:22



Finally got block storage working inside the emulator, and along with it, a visual block editor. It's based on my own Vi-Inspired Block Editor (VIBE).

Portable Network Graphics (PNG) - 52.55 kB - 04/11/2016 at 20:21



  • Commencing Third Pipeline Stage

    Samuel A. Falvo II | 07/02/2017 at 23:27 | 0 comments

    Since my last update, I've made many small and incremental improvements to the load/store unit and the register "write-back" side of the X-Register Set modules. To a reasonably good approximation, I think this completes 90% of my work on these stages. I think there are some small artifacts that need to be added still, but these will depend upon the cooperation of other units not yet written, so will have to wait.

    With that said, I think it's time to start on the Integer Execute stage of the pipeline. This is the stage that basically encapsulates the ALU I've already written for the KCP53000.

    LSU Features

    The KCP53010's front-side bus will conform to Wishbone B.4 Pipeline Mode specifications. This new direction solves several problems I was having before with the KCP53000, allowing me to collapse several support modules into the core of the CPU effortlessly.

    The B.3/B.4 Standard Mode/Furcula bus ties the master and slave sides of the bus inextricably together, which required more sophisticated state machines when adapting to other buses. The 64-bit to 16-bit bridge (KCP53003) added a significant amount of overhead to the circuit, as did all the other bridges that were required to interface the KCP53000 to the Kestrel-2 hardware. It worked, but it was very slow, and only just barely met timing requirements for a working computer.

    The B.4 Pipelined operation greatly reduces the complexity involved with bridging different bus widths. Supporting 64-bit, 32-bit, 16-bit, and 8-bit transfers over a 16-bit external bus came surprisingly easily once I realized that the command and response (or master and slave, as referenced in the Verilog sources) sides of the bus can be cleanly divorced from each other. I'm banking on this simplification both to reduce layout pressure and to bump the CPU's operating frequency to a more comfortable rate.


    Because I now natively support Wishbone, the CPU is now directly responsible for handling address misalignment and data path routing. Right now, the LSU doesn't take misalignment into consideration. This is a known bug, and will be addressed later. However, I think the hardware to detect and respond to this and similar conditions will still result in a net reduction in complexity.
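
To make the width-bridging idea concrete, here is a small behavioural sketch in Python (not the project's Verilog) of how a naturally aligned 8-, 16-, 32-, or 64-bit store might be broken into beats on a 16-bit bus, and how a misaligned access could be detected. The function names, the little-endian lane order, and the `(address, sel, data)` beat tuple are assumptions made purely for illustration.

```python
def is_misaligned(addr: int, size: int) -> bool:
    """Return True if a `size`-byte access at `addr` is not naturally aligned."""
    if size not in (1, 2, 4, 8):
        raise ValueError("size must be 1, 2, 4, or 8 bytes")
    return addr % size != 0


def split_store(addr: int, size: int, value: int):
    """Split a naturally aligned store into beats for a 16-bit bus.

    Returns a list of (halfword_address, sel, data) tuples. `sel` is a
    hypothetical 2-bit byte-lane select (bit 0 = low byte, bit 1 = high
    byte) and `data` is the 16-bit value placed on the bus.
    Little-endian ordering is assumed.
    """
    if is_misaligned(addr, size):
        raise ValueError("misaligned access")
    value &= (1 << (8 * size)) - 1
    if size == 1:
        # A byte store uses one lane of a single 16-bit beat.
        lane = addr & 1
        return [(addr & ~1, 1 << lane, value << (8 * lane))]
    beats = []
    for i in range(size // 2):
        # Least significant half-word goes to the lowest address.
        half = (value >> (16 * i)) & 0xFFFF
        beats.append((addr + 2 * i, 0b11, half))
    return beats
```

For example, `split_store(0x1000, 8, 0x1122334455667788)` yields four beats, starting with the half-word `0x7788` at address `0x1000`. Because each beat is described independently of its response, the command side can issue all four before any acknowledgement arrives, which is the separation the pipelined mode allows.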

  • Mega Progress Update

    Samuel A. Falvo II | 06/04/2017 at 19:34 | 0 comments

    I could have sworn that I'd posted an update already, but looking at my logs feed, I clearly have not.

    Topics covered below include:

    • Serial Interface Adapter Core Completed
    • Initial Program Adapter Core
    • KCP53010: Successor to KCP53000 CPU

    Serial Interface Adapter Core Completed

    Not much more to say than that. It's done. It's not as small as I'd like, but on the other hand, it's also more flexible than your typical UART design. It allows you to send and receive serial data streams (LSB first only), with or without start bits, stop bits, etc. Frame checking is up to the software using it. It supports configurable FIFO depths and widths (up to 16 bits wide), allowing you to tune the core for your needs. Those who have programmed the Commodore-Amiga's internal UART will be right at home with how this adapter works. A nice, wide divisor allows for data rates from as low as hundreds of bits per second to as high as tens of megabits per second.

    Data is sent over a pair of wires, TXD and TXC, forming data and forwarded clock, respectively. Data is received on RXD and RXC, respectively. It should be noted that the receiver can be synchronized on RXD, RXC, or both. For lower-speed applications, RXD is sufficient. For higher speeds, you probably want to ignore RXD and focus just on RXC. The choice is yours.

    This core provides a 16-bit Wishbone B.4 Pipelined Mode slave interface; it should be easily usable with 8-bit devices as well.
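
Since frame checking is left to software, a driver has to validate each received frame itself. The following is a minimal Python sketch of such a check for a 10-bit 8N1 frame. It assumes, purely for illustration, that the first bit received lands in bit 0 of the word read from the receive FIFO, so bit 0 is the start bit, bits 8:1 are the data byte, and bit 9 is the stop bit; the actual bit placement in the SIA is not stated here, and the function names are hypothetical.

```python
def make_8n1(byte: int) -> int:
    """Build a 10-bit 8N1 frame: start bit 0, eight data bits, stop bit 1."""
    return (1 << 9) | ((byte & 0xFF) << 1)


def check_8n1(frame: int) -> int:
    """Validate a 10-bit 8N1 frame and return its data byte.

    Assumed layout: bit 0 = start bit, bits 8:1 = data, bit 9 = stop bit.
    Raises ValueError on a framing error.
    """
    if frame & 1:
        raise ValueError("framing error: start bit is not 0")
    if not (frame >> 9) & 1:
        raise ValueError("framing error: stop bit is not 1")
    return (frame >> 1) & 0xFF
```

A frame with a parity bit would need an 11-bit frame length and an extra check; that variation is left out of this sketch.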

    New Initial Program Adapter Core

    The Kestrel-3 code-base now includes a new core, currently with the name "IPA". This core has one mission: to facilitate loading the initial bootstrap code into RAM on a ROM-less computer design. From the processor's perspective, it looks exactly like ROM memory, and sits where ROM normally would; however, on the back-end, it parasitically feeds off the RXD and RXC pins of the SIA core. The idea is simple: when the processor reads a half-word from anywhere in ROM's address space, it blocks until the IPA receives two bytes. The bytes must be sent in PC-standard 8N1 serial format. The IPA is synchronized on the RXC input, so you'll need either a proper USART or a microcontroller to drive it. Since I have two Arduinos and an ESP8266 microcontroller at my disposal, this is not a blocking drawback.

    The idea is you spoon-feed the computer an instruction stream designed to explicitly store data into memory, like so:

    ; X1 = pointer into RAM
    ; X2 = value to store (byte)
    ADDI    X1,X0,0
    ADDI    X2,X0,$03
    SB      X2,0(X1)
    ADDI    X2,X0,$7F
    SB      X2,1(X1)
    ; ...etc...
    and so on until you have loaded 1KB to 2KB worth of code into RAM. If you need more than this, you'll need to manually reset X1 somehow, and continue loading your data. This approach is slow, of course; however, it saves me the hassle of needing to implement a DMAC just for the serial port. LUTs are precious in these smaller FPGAs, so this is a pretty big win for me. Besides, this only has to happen exactly once upon system reset, and the bootstrapper doesn't need to be terribly large (4KB seems like an awfully large bootstrapper to me).

    When the initial program is loaded, you kick it off by sending a JALR X0, 0(X0) instruction.
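
The spoon-feeding scheme above can be sketched on the host side as a small Python generator that turns a bootstrap image into the ADDI/SB instruction stream. This is an illustrative sketch, not the project's loader: the function names are hypothetical, the little-endian byte order on the wire is an assumption, and the final jump is encoded as `JALR X0, 0(X0)` (an absolute jump to address 0), which is also an assumption about the intended instruction. The instruction encodings themselves follow the standard RISC-V I-type and S-type formats.

```python
import struct

OP_IMM, OP_STORE, OP_JALR = 0b0010011, 0b0100011, 0b1100111


def addi(rd: int, rs1: int, imm: int) -> int:
    """Encode ADDI rd, rs1, imm (I-type, funct3 = 000)."""
    return ((imm & 0xFFF) << 20) | (rs1 << 15) | (rd << 7) | OP_IMM


def sb(rs2: int, offset: int, rs1: int) -> int:
    """Encode SB rs2, offset(rs1) (S-type, funct3 = 000)."""
    imm = offset & 0xFFF
    return ((imm >> 5) << 25) | (rs2 << 20) | (rs1 << 15) | ((imm & 0x1F) << 7) | OP_STORE


def jalr(rd: int, offset: int, rs1: int) -> int:
    """Encode JALR rd, offset(rs1) (I-type, funct3 = 000)."""
    return ((offset & 0xFFF) << 20) | (rs1 << 15) | (rd << 7) | OP_JALR


def ipl_stream(payload: bytes) -> bytes:
    """Build the instruction stream that stores `payload` at address 0.

    SB offsets are 12-bit signed, so one base pointer reaches at most
    2048 bytes (offsets 0..2047) without reloading X1.
    """
    if len(payload) > 2048:
        raise ValueError("payload exceeds the reach of a single base pointer")
    words = [addi(1, 0, 0)]                  # X1 = pointer into RAM
    for offset, byte in enumerate(payload):
        words.append(addi(2, 0, byte))       # X2 = value to store
        words.append(sb(2, offset, 1))       # store it at offset(X1)
    words.append(jalr(0, 0, 0))              # assumed kick-off jump to address 0
    # Assumption: instructions are sent little-endian, low half-word first.
    return b"".join(struct.pack("<I", w) for w in words)
```

Each payload byte costs two 32-bit instructions, so a 2KB bootstrapper needs roughly 16KB of serial traffic. That overhead is the price of not implementing a DMA controller.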

    The IPA exposes a Wishbone B.4 Pipelined Slave interface, and only supports 16-bit half-words. Attempting to read or write bytes from this space will fail in unpredictable ways. Don't do it. Thankfully, when the CPU fetches instructions, it fetches them 16 bits at a time.

    This is not the first ROM-less Kestrel computer I've made. Indeed, my very first, the W65C816-based proof of concept Kestrel-1, only connected to SRAM and a single VIA chip for I/O. The architecture of the Kestrel-1 and the iCE40-targeting Kestrel-3 designs share much in common.

                   Kestrel 1p4                        Kestrel-3
    CPU            W65C816P-14, 4MHz                  KCP530x0, 25MHz
    Performance    2 MIPS max.                        6 MIPS max. (KCP53000), 12 MIPS est. max. (KCP53010)
    Word Width     8/16                               16/64
    RAM            32KB max.                          256KB min., 512KB typ., 2^60 B max.
    I/O            1 VIA with 16-bit parallel I/O     1 SIA, V.4 compatible serial, 110bps to 12.5Mbps possible.
    IPL Mechanism  Bus mastering...

  • Not forgotten, I promise!

    Samuel A. Falvo II | 05/31/2017 at 03:47 | 0 comments

    I've started employment, and this has been taking up a significant amount of my free cycles. Apologies for slow progress. I'll probably not be able to spend much time on this project for maybe two more months while I continue my on-boarding/training process.

  • Might Switch Back to MISC CPU

    Samuel A. Falvo II | 05/07/2017 at 16:52 | 3 comments

    Don't worry; I still want the RISC-V ISA. But when working with such a tiny FPGA family as the iCE40 line-up, I might have to switch to software emulation to get what I want.

    I was curious today, and synthesized a bunch of cores to see what their resource utilizations are like.

    Core                                  LUTs
    SRAM Interface                         130
    Serial Interface Adapter               710
    KCP53000 CPU + 16-bit bus bridges     5500
    S16X4A CPU (Kestrel-2)                 510

    If I build out a KCP53000-based computer design, I'll not have any room left on the HX8K FPGA to implement even a tiny boot ROM with. I would need to somehow implement a DMA engine in under 1000 LUTs which simultaneously works with the SIA's quirks as well as serving as an IPL processor for the computer. Not only that, but the computer would have access to exactly one I/O channel.

    If I were to somehow expand the S16X4A to 64 bits, a dumb expansion and synthesis run gives me a figure of 1500 LUTs. RAM + SIA + S64X CPU is still small enough to let me synthesize an appreciable on-chip ROM for bootstrapping purposes.

    Switching to a 64-bit wide variant of the S16X4A CPU and relying on software emulation to provide RISC-V compatibility might be the way forward, at least for these smaller FPGAs.
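
The resource argument above is simple arithmetic, and a short Python sketch makes the two budgets easy to compare. The per-core figures come from the synthesis results listed in this log; the 7680-LUT capacity for the HX8K is an assumption taken from the part's published logic-cell count rather than from the log, and the names are hypothetical. Real designs rarely route at full utilisation, so the usable headroom is smaller than these numbers suggest.

```python
HX8K_LUTS = 7680  # assumed device capacity (published logic-cell count)

# LUT counts reported in the log; the S64X figure is the author's estimate.
CORES = {
    "sram": 130,
    "sia": 710,
    "kcp53000": 5500,
    "s64x": 1500,
}


def remaining(*names: str) -> int:
    """LUTs left on the device after placing the named cores."""
    return HX8K_LUTS - sum(CORES[name] for name in names)


riscv_headroom = remaining("sram", "sia", "kcp53000")
misc_headroom = remaining("sram", "sia", "s64x")
print(f"KCP53000 build leaves {riscv_headroom} LUTs")
print(f"S64X build leaves {misc_headroom} LUTs")
```

Under these assumptions the hardware RISC-V build leaves little more than a thousand LUTs for everything else, while the S64X build leaves several thousand for ROM and additional I/O channels.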

  • Project Repository Switching Back to Mono-Repo

    Samuel A. Falvo II | 05/06/2017 at 20:33 | 0 comments

    After spending an appreciable amount of time working with the Kestrel-3's components in a many-repo configuration (e.g., one for CGIA, one for the CPU, and so forth), I feel that juggling all these different components is more trouble than it's worth.

    For this reason, I've decided to (over time) bring all these different components back under one repository. This includes:

    • All Kestrel-2 components. This means I'll be discarding the GitHub kestrel2 repository once I bring it back under the mainline kestrel repository.
    • The MGIA and GPIA are already in the mainline project; but enhancements, such as CGIA, GPIA-2, and my upcoming SIA, currently exist either in ad-hoc locations or in separate repositories.
    • The complete KCP53000 family of cores.

    While this negatively impacts anyone trying to use open source hardware package managers like FuseSoC, it will make things significantly easier for me as a maintainer, and I think, probably for anyone wishing to contribute back to the project at a later time. If there's enough pressure to repackage components of the Kestrel via FuseSoC or similar tooling, I think the time is well-spent to make these tools mono-repo-aware, rather than having to cater to them.

    As a consequence of this, I think the directory layout of the repository will necessarily have to change as well. Cores needn't be tied to specific Kestrel versions, for example, so the S16X4A or MGIA can be reused in other contexts.

    I will probably get around to this in the next couple of days, or as I need various cores.

  • SIA Progress: Transmitter and its FIFO Complete

    Samuel A. Falvo II | 05/06/2017 at 07:08 | 0 comments

    So, after cleaning out the garage, I decided to sit down and work on the SIA core's transmitter logic. I didn't need to do much to make things work; I just re-used my XST (eXperimental Serial Transmitter; no relation to anything from Xilinx) core's transmitter logic, and re-jiggered it to meet SIA's requirements.

    I am quite pleased that the SIA is nearly complete. I just need to write the top-level `sia.v` file that binds the receiver and transmitter components together, and I'll finally have a workable, and reusable, UART that plays nicely on a Wishbone B.4 bus.

  • SIA Register Set

    Samuel A. Falvo II | 05/03/2017 at 15:31 | 0 comments

    The Serial Interface Adapter (SIA) core is coming along nicely, if a bit slowly.

    I've just implemented the Wishbone slave port for it. It exposes 16 bytes to the programmer, with 16-bit wide registers. Here's the register map for the core (with byte offsets), taken from the Verilog sources as of this posting. Not everything is implemented yet; I still need to finish the transmitter section, for example. Also, SPI features of my XST core aren't exposed and will be removed if not optimized away by synthesis tools. (As you can imagine, this is all quite preliminary still.)

    // +0	CONFIG (R/W)
    //	...........11111	Specifies character frame length,
    //				including start, stop, and parity bits.
    //	........000.....	Undefined.
    //	.......1........	Enable RXC edge sensitivity.
    //	......1.........	Enable RXD edge sensitivity.
    //	...111..........	TXC mode.
    //		000		Hardwired 0.
    //		001		Hardwired 1.
    //		010		IEEE-1355 Strobe (*)
    //		011		Undefined.
    //		100		Idles low; TXD transitions on rising edge.
    //		101		Idles high; TXD transitions on falling edge.
    //		110		Idles low; TXD transitions on falling edge.(*)
    //		111		Idles high; TXD transitions on rising edge.(*)
    //		* - reserved for this purpose, but might not be implemented.
    //	..1.............	RXC edge polarity
    //		0		Idles low; sensitive on rising edge.
    //		1		Idles high; sensitive on falling edge.
    //	00..............	Undefined.
    // +2	STATUS	(R/O)
    //	...............1	RX FIFO *not* empty.
    //	..............1.	RX FIFO is full.
    //	.............1..	TX FIFO *is* empty.
    //	............1...	TX FIFO is *not* full.
    //	.00000000000....	Undefined.
    //	1...............	One or more other bits set.
    // +4	INTENA	(R/W)
    //	...............1	RX FIFO *not* empty.
    //	..............1.	RX FIFO is full.
    //	.............1..	TX FIFO *is* empty.
    //	............1...	TX FIFO is *not* full.
    //	000000000000....	Undefined.
    // +6	RCVDAT (R/O)
    // +6	SNDDAT (W/O)
    // +8	UNUSED			Unused; hardwired 0.
    // +10	UNUSED			Unused; hardwired 0.
    // +12	BITRATL			Baud rate generator.
    // +14	BITRATH
    //	1111111111111111	Lower bits of divisor.
    //	0000000000001111	Upper bits of divisor.
    //				Bit rate = 100Mbps / (divisor + 1)
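
As a worked example of the register map above, the following Python sketch computes the BITRATL/BITRATH pair for a desired bit rate and packs a CONFIG word. It is a sketch under stated assumptions: the helper names and default arguments are hypothetical, the bit positions are read off the comment block above (frame length in bits 4:0, RXC edge enable in bit 8, RXD edge enable in bit 9, TXC mode in bits 12:10, RXC polarity in bit 13), and the 100 MHz reference comes from the "Bit rate = 100Mbps / (divisor + 1)" line.

```python
SIA_CLOCK_HZ = 100_000_000  # from "Bit rate = 100Mbps / (divisor + 1)"


def divisor_for(bit_rate: float):
    """Return (BITRATL, BITRATH) for the nearest achievable bit rate."""
    divisor = round(SIA_CLOCK_HZ / bit_rate) - 1
    if not 0 <= divisor < (1 << 20):  # 16 lower bits + 4 upper bits
        raise ValueError("bit rate out of range for a 20-bit divisor")
    return divisor & 0xFFFF, divisor >> 16


def actual_rate(bitratl: int, bitrath: int) -> float:
    """Bit rate produced by a given register pair."""
    divisor = ((bitrath & 0xF) << 16) | (bitratl & 0xFFFF)
    return SIA_CLOCK_HZ / (divisor + 1)


def config_word(frame_len: int, txc_mode: int = 0b100, rxd_edge: bool = True,
                rxc_edge: bool = False, rxc_falling: bool = False) -> int:
    """Pack a CONFIG register value from its fields."""
    if not 0 <= frame_len <= 31:
        raise ValueError("frame length must fit in 5 bits")
    if not 0 <= txc_mode <= 7:
        raise ValueError("TXC mode must fit in 3 bits")
    return (frame_len
            | (int(rxc_edge) << 8)
            | (int(rxd_edge) << 9)
            | (txc_mode << 10)
            | (int(rxc_falling) << 13))
```

For instance, `divisor_for(115200)` gives a divisor of 867, for an actual rate of about 115,207 bps, and `config_word(10)` describes a 10-bit frame (8N1) with RXD edge sensitivity and TXC idling low. With a 20-bit divisor the slowest rate is a little under 100 bps, consistent with the 110 bps lower bound quoted elsewhere in these logs.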

  • DMA Controller Coming Together

    Samuel A. Falvo II | 04/29/2017 at 23:56 | 0 comments

    I'm working on the DMA controller which will ferry data from the V.4 receiver into RAM. This controller is currently intended for initial-program-load (IPL) purposes, since I don't think I'll have enough resources on the FPGA to implement an appreciably sizeable ROM.

    I think it's coming along well so far. Here's a timing diagram courtesy of gtkwave.

  • Employment Acquired!

    Samuel A. Falvo II | 04/28/2017 at 15:56 | 3 comments

    Good news for the project: I've a new job which I'll be starting in mid-May. I hope to work more regularly on this project starting after mid-July, however, as I'll be on-boarding until then, plus holidays.

  • More Thoughts On Remex: Switch Back to SPI?

    Samuel A. Falvo II | 04/17/2017 at 03:42 | 0 comments

    When I first conceived of a computer-with-standardized-I/O-channels architecture for the Kestrel-1, I planned on using bit-banged SPI ports. Later, when I resurrected the idea for consideration in the Kestrel-3 on the icoBoard Gamma board, I tried to map my ideas and desires for talking efficiently to block I/O and to a terminal into a single SPI master/slave interconnect. I wasn't happy with the results, so I later decided that a Spacewire-like interface was the way to go for Kestrel-3 I/O channels. However, I subsequently developed some doubts about its overall system simplicity as I tried writing the Verilog to make it all happen.

    I've decided I'm going to switch back to SPI, at least for now. I'll revisit Spacewire at a later time. I list the reasons why below.

    When I first tried to use SPI for an I/O channel, I originally tried two approaches to framing data and enforcing flow control. These approaches were either not flexible enough or required a large amount of resources on the slave device to implement. I've since devised a third solution which, I think, neatly solves the problem. It seems quite economical to implement, and it definitely has some advantages over Spacewire (and, interestingly, Ethernet).

    The first approach I took used the SPI slave-select signal as a framing delimiter. When asserted, the slave controller knew a fresh packet of data to interpret was on its way. When negated, it could return to a quiescent state. This works great for master-to-slave communications. The reverse data path is not well supported, however. It requires a dedicated (and non-standard) service-request signal, which functions not unlike an interrupt pin on more traditional backplane buses. When service-request is asserted, the host knows the slave needs to communicate with the host. This communication path must still be conducted using a master/slave protocol exchange of some kind, but at least the host can get away without having to poll the device all the time. Another problem with this solution is that it requires at least five digital I/O pins to implement, preventing it from being used on a 1x6 PMOD port.

    The second approach I took discarded the slave-select signal altogether, leaving only MOSI, MISO, and CLK signals. The master/slave relationship continued to exist (only the master can drive CLK). But, I observed that the link was strictly point to point, so the slave-select signal had very limited utility. In its place, I decided to frame data using HDLC, PPP, or COBS. If the slave indicated that it wanted to operate asynchronously, the master would need to drive CLK continuously, allowing the slave to send data when it deemed appropriate. Otherwise, the CLK would be driven only until the number of responses balanced the number of outstanding requests. In either case, both directions used the same framing protocol. The problem with this approach is basic flow control. How big can the frames be? If I use an ESP8266, they can be quite sizeable. If I use an ATtiny microcontroller, not so much! How to implement flow control? I'd need to follow HDLC-like RR/RNR-style flow control, which operates on a packet-by-packet basis. That means I'd need enough buffer space to support at least 7 outstanding frames, which I'd then have to arbitrarily limit to, say, 256 bytes each. So, estimated, a microcontroller would need about 2KB minimum space to support this interconnect technology, not counting driver overhead, and of course, the intended application of the controller in the first place.
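
For readers unfamiliar with the framing options named above, here is a minimal Python sketch of COBS (Consistent Overhead Byte Stuffing), one of the three candidates. It illustrates the general technique only and is not code from the project; the function names are hypothetical. COBS rewrites a packet so that it contains no zero bytes, which lets a zero byte on the wire act as an unambiguous frame delimiter. The delimiter itself is not appended here.

```python
def cobs_encode(data: bytes) -> bytes:
    """Encode `data` so the result contains no zero bytes."""
    out = bytearray()
    block = bytearray()
    for b in data:
        if b == 0:
            # Close the current block; the code byte implies a zero follows.
            out.append(len(block) + 1)
            out += block
            block.clear()
        else:
            block.append(b)
            if len(block) == 254:
                # Maximum-length block: code 0xFF implies no zero follows.
                out.append(0xFF)
                out += block
                block.clear()
    out.append(len(block) + 1)
    out += block
    return bytes(out)


def cobs_decode(encoded: bytes) -> bytes:
    """Invert cobs_encode; raises ValueError on malformed input."""
    out = bytearray()
    i = 0
    while i < len(encoded):
        code = encoded[i]
        if code == 0:
            raise ValueError("zero byte inside a COBS frame")
        block = encoded[i + 1:i + code]
        if len(block) != code - 1:
            raise ValueError("truncated COBS frame")
        out += block
        i += code
        if code != 0xFF and i < len(encoded):
            out.append(0)
    return bytes(out)
```

The framing itself is cheap; the cost described above comes from buffering. Seven outstanding frames of 256 bytes each is 7 × 256 = 1792 bytes, which is where the estimate of roughly 2KB of buffer space comes from.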

    The solution, it seems, is to isolate the flow control mechanism from the delivery of individual bytes and framing. Each direction of the channel operates independently, and in one of two modes of operation. When the link is first established, each direction defaults to "flow control mode". In this mode of operation, bytes take on a special significance: bits 5:3 contain the number of 8-byte words which follow, while bits 2:0 contain the number...

JL9791 wrote 11/27/2016 at 01:20

I see you are still working with Forth :)  I came upon this by accident when researching stack CPUs
I would like to learn Forth someday, I like the simplicity of stacks (which reminds me of my Magic the Gathering days).


Samuel A. Falvo II wrote 11/27/2016 at 01:32

Not having to name every intermediate computation is quite liberating.  But if taken to an extreme, it can also be quite confusing.  :)  The solution is to learn to hyper-factor your code.  A single function in C could well take 16 word definitions in Forth.  Naming procedures is a nice trade-off, because it almost serves to document why your code is the way it is.  Not quite, but good enough for most purposes.  :)  Plus, it really aids in testing code to make sure things work as you expect them to.


JL9791 wrote 11/09/2016 at 01:09

I have been following your project for a while, particularly because you selected the RISC-V ISA to build your CPU around.  I recently came across something I had forgotten about:  the now open source Hitachi CPUs (Sega Genesis, Saturn, Dreamcast) found here

Did you consider those as the brain of your Kestrel?  If not, perhaps they may be a good alternative. :)


Samuel A. Falvo II wrote 11/09/2016 at 01:16

Nope, and I have no intentions to either.  I've invested too much into RISC-V to change now.  Switching ISAs today would literally set me back two years of effort.  Besides, the performance of RISC-V CPUs is quite good in general; that my own CPU is as slow as a 68000 should not be taken as an indication that all such CPUs are that way.

In the future, I'd like to one day hack a BOOM processor into the Kestrel, which would give it a 4-way superscalar CPU.  But, for now, I just want something simple enough that people can understand.

Another reason for adopting RISC-V is that it has learned many things from both the successes and the failures of past architectures.

Thanks for the link though.  You're not the first to suggest it.  :)


JL9791 wrote 11/09/2016 at 01:18

Sure thing.  Yeah, I was not suggesting you scrap all your hard work, just curious.  Glad you are coming along pretty well with it now after the..uh..hiccups :)

