Kestrel Computer Project

The Kestrel project is all about freedom of computing and freedom of learning, using a completely open hardware and software design.

With each passing day, technically capable consumers of computing technology increasingly lose their rights over their computer hardware. While some look to prominent Linux suppliers as an escape from the Intel/Microsoft/Hollywood oligarchy, I have taken a different route: I decided to build my own computer completely from scratch. My computer architecture is fully open; anyone can review the source, learn from it, and hack it to suit their needs.

From the main project website:

  • No back doors. No hardware locks or encryption. Open hardware means you can completely understand the hardware.
  • No memberships in expensive special interest groups or trade organizations required to contribute peripherals.
  • No fear of bricking your computer trying to install the OS of your choice. Bootstrap process is fully disclosed.
  • Designed to empower and encourage the owner to learn about and even tweak the software and the hardware for their own benefit.
  • Built on 64-bit RISC-V-compatible processor technology.

More precisely, the Kestrel-3, my third generation design, aims to be a computer just about on par with an Atari ST or Amiga 1200 computer in terms of overall performance and capability, but comparable to a Commodore 64 in terms of how easy it is to get things working.

kcp53001-block-diagram.jpg

This block diagram illustrates my vision of a Furcula-to-Wishbone bus bridge. The KCP53000 CPU exposes a Furcula bus for both its instruction and data ports. Once these buses are arbitrated to a single interconnect, the KCP53001 is used to talk to Wishbone peripherals and memory.

JPEG Image - 205.76 kB - 11/13/2016 at 15:59


block-diagram.jpg

This block diagram illustrates how the pieces of the CGIA fit together to serialize graphics data to the VGA port.

JPEG Image - 1.10 MB - 06/16/2016 at 18:57


forth-3.png

Here, I interactively draw a GEOS-inspired dialog box, as you can see.

Portable Network Graphics (PNG) - 22.93 kB - 04/11/2016 at 20:23


forth-2.png

Here, I'm writing software to draw simple boxes to the screen using the XOR operator directly on the framebuffer bitmap.

Portable Network Graphics (PNG) - 54.16 kB - 04/11/2016 at 20:22


forth-1.png

Finally got block storage working inside the emulator, and along with it, a visual block editor. It's based on my own Vi-Inspired Block Editor (VIBE).

Portable Network Graphics (PNG) - 52.55 kB - 04/11/2016 at 20:21


kes-eforth-rectangles.png

I tried to get a nice, more or less pretty, static demo for a screenshot on Twitter. But, bugs happened, and I ended up having to debug. Turns out, it made for a better screenshot, because it shows a more realistic user experience. Funny how that works!

Portable Network Graphics (PNG) - 15.10 kB - 04/09/2016 at 14:13


kes-eforth-coldboot.png

When you first "power-on" a Kestrel-3 emulator, it can drop you into the Forth programming language environment. (The Kestrel-3 emulator aims to emulate the Digilent Nexys-2 board, and so has 16MB of RAM.)

Portable Network Graphics (PNG) - 6.09 kB - 04/09/2016 at 14:12


20160322_231722.jpg

Schematic, recalled from memory, of the computing elements of the Kestrel-1 home-made computer. What is NOT shown is the DMA circuitry to load code into RAM under host PC control, and the reset logic. The schematic has one error in it: the BE line should be tied high through a 1K resistor, just like the RDY line. This lets the IPL circuitry tri-state the CPU's address and data buses under host PC control.

JPEG Image - 4.24 MB - 03/23/2016 at 15:39



  • Employment Acquired!

    Samuel A. Falvo II • 18 hours ago • 2 comments

    Good news for the project: I have a new job, which I'll be starting in mid-May. However, I don't expect to work regularly on this project until after mid-July, as I'll be on-boarding until then, plus holidays.

  • More Thoughts On Remex: Switch Back to SPI?

    Samuel A. Falvo II • 04/17/2017 at 03:42 • 0 comments

    When I first conceived of a computer-with-standardized-I/O-channels architecture for the Kestrel-1, I imagined using bit-banged SPI ports. Later, when I resurrected the idea for consideration in the Kestrel-3 on the icoBoard Gamma board, I tried to map my ideas and desires for talking efficiently to block I/O and to a terminal onto a single SPI master/slave interconnect. I wasn't happy with the results, so I decided that a Spacewire-like interface was the way to go for Kestrel-3 I/O channels. However, I subsequently developed some doubts about its overall system simplicity as I tried writing the Verilog to make it all happen.

    I've decided I'm going to switch back to SPI, at least for now. I'll revisit Spacewire at a later time. I list the reasons why below.

    When I first tried to use SPI for an I/O channel, I originally tried two approaches to framing data and enforcing flow control. These approaches were either not flexible enough or required a large amount of resources on the slave device to implement. I've since devised a third solution which, I think, neatly solves the problem. It seems quite economical to implement, and it definitely has some advantages over Spacewire (and, interestingly, Ethernet).

    The first approach I took used the SPI slave-select signal as a framing delimiter. When asserted, the slave controller knew a fresh packet of data to interpret was on its way. When negated, it could return to a quiescent state. This works great for master-to-slave communications. The reverse data path is not well supported, however. It requires a dedicated (and non-standard) service-request signal, which functions not unlike an interrupt pin on more traditional backplane buses. When service-request is asserted, the host knows the slave needs to communicate with the host. This communication path must still be conducted using a master/slave protocol exchange of some kind, but at least the host can get away without having to poll the device all the time. Another problem with this solution is that it requires at least five digital I/O pins to implement, preventing it from being used on a 1x6 PMOD port.

    The second approach I took discarded the slave-select signal altogether, leaving only MOSI, MISO, and CLK signals. The master/slave relationship continued to exist (only the master can drive CLK). But, I observed that the link was strictly point-to-point, so the slave-select signal had very limited utility. In its place, I decided to frame data using HDLC, PPP, or COBS. If the slave indicated that it wanted to operate asynchronously, the master would need to drive CLK continuously, allowing the slave to send data when it deemed appropriate. Otherwise, the CLK would be driven only until the number of responses balanced the number of outstanding requests. In either case, both directions used the same framing protocol. The problem with this approach is basic flow control. How big can the frames be? If I use an ESP8266, they can be quite sizeable. If I use an ATtiny microcontroller, not so much! How to implement flow control? I'd need to follow HDLC-like RR/RNR-style flow control, which operates on a packet-by-packet basis. That means I'd need enough buffer space to support at least 7 outstanding frames, which I'd then have to arbitrarily limit to, say, 256 bytes each. So, by my estimate, a microcontroller would need a minimum of about 2KB of buffer space to support this interconnect technology, not counting driver overhead and, of course, the intended application of the controller in the first place.

    The solution, it seems, is to isolate the flow control mechanism from the delivery of individual bytes and framing. Each direction of the channel operates independently, and in one of two modes of operation. When the link is first established, each direction defaults to "flow control mode". In this mode of operation, bytes take on a special significance: bits 5:3 contain the number of 8-byte words...

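    As a minimal Verilog sketch (module and signal names are mine, and interpreting the count as a byte-sized grant is an assumption), the bits 5:3 field decodes like so:

        // Decode the one documented field of a "flow control mode" byte:
        // bits 5:3 carry a count of 8-byte words. The byte's other fields
        // are not described here, so they're left undecoded.
        module fc_decode (
            input  wire [7:0] rx_byte,
            output wire [2:0] word_count,  // number of 8-byte words (bits 5:3)
            output wire [5:0] byte_count   // the same grant expressed in bytes
        );
            assign word_count = rx_byte[5:3];
            assign byte_count = {word_count, 3'b000};  // multiply by 8
        endmodule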

  • SRAM Read/Write Tests Successful!

    Samuel A. Falvo II • 04/11/2017 at 21:50 • 0 comments

    I made a quick and dirty circuit to exercise RAM reads and writes. The idea is simple: ramp through a counter. Bits 0..3 and 5..20 (a total of 20 bits) route to the address pins of the icoBoard Gamma's SRAM chip. Bit 4 is used to select read/write. This way, the RAM alternates between reads and writes. Data input is taken from the current address bus, while the data output pins drive LEDs directly, with NO intervening processing.

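    Here is a minimal Verilog sketch of that exerciser as described above; the module and signal names, the 16-bit data bus, and the active-low write-enable polarity are my assumptions, not the project's actual source:

        module sram_exerciser (
            input  wire        clk,
            output wire [19:0] sram_addr,
            output wire        sram_we_n,   // write when low (assumed polarity)
            inout  wire [15:0] sram_data,
            output wire [15:0] leds
        );
            reg [20:0] counter = 0;
            always @(posedge clk) counter <= counter + 1;

            assign sram_addr = {counter[20:5], counter[3:0]};  // bits 0..3 and 5..20
            assign sram_we_n = counter[4];                     // bit 4 picks read vs. write
            // Drive the bus with the current address during writes; tri-state during reads.
            assign sram_data = sram_we_n ? 16'bz : sram_addr[15:0];
            // LEDs show the data bus directly, with no intervening processing.
            assign leds = sram_data;
        endmodule
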
    Test 1 - Cold start - Random data is shown on the LEDs.

    Test 2 - After running test 1 for a while and resetting the board, the values read back appear to correspond to the current address.

    In short, RAM is accepting data, and is reporting the same data back.

    I'm ecstatic. After many months of failure after failure with other FPGA boards, I'm just so happy that this is working. You have *no* idea.

    The next step is to work on completing a serial I/O interface to talk to the outside world. I might interface an S16X4 as a test CPU before trying the RISC-V. Not sure yet.

  • icoBoard Gamma Back in Business

    Samuel A. Falvo II • 03/29/2017 at 16:18 • 1 comment

    Yesterday, I decided to break down and acquire my first Raspberry Pi computer. I got an RPi 3 and, I must admit, it is a nifty little device. Accolades aside, though, this platform was the original way to program icoBoard FPGA boards, and since I cannot raise the icoBoard Gamma from any of my Linux OR Windows laptops, I thought I'd try the RPi route. It's cheap enough, so why not?

    Immediate success. Not only have I never before seen an FPGA programmed in about a quarter of a second, but the whole arrangement worked out-of-the-box (except for one brain-fart on my part: if you've attempted to install icoprog for USBaseboard before, be sure to remove those binaries from your path so that the icotools makefiles correctly detect the right way to program the board). Seriously: if you can imagine this as the FPGA world's "MacOS" (where things "just work"), this is it.

    Does this mean Kestrel-3 development is back on track? Not quite; I still need to gain employment, and my energy is still focused on that. But, at least I have a working FPGA board again, and, I hope, one with which I can reliably talk to RAM.

  • Multi-Purpose Experimental Serial Transmitter

    Samuel A. Falvo II • 03/27/2017 at 05:50 • 0 comments

    Taking a break from job-hunting and my resume editor project, I wondered if I could do better than Spacewire/IEEE-1355 when making a serial transmitter. To find out, I created the Experimental Serial Transmitter repository.

    This code is not production-grade. It's pretty amateurish, actually. It's probably buggy in certain edge-cases as well.

    This transmitter should support transfers of between 1 and 64 bits. I know it works between 1 and 63 bits; 64 bits is as-yet unproven and probably buggy. But, that's OK for now; this is just a prototype. Think "hack-day" project.

    To use it as an EIA-232/422/423/485 transmitter (which shifts data LSB first), you load the TXREG register with a bit pattern as follows:

        63 : 10          9   8 : 1   0
        11111....11111   1   Data    0

    Bit 0 is the start bit, and must be 0 (since TXD idles high). Bits 8:1 comprise the 8-bit word you wish to send. Finally, bit 9 is the stop bit, and must be set to 1. Bits 63:10 don't need to be set to anything per se, but it's good practice to set them to 1 just in case.

    If you want to add parity, then you'll just stuff the parity bit in bit position 9, and the stop bit in bit 10. Simple.

    The BITS parameter tells the engine how many bits to shift out (for 8N1 transmissions, you'll set this to 10; for 8E1 or 8O1, 11. Add one more for each additional stop bit).

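    As a concrete illustration (a sketch of mine, not code from the XST repository), packing frames per the layout above might look like this:

        // Frame-packing helpers; these would live inside whatever module
        // loads TXREG.
        module frame_pack_demo;
            // 8N1: start bit 0, data in bits 8:1, stop bit 1 in bit 9,
            // remaining high bits all ones. Transmit with BITS = 10.
            function [63:0] frame_8n1(input [7:0] data);
                frame_8n1 = {{54{1'b1}}, 1'b1, data, 1'b0};
            endfunction

            // 8E1: even parity stuffed into bit 9, stop bit moved to bit 10,
            // as described above. Transmit with BITS = 11.
            function [63:0] frame_8e1(input [7:0] data);
                frame_8e1 = {{53{1'b1}}, 1'b1, ^data, data, 1'b0};
            endfunction

            initial $display("8N1 frame for 0x41: %h", frame_8n1(8'h41));
        endmodule
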
    TXBAUD tells it how fast (how many system clocks per bit cell). The TXC output is automatically generated, and the circuit tries hard to maintain 50% duty cycle (regrettably, it cannot do this with odd baud rates, but it comes as close as it can).

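    Here's a minimal sketch of such a baud-rate generator (my own illustration; the counter width and names are assumptions, not the XST source). It holds TXC high for ceil(TXBAUD/2) of the TXBAUD clocks in each bit cell: exactly 50% duty for even rates, and as close as possible for odd ones:

        module baud_gen (
            input  wire        clk,
            input  wire [15:0] txbaud,          // system clocks per bit cell (>= 1)
            output wire        txc,
            output reg         bit_strobe = 0   // pulses once per bit cell
        );
            reg [15:0] count = 0;
            always @(posedge clk) begin
                bit_strobe <= (count == 0);
                count <= (count == 0) ? txbaud - 1 : count - 1;
            end
            assign txc = (count >= txbaud[15:1]);  // high for ceil(txbaud/2) clocks
        endmodule
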
    As data is shifted out, the value of the RXD input is shifted in at bit 63. For EIA-232 uses, this is almost certainly not useful. It's best to treat this as garbage. However, if you loop TXD back to RXD, you could perhaps use this circuit as a crude 1-bit DAC as well.

    To use this circuit as an SPI controller (which typically shifts data MSB first), you use the TXREGR register instead. This register is exactly like TXREG, except the bits are reversed:

        0 : 7   8 : 63
        Data    00000.....000000

    Note that the data you want to send now occupies the highest bits of the register, rather than the lowest. Be sure BITS is set to 8, or whichever is appropriate for the slave device. Note that you'll need a general purpose output to serve as slave-select. XST does not provide this signal on its own.

    XST only supports SPI CPHA=1, CPOL=0 (mode 1). I'll play around with the circuit to see if I can also support the other three modes. CPHA=1/CPOL=1 (mode 3) should be trivially easy to support. CPHA=0, however, will require a bit of thought. The Verilog implementation, I think, is a bit too simplistic to support it without larger adjustments to the code.

    Tip: If you want to cheaply bit-swap a word, write the value into TXREG, and read back via TXREGR (or vice versa). You'll need a way to disable the transmitter shift register engine, though.

    Since RXD (which doubles as MISO) is always shifted into the register at bit 63, after an SPI word is sent, the received data will appear in the lower bits of the TXREGR register.

    Credit where it's due: the primary inspiration is the Commodore-Amiga's PAULA chip's UART design.

  • Mothballing Kestrel Computer Project.

    Samuel A. Falvo II • 03/07/2017 at 21:13 • 4 comments

    Abstract

    I’ve been unemployed since November 2016, and Kestrel-3 progress has slowed to a crawl despite all my efforts being devoted exclusively towards it. Without small wins, I lose hope, and it shows when I'm looking for a job. Mothballing this project in favor of other projects is the only way forward. I’ll be resurrecting my old attempt at self-employment, RezuRezu, in the hope that it either helps me land another job soon-ish, or that I actually succeed in running my own company.

    http://kestrelcomputer.github.io/kestrel/2017/03/07/kestrel-winter

  • Kestrel-1/3?

    Samuel A. Falvo II • 03/05/2017 at 06:12 • 0 comments

    Before I talk about what I'm doing now, let me talk about what I've done since my last update.

    The Remex RX pipeline hasn't changed; it still receives characters and places them into a queue. I still have not designed a Wishbone interface for this queue. It's coming, though.

    The TX pipeline remains incomplete. I have the transmitting PHY/serializer, I have parity tracking, and I have the ability to transmit any N-Char or L-Char provided something spoon-feeds it. But, at the moment, I do not yet have a "what do I transmit next?" circuit that functions autonomously. It's designed, and I've written some Verilog for it, but it remains untested. I'm blocking on this, in part, because I'm not sure this is the direction I want to go. There's something nibbling at my gut that says the circuit I've designed is too complex and can be greatly simplified somehow. So, I'm meditating on it before I proceed further. If worse comes to worst, I can hook what I have up to a Wishbone interface, and let the CPU decide what to transmit and when. This will completely break compatibility with Spacewire and IEEE-1355, basically turning the interface into an RS-232 interface with data-strobe signalling. Not what I'd like to do if I can avoid it.

    Per my previous post, since the RX and TX pipelines are cross-coupled with each other, and interactions exist both locally and remotely, you can imagine that testing this arrangement is on the more difficult side. Part of me is thinking that this is why IEEE-1355 interfaces have failed in commercial industry. EIA-232, T1, E1, SONET, and several ATM-based interfaces are based on a strictly unidirectional, synchronous or plesiochronous relationship between the bits sent by a transmitter and the bits received by another receiver. No feedback loops exist (at least at the physical and data-link layers), and therefore these interfaces are much simpler to test and to predictably build hardware for. Because they were designed for time-division multiplexing, frame rates are (more or less) isochronous, and so buffer management is (ostensibly) simpler, since the need for deep buffers doesn't exist as long as you can service the bit-stream fast enough. This is now more appealing to me; however, the only thing stopping me from dropping IEEE-1355 and going back to telco-style, TDM-based protocols is, frankly, not knowing how to solve the auto-baud problem. So, IEEE-1355 it is for now.

    So, what am I up to now? Honestly, trying for a small victory. My goal is, in essence, to reproduce the Kestrel-1 in the icoBoard that I've received. My plan is to embed a KCP53000 CPU and all the necessary bridging to a 16-bit Wishbone bus, couple it to the on-board SRAM chip, a 256-byte bootstrap ROM, and one GPIA core to provide general purpose I/O. The goal is to blink some LEDs under CPU control. That's it.

    Unfortunately, I have no idea what the CPU core's timing is like, since the icotime utility reports a timing loop somewhere. Since this isn't necessarily a problem in practice, I'm planning on starting the CPU off at 1MHz, and ramping the clock up from there using a binary search to quickly determine, empirically, its maximum clock speed. I figure, at 1MHz, it will run at around 100,000 instructions per second, and should be plenty slow enough for the core to boot up. I doubt I'll be able to get the core running at 25MHz like on the Xilinx FPGA, but we'll see how well it fares. If it fares at all.

    I'm hoping this works, for if I can't get something this simple working in a reasonable amount of time with a reasonable amount of effort, I see no further reason to continue to work on this project.

  • On IEEE-1355 vs UARTs

    Samuel A. Falvo II • 03/01/2017 at 18:13 • 0 comments

    I think I know why IEEE-1355 didn't take off. While cores for this interconnect are quite small (truly, about on par with EIA-232 UARTs with similarly sized FIFOs), they're not necessarily as easy to test as EIA-232. EIA-232 links are just about as simple as SPI, when push comes to shove: you have a dumb transmitter that isochronously sends out bits. It doesn't care what those bits are. You have a dumb receiver that plesiochronously attempts to sample bits. As long as the transmitting and receiving clocks are relatively synchronous with each other (the error is small enough), everything works and you get a reliable serial communications stream. The receiver's higher layers ultimately are responsible for packet framing. These two components, the receiver and transmitter, are otherwise 100% isolated from each other. That makes them easier to both validate and verify.

    IEEE-1355 has separate TX and RX pipelines just like EIA-232; but, they're cross-coupled, and that means they interact. A feedback loop implicitly exists, which makes validation and verification a much more complicated affair. Transmitter A has a credit counter which is replenished by receiver B, while transmitter B's credit counter is replenished by receiver A. It does this through (preferably) hardware-scheduled transmission of FCT tokens.

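    A minimal sketch of that credit mechanism (my own illustration, following the SpaceWire convention that each FCT token grants credit for 8 more N-Chars):

        module credit_counter (
            input  wire clk,
            input  wire fct_received,  // one pulse per FCT token from the peer
            input  wire nchar_sent,    // one pulse per data character sent
            output wire may_send
        );
            reg [5:0] credit = 0;      // SpaceWire caps outstanding credit at 56
            always @(posedge clk)
                case ({fct_received, nchar_sent})
                    2'b10:   credit <= credit + 8;  // replenished by the peer's receiver
                    2'b01:   credit <= credit - 1;  // consumed by each character we send
                    2'b11:   credit <= credit + 7;  // both in the same cycle
                    default: ;                      // no change
                endcase
            assign may_send = (credit != 0);  // transmitter stalls at zero credit
        endmodule
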
    Part of me wonders if I should have just stuck with an E1, ATM, or SONET-inspired frame structure, relying on scrambling to help ensure synchronization between TX and RX components. It sure seems like it would produce simpler hardware, be easier to test, and be easier to document as well. The problem remains of how to maintain synchrony between the transmitter and receiver after negotiating a higher throughput. Even at relatively modest speeds, the FT-232 chip on my Arduino Uno loses framing with my (then) host PC's serial port, apparently due to differing baud rate base frequencies.

  • Remex RX Pipeline Update

    Samuel A. Falvo II • 02/16/2017 at 16:42 • 0 comments

    RX Pipeline

    I managed to implement a Remex receive pipeline which I'm happy with. It's capable of safely supporting arbitrary bit rates up to RxClk(Hz)/4 bits per second, although you can probably push it to RxClk/3 bits per second. It deposits all data characters (all N-chars and EOP and EEP characters) into an 8-deep, 9-bit FIFO. The FIFO has a (very!) degenerate Wishbone B4 interface on it, so it should be quite easy to couple to a Wishbone B3 or B4 interface later on.

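    A sketch of the kind of FIFO described above (mine, not the Remex source); it's 9 bits wide so EOP/EEP markers can ride alongside data bytes:

        module rx_fifo (
            input  wire       clk,
            input  wire       wr_en,
            input  wire [8:0] wr_data,  // {marker, byte}
            input  wire       rd_en,
            output wire [8:0] rd_data,
            output wire       empty,
            output wire       full
        );
            reg [8:0] mem [0:7];
            reg [3:0] wr_ptr = 0, rd_ptr = 0;  // extra MSB disambiguates full/empty
            always @(posedge clk) begin
                if (wr_en && !full) begin
                    mem[wr_ptr[2:0]] <= wr_data;
                    wr_ptr <= wr_ptr + 1;
                end
                if (rd_en && !empty)
                    rd_ptr <= rd_ptr + 1;
            end
            assign rd_data = mem[rd_ptr[2:0]];
            assign empty   = (wr_ptr == rd_ptr);
            assign full    = (wr_ptr == {~rd_ptr[3], rd_ptr[2:0]});
        endmodule
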
    Because of the high peak throughputs on the Remex interconnect combined with a very shallow FIFO, traffic over the interconnect will "stutter" quite frequently, consisting of bursts of activity separated by intervals of idleness. I expect real-world throughput not to be that fast until I deepen the FIFO and/or attach a DMA interface to the pipeline. Both are planned, but I need to make sure I have enough room for them first!

    TX Pipeline

    My next set of tasks includes getting the transmit pipeline working. TxClk will be derived from RxClk using a programmable down-counter. This lets the host control transmit data rate with about as much control as you'd typically find in a UART. I'm still trying to figure out overall architecture of the TX pipeline.

    Miscellaneous

    I should note that the RX pipeline, having only an 8-deep queue, consumes around 300 logic cells in the iCE40 parts. I'm guessing that the TX pipeline will take up about as much space, but I won't know until it's done. I have no estimate for the Wishbone bus interface yet. This already means I cannot implement a lot of independent channels, so I'll probably restrict myself to just 3 or 4. It could be as few as 2.

    (EDIT: Through a conversation I had on IRC shortly after posting this article, I was referred to this paper which suggests a reasonable implementation size for a complete SpaceWire interface comes to around 460 LUTs. I think it's reasonable, then, to speculate my implementation will weigh in around 600 LUTs, accounting for my relative lack of experience with FPGA design engineering. Further, the same paper suggests a maximum RX throughput of RxClk*2/3, rather than RxClk/3. Exciting!)

    Pragmatically, it's not as bad as it sounds; yes, it cramps my style, but we must remember that IEEE-1355 is designed to be a packet-switched protocol. This means all packets have a (possibly source-routed) destination address field as the first n bytes of a frame. Thus, we can still support a large number of expansions by making use of switches. I was hoping to avoid having to do things this way, especially at first, but having a smaller number of channels than planned is not a deal-breaker for me. Even one channel is, while inconvenient, still viable.

  • Remex I/O Channels

    Samuel A. Falvo II • 02/05/2017 at 08:02 • 9 comments

    IBM mainframes have some pretty nice names for their channel architectures. The original, of course, is simply known as "channels." But, when they needed higher performance, IBM released something called ESCON. Later, when that wasn't enough, they released a fiber-optic and substantially faster version called FICON.

    As you might guess, I'm not particularly interested in being sued by IBM for infringing on their trademarks, so KESCON or some similar portmanteau or initialism is simply out of the question. Thankfully, it's not a big problem to come up with a decent name of my own: Remex channels.

    I selected the name remex because a remex is one of the flight feathers of a bird; in a way, the remiges are among the "primary interfaces" between a bird and its environment.

    Kestrel-3's I/O channels are based on 1x6 Pmod connectors, 3.3V logic, IEEE-1355 DS-SE-02 signalling, and a modified Spacewire-like protocol for communications between the computer and peripherals. The result is not compatible with Spacewire or even stock IEEE-1355, due to my insistence on supporting bit-banged peripherals on Arduino-class microcontrollers, which, depending upon how they're programmed, can operate at best in the kilobits-per-second range. However, if the device relies on an FPGA or a GA144-type microcontroller, performance can easily reach many tens of megabits per second.

    As I type this, I have completed a preliminary data-strobe decoder and character decoder for the receive-pipeline, which is arguably the most performance critical part of a Remex link. (See Github repo.) Right now, icetime reports that the top clock rate for the receiver is 157 MHz, which means you could theoretically feed it a 51 Mbps input data rate. (Unlike IEEE-1355 links made professionally, I'm not using self-clocked receiver logic due to the innate difficulty with getting such a thing working on a single development tool-chain, much less across a plurality of different FPGA development systems!) The icoBoard Gamma has a 100MHz oscillator standard, so I expect to drive it at 100MHz to achieve a top throughput of 33 Mbps. That's not a fantastically high data rate (a smidge over 2.5 MB/s peak data rate; real-world performance remains to be measured); but, for an amateur production like mine, it should be plenty powerful enough for a long time to come.

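    For illustration, here's a minimal sketch of the data-strobe decoding trick at the heart of such a receiver (my own, not the Remex source). In DS signalling exactly one of Data and Strobe changes per bit, so Data XOR Strobe toggles once per bit; the sketch samples with a local clock a few times faster than the bit rate, matching the non-self-clocked approach described above:

        module ds_decode (
            input  wire sysclk,        // local oversampling clock (not self-clocked)
            input  wire d,             // Data line
            input  wire s,             // Strobe line
            output reg  bit_valid = 0,
            output reg  bit_out = 0
        );
            wire recovered = d ^ s;    // toggles once per transmitted bit
            reg  last = 0;
            always @(posedge sysclk) begin
                bit_valid <= (recovered != last);  // edge on D^S marks a new bit
                bit_out   <= d;                    // the bit's value is Data itself
                last      <= recovered;
            end
        endmodule
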
    Besides, if we really need 200Mbps throughput, someone can release an FPGA-/toolchain-optimized revision to the core which enables the receiver to be truly self-clocked. One thing is for sure: 2.5 MB/s isn't fast enough to support even monochrome 640x480 bitmapped displays at 60fps. However, it is capable of 30fps (needs only 1.6 MB/s), so basic animations should still be doable.

    I'm still playing around with the circuit details as I develop it, since this is the very first time I've ever made any IEEE-1355-compliant link. It's also why I'm not writing any unit tests at this time; things are prone to change quite drastically as I learn more about the requirements of the circuit. For now, all the test-benches just generate waveforms for viewing in gtk-wave or similar tool.


Discussions

JL9791 wrote 11/27/2016 at 01:20

I see you are still working with Forth :)  I came upon this by accident when researching stack CPUs: http://www.strangegizmo.com/forth/ColorForth/msg01746.html
I would like to learn Forth someday; I like the simplicity of stacks (which reminds me of my Magic the Gathering days).


Samuel A. Falvo II wrote 11/27/2016 at 01:32

Not having to name every intermediate computation is quite liberating.  But if taken to an extreme, it can also be quite confusing.  :)  The solution is to learn to hyper-factor your code.  A single function in C could well take 16 word definitions in Forth.  Naming procedures is a nice trade-off, because it almost serves to document why your code is the way it is.  Not quite, but good enough for most purposes.  :)  Plus, it really aids in testing code to make sure things work as you expect them to.


JL9791 wrote 11/09/2016 at 01:09

I have been following your project for a while, particularly because you selected the RISC-V ISA to build your CPU around.  I recently came across something I had forgotten about:  the now open source Hitachi CPUs (Sega Genesis, Saturn, Dreamcast) found here http://0pf.org/j-core.html

http://j-core.org/

Did you consider those as the brain of your Kestrel?  If not, perhaps they may be a good alternative. :)


Samuel A. Falvo II wrote 11/09/2016 at 01:16

Nope, and I have no intentions to either.  I've invested too much into RISC-V to change now.  Switching ISAs today would literally set me back two years of effort.  Besides, the performance of RISC-V CPUs is quite good in general; that my own CPU is as slow as a 68000 should not be taken as an indication that all such CPUs are that way.

In the future, I'd like to one day hack a BOOM processor into the Kestrel, which would give it a 4-way superscalar CPU.  But, for now, I just want something simple enough that people can understand.

Another reason for adopting RISC-V is that it has learned many things from both the successes and the failures of past architectures.

Thanks for the link though.  You're not the first to suggest it.  :)


JL9791 wrote 11/09/2016 at 01:18

Sure thing.  Yeah, I was not suggesting you scrap all your hard work, just curious.  Glad you are coming along pretty well with it now after the..uh..hiccups :)

