
BoxLambda

A retro-style FPGA-based microcomputer. The microcomputer serves as a platform for software and RTL experimentation.

BoxLambda is an open-source project with the goal of creating a retro-style FPGA-based microcomputer. The microcomputer serves as a platform for software and RTL experimentation.

BoxLambda is a software-hardware cross-over project. The plan is to provide room for experimentation both on the FPGA RTL side and on the software side.

Key Goals

  • Create a sandbox for experimenting with software and (FPGA) HW.

    • Simplicity: It should be easy to jump in and do something: create, hack, tinker.
      • It should be doable for a single person to develop a good understanding of the entire system, software and hardware.
      • Deterministic Behavior: By design, it should be clear how long an operation, be it an instruction or a DMA transfer, is going to take.
      • Single User/Single Tasking OS booting to a console shell.
    • Create a Modular Architecture allowing for a mix-and-match of software and hardware components.
      • Support for partial FPGA reconfiguration.
  • Target Hardware is Digilent's Arty-A7 and/or the Nexys-A7.

  • The computer should support the following peripherals:

    • Keyboard
    • Mouse (optional)
    • Joystick (optional)
    • Serial port
    • SD card storage
    • VGA Display
    • Audio output
  • Sound and graphics should be sufficient to support retro-style 2D gameplay.

I'm keeping a project Blog and documentation here.

  • First Contact: Hello World!

    Epsilon, 06/19/2022 at 12:21

    After the IRQ post, I started looking for the shortest path to get something simple to work. The idea is to bring up something small, an embryonic version of the project. I then iteratively grow this small system until it meets the goals. After each iteration, the project should be functioning somewhat better than it was before.

    (Figure: Iterative Design Spiral)

    Halfway through the first iteration, I realized I needed to figure out my git workflow, or I wouldn’t be able to commit and push my work. Hence, the previous post.

    The Tiny System

    Now, back to taking that first step: I want to bring up the RISCV processor and run a test program on it that can print to the serial port. In other words, I want to run a ‘Hello World!’ program on my Arty A7-35T. Doing so will give us access to print-style debugging, which is sure to come in handy down the road.

    To get to ‘Hello World’, I need to put together a tiny system consisting of the following cores:

    • Ibex RISCV processor (to run the SW).
    • Internal memory (to hold the SW).
    • wbuart32 (serial port console).
    • A Wishbone interconnect to connect the processor to memory and the UART core.

    The Ibex repository includes an example system, called Simple System, that's similar to the initial system I have in mind, but it does not include a Wishbone interconnect. It shouldn't be too hard to add a Wishbone interface to Ibex myself, but first I should take a look around to see if a Wishbone-for-Ibex solution already exists. Lo and behold, it does:

    https://github.com/batuhanates/ibex_wb

    The ibex_wb SoC Cores

    The ibex_wb SoC includes the following cores:

    • ibex: The RISCV CPU core. The ibex_wb project was pointing to a 3-year-old version. I modified it to use the BoxLambda ibex fork.
    • wbuart32: UART core. The ibex_wb project was pointing to a 3-year-old version. I modified it to use the BoxLambda wbuart32 fork.
    • riscv_dbg: JTAG debug interface. This is a pretty complex core. I ifdef’d it out for the time being. To be revisited.
    • wb_gpio: GPIO core, for sampling buttons and switches and driving LEDs.
    • wb_timer: A timer core, so we can do things like usleep() from software.
    • spramx32: Single Port RAM. To be replaced at some point by a Dual-Port RAM.
    • core2wb/core_if/wb_if/slave2wb: Ibex to Wishbone interfacing logic.

    The ibex_wb/soc/fpga/ directory has an SoC build for Cyclone V, the Arty A7-100T, and the Nexys4-DDR. I added an arty-a7-35/ subdirectory, using the Nexys4-DDR SoC code as a starting point.

    This ibex_wb SoC is pretty much a perfect match for the initial system I had in mind. How convenient!

    The ibex_wb SoC Software

    The software is located in the ibex_wb/soc/fpga/arty-a7-35/sw/ directory:

    • libs/soc/ contains drivers for the cores
    • examples/ contains example programs. I tested the hello and the blinky programs.

    ibex_wb/soc/fpga/arty-a7-35/sw/examples/hello/ contains a simple Makefile to build the software and generate a hello.mem file. hello.mem holds the initial contents of the internal memory of the SoC. The file’s contents are included in the FPGA bitstream.
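    The exact format the ibex_wb Makefile emits isn't shown in this post. As a sketch of what a .mem file holds, assuming a $readmemh-style text file with one 32-bit little-endian word per line (an assumption, not necessarily what the Makefile produces), a hypothetical helper bin_to_mem could format a raw program image like this:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: render a raw program image as $readmemh-style text,
 * one 32-bit little-endian word per line as 8 hex digits. Returns the number
 * of characters written, or 0 if 'out' is too small. (Assumed format; the
 * ibex_wb build may produce its .mem file differently.) */
size_t bin_to_mem(const uint8_t *img, size_t len, char *out, size_t out_size)
{
    size_t pos = 0;
    if (out_size > 0)
        out[0] = '\0';
    for (size_t i = 0; i < len; i += 4) {
        uint32_t word = 0;
        /* Assemble the word little-endian; a trailing partial word is zero-padded. */
        for (size_t b = 0; b < 4 && i + b < len; b++)
            word |= (uint32_t)img[i + b] << (8 * b);
        /* Each line takes 9 chars ("xxxxxxxx\n") plus room for the final NUL. */
        if (pos + 10 > out_size)
            return 0;
        pos += (size_t)snprintf(out + pos, out_size - pos, "%08x\n", (unsigned)word);
    }
    return pos;
}
```

    Each line then corresponds to one 32-bit location of the internal memory, starting at address 0.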

    The same directory also contains a linker script, link.ld, that specifies how much memory is available, and where all the code, data, and stack should go:

    OUTPUT_ARCH(riscv)
    ...
    MEMORY
    {
    	ram         : ORIGIN = 0x00000000, LENGTH = 64K
    }
    
    _min_stack      = 16K;   /* minimum stack space to reserve */
    _stack_start	= ORIGIN(ram) + LENGTH(ram) - 4;
    ...
    SECTIONS
    {
    	.vectors : ...
    
    	.text : { ...
    	
    	.data : { ...
    
    	.bss : { ...
    
    	.stack (NOLOAD): {
    		. = ALIGN(4);
    		. = . + _min_stack ;
    		. = ALIGN(4);
    		stack = . ;
    		_stack = . ;
    	} > ram    ...
    }
    

    ORIGIN should be set to match the CPU’s boot vector. On the FPGA side, the boot vector is specified during CPU core instantiation. I currently have it set to 0 in ibex_soc.sv:

    wb_ibex_core wb_ibex_core (
    .instr_wb     (wbm[COREI_M]),
    .data_wb      (wbm[CORED_M]),
    .test_en      (1'b0),
    .hart_id      (32'h0),
    .boot_addr    (32'h0),
    ...
    
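    The numbers in the link.ld excerpt and the boot_addr setting can be cross-checked with a bit of arithmetic. A minimal sketch, with the values copied by hand into hypothetical macros: the stack starts at the top word of the 64K RAM (0xFFFC), and reserving the 16K minimum stack leaves 48K for code and data. The _Static_assert encodes the rule that ORIGIN must match the boot vector.

```c
#include <stdint.h>

/* Values copied by hand from the link.ld excerpt and ibex_soc.sv above
 * (macro names are hypothetical). */
#define RAM_ORIGIN  0x00000000u
#define RAM_LENGTH  (64u * 1024u)
#define MIN_STACK   (16u * 1024u)
#define BOOT_ADDR   0x00000000u   /* .boot_addr of wb_ibex_core */

/* _stack_start = ORIGIN(ram) + LENGTH(ram) - 4: the topmost word of RAM. */
static const uint32_t stack_start = RAM_ORIGIN + RAM_LENGTH - 4u;

/* Room left for .vectors/.text/.data/.bss once the minimum stack is reserved. */
static const uint32_t max_image_size = RAM_LENGTH - MIN_STACK;

/* If ORIGIN and the boot vector disagree, the .vectors section placed at
 * ORIGIN will not be where the CPU looks for it after reset. */
_Static_assert(RAM_ORIGIN == BOOT_ADDR, "link.ld ORIGIN must match boot_addr");
```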

    Synthesizing the SoC

    The original ibex_wb repository appears...


  • Git Workflow and Setup

    Epsilon, 06/12/2022 at 09:07

    Git Workflow

    BoxLambda is a GitHub project that depends on a bunch of other GitHub projects. How do I pull it all together under one umbrella? I don't just want to copy somebody else's code and submit it into my repository. That would be impolite and I would lose all tracking with the original project. I want to be able to track the original project, make my own changes, and integrate the project into the BoxLambda repository.

    Git submodules are a great solution for this situation. Submodules allow you to keep a git repository in a subdirectory of another git repository. When you're making changes inside the submodule subdirectory, those get committed to that submodule's repository. The parent (supermodule?) repository, on the other hand, just tracks submodule commits. From git's point of view, the submodule subdirectory is not a subdirectory; it's a file with a reference to a git repository and a specific commit within that repository.

    OK, I guess that sounds pretty confusing. Here’s a much better explanation:

    https://git-scm.com/book/en/v2/Git-Tools-Submodules

    Directories and branches

    I’ll be using the following directory layout in the BoxLambda repository:

    boxlambda/doc
    boxlambda/fpga/ibex (ibex fork git submodule)
    boxlambda/fpga/wbuart32 (wbuart32 fork git submodule)
    boxlambda/fpga/<other FPGA git submodules>
    boxlambda/fpga/<BoxLambda specific FPGA files that don't fit in any of the submodules>
    boxlambda/sw/<SW fork git submodules>
    boxlambda/sw/<BoxLambda SW files that don't fit in any of the submodules>

    Each of the git submodules is a fork of a GitHub project discussed in earlier posts. For example, boxlambda/fpga/ibex/ contains my ibex fork, not the original ibex repository.

    In each of the forked submodules, two branches are relevant:

    • master: I’m keeping the master branch in sync with the master branch of the repository I forked from. Having this branch makes it easy to pull in updates as well as to submit the occasional pull request to the original project.
    • boxlambda: On this branch, I’ll be making changes for BoxLambda.

    In the BoxLambda repository itself, I have the following long-running branches:

    • master: I will submit releases to this branch. The master branch should always be in good shape.
    • develop: This is where the work is happening. Things will be in flux here. This branch will not always be in good shape.
    • gh-pages: This branch holds the BoxLambda Blog files. GitHub Pages are by default on the gh-pages branch of a GitHub project.
    • boxlambda-gh-pages-wip: This branch holds work-in-progress Blog updates. This branch also contains some config file modifications specifically for local previewing, which is why this is a long-running branch, rather than a topic branch. When updates are ready for release, I merge them to gh-pages.

    I already pushed this structure to GitHub. Feel free to take a look around:

    https://github.com/epsilon537/boxlambda

    GitHub does a great job displaying submodule subdirectories:

    https://github.com/epsilon537/boxlambda/tree/develop/fpga

    My Setup

    I’m working on Ubuntu WSL on Windows 11. It would be better to work on a native Linux box, but I need to be on Windows for other work, so WSL it is.

    WSL is working well for me. My C: drive shows up as /mnt/c under Linux, so sharing files between Linux and Windows is easy. The clipboard also works seamlessly between Windows and Linux and the Linux apps run right inside the Windows desktop.

    Xilinx’s Vivado installation was straightforward. As a test, I built Ibex’s Arty A7 example using the README instructions. Synthesis, implementation, and bitstream generation went just fine.

    However, when I tried to program the bitstream on my Arty A7 board, connected via USB, I noticed that Vivado wasn’t detecting the board. Ugh. WSL is not perfect after all.

    As a workaround, I installed the Vivado Lab edition on the Windows side. Unlike a regular Vivado installation, the Lab edition is very small. It’s intended for lab machines physically...


  • Interrupts, and estimated FPGA Resource Utilization.

    Epsilon, 05/29/2022 at 09:06

    Our CPU supports the following interrupts (taken from https://ibex-core.readthedocs.io/en/latest/03_reference/exception_interrupts.html):

    Ibex Interrupts:

    Interrupt Input Signal   ID      Description
    irq_nm_i                 31      Non-maskable interrupt (NMI)
    irq_fast_i[14:0]         30:16   15 fast, local interrupts
    irq_external_i           11      Connected to platform-level interrupt controller
    irq_timer_i              7       Connected to timer module
    irq_software_i           3       Connected to memory-mapped (inter-processor) interrupt register

    The Timer

    The RISC-V spec includes a timer specification: RISC-V Machine Timer Registers (see RISC-V Privileged Specification, version 1.11, Section 3.1.10). The Ibex GitHub repository contains a compliant implementation as part of the Simple System example:

    https://github.com/epsilon537/ibex/tree/master/examples/simple_system

    We’ll be using this timer module implementation, so we don’t need a separate PIT module.

    The Timer module flags interrupts via signal irq_timer_i. The CPU sees this as IRQ ID 7.
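    A sketch of how software might drive the machine timer, assuming a hypothetical struct layout for the memory-mapped mtime/mtimecmp registers (the Simple System timer RTL has the real base address and offsets). On a 32-bit hart the 64-bit registers are accessed as two halves, so the read loop and the write order below follow the sequences recommended in the RISC-V Privileged Specification for avoiding torn reads and spurious interrupts:

```c
#include <stdint.h>

/* Hypothetical register layout: the 64-bit mtime and mtimecmp registers
 * seen as 32-bit halves. Base address and word order are assumptions. */
typedef struct {
    volatile uint32_t mtime_lo;
    volatile uint32_t mtime_hi;
    volatile uint32_t mtimecmp_lo;
    volatile uint32_t mtimecmp_hi;
} rv_timer_regs;

/* Read the 64-bit mtime: retry if the high word changed while the low
 * word was being read (i.e. the low word rolled over in between). */
uint64_t timer_read(rv_timer_regs *t)
{
    uint32_t hi, lo, hi2;
    do {
        hi  = t->mtime_hi;
        lo  = t->mtime_lo;
        hi2 = t->mtime_hi;
    } while (hi != hi2);
    return ((uint64_t)hi << 32) | lo;
}

/* Program mtimecmp: park the low word at its maximum first, so no
 * intermediate value is smaller than both the old and new compare value. */
void timer_set_cmp(rv_timer_regs *t, uint64_t cmp)
{
    t->mtimecmp_lo = UINT32_MAX;
    t->mtimecmp_hi = (uint32_t)(cmp >> 32);
    t->mtimecmp_lo = (uint32_t)cmp;
}

/* The timer interrupt (irq_timer_i, ID 7) is pending while mtime >= mtimecmp. */
int timer_irq_pending(rv_timer_regs *t)
{
    uint64_t cmp = ((uint64_t)t->mtimecmp_hi << 32) | t->mtimecmp_lo;
    return timer_read(t) >= cmp;
}
```

    A usleep()-style delay would then add the desired number of ticks to timer_read() and pass the result to timer_set_cmp().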

    The Fast Local Interrupts

    We can freely assign 15 local interrupts. I’ve got the following list:

    • 1 interrupt line per Reconfigurable Module (RM), so 3 in total. The default RMs are VERA and a Dual JT49. VERA uses one interrupt line, JT49 uses none.
    • 1 interrupt line each for:
      • wbuart
      • sdspi
      • wbi2c
      • ps2_mouse
      • ps2_keyboard
      • Praxos DMA
      • Quad SPI
      • ICAP
      • DFX Controller
      • GPIO.

    That's 10 interrupts, or 13 when the 3 RM lines are included.

    The interrupts are serviced in order of priority, the highest number being the highest priority.

    I have ordered the Fast Local interrupts as follows:

    Fast Local Interrupt Assignments:

    Interrupt Input Signal   ID   Description
    irq_fast_i[14]           30   RM_2 interrupt (Default: not assigned)
    irq_fast_i[13]           29   RM_1 interrupt (Default: VERA IRQ)
    irq_fast_i[12]           28   RM_0 interrupt (Default: not assigned)
    irq_fast_i[11]           27   Praxos DMAC IRQ
    irq_fast_i[10]           26   sdspi IRQ
    irq_fast_i[9]            25   wbuart IRQ
    irq_fast_i[8]            24   ps2_keyboard IRQ
    irq_fast_i[7]            23   ps2_mouse IRQ
    irq_fast_i[6]            22   wbi2c IRQ
    irq_fast_i[5]            21   GPIO IRQ
    irq_fast_i[4]            20   Quad SPI IRQ
    irq_fast_i[3]            19   DFX Controller IRQ
    irq_fast_i[2]            18   ICAP IRQ
    irq_fast_i[1]            17   not assigned
    irq_fast_i[0]            16   not assigned
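    The assignment table maps directly onto constants that software can use. A sketch with hypothetical names; the only hardware fact it relies on is that Ibex reports fast interrupt irq_fast_i[n] at bit 16 + n of the mip/mie CSRs. The helper picks the highest-numbered pending fast interrupt, matching the priority order described above. It illustrates the ordering only and does not replace Ibex's own interrupt logic:

```c
#include <stdint.h>

/* Fast local interrupt IDs from the assignment table (names are hypothetical).
 * In Ibex, irq_fast_i[n] shows up at bit 16 + n of mip/mie. */
enum boxlambda_irq_id {
    IRQ_ID_ICAP      = 18,
    IRQ_ID_DFX       = 19,
    IRQ_ID_QUAD_SPI  = 20,
    IRQ_ID_GPIO      = 21,
    IRQ_ID_WBI2C     = 22,
    IRQ_ID_PS2_MOUSE = 23,
    IRQ_ID_PS2_KBD   = 24,
    IRQ_ID_WBUART    = 25,
    IRQ_ID_SDSPI     = 26,
    IRQ_ID_DMAC      = 27,
    IRQ_ID_RM_0      = 28,
    IRQ_ID_RM_1      = 29, /* default: VERA */
    IRQ_ID_RM_2      = 30,
};

#define FAST_IRQ_MASK 0x7FFF0000u /* bits 30:16 */

/* Given a mip-style pending word, return the highest-numbered pending
 * fast interrupt ID, or -1 if no fast interrupt is pending. */
int highest_pending_fast_irq(uint32_t pending)
{
    pending &= FAST_IRQ_MASK;
    for (int id = 30; id >= 16; id--)
        if (pending & (1u << id))
            return id;
    return -1;
}
```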

    The Platform Level Interrupt Controller.

    One interrupt line is reserved to connect an external interrupt controller. I don’t have any use for it right now, however, so I’m going to leave this unassigned for the time being.

    Since we currently don’t have a use for the Programmable Interrupt Controller, I’ll remove it from the Architecture Diagram.

    Will It Fit? Estimated FPGA Resource Utilization.

    I could keep adding modules and dream up architectures all day long, but some kind of reality check is long overdue. I'm going to create a fork of all modules identified so far and run them through synthesis, as-is, just to get a sense of the resource utilization on the Arty A7-35T and the Nexys A7-100T. We won't get more than ballpark figures out of this, but that's all we need right now.

    Synthesis

    Synthesis is handled by Vivado, Xilinx’s FPGA Design Suite. Vivado is free to download: https://www.xilinx.com/products/design-tools/vivado/vivado-ml.html.

    The synthesis tool turns a module's Verilog/SystemVerilog/VHDL source code into a netlist of gates. In the process of doing so, the tool also generates a utilization report, relative to the available resources of the target FPGA. It's this utilization report we're after right now, not the generated netlist.

    Here’s an example utilization report, generated during the synthesis of the MIG core:

    https://github.com/epsilon537/boxlambda/blob/main/doc/mig_7series_0_utilization_synth.rpt

    For most of the cores, synthesis was just a matter of pointing Vivado to the core’s source tree and hitting the Run Synthesis button. There were a few exceptions:

    • VERA did not...

  • BoxLambda Architecture, First Draft.

    Epsilon, 05/22/2022 at 09:44

    In this post, we organize the key components from the previous posts into an architecture diagram. Along the way, we identify a few new components.

    None of what’s shown here is set in stone. The diagrams below contain some speculative pieces and there are quite a few loose ends to tie up as we get further into the project.

    The Nexys Configuration

    (Figure: BoxLambda Draft Architecture Block Diagram for Nexys A7-100T.)

    This is a draft architecture diagram showing the Nexys A7-100T configuration. Further down, I’ll show the Arty A7-35T configuration.

    Internal RAM

    The system is configured with 256KB of Dual-Port RAM (DPRAM) and 128KB of Video RAM (inside the VERA module). The A7-100T has 607KB of Block RAM in total, so more than enough Block RAM should be left over for other purposes, e.g. for the Black Box Module (see below).
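    To sanity-check the Block RAM budget, it helps to count in BRAM36 tiles rather than kilobytes. A minimal sketch, assuming 32-bit-wide memories that leave the parity bits unused, so each 36 Kb tile stores 32 Kb of data. The XC7A100T has 135 such tiles; the 607KB figure includes the parity bits, so its byte-addressable capacity is closer to 540KB. The 256KB DPRAM needs 64 tiles and VERA's 128KB needs 32, leaving 39 tiles for everything else:

```c
#include <stdint.h>

/* A 7-series BRAM36 tile holds 36 Kb, of which 32 Kb are data bits when
 * the parity bits go unused (assumed here for 32-bit-wide memories). */
#define BRAM36_DATA_BITS (32u * 1024u)
#define XC7A100T_BRAM36  135u  /* 135 x 36 Kb = 4860 Kb, about 607 KB */

/* BRAM36 tiles needed for a memory of 'bytes' bytes, rounded up.
 * Ignores waste from awkward depth/width combinations. */
uint32_t bram36_tiles(uint32_t bytes)
{
    uint32_t bits = bytes * 8u;
    return (bits + BRAM36_DATA_BITS - 1u) / BRAM36_DATA_BITS;
}
```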

    The CPU has memory-mapped access to DPRAM. As long as no other Bus Masters are competing for access to the same bus, instructions executing from DPRAM will have a fixed cycle count.

    DMA Bus and Processor Bus

    The DPRAM is hooked up to two system buses: a DMA bus and a Processor bus. Bus masters (currently only CPU and DMAC) have access to both buses as well, but the intent is that the DMA Controller uses the DMA bus for MEMC<->DPRAM transfers and the CPU uses the processor bus for DPRAM access. This intent is not hardwired into the system, however. The DMA Controller can set up transfers over the processor bus, and the processor can access external memory over the DMA bus. The two system buses are there to give bus masters some flexibility to stay out of each other’s way.

    Note that, besides access to external and internal memory, the DMA Controller also has access to VERA, the sound cores, and the SD SPI module via the DMA bus.

    Both the Processor Bus and the DMA bus are 32-bit pipelined mode Wishbone buses.

    The Interconnect

    A bus on a block diagram is just a line connecting blocks. In reality, the Interconnect consists of Cross Bars, Arbiters, Address Decoders, and Bridges. I will follow up with an architecture diagram showing the BoxLambda Interconnect details.
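    Address decoding is the simplest of these pieces to illustrate. A minimal sketch of a mask-and-compare decoder; the memory map is entirely hypothetical, since the BoxLambda map hasn't been defined yet. Each slave claims the addresses where (addr & mask) == base, and an address no slave claims would typically be answered with a Wishbone bus error:

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per Wishbone slave: the slave is selected when
 * (addr & mask) == base. The map below is hypothetical. */
typedef struct {
    const char *name;
    uint32_t    base;
    uint32_t    mask;
} slave_region;

static const slave_region memory_map[] = {
    { "dpram", 0x00000000u, 0xFFFC0000u }, /* 256 KB */
    { "vera",  0x10000000u, 0xFFF00000u }, /* 1 MB window */
    { "uart",  0x10100000u, 0xFFFFF000u }, /* 4 KB window */
    { "ddr",   0x20000000u, 0xF0000000u }, /* 256 MB window */
};

/* Return the index of the slave that decodes 'addr', or -1 if none does. */
int decode(uint32_t addr)
{
    for (size_t i = 0; i < sizeof memory_map / sizeof memory_map[0]; i++)
        if ((addr & memory_map[i].mask) == memory_map[i].base)
            return (int)i;
    return -1;
}
```

    In hardware the comparisons run in parallel rather than in a loop, but the mask/base logic is the same.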

    To build the Interconnect, I will make use of the components contributed by the gentlemen below:

    CPU Configuration

    The Ibex CPU configuration is shown as RV32IC, the I and the C indicating Integer and Compressed instruction set, respectively. I would like to include the extensions for integer multiplication and division (M) and bit manipulations (B) into the build as well. Those extensions are going to take up a considerable amount of space, however, and will also have an impact on timing closure. I’m going to defer the decision on those extensions until we have more insight into this project’s FPGA utilization and timing.

    Note that there’s no Instruction or Data Cache. Code executes directly from DPRAM or DDR memory. Data access also goes straight to DPRAM or DDR memory.

    The Black Box, and other Reconfigurable Partitions

    The Black Box Partition is an empty area in the FPGA’s floorplan. This is where you can insert your application-specific logic. Do you need hardware-assisted collision detection for your Bullet-Hell Shoot’em Up game? Put it in the Black Box. A DSP? A CORDIC core? More RAM? As long as it fits the floor plan, you can put it in the Black Box region. The Black Box has bus master and slave ports on both system buses.

    Notice that the Black Box sits inside RP_0, Reconfigurable Partition 0. A Reconfigurable Partition is a region on the FPGA where you can dynamically load a Reconfigurable Module (RM) into. Going back to the previous examples, the collision detector, DSP, CORDIC core, or RAM module, would be...


  • Key Components Part 3: DMA and Peripherals.

    Epsilon, 05/11/2022 at 14:43

    Let’s wrap up the selection of key components for the BoxLambda computer.

    DMA

    I was on the fence for a while, deciding whether or not I should include a DMA engine in our machine. In a previous post, I said I would use DMA to move data between external and internal memory. However, a DMA Controller is by definition a bus master, and having multiple bus masters (DMAC and CPU) adds significant complexity to the architecture: access to shared buses and slaves, impact on timing, etc. In a system with only one bus master, the CPU, you don’t have to worry about any of that.

    Then I snapped out of it and remembered that BoxLambda is intended to be a platform for RTL experimentation. It would be silly to restrict these RTL experiments to bus slave components only. In other words, the BoxLambda architecture is going to have to accommodate bus masters, so we might as well include a DMA Controller.

    Some use cases for DMA in the scope of our computer include:

    • Moving data between external (DDR) and internal (Block RAM) memory.
    • Streaming from memory to the audio DAC.
    • Blitting, i.e. copying data into video memory, taking into account the video memory’s organization. For instance, copying a rectangular block of data into a frame buffer requires striding between rows of pixel data. Another example: Bit planes with 1, 2, or 4 bits-per-pixel color depths require barrel shifting when copying data to a specific pixel offset.
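    The striding case can be sketched as a plain CPU loop; a DMAC would generate the same address sequence in hardware. A minimal sketch for 8 bits-per-pixel data, with a hypothetical function name and parameters; the bit-plane/barrel-shift case is left out:

```c
#include <stddef.h>
#include <stdint.h>

/* Copy a w x h rectangle of 8-bit pixels. Consecutive rows are 'src_stride'
 * bytes apart in the source and 'dst_stride' bytes apart in the frame
 * buffer, so the copy skips the rest of each destination row. */
void blit8(uint8_t *dst, size_t dst_stride,
           const uint8_t *src, size_t src_stride,
           size_t w, size_t h)
{
    for (size_t y = 0; y < h; y++)
        for (size_t x = 0; x < w; x++)
            dst[y * dst_stride + x] = src[y * src_stride + x];
}
```

    For example, copying a 16x16 sprite into a 640-pixel-wide frame buffer at (x, y) would be blit8(fb + y * 640 + x, 640, sprite, 16, 16, 16).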

    I spent many hours online searching for DMA Controllers. I was a bit surprised that there were so few options, so I kept digging. I found ZipCPU's, FreeCores', and Ant Micro's DMA controllers. The Ant Micro DMAC seemed to be the most interesting option, with two Wishbone ports, pipelined mode, striding support, and support for any byte boundary alignment.

    I had this post, with the Ant Micro selection, ready to go. But then I happened across an old post on Reddit where somebody proposed a ‘smart’ DMA concept: a DMAC with a tiny CPU embedded in it. That sounded like a great concept, so I pinged the author to check what became of his idea. In response, the author generously decided to release his code on GitHub! The core is called Praxos. Here is the repository:

    https://github.com/esherriff/Praxos

    Praxos has a tiny CPU with a small amount of program and data memory embedded in the core, allowing you to write microcode specifying the DMA behavior you want: word/non-word alignment, incrementing/decrementing/non-incrementing source and/or destination address, strides between transfers, combining sources, barrel shifting… Maximum flexibility!

    It’s not perfect though. Praxos only has one bus master port, an Avalon port at that. It should be doable to slap a standard Wishbone port onto it, but in its current form, I think it won’t be able to take advantage of Wishbone’s pipelined burst mode. That’s unfortunate for a DMAC.

    Still, having the option to hack together my own application-specific DMA microcode sounds like a lot of fun. I just have to go with the Praxos option.

    Many thanks to esherriff for making his code available!

    Storage

    I’m going to use ZipCPU’s SD Card Controller in combination with the FatFs software library to mount a FAT filesystem on the SD card:

    The SD Card Controller has a Wishbone slave port.

    Keyboard and Mouse

    FreeCores has PS/2 keyboard and mouse modules: https://github.com/freecores/ps2

    These cores don’t have a Wishbone slave port, so we’re going to have to add that ourselves.

    Note that the Nexys A7 has a USB HID host interface for keyboard and mouse which, with the help of clever firmware on a PIC24 microcontroller, presents itself to the FPGA as a PS/2 interface. See the Nexys A7 Reference Manual for more details.

    I2C

    The I2C interface can be used to hook up a Real-Time Clock PMOD as well as a Wii Nunchuk Adapter.

    ZipCPU has an I2C core with a Wishbone port: https://github.com/ZipCPU/wbi2c...


  • Key Components Part 2: Graphics and Sound Cores.

    Epsilon, 05/01/2022 at 11:55

    I spent some time researching graphics and sound options for BoxLambda. Here’s what I came up with.

    Graphics

    If you’re reading this, you must be into the build-your-own-computer thing, which probably means you’re aware of the super cool Commander X16 project. Frank van de Hoef created the very elegant VERA (Video Embedded Retro Adapter) module for the X16. Here’s a high-level specification, taken from the Commander X16 website:

    VERA module specifications:

    • Video generator featuring:
      • Multiple output formats (VGA, NTSC Composite, NTSC S-Video, RGB video) at a fixed resolution of 640x480@60Hz
      • Support for 2 layers, both supporting:
        • 1/2/4/8 bpp tile and bitmap modes
        • Support for up to 128 sprites (with inter-sprite collision detection).
      • Embedded video RAM of 128 KB.
      • Palette with 256 colors selected from a total range of 4096 colors.
    • 16-channel stereo Programmable Sound Generator with multiple waveforms (Pulse, Sawtooth, Triangle, Noise)
    • High-quality PCM audio playback from a 4 KB FIFO buffer featuring up to 48kHz 16-bit stereo sound.
    • SecureDigital storage.
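    The PCM figures translate into a concrete refill budget, which matters for the memory-to-audio DMA use case. A worked sketch, assuming the full 4 KB FIFO is usable: at 48kHz, 16-bit stereo, playback consumes 192,000 bytes per second, so a full FIFO lasts about 21.3 ms.

```c
#include <stdint.h>

/* VERA PCM playback at its maximum quality: 48 kHz, 16-bit, stereo,
 * from a 4 KB FIFO (assumed fully usable). */
#define PCM_RATE_HZ      48000u
#define PCM_BYTES_FRAME  (2u * 2u)   /* 2 channels x 2 bytes */
#define PCM_FIFO_BYTES   4096u

/* Bytes per second that must be fed into the FIFO. */
static const uint32_t pcm_bytes_per_sec = PCM_RATE_HZ * PCM_BYTES_FRAME;

/* How long a full FIFO lasts, in microseconds (rounded down). */
static const uint32_t pcm_fifo_us =
    (uint32_t)((uint64_t)PCM_FIFO_BYTES * 1000000u / (PCM_RATE_HZ * PCM_BYTES_FRAME));
```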

    Other features, not mentioned in the blurb, include:

    • Fractional display scaling (scaling lower resolutions up to the 640x480 display resolution).
    • Horizontal and Vertical smooth scrolling

    Lucky for us, Frank recently released the VERA Verilog code under the generous MIT license. You can find the code here:

    https://github.com/fvdhoef/vera-module

    I’m not particularly interested in VERA’s PSG (Programmable Sound Generator), or the non-VGA output formats, so I might remove those from the build.

    The 128KB of video RAM will take a big chunk out of our available Block RAM resources, but it’ll be worth it. We’re getting a lot of bang for our buck.

    Note that the VERA is designed as a separate FPGA with a SPI slave interface. Some modifications will be required to integrate it into our SoC.

    Xosera

    I also considered, but eventually dismissed, Xosera:

    https://hackaday.io/project/173731-xosera-fpga-based-retro-video-graphics.

    Xosera is a VERA-inspired video controller, but it is being developed independently by Xarc. I like the Amiga-style Copper processor that they added. Unfortunately, Xosera doesn’t have hardware sprites. That’s a showstopper for me. I’ll keep my eye on this project though. It’s an active project and features are still being added.

    Sound

    A sound core is a perfect candidate for Partial FPGA Reconfiguration. There are a lot of options (Wave-Table synthesis, FM synthesis, PSG…) and a lot of open-source cores available. It would be pretty cool if the software application could just download its synthesizer of choice as part of the program.

    Pretty much any core developed by Jotego sounds like a great idea.

    Technically, I don’t have to select a sound core. We already have sound through VERA’s PCM audio playback. I’m going to select a sound core anyway because I like retro sounds and I’d like to mess around a bit with one of the old-school PSG chips.

    I think I’ll go for a dual YM2149, one for music, one for sound FX, in a game context. The YM2149 was the Atari ST’s sound chip, so we’ll have a large music and sound FX archive at our disposal. Jotego developed an FPGA clone of the YM2149, the JT49:

    https://github.com/jotego/jt49

    Why not VERA PSG?

    The only reason I’m not going for VERA PSG is that, as of yet, very little music has been written for it. I’m sure it is a perfectly adequate PSG implementation.

    Why not SID?

    The SID chip is partially analog, making it much harder to emulate correctly on an FPGA. Also, while I like SID, I’ve probably heard enough SID music to last me a lifetime. I’m currently more interested in finding out what other retro sound chips have to offer.

    Interesting Links


  • Key Components Part 1: Bus, Microprocessor and Memory Controller.

    Epsilon, 04/23/2022 at 17:08

    In the previous post, we discussed top-level requirements. Now we drill down one level, identify key components and apply our requirements to them. We also look around for existing cores or applicable specs that might fit the bill.

    The Bus

    The Bus, or interconnect, is the fabric stitching together the SoC internal components. For this project, the two most relevant SoC internal bus specifications are ARM’s AXI bus and the Open-Source Wishbone bus.

    AXI is very powerful, very popular, and very complex. It scales up well to very big SoCs. However, I don’t think it scales down very well to simple SoCs, such as BoxLambda, where low latency and low complexity are more important than high bandwidth and scalability. Hence, for this project, I’m electing to go with Wishbone.

    We’ll be using the Wishbone B4 specification.

    Sticking to a well-defined internal bus specification certainly helps to meet the Modular Architecture Requirement. Whether we can also accommodate Partial FPGA Reconfiguration using a Wishbone Interconnect remains to be seen.

    The Processor

    Processor Word Size

    Typical processor word sizes are 8-bit, 16-bit, 32-bit, and 64-bit. Which word size is the best fit for BoxLambda?

    • 8-bit: A good word size.
      • Pros:
        • An 8-bit word (i.e. a byte) is a good natural fit for a pixel value, an ASCII character code, or small integer values.
        • 8-bit processors, their programs, and their data are very compact.
        • 8-bit processors side-step some of the alignment issues seen with larger word sizes.
      • Cons:
        • An 8-bit word is too small to conveniently hold the values you need in a typical program - think calculations and table indices.
        • Toolchain support for higher-level languages is limited.
    • 16-bit: A clumsy compromise between 8-bit and 32-bit. Made sense when 32-bit processors were not readily available yet. Now, not so much.
    • 32-bit: Another good word size.
      • Pros: 32-bit words can hold most real-world numbers and cover a huge address space. 32-bit machines generally have good toolchain support.
      • Cons: Much bigger than its 8-bit counterpart, in terms of FPGA real estate, program size as well as data size.
    • 64-bit: A big and clunky word size, way too big to handle conveniently, intended for specialized use cases that don’t fit this project.

    I’ve decided to go for a 32-bit processor. A 32-bit processor (and associated on-chip memory) will take a bigger chunk out of our FPGA real estate, but I think it’s worth it. I like the convenience of 32-bit registers, and a 32-bit processor may come with a regular GCC toolchain.

    Processor Features

    Besides a 32-bit word size, we're looking for the following features for our microprocessor:

    • Ease of programming, meaning:
      • An easy and well-documented Instruction Set Architecture (ISA). We want to be able to program the machine at assembly language level.
      • Shallow Pipeline: It is relatively easy to reason about the behavior of a processor with a two-stage pipeline. It is not very easy to reason about the behavior of a processor with a six-stage pipeline.
      • Good toolchain support, such as GCC, so we can build a software ecosystem for our machine.
    • An accessible and well-documented implementation.
    • Has to fit our FPGA, with enough space to fit the other components.

    With all that in mind, I think RISC-V is a great option.

    • Great ISA, building on lessons learned from previous popular processor architectures.
    • 32-bit support.
    • GCC toolchain support.
    • Open-Source.
    • Well-documented.
    • Very fashionable. Let’s ride that wave :-)

    There are a lot of RISC-V implementations to choose from. The Ibex project seems like a good choice:

    • 32-bit RISC-V.
    • High-quality, well-documented implementation.
    • SystemVerilog based. My preferred HDL.
    • Supports a small two-stage pipeline parameterization.
    • Very active project.

    The Memory Controller

    SDRAM memory access is pretty complicated. Memory access requests get queued in the memory controller, scheduled, and turned into a sequence of commands that vary in execution time depending on the previous memory...


  • Requirements Analysis

    Epsilon, 04/23/2022 at 17:08

    Every new project starts with an empty document, a blank sheet. It’s a unique moment where you have complete freedom. As soon as you put something down in the document, you’ve made a choice and your options become limited. With each subsequent choice, you limit yourself more, until there are no more choices to make, at which point the project is complete. So, in a way, this post, along with the previous one, are the two most important posts of the whole project. We’re making our first choices, setting the direction of this project.

    Let’s go over the project’s goals/requirements and clarify a bit what they mean.

    Simplicity

    Simplicity will be a strong guideline when making design choices. For instance, it may mean that we decide against a popular-but-complex processor in favor of a more obscure-but-simple processor.

    It is hard to make something simple. The Simplicity requirement will make system design harder, not easier. For a case in point, see below.

    Deterministic Behavior

    Designing a deterministic system is more complex than designing a system that allows some slack in the completion of operations. However, once such a system is in place, it becomes much easier to reason about it and design applications on top of it, especially applications with real-time requirements. For instance, it would be pretty cool if the system were designed so that racing-the-beam becomes possible, i.e., timing actions within an application's main loop so that they take place on a specific screen scan line and a specific column on that scan line. Think Commodore 64 split raster bars and sprite multiplexing.
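    A sketch of what racing the beam means in numbers. It assumes standard 640x480@60Hz VGA timing (800 pixel clocks per line and 525 lines per frame, blanking included) and a hypothetical CPU clock of exactly twice the pixel clock. With deterministic instruction timing, software can convert a target screen position into a cycle count and back:

```c
#include <stdint.h>

/* Standard 640x480@60Hz VGA timing, blanking included. */
#define H_TOTAL 800u   /* pixel clocks per line (640 visible) */
#define V_TOTAL 525u   /* lines per frame (480 visible) */
/* Hypothetical clocking: CPU clock = 2 x pixel clock (e.g. 50 vs 25 MHz). */
#define CPU_CYCLES_PER_PIXEL 2u

typedef struct { uint32_t line, column; } beam_pos;

/* Beam position 'cycles' CPU cycles after it was at line 0, column 0. */
beam_pos beam_at(uint64_t cycles)
{
    uint64_t px = (cycles / CPU_CYCLES_PER_PIXEL) % ((uint64_t)H_TOTAL * V_TOTAL);
    beam_pos p = { (uint32_t)(px / H_TOTAL), (uint32_t)(px % H_TOTAL) };
    return p;
}

/* CPU cycles from line 0, column 0 until the beam reaches (line, column). */
uint64_t cycles_until(uint32_t line, uint32_t column)
{
    return ((uint64_t)line * H_TOTAL + column) * CPU_CYCLES_PER_PIXEL;
}
```

    Under these assumptions, a raster effect on line 100 would need its register write to land 160,000 CPU cycles after the start of the frame, which is only achievable if every instruction on that path has a known cycle count.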

    Note that deterministic behavior only has to be guaranteed when the application requires it. Less deterministic operations are perfectly acceptable otherwise. For example, a deterministic application runs from Block RAM, which has a known, fixed access latency, while a non-deterministic application may run from external memory, where accesses are bursty and latency varies.

    One consequence of the Deterministic Behavior requirement is that bus arbitration should use fixed time slots, so that each bus master can be guaranteed fixed timing, latency, and bandwidth.
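    Fixed time-slot arbitration is commonly called time-division multiplexing (TDM). The C model below is a sketch under assumptions: the four-slot schedule and the master IDs are made up for illustration. It shows why the scheme is deterministic: the bus owner depends on the cycle count alone, so each master's bandwidth share and worst-case wait can be computed up front, independent of what the other masters are doing.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical TDM schedule: which bus master owns the bus in each slot.
 * The schedule repeats forever, regardless of which masters are requesting. */
#define NUM_SLOTS 4
static const uint8_t schedule[NUM_SLOTS] = { 0, 1, 0, 2 }; /* master IDs */

/* Bus owner in a given cycle: a pure function of time, not of traffic. */
static uint8_t slot_owner(uint32_t cycle)
{
    return schedule[cycle % NUM_SLOTS];
}

/* Slots per period granted to a master, i.e. its guaranteed bandwidth share. */
static uint32_t slots_per_period(uint8_t master)
{
    uint32_t n = 0;
    for (uint32_t i = 0; i < NUM_SLOTS; i++)
        if (schedule[i] == master)
            n++;
    return n;
}

/* Worst-case number of cycles a master waits for its next slot, over all
 * possible starting cycles. The bound holds whatever the other masters do. */
static uint32_t worst_case_wait(uint8_t master)
{
    uint32_t worst = 0;
    for (uint32_t start = 0; start < NUM_SLOTS; start++) {
        uint32_t wait = 0;
        while (slot_owner(start + wait) != master) {
            wait++;
            assert(wait < NUM_SLOTS); /* master must appear in the schedule */
        }
        if (wait > worst)
            worst = wait;
    }
    return worst;
}
```

    With the schedule { 0, 1, 0, 2 }, master 0 gets half of the slots and waits at most 1 cycle, while masters 1 and 2 each get a quarter and wait at most 3 cycles. The price of this predictability is that an unused slot is simply wasted; a round-robin or priority arbiter would hand it to another master, but then the latency seen by each master depends on everyone else's traffic.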

    Single User / Single Tasking OS

    We won’t be running Linux or, for that matter, any other multitasking OS. The platform will only run one application at a time and that application will be fully in charge of the entire system.

    A Single User / Single Tasking OS will provide the following services:

    • A console CLI shell allowing user and scripted access to:

      • navigate the file system
      • load/save software images to/from memory
      • copy/move/delete files
      • execute (transfer control to) applications in memory, optionally passing in command-line arguments
      • peek and poke memory
    • File System I/O kernel routines
    • Console I/O kernel routines: Input from a physically attached keyboard, output to a physically attached screen.
    • UART I/O kernel routines
    • Discovery and enumeration of hardware components. See Modular Architecture below.
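    As an illustration of how the console shell might dispatch commands such as peek and poke, here is a minimal C sketch built around a table of name/handler pairs. The command names, the argument syntax, and the 256-byte simulated memory are assumptions made so the sketch runs on a host PC; on the real machine, peek and poke would access physical addresses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_ARGS 8

/* Simulated memory so the sketch runs on a host PC (assumption).
 * On the real machine, peek/poke would access physical addresses. */
static uint8_t memory[256];

typedef int (*cmd_fn)(int argc, char **argv);

/* peek <addr>: print the byte at addr in hex. */
static int cmd_peek(int argc, char **argv)
{
    if (argc != 2) return -1;
    unsigned long addr = strtoul(argv[1], NULL, 0);
    if (addr >= sizeof memory) return -1;
    printf("%02x\n", (unsigned)memory[addr]);
    return 0;
}

/* poke <addr> <value>: store a byte at addr. */
static int cmd_poke(int argc, char **argv)
{
    if (argc != 3) return -1;
    unsigned long addr = strtoul(argv[1], NULL, 0);
    unsigned long value = strtoul(argv[2], NULL, 0);
    if (addr >= sizeof memory || value > 0xFF) return -1;
    memory[addr] = (uint8_t)value;
    return 0;
}

/* Command table: adding a command means adding one entry. */
static const struct {
    const char *name;
    cmd_fn fn;
} commands[] = {
    { "peek", cmd_peek },
    { "poke", cmd_poke },
};

/* Split line into whitespace-separated words (strtok modifies it in place)
 * and dispatch on the first word. Returns the handler's result, 0 for an
 * empty line, or -1 for an unknown command. */
static int shell_exec(char *line)
{
    assert(line != NULL);
    char *argv[MAX_ARGS];
    int argc = 0;
    for (char *tok = strtok(line, " \t\n"); tok != NULL && argc < MAX_ARGS;
         tok = strtok(NULL, " \t\n"))
        argv[argc++] = tok;
    if (argc == 0) return 0;
    for (size_t i = 0; i < sizeof commands / sizeof commands[0]; i++)
        if (strcmp(argv[0], commands[i].name) == 0)
            return commands[i].fn(argc, argv);
    return -1;
}
```

    For example, "poke 0x10 0xAB" stores 0xAB at address 0x10, and "peek 0x10" then prints ab. Because the same entry point takes a plain string, scripted access works the same way as interactive use: a script runner can feed lines to shell_exec one at a time.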

    Not Boot-to-Basic

    I don’t want to be pinned down to, or give preference to, any particular interpreted language, so we’re not going full-retro Boot-to-Basic.

    I would like to support multiple interpreted languages by letting the application image indicate which language it’s written in, e.g. by specifying the path to the interpreter on the first line, as is common in Linux scripting: #!/usr/bin/python, #!/usr/bin/ulisp, …

    Of course, it should also be possible to execute binary images directly.
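    A loader could make the script-versus-binary decision by inspecting the first bytes of the image. The C sketch below is a hypothetical helper, not an existing BoxLambda API: if the image starts with #!, it extracts the interpreter path from the first line; otherwise it reports a binary to transfer control to. Interpreter arguments after the path are ignored for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum { IMAGE_BINARY, IMAGE_SCRIPT } image_kind_t;

/* Hypothetical loader helper: classify an image that has been loaded into memory.
 * If it starts with "#!", copy the interpreter path (the first word after "#!")
 * into interp, truncating to fit, and report a script. Otherwise report a binary. */
static image_kind_t classify_image(const char *image, size_t len,
                                   char *interp, size_t interp_size)
{
    assert(interp != NULL && interp_size > 0);
    if (len < 2 || memcmp(image, "#!", 2) != 0)
        return IMAGE_BINARY;

    size_t i = 2, n = 0;
    while (i < len && image[i] == ' ') /* accept "#! /path" as well as "#!/path" */
        i++;
    while (i < len && n + 1 < interp_size &&
           image[i] != ' ' && image[i] != '\n' && image[i] != '\r')
        interp[n++] = image[i++];
    interp[n] = '\0'; /* any interpreter arguments after the path are ignored */
    return IMAGE_SCRIPT;
}
```

    One caveat of this approach: a binary image that happens to begin with the bytes 0x23 0x21 ("#!") would be misclassified, so a real loader might also check for a binary header or a file extension.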

    Modular Architecture

    I imagine a reference configuration to which hardware components can be added or from which components can be removed. Applications should be able to discover, with the help of the OS, whether a certain component is present or not.
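    One possible way to support discovery is a descriptor table that the FPGA build places at a known address, with one entry per component present in the current configuration. The C sketch below assumes such a table; the component IDs, the field layout, and the register addresses are all hypothetical. Removing a component from the build then amounts to dropping its entry, and applications ask the OS whether the component is there before using it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical component IDs; a real system would define its own. */
enum { COMP_END = 0, COMP_UART = 1, COMP_SDCARD = 2, COMP_AUDIO = 3, COMP_VGA = 4 };

typedef struct {
    uint16_t id;        /* component type; COMP_END terminates the table */
    uint16_t version;   /* hardware revision of the component */
    uint32_t base_addr; /* base address of the component's registers */
} component_desc_t;

/* Stand-in for a table the FPGA build would place at a known address (e.g. in ROM).
 * This example configuration has no audio core. The addresses are made up. */
static const component_desc_t component_table[] = {
    { COMP_UART,   1, 0x10000000u },
    { COMP_SDCARD, 1, 0x10001000u },
    { COMP_VGA,    2, 0x10002000u },
    { COMP_END,    0, 0 },
};

/* OS service: return the descriptor for a component, or NULL if it is absent. */
static const component_desc_t *find_component(uint16_t id)
{
    assert(id != COMP_END); /* looking up the terminator is a caller bug */
    for (const component_desc_t *d = component_table; d->id != COMP_END; d++)
        if (d->id == id)
            return d;
    return NULL;
}
```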

    Partial FPGA Reconfiguration

    It would be very cool if a hardware component can be incrementally loaded into the FPGA, using Xilinx’...


  • Introducing the BoxLambda Project.

    Epsilon, 04/23/2022 at 17:06

    Alright, this is it. We’re live. I’m starting a project called BoxLambda. Here’s the run-down, copied verbatim from the README.md:

    BoxLambda is an open-source project with the goal of creating a retro-style FPGA-based microcomputer. The microcomputer serves as a platform for software and RTL experimentation.

    BoxLambda is a software-hardware cross-over project. The plan is to provide room for experimentation both on the FPGA RTL side and on the software side.

    Key Goals

    • Create a sandbox for experimenting with software and (FPGA) HW.
      • Simplicity: It should be easy to jump in and do something: create, hack, tinker.
        • It should be doable for a single person to develop a good understanding of the entire system, software and hardware.
        • Deterministic Behavior: By design, it should be clear how long an operation, be it an instruction or a DMA transfer, is going to take.
        • Single User/Single Tasking OS booting to a console shell.
      • Create a Modular Architecture allowing for a mix-and-match of software and hardware components.
        • Support for partial FPGA reconfiguration.
    • Target Hardware is Digilent’s Arty-A7 and/or the Nexys-A7.
    • The computer should support the following peripherals:
      • Keyboard
      • Mouse (optional)
      • Joystick (optional)
      • Serial port
      • SD card storage
      • VGA Display
      • Audio output
    • Sound and graphics should be sufficient to support retro-style 2D gameplay.

    You can find the source code for BoxLambda on GitHub: https://github.com/epsilon537/boxlambda/.

    Why?

    Does the world need another retro-style computer? Probably not, but I do. I’m a software engineer and I’ve been studying FPGA development for about a year now, specifically for this project.

    It’s an ambitious project and at least half of it (the FPGA half) is in a realm with which I have very little experience. I don’t know if the project will succeed. Maybe I’m too ambitious and too naive. We’ll see. This Blog will document the journey.

    What’s up with that name, BoxLambda?

    “Box”, as in, a physical box. “Lambda”, as in, an anonymous function, a software concept. It’s an attempt to convey that this project is both about hardware and software. Microsoft would have been a good fit too, but the name was taken.

    Interesting Links

    https://www.commanderx16.com : The Commander X16 is the 8-Bit Guy’s dream computer. This is the project that got me dreaming. I want to build a computer like this, but not exactly like this. I want to build my own.

    OK, that’s enough for an introductory post I think. See you in the next one!
