
BoxLambda

A retro-style FPGA-based microcomputer. The microcomputer serves as a platform for software and RTL experimentation.

BoxLambda is an open-source project with the goal of creating a retro-style FPGA-based microcomputer. The microcomputer serves as a platform for software and RTL experimentation.

BoxLambda is a software-hardware cross-over project. The plan is to provide room for experimentation both on the FPGA RTL side and on the software side.

Key Goals

  • Create a sandbox for experimenting with software and (FPGA) HW.

    • Simplicity: It should be easy to jump in and do something: create, hack, tinker.
      • It should be doable for a single person to develop a good understanding of the entire system, software and hardware.
      • Deterministic Behavior: By design, it should be clear how long an operation, be it an instruction or a DMA transfer, is going to take.
      • Single User/Single Tasking OS booting to a console shell.
    • Create a Modular Architecture allowing for a mix-and-match of software and hardware components.
      • Support for partial FPGA reconfiguration.
  • Target Hardware is Digilent's Arty-A7 and/or the Nexys-A7.

  • The computer should support the following peripherals:

    • Keyboard
    • Mouse (optional)
    • Joystick (optional)
    • Serial port
    • SD card storage
    • VGA Display
    • Audio output
  • Sound and graphics should be sufficient to support retro-style 2D gameplay.

I'm keeping a project Blog and documentation here.

  • Testing with Verilator.

    Epsilon, 07/25/2022 at 19:58

    Recap

    I currently have the following for BoxLambda:

    • A test build for an Arty-A7-35T, consisting of an Ibex RISCV core, a Wishbone shared bus, some internal memory, a timer, two GPIO ports, and a UART core.
    • A simple Hello World and LED toggling test program running on the FPGA test build.
    • A Makefile and Bender-based build system with lint checking.

    Testing

    How should I go about testing this project? Given that this is a system integration project rather than an IP development project, I think the focus should go to system-level testing rather than component-level verification. The components themselves have already been verified by their respective owners.

    Ideally, the testbench should allow for the following:

    • Execute system-level test cases in a reasonable time frame. By system-level test cases, I mean test cases where the DUT is the SoC.
    • A short lather-rinse-repeat cycle of making code changes and testing them on a system-level DUT.
    • Full signal visibility into the build, to aid test case development as well as debugging.
    • Reasonably easy automated testing. With the caveat that automated testing is never truly easy.

    Using the FPGA itself as the primary system-level testbench doesn’t meet any of these criteria, other than the first one. Code changes require resynthesis. Signal visibility on the FPGA is limited. Building a robust physical testbench for automated testing is complicated.

    A SystemVerilog-based testbench running on Vivado’s simulator is not an option for me either. The verification aspect of the SystemVerilog language is huge, the learning curve is steep, and the event-driven simulator is slow.

    The Python-based Cocotb test bench running on the Icarus simulator is a step in the right direction. It’s easy to build powerful automated test cases in Python. A Python-based testbench running on an event-driven simulator is slow, however.

    Luckily, there’s a fourth option: Verilator.

    Verilator

    Verilator is a compiler. It compiles, or rather verilates, an HDL design into a C++ model. It then picks up any user-provided C++ testbench/wrapper code and compiles the whole thing into an executable, optionally with the ability to generate traces. So you can run your FPGA design as an executable on your PC, and it’s fast. How cool is that!

    C++ is not an ideal language for test case development, but it’ll get the job done, and it’s a compiled language, so it’s fast.

    Overall, Verilator meets my test bench criteria very well.

    A simple Test Bench for Hello World

    I created a proof-of-concept test bench for the Hello World build. I started from the example code included in the Verilator distribution:

    https://github.com/verilator/verilator/blob/master/examples/make_tracing_c/sim_main.cpp

    I included UARTSIM, the UART co-simulation class that ZipCPU provides along with the UART Verilog implementation in the wbuart32 repository:

    https://github.com/epsilon537/wbuart32/tree/master/bench/cpp
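    To make concrete what a UART co-simulation object does, here is a minimal sketch of an 8N1 receiver model that samples the DUT's TX line once per clock and reassembles the bytes. The class UartRxSketch, the helper uart_encode(), and the clocks-per-baud parameter are hypothetical illustrations; this is not UARTSIM's actual interface.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical 8N1 UART receiver model (NOT UARTSIM's interface).
// tick() is called once per simulated clock with the DUT's TX line level.
class UartRxSketch {
 public:
  explicit UartRxSketch(unsigned clocks_per_baud) : cpb_(clocks_per_baud) {
    assert(clocks_per_baud >= 2);
  }

  void tick(int tx) {
    if (state_ == State::Idle) {
      if (tx == 0) {              // line dropped from idle-high: start bit
        state_ = State::Start;
        count_ = cpb_ / 2;        // first sample lands mid start bit
      }
      return;
    }
    if (--count_ > 0) return;     // not yet at the next sample point
    count_ = cpb_;                // following samples are one bit apart
    switch (state_) {
      case State::Start:          // confirm the start bit, else it was a glitch
        state_ = (tx == 0) ? State::Data : State::Idle;
        bit_ = 0;
        shift_ = 0;
        break;
      case State::Data:           // data bits arrive LSB first
        shift_ |= static_cast<uint8_t>((tx & 1) << bit_);
        if (++bit_ == 8) state_ = State::Stop;
        break;
      case State::Stop:           // valid stop bit: accept the byte
        if (tx == 1) received_.push_back(static_cast<char>(shift_));
        state_ = State::Idle;     // on a framing error the byte is dropped
        break;
      case State::Idle:
        break;
    }
  }

  const std::string& received() const { return received_; }

 private:
  enum class State { Idle, Start, Data, Stop };
  unsigned cpb_;
  State state_ = State::Idle;
  unsigned count_ = 0;
  unsigned bit_ = 0;
  uint8_t shift_ = 0;
  std::string received_;
};

// Hypothetical stimulus helper: encodes text as an 8N1 waveform, one entry per clock.
std::vector<int> uart_encode(const std::string& text, unsigned cpb) {
  std::vector<int> wave(cpb, 1);                 // idle line before the first byte
  auto bit = [&](int level) { wave.insert(wave.end(), cpb, level); };
  for (unsigned char c : text) {
    bit(0);                                      // start bit
    for (int i = 0; i < 8; ++i) bit((c >> i) & 1);
    bit(1);                                      // stop bit
  }
  bit(1);                                        // trailing idle
  return wave;
}
```

    In a Verilator test bench, the tick() call would sit inside the clock loop, fed from the model's UART TX output port.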

    The test bench does the following:

    1. Instantiate the verilated Hello World model and the UARTSIM co-simulation object.
    2. Optionally, controlled by a command-line option, enable tracing.
    3. Run the model for a fixed number of clock cycles.
    4. While running the model:
      1. Feed the model’s UART output to UARTSIM.
      2. Capture and display the decoded UARTSIM output and the GPIO outputs.
    5. Pass/Fail criterion: After running the model for the set number of clock cycles, compare the captured UART and GPIO outputs against the expected results.
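    Steps 3 to 5 boil down to a clock-toggling loop around the model. The sketch below assumes a hypothetical ToyModel standing in for the verilated model (a real one is a Verilator-generated class with its ports as public members and an eval() method); the toy design toggles a GPIO bit every 1000 clock cycles. Tracing and UART capture are left out to keep it short, and run_fixed_cycles() is a made-up name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a verilated model: ports are public members and
// eval() evaluates the design. This toy design toggles gpio bit 0 every
// kPeriod rising clock edges.
struct ToyModel {
  static constexpr uint64_t kPeriod = 1000;
  uint8_t clk = 0;
  uint8_t gpio = 0;

  void eval() {
    if (clk && !prev_clk_ && ++edges_ % kPeriod == 0) gpio ^= 1;
    prev_clk_ = clk;
  }

 private:
  uint8_t prev_clk_ = 0;
  uint64_t edges_ = 0;
};

// Steps 3-5: run for a fixed number of clock cycles, capture every change
// of the GPIO output, then compare the capture against the expected values.
template <typename Model>
bool run_fixed_cycles(Model& top, uint64_t cycles,
                      const std::vector<uint8_t>& expected_gpio) {
  std::vector<uint8_t> captured;
  uint8_t last = top.gpio;
  for (uint64_t i = 0; i < cycles; ++i) {
    top.clk = 0;
    top.eval();                      // low phase of the clock
    top.clk = 1;
    top.eval();                      // rising edge: the design updates here
    if (top.gpio != last) {          // record changes only, not every cycle
      last = top.gpio;
      captured.push_back(last);
    }
  }
  return captured == expected_gpio;  // pass/fail criterion
}
```

    In the real test bench, the loop body would also feed the UART output to the co-simulation object and, with tracing enabled, dump trace samples.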

    As suggested by ZipCPU in his Verilog tutorial, I use nCurses for positional printing inside the terminal window. This way, I can easily build a display that refreshes, rather than scrolls, whenever the model produces new UART or GPIO data to display.

    The result looks like this:

    This is the test bench source code, slightly edited for brevity:

    int main(int argc, char** argv, char** env) {
      std::unique_ptr<UARTSIM> uart{new UARTSIM(0)}; //Uart co-simulation from wbuart32.
      // Using unique_ptr is similar to "VerilatedContext*...

  • Warnings and Verilator Lint.

    Epsilon, 07/16/2022 at 10:53

    Recap

    We currently have a simple Hello World test project for an Arty-A7-35T, consisting of an Ibex RISCV core, a Wishbone shared bus, some internal memory, a timer, GPIO, and UART core. We can build a simple Hello World test program for the processor and include that into the FPGA build. Software compilation and FPGA synthesis and implementation are managed by a Makefile and Bender-based build system.

    The Hello World test project currently builds and runs just fine. However, from the number of warnings that Vivado spits out during synthesis, you would almost be surprised it works at all. Since my previous post, I’ve been sorting through those warnings. I also added linting.

    Vivado Warnings

    If, like me, you have a software background, you’ll probably treat warnings as errors. They’re often benign but, ideally, they should be fixed.

    Vivado synthesis doesn’t seem to work like that. Vivado generates warnings for code that, to me at least, looks perfectly alright. For example:

    You attach a simple slave to a shared bus. The slave doesn’t require all input signals from the bus (e.g. a subset of the address lines). The slave also drives some of the optional output signals to a constant zero (e.g. an error signal).

    When synthesizing this slave module, Vivado will generate a warning for each unconnected input signal and for each output signal that’s driven by a constant. In other words: in Vivado, Warnings are not Errors. Warnings need to be reviewed, but they don’t necessarily need to be fixed.

    Btw, I’m just referring to regular Vivado warnings here. Vivado may also generate Critical Warnings. Critical Warnings indicate significant issues that need to be looked at and fixed.

    Synthesizing a component separately also generates a lot of additional warnings, compared to synthesizing that same component embedded in a project build, with all the inputs, outputs, and clocks hooked up. Many of those warnings can be avoided by adding constraints specifically for the standalone synthesis of that component, but I don’t think it’s worth the effort. I decided to focus instead on reviewing and fixing as many warnings as possible in project builds. Right now, that’s just the Hello World build.

    There’s also the matter of warnings deep inside third-party code. Warnings near a component’s surface you have to be careful with, as those can point to integration issues. Several layers deep, however, you’re looking at third-party code internals that are presumably being actively maintained by someone else. I take a look when I see such a warning, but I will think twice before making changes. On the other hand, abandoned third-party code, such as ibex_wb, I will treat as my own.

    To summarize, here’s how I’m handling Vivado warnings:

    • Critical Warnings are Errors. They need to be looked at and fixed.
    • (Regular) Warnings are not Errors. They need to be looked at, but not necessarily fixed.
    • Focus on project build warnings. Never mind the standalone component synthesis warnings.
    • Think twice before fixing warnings inside actively maintained third-party code.

    With that pragmatic mindset adopted, I was able to make progress. I fixed a bunch of warnings, but not all, for the reasons stated above.
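    The first two rules of that policy lend themselves to automation. A minimal sketch, assuming Vivado log lines that begin with "CRITICAL WARNING:" or "WARNING:" as they do in Vivado's synthesis logs: count both severities and fail the build only on critical warnings. WarningCounts, count_warnings() and build_passes() are hypothetical names.

```cpp
#include <cassert>
#include <istream>
#include <sstream>
#include <string>

struct WarningCounts {
  unsigned critical = 0;
  unsigned regular = 0;
};

// Counts Vivado message severities, assuming each message starts at the
// beginning of a log line with "CRITICAL WARNING:" or "WARNING:".
WarningCounts count_warnings(std::istream& log) {
  WarningCounts counts;
  std::string line;
  while (std::getline(log, line)) {
    if (line.rfind("CRITICAL WARNING:", 0) == 0) {
      ++counts.critical;             // must be looked at and fixed
    } else if (line.rfind("WARNING:", 0) == 0) {
      ++counts.regular;              // must be reviewed, not necessarily fixed
    }
  }
  return counts;
}

// Policy: critical warnings fail the build; regular warnings are reported
// for review but do not block it.
bool build_passes(const WarningCounts& counts) { return counts.critical == 0; }
```

    In a build script, the stream would typically be a std::ifstream opened on the synthesis log (e.g. vivado.log).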

    Lint Checking

    Because Vivado synthesis spits out such confusing warnings, I wanted a second opinion. I decided to add Verilator lint checking to the build system. Verilator lint performs static code analysis and will find coding issues that Vivado synthesis often does not. Moreover, it does this very quickly. Without linting, finding and fixing coding errors is a slow process:

    1. Make some code changes.
    2. Kick-off synthesis.
    3. Wait 20 minutes or more for the synthesis to complete.
    4. Get a bunch of warnings and/or errors.
    5. Repeat.

    With lint on the other hand:

    1. Make some code changes.
    2. Kick-off lint checking.
    3. Wait 10 seconds.
    4. Get a bunch of warnings and/or errors.
    5. Repeat.

    When your design lints cleanly, you still need to synthesize it obviously, but at that point, it...


  • BoxLambda: Make, Tcl and Bender Build System

    Epsilon, 07/04/2022 at 10:03

    The Hello World build in the previous post is a GUI-driven Vivado project. I would like to upgrade to a hierarchical, command-line-driven build system. In a command-line-driven build system, it’ll be easier to automate tasks and it’ll be easier to integrate tools that are not part of Vivado, such as Cocotb and Verilator.

    Terminology and References

    • Cocotb: A Python-based framework for digital logic verification. See https://www.cocotb.org/.
    • Constraints File: A constraints file specifies the mapping of the top-level HDL module’s input and output ports to physical pins of the FPGA. It also defines the clocks used by the given design. See https://digilent.com/reference/programmable-logic/guides/vivado-xdc-file.
    • EDA tool: A software tool to design electronic circuits, e.g. Vivado.
    • IP-XACT: An XML format that defines and describes individual, re-usable electronic circuit designs to facilitate their use in creating integrated circuits.
    • IP Package: A Vivado file encapsulating an IP component using the IP-XACT file format.
    • Makefile: A file used by the Make utility, defining a set of tasks to be executed, and defining dependencies between tasks. Makefiles are commonly used to create build systems.
    • Memory File: A file containing the initial contents of a Block RAM instance used in an FPGA design.
    • OOC: Out-Of-Context. Vivado’s OOC mode or OOC flow lets you synthesize, implement, and analyze design modules in a hierarchical design.
    • Tcl: The de facto standard embedded command language for EDA applications.
    • Verilator: A tool that converts Verilog to a cycle-accurate behavioral model in C++ or SystemC. The performance of the generated behavioral model is generally much higher than that of a traditional event-driven simulator. See https://www.veripool.org/verilator/.

    Vivado IP Packages

    Vivado has an embedded, Tcl-based command-line interface. For every GUI action, there’s an equivalent Tcl command or set of commands. My initial approach to creating a build system was to use a combination of Makefiles and Tcl scripts to get Vivado to generate a so-called IP Package for each component. These IP Packages then constitute the building blocks of our system: IP Packages can be aggregated into bigger IP Packages. A top-level project build aggregates IP Packages into an SoC.

    This approach has some advantages:

    • It’s hierarchical: A big SoC build is (recursively) broken down into manageable components.
    • It doesn’t introduce any new tool dependencies other than GNU Make.

    Along the way, I learned that Vivado IP Packages also have some disadvantages:

    • SystemVerilog is not supported at the top-level, i.e. I have to create Verilog wrappers around SystemVerilog-based components. That’s not the end of the world, but it does feel like a step backward.
    • Vivado IP Packages come in a standard format called IP-XACT. If I want to create a flat list of files that make up a project, e.g. to feed to Verilator or Cocotb, I need a tool to extract information from IP-XACT files. I was able to find one tool, called Kactus2, but that appears to be a full-fledged graphical EDA application rather than a command-line utility. As long as I can’t easily interface to IP-XACT files, I’m locked into Vivado and won’t be able to use third-party tools like Verilator or Cocotb.

    That last item is a deal-breaker for me. I start looking for other options.
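    Producing a flat file list from a hierarchical component description is essentially a depth-first walk over the dependency graph. The sketch below assumes a hypothetical in-memory manifest (component name, dependencies, source files) rather than any real IP-XACT, Bender, or FuseSoC format, and emits each component's files after those of its dependencies, so packages come before the modules that import them.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical component manifest entry: dependencies and own source files.
struct Component {
  std::vector<std::string> deps;
  std::vector<std::string> files;
};

using Manifest = std::map<std::string, Component>;

// Depth-first post-order walk: a component's dependencies are emitted before
// its own files, and each component is emitted only once. There is no cycle
// detection; a dependency cycle would silently yield an incomplete order.
void flatten(const Manifest& m, const std::string& name,
             std::set<std::string>& visited, std::vector<std::string>& out) {
  if (!visited.insert(name).second) return;   // already emitted
  const Component& c = m.at(name);            // throws on an unknown name
  for (const auto& dep : c.deps) flatten(m, dep, visited, out);
  out.insert(out.end(), c.files.begin(), c.files.end());
}

std::vector<std::string> flat_file_list(const Manifest& m, const std::string& top) {
  std::set<std::string> visited;
  std::vector<std::string> out;
  flatten(m, top, visited, out);
  return out;
}
```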

    FuseSoC

    https://fusesoc.readthedocs.io/en/stable/

    FuseSoC is a package manager and build system for HDL code. HDL builds can be retargeted from one EDA tool to another with the flip of a switch, so to speak. The tool is already in use by projects such as Ibex, and it looks very promising, so I decide to give it a shot…

    Creating a so-called FuseSoC core file, a manifest defining the component, is easy enough. Once you have such a core file, you can instruct the tool to generate, for instance, a Vivado or a Verilator build for it. The problem is, I have no idea how it works. When I kick off a FuseSoC Verilator build, I get a nice OK message at the...


  • First Contact: Hello World!

    Epsilon, 06/19/2022 at 12:21

    After the IRQ post, I started looking for the shortest path to get something simple to work. The idea is to bring up something small, an embryonic version of the project. Iteratively, I then keep growing this small system until I end up with a system that meets the goals. After each iteration, the project should be functioning somewhat better than it was before.

    Iterative Design Spiral

    Halfway through the first iteration, I realized I needed to figure out my git workflow, or I wouldn’t be able to commit and push my work. Hence, the previous post.

    The Tiny System

    Now, back to taking that first step: I want to bring up the RISCV processor and run a test program on it that can print to the serial port. In other words, I want to run a ‘Hello World!’ program on my Arty A7-35T. Doing so will give us access to print-style debugging, which is sure to come in handy down the road.

    To get to ‘Hello World’, I need to put together a tiny system consisting of the following cores:

    • Ibex RISCV processor (to run the SW).
    • Internal memory (to hold the SW).
    • wbuart32 (serial port console).
    • A Wishbone interconnect to connect the processor to memory and the UART core.

    The Ibex repository includes an example system, called Simple System, that’s similar to the initial system I have in mind, but it does not include a Wishbone interconnect. It shouldn’t be too hard to add a Wishbone interface to Ibex myself, but first I should take a look around to see if a Wishbone-for-Ibex solution already exists. Lo and behold, it does:

    https://github.com/batuhanates/ibex_wb

    The ibex_wb SoC Cores

    The ibex_wb SoC includes the following cores:

    • ibex: The RISCV CPU core. The ibex_wb project was pointing to a 3-year-old version. I modified it to use the BoxLambda ibex fork.
    • wbuart32: UART core. The ibex_wb project was pointing to a 3-year-old version. I modified it to use the BoxLambda wbuart32 fork.
    • riscv_dbg: JTAG debug interface. This is a pretty complex core. I ifdef’d it out for the time being. To be revisited.
    • wb_gpio: GPIO core, for sampling buttons and switches and driving LEDs.
    • wb_timer: A timer core, so we can do things like usleep() from software.
    • spramx32: Single Port RAM. To be replaced at some point by a Dual-Port RAM.
    • core2wb/core_if/wb_if/slave2wb: Ibex to Wishbone interfacing logic.

    The ibex_wb/soc/fpga/ directory has an SoC build for Cyclone V, the Arty A7-100T, and the Nexys4-DDR. I added an arty-a7-35/ subdirectory, using the Nexys4-DDR SoC code as a starting point.

    This ibex_wb SoC is pretty much a perfect match for the initial system I had in mind. How convenient!

    The ibex_wb SoC Software

    The software is located in the ibex_wb/soc/fpga/arty-a7-35/sw/ directory:

    • libs/soc/ contains drivers for the cores
    • examples/ contains example programs. I tested the hello and the blinky programs.

    ibex_wb/soc/fpga/arty-a7-35/sw/examples/hello/ contains a simple Makefile to build the software and generate a hello.mem file. hello.mem holds the initial contents of the internal memory of the SoC. The file’s contents are included in the FPGA bitstream.
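    As an illustration of such a .mem generation step, here is a sketch that assumes the memory file is a $readmemh-style text file with one 32-bit word per line, produced from a raw binary image (for example from objcopy -O binary). The actual ibex_wb Makefile may use a different format or tool; bin_to_mem() is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Converts a raw binary image into $readmemh-style text: one 32-bit word per
// line as 8 hex digits. Bytes are packed little-endian (RISC-V byte order),
// and a partial final word is zero-padded.
std::string bin_to_mem(const std::vector<uint8_t>& image) {
  std::string out;
  for (std::size_t i = 0; i < image.size(); i += 4) {
    uint32_t word = 0;
    for (std::size_t b = 0; b < 4; ++b) {
      uint8_t byte = (i + b < image.size()) ? image[i + b] : 0;
      word |= static_cast<uint32_t>(byte) << (8 * b);
    }
    char line[10];  // 8 hex digits + newline + terminating NUL
    std::snprintf(line, sizeof line, "%08x\n", static_cast<unsigned>(word));
    out += line;
  }
  return out;
}
```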

    The same directory also contains a linker script, link.ld, that specifies how much memory is available, and where all the code, data, and stack should go:

    OUTPUT_ARCH(riscv)
    ...
    MEMORY
    {
    	ram         : ORIGIN = 0x00000000, LENGTH = 64K
    }
    
    _min_stack      = 16K;   /* minimum stack space to reserve */
    _stack_start	= ORIGIN(ram) + LENGTH(ram) - 4;
    ...
    SECTIONS
    {
    	.vectors : ...
    
    	.text : { ...
    	
    	.data : { ...
    
    	.bss : { ...
    
    	.stack (NOLOAD): {
    		. = ALIGN(4);
    		. = . + _min_stack ;
    		. = ALIGN(4);
    		stack = . ;
    		_stack = . ;
    	} > ram    ...
    }
    

    ORIGIN should be set to match the CPU’s boot vector. On the FPGA side, the boot vector is specified during CPU core instantiation. I currently have it set to 0 in ibex_soc.sv:

    wb_ibex_core wb_ibex_core (
    .instr_wb     (wbm[COREI_M]),
    .data_wb      (wbm[CORED_M]),
    .test_en      (1'b0),
    .hart_id      (32'h0),
    .boot_addr    (32'h0),
    ...
    
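    The numbers in the linker script and the core instantiation can be cross-checked at compile time. This sketch restates the values shown above as constants (the constant names are made up) and lets static_assert verify the arithmetic: the stack starts at the last word of the 64 KB RAM, ORIGIN matches boot_addr, and at most 48 KB remain for code and data once the 16 KB minimum stack is reserved.

```cpp
#include <cassert>
#include <cstdint>

// Values copied from link.ld and the wb_ibex_core instantiation above.
constexpr uint32_t kRamOrigin = 0x00000000;   // ORIGIN(ram)
constexpr uint32_t kRamLength = 64 * 1024;    // LENGTH = 64K
constexpr uint32_t kMinStack  = 16 * 1024;    // _min_stack = 16K
constexpr uint32_t kBootAddr  = 0x00000000;   // .boot_addr (32'h0)

// _stack_start = ORIGIN(ram) + LENGTH(ram) - 4: the last word in RAM.
constexpr uint32_t kStackStart = kRamOrigin + kRamLength - 4;
static_assert(kStackStart == 0x0000FFFC, "stack starts at the top word of RAM");

// The image is linked at ORIGIN(ram), so ORIGIN must match the boot vector.
static_assert(kRamOrigin == kBootAddr, "link.ld ORIGIN must match boot_addr");

// Upper bound for .vectors, .text, .data and .bss with the stack reserved.
constexpr uint32_t kMaxImage = kRamLength - kMinStack;
static_assert(kMaxImage == 48 * 1024, "at most 48 KB for code and data");
```

    If either the linker script or the RTL parameter changes, the corresponding constant would have to be updated by hand; the check only catches inconsistencies once both values are transcribed.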

    Synthesizing the SoC

    The original ibex_wb repository appears...


  • Git Workflow and Setup

    Epsilon, 06/12/2022 at 09:07

    Git Workflow

    BoxLambda is a GitHub project that depends on a bunch of other GitHub projects. How do I pull it all together under one umbrella? I don’t just want to copy somebody else’s code and submit it into my repository. That would be impolite and I would lose all tracking with the original project. I want to be able to track the original project, make my own changes, and integrate the project into the BoxLambda repository.

    Git submodules are a great solution for this situation. Submodules allow you to keep a git repository in a subdirectory of another git repository. When you’re making changes inside the submodule subdirectory, those get committed to that submodule’s repository. The parent (supermodule?) repository on the other hand, just tracks submodule commits. From git’s point of view, the submodule subdirectory is not a subdirectory, it’s a file with a reference to a git repository and a specific commit within that repository.

    OK, I guess that sounds pretty confusing. Here’s a much better explanation:

    https://git-scm.com/book/en/v2/Git-Tools-Submodules

    Directories and branches

    I’ll be using the following directory layout in the BoxLambda repository:

    boxlambda/doc
    boxlambda/fpga/ibex (ibex fork git submodule)
    boxlambda/fpga/wbuart32 (wbuart32 fork git submodule)
    boxlambda/fpga/<other FPGA git submodules>
    boxlambda/fpga/<BoxLambda specific FPGA files that don't fit in any of the submodules>
    boxlambda/sw/<SW fork git submodules>
    boxlambda/sw/<BoxLambda SW files that don't fit in any of the submodules> 

    Each of the git submodules is a fork of a GitHub project discussed in earlier posts. For example, boxlambda/fpga/ibex/ contains my ibex fork, not the original ibex repository.

    In each of the forked submodules, two branches are relevant:

    • master: I’m keeping the master branch in sync with the master branch of the repository I forked from. Having this branch makes it easy to pull in updates as well as to submit the occasional pull request to the original project.
    • boxlambda: On this branch, I’ll be making changes for BoxLambda.

    In the BoxLambda repository itself, I have the following long-running branches:

    • master: I will submit releases to this branch. The master branch should always be in good shape.
    • develop: This is where the work is happening. Things will be in flux here. This branch will not always be in good shape.
    • gh-pages: This branch holds the BoxLambda Blog files. GitHub Pages are by default on the gh-pages branch of a GitHub project.
    • boxlambda-gh-pages-wip: This branch holds work-in-progress Blog updates. This branch also contains some config file modifications specifically for local previewing, which is why this is a long-running branch rather than a topic branch. When updates are ready for release, I merge them to gh-pages.

    I already pushed this structure to GitHub. Feel free to take a look around:

    https://github.com/epsilon537/boxlambda

    GitHub does a great job displaying submodule subdirectories:

    https://github.com/epsilon537/boxlambda/tree/develop/fpga

    My Setup

    I’m working on Ubuntu WSL on Windows 11. It would be better to work on a native Linux box, but I need to be on Windows for other work, so WSL it is.

    WSL is working well for me. My C: drive shows up as /mnt/c under Linux, so sharing files between Linux and Windows is easy. The clipboard also works seamlessly between Windows and Linux and the Linux apps run right inside the Windows desktop.

    Xilinx’s Vivado installation was straightforward. As a test, I built Ibex’s Arty A7 example using the README instructions. Synthesis, implementation, and bitstream generation went just fine.

    However, when I tried to program the bitstream on my Arty A7 board, connected via USB, I noticed that Vivado wasn’t detecting the board. Ugh. WSL is not perfect after all.

    As a workaround, I installed the Vivado Lab edition on the Windows side. Unlike a regular Vivado installation, the Lab edition is very small. It’s intended for lab machines physically...


  • Interrupts, and estimated FPGA Resource Utilization.

    Epsilon, 05/29/2022 at 09:06

    Our CPU supports the following interrupts (taken from https://ibex-core.readthedocs.io/en/latest/03_reference/exception_interrupts.html):

    Ibex Interrupts:

    Interrupt Input Signal   ID      Description
    irq_nm_i                 31      Non-maskable interrupt (NMI)
    irq_fast_i[14:0]         30:16   15 fast, local interrupts
    irq_external_i           11      Connected to platform-level interrupt controller
    irq_timer_i              7       Connected to timer module
    irq_software_i           3       Connected to memory-mapped (inter-processor) interrupt register
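    Each ID in this table doubles as a bit position in the RISC-V mie/mip CSRs and as the exception code reported in mcause, whose most significant bit is set for interrupts. A small sketch of that mapping for RV32 (the helper names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>

// Interrupt IDs from the Ibex table above.
constexpr unsigned kIrqSoftware = 3;
constexpr unsigned kIrqTimer = 7;
constexpr unsigned kIrqExternal = 11;
constexpr unsigned kIrqFastBase = 16;   // irq_fast_i[n] -> ID 16 + n
constexpr unsigned kIrqNmi = 31;        // non-maskable: not gated by mie

constexpr unsigned fast_irq_id(unsigned n) { return kIrqFastBase + n; }

// Enable bit for a maskable interrupt in mie (same position in mip).
constexpr uint32_t mie_bit(unsigned id) { return 1u << id; }

// mcause on RV32: MSB = 1 marks an interrupt, the low bits carry the ID.
constexpr uint32_t mcause_for_irq(unsigned id) { return 0x80000000u | id; }
constexpr bool mcause_is_irq(uint32_t mcause) { return (mcause >> 31) != 0; }
constexpr unsigned mcause_code(uint32_t mcause) { return mcause & 0x7FFFFFFFu; }
```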

    The Timer

    The RISC-V spec includes a timer specification: RISC-V Machine Timer Registers (see RISC-V Privileged Specification, version 1.11, Section 3.1.10). The Ibex GitHub repository contains a compliant implementation as part of the Simple System example:

    https://github.com/epsilon537/ibex/tree/master/examples/simple_system

    We’ll be using this timer module implementation, so we don’t need a separate PIT module.

    The Timer module flags interrupts via signal irq_timer_i. The CPU sees this as IRQ ID 7.
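    The behavior of the machine timer is simple enough to model in a few lines: mtime counts up, and the timer interrupt is pending whenever mtime >= mtimecmp, until software writes a larger mtimecmp. This is a behavioral sketch of the specified behavior, not the Simple System RTL; the MachineTimer struct, its initial mtimecmp value, and the tick rate passed to usec_to_ticks() are assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Behavioral model of the RISC-V machine timer registers.
struct MachineTimer {
  uint64_t mtime = 0;
  uint64_t mtimecmp = UINT64_MAX;   // sketch's choice: no interrupt until armed

  void tick() { ++mtime; }                        // one timer clock
  bool irq_timer() const { return mtime >= mtimecmp; }

  // Arm a delay; writing a new, larger mtimecmp also clears the interrupt.
  void arm_after(uint64_t ticks) { mtimecmp = mtime + ticks; }
};

// usleep()-style conversion, assuming the timer runs at tick_hz.
constexpr uint64_t usec_to_ticks(uint64_t usec, uint64_t tick_hz) {
  return usec * tick_hz / 1000000u;
}
```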

    The Fast Local Interrupts

    We can freely assign 15 local interrupts. I’ve got the following list:

    • 1 interrupt line per Reconfigurable Module (RM), so 3 in total. The default RMs are VERA and a Dual JT49. VERA uses one interrupt line; JT49 uses none.
    • 1 interrupt line each for:
      • wbuart
      • sdspi
      • wbi2c
      • ps2_mouse
      • ps2_keyboard
      • Praxos DMA
      • Quad SPI
      • ICAP
      • DFX Controller
      • GPIO.

      That’s 10 interrupts in total.

    The interrupts are serviced in order of priority, the highest number being the highest priority.

    I have ordered the Fast Local interrupts as follows:

    Fast Local Interrupt Assignments:

    Interrupt Input Signal   ID   Description
    irq_fast_i[14]           30   RM_2 interrupt (Default: not assigned)
    irq_fast_i[13]           29   RM_1 interrupt (Default: VERA IRQ)
    irq_fast_i[12]           28   RM_0 interrupt (Default: not assigned)
    irq_fast_i[11]           27   Praxos DMAC IRQ
    irq_fast_i[10]           26   sdspi IRQ
    irq_fast_i[9]            25   wbuart IRQ
    irq_fast_i[8]            24   ps2_keyboard IRQ
    irq_fast_i[7]            23   ps2_mouse IRQ
    irq_fast_i[6]            22   wbi2c IRQ
    irq_fast_i[5]            21   GPIO IRQ
    irq_fast_i[4]            20   Quad SPI IRQ
    irq_fast_i[3]            19   DFX Controller IRQ
    irq_fast_i[2]            18   ICAP IRQ
    irq_fast_i[1]            17   not assigned
    irq_fast_i[0]            16   not assigned
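    The priority rule amounts to a "highest set bit" search over the interrupts that are both pending and enabled. A minimal sketch covering only the fast interrupts, with hypothetical enum names for the assignments in the table:

```cpp
#include <cassert>
#include <cstdint>

// Fast local interrupt assignments (irq_fast_i bit index) from the table above.
enum FastIrq : unsigned {
  kFastIcap = 2, kFastDfxCtrl = 3, kFastQuadSpi = 4, kFastGpio = 5,
  kFastWbi2c = 6, kFastPs2Mouse = 7, kFastPs2Kbd = 8, kFastWbuart = 9,
  kFastSdspi = 10, kFastPraxosDmac = 11, kFastRm0 = 12, kFastRm1 = 13,
  kFastRm2 = 14,
};

// Among fast interrupts that are both pending and enabled, the highest index
// wins. Returns the interrupt ID (16 + index), or -1 if none qualifies.
int highest_priority_fast_irq(uint16_t pending, uint16_t enabled) {
  uint16_t active = pending & enabled & 0x7FFF;   // 15 fast lines
  for (int n = 14; n >= 0; --n) {
    if (active & (1u << n)) return 16 + n;
  }
  return -1;
}
```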

    The Platform Level Interrupt Controller.

    One interrupt line is reserved to connect an external interrupt controller. I don’t have any use for it right now, however, so I’m going to leave this unassigned for the time being.

    Since we currently don’t have a use for the Programmable Interrupt Controller, I’ll remove it from the Architecture Diagram.

    Will It Fit? Estimated FPGA Resource Utilization.

    I could keep adding modules and dream up architectures all day long, but some kind of reality check is long overdue. I’m going to create a fork of all modules identified so far and run them through synthesis, as-is, just to get a sense of the resource utilization on the Arty A7-35T and the Nexys A7-100T. We won’t get more than ballpark figures out of this, but that’s all we need right now.

    Synthesis

    Synthesis is handled by Vivado, Xilinx’s FPGA Design Suite. Vivado is free to download: https://www.xilinx.com/products/design-tools/vivado/vivado-ml.html.

    The synthesis tool turns a module’s Verilog/SystemVerilog/VHDL source code into a netlist of gates. In the process of doing so, the tool also generates a utilization report, relative to the available resources of the target FPGA. It’s this utilization report we’re after right now, not the generated netlist.

    Here’s an example utilization report, generated during the synthesis of the MIG core:

    https://github.com/epsilon537/boxlambda/blob/main/doc/mig_7series_0_utilization_synth.rpt

    For most of the cores, synthesis was just a matter of pointing Vivado to the core’s source tree and hitting the Run Synthesis button. There were a few exceptions:

    • VERA did not...

  • BoxLambda Architecture, First Draft.

    Epsilon, 05/22/2022 at 09:44

    In this post, we organize the key components from the previous posts into an architecture diagram. Along the way, we identify a few new components.

    None of what’s shown here is set in stone. The diagrams below contain some speculative pieces and there are quite a few loose ends to tie up as we get further into the project.

    The Nexys Configuration

    BoxLambda Draft Architecture Block Diagram for Nexys A7-100T.

    This is a draft architecture diagram showing the Nexys A7-100T configuration. Further down, I’ll show the Arty A7-35T configuration.

    Internal RAM

    The system is configured with 256KB of Dual-Port RAM (DPRAM) and 128KB of Video RAM (inside the VERA module). The A7-100T has 607KB of Block RAM in total, so more than enough Block RAM should be left over for other purposes, e.g. for the Black Box Module (see below).
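    A quick sanity check on the Block RAM budget, assuming the A7-100T's Block RAM consists of 135 RAMB36 blocks of 36 Kb each, and that in a 32-bit-wide configuration without parity each block stores 4 KB of data:

```latex
\begin{aligned}
\text{Total Block RAM} &= 135 \times 36\,\text{Kb} = 4860\,\text{Kb} \approx 607\,\text{KB} \\
\text{DPRAM} + \text{VRAM} &= 256\,\text{KB} + 128\,\text{KB} = 384\,\text{KB} \\
\text{RAMB36 blocks needed} &= 384\,\text{KB} / 4\,\text{KB} = 96 \\
\text{RAMB36 blocks left} &= 135 - 96 = 39 \quad (\approx 156\,\text{KB of data})
\end{aligned}
```

    The 607KB figure includes the parity bits, so the usable leftover for 32-bit data is somewhat smaller than 607 - 384 = 223 KB, but still a comfortable margin.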

    The CPU has memory-mapped access to DPRAM. As long as no other Bus Masters are competing for access to the same bus, instructions executing from DPRAM will have a fixed cycle count.

    DMA Bus and Processor Bus

    The DPRAM is hooked up to two system buses: a DMA bus and a Processor bus. Bus masters (currently only CPU and DMAC) have access to both buses as well, but the intent is that the DMA Controller uses the DMA bus for MEMC<->DPRAM transfers and the CPU uses the processor bus for DPRAM access. This intent is not hardwired into the system, however. The DMA Controller can set up transfers over the processor bus, and the processor can access external memory over the DMA bus. The two system buses are there to give bus masters some flexibility to stay out of each other’s way.

    Note that, besides access to external and internal memory, the DMA Controller also has access to VERA, the sound cores, and the SD SPI module via the DMA bus.

    Both the Processor Bus and the DMA bus are 32-bit pipelined mode Wishbone buses.

    The Interconnect

    A bus on a block diagram is just a line connecting blocks. In reality, the Interconnect consists of Cross Bars, Arbiters, Address Decoders, and Bridges. I will follow up with an architecture diagram showing the BoxLambda Interconnect details.
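    To make two of those building blocks concrete, here is a sketch of an address decoder that maps an address to a slave through base/size windows, and a round-robin arbiter that alternates grants between requesting masters so the CPU and the DMAC cannot starve each other. The address map in the usage example is hypothetical, and a real Wishbone arbiter would hold the grant for the whole bus cycle (while CYC is asserted) instead of re-arbitrating on every call.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Address window of one slave.
struct Window {
  uint32_t base;
  uint32_t size;
};

// Returns the index of the slave whose window contains addr, or -1.
// Unsigned wrap-around makes (addr - base) huge when addr < base.
int decode(const std::vector<Window>& addr_map, uint32_t addr) {
  for (std::size_t i = 0; i < addr_map.size(); ++i) {
    if (addr - addr_map[i].base < addr_map[i].size) return static_cast<int>(i);
  }
  return -1;   // no slave: a real interconnect would signal a bus error
}

// Round-robin arbiter: grant the first requester after the last one granted.
class RoundRobinArbiter {
 public:
  explicit RoundRobinArbiter(unsigned n) : n_(n), last_(n - 1) { assert(n > 0 && n <= 32); }

  // Bit i of req set = master i requests the bus. Returns granted index or -1.
  int arbitrate(uint32_t req) {
    for (unsigned k = 1; k <= n_; ++k) {
      unsigned i = (last_ + k) % n_;
      if (req & (1u << i)) {
        last_ = i;
        return static_cast<int>(i);
      }
    }
    return -1;
  }

 private:
  unsigned n_;
  unsigned last_;
};
```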

    To build the Interconnect, I will make use of the components contributed by the gentlemen below:

    CPU Configuration

    The Ibex CPU configuration is shown as RV32IC, the I and the C indicating Integer and Compressed instruction set, respectively. I would like to include the extensions for integer multiplication and division (M) and bit manipulations (B) into the build as well. Those extensions are going to take up a considerable amount of space, however, and will also have an impact on timing closure. I’m going to defer the decision on those extensions until we have more insight into this project’s FPGA utilization and timing.
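    One practical consequence of the C extension is that instructions are 16 or 32 bits wide, and the lowest two bits of an instruction's first halfword tell them apart. A minimal sketch for RV32IC (the function name is hypothetical; longer encodings reserved by the spec are ignored here):

```cpp
#include <cassert>
#include <cstdint>

// 0b11 in the two lowest bits marks a 32-bit instruction; any other value
// marks a 16-bit compressed instruction.
constexpr unsigned insn_length_bytes(uint16_t first_halfword) {
  return (first_halfword & 0x3) == 0x3 ? 4 : 2;
}
```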

    Note that there’s no Instruction or Data Cache. Code executes directly from DPRAM or DDR memory. Data access also goes straight to DPRAM or DDR memory.

    The Black Box, and other Reconfigurable Partitions

    The Black Box Partition is an empty area in the FPGA’s floorplan. This is where you can insert your application-specific logic. Do you need hardware-assisted collision detection for your Bullet-Hell Shoot’em Up game? Put it in the Black Box. A DSP? A CORDIC core? More RAM? As long as it fits the floor plan, you can put it in the Black Box region. The Black Box has bus master and slave ports on both system buses.

    Notice that the Black Box sits inside RP_0, Reconfigurable Partition 0. A Reconfigurable Partition is a region on the FPGA where you can dynamically load a Reconfigurable Module (RM) into. Going back to the previous examples, the collision detector, DSP, CORDIC core, or RAM module, would be...


  • Key Components Part 3: DMA and Peripherals.

    Epsilon, 05/11/2022 at 14:43

    Let’s wrap up the selection of key components for the BoxLambda computer.

    DMA

    I was on the fence for a while, deciding whether or not I should include a DMA engine in our machine. In a previous post, I said I would use DMA to move data between external and internal memory. However, a DMA Controller is by definition a bus master, and having multiple bus masters (DMAC and CPU) adds significant complexity to the architecture: access to shared buses and slaves, impact on timing, etc. In a system with only one bus master, the CPU, you don’t have to worry about any of that.

    Then I snapped out of it and remembered that BoxLambda is intended to be a platform for RTL experimentation. It would be silly to restrict these RTL experiments to bus slave components only. In other words, the BoxLambda architecture is going to have to accommodate bus masters, so we might as well include a DMA Controller.

    Some use cases for DMA in the scope of our computer include:

    • Moving data between external (DDR) and internal (Block RAM) memory.
    • Streaming from memory to the audio DAC.
    • Blitting, i.e. copying data into video memory, taking into account the video memory’s organization. For instance, copying a rectangular block of data into a frame buffer requires striding between rows of pixel data. Another example: Bit planes with 1, 2, or 4 bits-per-pixel color depths require barrel shifting when copying data to a specific pixel offset.
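    The striding in the blitting use case can be illustrated with a plain-C model of a rectangular copy into an 8-bits-per-pixel frame buffer. A DMAC would perform the same address arithmetic in hardware; the function name and the byte-per-pixel layout are assumptions for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software model of a rectangular blit into an 8-bpp frame buffer.
 * Destination rows are 'dst_stride' bytes apart, so after each row the
 * destination pointer skips the pixels that lie outside the rectangle. */
void blit_rect_8bpp(uint8_t *dst, size_t dst_stride,
                    const uint8_t *src, size_t src_stride,
                    size_t width, size_t height)
{
    for (size_t row = 0; row < height; row++) {
        for (size_t col = 0; col < width; col++)
            dst[col] = src[col];
        dst += dst_stride;   /* stride: jump to the same column on the next row */
        src += src_stride;
    }
}
```

    Sub-byte color depths would additionally need the barrel shifting and masking mentioned above, which this sketch leaves out.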

    I spent many hours online searching for DMA controllers. I was a bit surprised that there were so few options, so I kept digging. I found ZipCPU’s, FreeCores’, and Ant Micro’s DMA controllers. The Ant Micro DMAC seemed to be the most interesting option, with two Wishbone ports, pipelined mode, striding support, and support for any byte boundary alignment.

    I had this post, with the Ant Micro selection, ready to go. But then I happened across an old post on Reddit where somebody proposed a ‘smart’ DMA concept: a DMAC with a tiny CPU embedded in it. That sounded like a great concept, so I pinged the author to check what became of his idea. In response, the author generously decided to release his code on GitHub! The core is called Praxos. Here is the repository:

    https://github.com/esherriff/Praxos

    Praxos has a tiny CPU with a small amount of program and data memory embedded in the core, allowing you to write microcode specifying the DMA behavior you want: word/non-word alignment, incrementing/decrementing/non-incrementing source and/or destination address, strides between transfers, combining sources, barrel shifting… Maximum flexibility!
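    To make the incrementing/decrementing/non-incrementing address modes concrete, here is a plain-C model of a transfer in which the source and destination addresses each increment, decrement, or stay fixed per word. The enum, struct, and function names are hypothetical; this is not Praxos’ microcode format, which is defined in its repository.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address modes: the enum value is the per-word address step. */
typedef enum { ADDR_INC = 1, ADDR_FIXED = 0, ADDR_DEC = -1 } addr_mode_t;

/* Hypothetical transfer descriptor. */
typedef struct {
    addr_mode_t src_mode;
    addr_mode_t dst_mode;
    size_t count;            /* number of 32-bit words to move */
} dma_desc_t;

/* Software model: moves d->count words, stepping each address per its mode. */
void dma_model_run(const dma_desc_t *d, const uint32_t *src, uint32_t *dst)
{
    ptrdiff_t s = 0, t = 0;  /* word offsets relative to the start pointers */
    for (size_t i = 0; i < d->count; i++) {
        dst[t] = src[s];
        s += d->src_mode;    /* +1, 0 or -1 word per transfer */
        t += d->dst_mode;
    }
}
```

    A fixed destination models streaming words into a single data register, such as an audio DAC’s FIFO; a decrementing source with an incrementing destination reverses a buffer.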

    It’s not perfect though. Praxos only has one bus master port, an Avalon port at that. It should be doable to slap a standard Wishbone port onto it, but in its current form, I think it won’t be able to take advantage of Wishbone’s pipelined burst mode. That’s unfortunate for a DMAC.

    Still, having the option to hack together my own application-specific DMA microcode sounds like a lot of fun. I just have to go with the Praxos option.

    Many thanks to esherriff for making his code available!

    Storage

    I’m going to use ZipCPU’s SD Card Controller in combination with the FatFs software library to mount a FAT filesystem on the SD card.

    The SD Card Controller has a Wishbone slave port.

    Keyboard and Mouse

    FreeCores has PS/2 keyboard and mouse modules: https://github.com/freecores/ps2

    These cores don’t have a Wishbone slave port, so we’re going to have to add that ourselves.

    Note that the Nexys A7 has a USB HID host interface for keyboard and mouse which, with the help of clever firmware on a PIC24 microcontroller, presents itself to the FPGA as a PS/2 interface. See the Nexys A7 Reference Manual for more details.
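    For reference, a PS/2 device sends each byte as an 11-bit frame: a start bit (0), eight data bits LSB first, an odd parity bit, and a stop bit (1). The hypothetical decoder below performs in C the kind of check a PS/2 receiver core does in hardware before handing a byte to the bus interface; packing the frame into a `uint16_t` with the first received bit in bit 0 is an assumption made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical decoder for one 11-bit PS/2 frame. Bit 0 of 'frame' holds the
 * first bit received. Layout: start(0), data0..data7 (LSB first), odd parity,
 * stop(1). Returns true and stores the data byte if the frame is valid. */
bool ps2_decode_frame(uint16_t frame, uint8_t *out)
{
    unsigned start  = frame & 1u;
    uint8_t  data   = (uint8_t)((frame >> 1) & 0xFFu);
    unsigned parity = (frame >> 9) & 1u;
    unsigned stop   = (frame >> 10) & 1u;

    unsigned ones = parity;            /* count ones over data + parity */
    for (int i = 0; i < 8; i++)
        ones += (data >> i) & 1u;

    /* odd parity: the total number of ones must be odd */
    if (start != 0u || stop != 1u || (ones & 1u) == 0u)
        return false;
    *out = data;
    return true;
}
```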

    I2C

    The I2C interface can be used to hook up a Real-Time Clock PMOD as well as a Wii Nunchuck Adapter.

    ZipCPU has an I2C core with a Wishbone port: https://github.com/ZipCPU/wbi2c...
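    Once the I2C core has read a report from the Nunchuck, software has to unpack it. Assuming the commonly documented 6-byte unencrypted report format (stick X/Y, the upper 8 bits of three 10-bit accelerometer axes, and a final byte holding the accelerometer LSBs plus active-low C and Z buttons), a decoder might look like this. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint8_t  joy_x, joy_y;          /* analog stick, 8 bits each */
    uint16_t acc_x, acc_y, acc_z;   /* accelerometer, 10 bits each */
    bool     button_c, button_z;    /* true when pressed */
} nunchuck_state_t;

/* Hypothetical decoder for an assumed 6-byte unencrypted Nunchuck report. */
void nunchuck_decode(const uint8_t r[6], nunchuck_state_t *s)
{
    s->joy_x = r[0];
    s->joy_y = r[1];
    /* upper 8 bits of each axis in bytes 2..4, lower 2 bits packed into byte 5 */
    s->acc_x = (uint16_t)((r[2] << 2) | ((r[5] >> 2) & 0x3));
    s->acc_y = (uint16_t)((r[3] << 2) | ((r[5] >> 4) & 0x3));
    s->acc_z = (uint16_t)((r[4] << 2) | ((r[5] >> 6) & 0x3));
    /* buttons are active low: a 0 bit means pressed */
    s->button_z = (r[5] & 0x01) == 0;
    s->button_c = (r[5] & 0x02) == 0;
}
```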


  • Key Components Part 2: Graphics and Sound Cores.

    Epsilon, 05/01/2022 at 11:55

    I spent some time researching graphics and sound options for BoxLambda. Here’s what I came up with.

    Graphics

    If you’re reading this, you must be into the build-your-own-computer thing, which probably means you’re aware of the super cool Commander X16 project. Frank van de Hoef created the very elegant VERA (Video Embedded Retro Adapter) module for the X16. Here’s a high-level specification, taken from the Commander X16 website:

    VERA module specifications:

    • Video generator featuring:
      • Multiple output formats (VGA, NTSC Composite, NTSC S-Video, RGB video) at a fixed resolution of 640x480@60Hz
      • Support for 2 layers, both supporting:
        • 1/2/4/8 bpp tile and bitmap modes
        • Support for up to 128 sprites (with inter-sprite collision detection).
      • Embedded video RAM of 128 KB.
      • Palette with 256 colors selected from a total range of 4096 colors.
    • 16-channel stereo Programmable Sound Generator with multiple waveforms (Pulse, Sawtooth, Triangle, Noise)
    • High-quality PCM audio playback from a 4 KB FIFO buffer featuring up to 48kHz 16-bit stereo sound.
    • SecureDigital storage.

    Other features, not mentioned in the blurb, include:

    • Fractional display scaling (scaling lower resolutions up to the 640x480 display resolution).
    • Horizontal and vertical smooth scrolling.
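    Some quick arithmetic shows why display scaling matters with 128 KB of video RAM: a full 640x480 bitmap at 8 bpp needs 307,200 bytes, more than twice the 131,072 bytes available. The helper below is a hypothetical illustration of that calculation.

```c
#include <stdint.h>

/* VERA's embedded video RAM, per the specification quoted above: 128 KB. */
#define VRAM_BYTES (128u * 1024u)

/* Bytes needed for one bitmap of width x height pixels at 'bpp' bits per
 * pixel (assumes width * bpp is a multiple of 8). */
uint32_t bitmap_bytes(uint32_t width, uint32_t height, uint32_t bpp)
{
    return width * height * bpp / 8u;
}
```

    At 2 bpp a 640x480 bitmap (76,800 bytes) fits, as does an 8 bpp bitmap at 320x240 scaled up to the 640x480 output.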

    Lucky for us, Frank recently released the VERA Verilog code under the generous MIT license. You can find the code here:

    https://github.com/fvdhoef/vera-module

    I’m not particularly interested in VERA’s PSG (Programmable Sound Generator), or the non-VGA output formats, so I might remove those from the build.

    The 128KB of video RAM will take a big chunk out of our available Block RAM resources, but it’ll be worth it. We’re getting a lot of bang for our buck.
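    To put a number on “a big chunk”: each 7-series 36 Kb block RAM holds 32 Kb of data plus 4 Kb of parity bits, so storing 128 KB of byte-wide data takes 32 blocks. The Arty A7-35T’s XC7A35T has 50 such blocks (1,800 Kb), so VERA’s video RAM alone would occupy about 64% of them. This sketch assumes the parity bits go unused and ignores how the synthesis tool actually maps the memory.

```c
#include <stdint.h>

/* Number of 36 Kb block RAMs needed to hold 'bytes' of byte-wide data,
 * assuming only the 32 Kb data portion of each block is used (the 4 Kb of
 * parity bits stay idle for plain byte storage). */
uint32_t bram36_blocks_needed(uint32_t bytes)
{
    const uint32_t data_bits_per_block = 32u * 1024u;
    uint32_t bits = bytes * 8u;
    return (bits + data_bits_per_block - 1u) / data_bits_per_block;  /* round up */
}
```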

    Note that VERA is designed to sit on its own FPGA, connected to the host through an SPI slave interface. Some modifications will be required to integrate it into our SoC.

    Xosera

    I also considered, but eventually dismissed, Xosera:

    https://hackaday.io/project/173731-xosera-fpga-based-retro-video-graphics.

    Xosera is a VERA-inspired video controller, but it is being developed independently by Xarc. I like the Amiga-style Copper processor that they added. Unfortunately, Xosera doesn’t have hardware sprites. That’s a showstopper for me. I’ll keep my eye on this project though. It’s an active project and features are still being added.

    Sound

    A sound core is a perfect candidate for Partial FPGA Reconfiguration. There are a lot of options (Wave-Table synthesis, FM synthesis, PSG…) and a lot of open-source cores available. It would be pretty cool if a software application could just load its synthesizer of choice into the FPGA as part of the program.

    Pretty much any core developed by Jotego sounds like a great idea.

    Technically, I don’t have to select a sound core. We already have sound through VERA’s PCM audio playback. I’m going to select a sound core anyway because I like retro sounds and I’d like to mess around a bit with one of the old-school PSG chips.

    I think I’ll go for a dual YM2149, one for music, one for sound FX, in a game context. The YM2149 was the Atari ST’s sound chip, so we’ll have a large music and sound FX archive at our disposal. Jotego developed an FPGA clone of the YM2149, the JT49:

    https://github.com/jotego/jt49
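    The pitch of each YM2149 tone channel is set by a 12-bit tone period TP, with the output frequency given by f = clock / (16 × TP), the same formula as the AY-3-8910 it derives from. Assuming JT49 behaves like the original chip, which is the point of a clone, a hypothetical helper for computing TP looks like this.

```c
#include <stdint.h>

/* Hypothetical helper: tone period register value for a desired frequency,
 * using f = clock / (16 * TP), rounded to the nearest integer and clamped to
 * the 12-bit register range. */
uint16_t ym_tone_period(uint32_t clock_hz, uint32_t freq_hz)
{
    if (freq_hz == 0u)
        return 0xFFFu;                      /* lowest pitch available */
    uint32_t tp = (clock_hz + 8u * freq_hz) / (16u * freq_hz);
    if (tp < 1u)
        tp = 1u;
    if (tp > 0xFFFu)
        tp = 0xFFFu;                        /* 12-bit register limit */
    return (uint16_t)tp;
}
```

    With the Atari ST’s 2 MHz chip clock, A4 (440 Hz) gives TP = 284, which plays at about 440.1 Hz. For channel A, the low 8 bits go into register 0 (fine tune) and the upper 4 bits into register 1 (coarse tune).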

    Why not VERA PSG?

    The only reason I’m not going for VERA PSG is that, as of yet, very little music has been written for it. I’m sure it is a perfectly adequate PSG implementation.

    Why not SID?

    The SID chip is partially analog, making it much harder to emulate correctly on an FPGA. Also, while I like SID, I’ve probably heard enough SID music to last me a lifetime. I’m currently more interested in finding out what other retro sound chips have to offer.

    Interesting Links


  • Key Components Part 1: Bus, Microprocessor and Memory Controller.

    Epsilon, 04/23/2022 at 17:08

    In the previous post, we discussed top-level requirements. Now we drill down one level, identify key components and apply our requirements to them. We also look around for existing cores or applicable specs that might fit the bill.

    The Bus

    The Bus, or interconnect, is the fabric stitching together the SoC internal components. For this project, the two most relevant SoC internal bus specifications are ARM’s AXI bus and the Open-Source Wishbone bus.

    AXI is very powerful, very popular, and very complex. It scales up well to very big SoCs. However, I don’t think it scales down very well to simple SoCs, such as BoxLambda, where low latency and low complexity are more important than high bandwidth and scalability. Hence, for this project, I’m electing to go with Wishbone.

    We’ll be using the Wishbone B4 specification.

    Sticking to a well-defined internal bus specification certainly helps to meet the Modular Architecture Requirement. Whether we can also accommodate Partial FPGA Reconfiguration using a Wishbone Interconnect remains to be seen.
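    On a shared Wishbone bus, the interconnect routes each master request to exactly one slave by decoding the address. The memory map below is hypothetical, since this post does not define BoxLambda’s map; it only illustrates the decoding, using size-aligned power-of-two regions so that each match is a mask-and-compare.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memory map, for illustration only. */
typedef struct {
    const char *name;
    uint32_t base;
    uint32_t size;   /* bytes; power of two, base aligned to size */
} slave_region_t;

static const slave_region_t memory_map[] = {
    { "dpram", 0x00000000u, 0x00020000u },
    { "ddr",   0x20000000u, 0x10000000u },
    { "uart",  0x10000000u, 0x00000100u },
    { "timer", 0x10000100u, 0x00000100u },
};

/* Returns the index of the slave whose region contains 'addr', or -1 if no
 * slave claims it (an interconnect would then typically signal a bus error). */
int decode_slave(uint32_t addr)
{
    for (size_t i = 0; i < sizeof memory_map / sizeof memory_map[0]; i++) {
        if ((addr & ~(memory_map[i].size - 1u)) == memory_map[i].base)
            return (int)i;
    }
    return -1;
}
```

    In RTL, each of these comparisons becomes one small comparator driving that slave’s select line.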

    The Processor

    Processor Word Size

    Typical processor word sizes are 8-bit, 16-bit, 32-bit, and 64-bit. Which word size is the best fit for BoxLambda?

    • 8-bit: A good word size.
      • Pros:
        • An 8-bit word (i.e. a byte) is a good natural fit for a pixel value, an ASCII character code, or small integer values.
        • 8-bit processors, their programs, and their data are very compact.
        • 8-bit processors side-step some of the alignment issues seen with larger word sizes.
      • Cons:
        • An 8-bit word is too small to conveniently hold the values you need in a typical program - think calculations and table indices.
        • Toolchain support for higher-level languages is limited.
    • 16-bit: A clumsy compromise between 8-bit and 32-bit. It made sense when 32-bit processors were not yet readily available. Now, not so much.
    • 32-bit: Another good word size.
      • Pros: 32-bit words can hold most real-world numbers and cover a huge address space. 32-bit machines generally have good toolchain support.
      • Cons: Much bigger than its 8-bit counterpart, in terms of FPGA real estate, program size as well as data size.
    • 64-bit: A big and clunky word size, way too big to handle conveniently, intended for specialized use cases that don’t fit this project.
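    The 8-bit con about table indices is easy to demonstrate: an 8-bit index wraps from 255 back to 0, so walking a 300-entry table needs multi-byte arithmetic, while a 32-bit index does not. For comparison, classic 8-bit CPUs such as the 6502 and Z80 use 16-bit addresses and reach 64 KiB, while a 32-bit address reaches 4 GiB. The function names are illustrative only.

```c
#include <stdint.h>

/* An 8-bit index wraps at 256; a 32-bit index keeps counting. */
uint8_t  next_index8(uint8_t i)   { return (uint8_t)(i + 1u); }
uint32_t next_index32(uint32_t i) { return i + 1u; }

/* Size in bytes of the address space reachable with an n-bit address
 * (valid for bits < 64). */
uint64_t address_space_bytes(unsigned bits) { return (uint64_t)1 << bits; }
```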

    I’ve decided to go for a 32-bit processor. A 32-bit processor (and associated on-chip memory) will take a bigger chunk out of our FPGA real estate, but I think it’s worth it. I like the convenience of 32-bit registers, and a 32-bit processor is more likely to come with a regular GCC toolchain.

    Processor Features

    Next to a 32-bit word size, we’re looking for the following features for our microprocessor:

    • Ease of programming, meaning:
      • Easy and well-documented Instruction Set Architecture (ISA). We want to be able to program the machine at the assembly language level.
      • Shallow Pipeline: It is relatively easy to reason about the behavior of a processor with a two-stage pipeline. It is not very easy to reason about the behavior of a processor with a six-stage pipeline.
      • Good toolchain support, such as GCC, so we can build a software ecosystem for our machine.
    • An accessible and well-documented implementation.
    • Has to fit our FPGA, with enough space to fit the other components.

    With all that in mind, I think RISC-V is a great option.

    • Great ISA, building on lessons learned from previous popular processor architectures.
    • 32-bit support.
    • GCC toolchain support.
    • Open-Source.
    • Well-documented.
    • Very fashionable. Let’s ride that wave :-)
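    One example of the ISA’s clean design, relevant to the RV32IC configuration: 16-bit compressed instructions and 32-bit instructions can be mixed freely, and the decoder tells them apart from the two lowest bits of the first 16-bit parcel (any value other than 0b11 means compressed). The helper below is a hypothetical sketch that only handles the 16- and 32-bit lengths an RV32IC core uses.

```c
#include <stdint.h>

/* Hypothetical helper: length in bytes of the RISC-V instruction whose first
 * 16-bit parcel is 'parcel'. Compressed (C extension) instructions have their
 * lowest two bits != 0b11. Longer (48-bit+) encodings are not handled. */
unsigned rv_insn_length(uint16_t parcel)
{
    return ((parcel & 0x3u) != 0x3u) ? 2u : 4u;
}
```

    For example, `c.nop` encodes as 0x0001 and is 2 bytes long, while the 32-bit `nop` (`addi x0, x0, 0`) encodes as 0x00000013 and is 4 bytes long.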

    There are a lot of RISC-V implementations to choose from. The Ibex project seems like a good choice:

    • 32-bit RISC-V.
    • High-quality, well-documented implementation.
    • SystemVerilog based. My preferred HDL.
    • Supports a small two-stage pipeline parameterization.
    • Very active project.

    The Memory Controller

    SDRAM memory access is pretty complicated. Memory access requests get queued in the memory controller, scheduled, and turned into a sequence of commands that vary in execution time depending on the previous memory...

