VerilogBoy - GameBoy on FPGA

A Pi emulating a GameBoy sounds cheap. What about an FPGA?

Coding for fun - the hard way. Trying to implement a Game Boy in Verilog. This was my course final project for CMPEN275 (Digital Design Laboratory) at PSU; now it is more of an independent personal project for fun (again). I am trying to keep it well commented and documented.


This project is an open source Game Boy® compatible console Verilog RTL implementation. This project was my course project for the CMPEN 275 at Penn State back in 2017.

Original Goals for the CMPEN 275 course project

This project aims to recreate the whole Game Boy gaming system on an FPGA development board, with the ability to play commercial games like The Legend of Zelda with no major glitches.

To be specific, it should be able to run the unmodified Game Boy machine code, produce gray-scale graphics and output to an external monitor, produce the sound and output to the 3.5mm jack on the FPGA board, and accept user input to control the game. Other functionalities like serial communication and IR communication are currently not part of this project.

System Architecture

The main system architecture is designed as follows (outdated):

Three major parts need to be implemented: the Game Boy CPU (an 8-bit CISC processor, Intel 8080-like), the PPU (or GPU), and the sound unit. Several interfacing modules are needed to support the IO capabilities provided by the FPGA development board. The game ROM is stored in on-board NOR flash, and RAM is implemented with on-chip Block RAM.

Progress

Able to run Is That a Demo in Your Pocket with sound. See demo video! (Please turn down the volume, I found the signal from the FPGA is too hot for my recording device).

Pokemon Yellow and The Legend of Zelda: Link's Awakening DX are also tested to work with DualShock 2 support ( I should have picked a Nintendo(R) controller instead of Sony(R) one... But that's what I have. ).

After finishing the presentation for the course project, I started refactoring the code, starting from the CPU. I have also built a verilated simulator for it so it could be easily debugged on PC.

Currently the CPU is mostly cycle accurate based on various test ROMs. However the PPU is not. It is capable of running many games already. 

pano_memtest.bit

Help me test if the LPDDR controller is working on the Pano Logic G1, see project log for details

bit - 728.81 kB - 02/24/2019 at 16:42


  • 3 year recap and future plans

    Wenting Zhang, 02/28/2021 at 20:09, 0 comments

    Hi everyone. It is now the end of February 2021 as I am writing this. The VerilogBoy project was started back in February 2018, so it has been 3 years since I started it. I would like to recap what has been achieved and the future plans for this project.

    What has been done

    Original Prototype on ML505

    I started this project as a course project for CMPEN275 (Digital Design Lab) back at Penn State. I implemented the whole thing on the ML505 FPGA dev board. By the end of the term I was able to get Pokemon running on the board as a demo. I wrote my own PSG, PPU, and timer, but reused the CPU from one of the open source FPGA GB projects.

    Unfortunately no recording of the presentation was available, otherwise personally I would really love to watch that again... The only thing I have from that time was a recording showing it running (captured via VGA, didn't capture sound):

    Improving the Code

    The original code worked, but not great. It failed a lot of the available test ROMs. After learning the internal architecture of the GB CPU (SM83), I rewrote the CPU myself, twice. It should now be cycle accurate.

    To help test everything, I have also created a Verilated simulator that runs on a PC. It can load a ROM and run the code in the simulated VerilogBoy. On Apple M1 Macs, it runs at close to real-time speed (4 MHz). For reference, back in 2018 I used the isim bundled with Xilinx ISE to simulate the design to aid debugging, and that ran at around 2 kHz.

    Building a handheld

    I have been working on creating an FPGA based handheld device that would work with my VerilogBoy.

    I built 2 revisions of the device in total:

    Rev 0.0 in September 2018. It was the first revision, with a dedicated MCU for controlling the hardware. But I had some serious bugs in the design.

    Rev 0.1 in December 2020. I removed the dedicated MCU because I thought I could just use FSMs and reuse the main GB CPU (it turned out to be a bad idea), and fixed several bugs. This is the revision I showed people at VCF and the Latch-Up conference.

    See a demo here: https://twitter.com/zephray_wenting/status/1119956214752907264

    After that I was working on the design of Rev 0.2, with a dedicated MCU and a dedicated MIPI DSI bridge chip, but I didn't have much time in 2020 to really execute the project. So that stays as an unfinished PCB design.

    Porting to the Pano Logic

    I have a whole series of project logs about porting this thing to the Pano Logic G1 device, which is a small FPGA thin client:

    A detour to the Pano Logic G1 (1) - LPDDR

    A detour to the Pano Logic G1 (2) - Cache

    A detour to Pano Logic G1 (3) - UART & Hard fault

    A detour to Pano Logic G1 (4) - USB

    It worked at last, and I am quite happy about that.

    Going forward

    Possible improvements to the code

    The first thing would be the PPU. I never got to refactor the PPU code. It is quite messy. It also fails a lot of the cycle accuracy tests.

    The second thing would be to extend the code to support GBC mode. This could potentially be combined with the first one, because the GBC mostly demands an enhanced PPU.

    I do not have plans to do any of these in the near future.

    The handheld

    Though I didn't have much time to play with the handheld design back in 2020, I am hoping to spend more time on it in 2021. I have renamed the handheld to Fobu to avoid confusion, and also to denote a shift of focus. Rather than a handheld designed only to run VerilogBoy, it will be a handheld targeting 2 use cases:

    1. Handheld FPGA chiptune player

    2. FPGA retro gaming in both handheld and docked mode

    The project would have its own page: https://hackaday.io/project/177963-fobu

  • A detour to Pano Logic G1 (4) - USB

    Wenting Zhang, 02/28/2021 at 17:31, 0 comments

    I promised that there would be an update about the USB, so here it is. This update only talks about the USB 2.0 host on the Pano Logic G1 devices with the ISP1760 USB host controller, so it is probably not applicable to other platforms. The goal here is to write the RTL and a software stack that make it possible to use USB HID joysticks/gamepads and USB mass storage devices on the Pano G1. Speed is not a concern here.

    Hardware Connections

    On the Pano G1, a USB 2.0 high-speed host controller (Philips ISP1760) is connected to the FPGA via a 16-bit parallel memory bus. To mitigate one of the controller’s errata, a USB high-speed hub (SMSC USB2513) is connected to one of the downstream ports of the controller, and all user-accessible ports are connected to the USB hub.

    Overall Architecture

    Though it might be possible to write an FSM to implement some basics of the USB host protocol stack, it is just not very practical (the device side might be more practical, though). The solution I have here is to continue using the PicoRV32 soft-core processor on the FPGA. The host controller is mapped into the PicoRV32’s memory space as MMIO, so the software driver and protocol stack can run on the PicoRV32. When needed, certain outputs can be exposed through MMIO GPIO (for example, the currently pressed keys). Hopefully, debugging software will be much easier than debugging hardware.

    Generally, there are several layers. The bottom layer is the HCD (Host Controller Driver), which is specific to particular hardware, such as the controller chip being used. Next comes the HD (Host Driver), which handles device enumeration and USB hub communication; this layer is no longer platform-specific. Above that is the class driver, which, as the name suggests, is specific to a device class, for example the HID class or the audio class. The driver might be generic to the whole class, or it might support just one or a few specific USB devices. Above that is usually the operating system or, in our case, the user application.

    In conclusion, there are 5 things I need to do:

    1. Write the RTL to interface ISP1760 with PicoRV32
    2. Port or write the HCD for ISP1760
    3. Port some appropriate host driver over
    4. Port or write the class driver for HID gamepad and mass storage devices
    5. Write some application code to use the driver

    Let’s discuss all these parts.

    MMIO interface

    This is the easiest part. It is a pretty standard parallel asynchronous MMIO interface. There are DMA modes, but I am not going to support them, so forget about those. The address space is 64 KB, so there should be 16 address lines. As the data bus is 16 bits wide and accesses are always half-word aligned, the lowest line is omitted, leaving 15 address lines. However, the chip has A[17:1], 17 address lines in total. According to the datasheet, the upper 2 bits denote the page currently being accessed, and the device allows multiple pages to be open at the same time… That sounds like SDRAM. As I said, speed is not a concern here, so they are never used in my code. Otherwise it is fairly standard. To make it even simpler, I didn’t do the bit-width conversion logic: the low 16 bits of the device's data bus are connected to the low 16 bits of the CPU's data bus, and the high 16 bits are left unconnected. Thus it occupies twice the address space it should, and all addresses must be shifted by one bit to compensate for the wasted bit width.
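As a rough sketch of what that address shift looks like from the software side: the base address `ISP1760_BASE` and the helper names below are made up for illustration and are not taken from the VerilogBoy source. The sketch reads the description as "each 16-bit register occupies one 32-bit CPU word", so a chip offset is doubled (shifted left by one) to get the CPU offset.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical base of the ISP1760 window in the PicoRV32 address map. */
#define ISP1760_BASE 0x04000000u

/* Convert an ISP1760 byte offset into a CPU address.
 * Each 16-bit register takes up one 32-bit CPU word, so the offset doubles. */
static inline uint32_t isp_cpu_addr(uint32_t isp_offset) {
    assert((isp_offset & 1u) == 0);   /* accesses are always half-word aligned */
    return ISP1760_BASE + (isp_offset << 1);
}

/* Read one 16-bit register; the upper 16 bits of the CPU bus are unconnected,
 * so they are masked off. Only meaningful on the target hardware. */
static inline uint16_t isp_read16(uint32_t isp_offset) {
    volatile uint32_t *reg = (volatile uint32_t *)(uintptr_t)isp_cpu_addr(isp_offset);
    return (uint16_t)(*reg & 0xFFFFu);
}
```

With this mapping, the 64 KB register space of the chip spans 128 KB of CPU address space.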

    HUBs

    Before I talk about the USB protocol stack, I want to talk more about USB hubs. The architecture of Pano G1’s USB is basically, full of hubs. In the hardware, there is one USB Hub chip, but that’s not all.

    According to the EHCI (Enhanced Host Controller Interface) specification (the standard USB 2.0 host controller specification), a USB 2.0 high-speed host controller can only communicate with high-speed devices. No full-speed or low-speed devices are possible here. To implement a USB host that is compatible with both types of devices,...


  • Demo firmware for Pano Logic G1

    Wenting Zhang, 04/14/2019 at 17:11, 1 comment

    Usage:

    1. Flash flash.mcs into your Pano G1 using iMPACT. Warning: this will overwrite the SPI flash content (including the original firmware); back it up first if you wish to restore it later.
    2. Format a USB thumb drive to FAT16 or FAT32, put game files (*.gb) into the root directory, plug it into the Pano G1.
    3. Find a USB HID compliant controller (like DualShock 4), plug it into the Pano G1.
    4. Power on the Pano G1 and follow the on screen instructions.
    5. If you encounter issues with USB (lots of NACKs showing on screen), try rebooting the Pano G1 (by pressing the button) or try another USB device.

    There are a lot of bugs. Feel free to open issues on GitHub.

    Download:

    https://github.com/zephray/VerilogBoy/releases/tag/v0.1

  • Upcoming demos and talk

    Wenting Zhang, 04/10/2019 at 00:04, 0 comments

    Hi all,

    It has been another 2 weeks since my last update. A lot of things are going on for this project:

    The USB host stack is working, with working USB HID driver and USB Mass Storage Driver

    The refactored VerilogBoy CPU is working now, passing all unit tests from the first revision, plus Blargg's CPU test.

    These two together mean that some games are actually running on the machine:

    I am planning to do a writeup about the USB stack, and maybe one about unit testing in the future, but for now, most importantly:

    I will be doing a demo of VerilogBoy at VCF SE during Apr 27-28 and LatchUp during May 4-5.

    I am also going to give a talk about this project at LatchUp.

    So now the top priority for me is to get these things right, which means I will probably not have time to write much here. Anyway, thanks for reading this quick update, and I hope to see you at VCF SE and LatchUp!

  • A detour to Pano Logic G1 (3) - UART & Hard fault

    Wenting Zhang, 03/26/2019 at 00:40, 0 comments

    It has been a while since my last update. I have been working on the USB stuff for the Pano Logic G1, mainly for connecting to joysticks and flash drives. I was concerned that my LPDDR and cache would cause me some trouble when code is executed from RAM, but so far they are holding up well. I will talk more about them in the next log. In this log I would like to talk a little about some debugging utilities, namely the UART and the hard fault.

    UART

    UART is very handy when you want to see logs from the device. At first I thought I could get away with just using a VGA text terminal, but it soon turned out that 80x30 text is simply not enough. Unfortunately, the Pano Logic doesn't have any serial ports. From the schematics, it seems that it originally had one, but it was removed after some revisions. Anyway, I have to repurpose some IOs to create a serial port for myself.

    This is not new to the Pano Logic; Skip Hansen from the PanoMan project has already done this: he soldered a wire to the LED pin and got the serial output from there. As mentioned in a previous log, I do not have a soldering iron with me currently, so I need to find some other way.

    As an alternative, I used the wire clips that came with my logic analyzer. They can be attached to through-hole components easily, such as this VGA connector.

    I am using the VGA SCL pin for the serial port. I wrote an extremely simple UART transmitter to transfer the data: https://github.com/zephray/VerilogBoy/blob/refactor/target/panog1/fpga/simple_uart.v. Why don't I just use Skip's UART transmitter core for the Pano Logic G1? Or why don't I just build on top of Tom and Skip's project? Well, the (stupid) answer is the same as for why I picked PicoRV32 rather than VexRiscv: I decided (long ago) to call this project VerilogBoy, so all the source code should be written in Verilog, not SpinalHDL. Yes, I am also aware that there are tons of better open-source UART controllers written in Verilog available online. I am probably just too lazy to find one, considering I don't need the other fancy functions anyway.

    Using this UART transmitter is also very easy: hook it up to the bus and write to the only register available, the data register. It has two possible operation modes, depending on the way it is connected. If the ready signal is connected to the ready signal of the bus, the UART transmitter blocks code execution until it finishes the transmission. This means the UART print function can be as simple as:

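    #include <stdint.h>

    /* UART data register; in blocking mode a write stalls until the byte is sent */
    #define UART_TXDR (*(volatile uint32_t *)0x03000100)
    void uart_print(const char *str) {
        while (*str) UART_TXDR = (uint8_t)*str++;
    }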

    There is no need to worry about whether the transmission has finished or whether a FIFO has overrun, since the bus simply throttles the write speed to the transmission speed.

    Alternatively, if the ready signal is connected to the external interrupt input of the CPU, it can generate an end-of-transfer interrupt, and a software FIFO can then be implemented.
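A minimal sketch of such a software FIFO is shown below. It is not the code from the repository: the names `uart_putc` and `uart_tx_irq` and the FIFO size are made up, and the data register is replaced by a plain variable (`uart_txdr_sim`) so the logic can be tried on a PC. On the real target, `UART_TXDR` would point at the data register again, and `uart_putc` would need interrupts masked around its body to avoid racing with the handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TX_FIFO_SIZE 64u
static_assert((TX_FIFO_SIZE & (TX_FIFO_SIZE - 1u)) == 0, "size must be a power of two");

static volatile uint32_t uart_txdr_sim;      /* stand-in for the real data register */
#define UART_TXDR uart_txdr_sim

static volatile uint8_t  tx_fifo[TX_FIFO_SIZE];
static volatile uint32_t tx_head, tx_tail;   /* head: next write, tail: next read */
static volatile bool     tx_busy;            /* true while a byte is being sent */

/* Queue one byte for transmission; returns false if the FIFO is full. */
bool uart_putc(char c) {
    if (!tx_busy) {                          /* transmitter idle: send right away */
        tx_busy = true;
        UART_TXDR = (uint8_t)c;
        return true;
    }
    if (tx_head - tx_tail == TX_FIFO_SIZE)   /* no room left */
        return false;
    tx_fifo[tx_head % TX_FIFO_SIZE] = (uint8_t)c;
    tx_head = tx_head + 1;
    return true;
}

/* To be called from the end-of-transfer interrupt handler. */
void uart_tx_irq(void) {
    if (tx_head != tx_tail) {                /* more data queued: send next byte */
        UART_TXDR = tx_fifo[tx_tail % TX_FIFO_SIZE];
        tx_tail = tx_tail + 1;
    } else {
        tx_busy = false;                     /* nothing left: go idle */
    }
}
```

Compared with the blocking version, the CPU only pays for a memory write per character, at the cost of dropping (or having to retry) characters when the FIFO is full.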

    Hard fault

    Are you familiar with the Hard Fault in microcontrollers, the Segmentation Fault in Linux, or "The program has stopped working" in Windows? Usually they refer to the same thing: the code is accessing some memory it shouldn't touch. Most commonly, it is dereferencing a null pointer (which is nothing more than a pointer that points to address 0). I hate this error: it is so common, yet not easy to debug. However, I have to admit that it is the one that points out the bug in my code; debugging would be even harder without anything telling me where things went wrong.

    (It simply hangs when such things happen. It can be hard to determine what caused the issue on real hardware.)

    Unfortunately, the processor core itself cannot detect such errors, as it doesn't really know what is right or wrong. Generally, an MMU or MPU is in charge of this: the program defines the valid address range (physical or virtual addresses, depending on the specific environment). The MMU or MPU then generates an exception when an illegal memory access happens. On systems without...


  • A detour to the Pano Logic G1 (2) - Cache

    Wenting Zhang, 03/02/2019 at 16:48, 0 comments

    As one of the conclusions of the last log, in order to use the LPDDR memory in 32-bit mode on the Pano Logic G1, a cache is almost a must. Sure, I could just use 16-bit mode; half the capacity (16 MB) isn't really an issue for me... But I still decided to implement a cache; it shouldn't be that hard.

    So, as a result, I have got a cache working on the Pano Logic G1. It is an 8 KB 2-way set-associative cache. The replacement policy is LRU and the write policy is write-back. (The whole point of having a cache is that write-through is almost impossible given that the data mask cannot be used.) It sits between the PicoRV32 CPU and the MIG memory controller, so all reads and writes to the LPDDR are cached. I won't go into details about the cache, since I feel there is nothing special worth talking about except that it is slow and inefficient. I will add a bus master arbiter between the PicoRV32 and the cache in the future, so the Game Boy CPU can access the LPDDR as well. One needs to keep in mind, though, that this is only a 2-way set-associative cache, so having multiple masters would lead to very questionable performance.
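For readers who have not built one before, here is a small behavioral model in C of how a cache with these parameters decides between a hit, a miss, and a miss that needs a flush. It is only an illustration, not a translation of the RTL: the 32-byte line size is an assumption (the log does not state it), and all names are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CACHE_BYTES 8192u
#define WAYS        2u
#define LINE_BYTES  32u   /* assumed; the log does not give the line size */
#define SETS        (CACHE_BYTES / (WAYS * LINE_BYTES))   /* 128 sets */
static_assert(WAYS == 2u, "the single LRU index below only works for two ways");

typedef struct {
    bool     valid[WAYS];
    bool     dirty[WAYS];
    uint32_t tag[WAYS];
    uint8_t  lru;          /* index of the least recently used way */
} cache_set_t;

static cache_set_t sets[SETS];

static uint32_t set_index(uint32_t addr) { return (addr / LINE_BYTES) % SETS; }
static uint32_t line_tag(uint32_t addr)  { return addr / (LINE_BYTES * SETS); }

/* Returns true on a hit. On a miss the LRU way is refilled (write-allocate),
 * and *flush is set when the evicted line was dirty and must be written back. */
bool cache_access(uint32_t addr, bool is_write, bool *flush) {
    cache_set_t *s = &sets[set_index(addr)];
    uint32_t tag = line_tag(addr);
    *flush = false;
    for (uint32_t w = 0; w < WAYS; w++) {
        if (s->valid[w] && s->tag[w] == tag) {
            s->dirty[w] = s->dirty[w] || is_write;
            s->lru = (uint8_t)(w ^ 1u);   /* the other way becomes LRU */
            return true;
        }
    }
    uint32_t victim = s->lru;
    *flush = s->valid[victim] && s->dirty[victim];
    s->valid[victim] = true;
    s->dirty[victim] = is_write;          /* a write miss leaves the new line dirty */
    s->tag[victim]   = tag;
    s->lru = (uint8_t)(victim ^ 1u);
    return false;
}
```

Three addresses that map to the same set are enough to force an eviction, which is why sharing a 2-way cache between several bus masters can hurt so much.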

    So, what about the performance? Currently:

    • Read hit: 2 cycles
    • Write hit: 2 cycles
    • Read miss: 4 cycles + memory read latency
    • Write miss: 5 cycles + memory read latency
    • Read miss + flush: 12 cycles + memory write latency + memory read latency
    • Write miss + flush: 13 cycles + memory write latency + memory read latency

    So you can see... the cost of a miss is high, and the cost of a flush is very high.
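To put numbers on that, the cycle counts listed above can be turned into a small cost model. This is just arithmetic on the figures from the list; the function names are made up, and the memory latencies are parameters because the log does not state them.

```c
#include <assert.h>

/* Access kinds, matching the list of cycle counts above. */
typedef enum {
    READ_HIT, WRITE_HIT, READ_MISS, WRITE_MISS,
    READ_MISS_FLUSH, WRITE_MISS_FLUSH
} access_kind_t;

/* Cycles for one access; mem_rd/mem_wr are the memory latencies in cycles. */
unsigned access_cycles(access_kind_t kind, unsigned mem_rd, unsigned mem_wr) {
    switch (kind) {
    case READ_HIT:         return 2;
    case WRITE_HIT:        return 2;
    case READ_MISS:        return 4 + mem_rd;
    case WRITE_MISS:       return 5 + mem_rd;
    case READ_MISS_FLUSH:  return 12 + mem_wr + mem_rd;
    case WRITE_MISS_FLUSH: return 13 + mem_wr + mem_rd;
    }
    return 0;
}

/* Average cycles per read for a mix of hits, clean misses and dirty misses. */
double avg_read_cycles(unsigned hits, unsigned misses, unsigned flushes,
                       unsigned mem_rd, unsigned mem_wr) {
    unsigned total = hits + misses + flushes;
    assert(total > 0);
    unsigned cycles = hits    * access_cycles(READ_HIT, mem_rd, mem_wr)
                    + misses  * access_cycles(READ_MISS, mem_rd, mem_wr)
                    + flushes * access_cycles(READ_MISS_FLUSH, mem_rd, mem_wr);
    return (double)cycles / total;
}
```

For example, with made-up latencies of 20 cycles for both memory reads and writes, 90 hits, 8 clean misses and 2 dirty misses average out to 4.76 cycles per read, more than double the 2-cycle hit cost even at a 90% hit rate.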

    Also, due to my bad coding and a limitation of the Spartan-3E's block RAM (it does not support byte enables, which are important for a cache that allows byte-wide writes), the whole design uses 1500 more LUTs than the 16-bit non-cached version. I assume most of that comes from the cache, and some from the 32-bit memory controller.

    But anyway... It Works™.

    How to use:

    Make sure g++, riscv32-unknown-elf-gcc, and ISE 14.7 are installed. The RV32 gcc should support -march=rv32i.

    Clone the VerilogBoy GitHub repo, check out commit b08377d (Merge branch 'refactor').

    Run the following commands:

    cd tools/bin2mif
    g++ -o bin2mif bin2mif.cpp
    cd ../../target/panog1/fw
    make
    cp *.mif ../fpga

    Go to target/panog1/fpga, open the project pano_top.xise with ISE 14.7 and generate programming file.

  • A detour to the Pano Logic G1 (1) - LPDDR

    Wenting Zhang, 02/24/2019 at 16:42, 9 comments

    As I mentioned in the previous update, I am still having some trouble with the MIPI-DSI. Currently I do not have access to any soldering tools, so the plan to make a new revision of the prototype needs to be postponed. In the meantime, I thought it might be a good idea to continue working on the RTL – I started refactoring the code but haven’t finished yet. But I need a hardware platform to test on. Well, I forgot to bring my FPGA development board (Xilinx ML505, I really loved that board) with me when I came back from the Christmas holiday… But no problem, I got myself two Pano Logic thin clients (G1 and G2) last year. Though I have to admit, I didn’t do much with these units after I got them. Now the time has come, so let’s take a look.

    We have something to hack

    (The image shows my own Pano Logic G1)

    In case you are not familiar with them, let me introduce them first. They were originally thin clients, used to connect to remote desktop servers. What is special about the Pano boxes is that they are powered by FPGAs, rather than the ARM or x86 CPUs commonly found in thin clients. They were advertised as “Zero Clients”, meaning there is no (zero) software running on the client. Well, unfortunately for them, the company went bankrupt in 2013. Fortunately for us, these units became useless to the companies that originally bought them, and are being sold for very low prices on places like eBay. It is our turn to repurpose these devices! Of course, Hackaday has already featured them several times: https://hackaday.com/2013/01/11/ask-hackaday-we-might-have-some-fpgas-to-hack/, https://hackaday.com/2018/12/07/racing-the-beam-on-a-thin-client-in-fpgas/, https://hackaday.com/2019/01/11/pac-man-fever-comes-to-the-pano-logic-fpga/, and https://hackaday.com/2019/02/11/two-joysticks-talk-to-fpga-arcade-game-over-a-vga-cable/.


    As far as I know, there are 3 generations of Pano Logic clients; the first two look very similar, and the third is slimmer. Unfortunately, I have never seen a slim model on eBay. If you know anything about the slim model, please tell me, I am interested. The first-generation (G1) model is powered by a Xilinx Spartan-3E XC3S1600E FPGA (1600K system gates, which translates to around 30K LUT4s), with 32 MB of on-board LPDDR RAM. The second-generation model, depending on the revision, is powered by either a Xilinx Spartan-6 XC6SLX150 (Rev. B) or a Xilinx Spartan-6 XC6SLX100 (Rev. C), both with 128 MB of DDR3 memory. The one I own is a Rev. C. Both generations have already been reverse engineered by the community, notably by cyrozap, twj42, and Tom Verbeure. You can find more details about the Pano boxes here: https://github.com/tomverbeure/panologic, and https://github.com/tomverbeure/panologic-g2.

    Now, which generation should I focus on? Gen 2 is significantly more powerful, but is getting harder to find on eBay. Gen 1 is powerful enough for my purpose and can still be purchased easily on eBay. I want more people (if any) to be able to play with my VerilogBoy code, so I will go with Gen 1. Also, the framework I develop for Gen 1 devices might help others who are looking into playing around with their own G1s.

    Talking about the G1...

    So, the Gen 1 has already been reverse engineered, and someone even published its schematics online, so it should be trivial to just port the existing code to the G1, right? No. There are still several issues to be solved:

    • The G1 does not have any user GPIO. In order to attach a game controller, one would need to repurpose some of the IOs (like the PanoMan project, which used the I2C lines from the VGA port), or use a USB joystick, which means I need a host-side USB stack running on a soft core on the FPGA. As I have said, I hope more people can play with this, so I will go with the USB solution.
    • The G1 doesn’t have any on-board storage large enough to hold...

  • Revised handheld hardware architecture

    Wenting Zhang, 02/02/2019 at 14:36, 3 comments

    This is a project update about the hardware side of the VerilogBoy Handheld.

    After testing the previous prototype (Rev 0.1), I feel like several changes are required:

    • Add a dedicated DPI-to-DSI bridge chip. My poor implementation of the D-PHY transceiver simply couldn't meet the signal integrity requirements. This is an experimental change to test how much it can improve things without major changes to the board (for example, moving to 6 layers, adding decoupling capacitors that would interfere with the overall structure, etc.).
    • Replace the Micro-Type-B USB socket with Type-C socket. No Alt Fn or PD support is planned, just working under USB 2.0 FS slave (sink) mode.
    • Add a microcontroller to handle hardware initialization, RTC, and USB FS communication (for example, flashing new firmware to the on-board SPI flash).
    • Fix various incorrect component footprints.

    Here is the revised hardware architecture. This is probably overly complicated for a hobby project.

    Due to the closure of PCB manufacturers for the Chinese New Year, we are probably not going to see the new prototype (Rev 0.2) in February. I will continue working on the HDL side of the prototype and try to finish the CPU refactoring within the coming weeks.

    For the time being, thanks for reading.

  • Hello from the DSI screen

    Wenting Zhang, 01/28/2019 at 17:41, 0 comments


    Here it is: a 320x320 IPS MIPI DSI screen. The DSI is running at 268 MHz (256 MiHz), which is exactly 64 times the Game Boy pixel clock rate (4 MiHz). With every Game Boy pixel quadrupled, and every pixel being 16 bits (FYI, the GBC uses 15 bpp), it transmits the pixel output from the PPU to the screen perfectly in sync. The DSI controller is implemented in the FPGA. A custom boot ROM running on the Game Boy CPU takes care of the DSI controller initialization as well as the screen initialization. More details about the DSI are coming.
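The arithmetic behind the factor of 64 can be spelled out as a few lines of C. Two things here are assumptions rather than statements from the log: that "quadrupled" means each Game Boy pixel becomes a 2x2 block of screen pixels, and that the link carries one bit per DSI clock tick. The names are made up.

```c
#include <assert.h>
#include <stdint.h>

#define MIHZ            (1u << 20)    /* 1 MiHz = 2^20 Hz */
#define GB_PIXEL_CLK_HZ (4u * MIHZ)   /* Game Boy pixel clock from the log */
#define SCALE           2u            /* assumed: one GB pixel -> 2x2 screen pixels */
#define BITS_PER_PIXEL  16u           /* screen pixel format from the log */

static_assert((uint64_t)GB_PIXEL_CLK_HZ * SCALE * SCALE * BITS_PER_PIXEL <= UINT32_MAX,
              "link rate fits in 32 bits");

/* Bits that must be sent for every pixel the PPU produces. */
uint32_t bits_per_gb_pixel(void) {
    return SCALE * SCALE * BITS_PER_PIXEL;
}

/* Clock needed to keep up with the PPU, assuming one bit per DSI clock tick. */
uint32_t dsi_clock_hz(void) {
    return GB_PIXEL_CLK_HZ * bits_per_gb_pixel();
}
```

Under those assumptions, each PPU pixel costs 4 x 16 = 64 bits, so the link has to run at 64 x 4 MiHz = 256 MiHz, which is 268,435,456 Hz, the "268 MHz" quoted above.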

  • The Assembled PCB.

    Wenting Zhang, 01/14/2019 at 02:11, 1 comment

    Unfortunately due to the system design, I have to finish refactoring the CPU first before I can do some practical tests (which all need the BootROM to run on the CPU within the FPGA). But, at least, here is a nice looking board!



Discussions

Mesbah Uddin Mohammed Arif wrote 03/25/2019 at 18:24 point

the PCB looks great .. what CAD software did you use ?


Wenting Zhang wrote 03/25/2019 at 20:36 point

It was designed using PADS PCB, though I am planing to migrate to KiCAD in the next revision.


ivan003003 wrote 03/04/2019 at 21:52 point

This project is awesome !


Wenting Zhang wrote 03/25/2019 at 20:36 point

Thank you.


Dillon Nichols wrote 02/13/2019 at 19:07 point

This is way more ambitious than my CMPEN275 project. In fact, it's probably more ambitious than I would even tackle now as a hobby project. Regardless, I'd love to make one. I'll have to check back in the future and see how your design is going.


John Beaton wrote 02/13/2019 at 18:15 point

Well done! it's great to see projects like this, and to see your persistence too. Keep up the very interesting work.


Selina Zawacki wrote 01/28/2019 at 18:26 point

Looking forward to following your progress on this!


David Scholten wrote 01/13/2019 at 04:52 point

I'm just a casual viewer that drops by for the "fun" videos every now and then, but I'd just like to comment that I'm looking forwards to seeing your final integrated FPGA-boy unit one day/month/year. Keep up the good work and know that your Hackaday page will be loved by future employers for projects like this.


Wenting Zhang wrote 01/14/2019 at 02:07 point

Thank you. I am still working on this project. I am planning to release some new videos when the unit could at least run some demo, but that is probably still a few months from now.


David Galloway wrote 04/16/2018 at 07:32 point

Firstly. nice effort! You got it further than most.  Second, thank you for calling it an 8080 type of CPU ! kudos for that. Thirdly, for a nice technical article on band limited sound synthesis on the GB look here - > http://www.slack.net/~ant/bl-synth/
 - David


Wenting Zhang wrote 04/18/2018 at 03:11 point

Thank you both for the comment and the link to that article.

