
It just about works, but can’t run DOOM…

A project log for RetroMedleyCard

A retro emulation system on a <=2mm thick business card, using ideas borrowed from other projects and a sprinkling of insanity...

Eontronics, 06/30/2024 at 02:00

So the last few weeks of trying to patch together some working firmware before the competition ends have been… eventful.

On the plus side, I did finally manage to get the card running an emulator, controlled over USB and displaying to a monitor via a HDMI port!

Robert Doman’s Space Crawler demo for Gameboy running in an emulator on the business card
(Messy) Test setup with PCB connected to USB-C power, a keyboard via a USB-C dongle, a HDMI cable for display output, and full size SD card


However, there were a large number of difficulties along the way, and as a result I’ve had to make some pretty significant compromises (for this version of the board at least). Here is an overview of what works and what the limitations are (explained in more detail later on):

  1. Display - the DVI output itself works great, the PCB edge connector seems reliable and my monitor recognises the signal flawlessly. However, the speed of the QSPI bus has been limited to 16MT/s, for 64Mbps throughput, or ~52fps at 320x240 resolution in RGB565.
  2. USB - The card is controlled entirely via the USB Host interface of the ESP32-S3, supporting only HID keyboards at present. The RP2040’s USB interface is not configured, and the data wires are the wrong way around on the PCB anyway! Additionally, the PCB USB edge connectors are quite temperamental and require some fiddling to get a reliable data connection.
  3. Retro-Go launcher - about the only thing that works flawlessly is the launcher application built into Retro-Go - probably because I didn’t need to modify it!
  4. Emulators - currently I’ve only been able to get Gameboy emulation working under Retro-Go (though there are many emulators I haven’t tested yet). Other emulators crash immediately on startup, which I suspect is due to a lack of memory on the ESP32-S3.
  5. DOOM - crashes immediately on startup, which again I suspect is due to limited system memory.
  6. Audio output - Retro-Go appears to use the audio stream to govern the emulator speed, relying on the rate the DMA transfers samples to the I2S peripheral. I haven’t been able to get this to run without crashing, and naive attempts to drive the sigma-delta peripheral in software resulted in inconsistent game speeds and horrific sounding audio. So, audio is disabled for the time being :(
  7. “Sockets” - the audio and SD card “sockets” sort of work. Only certain SD cards actually make contact with the capacitor pads (presumably due to dimensional tolerances), and I misinterpreted a dimension drawing of the 3.5mm TRRS plug, so the tip fails to make contact in the “socket”.
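
The frame-rate figure in item 1 can be checked with a few lines of arithmetic. The sketch below assumes a 4-lane (quad) bus moving one bit per lane per transfer and ignores any command or addressing overhead, so it is an upper bound rather than a measured value. The function names are made up for illustration.

```c
#include <stdint.h>

/* Bits moved per second by a bus transferring `lanes` bits per clock. */
uint64_t bus_bits_per_second(uint32_t clock_hz, uint32_t lanes)
{
    return (uint64_t)clock_hz * lanes;
}

/* Upper bound on full-frame updates per second (no protocol overhead). */
double max_frame_rate(uint32_t clock_hz, uint32_t lanes,
                      uint32_t width, uint32_t height,
                      uint32_t bits_per_pixel)
{
    uint64_t bits_per_frame = (uint64_t)width * height * bits_per_pixel;
    return (double)bus_bits_per_second(clock_hz, lanes) / (double)bits_per_frame;
}
```

Calling `max_frame_rate(16000000, 4, 320, 240, 16)` divides 64,000,000 bits/s by 1,228,800 bits per frame, which lands just above 52 frames per second.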

So all in all, it mostly meets my requirements as a minimum viable demonstrator, but there is obviously a long way to go to reach what I originally envisaged when starting the project. Getting to this point at all required overcoming a bunch of hurdles in the development process, which at times felt like complete show-stoppers.


Getting a working display output

Going chronologically, the first major challenge was getting the RP2040 up and running as a video coprocessor, using the ESP32-S3 to program it on-the-fly.

I chose to keep the RP2040 firmware within the flash on the ESP32-S3, rather than on the SD card, as otherwise there would be no way of seeing errors if the SD card wasn’t connected. UF2 format was used for this as it seemed to give the smallest firmware file (compared to the .bin and .elf files) while also being easy to read data from.
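
To illustrate why UF2 is easy to read data from: the file is a sequence of independent 512-byte blocks, each carrying its own target address and payload. Below is a minimal sketch of validating and unpacking one block, based on the published UF2 layout. It is not the code used on the card, and the names `uf2_block_view_t`, `uf2_read_le32` and `uf2_parse_block` are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define UF2_BLOCK_SIZE          512u
#define UF2_MAGIC_START0        0x0A324655u /* "UF2\n" */
#define UF2_MAGIC_START1        0x9E5D5157u
#define UF2_MAGIC_END           0x0AB16F30u
#define UF2_FLAG_NOT_MAIN_FLASH 0x00000001u
#define UF2_MAX_PAYLOAD         476u

/* Hypothetical view onto one block; `payload` points into the caller's buffer. */
typedef struct {
    uint32_t target_addr;
    uint32_t payload_size;
    uint32_t block_no;
    uint32_t num_blocks;
    const uint8_t *payload;
} uf2_block_view_t;

/* All UF2 header fields are little-endian 32-bit words. */
uint32_t uf2_read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns true and fills `out` if `block` (512 bytes) is a loadable UF2 block. */
bool uf2_parse_block(const uint8_t *block, uf2_block_view_t *out)
{
    if (uf2_read_le32(block + 0) != UF2_MAGIC_START0 ||
        uf2_read_le32(block + 4) != UF2_MAGIC_START1 ||
        uf2_read_le32(block + 508) != UF2_MAGIC_END) {
        return false; /* not a UF2 block */
    }
    if (uf2_read_le32(block + 8) & UF2_FLAG_NOT_MAIN_FLASH) {
        return false; /* block is marked as not for the target's memory */
    }
    uint32_t size = uf2_read_le32(block + 16);
    if (size > UF2_MAX_PAYLOAD) {
        return false; /* payload cannot exceed the 476-byte data area */
    }
    out->target_addr  = uf2_read_le32(block + 12);
    out->payload_size = size;
    out->block_no     = uf2_read_le32(block + 20);
    out->num_blocks   = uf2_read_le32(block + 24);
    out->payload      = block + 32;
    return true;
}
```

A loader would walk the firmware image in `UF2_BLOCK_SIZE` steps and, for each block that parses, write `payload_size` bytes to `target_addr` on the target.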

Difficulties started when trying to connect to the RP2040 over SWD. As mentioned in a previous log, the RP2040 uses a multi-drop SWD interface, which many examples of the SWD interface do not implement. I had been using ataradov’s embedded-swd library, with some modifications to try and generate the initialisation sequence needed for the RP2040’s debug interface. Unfortunately I never had any luck getting a response from the RP2040, and didn’t understand the code well enough to debug what might be wrong with it. So after a sleepless night coding my own bit-banged implementation of the SWD interface (as best as I understood it), simple_swd was born! Which then also promptly failed to connect to the RP2040. After some fiddling with the signal phases, I finally started receiving ACKs over the SWD interface, and after some researching and copying from the PicoVision’s display driver, I was finally uploading and verifying firmware to the RP2040 successfully. I then never touched that part of the code again out of fear of breaking it.
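
For a flavour of what a bit-banged SWD implementation has to produce, here is a small sketch of building the 8-bit request header that starts every SWD transfer. This follows the ARM Debug Interface packet layout as I understand it (start bit, APnDP, RnW, address bits A[2:3], parity, stop, park, sent LSB first). It is not taken from simple_swd, and `swd_request_byte` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Build the SWD request header for a DP/AP register access.
 * `addr` is the register byte address (0x0, 0x4, 0x8 or 0xC); only
 * bits 2 and 3 are sent on the wire. The returned byte is intended to
 * be shifted out least significant bit first.
 */
uint8_t swd_request_byte(bool access_port, bool read, uint8_t addr)
{
    uint8_t ap = access_port ? 1u : 0u;
    uint8_t rd = read ? 1u : 0u;
    uint8_t a2 = (uint8_t)((addr >> 2) & 1u);
    uint8_t a3 = (uint8_t)((addr >> 3) & 1u);

    /* Even parity over the four payload bits. */
    uint8_t parity = (uint8_t)((ap ^ rd ^ a2 ^ a3) & 1u);

    return (uint8_t)(1u              /* bit 0: start, always 1 */
                   | (ap << 1)       /* bit 1: APnDP */
                   | (rd << 2)       /* bit 2: RnW */
                   | (a2 << 3)       /* bit 3: A[2] */
                   | (a3 << 4)       /* bit 4: A[3] */
                   | (parity << 5)   /* bit 5: parity */
                   | (0u << 6)       /* bit 6: stop, always 0 */
                   | (1u << 7));     /* bit 7: park, always 1 */
}
```

With this layout, a DP read of address 0x0 (the ID register) gives 0xA5, and a DP write to address 0xC gives 0x99. The latter is the request a multi-drop host uses for the target-select write, which is the extra step that single-target SWD examples tend to leave out.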

Unfortunately the firmware itself for the RP2040 also proved to be a challenge. I had hoped to use the RP2040’s direct memory access (DMA) controller to elegantly stream pixels from the QSPI (via Programmable IO [PIO]) interface directly into the frame buffer. However, Retro-Go takes advantage of the ILI9341’s 2Ah and 2Bh commands, which allow it to restrict the region in the frame buffer that it writes to. This has the advantage that a partial display update doesn’t require transferring a full frame buffer to the display. However, in my case I would need a way to configure the DMA on-the-fly according to the desired region in the frame buffer, which I simply couldn’t get working. I think this was due to issues synchronising when the CPU reconfigures the DMA channel, as this would need to occur after the DMA had finished transferring pixel data to the current region, but before the PIO buffers started overflowing with pixel data for the next region. This led to lots of interesting graphical artefacts, such as below:

Misalignment of UI elements when trying to use DMA driven frame-buffer. Navigating the menu could also result in discolouration of the image, similar to the 20MHz bus, CPU driven case. Image taken after getting everything else working as a demonstration.


In the end, to resolve this I resorted to directly using the CPU to move pixels into the frame buffer, making it easy to update the display region only once the current stream of pixel data had ended. This “works”, but has a far lower bandwidth than the DMA could offer, and I had to lower the QSPI transfer rate all the way down to 16MHz before artefacts were no longer present. At 320x240, 16bpp this gives a maximum theoretical frame rate of ~52fps.

Discolouration of image observed when using CPU driven frame buffer transfers at a 20MHz bus frequency. Image taken after getting everything else working as a demonstration.

Dropping the QSPI frequency down to 16MHz results in correct image rendition when using CPU driven pixel transfers to the framebuffer. Image taken after getting everything else working as a demonstration.
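
The CPU-driven approach described above can be sketched roughly as follows. This is a simplified illustration, not the actual RP2040 firmware: the window is an inclusive column/row range like the one set by the 2Ah and 2Bh commands, and incoming RGB565 pixels are written left to right, top to bottom inside it. Wrapping back to the top-left corner once the window is full is an assumption about the expected behaviour, and the names `fb_window_t`, `fb_set_window` and `fb_write_pixels` are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define FB_WIDTH  320u
#define FB_HEIGHT 240u

/* Hypothetical write window with a cursor that walks through it. */
typedef struct {
    uint16_t x0, x1; /* inclusive column range (command 2Ah) */
    uint16_t y0, y1; /* inclusive row range (command 2Bh) */
    uint16_t x, y;   /* next pixel position inside the window */
} fb_window_t;

/* Clamp the requested region to the frame buffer and reset the cursor. */
void fb_set_window(fb_window_t *w, uint16_t x0, uint16_t x1,
                   uint16_t y0, uint16_t y1)
{
    if (x1 >= FB_WIDTH)  x1 = (uint16_t)(FB_WIDTH - 1u);
    if (y1 >= FB_HEIGHT) y1 = (uint16_t)(FB_HEIGHT - 1u);
    if (x0 > x1) x0 = x1;
    if (y0 > y1) y0 = y1;
    w->x0 = x0; w->x1 = x1;
    w->y0 = y0; w->y1 = y1;
    w->x = x0;  w->y = y0;
}

/* Copy `count` pixels into the window, advancing the cursor as we go. */
void fb_write_pixels(uint16_t *fb, fb_window_t *w,
                     const uint16_t *pixels, size_t count)
{
    for (size_t i = 0; i < count; i++) {
        fb[(size_t)w->y * FB_WIDTH + w->x] = pixels[i];
        if (w->x == w->x1) {
            /* End of a window row: return to the left edge, move down. */
            w->x = w->x0;
            w->y = (w->y == w->y1) ? w->y0 : (uint16_t)(w->y + 1u);
        } else {
            w->x++;
        }
    }
}
```

Because the CPU owns both the cursor and the window, a new region only takes effect when `fb_set_window` is called between pixel streams, which sidesteps the synchronisation problem at the cost of bandwidth.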



Booting into an emulator

With a passable display output, the next hurdle was to actually navigate the launcher and load up an emulator to test.

Of course, as soon as the card started up I was greeted with a convenient “Failed to mount SD card” screen, and no amount of fiddling with the SD card adapter or the PCB “socket” would change that. All logs on the console simply gave timeout errors, meaning a frustratingly possible cause was that the card simply wasn’t making contact in the “socket”, and without any working input on the business card I had no way of dismissing this error to see if the boot process got any further.

Taking the USB HID host example from the esp-idf, I threw together a quick component to receive input over USB HID, and after some messing about with the code, updating partition table configurations (adding the USB stack meant some apps became too large to fit in the default 1MB partition size) and fiddling with the awkward USB PCB edge connector, I was eventually able to use a USB keyboard as a controller to skip past the error screen.

Finally past the error screen and into the launcher!
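
As a rough idea of what using a keyboard as a controller involves: a HID boot-protocol keyboard sends 8-byte reports (modifier byte, reserved byte, then up to six pressed key codes), which can be translated into a gamepad-style bitmask. The sketch below is not the component from the project. The key codes come from the USB HID usage tables, while the `PAD_*` bits, the choice of keys and the function name are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* HID keyboard usage IDs (Keyboard/Keypad usage page). */
#define HID_KEY_X      0x1Bu
#define HID_KEY_Z      0x1Du
#define HID_KEY_ENTER  0x28u
#define HID_KEY_ESCAPE 0x29u
#define HID_KEY_RIGHT  0x4Fu
#define HID_KEY_LEFT   0x50u
#define HID_KEY_DOWN   0x51u
#define HID_KEY_UP     0x52u

/* Hypothetical gamepad bits; a real port would use the emulator's own. */
enum {
    PAD_UP     = 1u << 0,
    PAD_DOWN   = 1u << 1,
    PAD_LEFT   = 1u << 2,
    PAD_RIGHT  = 1u << 3,
    PAD_A      = 1u << 4,
    PAD_B      = 1u << 5,
    PAD_START  = 1u << 6,
    PAD_SELECT = 1u << 7
};

/* Convert one boot-protocol keyboard report into a button bitmask. */
uint32_t keyboard_report_to_pad(const uint8_t *report, size_t len)
{
    uint32_t pad = 0;

    if (report == NULL || len < 8) {
        return 0; /* too short to be a boot keyboard report */
    }
    /* Bytes 2..7 hold the currently pressed keys (0 means empty slot). */
    for (size_t i = 2; i < 8; i++) {
        switch (report[i]) {
        case HID_KEY_UP:     pad |= PAD_UP;     break;
        case HID_KEY_DOWN:   pad |= PAD_DOWN;   break;
        case HID_KEY_LEFT:   pad |= PAD_LEFT;   break;
        case HID_KEY_RIGHT:  pad |= PAD_RIGHT;  break;
        case HID_KEY_X:      pad |= PAD_A;      break;
        case HID_KEY_Z:      pad |= PAD_B;      break;
        case HID_KEY_ENTER:  pad |= PAD_START;  break;
        case HID_KEY_ESCAPE: pad |= PAD_SELECT; break;
        default: break;
        }
    }
    return pad;
}
```

Since each report lists every key currently held, the bitmask can simply be recomputed from scratch on each report rather than tracking press and release events.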


Then, while looking for supplies to try and bodge the SD card “socket”, I dug up another SD card which happened to be recognised right away! Presumably the pads on this card were recessed slightly less, so were able to make contact with the pads of the “socket”.


So now that there is working display output, SD card storage and keyboard input, what else is there to do than try running DOOM?

System panic trying to launch DOOM on the business card

Oh.

Maybe try another emulator?

Nope, still crashes.

Gameboy should be easy enough to run, right?

Emulated Gameboy demo running on the business card


Success!

As to why other emulators seem to crash on startup, my current guess is lack of memory, and the Gameboy emulator is the only one I’ve been able to get running so far (though there are many emulators in Retro-Go that I haven’t tried yet). For a microcontroller, 2MB of RAM seems like quite a bit, but compared to the systems we’re trying to emulate it isn’t much: even a Gameboy Color ROM can take up the entire 2MB, leaving no memory for the system to actually run! Other targets for Retro-Go all seem to have 4MB of PSRAM or more, so having only 2MB is probably pushing it.


Audio performance woes 

While the card has demonstrated being able to run an emulator, unfortunately the initial performance is dreadful. This seems to be due to my naïve implementation of audio output using the sigma-delta modulator peripheral on the ESP32-S3, which blocks the CPU between each audio sample, when it could be updating the emulator state. In Retro-Go, the emulator speed is governed by the time taken to output a buffer of audio samples. Normally this uses the DMA feature of the I2S peripheral to copy audio samples independently from the CPU, and the next emulator tick only starts once the DMA has finished transferring the buffer at the specified sample rate.

        for (size_t i = 0; i < count; i++)
        {
            //Get volume
            float volume = audio.muted ? 0.f : (audio.volume * 0.01f);

            //Scale frame outputs by volume
            int16_t left_16 = frames[i].left * volume;
            int16_t right_16 = frames[i].right * volume;

            int8_t left_8 = left_16 >> 8;
            int8_t right_8 = right_16 >> 8;

            //Update sdm channels
            sdm_channel_set_pulse_density(sdm_chan_l, left_8);
            sdm_channel_set_pulse_density(sdm_chan_r, right_8);

            //@TODO: Quick and dirty sleep to next audio frame
            rg_usleep( (uint32_t)(1000000.f / audio.sampleRate) );
        }

Naïve audio output implementation which blocks the CPU between samples. This forces the emulator state to be updated only after all the audio samples have been played, resulting in choppy sound and performance


In my memory-limited scenario, trying to use the I2S audio with DMA causes even the Gameboy emulator to immediately crash, presumably due to insufficient memory to store the audio buffers. This probably isn’t helped by the inclusion of the USB Host stack, either.


So, for the time being, I have decided to replace the audio output with a delay loop controlled by the system timer, giving buttery smooth (albeit silent) performance. While it is disappointing to lose audio output, the stuttering of the emulator and audio output resulted in a terrible experience, and getting smooth gameplay even without audio is significantly better to play. Future versions of the card will inevitably need to have more memory to allow other emulators to run, which should also allow proper DMA controlled audio output to be enabled.

        int64_t audio_period_us = count * (1000000.f / audio.sampleRate);
        int64_t audio_end_time = last_audio_time + audio_period_us;
        while((rg_system_timer() - audio_end_time) < 0) { asm("nop"); }
        last_audio_time = audio_end_time;

Delay loop used to replace audio output. This is governed by a system timer that runs independently from the CPU, so everything runs at a fixed frequency regardless of how long each emulator tick takes.
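
One possible variation on the delay loop, sketched here as an assumption rather than anything tested on the card, is to derive each deadline from the total number of samples “played” so far instead of adding a per-buffer period. That keeps the fixed-frequency behaviour while avoiding the small rounding error that builds up when the period is not a whole number of microseconds. The `audio_pacer_t` type and `pacer_next_deadline` function are hypothetical.

```c
#include <stdint.h>

/* Hypothetical pacing state for silent, timer-driven emulation. */
typedef struct {
    uint32_t sample_rate;  /* audio samples per second */
    uint64_t samples_done; /* total samples accounted for so far */
    int64_t  start_us;     /* timer value when pacing began */
} audio_pacer_t;

/*
 * Account for `count` more samples and return the absolute time (in
 * microseconds) at which they would have finished playing. The caller
 * busy-waits or sleeps until the system timer reaches this value.
 */
int64_t pacer_next_deadline(audio_pacer_t *p, uint32_t count)
{
    p->samples_done += count;
    /* Integer maths on the running total, so truncation never accumulates. */
    return p->start_us +
           (int64_t)(p->samples_done * 1000000ull / p->sample_rate);
}
```

For example, at 44100Hz with 735 samples per buffer, each buffer lasts 16666.67µs. Truncating that per buffer loses 40µs every second, whereas the running-total version reaches exactly 1,000,000µs after 60 buffers.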
