Introduction

It is common to run open-sourced PC games from the '90s on newer embedded systems. Typically, this requires an external LCD driver such as the ILI9341 or an LCD controller integrated into the MCU or MPU. The open-source community has tweaked and compiled the DOOM source code to run almost everywhere. A platform that meets or exceeds the game's minimum requirements (megabytes of RAM and storage) can run it almost unmodified. Massive optimisation efforts, cutting non-essential features and having plenty of storage can lower the RAM requirements. That is how DOOM can also run on a Game Boy Advance with a 17 MHz MCU and as little as 256 KB of RAM.

In this project, I've implemented a high-performance, double-buffered software video adapter that integer-scales a low-res frame buffer on the fly while generating an HD video signal. The outcome was a capable and fun hardware platform and framework that I later named RETRO-CIAA for its 8/16-bit graphics aesthetics and retro-console features.

I wanted to push the RETRO-CIAA hardware to its limits, and a port of DOOM came to mind. Unfortunately, I discarded that idea almost immediately: the most optimised and stripped-down port of DOOM I know of, the GBA port mentioned above, needs megabytes of storage and hundreds of kilobytes of RAM more than what's available. RETRO-CIAA has only 48 KB of RAM and 1 MB of FLASH. Also, RETRO-CIAA uses RGB332 direct-colour pixels, not palettised ones as in the original VGA hardware. I needed a simpler alternative.

The DOOM predecessor, Wolfenstein 3D, may be called the grandfather of first-person shooters (FPS) since it almost invented the genre. And as a matter of fact, I played it a lot as a kid. So I pulled a popular port of the Wolfenstein 3D source code (mostly written in the "C" programming language) and started investigating the feasibility of running it on a memory-constrained platform.

Optimisations and tweaking details

I started by looking through the source code for structure and buffer sizes. Even in a modern port like Wolf4SDL, fixed-size C99 data types (int32_t, int16_t...) are a rarity, and standard "C" ones (int, short...) are widespread. The problem is that the size of these types, and therefore the memory they require, changes with the architecture and the chosen compiler. Generally speaking, on an n-bit processor, an int or a pointer is n bits long. Since the developers designed the game for 16-bit machines using Borland C++ 3.1 (an ancient C compiler), they expected integers and near pointers to hold 16-bit values. The same data type used for the same purpose on a 32-bit ARM Cortex will be 32 bits long and waste half the space. So my first effort was to reassign every global variable, structure member and buffer to the smallest fixed-size type that still holds its intended range of values. I have also stored enums in fixed-size variables chosen according to their maximum values, since the size of a "C" enum is left to the compiler.
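To make the idea concrete, here is a minimal sketch of that retyping. The actor record, its members and the direction enum are hypothetical and do not come from the Wolfenstein 3D source; they only show how pinning each member to a fixed-size type shrinks a structure on a 32-bit target.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record written with plain int/short, whose widths
   depend on the compiler and the target architecture. */
typedef struct {
    int   x, y;        /* 16 bits on a 16-bit DOS compiler, 32 bits on a Cortex-M */
    short angle;
    int   hitpoints;
    int   facing;      /* holds an enum value */
} actor_native_t;

/* Hypothetical enum; the compiler decides how many bytes it occupies. */
typedef enum { DIR_NORTH, DIR_EAST, DIR_SOUTH, DIR_WEST } dir_t;

/* Same record, each member pinned to the narrowest fixed-size type
   that still covers its intended range. */
typedef struct {
    int16_t x, y;
    int16_t angle;      /* 0..359 */
    int16_t hitpoints;
    uint8_t facing;     /* a dir_t value stored in a single byte */
} actor_fixed_t;

/* The retyped record must never be larger than the native one. */
static_assert(sizeof(actor_fixed_t) <= sizeof(actor_native_t),
              "fixed-size record should not grow");

/* Accessors keep the enum type at the API level while the storage stays one byte. */
static inline void actor_set_facing(actor_fixed_t *a, dir_t d)
{
    a->facing = (uint8_t)d;
}

static inline dir_t actor_get_facing(const actor_fixed_t *a)
{
    return (dir_t)a->facing;
}
```

On a typical 32-bit ARM toolchain, the native record would take 20 bytes and the retyped one 10, although the exact figures depend on the ABI in use.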

Next, I removed all dynamic heap memory allocation ("malloc", "strdup", and similar functions), replacing it with static buffers. That is common practice in critical embedded systems since it allows tighter control of memory usage and eliminates heap memory leaks and dangling pointers to freed memory.
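One common way to do this is a fixed pool of slots with an "in use" flag. The sketch below assumes a hypothetical actor type and a hypothetical cap of 64 live actors; it illustrates the technique and is not the code used in the port.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ACTORS 64u   /* hypothetical cap, chosen at compile time */

typedef struct {
    int16_t x, y;
    uint8_t in_use;      /* 1 while the slot is handed out */
} actor_t;

/* The whole pool lives in .bss, so its cost shows up in the linker map. */
static actor_t actor_pool[MAX_ACTORS];

/* Compile-time budget check instead of a runtime out-of-memory surprise. */
static_assert(sizeof(actor_pool) <= 512, "actor pool exceeds its RAM budget");

/* Replacement for malloc(sizeof(actor_t)): returns a free slot or NULL. */
actor_t *actor_alloc(void)
{
    for (size_t i = 0; i < MAX_ACTORS; i++) {
        if (!actor_pool[i].in_use) {
            actor_pool[i].in_use = 1;
            return &actor_pool[i];
        }
    }
    return NULL;         /* pool exhausted; the caller must handle it */
}

/* Replacement for free(): marks the slot as reusable. */
void actor_free(actor_t *a)
{
    if (a != NULL) {
        a->in_use = 0;
    }
}
```

The trade-off is that the worst case has to be known in advance, which is also what makes the memory footprint predictable.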

Where applicable, I've replaced arrays of boolean values with bit fields, saving seven bits per boolean (assuming one byte per boolean), and rearranged struct members to avoid automatic padding, that is, the unused bytes the compiler inserts between members to satisfy each member's alignment in memory. A 32-bit member is aligned to four bytes on 32-bit architectures, so in the worst case, with one-byte and four-byte members alternating, the compiler wastes three bytes after every one-byte member. I also converted all lookup tables that were calculated at runtime and stored in RAM to static "C" data placed in FLASH instead.
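Both tricks can be sketched in a few lines. The array and struct names below are hypothetical, and the bit packing is done with a plain byte array and shifts rather than "C" bit-field members, which is one of several ways to store one flag per bit. The 64x64 size matches the Wolfenstein 3D tile map.

```c
#include <assert.h>
#include <stdint.h>

#define MAP_CELLS (64u * 64u)   /* Wolfenstein 3D maps are 64x64 tiles */

/* One bit per flag: 512 bytes instead of 4096 with one byte per boolean. */
static uint8_t seen_bits[MAP_CELLS / 8u];

static inline void seen_set(unsigned i)
{
    seen_bits[i >> 3] |= (uint8_t)(1u << (i & 7u));
}

static inline void seen_clear(unsigned i)
{
    seen_bits[i >> 3] &= (uint8_t)~(1u << (i & 7u));
}

static inline int seen_get(unsigned i)
{
    return (seen_bits[i >> 3] >> (i & 7u)) & 1;
}

/* Alternating 1-byte and 4-byte members: three padding bytes follow
   each uint8_t so that the next uint32_t starts on a 4-byte boundary. */
typedef struct {
    uint8_t  a;
    uint32_t b;
    uint8_t  c;
    uint32_t d;
} padded_t;             /* typically 16 bytes */

/* Same members sorted from largest to smallest alignment:
   only two bytes of tail padding remain. */
typedef struct {
    uint32_t b;
    uint32_t d;
    uint8_t  a;
    uint8_t  c;
} reordered_t;          /* typically 12 bytes */

static_assert(sizeof(reordered_t) <= sizeof(padded_t),
              "reordering should never grow the struct");
```

Sorting members by decreasing alignment is a simple rule of thumb that removes most internal padding without resorting to packed attributes, which can make member access slower on some cores.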

I had to unpack sprites and textures, convert them from 256 palettised colours to RGB332 direct colour and store them in static "C" arrays. This step was necessary to draw them directly on the RETRO-CIAA RGB332 frame buffer. The game also needed a conversion from narrow 320x200 to wide 256x144, including aspect ratio correction and reworked game menus and splash screens.
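As an illustration of the colour conversion, the sketch below maps a VGA palette entry (the VGA DAC uses 6 bits per channel, 0..63) to one RGB332 byte and then converts an indexed bitmap through a 256-entry lookup table. The type and function names are hypothetical, and simple truncation of the low bits is an assumption; the actual asset converter may round or dither.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One VGA DAC palette entry: each channel in the range 0..63. */
typedef struct {
    uint8_t r, g, b;
} vga_rgb6_t;

static_assert(sizeof(vga_rgb6_t) == 3, "palette entries are expected to be 3 bytes");

/* RGB332 layout: RRRGGGBB. Keep the top 3 bits of red and green
   and the top 2 bits of blue from each 6-bit channel. */
static inline uint8_t vga6_to_rgb332(vga_rgb6_t c)
{
    return (uint8_t)(((c.r >> 3) << 5) | ((c.g >> 3) << 2) | (c.b >> 4));
}

/* Precompute the RGB332 value for each of the 256 palette indices. */
void build_rgb332_lut(const vga_rgb6_t palette[256], uint8_t lut[256])
{
    for (size_t i = 0; i < 256; i++) {
        lut[i] = vga6_to_rgb332(palette[i]);
    }
}

/* Convert n palettised pixels to direct-colour RGB332 pixels. */
void indexed_to_rgb332(const uint8_t *src, uint8_t *dst, size_t n,
                       const uint8_t lut[256])
{
    for (size_t i = 0; i < n; i++) {
        dst[i] = lut[src[i]];
    }
}
```

Run offline on the development PC, a conversion like this lets the resulting arrays be emitted as static "C" data, so the target never has to hold the palette in RAM.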

Wolfenstein 3D screenshots. Left: PC VGA, 320x200 as seen on a narrow 4:3 display. Right: RETRO-CIAA RGB332, 256x144, 16:9 widescreen.

On palettised display modes, the developer can progressively change palette colours to lighten, darken, fade or colour-cycle on-screen bitmaps without touching the actual in-memory pixels, which would be an expensive operation. These effects are more troublesome in direct-colour modes, since each pixel in the frame buffer explicitly stores its own colour information. I implemented them in the RETRO-CIAA software adapter by applying boolean (bitwise) operations to the value of each pixel on the fly while emitting the video signal. I've also used the RETRO-CIAA framework's tile drawing functions to make a new, overlaid in-game GUI that suits the new widescreen aspect ratio.
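The exact operations used by the adapter are not listed here, so the following is only a sketch of what per-pixel bitwise effects on RGB332 (RRRGGGBB) can look like. All names are hypothetical: a shift plus mask halves every channel to darken the image, an OR saturates the red channel as a damage flash would, and a scanline routine applies an AND/OR mask pair to each pixel as it is emitted, leaving the frame buffer untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* After shifting the whole byte right by one, bits 4 and 1 hold the
   lowest bit of the neighbouring channel and must be cleared. */
#define RGB332_HALF_MASK 0x6Du

static_assert((RGB332_HALF_MASK & 0x92u) == 0u,
              "mask must clear the bits that cross a channel boundary");

/* Halve red, green and blue at once: a cheap one-step darken. */
static inline uint8_t rgb332_darken(uint8_t p)
{
    return (uint8_t)((p >> 1) & RGB332_HALF_MASK);
}

/* Force the red channel to maximum, keeping green and blue. */
static inline uint8_t rgb332_tint_red(uint8_t p)
{
    return (uint8_t)(p | 0xE0u);
}

/* Apply an AND mask and then an OR mask to every pixel of a scanline
   while copying it to the output. */
void emit_scanline(const uint8_t *src, uint8_t *out, size_t width,
                   uint8_t and_mask, uint8_t or_mask)
{
    for (size_t x = 0; x < width; x++) {
        out[x] = (uint8_t)((src[x] & and_mask) | or_mask);
    }
}
```

Passing 0xFF and 0x00 as the masks gives an unmodified image, so an effect can be switched on or off per frame by changing two bytes rather than rewriting the frame buffer.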

I designed the RETRO-CIAA framework to be multi-platform by using a system abstraction layer. To speed up development and to test every aspect of the game while it still exceeded the RAM available on RETRO-CIAA hardware, I took the first development steps by compiling the firmware to run on my development PC using a generic SDL system target. Once the game seemed close to meeting the memory constraints, I started compiling it for the actual RETRO-CIAA hardware. The first builds exceeded the RAM and FLASH limits by about 300%, so linking failed and no valid executable was produced.

At this fine-tuning stage on the actual target hardware, I asked the linker to output a map of the symbols (functions, variables) to identify which memory section each one resides in and how large it is. GCC outputs that information when given the parameter "-Wl,-Map=output.map". I then looked for variables, that is, symbols generally placed in the ".bss" and ".data" sections. Finally, I summed them up and identified candidate symbols to optimise further with the techniques discussed above and some more obscure ones.

I kept iterating until the game itself fitted comfortably in 48 KB, including the RETRO-CIAA framework overhead for interfaces and audio buffers.

Results

The result is the original Wolfenstein 3D, ported with full graphics, music and sound effects, running in 48 KB of RAM on a single chip whose software video adapter integer-scales a low-resolution frame buffer to an HD signal. A low widescreen resolution (ideal for pixel art) and the optional scanlines generated on the fly by the RETRO-CIAA framework work together to emphasise the retro aesthetics of the original. The game preserves a pixel-perfect resolution and sharp edges when displayed on an HD screen.

Closing remarks

As with every project, I started highly motivated and willing to overcome any odds. The goal seemed almost impossible to achieve at some point, but methodically tackling every constraint has been the key.

No special hardware is required to run this port. The RETRO-CIAA framework can also compile for an SDL target on anything where SDL runs, but running this port (or any IoT or computer graphics project) on the single-chip RETRO-CIAA platform is a lot of fun! I have already designed, prototyped and tested the system, but a community-driven effort is necessary to manufacture it in high quantities at lower prices. I'll be around to listen to feedback and suggestions.

Finally, owners of any microcontroller development kit (especially RP2040) are welcome to contribute a new system implementation to the RETRO-CIAA framework. That will allow your platform of choice to run anything RETRO-CIAA currently runs, including Wolfenstein 3D, provided your platform has at least 48 KB of available RAM.

I'll be releasing the initial version of the RETRO-CIAA framework in the following days. I'll announce it here first, so be sure to follow this project.

Have a great day!