It is common to run open-sourced PC games from the '90s on newer embedded systems. Typically, this requires an external LCD driver such as the ILI9341 or an LCD controller integrated into the MCU or MPU. The open-source community has tweaked and compiled the DOOM source code to run almost everywhere. Meeting or exceeding the game's minimum requirements (megabytes of RAM and storage) allows it to run almost unmodified. Massive optimisation efforts, cutting non-essential features and having plenty of storage can lower the RAM requirements. That is how DOOM can also run on a Game Boy Advance, with its roughly 17 MHz CPU and as little as 256 KB of RAM.

In this project, I've implemented a high-performance, double-buffered software video adapter that integer-scales a low-res frame buffer on the fly while generating an HD video signal. The outcome was a capable and fun hardware platform and framework that I later named RETRO-CIAA for its 8/16-bit graphics aesthetics and retro-console features.

I wanted to push the RETRO-CIAA hardware to its limits, and a port of DOOM came to mind. Unfortunately, I discarded that idea almost immediately: the most optimised and stripped-down port of DOOM I know of, the GBA port mentioned above, needs megabytes more storage and hundreds of kilobytes more RAM than what's available, since RETRO-CIAA has only 48 KB of RAM and 1 MB of FLASH. Also, RETRO-CIAA uses RGB332 direct-colour pixels rather than the palettised colour of the original VGA hardware. I needed a simpler alternative.

DOOM's predecessor, Wolfenstein 3D, may be called the grandfather of first-person shooters (FPS) since it all but invented the genre. As a matter of fact, I played it a lot as a kid. So I pulled a popular port of the Wolfenstein 3D source code (mostly written in the "C" programming language) and started investigating the feasibility of running it on a memory-constrained platform.

Optimisations and tweaking details

I started by looking through the source code for structures and buffer sizes. Even in a modern port like Wolf4SDL, fixed-size C99 data types (int32_t, int16_t...) are a rarity, with the standard "C" ones (int, short...) being widespread. The problem is that the size of those data types, and therefore the memory they require, changes with the architecture and the chosen compiler. Generally speaking, on the 16-bit and 32-bit targets discussed here, an int or a pointer is as wide as the processor word. Since the developers designed the game for 16-bit machines using Borland C++ 3.1 (an ancient compiler), they expected integers and near pointers to hold 16-bit values. The same data type used for the same purpose on a 32-bit ARM Cortex will be 32 bits long and waste half the space. So my first effort was to reassign every data type in global variables, structures and buffers to the smallest fixed-size type that still holds its intended range of values. I have also stored enums in fixed-size variables chosen according to their maximum values, since the size of a "C" enum is implementation-defined and picked by the compiler.
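As a rough illustration of that retyping, here is a minimal sketch. The actor record, its field names and its value ranges are hypothetical and not taken from the Wolfenstein 3D sources; they only show how plain int and enum members can be swapped for fixed-size types.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical record written the DOS-era way: plain int and enum members.
   With Borland on a 16-bit machine each int took 2 bytes; on a 32-bit
   ARM Cortex each one typically takes 4. */
typedef enum { DIR_NORTH, DIR_EAST, DIR_SOUTH, DIR_WEST } dir_t;

typedef struct {
    int   hitpoints;   /* assumed to fit in 16 bits          */
    int   tile_x;      /* assumed range 0..63 on a 64x64 map */
    int   tile_y;      /* assumed range 0..63                */
    dir_t dir;         /* enum size is chosen by the compiler */
} actor_legacy_t;

/* Same information, each member retyped to the smallest fixed-size
   type that still covers its assumed range. */
typedef struct {
    int16_t hitpoints;
    uint8_t tile_x;
    uint8_t tile_y;
    uint8_t dir;       /* stores a dir_t value in one byte */
} actor_packed_t;

/* Catch accidental growth at compile time. */
static_assert(sizeof(actor_packed_t) == 6, "actor_packed_t grew unexpectedly");

/* Convert the stored byte back to the enum when reading it. */
static inline dir_t actor_dir(const actor_packed_t *a)
{
    return (dir_t)a->dir;
}
```

With typical 32-bit ABIs the legacy layout takes 16 bytes per record against 6 for the retyped one, which adds up quickly over arrays of such records.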

I then removed all dynamic heap memory allocation ("malloc", "strdup", and similar functions), replacing it with static buffers. That is common practice in critical embedded systems since it allows tighter control over memory usage and rules out heap memory leaks and dangling pointers to freed memory.
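One way to picture that change is a fixed-capacity pool that hands out slots from a static array. This is only a sketch under assumptions: actor_t, MAX_ACTORS and the function names are made up for the example and do not come from the game code.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ACTORS 64   /* hypothetical upper bound, fixed at build time */

typedef struct {
    int16_t x, y;
    uint8_t state;
} actor_t;

/* The storage lives in .bss, so its size shows up in the linker map
   instead of depending on runtime heap behaviour. */
static actor_t actor_pool[MAX_ACTORS];
static uint8_t actor_count;

/* Stand-in for malloc(sizeof(actor_t)): returns the next free slot,
   or NULL once the pool is exhausted. */
actor_t *actor_new(void)
{
    if (actor_count >= MAX_ACTORS) {
        return NULL;
    }
    return &actor_pool[actor_count++];
}

/* Stand-in for freeing everything, e.g. when a new level is loaded. */
void actor_reset_all(void)
{
    actor_count = 0;
}
```

The trade-off is that the worst case has to be known in advance, and running out of slots becomes an explicit NULL to handle rather than a heap failure somewhere else.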

Where applicable, I've replaced arrays of boolean values with bit fields, saving seven bits per boolean, and rearranged struct members to avoid automatic memory padding, that is, the inclusion of unused bytes between members to satisfy a given member's alignment in memory. A 32-bit member must be aligned to four bytes on 32-bit architectures, so in a worst-case scenario of one-byte members interleaved with four-byte ones, the compiler would waste three bytes of padding after each one-byte member. I also converted all lookup tables that were calculated at runtime and stored in RAM to static "C" data for inclusion in FLASH instead.
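Both savings can be sketched in a few lines. The structs and the "seen" map below are hypothetical, and the bit storage is written as a packed bit array, which is one possible way to get one bit per flag.

```c
#include <stdbool.h>
#include <stdint.h>

/* Padding: one-byte members interleaved with four-byte members.
   With 4-byte alignment for uint32_t this typically takes 16 bytes. */
typedef struct {
    uint8_t  a;   /* followed by 3 padding bytes */
    uint32_t b;
    uint8_t  c;   /* followed by 3 padding bytes */
    uint32_t d;
} interleaved_t;

/* Same members, widest first: typically 12 bytes. */
typedef struct {
    uint32_t b;
    uint32_t d;
    uint8_t  a;
    uint8_t  c;   /* followed by 2 bytes of tail padding */
} reordered_t;

/* Booleans: one bit per map tile instead of one byte per tile.
   A hypothetical 64x64 map needs 512 bytes instead of 4096. */
#define MAP_TILES (64u * 64u)
static uint8_t seen_bits[(MAP_TILES + 7u) / 8u];

static inline void seen_set(unsigned idx, bool value)
{
    uint8_t mask = (uint8_t)(1u << (idx & 7u));
    if (value) {
        seen_bits[idx >> 3] |= mask;
    } else {
        seen_bits[idx >> 3] &= (uint8_t)~mask;
    }
}

static inline bool seen_get(unsigned idx)
{
    return ((seen_bits[idx >> 3] >> (idx & 7u)) & 1u) != 0u;
}
```

The bit version costs a shift and a mask on every access, which is usually an acceptable price when RAM is the scarce resource.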

I had to unpack sprites and textures, convert them from 256 palettised colours to RGB332 direct-colour and store them in static "C" arrays. This step was necessary...
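The colour conversion itself can be sketched as follows. This is not the actual conversion tool: it assumes palette entries already expanded to 8 bits per channel and an RGB332 byte laid out as RRRGGGBB, and every name in it is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One palette entry, assumed to be 8 bits per channel. */
typedef struct {
    uint8_t r, g, b;
} rgb888_t;

/* Keep the top 3 bits of red and green and the top 2 bits of blue.
   Assumed layout: bits 7..5 red, bits 4..2 green, bits 1..0 blue. */
static inline uint8_t rgb332_from_rgb888(rgb888_t c)
{
    return (uint8_t)((c.r & 0xE0u) | ((c.g & 0xE0u) >> 3) | (c.b >> 6));
}

/* Precompute the RGB332 value of each of the 256 palette indices. */
void build_rgb332_lut(const rgb888_t palette[256], uint8_t lut[256])
{
    for (size_t i = 0; i < 256; i++) {
        lut[i] = rgb332_from_rgb888(palette[i]);
    }
}

/* Translate a buffer of palette indices into direct-colour pixels. */
void convert_indexed(const uint8_t *src, uint8_t *dst, size_t count,
                     const uint8_t lut[256])
{
    for (size_t i = 0; i < count; i++) {
        dst[i] = lut[src[i]];
    }
}
```

Run on a host machine, the output of convert_indexed could then be dumped as a const array so that the converted pixels end up in FLASH.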
