Prologue

CIAA is the Spanish acronym for "Argentine Industrial Open Computer". It is a collective, open hardware project run by several universities across the country. The goal is to develop and promote cost-effective, industrial embedded systems to fulfil the needs of small and medium-sized enterprises.

The original CIAA hardware integrates an NXP LPC4337 dual-core microcontroller with external SDRAM, Flash memories and standard industrial control peripherals: Ethernet, RS-485, opto-isolated inputs, and relay and FET outputs. A development kit called "Edu-CIAA" targets educational environments. It uses the same microcontroller but has no external SDRAM, Flash or specific input/output capabilities. It is a simple, inexpensive single-chip board with LEDs, pushbuttons, a UART and an integrated JTAG debugger based on the FTDI FT2232.

Edu-CIAA development kit.

Several local embedded systems courses and workshops use the Edu-CIAA as their target hardware. I first got hands-on with this kit while taking a class in the Master in Embedded Systems at the University of Buenos Aires, and I was puzzled by the fact that the second core went unused. I thought it would be great to work with graphical user interfaces, since the entire embedded industry is moving in that direction. And it would be awesome to get that capability almost for free by squeezing every bit of power from the platform we were already using.

I named the project "RETRO-CIAA" for its resulting retro graphical aesthetics and because an old-fashioned game console concept should appeal to other students. I carried it out as my final first-year project for said Master's, and it has been updated and extended ever since.

Development kit microcontroller details

The NXP LPC4337 is a 204 MHz, dual-core ARM Cortex-M microcontroller. Instead of the symmetric cores commonly found on PC hardware, it integrates asymmetric ones: an ARM Cortex-M4F core and an ARM Cortex-M0 core that acts as a coprocessor of the former. The M4F implements a single-precision floating-point unit and some specialized DSP instructions, and it generally executes processor-intensive tasks; it is where the main application runs. The M0 core is usually employed to assist the M4F by performing asynchronous, concurrent input/output handling without delaying the main application.

136 KB of internal SRAM and 1 MB of internal Flash are available, shared by both cores. Both cores can concurrently access the entire memory map, including MCU peripherals like the UART, so software arbitration is required to ensure coherent, ordered resource access and usage. NXP provides a generic mechanism for inter-process communication by message passing that is not suitable for this application.

SRAM and Flash are divided into several non-contiguous memory areas. There are two 512 KB Flash banks, Bank-A and Bank-B, and several SRAM banks connected to different internal buses of the microcontroller. In turn, each core resides on one of those buses. So although nothing prevents both cores from accessing all resources, since the buses are bridged, not all accesses are equally fast. Most resources are optimized for (in fact, the official documentation enforces) a given usage pattern that, to make this happen, I will need to not follow :)

Project implementation overview

A video adapter is a device that converts an image -or frame- represented in video memory to a signal that a monitor can decode and display. A video mode represents a given frame rate (images per second) and resolution (width and height) measured in pixels. The frame buffer is an area of video memory dedicated to storing the pixels of a single frame. A pixel is a unit of information that contains the colour it depicts, either indirectly by being the table index where the colour information is stored or directly by specifying the additive amounts of red, green and blue components as bits inside the same pixel. A pixel is positioned on the frame buffer memory by its X and Y screen coordinates. The frame buffer may not be the same size as the chosen output signal. In that case, the adapter scales it to fit the video signal resolution.
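To make the relation between screen coordinates and frame buffer memory concrete, here is a minimal sketch. It is not taken from the RETRO-CIAA sources: the names `fb_offset`, `fb_put_pixel` and `fb_get_pixel` and the dimensions are hypothetical, and it assumes one byte per pixel with rows stored one after another.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame buffer dimensions, chosen only for this illustration. */
#define FB_WIDTH  256u
#define FB_HEIGHT 144u

/* One byte per pixel; row 0 comes first, then row 1, and so on. */
static uint8_t frame_buffer[FB_WIDTH * FB_HEIGHT];

/* Offset in bytes of the pixel at column x, row y. */
static size_t fb_offset(unsigned x, unsigned y)
{
    assert(x < FB_WIDTH && y < FB_HEIGHT);
    return (size_t)y * FB_WIDTH + x;
}

/* Store a colour at screen coordinates (x, y). */
static void fb_put_pixel(unsigned x, unsigned y, uint8_t colour)
{
    frame_buffer[fb_offset(x, y)] = colour;
}

/* Read back the colour stored at screen coordinates (x, y). */
static uint8_t fb_get_pixel(unsigned x, unsigned y)
{
    return frame_buffer[fb_offset(x, y)];
}
```

Lines, rectangles, text and tiles can all be built on top of a pixel write like this one, although a real implementation would likely write whole runs of pixels at once for speed.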

The primary project goal was to put the Cortex-M0 to work by implementing a software video adapter that outputs a video signal on available GPIO pins. A substantial amount of SRAM is reserved to use as video memory. The Cortex-M4F is in charge of updating the frame buffer by plotting individual pixels or groups in the form of lines, rectangles, text, bitmap tiles, etc. The electronics interface (video connector, signal conditioning) between the video adapter and monitor is inexpensive and straightforward, consisting of buffers and resistors.

HD LCD or OLED TV screens and monitors have been mainstream for over a decade now. These HD screens display VGA (or any standard definition, narrow aspect ratio signals) with black borders or image stretching. Moreover, these screens automatically scale any lower resolution input signal to match the LCD native resolution. The automatic scaling algorithm generally distorts the source image and is even more noticeable if the content is not live video but computer-generated imagery. A signal that closely matches the LCD native resolution will be less distorted; that is why I have chosen to generate a software HD signal on a microcontroller, even if there is no precedent, to the best of my knowledge.

The available microcontroller SRAM is scarce. A full 1280x720 frame buffer requires close to one megabyte of RAM at 8 bits per pixel (1280 x 720 = 921,600 bytes), so storing a complete video signal frame is out of the question. Besides, the microcontroller is not capable of handling that many pixels. Therefore, I divided the chosen signal width and height by five to set the frame buffer size. That gives an actual maximum drawing resolution of 256x144. The video adapter performs on-the-fly integer scaling of the frame buffer to generate a 1280x720 signal from a source with 25 times fewer pixels. The frame buffer resolution is similar to that of 8-bit consoles and gives the project its "retro" aesthetics. The on-the-fly integer scaling to a higher resolution video signal preserves the original sharp edges and image clarity.
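The arithmetic and the scaling rule can be sketched as follows. This is an illustration only, under stated assumptions: the real adapter emits pixels on GPIO pins in real time, whereas this sketch expands a row into a memory buffer to show the index mapping. The names `scale_row` and `source_row` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum {
    SIGNAL_WIDTH  = 1280,
    SIGNAL_HEIGHT = 720,
    SCALE         = 5,                       /* integer scaling factor  */
    FB_WIDTH      = SIGNAL_WIDTH / SCALE,    /* 256 pixels              */
    FB_HEIGHT     = SIGNAL_HEIGHT / SCALE,   /* 144 pixels              */
    FB_BYTES      = FB_WIDTH * FB_HEIGHT     /* 36864 bytes at 8 bpp    */
};

/* Frame buffer row that supplies the pixels of a given signal line:
 * every row is repeated on SCALE consecutive lines. */
static unsigned source_row(unsigned signal_line)
{
    assert(signal_line < SIGNAL_HEIGHT);
    return signal_line / SCALE;
}

/* Expand one frame buffer row (FB_WIDTH pixels) into one signal line
 * (SIGNAL_WIDTH pixels) by repeating each source pixel SCALE times. */
static void scale_row(const uint8_t *src, uint8_t *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t x = 0; x < FB_WIDTH; x++) {
        for (size_t i = 0; i < SCALE; i++) {
            *dst++ = src[x];
        }
    }
}
```

Because every source pixel becomes an exact 5x5 block, no interpolation takes place, which is why edges stay sharp.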

Monitor displaying RETRO-CIAA generated imagery.
Left: Simulation of a low-resolution signal as commonly displayed by an HD screen. Right: On the same screen, RETRO-CIAA preserves the original image fidelity.

The adapter even generates a CRT scanlines simulation.

Monitor displaying RETRO-CIAA generated imagery.
CRT scanlines effect generated on-the-fly by the software video adapter.

It is worth mentioning that some LPC43xx series microcontrollers -but not the LPC4337- have an integrated LCD controller, a peripheral capable of producing video signals on its own. Even so, there is no LCD controller from any MCU model or vendor that can be programmed to perform frame buffer integer scaling, as far as I know. The peripheral requires large RAM and high memory bandwidth to generate a high-resolution signal. Therefore, this software-based video adapter is still an excellent alternative to use on those microcontrollers.

The pixel format is direct-colour, 8 bits per pixel, known as RGB332: three bits for the red component, three bits for green and two for blue, giving a total of 2^3 x 2^3 x 2^2 = 256 different colours. I have chosen direct-colour to achieve the fastest hardware implementation possible. Each bit of a pixel drives one pin in a group of GPIO pins, so the whole group is updated at once by writing the pixel to the GPIO status register.
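As a small sketch of the format, the helpers below pack colour components into one RGB332 byte. The bit order (red in the three most significant bits, blue in the two least significant) is the conventional RGB332 layout and is an assumption here, as are the function names; the actual pin assignment on the board may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Pack 3-bit red, 3-bit green and 2-bit blue into one byte.
 * Assumed layout, most significant bit first: R R R G G G B B. */
static uint8_t rgb332_pack(uint8_t r, uint8_t g, uint8_t b)
{
    assert(r < 8 && g < 8 && b < 4);
    return (uint8_t)((r << 5) | (g << 2) | b);
}

/* Reduce a 24-bit colour (8 bits per channel) to RGB332 by keeping
 * only the most significant bits of each channel. */
static uint8_t rgb332_from_rgb888(uint8_t r, uint8_t g, uint8_t b)
{
    return (uint8_t)((r & 0xE0) | ((g & 0xE0) >> 3) | (b >> 6));
}
```

For example, `rgb332_pack(7, 0, 0)` yields `0xE0` (pure red) and `rgb332_pack(7, 7, 3)` yields `0xFF` (white).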

Secondary goals were to output analogue audio to a 3.5 mm line-out jack and to handle gamepad input. I fulfilled both by using a PCA5001 I2S audio DAC and two DB-9M connectors for generic NES gamepads.

A simple and inexpensive PCB that connects to the Edu-CIAA expansion port holds the A/V connectors and electronic components required to interface the new features to the outside world. There is also an SD card reader and WiFi connection through an integrated ESP32 module.

RETRO-CIAA expansion module.
RETRO-CIAA expansion module lying on top of the Edu-CIAA development kit. These expansions are called "Ponchos" in Edu-CIAA terminology.

I designed the PCB using KiCad, a well-known open-source PCB design suite.

RETRO-CIAA expansion schematic.
Sample page of the RETRO-CIAA expansion (poncho) schematic.

The firmware, called the RETRO-CIAA framework, is written in C and designed around a collection of abstraction layers that enforce a clean implementation by following a required interface. There are also standard functions to handle graphics, sound, input, character streams, and block devices. The framework retains the legacy of retro aesthetics from the original hardware, but it is now platform-agnostic: the LPC4337 is just one of its supported platforms. By changing a single parameter at compile-time, one can generate executables of the same application for another supported development kit or even an SDL target running on a PC or Raspberry Pi. This feature is convenient since developing and testing on the host computer is faster than constantly uploading cross-compiled binaries to a remote target.

Implementation details

Drawing on the same frame buffer that the video adapter is reading to generate the video signal causes image artifacts and screen tearing. As a workaround, single frame buffer systems severely limit the per-frame drawing period to around 5% of the full-frame time. Drawing starts after the adapter ends the transmission of the visible signal and ends before it emits visible pixels again. The longest non-visible signal period is the "Vertical Blanking", and the start of a vertical blanking triggers a "Vertical Blanking Interrupt".

The developed video adapter takes advantage of multicore concurrency using a technique called double buffering that, as the name implies, requires two frame buffers. The Cortex-M0 core generates a video signal from the already drawn contents of one frame buffer. At the same time, the Cortex-M4F draws a new frame on the other frame buffer. On each Vertical Blanking Interrupt, there is a buffer swap, and the process repeats. This method displays smoother animations, and the Cortex-M4F is free to draw at its full potential. The tradeoff is that the video memory requirement doubles. Specifically, two 256x144 frame buffers at one byte per pixel require 72 KB of SRAM.
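The swap itself can be sketched in a few lines. This is a simplified illustration with hypothetical names, not the framework's code: it only shows the roles of the two buffers changing places, and it leaves out the coordination a real dual-core implementation would need (for example, making sure the drawing core has finished its frame before the swap).

```c
#include <assert.h>
#include <stdint.h>

#define FB_WIDTH  256u
#define FB_HEIGHT 144u
#define FB_BYTES  (FB_WIDTH * FB_HEIGHT)    /* 36864 bytes = 36 KB each */

/* Two frame buffers: 2 x 36 KB = 72 KB of SRAM. */
static uint8_t frame_buffers[2][FB_BYTES];

/* Index of the buffer currently being displayed. Declared volatile
 * because, in the real system, both cores would read it. */
static volatile unsigned front_index = 0;

/* Buffer the signal generator (Cortex-M0 role) reads from. */
static const uint8_t *display_buffer(void)
{
    assert(front_index < 2u);   /* sanity check: only two buffers exist */
    return frame_buffers[front_index];
}

/* Buffer the application (Cortex-M4F role) draws the next frame on. */
static uint8_t *draw_buffer(void)
{
    return frame_buffers[front_index ^ 1u];
}

/* Meant to be called once per vertical blanking: the freshly drawn
 * buffer becomes the displayed one and vice versa. */
static void swap_buffers(void)
{
    front_index ^= 1u;
}
```

No pixel data is copied on a swap; only the index changes, so the cost is constant regardless of resolution.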

System features

The NXP LPC4337 supports on-the-fly (applied right at video signal generation) direct-colour operations like bitwise colour filters. It can play back 16-bit, 32 kHz stereo music and full-motion video of unlimited length (limited only by storage size) by streaming data from the SD card.
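One plausible form of a bitwise colour filter on direct-colour pixels is an AND mask applied to each pixel as it is read out. The sketch below is an assumption about what such a filter could look like, not the adapter's actual implementation; the mask names, the `filter_line` function and the RRRGGGBB bit layout are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Channel masks for RGB332, assuming the layout R R R G G G B B. */
#define RGB332_RED_MASK   0xE0u
#define RGB332_GREEN_MASK 0x1Cu
#define RGB332_BLUE_MASK  0x03u

/* Copy count pixels from src to out, keeping only the bits set in mask.
 * The frame buffer contents (src) are left untouched. */
static void filter_line(const uint8_t *src, uint8_t *out,
                        size_t count, uint8_t mask)
{
    assert(src != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        out[i] = src[i] & mask;
    }
}
```

With `RGB332_RED_MASK`, a white pixel (`0xFF`) comes out as pure red (`0xE0`). Since direct-colour pixels carry their components in fixed bit positions, a filter like this costs a single instruction per pixel.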

The framework supports Unicode text encoded in UTF-8. It includes 8x8 bitmap glyphs for the following code point blocks: basic Latin, Latin 1 supplement, Greek and Coptic, CJK symbols, Hiragana, Katakana and box drawing. So, for example, writing dialogue boxes for RPG games in most western languages and Japanese should be no problem. Supporting other languages is as easy as adding the corresponding glyphs. The firmware implements queues, variants, finite state machines and cyclic buffers, and includes "RetrOS", a Cortex-M4/M7 real-time preemptive multitasking operating system developed as part of my Master's.
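Before a glyph can be looked up, each UTF-8 byte sequence has to be turned into a code point. The decoder below is a generic sketch of that step following the standard UTF-8 rules; it is not the framework's own routine, and the name `utf8_decode` and the choice of returning U+FFFD for malformed input are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one code point from the first bytes of s (len bytes available).
 * Stores the result in *cp and returns the number of bytes consumed.
 * Malformed input yields U+FFFD and consumes a single byte. */
static size_t utf8_decode(const uint8_t *s, size_t len, uint32_t *cp)
{
    assert(s != NULL && cp != NULL && len > 0);

    size_t   need;    /* continuation bytes expected after the first one */
    uint32_t value;   /* code point being assembled                      */
    uint32_t min_cp;  /* smallest value legal for this sequence length   */

    if (s[0] < 0x80)                { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { need = 1; value = s[0] & 0x1F; min_cp = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 2; value = s[0] & 0x0F; min_cp = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 3; value = s[0] & 0x07; min_cp = 0x10000; }
    else                            { *cp = 0xFFFD; return 1; }

    /* Truncated sequence. */
    if (len < need + 1) { *cp = 0xFFFD; return 1; }

    /* Each continuation byte must look like 10xxxxxx and adds 6 bits. */
    for (size_t i = 1; i <= need; i++) {
        if ((s[i] & 0xC0) != 0x80) { *cp = 0xFFFD; return 1; }
        value = (value << 6) | (s[i] & 0x3Fu);
    }

    /* Reject overlong encodings, surrogates and out-of-range values. */
    if (value < min_cp || value > 0x10FFFF ||
        (value >= 0xD800 && value <= 0xDFFF)) {
        *cp = 0xFFFD;
        return 1;
    }

    *cp = value;
    return need + 1;
}
```

For instance, the bytes `E3 81 82` decode to U+3042 (Hiragana "あ"), which a text routine could then use as an index into the Hiragana glyph block.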

Results

A simple, educational single-chip development kit now incorporates high-performance multimedia capabilities using this new firmware and an inexpensive, open hardware expansion PCB. This project opens a new world of possibilities for thousands of Edu-CIAA boards owned by universities, students or enthusiasts. It also motivated the design of a new, standalone RETRO-CIAA board that integrates the LPC4337 and the expansion components in a single, smaller PCB specifically designed for manufacture.

RETRO-CIAA expansion and standalone.
RETRO-CIAA hardware models. Expansion module for Edu-CIAA on the left. On the right, the newer standalone board.

The framework is platform-agnostic. Others are welcome to port it to their development kits and enjoy programming computer graphics algorithms and retro games without relying on large amounts of external RAM, LCD controllers or display drivers. Using the PC target is also an option: the framework runs perfectly well on a host PC.

Educational value in a pandemic context

Currently, I'm a professor in the UBA Master in Embedded Systems. This project enabled me to outline a proposal for a new course on Embedded Computer Graphics employing the same hardware but with the inexpensive addition of graphics capabilities.

The pandemic cancelled all face-to-face classes. That means every student had to get their own materials to follow online courses at home. While getting a locally manufactured Edu-CIAA is convenient in the greater Buenos Aires area, it is troublesome in remote locations or abroad. The new, multi-platform framework enables students to use whatever electronics development kit they have or can purchase, giving consistent access to LEDs, pushbuttons and even graphics across diverse platforms and development boards.

Future work

It should be simple to port the software video adapter to more dual-core microcontrollers that lack an embedded LCD controller or external display modules. For example, I hope to see Wolfenstein 3D running entirely on a single-chip RP2040 board. Or a port of Tiny2040 DOOM running full-screen on an HD monitor or TV screen.

The LPC4337 microcontroller is powerful enough to implement PlayStation One-like software-based 3D graphics, but this is of limited use since the little RAM left makes it difficult to store large amounts of transformed and projected vertex data. Even so, a software renderer might be an exciting addition to the framework. And newer microcontrollers have more RAM available.

I will continue to improve, refine and document this project. It is certainly fun to work with, and it brings back childhood memories of 8- and 16-bit consoles and arcades. But above all, I believe in the value this project has to offer.

The RETRO-CIAA standalone development kit is a fun and exciting platform for IoT, computer graphics and retro gaming. It's already designed, prototyped and tested, but a community-driven effort is necessary to manufacture it in high quantities at lower prices. It'd be awesome if many people worldwide could enjoy that kit. Let's see what we can achieve! I'll be around to listen to feedback and suggestions from you.


Thank you and have a great day!

------- T-Rex image by Chromium Project, Sebastien Gabriel. BSD License.