MPEG-1 preliminary experiment

A project log for Macchiato DX: Practical video badge with RP2040

USB-uploaded MPEG-1, 240×240 colour @ 30 FPS in a compact form factor

Ayu • 12/11/2025 at 17:56 • 0 Comments

2025-10-26

As I looked into QOI, I found out that the designer of the format, Dominic Szablewski (a.k.a. PhobosLab), has an article on his attempt to create a simple MPEG-1 decoder library (pl_mpeg). Coincidentally, Ben, whose attempt I mentioned earlier, built his solution on exactly this library.

MPEG-1 was the standard for Video CDs, which held an entire film on a 700 MB disc. That suggests good efficiency, and indeed, at 1 Mbps the visual artefacts are less disruptive than MJPEG's. MPEG-1 is also likely to decode faster than MJPEG, since reusing image blocks across frames cuts down on the computation-intensive steps (mostly the 8×8 IDCT).

That addresses the storage-efficiency requirement well. Decoding is more of a concern, as Ben had to modify the library and switch to greyscale for 240×240. Upon closer look, the biggest hurdle is the three frame buffers: one for the frame currently being decoded, plus two reference frames (the past and future pictures used for P- and B-frame prediction). Each takes 240×240 (Y) + 120×120×2 (Cb, Cr) = 86,400 bytes. While Ben chose to eliminate the Cb and Cr planes, I think we can get by without B-frames, which cuts the memory footprint by a third without compromising video appearance.

Off to porting. Dropping the B-frame buffer was mostly straightforward. I then modified the library to take a user-supplied buffer for frames, as that will make memory allocation/reuse, DMA'ing, etc. more flexible later in development and optimization.

Meanwhile, the library apparently was not designed with embedded environments in mind: its dynamic allocations are a hindrance. Apart from the large frame buffers, there are smaller bitstream buffers, one of which grows dynamically with differently-sized packets. That is less than ideal, but it does not stop us from running a preliminary test.

On RP2040 single-core at 133 MHz, decoding a four-second excerpt (frames 1476~1572) from the Umiyuri animation takes 6521 ms (1.63× actual duration), excluding final conversion from YUV420 to RGB565. (Corresponding commit: ebdc3f2)

This fell short of the real-time goal, but not by much. First, RP2040 has two cores, which in the best case can cut decoding time in half; as an optimistic estimate, that already fits our time budget. Furthermore, pl_mpeg encapsulates the decoder state in a large struct and passes it around, resulting in a lot of redundant pointer indirections across deeply-nested subroutine calls (e.g., the plm_buffer_t * pointer is dereferenced on every call to plm_buffer_read(), which is invoked from a wide range of subroutines). In an embedded context we need only a singleton, so that can probably be optimized away. It also seems there could be a faster, lighter algorithm for decoding the bitstreams, one that does without the growing buffers.

I will admit that I am quite visually driven; I aspire to the best possible playback within the constraints. Given the vast space of possible optimizations, I made the audacious decision to diverge onto the path less travelled by: writing my own decoder. The hypothesis is that a decoder working with static buffers, in singleton mode, with optimized bitstream readers, can do the heavy lifting with considerably less strain; we will see whether this works out.
