2025-10-24

A pin apparently had a manufacturing defect (?) that caused an open circuit right at the IC package, so the pins had to be manually shorted and remapped in an odd way in firmware. Once that was sorted out, driving the display through DMA'd SPI and the audio amplifier through I²S (on PIO) was mostly smooth sailing. The core problem, as we had predicted, lies in efficient video storage and decoding, and in turn the selection of video codecs.
In the following steps, we will use the animation video for a well-known track, Umiyuri Kaiteitan (ウミユリ海底譚, Tale of the Deep-sea Lily; composed by n-buna, video by Awashima). This animation contains many moving, blurred backgrounds and objects, which makes it an ideal sample for quickly profiling codec approaches. (Anecdote: on streaming websites where weekly Vocaloid compilations are released, excerpts from this animation are often taken by the audience as an indicator of video quality.) We further process the video by scaling it down to 240×240 and masking out content outside the central circular region (the display viewport). The frame rate is kept at 24 fps.
The first experiment is with the QOI format. QOI is a very simple lossless image codec with a decent compression ratio comparable to that of PNG. Applying it to our video frame by frame, we get a lossless video encoded at ~15 Mbps. Downscaling further by half (120×120) yields a much more acceptable 4~5 Mbps:
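As a sanity check on those numbers, here is my own back-of-envelope arithmetic (not from the project's tooling) comparing the raw RGB8 stream against the measured QOI bitrate:

```python
# Rough bitrate arithmetic for a 240x240 @ 24 fps RGB8 stream.
W, H, FPS = 240, 240, 24
BYTES_PER_PIXEL = 3  # RGB8, no alpha

raw_mbps = W * H * BYTES_PER_PIXEL * 8 * FPS / 1e6
print(f"raw stream: {raw_mbps:.1f} Mbps")           # ~33.2 Mbps

qoi_mbps = 15  # measured figure from this log
print(f"QOI ratio: ~{raw_mbps / qoi_mbps:.1f}x")    # ~2.2x, in PNG's ballpark
```

So QOI's roughly 2.2× lossless compression on this content is indeed in the range one would expect from a PNG-class codec.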

At the original scale (240×240), decoding is computationally heavy and only runs at 12 fps, but at half scale it is fast enough to run comfortably within the RP2040's default 133 MHz system clock. Combining that with QOA-encoded audio handled by my previous implementation, uQOA, we get a first working prototype. Here is a recording of the result:
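To see why half scale helps so much, consider the per-pixel cycle budget. This is my own estimate, ignoring overhead from audio, DMA setup, and storage I/O:

```python
# Per-pixel decode budget at the RP2040's default clock.
CLOCK_HZ = 133_000_000
FPS = 24

def cycles_per_pixel(w: int, h: int) -> float:
    """Cycles available per pixel if one core decodes every frame in real time."""
    return CLOCK_HZ / (w * h * FPS)

full = cycles_per_pixel(240, 240)   # ~96 cycles/pixel -- tight for QOI
half = cycles_per_pixel(120, 120)   # ~385 cycles/pixel -- comfortable
```

Quartering the pixel count quadruples the budget, which matches the observed jump from 12 fps to full speed.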
Corresponding commit: e6d83ee
We must admit that this is less than ideal. The downscaled video is blurry and still takes a lot of storage (a two-minute video takes 120 MiB), which adds cost and complexity to storage and lengthens the wait during user uploads.
A straightforward idea is to optimize or modify QOI. QOI works by encoding each RGB pixel with one of several possible shortcuts, with a dedicated optimization for runs of consecutive identical pixels. Profiling shows that much of the time is spent in its 64-element hash table, which serves as a dictionary of recently seen pixels, but this is largely a space-time tradeoff (and we aim to optimize both). The encoding scheme specializes in individual RGB8 pixels; modifying it to work on YUV420 would require more extensive work, and the outcome (in both speed and size) is hard to predict.
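For context on that hot spot, the "hash table" in QOI is just a fixed 64-slot array indexed by the color hash defined in the QOI specification; every decoded pixel is written into it, and the QOI_OP_INDEX chunk refers back to a slot. A minimal sketch:

```python
# The 64-slot recently-seen-pixel dictionary from the QOI spec.
def qoi_color_hash(r: int, g: int, b: int, a: int = 255) -> int:
    """Index position as defined by the QOI specification."""
    return (r * 3 + g * 5 + b * 7 + a * 11) % 64

index = [(0, 0, 0, 0)] * 64          # the table the profiler flagged

px = (128, 64, 32, 255)
index[qoi_color_hash(*px)] = px      # updated after EVERY decoded pixel
# On a QOI_OP_INDEX chunk, the decoder simply emits index[slot].
```

The cost comes from updating this table on every single pixel, not just on index hits, which is why it dominates the profile even on frames that rarely use QOI_OP_INDEX.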
A lower-hanging fruit is MJPEG, which achieves 1~2 Mbps at the original scale and, as a rough estimate for now, should be on par with QOI in decoding speed (while being more flexible and tunable). But if we are already decoding JPEG, why not go for MPEG? Here again, I will be retracing a well-trodden path.
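At those MJPEG bitrates, the storage picture changes dramatically (again my own rough arithmetic):

```python
# Storage for a two-minute clip at the MJPEG bitrates above.
def storage_mib(mbps: float, seconds: float) -> float:
    """Megabits/s * seconds -> MiB on disk."""
    return mbps * 1e6 * seconds / 8 / 2**20

lo = storage_mib(1, 120)   # ~14 MiB for a two-minute clip
hi = storage_mib(2, 120)   # ~29 MiB
```

That is an order of magnitude below the current 120 MiB, at the full 240×240 resolution, before even considering inter-frame compression.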
Ayu