Entry 29: A Little Bit of This, and a Little Bit of That...

A project log for Aiie! - an embedded Apple //e emulator

A Teensy 4.1 running as an Apple //e

Jorj Bauer 02/10/2022 at 21:45

As I've been writing in the last few updates, I've been working on support for the RA8875 display, so the next generation of Aiie will have a display that can accommodate the 560-pixel width the Apple //e uses in its "double hi-res" modes (80-column text, double-low-res and double-hi-res graphics). Without it, Aiie has had to rely on some janky hacks to display those graphics modes on a panel that's only 320 pixels wide.

Most of what I'd expect someone to use a handheld //e emulator for doesn't involve 80-column text, so until now I've sort of ignored the problem. It wasn't until I heard from Alexander Jacocks about how he was building an Aiie that the topic came back to the fore. I set up a Discord for us to chat, and we've been talking about what's lacking in the current build... and of course the display was the number one hot item.

We've spent a couple of months talking about the hardware and software with a few other people who have also joined our Discord, and (as I've written) we've got the 800x480 panel up and running at about 14 frames per second.

It's looking like we won't get past that point. We're pushing the display's single SPI bus as fast as it will go. Some hack might get a little more out of it: if we abandon the display outside the Apple screen area, it might be possible to update only the 560x192x2 pixels of the Apple screen itself. I've also got some framework for updating only the parts of the screen that the Apple emulator has modified... but when all is said and done, if a full-screen game is updating the whole screen, there's not much you can do about the lack of bandwidth.
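A rough back-of-the-envelope check of the partial-update idea. This assumes SPI transfer time scales linearly with the number of pixels pushed and ignores command overhead and emulation time (both assumptions, not measurements). The full panel is 800x480 = 384,000 pixels, while the doubled Apple screen is 560x384 = 215,040 pixels. The helper names below are hypothetical:

```c
#include <stdint.h>

/* Number of pixels that must cross the SPI bus for a w x h update. */
static uint32_t pixel_count(uint32_t w, uint32_t h)
{
    return w * h;
}

/* Hypothetical estimate: if transfer time is proportional to pixels pushed,
 * shrinking the update region raises the frame rate by the same ratio.
 * Real-world overhead would make the actual gain smaller. */
static double scaled_fps(double measured_fps, uint32_t full_w, uint32_t full_h,
                         uint32_t part_w, uint32_t part_h)
{
    return measured_fps * (double)pixel_count(full_w, full_h)
                        / (double)pixel_count(part_w, part_h);
}
```

With the measured 14 frames per second, `scaled_fps(14.0, 800, 480, 560, 192 * 2)` comes out to 25.0, so under these generous assumptions the Apple-area-only trick would still fall short of the ILI9341's 30 frames per second.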

I think that's kinda okay. If I rebuild the PCB so it can accommodate either the ILI9341 320x240 display *or* the RA8875 driver from Adafruit with a 4.3" 800x480 display, then the builder can choose: do I want 30 frames per second with the smaller display and some graphics issues at higher resolutions, or do I want all the pixels at half the speed? Putting the choice back in the builder's hands feels like a reasonable trade-off to me.

Which brought me to the next crossroads. I don't want to abandon the folks who have already built an Aiie. The original Mk1 is a dead end, unfortunately, because it doesn't have enough CPU to do what I wanted. But the Mk2 has plenty of capacity, and I really don't want the addition of a new display to strand folks; I still haven't taken advantage of everything those builds have to offer! How can I continue to support the Mk2 platform without having to fork the software?

Well, that's not too hard, actually. Since there is plenty of space in the Teensy, it doesn't mind having two copies of the graphics and two display drivers built in. Take one of the unused pins on the Mk2, turn it into a jumper or switch, and voilà: you've got selectable displays. Buy both if you want, and swap them as necessary. (This may not be ideal once we have an actual case, but for now at least it's plausible.)
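A minimal sketch of the selectable-display idea, in which both drivers live in the firmware and a single pin read at boot picks one. The `display_driver_t` struct, the driver instances, and `select_display()` are hypothetical names, not Aiie's actual code, and which jumper position maps to which panel is an arbitrary choice here. On the Teensy, the Arduino core's `pinMode(pin, INPUT_PULLUP)` and `digitalRead(pin)` are the usual way to read such a jumper; the sketch just takes the result as a `bool`:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical descriptor for a compiled-in display driver. A real one would
 * also carry function pointers for init, drawing, and so on. */
typedef struct {
    const char *name;
    uint16_t width;
    uint16_t height;
} display_driver_t;

/* Both drivers are built into the firmware; only one is used per boot. */
static const display_driver_t ili9341_driver = { "ILI9341", 320, 240 };
static const display_driver_t ra8875_driver  = { "RA8875", 800, 480 };

/* jumper_closed is the state of the spare Mk2 pin, read once at startup.
 * Mapping "closed" to the RA8875 is an arbitrary choice for this sketch. */
static const display_driver_t *select_display(bool jumper_closed)
{
    return jumper_closed ? &ra8875_driver : &ili9341_driver;
}
```

Since the rest of the emulator only ever talks to the returned pointer, it doesn't need to know which panel is attached.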

From there I turned back to the *nix variants of Aiie. I do most of my development and debugging on a Mac, using the SDL libraries to abstract the windowing. The SDL build had been doubling the resolution of the ILI panel... but I've undone that. Now that the Teensy code supports two different displays with different resolutions, the SDL wrapper does the same, and when you're running it with the ILI ratio it's natively 320x240 rather than 640x480. Among other things, that meant the 560-pixel-wide modes got very ugly very quickly... and this problem, which has existed on Aiie since the start, suddenly became a priority for me.

My first attempt was to logically OR each pair of adjacent pixels together: if either of them is on, the result is on.
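The OR approach boils down to something like this sketch, which simplifies a 560-pixel monochrome row to an array of on/off values (the function name is hypothetical):

```c
#include <stdbool.h>
#include <stddef.h>

/* Squeeze a row of monochrome pixels to half width by OR-ing each
 * adjacent pair: the output pixel is on if either input pixel is on.
 * src holds src_len entries (e.g. 560); dst must hold src_len / 2. */
static void or_downscale_row(const bool *src, size_t src_len, bool *dst)
{
    for (size_t i = 0; i + 1 < src_len; i += 2) {
        dst[i / 2] = src[i] || src[i + 1];
    }
}
```

It also shows why inverted text suffers: a single dark pixel next to a lit one simply disappears, so thin dark strokes on a light background get swallowed.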

The text is sort of legible... but that white rectangle with the three dots in it is the letter 'a', inverted. As long as we're talking about normal, non-inverted text it's... meh, probably okay.

Next up was a straight linear average: average the R, G, and B components of the two pixels to figure out what color to draw. It winds up looking very similar.
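Averaging works per color channel. Here's a minimal sketch for two 16-bit RGB565 pixels (5 bits red, 6 bits green, 5 bits blue, the 16-bit format the ILI9341 uses). The function name is hypothetical, and the integer division simply truncates:

```c
#include <stdint.h>

/* Average two RGB565 pixels channel by channel. */
static uint16_t average_rgb565(uint16_t a, uint16_t b)
{
    uint16_t r  = (uint16_t)((((a >> 11) & 0x1F) + ((b >> 11) & 0x1F)) / 2);
    uint16_t g  = (uint16_t)((((a >> 5) & 0x3F) + ((b >> 5) & 0x3F)) / 2);
    uint16_t bl = (uint16_t)(((a & 0x1F) + (b & 0x1F)) / 2);
    return (uint16_t)((r << 11) | (g << 5) | bl);
}
```

For example, `average_rgb565(0xFFFF, 0x0000)` (white and black) yields `0x7BEF`, a mid grey.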

It's hard to see, but there is a subtle improvement there - this is the inverted 'a' when you get up close and personal with it:

Not ideal, certainly. But it's something. And it got me thinking that the right thing to do here has to do with the way we actually see colors.

The RGB color space is what's called an additive color space. You add the RGB lights together in order to get white. It's not intuitive though: if you add red and green, what color do you get? Yellow. If you add Yellow and Blue what do you get? White. (Yellow is Red + Green; so Yellow + Blue is the same as Red + Green + Blue; and that is white.)
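A quick way to see additive mixing in code: treat each color as an amount of red, green, and blue light and add the channels, clamping at full intensity. The `rgb_t` type and `add_light()` helper are illustrative only:

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b; } rgb_t;

/* Clamp a summed channel to the 8-bit maximum (full intensity). */
static uint8_t add_channel(uint8_t x, uint8_t y)
{
    unsigned sum = (unsigned)x + y;
    return (uint8_t)(sum > 255u ? 255u : sum);
}

/* Additive mixing: combine two light sources by adding each channel. */
static rgb_t add_light(rgb_t a, rgb_t b)
{
    rgb_t out = { add_channel(a.r, b.r), add_channel(a.g, b.g),
                  add_channel(a.b, b.b) };
    return out;
}
```

Adding red (255, 0, 0) and green (0, 255, 0) gives yellow (255, 255, 0), and adding blue (0, 0, 255) to that gives white (255, 255, 255).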

The HSV color space (Hue, Saturation, and Value) was created as an attempt to model how we perceive color. It separates the color itself (basically the Hue) from the brightness of that color (the Value), in a way that lets you manipulate the color more naturally without accidentally changing its brightness.

If you want more background, take a look at the Wikipedia article on the HSL and HSV color spaces. For our purposes, let's jump to the end: what I want is to represent, in one pixel, some combination of what two pixels are actually trying to show. To do that I need blended pixel data, and that sounds a whole lot like I should be using HSV.

The problem is that this means doing a bunch of math on every pair of pixels, every time they're drawn. My quick attempts bogged down the Teensy badly, so I left the ILI panel using RGB averaging as a "close enough for now" solution.
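To put rough numbers on "a bunch of math": squeezing the 560x192 Apple screen to 280 pixels wide means one blend per output pixel. Assuming the whole screen is redrawn on every frame at the ILI panel's 30 frames per second (an assumption; the dirty-region framework mentioned earlier would reduce this), the hypothetical helpers below give 53,760 blends per frame and 1,612,800 per second, each of which needs two RGB-to-HSV conversions and one conversion back:

```c
#include <stdint.h>

/* One blend per output pixel: a 560-wide line squeezed to 280 pixels. */
static uint32_t blends_per_frame(uint32_t apple_width, uint32_t apple_lines)
{
    return (apple_width / 2u) * apple_lines;
}

/* Blends per second if every pair is recomputed on every frame. */
static uint32_t blends_per_second(uint32_t apple_width, uint32_t apple_lines,
                                  uint32_t fps)
{
    return blends_per_frame(apple_width, apple_lines) * fps;
}
```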

But now I'm looking at the SDL port, where I've got the full CPU of a Mac to play with! How does it look, I wonder? Well, a few algorithms later: convert both pixels from RGB to HSV; average the two Hs, Ss, and Vs; convert the result back to RGB; and throw that pixel at the display...

As George Takei might say... "Oh, My."

Here's the same at full resolution, for comparison:

You can see that the full resolution is nicer... but if you had never seen it at full resolution, the half-resolution version is really not all that bad.

So... how do I get that sexiness onto the Teensy? How do I do it without overloading the CPU?

Let's follow the logic. It's slow because it's doing a lot of calculation. It's doing a lot of calculation because the algorithms for RGB/HSV conversion are kind of messy (they're not well suited to what computers do quickly). And it has to do so many of them because the driver is trying to mix two arbitrary colors.

But there are only 16 possible colors. The SDL port might be using 24-bit colors to show them on the screen, but the Apple //e only knew about 16 actual colors. Which means that we only ever have to blend 16x16 = 256 possible pairs of colors, each with exactly one result. That's small enough to make a lookup table! Assuming that we start with these 8-bit colors:

```c
#include <stdint.h>

static const uint8_t palette8[16] = {
  0x00, // 0 black
  0xC0, // 1 magenta
  0x02, // 2 dark blue
  0xA6, // 3 purple
  0x10, // 4 dark green
  0x6D, // 5 dark grey
  0x0F, // 6 med blue
  0x17, // 7 light blue
  0x88, // 8 brown
  0xE0, // 9 orange
  0x96, // 10 light gray
  0xF2, // 11 pink
  0x1C, // 12 green
  0xFC, // 13 yellow
  0x9E, // 14 aqua
  0xFF  // 15 white
};
```

then the mixture of each color with each of the other colors precomputes to:

```c
// mix8[a][b]: the 8-bit color to draw when palette entries a and b share one pixel
static const uint8_t mix8[16][16] = {
  { 0x00, 0x29, 0x28, 0x2D, 0x28, 0x49, 0x28, 0x6D, 0x24, 0x69, 0x24, 0x4D, 0x6D, 0x6D, 0x6D, 0x6D },
  { 0x29, 0xA1, 0x62, 0xA2, 0x02, 0x52, 0x42, 0xAB, 0x0E, 0x3B, 0x8A, 0xC6, 0x0B, 0x37, 0x47, 0x7B },
  { 0x28, 0x62, 0x02, 0x42, 0x0E, 0x31, 0x0A, 0x2B, 0x0C, 0x18, 0x46, 0x87, 0x16, 0x39, 0x32, 0x79 },
  { 0x2D, 0xA2, 0x42, 0xA3, 0x0A, 0x56, 0x03, 0x8B, 0x16, 0x3E, 0x8A, 0xEB, 0x37, 0x5F, 0x4F, 0x9E },
  { 0x28, 0x02, 0x0E, 0x0A, 0x10, 0x70, 0x11, 0x37, 0x2C, 0x98, 0x2A, 0x2B, 0x14, 0x78, 0x35, 0xB9 },
  { 0x49, 0x52, 0x31, 0x56, 0x70, 0x91, 0x55, 0x9A, 0x8C, 0xD1, 0x72, 0x9B, 0xB9, 0xDA, 0x99, 0xDA },
  { 0x28, 0x42, 0x0A, 0x03, 0x11, 0x55, 0x12, 0x4F, 0x10, 0x58, 0x2A, 0x67, 0x19, 0x39, 0x3A, 0x99 },
  { 0x6D, 0xAB, 0x2B, 0x8B, 0x37, 0x9A, 0x4F, 0x93, 0x35, 0x7E, 0x92, 0xD3, 0x7F, 0x9F, 0x9B, 0xDF },
  { 0x24, 0x0E, 0x0C, 0x16, 0x2C, 0x8C, 0x10, 0x35, 0x68, 0xAC, 0x2D, 0x36, 0x94, 0xB4, 0x54, 0xB1 },
  { 0x69, 0x3B, 0x18, 0x3E, 0x98, 0xD1, 0x58, 0x7E, 0xAC, 0xED, 0x76, 0x7F, 0xFC, 0xF5, 0xBD, 0xF6 },
  { 0x24, 0x8A, 0x46, 0x8A, 0x2A, 0x72, 0x2A, 0x92, 0x2D, 0x76, 0x6D, 0xAE, 0x56, 0x76, 0x72, 0xB6 },
  { 0x4D, 0xC6, 0x87, 0xEB, 0x2B, 0x9B, 0x67, 0xD3, 0x36, 0x7F, 0xAE, 0xEF, 0x57, 0x7F, 0x6F, 0xBF },
  { 0x6D, 0x0B, 0x16, 0x37, 0x14, 0xB9, 0x19, 0x7F, 0x94, 0xFC, 0x56, 0x57, 0x7C, 0xDD, 0x5D, 0xFE },
  { 0x6D, 0x37, 0x39, 0x5F, 0x78, 0xDA, 0x39, 0x9F, 0xB4, 0xF5, 0x76, 0x7F, 0xDD, 0xFD, 0x9D, 0xFA },
  { 0x6D, 0x47, 0x32, 0x4F, 0x35, 0x99, 0x3A, 0x9B, 0x54, 0xBD, 0x72, 0x6F, 0x5D, 0x9D, 0x7E, 0xFE },
  { 0x6D, 0x7B, 0x79, 0x9E, 0xB9, 0xDA, 0x99, 0xDF, 0xB1, 0xF6, 0xB6, 0xBF, 0xFE, 0xFA, 0xFE, 0xFF }
};
```

The ILI panel is actually 16 bits per pixel, so its table is twice as big, but the same set of calculations applies... and all that calculation time disappears in a puff of lookup-table glory. Lookups are fast, and our problem of "not enough CPU" is solved.
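Building and using such a table might look like the sketch below. It assumes the 8-bit values are packed as RRRGGGBB (3-3-2), which is what the palette values appear to be. `build_mix_table()`, `downscale_row()`, and the plain RGB332 averaging used as a stand-in blend are all hypothetical, there only so the example runs on its own; the expensive blend (such as the HSV one) would run once, offline or at startup, and the per-frame work shrinks to one lookup per output pixel:

```c
#include <stddef.h>
#include <stdint.h>

/* Any "mix two 8-bit colors" routine, however slow. */
typedef uint8_t (*blend8_fn)(uint8_t a, uint8_t b);

static uint8_t mix_table[16][16];

/* Precompute every pairing of the 16 palette entries: 256 blends, once. */
static void build_mix_table(const uint8_t palette[16], blend8_fn blend)
{
    for (int i = 0; i < 16; i++)
        for (int j = 0; j < 16; j++)
            mix_table[i][j] = blend(palette[i], palette[j]);
}

/* Per frame: squeeze a row of 4-bit Apple color indices to half width
 * with one table lookup per output pixel and no color math at all. */
static void downscale_row(const uint8_t *indices, size_t len, uint8_t *out)
{
    for (size_t i = 0; i + 1 < len; i += 2)
        out[i / 2] = mix_table[indices[i] & 0x0F][indices[i + 1] & 0x0F];
}

/* Stand-in blend for this example only: average each RGB332 channel. */
static uint8_t average_rgb332(uint8_t a, uint8_t b)
{
    uint8_t r  = (uint8_t)((((a >> 5) & 7) + ((b >> 5) & 7)) / 2);
    uint8_t g  = (uint8_t)((((a >> 2) & 7) + ((b >> 2) & 7)) / 2);
    uint8_t bl = (uint8_t)(((a & 3) + (b & 3)) / 2);
    return (uint8_t)((r << 5) | (g << 2) | bl);
}
```

A 16-bit version for the ILI panel would have the same shape, just with `uint16_t` entries.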

How does it hold up in color environments, though? Well, the best example I can think of is the game AirHeart (my go-to example of a game that pushes the limits of what the Apple //e was capable of). Here it is at full (double) resolution:

And here it is on a 320x240 display, at half-double-resolution:


Now I'm wondering about the effort spent getting that RA8875 working. If I'd made the lookup table years ago, I'm not sure upgrading the panel would have even come up...