
Entry 18: Pi Zero W and JOSS

A project log for Aiie! - an embedded Apple //e emulator

A Teensy 4.1 running as an Apple //e

Jorj Bauer, 02/04/2018 at 16:28

One of the most rewarding moments of Aiie, for me, is turning it on. I flip the switch; the screen becomes filled with inverted @ symbols; it beeps; and then I see "Apple //e" across the top of the screen.

It's just like my original: I'd reach around the back, flip the switch, and feel the static charge build up on my face, since I was now mere inches away from the CRT. The screen would begin to glow, and *beep*, there we go! Chunka-chunka-chunka, the disk drive whirs and we're off to the races.

I want most or all of that experience from Aiie.

When I turn on a Pi Zero W, however, what I get is a raspberry; a good 40 seconds of Linux kernel booting; and then a login prompt. Eww.

I could rip out most of the init sequence; have Linux run aiie instead of init. Systemd. Whatever. I'll still have the kernel boot time, which is fairly substantial. Sure, I could compile a custom kernel that's got the modules I need built in. Strip out the unnecessary cruft, reducing boot time. I've found a quiet boot that seems to suppress most of the messages. It's still light years away from the experience I'm seeking, though. So how do I get the kernel out of my way?

Well, don't run Linux, of course.

Enter JOSS - Jorj's OS Substitute.

I've got a working, bare-bones OS that can host Aiie with no Linux in between. It needs a lot of work. There are many, many things I still don't fully understand in the BCM2835 package (that's the CPU + peripherals in a Pi Zero). It speaks out the mini UART, giving me serial; I've got the GPU initialized, giving me video; I've got one timer running successfully, giving me some timing. I've got read support working for the SD card, along with a simplistic FAT32 layer. Nice.

I don't have FPU support - or rather, I haven't verified that I've got the FPU running properly. I don't have the MMU running at all, and I haven't figured out Fast IRQs yet. But it's enough for me to get a basic Aiie boot screen up, with video refreshing properly.

Which is important, because the number one thing I'm considering right now is: how fast can I drive the display? It's the same question I've been asking for the last few months. Is this path finally going to give me the freedom I want to build this out into a hardware emulation platform? The results are not very encouraging.

The boot time is great. I'm using a bootloader to transfer JOSS + Aiie over serial here, so there's a bit of startup delay that won't be in a final version. Once the binary finishes loading and I tell it to run, it's a near-instantaneous boot.

But for whatever reason, I can only drive the JOSS-based Aiie! video at about 1 frame per second.

Clearly, that can't be the hardware. Can it? It's a 1GHz processor; it seems unlikely that video would have such an enormous performance penalty. I suppose I can find out. This thing *does* boot Linux, after all.

So time for yet another fork: the Raspbian Framebuffer fork.

Ignore the text poking out from under Aiie; I'm not allocating and switching to a dedicated VT here, so the login prompt is peeking out of the framebuffer console. I'm unceremoniously blitting all over the framebuffer directly. And I'm easily getting 52 FPS at 320x240, scaled by Linux. If I take out the delay loop, I see hundreds of frames per second. While driving the virtual CPU at full speed. This pretty definitively tells me that I'm missing something in JOSS; the hardware is plenty capable.
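For a sense of what "blitting directly" involves, here is a minimal sketch in C. It assumes a 16-bit-per-pixel framebuffer and copies one 320x240 frame row by row, honoring the framebuffer's row stride. On Linux the destination pointer would typically come from mmap() on /dev/fb0 and the stride from the line_length field of struct fb_fix_screeninfo. The names blit_frame, SRC_W, and SRC_H are made up for illustration and are not taken from the Aiie source.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed emulator frame size; the post only says "320x240". */
enum { SRC_W = 320, SRC_H = 240 };

/*
 * Copy one 16-bit frame into a framebuffer whose rows start
 * `stride_bytes` apart (hypothetical helper, not from Aiie).
 * The stride may be larger than the visible row, so each row is
 * copied separately instead of using one big memcpy.
 * Returns 0 on success, -1 on bad arguments.
 */
int blit_frame(uint8_t *fb, size_t stride_bytes, const uint16_t *src)
{
    const size_t row_bytes = SRC_W * sizeof(uint16_t);

    if (fb == NULL || src == NULL || stride_bytes < row_bytes)
        return -1;

    for (size_t y = 0; y < SRC_H; y++)
        memcpy(fb + y * stride_bytes, src + (size_t)y * SRC_W, row_bytes);

    return 0;
}
```

A loop this small leaves very little work per frame, which fits with the hardware not being the bottleneck.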

Why is it slow in JOSS, then? There are two major candidates. First: CPU speed itself.

When the Pi Zero boots, it doesn't actually give control to the CPU right away. The GPU gets control first; it throws a rainbow square on the HDMI output, loads up some boot files off of the SD card, reconfigures the hardware, and then throws control over to the CPU.

The BCM2835 brings the ARM up at 250MHz. The GPU, when it's configuring the hardware, can change that. I think it's successfully changing it to 700MHz for me, which should be plenty at the moment. It's hard to tell, though; I've built a couple of inconclusive test apps that try to measure the real-world time delay of various busy-wait loops, and I'm not seeing the time differences I'd expect.
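A busy-wait measurement of this kind could be sketched roughly as follows. It assumes the BCM2835 system timer's free-running 1MHz counter is readable at ARM physical address 0x20003004 (the CLO register), and it assumes the number of CPU cycles per loop iteration is known. That second assumption is shaky: with the caches off, each iteration may be dominated by memory fetches rather than core clock, which could by itself explain results that don't scale with the clock. All function names and the BARE_METAL_BCM2835 macro are hypothetical.

```c
#include <stdint.h>

/* Spin for a fixed number of iterations; volatile keeps the loop alive. */
uint32_t busy_loop(uint32_t iterations)
{
    volatile uint32_t i;
    for (i = 0; i < iterations; i++) { }
    return i;
}

/* Unsigned subtraction gives the right answer across counter wraparound. */
uint32_t elapsed_us(uint32_t start, uint32_t end)
{
    return end - start;
}

/*
 * Estimate the effective clock in MHz from a timed loop.
 * cycles_per_iter is an assumption the caller has to supply.
 * Returns 0 if no time elapsed (measurement too short to be useful).
 */
uint32_t estimate_mhz(uint32_t iterations, uint32_t cycles_per_iter,
                      uint32_t us)
{
    if (us == 0)
        return 0;
    return (uint32_t)(((uint64_t)iterations * cycles_per_iter) / us);
}

#if defined(BARE_METAL_BCM2835) /* hypothetical build flag */
/* System timer counter, low 32 bits; ticks once per microsecond. */
#define SYSTIMER_CLO ((volatile uint32_t *)0x20003004u)

uint32_t measure_busy_loop_us(uint32_t iterations)
{
    uint32_t t0 = *SYSTIMER_CLO;
    busy_loop(iterations);
    return elapsed_us(t0, *SYSTIMER_CLO);
}
#endif
```

If the clock really went from 250MHz to 700MHz, the same loop should take roughly 36% of the time; a ratio close to 1 would point at something other than core clock as the limit.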

There's also the MMU. Having not initialized it means that I've also not initialized the caches or the branch predictor. That should be a fairly significant performance penalty.
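As a partial step, the ARM1176 core in the BCM2835 allows the instruction cache and branch prediction to be switched on in the CP15 control register without the MMU; the data cache, as far as I understand the ARM documentation, only becomes useful once the MMU is on. The sketch below shows the bit manipulation involved. The bit positions are from the ARM1176 control register layout; the function names and the BARE_METAL_BCM2835 macro are hypothetical, and this is not JOSS code.

```c
#include <stdint.h>

/* CP15 c1 control register bits on the ARM1176. */
#define SCTLR_M (1u << 0)   /* MMU enable */
#define SCTLR_C (1u << 2)   /* data cache enable */
#define SCTLR_Z (1u << 11)  /* branch prediction enable */
#define SCTLR_I (1u << 12)  /* instruction cache enable */

/* Pure helper: set I-cache and branch prediction, leave the rest alone. */
uint32_t sctlr_enable_icache_bpred(uint32_t sctlr)
{
    return sctlr | SCTLR_I | SCTLR_Z;
}

#if defined(BARE_METAL_BCM2835) /* hypothetical build flag */
void enable_icache_bpred(void)
{
    uint32_t v;
    uint32_t zero = 0;

    /* Invalidate the entire instruction cache before enabling it. */
    __asm__ volatile("mcr p15, 0, %0, c7, c5, 0" : : "r"(zero) : "memory");

    /* Read-modify-write the control register. */
    __asm__ volatile("mrc p15, 0, %0, c1, c0, 0" : "=r"(v));
    v = sctlr_enable_icache_bpred(v);
    __asm__ volatile("mcr p15, 0, %0, c1, c0, 0" : : "r"(v) : "memory");
}
#endif
```

This doesn't replace setting up the MMU, but it is a cheap experiment to see how much of the slowdown comes from uncached instruction fetches.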

At any rate, the direction I'm headed in isn't clear. I'm thrashing about on many different avenues trying to get a foothold on which is going to be the best.

The Pi Zero certainly has the performance I want, but the bootup process sucks; even if I replace Linux with JOSS, I think I'll have a delay of a couple of seconds before the GPU hands over control.

The serial display with DMA requires external RAM, which I don't really like. And with the Pi Zero being both cheaper and faster than the Teensy 3.6, I'm reluctant to put an extra $20 of cost into this for only 30% of the performance. Sure, the Teensy has better peripherals and a Cortex-M4; but this old ARMv6 core's clock speed and extra RAM trounce it pretty significantly.

Building an NTSC output for the Teensy feels like it might work, but again - back to the cost. Having tasted the combination of faster + more RAM + cheaper really makes me want to go in that direction.

But JOSS still needs significant work. When it's done - or along the way, depending on how JOSS evolves - I'll still need to code all of the peripherals myself. I'm not sure I'll be able to take advantage of the Wi-Fi and Bluetooth on the Pi Zero without a massive time investment, or reliance on Linux (and the boot delay that entails). The USB seems massive, but achievable. And will I be satisfied with just 26 GPIO pins? I mean, I've been totally spoiled with the 62 I/O pins on the Teensy, so I've never really thought much about how to conserve them.

None of these options feels like a slam dunk just yet, so I suppose I will just keep tinkering for now...
