Entry 15: Testing the Limits

A project log for Aiie! - an embedded Apple //e emulator

A Teensy 4.1 running as an Apple //e

Jorj Bauer 01/03/2018 at 15:29

I left this great bomb for myself in August:

// The Timer1.stop()/start() is bad. Using it, the display doesn't
// tear; but the audio is also broken. Taking it out, audio is good
// but the display tears.


Of course, I'd totally forgotten about this; when I went to play some Ali Baba, the sound was completely junked up and I didn't know why.

So what to do? Well, rip out the audio system, of course!

I may have forgotten to mention: Aiie runs on my Mac. For debugging purposes. It's a little messy; has no interface for inserting/ejecting disks; and it has the same display resolution as the Teensy. It's not ideal for use, but it's great for debugging.

The audio in said SDL-based Mac variant has never been right. I got to the point where audio kinda was close to correct and left it at that; mostly, I've been debugging things like the disk subsystem or CPU cycle counts or whatever. But now that I'm trying to debug audio, I kinda care that it doesn't work correctly in SDL.

The whole problem in the Mac variant was a combination of buffering and timing. The sound interface to the Apple ][ is simply a read of a memory address ($C030), which causes a toggle of the DC voltage to the speaker (on or off). If you write to that address then the DC voltage is toggled twice. And that's it.

For the Teensy that was fairly easy: I wired a speaker up to a pin, and I'm doing exactly the same thing. But for the SDL variant I need to construct an SDL sound channel and keep it filled with sound data. So the virtual CPU runs some number of cycles; when it detects access to $C030, it tells the virtual speaker to toggle; the virtual speaker toggle logs a change at a specific CPU cycle number and puts it in a queue. Then later a maintenance function is called in the speaker that pulls events out of that queue and applies them to the current speaker value. A separate thread runs constantly, picking up the current speaker value and putting it in a sound buffer for SDL to play.
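A minimal sketch of that toggle queue, to make the moving parts concrete. The class and method names here are hypothetical and not taken from Aiie's source, and the locking a real multi-threaded SDL build would need is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical stand-in for the virtual speaker; not Aiie's actual class.
class SpeakerSketch {
public:
  // Called by the virtual CPU when it detects an access to $C030.
  void toggleAt(uint64_t cpuCycle) {
    // Events must be logged in cycle order for the drain in maintain() to work.
    assert(pending.empty() || cpuCycle >= pending.back());
    pending.push_back(cpuCycle);
  }

  // Maintenance call: apply every queued toggle due at or before nowCycle.
  void maintain(uint64_t nowCycle) {
    while (!pending.empty() && pending.front() <= nowCycle) {
      level = !level;
      pending.pop_front();
    }
  }

  // What the audio thread would sample when filling the sound buffer.
  bool currentLevel() const { return level; }

  // Number of toggles logged but not yet applied.
  std::size_t queued() const { return pending.size(); }

private:
  std::deque<uint64_t> pending;  // cycle numbers of toggles not yet applied
  bool level = false;            // current speaker state (on or off)
};
```

The audio only comes out right if the nowCycle passed to maintain() is derived the same way the CPU derives its own cycle count; if the two clocks disagree, toggles are applied early or late.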

There's lots of room for error in that, and it turns out that my timing was bad. The calls to maintainSpeaker were not regularly timed, and were calculating the current CPU cycle (from real-time) differently than the CPU itself did. Converting that to time-based instead of cycle-based fixed things up quite a bit; removing the fundamentally redundant dequeueing in maintainSpeaker fixed it up even more. I started adopting a DC-level filter like what's in AppleWin but abandoned it mid-stride; I understand what it's trying to do, and I need to change how it works for the SDL version. Later. Maybe much later.

After doing all of that in the SDL version, I figured I'd make the same fixups in the Teensy version and *bam* it'd be fixed up faster than Emeril spicing up some shrimp etouffee. But no, I was instead stuck on the slow road of Jacques Pepin's puff pastry, where you can't see any of that butter sitting between the layers of dough until it does or doesn't puff up correctly. (For the record: I'd take anything of Pepin's before anything of Emeril's any day. But shrimp etouffee is indeed delicious pretty much no matter the chef.)

No, it took me another few hours of fiddling with this, pfutzing with that, seeing that adding delays in the interrupt routine actually *improved* the sound quality, optimizing more bits of code, and finally stumbling across my commented code above before I realized what was happening. It's the LCD.

The Teensy can run the CPU, no problem. It can run the speaker in real-time, no problem. It may very well be able to run a sound card emulator, which involves emulating something like 8 distinct pieces of hardware. But there's one thing it absolutely can't do: it can't refresh the video display in real-time between two CPU cycles.

One CPU cycle of the Apple //e, at 1.023 MHz, is 1/1023000 second (about 0.978 microseconds). One update of the video, which is 280x192 pixels, can't be faster than 280 * 192 = 53760 cycles of the ARM clock. It's certainly slower than that, but in an optimal world where there is one instruction that updates one pixel to whatever value you want, that's the absolute best case. At 180 MHz, that would be 298.67 microseconds. Overclocking the CPU to 240 MHz, it's still 224 microseconds. And in reality, we're probably looking at 10 cycles per pixel - so 10 times longer than those numbers.
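The arithmetic above can be checked with a few lines of code. This is only a calculator for the figures in the text; the function names are made up for illustration, and the 10-cycles-per-pixel figure is the rough guess from the paragraph above, not a measurement.

```cpp
#include <cassert>
#include <cstdint>

// 280 x 192 pixels in one full redraw of the Apple //e hi-res display.
const uint32_t kPixels = 280u * 192u;  // 53760

// Microseconds needed to touch every pixel once, given the ARM clock in MHz
// and an assumed number of ARM cycles spent per pixel.
double frameDrawMicros(double armMHz, double cyclesPerPixel) {
  assert(armMHz > 0.0 && cyclesPerPixel > 0.0);
  return kPixels * cyclesPerPixel / armMHz;
}

// Emulated Apple //e cycles (1.023 MHz, so 1.023 cycles per microsecond)
// that should have run during that many microseconds of real time.
double appleCyclesDuring(double micros) {
  return micros * 1.023;
}
```

frameDrawMicros(180, 1) comes out near 298.67 and frameDrawMicros(240, 1) is 224, matching the text. Feeding those into appleCyclesDuring() shows the scale of the problem: a little over 305 emulated cycles pass during even a best-case redraw at 180 MHz, and about 3,055 at 10 cycles per pixel.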

If a CPU cycle happens in the middle of the drawing of the video display, there's a real chance that the instruction will wind up changing video memory. Which means that even though we've only drawn maybe half of the display, its contents in the bottom half (after the CPU runs) may have changed from what we were trying to draw in the top half (before the CPU ran).

That's what the time bomb above was for. It's a lock: "don't run the CPU while we're updating the display." That ensures the display is whole. And with CPU buffering - running multiple CPU cycles at once, and then waiting until real time catches up to us before running again - it all averages out. It's not obvious to the user that the CPU is running too fast, then paused; then running too fast, then paused. The delta is measured in milliseconds. But when you add the *speaker* to that, well, now you'll notice. You can't run the speaker too fast and then too slow without hearing it. The result is a chirpy high-pitched version of what you should be hearing - where the pitch drifts a little up and down depending on how regularly we're being called.
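A toy model shows why the run-fast-then-pause pattern is audible. It assumes the CPU executes batches of a fixed number of emulated cycles at some multiple of real speed, idles until real time catches up, and toggles the pin the instant the emulated instruction executes. The batch size, the speedup, and the function names are all assumptions for illustration, not values from Aiie.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Real time (measured in emulated-cycle units) at which a toggle scheduled
// for emulated cycle `cycle` actually reaches the pin, if the CPU runs
// batches of `batchCycles` at `speedup` times real speed and then idles.
double realTimeOfToggle(uint64_t cycle, uint64_t batchCycles, double speedup) {
  assert(batchCycles > 0 && speedup >= 1.0);
  uint64_t batch = cycle / batchCycles;   // which batch the toggle falls in
  uint64_t offset = cycle % batchCycles;  // how far into that batch
  return double(batch * batchCycles) + double(offset) / speedup;
}

// Real-time gaps between successive toggles of a square wave whose toggles
// are `halfPeriod` emulated cycles apart.
std::vector<double> toggleGaps(uint64_t halfPeriod, int count,
                               uint64_t batchCycles, double speedup) {
  std::vector<double> gaps;
  double prev = realTimeOfToggle(0, batchCycles, speedup);
  for (int i = 1; i <= count; i++) {
    double t = realTimeOfToggle(uint64_t(i) * halfPeriod, batchCycles, speedup);
    gaps.push_back(t - prev);
    prev = t;
  }
  return gaps;
}
```

With a half period of 500 cycles, batches of 2000 cycles, and a 4x speedup, toggleGaps(500, 4, 2000, 4.0) gives gaps of 125, 125, 125, and 1625: three toggles squeezed to a quarter of their intended spacing, then one long gap. At a speedup of 1 every gap is the intended 500. The squeezed gaps are the high-pitched chirp, and the uneven long ones are the drift.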

So there are four fundamental choices.

First: prioritize the display. That's what the code above does. It ensures that the displayed image is a whole image from the video buffer. Audio suffers.

Second: prioritize audio. That's what the code *did* before I put the stops and starts in. You wind up with good audio, but the video "tears" - you get half of one image and half of another. In Skyfox, you'd see this as half a cloud.

Third: put the audio driver in the video loop. I'm really hesitant to do this. It would allow both to work, but it's architecturally inelegant. 

Fourth: add a second interrupt that just runs the audio queue. Then the audio queue can run decoupled from the CPU, so the video draw won't care.

I think that last one is probably the best way to handle it eventually - but it's going to be messy. I need to be sure that the two interrupts can coexist without breaking each other, and I need to be sure Aiie has enough audio queue RAM to store the audio data. It will take some experimentation to figure out the logistics. I also probably want the audio interrupt to get a higher priority than the CPU interrupt - so that the audio queue can interrupt the CPU while it's running, because the CPU run might be running multiple cycles ahead of time. (Right now there's a realtime stop in the virtual CPU: if it detects a change on the speaker pin, it exits immediately instead of running more cycles. But that doesn't actually solve the problem; it just shoves it off to the next time the virtual CPU is scheduled to run!)
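One possible shape for that decoupled audio queue is a fixed-size ring buffer with a single producer (the CPU interrupt) and a single consumer (the audio interrupt). The sketch below is written in portable C++ with hypothetical names; it is not Aiie's code, and it does not show how the two interrupts would actually be set up or prioritized on the Teensy.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-capacity queue of toggle timestamps (in CPU cycles).
// One producer calls push(); one consumer calls popIfDue().
template <std::size_t N>
class ToggleRing {
  static_assert(N > 0 && (N & (N - 1)) == 0, "N must be a power of two");

public:
  // Producer side: log a toggle at the given emulated cycle.
  bool push(uint64_t cycle) {
    assert(cycle >= lastPushed);  // toggles must be logged in cycle order
    uint32_t h = head.load(std::memory_order_relaxed);
    uint32_t t = tail.load(std::memory_order_acquire);
    if (h - t == N) return false;  // full: the caller decides what to drop
    slots[h & (N - 1)] = cycle;
    head.store(h + 1, std::memory_order_release);
    lastPushed = cycle;
    return true;
  }

  // Consumer side: remove the oldest toggle if it is due by nowCycle.
  bool popIfDue(uint64_t nowCycle) {
    uint32_t t = tail.load(std::memory_order_relaxed);
    uint32_t h = head.load(std::memory_order_acquire);
    if (h == t) return false;                         // empty
    if (slots[t & (N - 1)] > nowCycle) return false;  // not due yet
    tail.store(t + 1, std::memory_order_release);
    return true;
  }

  // RAM used by the timestamp storage alone.
  static constexpr std::size_t bytes() { return N * sizeof(uint64_t); }

private:
  uint64_t slots[N];
  uint64_t lastPushed = 0;  // touched only by the producer
  std::atomic<uint32_t> head{0};
  std::atomic<uint32_t> tail{0};
};

// Body of a would-be audio interrupt: apply every due toggle and return the
// level the speaker pin should now be driven to.
template <std::size_t N>
bool audioTick(ToggleRing<N>& ring, uint64_t nowCycle, bool level) {
  while (ring.popIfDue(nowCycle)) level = !level;
  return level;
}
```

The RAM question becomes a matter of picking N: at 8 bytes per timestamp, 1024 entries cost 8 KB. Whether that is enough depends on how far ahead of real time the CPU is allowed to run and how fast a program can hammer $C030, which is exactly the experimentation described above.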

So for now, I've added an option to let the user choose option 1 or 2. Now to start coding up option 4 to see if it's feasible...