
Entry 8: a Teensy bit of support

A project log for Aiie! - an embedded Apple //e emulator

A Teensy 4.1 running as an Apple //e

Jorj Bauer, 02/20/2017 at 19:29

(This post refers to git commit 3af0b916d7481305979181e1c307ef40e1b46f19.)

Finally! Here's the important part: all of the hardware model for the Teensy. Well, most of it. There's something I haven't put in quite yet that I'll get to in the next log. :)

This is, for the most part, very straightforward. All of this stuff is in the teensy/ directory of the project.

The file teensy.ino is the glue, just like aiie.cpp was the glue for the Mac-based emulator. It connects together the MicroSD card, keyboard, display, joystick, and Apple //e VM. It's responsible for running instructions on the CPU at the right time. And it's a bit messy.

Let's start down in the virtual 65c02 CPU again. With the Mac version of the emulator, aiie-opencv runs two concurrent threads - one for the CPU and one for the display, basically. The same division happens on the Teensy; there's a function runCPU() that performs some work on the CPU. (Let's assume that's one instruction for now.) It can't do exactly what the Mac version does - which is to run an instruction and then nanosleep() until it's time for the next instruction. The Teensy variant runs from a timer interrupt handler; it has to return, so that the main loop() can continue to redraw the LCD screen. So instead, the Teensy code keeps track of when the next instruction *should* run, in microseconds. When the runCPU() function is called, it checks to see if it's time to run an instruction; if so, it does. If not, it simply returns.
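To make that check concrete, here's a minimal sketch of the idea in plain C++ (no Arduino headers, so it runs anywhere). It is not the code from teensy.ino: FakeCpu, CpuScheduler, and the fixed 3-cycle instruction cost are all made up for illustration, and the timestamp is passed in as a parameter where real hardware would read something like Arduino's micros().

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the virtual 65c02. It only counts instructions.
struct FakeCpu {
    uint64_t instructions = 0;

    // Pretend every instruction costs 3 cycles (an assumption for this sketch).
    uint32_t step() {
        ++instructions;
        return 3;
    }
};

// Hypothetical scheduler: remembers when the next instruction *should* run.
struct CpuScheduler {
    FakeCpu cpu;
    double nextInstructionMicros = 0.0;  // deadline for the next instruction
    double microsPerCycle;               // 0.97752 for the Apple //e clock

    explicit CpuScheduler(double usPerCycle) : microsPerCycle(usPerCycle) {
        assert(usPerCycle > 0.0);
    }

    // Meant to be called from a timer interrupt handler, so it never blocks.
    // Returns true if an instruction ran, false if it was too early.
    bool runCPU(double nowMicros) {
        if (nowMicros < nextInstructionMicros) {
            return false;  // not time yet; hand control back to the caller
        }
        uint32_t cycles = cpu.step();
        // Advance the deadline from the previous deadline, not from "now",
        // so that a late call does not push every later instruction back.
        nextInstructionMicros += cycles * microsPerCycle;
        return true;
    }
};
```

Advancing the deadline from the old deadline rather than from the current time is a choice made for this sketch; it keeps the long-run rate right even when individual calls arrive late.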

But there's a lot of overhead in calling the virtual 65c02 to perform one instruction. We have several function calls, each of which has to save and restore register values, in addition to the memory jumps and returns. If you run the CPU flat-out in one function, it performs much better than if you call it one step() at a time. And, indeed, if we call one step() at a time we have none of the Teensy's CPU time left to draw the screen. In that mode we're running at a fraction of the speed of the original Apple //e. Which means that we need to compromise.

Instead of calling the virtual CPU one step at a time, we tell it to execute a few instructions before it returns. The "Run(24)" call tells it to execute enough instructions to take up at least 24 clock cycles before returning. (The number comes from experimentation; the average time used per instruction is about 3 cycles, and this happens to be the largest multiple of 3 that didn't have other unwanted effects.)
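Here's a rough sketch of what "run until at least N cycles are used" looks like, again in plain C++. BatchCpu and its replayed list of cycle costs are hypothetical; the real emulator decodes actual 65c02 opcodes. Only the shape of the Run() loop is the point.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical CPU that replays a fixed list of per-instruction cycle costs.
struct BatchCpu {
    std::vector<uint8_t> costs;  // made-up cycle costs, each assumed to be > 0
    std::size_t next = 0;        // index of the next cost to replay
    uint64_t totalCycles = 0;    // running total across all calls

    explicit BatchCpu(std::vector<uint8_t> c) : costs(std::move(c)) {
        assert(!costs.empty());
    }

    // Execute one instruction and return how many cycles it took.
    uint8_t step() {
        uint8_t used = costs[next];
        next = (next + 1) % costs.size();
        totalCycles += used;
        return used;
    }

    // Execute whole instructions until at least minCycles have been used.
    // The result can overshoot minCycles, because an instruction is never
    // cut short partway through.
    uint32_t Run(uint32_t minCycles) {
        uint32_t used = 0;
        while (used < minCycles) {
            used += step();
        }
        return used;
    }
};
```

Because the loop lives inside one function, the per-call overhead is paid once per batch instead of once per instruction. Whatever Run() returns is what the caller should charge against the clock, since the batch may overshoot the request.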

Couple that with Timer1, which drives the CPU. Timer1 has a resolution of 1 microsecond, which is *just a little* too coarse for us; the Apple //e clock period is actually 0.97752 microseconds. But that's immaterial because of the Run(24). We're already sort of driving in a traffic jam: we move a little, then stop. Then move. Then stop. The trick is to look at this from high enough above that it looks like everyone is moving, just very slowly; we want to be correct *on average*.

That also means that we don't care that Timer1 has a maximum resolution of 1 microsecond; in fact, we can back off a little. And I did. It's called every 3 microseconds, with little to no visible impact. That leaves extra time for the LCD to draw, and in practice the current version can draw about 26 FPS reliably.
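The "correct on average" claim is easy to check with a little simulation. This is a sketch under simplifying assumptions, not a model of the real firmware: simulateCycles is a made-up helper, every batch is assumed to use exactly batchCycles (the real Run(24) can overshoot), and the timer is assumed to fire exactly on schedule.

```cpp
#include <cassert>
#include <cstdint>

// Simulate a timer that fires every tickMicros for durationMicros, running
// one batch of batchCycles whenever the batch deadline has passed.
// Returns the total number of emulated cycles executed.
uint64_t simulateCycles(uint32_t tickMicros, uint64_t durationMicros,
                        uint32_t batchCycles, double microsPerCycle) {
    assert(tickMicros > 0 && batchCycles > 0 && microsPerCycle > 0.0);
    double nextDueMicros = 0.0;  // when the next batch *should* start
    uint64_t cycles = 0;
    for (uint64_t now = 0; now < durationMicros; now += tickMicros) {
        if (static_cast<double>(now) >= nextDueMicros) {
            cycles += batchCycles;
            // Schedule from the old deadline so that timer granularity
            // shows up as jitter rather than as accumulated drift.
            nextDueMicros += batchCycles * microsPerCycle;
        }
    }
    return cycles;
}
```

Calling simulateCycles(3, 1000000, 24, 0.97752) models one second of 3-microsecond ticks. Under these assumptions the total should land within a batch or so of 1000000 / 0.97752 cycles, even though no single batch starts exactly on time.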

It only gets that speed because of optimizations, though. Using the stock LCD libraries, we'll only get a fraction of it; the libraries try their best to be *accurate*. Which means they make very few assumptions about how you're likely to draw. Want to draw a pixel on the screen? Sure! We'll set the cursor position, set the RAM position, put you in write mode, and draw the pixel. Want to draw the next pixel over? Same thing. Only... well, the LCD automatically incremented the memory address, so all you really had to do was draw the actual pixel. All that cursor/RAM/mode nonsense is superfluous. And in teensy-display.cpp, you'll find routines that are probably only useful to this project: they assume that the LCD is going to be used the way that *I'm* using it. They take shortcuts all over the place. And they get fantastic results from my butcher job: I went from something like 2 FPS to 26 FPS. Very respectable; I'm sure I could clean this up and probably get a little more out of it. There's a clean/dirty flag that I expect to help quite a lot, but it doesn't work right just yet. And the LCD code is messy; I should clean a lot of it up and better document the way I'm initializing the display (which is subtly different from gLCD).
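To show where the savings come from, here's a toy model that just counts bus operations. FakeLcd and both draw functions are invented for this sketch; they don't match the real controller's command set or anything in teensy-display.cpp. The only behavior borrowed from the text is that the write address auto-increments after each pixel.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical LCD whose RAM address auto-increments after each data write.
class FakeLcd {
public:
    FakeLcd(uint16_t w, uint16_t h)
        : width_(w), ram_(static_cast<std::size_t>(w) * h, 0) {
        assert(w > 0 && h > 0);
    }

    void setCursor(uint16_t x, uint16_t y) { x_ = x; y_ = y; ++busOps_; }
    void setRamAddress() {
        addr_ = static_cast<std::size_t>(y_) * width_ + x_;
        ++busOps_;
    }
    void enterWriteMode() { ++busOps_; }
    void writeData(uint16_t color) {
        ram_[addr_] = color;
        addr_ = (addr_ + 1) % ram_.size();  // the controller advances for us
        ++busOps_;
    }

    uint64_t busOps() const { return busOps_; }
    const std::vector<uint16_t>& ram() const { return ram_; }

private:
    uint16_t width_;
    uint16_t x_ = 0, y_ = 0;
    std::size_t addr_ = 0;
    uint64_t busOps_ = 0;
    std::vector<uint16_t> ram_;
};

// Generic-library style: full setup for every single pixel.
void drawRowNaive(FakeLcd& lcd, uint16_t y, const std::vector<uint16_t>& row) {
    for (std::size_t x = 0; x < row.size(); ++x) {
        lcd.setCursor(static_cast<uint16_t>(x), y);
        lcd.setRamAddress();
        lcd.enterWriteMode();
        lcd.writeData(row[x]);
    }
}

// Shortcut style: set up once, then stream pixels and rely on auto-increment.
void drawRowStreamed(FakeLcd& lcd, uint16_t y, const std::vector<uint16_t>& row) {
    lcd.setCursor(0, y);
    lcd.setRamAddress();
    lcd.enterWriteMode();
    for (uint16_t color : row) {
        lcd.writeData(color);
    }
}
```

In this toy model a 320-pixel row costs four operations per pixel the naive way, versus three setup operations plus one per pixel when streamed, and both leave the same contents in the fake RAM. The real ratio depends on the controller and the bus, so treat the numbers as illustrative only.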

But not right now! I've got a working emulator, and I'm playing with it in many other ways. :)
