A Teensy 4.1 running as an Apple //e
The OSH Park boards arrived, and I spent some time Monday assembling! Here's a time lapse of the build, which took me shy of 3 hours (mostly because I hadn't organized any of the parts and had to hunt for several).
The build didn't actually work right away - I'd installed the power boost module upside-down (if you use the same board, don't install it with the components facing up - they should face down). After re-soldering it, everything just booted up fine!
A couple quick thoughts about this new build.
Well, it's official - the r7 (or "v7" as I apparently named it, since I'm mostly doing software stuff these days) Aiie board is off to OSH Park for prototype manufacturing.
There are some substantial changes in there. Amongst them:
I've also been playing with VGA output on the Teensy 4.1, trying to build a FlexIO output with the right timing. However, there aren't enough free pins for me to do that with this layout, and I've been delaying having prototype boards made while I'm fiddling with the VGA stuff - so I've put that on the back burner for now. Maybe v8 will have a serial display-or-VGA hardware option, or maybe the VGA version will wind up being something completely different. Or maybe I'll never get back to it! Who knows. :)
By the end of August, I expect I'll have three prototype boards in my hands with a pile of new components to populate. Then it's back to the software!
Welcome to Redesigns 'R Us, where we come up with new ways to do old things!
Over the last few years, my Aiie has mostly been sitting collecting dust. Sure, I spent some time working on WOZ disk format (which I love), and I've got a half dozen private branches of the code repo where I've been working on various features - but there are two major obstacles that I've talked about before that have kept me from really pursuing any of them:
But now there's the Teensy 4.1! It's a nice bump, from 180 MHz to 600 MHz; and it has pads for an additional 16MB of PSRAM. Those sound enticing - but come at a cost; there are fewer pins available.
The Teensy 3.6 had a boatload of pins available via pads on the underside of the board. Aiie v1 used many of those (lazily, via the Tall Dog breakout board). But with something like 17 fewer pins (if I've counted rightly) on the Teensy 4.1, I've got a problem.
So, redesign decision 1: how do I squeeze the same hardware into a smaller footprint?
Well, back in Entry 17, I faced the same general question when I experimented with adding external SRAM. My choice then is the same as it is now - swap out the display. I originally picked a 16-bit parallel display because I wanted to throw data at it very quickly. And I could, at first, until I wound up complicating the codebase, which dropped the framerate to a sad 6FPS. (This is on my list of "things that bug me about Aiie v1" - as it became more Apple //e-correct, it became much less responsive.)
So the 16-bit nature of the display isn't the problem. It's the code (primarily) and the available CPU (to a lesser extent). The display I picked then is the same one I'm picking now - the ILI9341, an SPI-driven display. In theory I can run it via DMA which will reduce the CPU overhead too. My only beef here is that the version of the ILI9341 that's in the Teensy store is a 2.8" display, where I picked a 3.2" display for Aiie v1 - but there are 3.2" versions of the ILI9341 available, and I have one of them, so I'm pretty satisfied on that front.
And with that much information, it's time to try it out! I've still got the original Aiie prototype board sitting around, and it doesn't seem too daunting to rewire it for this. First step, remove all the stuff I don't need, like that nRF24L01 serial interface that I wound up replacing with an ESP-01 in the final v1 circuit and all of those extra pins broken out from the bottom of the Teensy 3.6...
... not too hard.
There's also this rat's nest on the backside that has to go.
And then I need to figure out how I'm powering it. Lately I've been liking these MakerFocus battery charger / boost modules - they're obviously intended as the core of a battery booster pack, and fairly elegantly handle the charging of the battery, boost to 5V, and display of the battery's state. A single press of the button turns it on, and a double-press turns it off. So adding one of those and a 3.3V linear regulator to safely drive the display...
Re-add the rat's nest of wiring underneath...
use some velcro to tape the battery in place...
and what do you know, if we hand-wave through the little bit of code that needed adjusting, we wind up with
36 frames per second on mostly-unoptimized code. Oh yeah, I like this.
The new code is in my GitHub repo, in the 'teensy41' branch. If you look at the timestamps you'll...
JOSS is going to take a while. Yes, it's seen the light of day, proving to me that it's possible. But there are so many pieces that I still need to code - and I'm not sure how feasible it's going to be to get the Wifi driver and USB stack written. I'm (sensibly, perhaps) keeping this in my log of to-do items as future work, and moving forward with development under Linux until I've at least got the hardware settled.
And to settle that, I need to actually make a decision about the hardware. Which is turning out to be difficult.
First: I've got a pile of different LCDs now lying around. Two different serial 320x240s. One 3.2" 480x320. One HDMI 3.5", and one HDMI 5". Each has its pros and cons.
I've also got designs on three different form factors. Two of them I can envision in cases. The other one is still basically the same device I've already built - but faster.
And continuing with the Pi Zero, I still need to build the end of the peripherals; one serial port for the printer, and two analog inputs for the joystick. Oh, and probably a third analog input for the battery level sensor?
I spent a good deal of this past weekend with papercraft and woodworking to mock up what various things would look like. It was a good use of time, even if there's nothing permanent that came out of it; my thoughts are more solid about what I want, at least.
The three prototypes I have right now are:
1. The "classic" Aiie prototype. Exactly the same form factor as the original. Small 1cm buttons as "keys" for the "keyboard" (button pad). 3.5" LCD -- probably the very nice $30 3.5" downconverting HDMI display I found on Amazon. The LCD it's attached to is only 320x480, but it scales the HDMI input down very nicely. Power draw is pretty reasonable, too. Or, if I spend time on it, this could still be one of the 3.2" serial LCDs with a Teensy 3.6 (and external RAM).
Or, prototypes 2 and 3... a scaled-down Apple //e style -- based around either the 3.5" display or a 5" display. I really like this idea, but I want it to be functional. No fake keyboard, for example. If it's got a keyboard, then I want it to work.
And that kind of drives the direction a bit. The smallest working keyboard I can build right now would mandate a 6.5" wide model, so the 5" display *might* do. Or it might be a bit small.
In the other direction, using the 3.5" display I'd have to have keys that are about 0.2" wide. With SMD switches, I suppose it's possible; it doesn't seem like fun, and I have no idea how I would make key caps at that scale. It would be the cutest option, of course. So I'm still thinking it over (and if any of you have thoughts on a keyboard at this scale, drop me a line!).
Back to the visuals, though! I wanted to see it. And here's the 3.5" scale monitor, in papercraft form.
That's the 3.5" HDMI panel, sitting on top of a Pi Zero, connected to a battery pack and a full-sized keyboard. You want to see it running, you say? Okay...
I went through the exercise of cutting out the whole monitor at that scale before I realized that I couldn't build a working keyboard at the same scale. (I didn't bother taking pictures, sadly.) Then we moved on to the version scaled around the smallest keyboard I can realistically build...
First I mocked up the keyboard, then built up to the monitor. The 5" LCD I've got is a bit small for it but I think it's workable. To get a better idea of it, I threw together a quick plywood mock-up...
Yeah, that might work. There are a lot of details I'll want to build out in that direction, but I'm pretty happy with the display. Out the back comes an HDMI input and power; this panel has no audio, unfortunately. (The 3.5" one does. Have I mentioned I really like that 3.5" display?)
So I'm going to focus on this case for the moment. (I'm starting to think that it's going to be impossible to pick one; it's probable that...
One of the most rewarding moments of Aiie, for me, is turning it on. I flip the switch; the screen becomes filled with inverted @ symbols; it beeps; and then I see "Apple //e" across the top of the screen.
It's just like my original; I reached around the back, flipped the switch, and felt the static charge build up on my face since I was now mere inches away from the CRT. The screen began to glow, and beep there we go! Chunka-chunka-chunka the disk drive whirrs and we're off to the races.
I want most or all of that experience from Aiie.
When I turn on a Pi Zero W, however, what I get is a raspberry; a good 40 seconds of Linux kernel booting; and then a login prompt. Eww.
I could rip out most of the init sequence; have Linux run aiie instead of init. Systemd. Whatever. I'll still have the kernel boot time, which is fairly substantial. Sure, I could compile a custom kernel that's got the modules I need built in. Strip out the unnecessary cruft, reducing boot time. I've found a quiet boot that seems to suppress most of the messages. It's still light years away from the experience I'm seeking, though. So how do I get the kernel out of my way?
Well, don't run Linux, of course.
Enter JOSS - Jorj's OS Substitute.
I've got a working, bare-bones OS that can host Aiie with no Linux between. It needs a lot of work. There are many, many things I still don't fully understand in the BCM2835 package (that's the CPU + peripherals in a Pi Zero). It speaks out the mini UART, giving me serial; I've got the GPU initialized, giving me video; I've got one timer running successfully, giving me some timing. I've got read support working for the SD card, along with a simplistic FAT32 layer. Nice.
I don't have FPU support - or rather, I haven't verified that I've got the FPU running properly. I don't have the MMU running at all, and I haven't figured out Fast IRQs yet. But it's enough for me to get a basic Aiie boot screen up, with video refreshing properly.
Which is important, because the number one thing I'm considering right now is: how fast can I drive the display? The same question I've been asking for the last few months. Is this path going to finally give me the freedom I want to build this out into a hardware emulation platform? And the results are not very encouraging.
The boot time is great. I'm using a bootloader to transfer JOSS + Aiie over serial here, so there's a bit of startup delay that won't be in a final version. Once the binary finishes loading and I tell it to run, it's a near-instantaneous boot.
But for whatever reason, I can only drive the JOSS-based Aiie! video at about 1 frame per second.
Clearly, that can't be the hardware. Can it? It's a 1GHz processor; it seems unlikely that video would have such an enormous performance penalty. I suppose I can find out. This thing *does* boot Linux, after all.
So time for yet another fork: the Raspbian Framebuffer fork.
Ignore the text poking out from under Aiie; I'm not allocating and switching to a dedicated VT here, so the login prompt is peeking out of the framebuffer console. I'm unceremoniously blitting all over the framebuffer directly. And I'm easily getting 52 FPS at 320x240, scaled by Linux. If I take out the delay loop, I see hundreds of frames per second. While driving the virtual CPU at full speed. This pretty definitively tells me that I'm missing something in JOSS; the hardware is plenty capable.
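Blitting straight into a framebuffer mostly comes down to copying rows while honoring the framebuffer's line stride. Here is a minimal sketch, assuming a 16-bit-per-pixel framebuffer that has already been mapped into memory (on Linux, typically by `mmap`-ing `/dev/fb0`). The name `blitFrame` and its parameters are made up for illustration and are not taken from the Aiie code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Copy a tightly packed 16-bit source image into a framebuffer whose rows
// may be padded. dstStrideBytes is the byte distance between the starts of
// two consecutive framebuffer rows (hypothetical parameter; on Linux it
// would normally come from the fixed screen info's line_length field).
void blitFrame(uint8_t *dst, size_t dstStrideBytes,
               const uint16_t *src, size_t width, size_t height)
{
    const size_t rowBytes = width * sizeof(uint16_t);
    assert(dstStrideBytes >= rowBytes); // one source row must fit in a stride

    for (size_t y = 0; y < height; ++y) {
        // Each row lands at its own stride offset; padding bytes are untouched.
        std::memcpy(dst + y * dstStrideBytes, src + y * width, rowBytes);
    }
}
```

Copying whole rows with `memcpy` keeps the per-frame cost low, which is one plausible reason a plain blit like this can run far faster than the emulator needs.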
Why is it slow in JOSS, then? There are two major candidates. First: CPU speed itself.
When the Pi Zero boots, it actually doesn't give control to the CPU. The GPU gets control; it throws a rainbow square on the HDMI output, loads up some boot files off of the SD card, reconfigures the hardware, and then throws control over to the CPU.
The BCM2835 brings the ARM up at 250MHz. The GPU, when it's configuring the hardware, can change that. I think it's successfully changing it...
More accurately: many forks, many roads!
It's been an interesting month of tinkering. The video output question has led me down multiple simultaneous code forks, and I'm slowly gathering enough information to make some sort of decision. I'm not quite there yet, though... so let me recap, and recount the last few weeks.
Option 1: Figure out nested interrupts on the Teensy and keep the existing 16-bit display.
In theory, nested interrupts should save me the hassle of direct display; if I could get it set up right, then the display updates could interrupt the CPU updates frequently enough that I could reclaim the display pulse time for the CPU to run. It doesn't feel like there's much pulse time to work with, though. My programming gut instinct tells me I'd be robbing Peter to pay Paul, and while I might wind up with a small net gain of free CPU time, it would be for such a small gain that I wouldn't be able to use it effectively. Consequentemento, I haven't spent much time on this; I consider it a last-resort option at this point.
Option 2: NTSC output, from Entry 16. I think it would be possible to make a B&W NTSC driver, with nested interrupts, that works. It might buy me enough free CPU time to get the current set of Aiie features running the way I want. I don't know that it buys me enough for what I want to add later. But there is one important possible win: I would be able to correctly draw 80-column text. Right now Aiie drops a pixel in the middle of each 80-column character in order to get them to fit in 280 pixels. The result is just barely passable. With an NTSC output I'd be able to drive the output for the full 560 pixels, showing the full 80-column character set cleanly.
On the down side, I don't have enough RAM in the Teensy to do that. I'm within about 10K of its limits already; I can't double the video driver memory. Which would lead me to the same complexity I'm considering for the next option, which I think is technically superior anyway. So this is, at least for the moment, a dead end.
Option 3: DMA displays. I've got two 320x240 serial displays now. Ignoring, for the moment, that they're 2.8" instead of 3.2": if I can offload the work of the CPU banging data out to the LCD, then I can reclaim all of that CPU time. Unfortunately, to run DMA, I need to trade off RAM. I need a video buffer where I can store what's going out to the display. And, once again, I'm staring at the 10K of RAM I've got free and wondering how the heck I'm gonna be able to do that. Which brings me to this prototype:
Under that display is a SRAM chip. This feels like a bad path, generally speaking - but if I use that as the RAM in Aiie itself, then I'm freeing up 128K of RAM in the Teensy, which I can easily use for DMA. With some pretty serious side effects.
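To put rough numbers on the RAM trade-off, here is a small worked calculation. The 10K free and 128K freed figures come from the text above; the pixel formats are assumptions for illustration, since the actual buffer layout isn't stated here:

```cpp
#include <cstddef>

// Bytes needed to hold one complete frame in RAM for DMA.
constexpr std::size_t frameBufferBytes(std::size_t width, std::size_t height,
                                       std::size_t bytesPerPixel)
{
    return width * height * bytesPerPixel;
}

// True when a buffer of bufferBytes fits in freeBytes of RAM.
constexpr bool fits(std::size_t bufferBytes, std::size_t freeBytes)
{
    return bufferBytes <= freeBytes;
}

// Full 320x240 panel at 16 bits per pixel: 153,600 bytes.
constexpr std::size_t kFullPanel16 = frameBufferBytes(320, 240, 2);
// Only the 280x192 Apple display area at 16 bits per pixel: 107,520 bytes.
constexpr std::size_t kAppleArea16 = frameBufferBytes(280, 192, 2);
// The same area at one byte per pixel (say, a palette index): 53,760 bytes.
constexpr std::size_t kAppleArea8 = frameBufferBytes(280, 192, 1);

constexpr std::size_t kFreeNow   = 10 * 1024;   // roughly what is free today
constexpr std::size_t kFreedUp   = 128 * 1024;  // reclaimed by moving VM RAM out

// Even the smallest of these buffers is far beyond 10K...
static_assert(!fits(kAppleArea8, kFreeNow), "does not fit in 10K");
// ...while the Apple-sized 16-bit buffer fits in the reclaimed 128K.
static_assert(fits(kAppleArea16, kFreedUp), "fits in 128K");
```

Under these assumptions a full-panel 16-bit buffer would still not fit in 128K alone, so buffering only the emulated display area (or using a smaller pixel format) is the kind of choice this arithmetic pushes toward.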
Side effect A: code complication. There's now another level of abstraction on top of the VM's RAM, so that I can abstract it out to a driver that knows how to interface with this AS6C4008. I'm generally okay with this one. There's no problem in computer science that you can't fix with another layer of abstraction, anyone?
Side effect B: performance. Every time I want to read or write some bit of RAM, now it's got to twiddle a bunch of registers and I/O lines to fetch or set the data. My instant access (at 180MHz) is reduced to a crawl. I've built some compensating controls in a hybrid RAM model; the pages of RAM that are often used in the 6502 are in Teensy RAM and the ones that aren't are stored on the AS6C4008. Even so: whenever I'm reading or writing this thing, I have to turn off interrupts so I'm not disturbed. So now I've got a competing source of interrupts; I'm back to having to understand nested interrupts on the Teensy better. This damn spectre is dogging me, and I think I see where it's going to lead; I really need to write some test programs around nested interrupts to figure out my fundamental...
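A hybrid RAM model of this kind can be sketched as a thin layer that routes each access by page. This is only an illustration under assumptions: `ExternalSram` is a stand-in backed by a plain array (a real driver would wiggle address, data, and control lines with interrupts off), and `HybridRam`, `pinPage`, and the 256-byte page size are invented names and choices, not the project's code:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the external SRAM driver (hypothetical; array-backed so the
// sketch runs anywhere).
class ExternalSram {
public:
    explicit ExternalSram(std::size_t bytes) : mem_(bytes, 0) {}
    uint8_t read(uint32_t addr) const { return mem_.at(addr); }
    void write(uint32_t addr, uint8_t v) { mem_.at(addr) = v; }
private:
    std::vector<uint8_t> mem_;
};

// 128K of emulated RAM in 256-byte pages. Pinned ("hot") pages live in fast
// internal RAM; every other page is forwarded to the external chip.
class HybridRam {
public:
    static constexpr uint32_t kPageSize = 256;
    static constexpr uint32_t kNumPages = 512; // 128K / 256

    explicit HybridRam(ExternalSram &ext) : ext_(ext) { slot_.fill(-1); }

    // Move a page into fast RAM, carrying over its current contents.
    void pinPage(uint32_t page) {
        assert(page < kNumPages);
        if (slot_[page] >= 0) return; // already pinned
        std::array<uint8_t, kPageSize> buf{};
        for (uint32_t i = 0; i < kPageSize; ++i)
            buf[i] = ext_.read(page * kPageSize + i);
        slot_[page] = static_cast<int>(fast_.size());
        fast_.push_back(buf);
    }

    uint8_t read(uint32_t addr) const {
        const uint32_t page = addr / kPageSize;
        assert(page < kNumPages);
        const int s = slot_[page];
        return s >= 0 ? fast_[s][addr % kPageSize] : ext_.read(addr);
    }

    void write(uint32_t addr, uint8_t v) {
        const uint32_t page = addr / kPageSize;
        assert(page < kNumPages);
        const int s = slot_[page];
        if (s >= 0) fast_[s][addr % kPageSize] = v;
        else ext_.write(addr, v);
    }

private:
    ExternalSram &ext_;
    std::array<int, kNumPages> slot_;                   // -1 = external
    std::vector<std::array<uint8_t, kPageSize>> fast_;  // pinned page data
};
```

On a 6502, the zero page and the stack page are natural candidates for pinning, since nearly every instruction sequence touches them; which pages are actually kept in Teensy RAM is not spelled out above.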
There are a bunch of things I'd like to do with Aiie; I've got a good backlog. Things you've heard about, like Mockingboard support and a more reliable speaker-plus-video driver. Things you haven't, because right now they're totally made of unobtanium. And all of these things rely on the same underlying resource.
Having grown up with machines that clocked around 1MHz, part of my brain screams "it's 180 MHz; there's plenty in there for everyone!" It's got to be possible. I just haven't managed to make it happen yet.
So, for the last couple of days, I've been looking at ways to free up some CPU time.
Step 1: identify what's using all of it. Well, that's easy: it's redrawing the LCD.
Step 2: figure out what to do about it. There's little optimization to be done; I've already basically built my own LCD driver to be fast enough to work. Which means making a more drastic change of some sort. And here's where we go down several different rabbit holes until we strike gold. Or something.
This LCD panel - the SSD1289 - happened to be the largest sitting around my workshop. Since the original project was completely built out of stuff I had lying around, it was definitely the right choice. There was another option - I also have a 160x128 pixel ST7735 display knocking around. That's not enough pixels for an Apple HGR display, though; so the '1289 won out.
When I originally got this panel, I was looking for fast video options for the Arduino Mega 2560. After a lifetime of working with PIC microcontrollers, I had just picked up a couple of Arduinos to see what the heck the hype was all about; I wanted to know what they were capable of. I used the panel for a small microcomputer build that ran a custom BASIC interpreter I'd thrown together and then set it all aside. (The Mega doesn't have enough RAM for this to be interesting; the display was too slow and klutzy for me; and there wasn't really any purpose behind the project other than generalized research.)
Given the platform, it seemed reasonable that a 16-bit data bus on a Mega would be faster than any SPI bus I could drive on that platform. And so it seemed like it would also be the best option for this project. More Data, More Faster.
Which is true, as long as I'm willing to spend the CPU on it. Aaaand now I'm not.
Yes, all of that means I'm thinking about what I could use to replace the display. And I've got two different ideas. First, obviously, would be to replace the display with a different but similar display. I've seen some great work with DMA on the Teensy that would probably fit the bill - offloading the CPU-based driver work to DMA, freeing up a lot of processor time. I definitely want to try this out. Prerequisite: a different display panel, which I don't have. That'll wait, then (a couple panels are on order; it's not going to wait long).
The other train of thought I've got goes something like this.
The Apple ][ could do this. The //e could do this with color and many display pages. My //e did this with a special video card; emulating that is taking a lot of CPU time. But all //es did this even without that card; they jammed video out their composite video port.
That's the point where the light bulb goes on and the cameras zoom in on our hero, grinning cagily.
Can I get all of this data out a composite NTSC interface? There exist small 3.5-ish inch NTSC monitors. Some of them are even rechargeable themselves, with auxiliary power out - so you can plug a security camera in to this thing for both video and power; set it up; and then plug it back in to its full-time gig. I could use one of those to double as both the display and the battery, which also gives me a built-in charger.
That sounds kind of interesting.
Of course, to do this, I'll have to do some of my least favorite circuit engineering. I really am not a fan of analog signal work. There are all these little bits of EE knowledge that are different depending on...
I left this great bomb for myself in August:
// The Timer1.stop()/start() is bad. Using it, the display doesn't
// tear; but the audio is also broken. Taking it out, audio is good
// but the display tears.
Timer1.stop();
...
Timer1.start();
Of course, I'd totally forgotten about this; when I went to play some Ali Baba, the sound was completely junked up and I didn't know why.
So what to do? Well, rip out the audio system, of course!
I may have forgotten to mention: Aiie runs on my Mac. For debugging purposes. It's a little messy; has no interface for inserting/ejecting disks; and it has the same display resolution as the Teensy. It's not ideal for use, but it's great for debugging.
The audio in said SDL-based Mac variant has never been right. I got to the point where audio kinda was close to correct and left it at that; mostly, I've been debugging things like the disk subsystem or CPU cycle counts or whatever. But now that I'm trying to debug audio, I kinda care that it doesn't work correctly in SDL.
The whole problem in the Mac variant was a combination of buffering and timing. The sound interface to the Apple ][ is simply a read of a memory address ($C030), which causes a toggle of the DC voltage to the speaker (on or off). If you write to that address then the DC voltage is toggled twice. And that's it.
For the Teensy that was fairly easy: I wired a speaker up to a pin, and I'm doing exactly the same thing. But for the SDL variant I need to construct an SDL sound channel and keep it filled with sound data. So the virtual CPU runs some number of cycles; when it detects access to $C030, it tells the virtual speaker to toggle; the virtual speaker toggle logs a change at a specific CPU cycle number and puts it in a queue. Then later a maintenance function is called in the speaker that pulls events out of that queue and puts them into the current speaker value. A separate thread runs constantly, picking up the current speaker value and putting it in a sound buffer for SDL to play.
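The toggle-queue idea described above can be sketched roughly like this. It is a simplified, single-threaded illustration; `VirtualSpeaker`, `toggleAt`, and `maintain` are hypothetical names, and the real code also has to hand samples to SDL from a separate thread, which is left out here:

```cpp
#include <cassert>
#include <cstdint>
#include <deque>

// Records speaker toggles by CPU cycle number and applies them later.
class VirtualSpeaker {
public:
    // Called by the virtual CPU whenever it sees an access to $C030.
    void toggleAt(uint64_t cycle) {
        // Events are expected in non-decreasing cycle order.
        assert(pending_.empty() || cycle >= pending_.back());
        pending_.push_back(cycle);
    }

    // Maintenance step: apply every queued toggle that happened at or before
    // nowCycle, then report the speaker level (false = low, true = high).
    bool maintain(uint64_t nowCycle) {
        while (!pending_.empty() && pending_.front() <= nowCycle) {
            level_ = !level_;
            pending_.pop_front();
        }
        return level_;
    }

    bool level() const { return level_; }

private:
    std::deque<uint64_t> pending_; // toggle times, oldest first
    bool level_ = false;
};
```

The important detail is that both the CPU and the maintenance call must agree on what "now" means in cycles; if they compute it differently, toggles get applied early or late and the audio smears.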
There's lots of room for error in that, and it turns out that my timing was bad. The calls to maintainSpeaker were not regularly timed, and were calculating the current CPU cycle (from real-time) differently than the CPU itself did. Converting that to time-based instead of cycle-based fixed things up quite a bit; removing the fundamentally redundant dequeueing in maintainSpeaker fixed it up even more. I started adopting a DC-level filter like what's in AppleWin but abandoned it mid-stride; I understand what it's trying to do, and I need to change how it works for the SDL version. Later. Maybe much later.
After doing all of that in the SDL version, I figured I'd make the same fixups in the Teensy version and *bam* it'd be fixed up faster than Emeril spicing up some shrimp etouffee. But no, I was instead stuck on the slow road of Jacques Pepin's puff pastry, where you can't see any of that butter sitting between the layers of dough until it does or doesn't puff up correctly. (For the record: I'd take anything of Pepin's before anything of Emeril's any day. But shrimp etouffee is indeed delicious pretty much no matter the chef.)
No, it took me another few hours of fiddling with this, pfutzing with that, seeing that adding delays in the interrupt routine actually *improved* the sound quality, optimizing more bits of code, and finally stumbling across my commented code above before I realized what was happening. It's the LCD.
The Teensy can run the CPU, no problem. It can run the speaker in real-time, no problem. It may very well be able to run a sound card emulator, which involves emulating something like 8 distinct pieces of hardware. But there's one thing it absolutely can't do: it can't refresh the video display in real-time between two CPU cycles.
One CPU cycle of the Apple //e, at 1.023 MHz, is 1/1023000 second (about 0.978 microseconds). One update of the video, which is 280x192 pixels, can't be...
As predicted, I came back to this project in December. The github code has been updated a few times over the last couple weeks, and here's what's happened.
Having left this on the shelf for so long, I'd forgotten where I'd left everything. So I started with "what do I want the most?"
Answer: hard drive support.
Back in the 80s, while I was in high school, I worked at a software store. (Babbage's, for any of those that might remember it.) While I was there the ProDOS version of Merlin (a 6502 assembler) was released. I bought myself a copy and started writing things. I noodled around with ProDOS - both the external command interface and the internal workings of its disk system. And I have images of a couple of my development floppies.
I'd love to consolidate all of that to a single hard drive image.
So, looking around for ProDOS hard drive support, I stumbled across the AppleWin implementation. One of its authors wrote a simple driver to emulate a hard drive card that ProDOS will use. So I pulled their driver and wrote code in Aiie to interface with it. All told, it took about 6 hours to get this working (3 hours to write the code, and 3 hours of constantly retracing my steps to find the typo while taking cold medicine, ugh).
Well, that was easy! What's next?
I guess I'd like to boot GEOS. Not for any particular reason other than I had run the first version of GEOS for the Apple //e back in 1988. The disks won't boot, though; they give me various system errors. Why, exactly? Well, it's all in the disk drive emulation.
The Disk ][ was a favorite research topic of mine back around 1988; I was fascinated by the encoding of data on the disk in particular. Which makes all the work on the disk emulation so much more enjoyable! Instead of being in the Apple //e and trying to read and write nibbles of disk data, I'm in a virtual Disk ][ trying to send Aiie data that /looks/ like it came from a floppy controller!
My first pass of the floppy controller code was a mishmash of approaches. I looked at how other people had implemented theirs and cobbled together something that looked like it worked. Which led to code bits like this:
// FIXME: with this shortcut here, disk access speeds up ridiculously.
// Is this breaking anything?
return ( (readOrWriteByte() & 0x7F) | (isWriteProtected() ? 0x80 : 0x00) );
Now, that piece of code totally doesn't belong there. I had it jammed in the handler for setting a disk drive to read mode. I think I'd accidentally put it in there while writing the code the first time around, noticed the performance improvement, and left it there with the comment for future me to puzzle out.
Rather than starting on this end of the thread, I figured I'd gut the rest of the disk implementation and see what could be cleaned up. First up was the stepper motor for the disk head: a simple 4-phase stepper, where each of the four phases can be energized and de-energized by accessing 8 different bytes of memory space. The drive actually steps in half-track increments, which some disks used as part of their copy-protection schemes; but that's not really useful to me, so I'm only supporting full tracks (as do all of the emulators I've looked at so far).
My first attempt kept track of the four phases and which was last energized; and then divined the direction the head was moving. If we went past "trackPos 68" (which is to say track 35, because 68/2 is 34, and tracks are 0-based) then the drive was bound at 68.
I decided to rewrite it. The first rewrite kept distinct track of the four magnets, so it could tell if something odd was happening ("why are all of these on?"). But again, that's not really useful to me, and I kept confusing myself about the track positions. So the second rewrite keeps track of the current half-track ("curHalfTrack") and only pays attention when phases are energized. It assumes that the de-energizing is being done properly. Then a two-dimensional array is consulted to see how...
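One way a table-driven stepper like this could work is sketched below. It is a guess at the general shape, not the code from the repo: the table contents, the `HeadStepper` class, the soft-switch decoding (low three bits of the address, with bit 0 meaning "energize"), and the clamp at half-track 68 are all assumptions made for illustration:

```cpp
#include <cassert>
#include <cstdint>

// How far the head moves, in half-tracks, when a phase is energized.
// Row = phase the head currently rests on (curHalfTrack % 4),
// column = phase being energized. Adjacent phases pull the head one
// half-track; the same or the opposite phase does nothing.
constexpr int8_t kDelta[4][4] = {
    {  0,  1,  0, -1 },
    { -1,  0,  1,  0 },
    {  0, -1,  0,  1 },
    {  1,  0, -1,  0 },
};

constexpr int kMaxHalfTrack = 68; // track 34, the last of 35 zero-based tracks

class HeadStepper {
public:
    // sw is the low 3 bits of the accessed soft-switch address:
    // bit 0 = energize (1) or de-energize (0), bits 1-2 = phase number.
    void access(uint8_t sw) {
        assert(sw < 8);
        if ((sw & 1) == 0) return; // de-energizing is assumed to be done properly
        const int phase = (sw >> 1) & 3;
        const int current = curHalfTrack_ % 4;
        int next = curHalfTrack_ + kDelta[current][phase];
        if (next < 0) next = 0;                         // hit the track 0 stop
        if (next > kMaxHalfTrack) next = kMaxHalfTrack; // hit the inner stop
        curHalfTrack_ = next;
    }

    int halfTrack() const { return curHalfTrack_; }
    int track() const { return curHalfTrack_ / 2; } // full-track view

private:
    int curHalfTrack_ = 0;
};
```

Keeping only the current half-track, rather than the state of all four magnets, means there is exactly one number to reason about, and the full track is just that number divided by two.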
I didn't know it was possible to have a first-run PCB that actually worked.
The silk screen label error is the only one I've found so far. The joystick axes are reversed, which I thought I was probably doing when I wired it up; I intentionally didn't bother checking. And I haven't verified the voltage divider on the analog input to check the battery level (it didn't work correctly in the original prototype, so maybe it's also not working here).
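For checking a divider like this, the conversion from a raw ADC reading back to battery voltage is simple enough to sketch. Everything here is a placeholder: the function name, the resistor values, the reference voltage, and the ADC range are assumptions, not the values on the board:

```cpp
#include <cassert>
#include <cstdint>

// Convert a raw ADC reading taken at the midpoint of a resistor divider back
// to the battery voltage. rTopOhms runs from the battery to the ADC pin,
// rBottomOhms from the ADC pin to ground; vRef is the ADC reference voltage
// and adcMax the largest raw value the ADC can report.
double batteryVolts(uint32_t adcRaw, uint32_t adcMax, double vRef,
                    double rTopOhms, double rBottomOhms)
{
    assert(adcMax > 0 && rBottomOhms > 0.0);
    const double pinVolts = (static_cast<double>(adcRaw) / adcMax) * vRef;
    // Undo the divider: Vpin = Vbat * rBottom / (rTop + rBottom).
    return pinVolts * (rTopOhms + rBottomOhms) / rBottomOhms;
}
```

With two equal resistors the pin sees half the battery voltage, so a full-scale reading against a 3.3V reference would correspond to about 6.6V at the battery. Comparing a multimeter reading with this calculation is a quick way to tell whether a bad level is a wiring problem or a math problem.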
The Teensy is behind the LCD, as you can see - which means the MicroSD is much more accessible (it's no longer jammed up against the joystick).
I picked a random speaker from Adafruit that looked like it would be reasonable, and it's fairly loud, so I'm happy with that choice. It and the battery are now double-sided-taped to the back of the PCB.
Next up is some software cleanup. I left the software in an odd state; I was implementing real-time audio interrupts for better sound card support. To get that, I sacrificed video quality that I'd like to get back now. And there's at least one error that Jennifer found in the original code when she tried to compile it with a newer version of the Arduino environment...