
state of the cpu address

A project log for Improbable AVR -> 8088 substitution for PC/XT

Probability this can work: 98%, working well: 50%. A LOT of work, and utterly ridiculous.

Eric Hertz, 01/13/2017 at 22:23

Thanks to @jaromir.sukuba for inspiring this writing...

The point of this project is[/was?] to implement the 8088 chip itself, with an AVR... making use of the other peripherals already on the original mother-board, not because they're *better* than the peripherals available in an AVR, but because *nearly everything* is interfaced to the 8088 in the same way, including the "program memory" (BIOS), the RAM, and the I/Os...

There's been discussion--as well as browsings mentioned in past logs--wherein it's suggested to use the AVR's internal peripherals to *replace* those on the motherboard... That'd be smart. In fact, it'd probably be *much* faster if that was done.

E.G. writing a byte from a string stored in the AVR's internal memory into the TX-register in the AVR's serial-port would probably take something like 2 AVR instructions, maybe 4 AVR clock-cycles. On the other hand, doing-so, even without emulating the 8088's instruction-set, to an externally-attached ISA RS-232 card would require, at the very minimum, 4 8088 bus clocks (one 8088 bus cycle).

I think I can somewhat-reasonably expect to execute 5 AVR instructions per 8088 bus clock... (4.77MHz * 5 = 23.85MHz, a slightly overclocked AVR). So we're talking a bare-minimum of 20 AVR-clock-cycles to transact a single byte of instruction/memory/I/O data... And that's assuming no wait-states, assuming the Bus-Interface-Unit (BIU) isn't already in the middle of a transaction (e.g. caching the next instruction-byte), that DMA isn't refreshing the DRAM, etc.
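To make that arithmetic concrete, here's a minimal sketch of the cycle budget. The 5:1 ratio is just my estimate from above, and the function names are made up for illustration; each wait-state is counted as one extra bus clock.

```c
#include <stdint.h>

/* Figures from the text above; the 5:1 ratio is an estimate, not a measurement. */
#define BUS_CLOCK_HZ          4770000UL /* 8088 bus clock in a PC/XT, ~4.77MHz */
#define AVR_CYCLES_PER_CLOCK  5UL       /* assumed AVR cycles per bus clock */
#define BUS_CLOCKS_PER_CYCLE  4UL       /* minimum clocks per bus cycle (T1..T4) */

/* AVR clock needed to keep the assumed 5:1 ratio. */
static uint32_t avr_clock_hz(void) {
    return (uint32_t)(BUS_CLOCK_HZ * AVR_CYCLES_PER_CLOCK);
}

/* AVR cycles consumed by one bus cycle with the given number of wait-states. */
static uint32_t avr_cycles_per_bus_cycle(uint32_t wait_states) {
    return (uint32_t)((BUS_CLOCKS_PER_CYCLE + wait_states) * AVR_CYCLES_PER_CLOCK);
}
```

With no wait-states that comes to 20 AVR cycles per byte, and every wait-state adds another 5.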

That's a *huge* hit, for *every* byte-transaction. And, again, doesn't even consider the fact that the AVR will be emulating the 8088's instruction-set.

Now, again, consider (as @jaromir.sukuba brought up, and was brought up in previous logs) just how much could actually be implemented *within* the AVR... The PC/XT BIOS-ROM is 8KB... Many AVRs could fit that and still have plenty of space for AVR-code. So, the BIOS itself could be stored in the AVR... Reducing the read of each byte from 20AVR cycles to 2-3, during boot, interrupts, and more. The RS-232 port, maybe I2C for a keyboard... (or even bitbanged PS/2!)... Even SPI for an LCD. Could even load a bunch of RAM in there, as well. And all these "devices" would communicate *significantly* faster when directly-interfaced with the AVR-core, rather than going through the 8088-style bus.
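A rough sketch of what "BIOS inside the AVR" could look like: one read function that serves the BIOS range from an internal array and only falls back to the real bus for everything else. This assumes the 8KB image sits at the very top of the 1MB address space (where the reset vector is); `bios_image`, `external_bus_read` and `mem_read` are hypothetical names, and a plain array stands in for AVR flash so the sketch also compiles on a PC.

```c
#include <stdint.h>

#define BIOS_SIZE  0x2000UL                 /* 8KB, as in the text */
#define BIOS_BASE  (0x100000UL - BIOS_SIZE) /* top of the 1MB space: 0xFE000 */

/* Hypothetical BIOS image. On a real AVR this would live in flash
 * (e.g. PROGMEM, read with pgm_read_byte()); a plain array keeps the
 * sketch runnable on a PC. */
static uint8_t bios_image[BIOS_SIZE];

/* Placeholder for the slow path: a full ~20-AVR-cycle 8088 bus cycle. */
static uint8_t external_bus_read(uint32_t addr) {
    (void)addr;
    return 0xFF;  /* pretend nothing is attached */
}

/* Read one byte from the emulated 20-bit address space. */
static uint8_t mem_read(uint32_t addr) {
    addr &= 0xFFFFFUL;                        /* 8088 addresses are 20 bits wide */
    if (addr >= BIOS_BASE)
        return bios_image[addr - BIOS_BASE];  /* fast path: internal copy */
    return external_bus_read(addr);           /* slow path: the real bus */
}
```

The same address test could route other ranges (RAM, I/O ports) to internal peripherals; the price is a compare-and-branch on every access.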

Also, consider that the 8088 contains two "units" which run *in parallel*. The Execution Unit (EU) actually executes instructions, but the Bus-Interface Unit (BIU) grabs data from/writes data to the bus. The BIU runs *in parallel* to the EU, so while the EU is executing instructions, the BIU is often fetching the next data-bytes for the next instructions, simultaneously. This could, in a way, be considered like a DMA transaction... Well, the classic 8bit AVRs (ATmega/ATtiny) don't have DMA, let alone one that'd be compatible with the 8088's bus-interface. So, that means each transaction on the bus has to be handled by the "CPU", making the entire system more like a lock-stepped single-core system, rather than the dual-threaded, dual-core system that the 8088's EU/BIU more closely approximate.

Similarly, consider an 8088 bus-transaction's being 4 bus-clocks, being 20 AVR clocks, being ~20 AVR instructions... Worst-Case, a bus-transaction transfers something like 6 bytes. We're talking 3 bytes for the (20-bit) address, 1 for data, and numerous control signals. The vast-majority of the signals need to be set-up early-on in the transaction... That means the AVR "core" would be executing numerous instructions as fast as possible to set up those byte-transactions. But, thereafter, the AVR core will just have to twiddle its thumbs waiting for the various bus-clocks, wait-states, and delays.

Those bus-clocks, again, come at something like 1 every 5 AVR cycles... So, one might think "use interrupts!". Herein we run into some realities... First of all, some (though few) AVR instructions can take as many as 4 or 5 clock-cycles. And, interrupt-processing waits until the current instruction is done executing. If those multi-cycle instructions were limited to "extremes" like "MUL", one could say "well, just make sure you don't use MUL while waiting for that next bus-clock". But, unfortunately, some of those instructions are not that unusual. E.G. Reading/Writing the internal SRAM typically takes 2 cycles, despite being single instructions. And, in fact, AVR's RETI (Return From Interrupt) instruction, alone, takes 4 clock-cycles (5 on parts with a 22-bit program-counter), and merely *entering* the interrupt takes at least that many again.

So, using interrupts to handle bus-clocks isn't really possible. Which means the AVR will be busy-as-heck in those early bus-clocks, setting up the address/control signals, and NOPing during the later bus-clocks, NOPing for the majority of the bus-transaction.

Now, I suppose, if one's clever, those NOPs could be used for other purposes, and indeed some might (e.g. somewhere within there, we'd need to *read* the data coming back from the device). But, whatever's done, there's not enough time to handle e.g. waiting until an edge is detected, then jumping once it is, then performing what needs to be done. So, instead, we need to know the ratio of the number of AVR cycles to the number of bus-clocks, and keep track of exactly how many instructions occur between each clock... (and how many AVR cycles those instructions take!) ("dead-reckoning" maybe?). Similarly, there's probably not enough time between each clock to run tests, e.g. "am I writing? Then output the data-byte. Otherwise read a data byte". Instead, we'll have to prepare these instructions *prior* to initiating the bus-transaction, and likely some instructions will have to be run regardless of whether the operation calls for it.
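The dead-reckoning bookkeeping could be checked with something like the sketch below: add up the cycle cost of the real instructions in each bus-clock "slot" and see how many NOPs are needed to fill it. The 5-cycle slot is the assumed ratio from earlier, the per-slot costs would have to be tallied by hand from the AVR instruction-set manual, and the function names are made up.

```c
#include <stdint.h>

#define AVR_CYCLES_PER_CLOCK 5  /* assumed ratio from the text */

/* How many single-cycle NOPs pad a slot so it lasts exactly one bus clock.
 * `cycles_used` is the summed cycle cost of the real instructions in the
 * slot. Returns -1 if the slot already overruns the bus clock. */
static int nops_to_pad(int cycles_used) {
    if (cycles_used < 0 || cycles_used > AVR_CYCLES_PER_CLOCK)
        return -1;
    return AVR_CYCLES_PER_CLOCK - cycles_used;
}

/* Check a whole transaction: one entry per bus clock. Returns the total
 * NOPs needed, or -1 if any slot overruns. */
static int total_padding(const int *slot_cycles, int n_slots) {
    int total = 0;
    for (int i = 0; i < n_slots; i++) {
        int pad = nops_to_pad(slot_cycles[i]);
        if (pad < 0)
            return -1;
        total += pad;
    }
    return total;
}
```

E.g. a transaction whose four slots cost 5, 3, 1 and 2 cycles needs 9 NOPs of padding, while any slot costing 6 or more simply can't work at this ratio.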

E.G. a bus transaction could be either a read or a write... The read case is easy... Just read the port-inputs *every* time, and if we're writing just disregard what was read. The write case is only slightly more difficult. One way to be able to handle *both* within the same bus-transaction function would be...

Before beginning the transaction, prepare our port-configuration variables... if writing, PORTx = Data, and DDRx = ALL_OUTS, if reading PORTx = 0x00 (no pull-ups), and DDRx = ALL_INS. But, again, we can't write those registers willy-nilly, the writes have to occur within certain bus-clock timing-requirements. So, we can't do

// What we CAN'T do: the test-and-branch alone overruns the bus-clock
while (!clockEdge) {}
if (doRead) {
   DDRx = ALL_INS;
   data = PINx;
} else {
   // (DDR is already set to outputs)
   PORTx = data;
}

as we've already exceeded our clock-cycle by merely running those if-tests and jumps. Instead, e.g.

//BEFORE beginning a bus transaction: 
registerA = (doRead ? ALL_INS : ALL_OUTS); 
registerB = (doRead ? DONT_PULL_UP : writeVal); 

BeginTransaction...;
<do early bus-transaction stuff>;
DDRx = registerA;
<NOP until the WRITE clock-edge>; 
PORTx = registerB; 
registerC = PINx; 
<do remaining bus-transaction stuff>;
//After transaction is complete:
if(doRead) 
  *readData = registerC;

(This should be done in assembly. I've explained why elsewhere, but basically, the optimizer is likely to choose to reorganize the code).
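For checking just the *logic* of that branch-free sequence (not the timing, which still needs assembly), a PC-side simulation along these lines might help. The `sim_*` variables are hypothetical stand-ins for DDRx/PORTx/PINx, and the pin model is simplified: output pins read back PORT, input pins see whatever the external device drives.

```c
#include <stdint.h>
#include <stdbool.h>

#define ALL_INS      0x00
#define ALL_OUTS     0xFF
#define DONT_PULL_UP 0x00

/* Stand-ins for the AVR's DDRx/PORTx registers (hypothetical). */
static uint8_t sim_ddr, sim_port;
static uint8_t sim_device_drives = 0x00; /* what the external device puts on the bus */

/* Stand-in for reading PINx. */
static uint8_t sim_pin(void) {
    return (uint8_t)((sim_port & sim_ddr) |
                     (sim_device_drives & (uint8_t)~sim_ddr));
}

/* One data phase with no branches between the register writes. */
static uint8_t data_phase(bool doRead, uint8_t writeVal) {
    /* Decided BEFORE the transaction starts */
    uint8_t registerA = doRead ? ALL_INS : ALL_OUTS;
    uint8_t registerB = doRead ? DONT_PULL_UP : writeVal;

    /* Timing-critical part: the same straight-line sequence either way */
    sim_ddr  = registerA;
    sim_port = registerB;
    uint8_t registerC = sim_pin();

    /* After the transaction there's time to test again */
    return doRead ? registerC : writeVal;
}
```

In the read case the pins are inputs with pull-ups off and the device's byte comes back; in the write case the same three register operations drive `writeVal` out and the value read back is simply ignored.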

...or, yahknow, you could have two separate functions for read vs write transactions (how'd I not see that weeks ago?). However, the above is only a tiny fraction of the entire bus-transaction, most of which is identical in both cases.

Anyways, at some point, if this system is to interact with *real* 8088-compatible hardware (any), then the system has to interact with external peripherals in an 8088-compatible manner... I kinda want to run a soundblaster and CGA card from my AVR... SCSI... Maybe HPIB... Maybe even build some ISA cards of my own. OTOH, maybe interfacing with ISA directly, rather than the 8088-bus, is a better idea. Anyways, the goal of this project isn't to emulate an 8088 system, but to emulate an 8088 *chip* (otherwise, just run dosbox!).

----------------

So, again, instead of the 8088's separate/simultaneous/parallel BIU/EU system, a single AVR simulating the two would have to be more of a state-machine...

A mad rush to request an instruction-byte, then a bunch of NOPs waiting to "Download" it, then decide if we need to request more bytes for our instruction, then execute, then download more. No caching, etc. It'll be a big performance hit that doesn't really have anything to do with the AVR's comparative computing-power, and I'm contemplating whether to continue that way or take another path.
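As a sketch of that state-machine (fetch the opcode, fetch whatever operand bytes it needs, then execute), here's a toy that only counts the AVR cycles spent *fetching*. It assumes 20 AVR cycles per fetched byte, no prefetch and no wait-states, and knows the lengths of just four hand-picked opcodes; the names are invented for illustration.

```c
#include <stdint.h>

#define AVR_CYCLES_PER_BUS_CYCLE 20  /* 4 bus clocks x 5 AVR cycles, from the text */

typedef enum { ST_FETCH_OPCODE, ST_FETCH_OPERANDS, ST_EXECUTE } cpu_state_t;

/* Total length in bytes of a few 8088 instructions; 0 = not handled here.
 * Only a tiny subset, for illustration. */
static uint8_t insn_length(uint8_t opcode) {
    switch (opcode) {
    case 0x90: return 1;  /* NOP */
    case 0xB0: return 2;  /* MOV AL, imm8 */
    case 0xB8: return 3;  /* MOV AX, imm16 */
    case 0xEA: return 5;  /* JMP far ptr16:16 */
    default:   return 0;
    }
}

/* Walk the states for one instruction and return the AVR cycles spent
 * purely on fetching it. Returns 0 for opcodes outside the subset. */
static uint32_t fetch_cost(const uint8_t *code) {
    cpu_state_t state = ST_FETCH_OPCODE;
    uint32_t cycles = 0;
    uint8_t remaining = 0;

    while (state != ST_EXECUTE) {
        switch (state) {
        case ST_FETCH_OPCODE:
            cycles += AVR_CYCLES_PER_BUS_CYCLE;   /* one bus cycle per byte */
            remaining = insn_length(code[0]);
            if (remaining == 0)
                return 0;
            remaining--;
            state = remaining ? ST_FETCH_OPERANDS : ST_EXECUTE;
            break;
        case ST_FETCH_OPERANDS:
            cycles += AVR_CYCLES_PER_BUS_CYCLE;
            remaining--;
            if (remaining == 0)
                state = ST_EXECUTE;
            break;
        default:
            break;
        }
    }
    return cycles;
}
```

Under those assumptions a one-byte NOP costs 20 AVR cycles before anything executes, and a five-byte far jump costs 100.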

One obvious idea is two separate AVRs, one for the EU and another for the BIU. I've a lot of thoughts on the matter... Boils down to whether the two systems can communicate fast enough to justify it...

A regular 8088-bus-transaction would take a minimum of 20 AVR clocks per bus-cycle (to read or write a single byte of data). There's something like 6 bytes of transmission (data/address/control-signals) during a bus-cycle... (maybe five by considering that the BIU tracks the segment-registers)... So, you might think there's a significant improvement in speed (6/20ths!) by communicating to the separate BIU-AVR, rather than directly to the bus, itself.

But there's also some time to synchronize the two AVRs, since byte-wide transmissions can't go on in the background... The EU will still have to wait for the BIU to complete its bus transaction (can bus-transactions be interrupted and restarted?). The BIU will have to enable interrupts, or otherwise test, for when the EU has begun an "inter-processor-communication", then send an acknowledge... The EU will most likely be doing nothing but waiting for that "ACK", so will be able to respond, almost immediately, with the burst of 5 remaining data/address/control bytes (at 20+MB/s!).

But with all that ACK/response and waiting, even sending data in parallel, I think we're talking something like a minimum of 15-20 cycles per transaction. Never mind, again, waiting for a previous BIU transaction to complete before it can turn on interrupts, and whatnot... and we're right back where we started... Maybe multiple simultaneous serial I/Os, since the serial peripherals have buffers...? Or, a byte-wide, six-long shift-register setup? That's a lotta 74574s! (And, still doesn't help when *reading*).

Weeee!

----------

So, the suggested "project-scope creep" of using the AVR's internal peripherals/ROM/RAM may be somewhat reasonable for early-experiments in this project... Besides being a faster system, it'd also, likely, be easier to implement. A fully-functional (albeit limited) *system*, running 8088/86 code, could probably be implemented in the AVR without even touching an 8088 bus, nor implementing a BIU. The peripherals themselves would be much easier to interface with, as well, even at the register-level alone (and ignoring the BIU in between) (especially since I'm already familiar with so many AVR peripherals).

And, atop all that, if the BIOS (which'd have to be custom at that point, which really doesn't matter because I planned on custom/limited program-code in the first place while implementing instructions a few/one at a time)... if the BIOS was implemented in the AVR, as well, then it would be flash/in-system-programmable, making early experiments *that much easier*.

And, without all those bus-transactions, the entire system might well be "screaming fast", even in comparison with the original 8088 chip running in an XT.

So, it's something to seriously consider...

---------

If I take a step back and think about what led up to this project, I realize the original goals weren't at all about emulating an 8088, but about interfacing with its hardware; using the motherboard, ISA-slots, etc. as a platform, and with a processor I'm familiar with (AVR). So, this project has already "scope-creeped" quite a bit from my original intentions. On the plus-side I've learned a bit about an architecture that was a complete "black-box" to me (x86). This whole emulation-aspect has been a bit of a side-track, but I've learned to read/understand x86 assembly, and I've tangentially learned about the low-level/register-level interfaces of the sorts of devices I'd originally intended to interface with. (e.g. the CGA card registers are somewhat well-learnt by looking into the XT BIOS listings, which are well-commented).

So, maybe it's time to take a breather and decide where I really see this thing going. Do I really want to emulate an x86, or should I spend more time looking into source-code used to interface with its devices (now that I'd be able to understand it)?

I guess where this project really got started was when I realized just how slow the 8088 was... Reports claim the 8086, running at 5MHz, ran at roughly 300KIPS. The 8088 at 4.77MHz is surely a bit slower. An AVR runs at roughly 1MIPS/MHz, so it's well within an AVR's abilities to execute instructions 40-60 times faster than an 8088! (albeit, 8-bit rather than 16-bit, and in many cases much simpler).
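For what it's worth, the raw ratio is easy to play with. This little sketch takes the reported ~300KIPS figure and the ~1MIPS/MHz rule of thumb at face value; it compares instruction *counts* only and says nothing about how much work each instruction does. The function name is made up.

```c
#include <stdint.h>

#define I8086_KIPS 300UL  /* reported figure for a 5MHz 8086, from the text */

/* Raw instruction-rate ratio, assuming ~1 MIPS per MHz on the AVR. */
static uint32_t raw_speed_ratio(uint32_t avr_mhz) {
    uint32_t avr_kips = avr_mhz * 1000UL;
    return (uint32_t)(avr_kips / I8086_KIPS);  /* integer division, rounds down */
}
```

By this measure the 40-60x range corresponds to an AVR clocked at roughly 12-18MHz; a 20MHz part, or a slower 8088, pushes the ratio higher.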

So, I guess, the point is even with an AVR's limitations (like not being able to execute instructions from RAM), it could still, likely, make for a comparably-fully-functional computer to an original PC/XT, even if it has to "emulate" instructions executed from RAM by e.g. having function-calls for everything.
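"Function-calls for everything" might look something like the sketch below: a 256-entry table of function pointers indexed by opcode, so that each 8088 instruction read from RAM turns into a call to AVR code in flash. It's a toy under heavy assumptions: three opcodes, no flags, no segments, and `cpu_t`, `handlers` and `step` are invented names.

```c
#include <stdint.h>
#include <stddef.h>

/* Minimal, hypothetical CPU state: just AX and IP (no flags, no segments). */
typedef struct {
    uint16_t ax;
    uint16_t ip;
    const uint8_t *mem;   /* emulated memory the 8088 code lives in */
} cpu_t;

typedef void (*op_handler_t)(cpu_t *cpu);

static void op_nop(cpu_t *cpu) { (void)cpu; }       /* 0x90: NOP */
static void op_inc_ax(cpu_t *cpu) { cpu->ax++; }    /* 0x40: INC AX (flags ignored) */
static void op_mov_ax_imm(cpu_t *cpu) {             /* 0xB8: MOV AX, imm16 */
    uint16_t lo = cpu->mem[cpu->ip++];
    uint16_t hi = cpu->mem[cpu->ip++];
    cpu->ax = (uint16_t)(lo | (hi << 8));           /* little-endian immediate */
}

/* One AVR function per 8088 opcode; unlisted entries stay NULL. */
static const op_handler_t handlers[256] = {
    [0x40] = op_inc_ax,
    [0x90] = op_nop,
    [0xB8] = op_mov_ax_imm,
};

/* Execute one instruction; returns 0 on success, -1 if unimplemented. */
static int step(cpu_t *cpu) {
    uint8_t opcode = cpu->mem[cpu->ip++];
    op_handler_t h = handlers[opcode];
    if (h == NULL)
        return -1;
    h(cpu);
    return 0;
}
```

Instructions can then be added a few (or one) at a time by filling in more table entries, and anything unimplemented shows up as a -1 rather than silently misbehaving.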
