
Fast AVR intercommunication: Bus Interface Unit = coprocessor...?

A project log for Improbable AVR -> 8088 substitution for PC/XT

Probability this can work: 98%; working well: 50%. A LOT of work, and utterly ridiculous.

Eric Hertz • 01/06/2017 at 10:39 • 7 Comments

1-26-17: This rambling was written a while back... apparently I never posted it (it was still a draft).

-----

There're two "units" in the 8088/8086 that operate simultaneously.

The Execution Unit (EU) handles execution of instructions, etc.

The Bus Interface Unit (BIU) handles interfacing with the data/address bus, etc.

Wherein we come to a bit of a difficulty...

AVRs don't have DMA, so every change to the data/address/control bus requires processing on the part of the AVR. And the bus interface requires a lot of changes... basically a single 8088 read/write bus-transaction takes (at least) 4 bus-clocks. Now we're talking about running the AVR at 4-5x the bus's clock-rate, so we're talking 16-20 AVR clock-cycles to handle a single bus-transaction (assuming there are no wait-states).
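A quick way to sanity-check that arithmetic is a tiny C helper. It's only a sketch under simple assumptions: a nominal 4-clock bus cycle (T1-T4), optional wait states (Tw), and the AVR clocked at a whole-number multiple `ratio` of the bus clock. The function name is made up for illustration.

```c
#include <assert.h>

/* Nominal 8088 bus cycle: T1..T4, i.e. 4 bus clocks with no wait states. */
#define BUS_CLOCKS_PER_TRANSACTION 4u

/* AVR clock cycles tied up by one bus transaction, assuming the AVR runs
 * at `ratio` times the bus clock and has to drive every bus change itself
 * (no DMA). `wait_states` are extra Tw bus clocks requested by slow devices. */
unsigned avr_cycles_per_transaction(unsigned ratio, unsigned wait_states)
{
    assert(ratio > 0);
    return (BUS_CLOCKS_PER_TRANSACTION + wait_states) * ratio;
}
```

With `ratio` set to 4 or 5 and no wait states, this gives the 16 and 20 AVR cycles quoted above; each wait state adds another `ratio` cycles.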

That doesn't sound like much, until you consider that that time *can't* be used for other things (like actually *processing* an instruction).

And, consider that many instructions will take on the order of 20 AVR-instructions to execute, which means that the processor will spend nearly as much time *reading/writing* to the bus as it spends executing instructions...

Also, consider that the bus can't be accessed willy-nilly; transactions must align with the bus-clock, which could slow things down while waiting for a rising clock-edge, by up to an entire bus-clock for *every* transaction!
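To put a number on that alignment penalty, here's a hypothetical model: `phase` counts AVR cycles since the last rising bus-clock edge, and the helper returns how long the AVR idles before it may start the next transaction. It assumes an integer clock ratio and ignores the cycles spent polling the clock pin itself.

```c
#include <assert.h>

/* Hypothetical model: the AVR runs `ratio` cycles per bus clock, and `phase`
 * is the number of AVR cycles since the last rising bus-clock edge
 * (0 = exactly on the edge). Returns the AVR cycles spent idling before a
 * new transaction may begin on the next rising edge. */
unsigned align_wait_cycles(unsigned ratio, unsigned phase)
{
    assert(ratio > 0 && phase < ratio);
    return (ratio - phase) % ratio;
}
```

With a 4:1 ratio the wait ranges from 0 to 3 AVR cycles, i.e. up to (just under) a full bus-clock, on top of the 16-20 cycles of the transaction itself.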

Also, consider that once the majority of the bus-changes have occurred (loading the address bytes), the remainder of the bus-transaction is pretty sparse. But, with only 4-5 AVR clock-cycles per bus-clock, there's really no time for processing other things in those gaps (e.g. using an interrupt, since entering and leaving an interrupt handler alone costs roughly 10 AVR clocks: 4 to respond, 2-3 for the jump at the vector, and 4 for the RETI).

The bus is a bit of a bottleneck.

Also, the BIU is responsible for prefetching instruction bytes into a small queue whenever it can... which, in the case of an actual 8088, means that it can do things *while* the EU is not accessing the bus... a coprocessor, of sorts...
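The 8088's BIU queues up to 4 prefetched bytes (the 8086, 6). If a BIU-AVR were to mimic that, a ring buffer is the obvious shape. The following is a sketch, not how the silicon works: all names are hypothetical, and the fetch address is treated as a linear 20-bit value rather than CS:IP, so segment wraparound is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The 8088 BIU holds up to 4 prefetched bytes (the 8086 holds 6). */
#define PREFETCH_DEPTH 4u

typedef struct {
    uint8_t  bytes[PREFETCH_DEPTH];
    uint8_t  head;      /* index of the next byte the EU will consume */
    uint8_t  count;     /* number of bytes currently queued */
    uint32_t next_addr; /* simplified: linear 20-bit address of next fetch */
} prefetch_queue;

/* Empty the queue and restart fetching at `addr` (e.g. after a jump). */
void pq_flush(prefetch_queue *q, uint32_t addr)
{
    q->head = 0;
    q->count = 0;
    q->next_addr = addr & 0xFFFFFu;
}

bool pq_full(const prefetch_queue *q) { return q->count == PREFETCH_DEPTH; }

/* Address the BIU should read next when the bus is otherwise idle. */
uint32_t pq_fetch_addr(const prefetch_queue *q) { return q->next_addr; }

/* BIU side: store a byte just read from pq_fetch_addr(). */
void pq_push(prefetch_queue *q, uint8_t byte)
{
    assert(!pq_full(q));
    uint8_t tail = (uint8_t)((q->head + q->count) % PREFETCH_DEPTH);
    q->bytes[tail] = byte;
    q->next_addr = (q->next_addr + 1u) & 0xFFFFFu;
    q->count++;
}

/* EU side: take the oldest byte; returns false if the EU has to wait. */
bool pq_pop(prefetch_queue *q, uint8_t *out)
{
    if (q->count == 0)
        return false;
    *out = q->bytes[q->head];
    q->head = (uint8_t)((q->head + 1u) % PREFETCH_DEPTH);
    q->count--;
    return true;
}
```

A jump would call `pq_flush()` with the new target, discarding whatever had been queued, which is exactly where prefetching stops paying off.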

---------

I've an idea of adding a second AVR that will interface with the bus... not unlike the 8088's dedicated BIU.

Somehow these two AVRs have to communicate... and *fast*.

I haven't done *all the math*, but it looks like something like 6 *bytes* have to be transferred from the EU to the BIU for each bus transaction (for writes).

So, a simple approach might be a fixed byte order: e.g. the first byte always indicates the type of transaction, the second is always the lowest address-byte, the third is the second address-byte, and so on.
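Here's what such a fixed-order packet might look like. The layout and type codes are assumptions for illustration only: type, three address bytes (for the 20-bit address), and two data bytes, on the guess that a 16-bit word write would be split by the BIU into two 8-bit bus cycles, which would account for the ~6 bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical transaction types; the log does not define actual codes. */
enum xact_type {
    XACT_MEM_READ  = 0x01,
    XACT_MEM_WRITE = 0x02,
    XACT_IO_READ   = 0x03,
    XACT_IO_WRITE  = 0x04,
};

#define XACT_PACKET_LEN 6u

/* Assumed fixed-order EU->BIU packet:
 *   [0] type  [1] A7:0  [2] A15:8  [3] A19:16 (low nibble)
 *   [4] data low  [5] data high (a word write would become two 8-bit
 *   bus cycles on the 8088's 8-bit data bus). */
void xact_encode(uint8_t pkt[XACT_PACKET_LEN], enum xact_type type,
                 uint32_t addr, uint16_t data)
{
    assert(addr <= 0xFFFFFu); /* 20-bit physical address space */
    pkt[0] = (uint8_t)type;
    pkt[1] = (uint8_t)(addr & 0xFFu);
    pkt[2] = (uint8_t)((addr >> 8) & 0xFFu);
    pkt[3] = (uint8_t)((addr >> 16) & 0x0Fu);
    pkt[4] = (uint8_t)(data & 0xFFu);
    pkt[5] = (uint8_t)(data >> 8);
}

/* BIU side: reassemble the fields in the same fixed order. */
uint32_t xact_addr(const uint8_t pkt[XACT_PACKET_LEN])
{
    return (uint32_t)pkt[1] | ((uint32_t)pkt[2] << 8) |
           ((uint32_t)(pkt[3] & 0x0Fu) << 16);
}

uint16_t xact_data(const uint8_t pkt[XACT_PACKET_LEN])
{
    return (uint16_t)(pkt[4] | (pkt[5] << 8));
}
```

Because the order never changes, the BIU needs no per-field tags; it only needs to know where a packet starts, which is the synchronization problem discussed next.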

But, somehow the two systems need to be synchronized. We won't *always* be transferring bytes to the BIU, only when a transaction is requested.

So, it might be nice to be able to send a "start-of-transaction" indicator... One idea would be to use a certain bit in the "type of transaction" field. But what if the very last byte transferred *also* has that bit set (e.g. an address-high byte)? Now I need to send a 7th byte just to indicate that the AVR-AVR bus is idle. (And how's the BIU going to constantly monitor that input byte while it's also handling bus-transactions?)
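The trouble with a start flag inside the type byte fits in a few lines. This sketch assumes (hypothetically) that bit 7 is the start flag and that the receiving AVR keeps seeing whatever byte was last written to the sender's output port, as AVR PORT registers hold their value. Under those assumptions, any packet whose final byte has bit 7 set needs a trailing idle byte.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: bit 7 of the type byte marks "start of transaction". */
#define XACT_START_BIT 0x80u

/* The output port keeps showing the last byte written. If that byte has
 * the start bit set, a receiver polling the port can't tell it from a new
 * type byte, so an extra idle byte must follow the packet. */
bool needs_idle_byte(const uint8_t *pkt, size_t len)
{
    assert(pkt != NULL && len > 0);
    return (pkt[len - 1] & XACT_START_BIT) != 0;
}

/* Bytes the EU must actually put on the AVR-AVR port for this packet. */
size_t bytes_to_send(const uint8_t *pkt, size_t len)
{
    return len + (needs_idle_byte(pkt, len) ? 1u : 0u);
}
```

For instance, a packet ending in a byte of 0x80 or above costs a 7th transfer, and the EU has to check for that on every transaction.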

Similarly, one could dedicate a GPIO (a 9th bit) to the process, to indicate when a transaction is starting. Again, this'd take one AVR clock-cycle to set, and another to clear.

These are all doable, maybe, but I've another idea to throw in the ol' tool-box.

What about setting up a timer to generate a (one-time) pulse at the start of a transaction? Then, technically, the AVR can output *9* bits *simultaneously* while writing the first byte of the transaction.

Discussions

Mars wrote 01/26/2017 at 20:29

There's some room for optimization. For the next address you read on the bus, do you have to put out A8-A19 again? I'm not familiar enough with the bus. On my project I have SRAM and latches, so I only need to change the address lines that actually change. I don't know if this helps you: on the cat-644 I have memory broken into 256-byte pages, and I only latch the upper address bits if the next access is on a different page. For sequential operations, like copying a block from internal SRAM to external SRAM, I can do half bus cycles. So, when I know I am reading or writing a string of bytes, I have a 'burst' cycle where I have a full cycle followed by up to 255 half-cycles.
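The page trick described above can be written as a small cost model. This is a sketch, not the cat-644 code: it counts a full cycle (re-latch the upper address bits) whenever an access lands on a different 256-byte page than the previous one, and a half cycle otherwise, with costs in half-cycle units.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 256-byte pages: only A7:0 vary within a page. */
#define PAGE_OF(addr) ((uint32_t)(addr) >> 8)

/* Cost, in half-cycle units, of accessing `addrs` in order: a new page
 * needs a full cycle (2 units) to latch the high bits as well; staying in
 * the current page needs only a half cycle (1 unit) to drive A7:0. */
unsigned burst_cost_half_cycles(const uint32_t *addrs, size_t n)
{
    assert(addrs != NULL || n == 0);
    unsigned cost = 0;
    uint32_t cur_page = 0;
    int have_page = 0;

    for (size_t i = 0; i < n; i++) {
        if (!have_page || PAGE_OF(addrs[i]) != cur_page) {
            cost += 2;                  /* full cycle: re-latch high bits */
            cur_page = PAGE_OF(addrs[i]);
            have_page = 1;
        } else {
            cost += 1;                  /* half cycle: low byte only */
        }
    }
    return cost;
}
```

Four sequential bytes within one page cost 5 half-cycles instead of 8; a run that crosses a page boundary pays for one more full cycle.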

For instance, you can optimize sequential access during instruction fetch. Unless there's a jump, the program counter is reading sequentially. Update the address lines for A0-A19, and then read the next N sequential bytes: I don't know, 4 or something. When reading these, you don't have to update anything but A0 to A7, a single-byte AVR port, to read sequentially (as long as your starting address is properly aligned). So you have read 4 bytes in fewer than 4 full bus cycles. Then you have the next few instructions' worth of bytes already in the AVR's SRAM (2-cycle access), or, if you can spare them, in 4 of the AVR's registers. It's a prefetch! Now you only have to do a bus cycle if any of these instructions touch memory, and then only for the instruction args, not the instruction. I think even if you only read 2 bytes at a time, you can get some benefit from this.
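The "properly aligned" caveat comes down to page boundaries: reading sequentially by changing only A7:0 works until the low byte wraps. Here's a hypothetical helper that counts how many of the next `want` bytes can be fetched that way.

```c
#include <assert.h>
#include <stdint.h>

/* How many of the next `want` sequential bytes starting at `addr` can be
 * read by changing only A7:0, i.e. without crossing into the next
 * 256-byte page (where A8 and up would also change). */
unsigned bytes_before_page_cross(uint32_t addr, unsigned want)
{
    assert(addr <= 0xFFFFFu); /* 20-bit address space */
    unsigned room = 256u - (unsigned)(addr & 0xFFu);
    return want < room ? want : room;
}
```

Starting from an address that's a multiple of 4, a 4-byte burst never crosses a 256-byte page, since 256 is itself a multiple of 4.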


Eric Hertz wrote 01/26/2017 at 23:49

Brain's failing...

First paragraph: The only data I have on the 8088 bus is what I've found in the 8088 datasheet and the 8288 datasheet, or otherwise pieced together based on what all's connected to them. As far as I can tell, the entirety of the transaction is performed every time, but that does seem a bit ridiculous, doesn't it? Technically, for e.g. SRAM or EPROM, all you have to do is change the address and wait for the data access time. Hmmm... But, actually, they do latching on AD7:0, rather than A15:8, so I guess that answers that. Weird, now that you mention it. And certainly something in there to consider for the AVR-AVR communication.

Second paragraph: The prefetch thing is basically exactly what the 8088's BIU does... 4 bytes, in fact. (The 8086 has 6.) I have my doubts as to whether that really provides any benefit since, basically, the 8088 must constantly be accessing the bus. Sounds a lot like a traffic-jam to me. One car speeds up, the one behind speeds up, the first stops, the second stops, and almost runs into it... That's for instructions, alone. Then so many instructions access RAM or other I/O, which aren't cached... The BIU is most likely fetching another instruction for its queue, so now the RAM access is delayed further. Something about a third car pulling into the lane immediately after the first car speeds up, causing the second car's speed-up to be delayed even further. But worse than that, the two original cars are more like a leader and a follower, so when the second one can't speed up, the first has to slow down to wait for the follower to catch up. Then they want to throw DMA and other bus-masters into the mix... stop-lights between the two original cars... But the first doesn't get to zoom ahead during those stops; it has to pull to the curb and wait for the light to change. (And I never even got my driver's license.)

That said, there may be some stuff in there to consider... Not so much on the BIU->bus side of things, since that protocol seems to be pretty well-defined, but between the EU-AVR and the BIU-AVR, should I ever get that far.


Mars wrote 01/28/2017 at 00:23

Have you considered coupling the AVR to SRAM, and just dealing with CPU emulation?  You can always tie it to the bus later, maybe even with a separate micro dealing with the bus details.  I have been pondering different emulators for my project.  I didn't think 8086 was in reach, but if your thing runs with any reasonable speed, I might try it.


Eric Hertz wrote 01/28/2017 at 02:18

@Mark Sherman I think 8086/88 emulation is definitely doable. I originally gave an AVR drop-in replacement of the 8088 a 70% likelihood of "working well" (as in on the same order of magnitude of speed as a real 8088 at 4.77MHz)... I'm lowering that to 50% likelihood, knowing a bit more about the architecture now. But that's for a single AVR with no external SRAM, etc., emulating an 8088 *chip*, bus bottleneck included... which, I'm guessing, would probably run the PC/XT system on the order of 50% as fast as a real 8088 chip would.

Throwing SRAM and stuff at it would change things up dramatically. And if you *only* used AVR peripherals/memory and never interfaced with the bus, I'd think it'd run on-par with a PC/XT, if not faster. Especially with your fast instruction-parsing method.

I've definitely considered emulation of the *system* rather than the *chip*... I've several logs mentioning that... one of the later logs discusses how the original goal that led me to this project wasn't to emulate the x86 instruction-set at all, but to have access to the peripherals. I think your project is a prime example that an AVR can make for a fully functional personal computer, on par with, if not faster than, ones that were once top-of-the-line. Especially considering all its on-chip peripherals and ability to work with newer technologies like SD-Cards and ethernet.

(One example I used is writing a byte from RAM to the serial port. AVR->USART ~= 3 AVR clock-cycles: load from internal RAM to a register, then write that register to the USART TX register. AVR->8088bus->8250UART = *minimum* 8 8088 bus-clocks: 4 to fetch from RAM, 4 to write to the UART. And an AVR can run more than 4x as fast as the 8088 bus, so that'd be 8*4=32 AVR cycles, minimum, more than ten times as long!)

But if you're looking to me for an example of a functioning AVR emulating x86 code, you've got a long and plausibly infinite wait ahead of you ;)


jaromir.sukuba wrote 01/26/2017 at 10:11

It's getting complicated; I'm curious how you'll work it out.


Eric Hertz wrote 01/26/2017 at 10:25

Spoiler-Alert!

---------

Haha, well, this was a draft from over a week ago... I've since scaled back my goals quite a bit, based on new understanding. I'm now planning to run the AVR at 1:1 with the bus-clock, setting up the address-bits *prior* to starting the transaction, and allowing the AVR to insert its own "wait-states..." I think that fits within the specs of the *devices* on the bus, rather than trying to match the guarantees in the 8088 specs. The result is a pretty simple bus-transaction function consisting of around 10 lines of code, in C, no less.

Guess I did some overthinking early-on. Not unusual for me. Though these ideas may be useful in later versions.


jaromir.sukuba wrote 01/26/2017 at 10:32

Generally, it's easier to fix overthinking than underthinking. Don't ask me how I know.
