
dumb-luck wins - Color Text is now reliable

A project log for Improbable AVR -> 8088 substitution for PC/XT

Probability this can work: 98%; working well: 50%. A LOT of work, and utterly ridiculous.

Eric Hertz, 02/11/2017 at 17:17, 2 Comments

UPDATE2: MWAHAHAHAHAHAHAHA. See the bottom...

UPDATE: DAGNABBIT DUMB-LUCK AGAIN. See the bottom.

-------------

Long story short: wait for horizontal-retrace between *every* character/attribute read/write. (Or don't... this is all wrong. See the updates at the bottom and the next log.)

------------

So, it would seem...

(I tried to write this next paragraph as a single sentence... you can imagine how that went):

The last log basically covers the fact that I made a mistake in implementing a test for the hypothesis that this clone CGA card ignores bus-reads/writes while it reads VRAM to draw pixels. Despite the mistake (and it was a big one), the result was *exactly* what I was expecting, seemingly confirming my hypothesis. That "confirming" result was, in fact, nothing but dumb-luck. And it seems completely strange that the result looked *any* different from the earlier experiments for earlier hypotheses, let alone that it seemed to *confirm* the latest one.

(This is why I *hated* chemistry labs. "1-2 hours" usually took me two 8-hour days, or more).

So, yes, the results appear to have led to the *right* conclusion, but the way those results were acquired was no more confirmational than any other form of dumb-luck.

------------

So the end-result is that *every* read or write should be prefaced with a wait-for-horizontal-retrace. Once that's done, the write/read/verify/(repeat) process dropped from (at one point) 500% errors to less than 10% repeats, and no on-screen errors.

Still can't explain the need for verify/repeats, but it works.

Also can't explain why *errors* were coming through on-screen despite the fact I had a write/read/verify/(repeat) loop. The only explanation I can think of is that a bus-read occurring at the same time as an on-card pixel-read might return the byte requested by the *pixel-read* rather than the byte the PC requested. (And, since the memory was *mostly* full of the same data, a read of another location might well return the value we're expecting... hmmm.)
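To see why a verify-read could pass even though the write was lost, here's a toy model of that hypothesis. It is not a model of any real card: it just assumes (hypothetically) that a colliding write is dropped and a colliding read returns whatever byte the card happened to be fetching. All names are made up for illustration.

```c
#include <stdint.h>

/* Toy model of the *hypothesis*, not of real hardware:
 * - a write that collides with the card's own VRAM fetch is dropped
 * - a read that collides returns the byte at fetchAddr instead of addr */
#define VRAM_SIZE 4000u   /* 80x25 text: 2000 character/attribute pairs */

static uint8_t vram[VRAM_SIZE];

static void sim_write(uint16_t addr, uint8_t byte, int collides)
{
   if (!collides)
      vram[addr] = byte;
}

static uint8_t sim_read(uint16_t addr, uint16_t fetchAddr, int collides)
{
   return collides ? vram[fetchAddr] : vram[addr];
}

/* One write followed by one verify-read; returns 1 if the verify "passes". */
static int sim_write_verify(uint16_t addr, uint8_t byte, uint16_t fetchAddr,
                            int writeCollides, int readCollides)
{
   sim_write(addr, byte, writeCollides);
   return sim_read(addr, fetchAddr, readCollides) == byte;
}
```

If the screen is already mostly filled with the same character, a dropped write followed by a colliding read of some *other* cell holding that character still verifies, leaving a wrong byte on-screen.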

(Note I refer to "pixel-reads", but that's obviously not correct in text-mode, since the VRAM contains *character*/*attribute* bytes, not bytes of pixel-data. So, by "pixel-read" I mean the CGA card is reading the VRAM in order to generate the corresponding pixels.)
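For reference, the standard CGA text-mode layout those character/attribute bytes follow: each cell is two consecutive VRAM bytes, the character at the even offset and its attribute at the odd one (which is why the fill loop further down writes `i` and `i+1`). A minimal sketch, assuming 80x25; the helper names are hypothetical.

```c
#include <stdint.h>

/* Offsets are relative to the start of CGA VRAM (segment B800h).
 * Assumes 80 columns; use 40 for the 40x25 modes. */
#define CGA_COLS 80u

static uint16_t cga_charOffset(uint8_t row, uint8_t col)
{
   return (uint16_t)((row * CGA_COLS + col) * 2u);   /* even byte: character */
}

static uint16_t cga_attrOffset(uint8_t row, uint8_t col)
{
   return (uint16_t)(cga_charOffset(row, col) + 1u);   /* odd byte: attribute */
}

/* Attribute byte: bits 0-3 foreground, bits 4-6 background, bit 7 blink
 * (or bright background, depending on the mode register's blink-enable bit). */
static uint8_t cga_makeAttr(uint8_t fg, uint8_t bg, uint8_t blink)
{
   return (uint8_t)((fg & 0x0F) | ((bg & 0x07) << 4) | (blink ? 0x80 : 0));
}
```

So the last cell of an 80x25 screen sits at offsets 3998/3999, and white-on-blue is attribute 0x1F.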

---------------

I should be excited about the fact it's working, and as-expected, no less.

Means my AVR-8088 bus-interface is working!

--------

In reality, I figured waiting-for-retrace was probably a good long-run idea, but I chose not to implement it, yet. I've read in numerous places that writing VRAM while the card is accessing it for pixel-data causes "snow." I didn't care about snow at this early stage. And, yahknow, the more code you put in in the beginning, the more places there are for human-error. This seemed like a reasonably-"educated" trade-off choice.

Also, I didn't just *avoid* it... I did, in fact, look into examples elsewhere. The BIOS assembly-listing shows its use in some places, but *not* in others. (Turns out, in many of those places they shut down the video-output altogether... man, Assembly is dense!)

Though, upon implementing it, it hadn't occurred to me *just how often* it would be required... h-retrace-write-read is too much!
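A rough sense of *how much* is too much. This is a back-of-envelope sketch assuming standard IBM CGA 80x25 text timing (14.31818 MHz dot clock, 8 dots per character, 114 character times per scanline); the clone card may differ. Since each retrace-wait waits for the *next* retrace, every access costs about one scanline.

```c
/* Assumed standard CGA 80-column text timing. */
#define DOT_CLOCK_HZ   14318180.0
#define DOTS_PER_CHAR  8.0
#define CHARS_PER_LINE 114.0

/* Duration of one scanline, in microseconds. */
static double cga_lineTime_us(void)
{
   return CHARS_PER_LINE * DOTS_PER_CHAR / DOT_CLOCK_HZ * 1e6;
}

/* Lower bound on the time to touch `bytes` VRAM bytes when every access
 * waits for the next retrace (at most one access per scanline). The status
 * bit stays high through vertical retrace, so real numbers are worse. */
static double cga_fillTime_s(unsigned bytes, unsigned accessesPerByte)
{
   return (double)bytes * accessesPerByte * cga_lineTime_us() * 1e-6;
}
```

A scanline comes out to about 63.7 µs, so a full 4000-byte screen with a write *and* a verify-read per byte takes at least roughly half a second.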

If I didn't take this path, I wouldn't've discovered that some cards don't respond the same as described ("snow" vs. ignored-writes/reads)... wooot!

----------

UPDATE: Dagnabbit! Dumb-Luck again!

New function:

void cga_writeByte(uint16_t vaddr, uint8_t byte)
{  
   while(1)
   {  
      cga_waitForRetraceH();
      bus88_write(S20_WRITE_MEM, 
            cga_vramAddress+vaddr, byte);
      
      uint8_t readData;
      
      cga_waitForRetraceH();
      readData = bus88_read(S20_READ_MEM, 
            cga_vramAddress+vaddr);
      
      if(readData == byte)
         break;
      
      writeFailCount++;
   }
}
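Note that cga_writeByte() spins forever if a byte never verifies. For testing the loop logic off-target, here's a hypothetical host-side variant: the bus accesses are passed in as function pointers (standing in for the retrace-wait plus bus88_write()/bus88_read() pairs), and it gives up after `maxTries`. All names are illustrative, not from the project.

```c
#include <stdint.h>

typedef void    (*write_fn)(uint16_t addr, uint8_t byte);
typedef uint8_t (*read_fn)(uint16_t addr);

static unsigned long writeFailCount;

/* Write, read back, retry; returns 1 on a verified write, 0 after maxTries. */
static int writeVerify(write_fn wr, read_fn rd, uint16_t addr,
                       uint8_t byte, unsigned maxTries)
{
   for (unsigned t = 0; t < maxTries; t++) {
      wr(addr, byte);          /* on hardware: wait for retrace, then write */
      if (rd(addr) == byte)    /* on hardware: wait for retrace, then read  */
         return 1;
      writeFailCount++;
   }
   return 0;
}

/* Fake VRAM that drops every other write (1st, 3rd, 5th, ...). */
static uint8_t fakeVram[4000];
static unsigned fakeWrites;

static void fakeWrite(uint16_t addr, uint8_t byte)
{
   if (fakeWrites++ % 2 == 0)
      return;
   fakeVram[addr] = byte;
}

static uint8_t fakeRead(uint16_t addr) { return fakeVram[addr]; }
```

With the fake above, the first write is dropped, the retry succeeds, and writeFailCount ends at 1; with `maxTries` of 1 and a dropped write, it returns 0 instead of hanging.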
The contents of this function were, previously, copy-pasted where needed...

And it worked great.

So now it's in a function...

And now it's called in cga_fill() and numerous other places similarly:

for(i=0; i<FILL_BYTES; i+=2)
{
      cga_writeByte(i+1,attributes);
      cga_writeByte(i, FILL_CHAR);
}

Worked great in most places: writing the digits 0-9 eight times across a row (which indicates I'm actually running in 80x25, rather than 40x25 like I thought), and writing the attribute-value in hex in the upper-left corner... All these look great.
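The test-pattern code itself isn't in this log, so here's a hypothetical sketch of the same two patterns built in a local row buffer (on the real thing each byte would go out through cga_writeByte()). It also shows why the digit row identifies the mode: 80 columns / 10 digits = 8 repeats, while 40x25 would give only 4.

```c
#include <stdint.h>

/* Fill one text row (cols character/attribute pairs) with repeating 0-9. */
static void pattern_digitsRow(uint8_t *row, unsigned cols, uint8_t attr)
{
   for (unsigned c = 0; c < cols; c++) {
      row[2 * c]     = (uint8_t)('0' + c % 10);
      row[2 * c + 1] = attr;
   }
}

/* Put `value` as two hex digits in the first two character cells. */
static void pattern_hexByte(uint8_t *row, uint8_t value, uint8_t attr)
{
   static const char hex[] = "0123456789ABCDEF";
   row[0] = (uint8_t)hex[value >> 4];
   row[1] = attr;
   row[2] = (uint8_t)hex[value & 0x0F];
   row[3] = attr;
}
```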

But filling the screen with cga_fill() shows a regular pattern of errors again.

And guess what affects it....

Inserting friggin' NOPs between the two cga_writeByte() calls.

Not just a minor effect, either... Pretty dramatic.

Adding one nop reduces the (visible!) error by half

Adding two nops reduces it to near-zero visible error, but we have 0x900 retries!

Adding three nops increases it again

Four makes it far worse...

Here's cga_waitForRetraceH(). It's a macro rather than a function so that call/return overhead doesn't slow it down:

#define cga_waitForRetraceH() \
({\
   /* Wait while H-Retrace status is Active=1 (leave when inactive=0) */ \
   while \
   ((0x1&bus88_read(S20_READ_IO, cga_ioAddress+CGA_STATUS_REG_OFFSET)) ) \
   {} \
   /* TODO: Is this how DMA is prevented from interfering? */ \
   cga_RetraceH_CLI(); \
   /* Wait while H-Retrace status is Inactive=0 (leave when active=1) */ \
   while \
   (!(0x1&bus88_read(S20_READ_IO, cga_ioAddress+CGA_STATUS_REG_OFFSET)) ) \
   {} \
   /* No Return Value */ \
   {}; \
})
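The two-phase wait in that macro is edge detection: wait until the status bit reads 0, *then* until it reads 1, so the access lands at the start of a retrace period rather than somewhere in the middle (or at the very end) of one. A host-side sketch of just that logic, run against recorded status samples instead of the bus, assuming the standard CGA meaning of status-register (3DAh) bit 0: 1 when VRAM can be accessed without disturbing the display. The cga_RetraceH_CLI() step is left out, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the index of the first sample of the *next* retrace period,
 * or n if none starts within the recorded samples. */
static size_t waitForRetraceStart(const uint8_t *samples, size_t n)
{
   size_t i = 0;
   while (i < n && (samples[i] & 0x1))    /* leave the current retrace, if any */
      i++;
   while (i < n && !(samples[i] & 0x1))   /* wait for the next one to begin */
      i++;
   return i;
}
```

Starting mid-retrace (samples 1,1,0,0,0,1,...), the first loop skips the remainder of that retrace and the function returns the index where the next one begins (5 in that example).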

So, dumb-luck strikes again. It worked as-expected earlier, and apparently-perfectly, actually... no fine-tuning/calibration necessary... and apparently only because of dumb-luck. Again, the only difference between now and then was that I moved multiple copies of this same routine into a single function (and the small amount of necessary math to do-so). So all that's changed was timing. And I guess the timing I had before was just lucky enough to be perfect without even trying.

------------

So, entirely-plausibly, there's too much time between detecting the horizontal-retrace and actually writing the bus (between function-calls, pushing, popping, writing the three address-bytes to the bus, whatnot). BUT, that should have no bearing whatsoever on putting NOPs between separate calls to cga_writeByte!
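How much time is there? A rough cycle budget, assuming standard CGA 80-column timing (114 character times per line, 80 of them displayed, 8 dots per character at 14.31818 MHz), and treating the AVR clock as a parameter since the board's actual F_CPU is an assumption here. The "safe" window per line is the non-displayed part.

```c
/* Assumed standard CGA 80-column timing. */
#define DOT_CLOCK_HZ   14318180.0
#define DOTS_PER_CHAR  8.0
#define CHARS_TOTAL    114.0
#define CHARS_DISPLAY  80.0

/* Non-displayed portion of one scanline, in microseconds. */
static double cga_blankWindow_us(void)
{
   return (CHARS_TOTAL - CHARS_DISPLAY) * DOTS_PER_CHAR / DOT_CLOCK_HZ * 1e6;
}

/* CPU cycles that fit in that window at a given CPU clock (Hz). */
static double cyclesInWindow(double fcpu_hz)
{
   return cga_blankWindow_us() * 1e-6 * fcpu_hz;
}
```

That's roughly 19 µs, or about 300 cycles at a (hypothetical) 16 MHz F_CPU, which the status-register polling, call overhead, and the full 8088 bus transaction all have to fit into.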

---------

OY!

And, frankly, I don't even know what the heck I'm going to use this for...

-----------------------------------------

UPDATE 2: MWAHAHAHAHAHAHA

In preparation for a more-scientific test, to determine whether it's actually related to pixel-accesses interfering with bus-writes... the goal was to turn the video-output *off*, then use the same routine cga_writeByte() to count the errors.
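On a stock IBM CGA, turning the video-output off means clearing bit 3 ("video enable") of the Mode Control Register at I/O port 3D8h; that is presumably what the BIOS routines mentioned earlier are doing. The register is write-only, so a shadow copy of the last value written has to be kept. A minimal sketch of the bit manipulation; the names are hypothetical, the actual port write would go through the project's own I/O-write routine (not shown in this log), and whether this clone card honors the bit is untested.

```c
#include <stdint.h>

/* CGA Mode Control Register (port 3D8h, write-only) bits. */
#define CGA_MODE_80COL  0x01   /* 80x25 text (0 = 40x25) */
#define CGA_MODE_GRAPH  0x02   /* graphics mode */
#define CGA_MODE_BW     0x04   /* disable color burst */
#define CGA_MODE_VIDEO  0x08   /* video signal enable */
#define CGA_MODE_HIRES  0x10   /* 640x200 graphics */
#define CGA_MODE_BLINK  0x20   /* attribute bit 7 = blink */

/* Shadow of the last value written: 80x25 color text, video on (0x29). */
static uint8_t cga_modeShadow = CGA_MODE_80COL | CGA_MODE_VIDEO | CGA_MODE_BLINK;

/* Same mode, with the video signal enabled or blanked. */
static uint8_t cga_modeWithVideo(uint8_t mode, int enable)
{
   return enable ? (uint8_t)(mode | CGA_MODE_VIDEO)
                 : (uint8_t)(mode & ~CGA_MODE_VIDEO);
}
```

Writing `cga_modeWithVideo(cga_modeShadow, 0)` (0x21) to 3D8h should blank the screen; writing the shadow value back restores it.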

The *only* changes I made were:

  1. rename cga_writeByte() to cga_writeByte2()
  2. Add a third-argument: awaitRetrace
  3. Add "if(awaitRetrace)" before each call to cga_waitForRetraceH()
  4. #define cga_writeByte(arg1, arg2) cga_writeByte2(arg1, arg2, TRUE)

That's it.

Now we have 100% write/verify (no retries necessary)... and obviously the screen isn't showing any visible errors, either.

So then I started experimenting with nops... Because, certainly, the only difference should be that if(awaitRetrace (=1)) would insert a delay...

#define cga_writeByte(vaddr,byte)   cga_writeByte2(vaddr,byte,1)

void cga_writeByte2(uint16_t vaddr, uint8_t byte, uint8_t awaitRetrace)
{
   while(1)
   {
//      asm("nop");
      asm("nop");
      asm("nop");
      if(awaitRetrace)
         cga_waitForRetraceH();
      asm("nop");
      asm("nop");
      asm("nop");
      asm("nop");
      bus88_write(S20_WRITE_MEM, cga_vramAddress+vaddr, byte);
      cga_RetraceH_SEI();
      uint8_t readData;

      asm("nop");
      asm("nop");
      asm("nop");

//      if(awaitRetrace)
         cga_waitForRetraceH();
      readData = bus88_read(S20_READ_MEM, cga_vramAddress+vaddr);
      cga_RetraceH_SEI();
      if(readData == byte)
         break;

      writeFailCount++;
   }
}
As shown, it works perfectly. Removal of all those nops also works perfectly (that was what I originally tested to get to this point in the first place).

Addition of NOPs *between* write and read showed no difference

Also, second "if(awaitRetrace)" is commented-out and still-functions (but note that cga_waitForRetraceH() is still called).

Additions of NOPs *above* write's "if(awaitRetrace)" seem functionless. Addition between waitForRetraceH() and bus88_write() do have some effect, but no number of them that I tried would repeat this 100% functionality.

I mean, this is just weird.

Oh, and if I remove the first "if(awaitRetrace)" (again, waitForRetraceH() is *still* called), and mess with nops, then we're quite-literally getting very similar randomness to the very beginning.... back when the errors weren't even vertically-aligned.

I mean, this is just weird.

(I've tried inlining bus88_read/write, as well... Actually, I think I did that after the last time I updated this log. They're inlined now. There was a definite change upon inlining them, just as adding a NOP here or there causes a definite change, but the behavior was still pretty much the same. I've also checked the disassembly of that output and verified that the optimizer didn't reorganize anything... so the actual timings on the bus, within a single transaction, should be identical in *all* these cases.)
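Related to Mars's point in the comments about GCC moving or dropping NOPs: a minimal sketch of NOP-padding macros, assuming GCC (avr-gcc, or any GCC target with a `nop` instruction). The macro and function names are hypothetical. `volatile` keeps the compiler from deleting or merging them, and the `"memory"` clobber keeps memory accesses from being moved across them; per the GCC docs, even volatile asm can still move relative to register-only code, so checking the disassembly remains the real test.

```c
/* One NOP that GCC won't delete, with a compiler memory barrier. */
#define NOP()  __asm__ __volatile__("nop" ::: "memory")

#define NOP2() do { NOP(); NOP(); } while (0)
#define NOP4() do { NOP2(); NOP2(); } while (0)

/* Example padding point, e.g. between the retrace-wait and the bus write. */
static int padBeforeBusWrite(void)
{
   NOP4();
   return 4;   /* NOPs emitted; each is one cycle on AVR */
}
```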

It'll be interesting to see how the universe--or who/what-ever's in charge of games like these--pulls this one together to anything explainable. Am I allowed to forfeit from boredom, or are there consequences? Do I get a fiddle made of gold if I win this one? And, how do I know if I've won... Am starting to think I already have. I've proven this shizzle impossible, as far as I'm concerned. That's *won* in my mind. Now where's my fiddle?

(We're going on 20 minutes without one error.)

Discussions

Mars wrote 02/13/2017 at 04:38

GCC will try to optimize your code and move (or sometimes delete) the nops around.  I had random problems on the cat 644 until I started using 'asm volatile' blocks instead of plain asm blocks.  C is a bad choice for cycle-accurate stuff.  (Yet I used it anyway, lol )


Eric Hertz wrote 02/13/2017 at 05:18

I've some commentary about that in the next log, as well...

In fact, the optimizer's reorganization of code turns out to have led to *the* solution, in this case... as it turns out my cycle-"precise" code was inaccurate. That was sure round-about!

I wasn't aware one-liner asm() statements would be moved-around, though... Hmmm. I'll keep that in mind for the future!

I usually do as much as possible in C, but when timing's necessary I'll use an asm block... Though, that's rare-enough that I usually have to dig out an old project to remind myself *how*. Volatile's one of those that I tend to forget, and passing C-variables to ASM blocks is one of those things that is difficult-enough for me to remember as to *almost* justify doing the entirety in assembly-blocks, including multi-part calculations. OTOH, it's a lot more intuitive to see calculations in C.

The kicker is that, apparently, cycle-accurate stuff is pretty difficult on nearly any other architecture I've looked into. So I guess it's a mixed blessing that I had that available on my go-to processor. On the one hand, it's made many of my projects possible. On the other hand, it means my methods aren't repeatable on many other architectures (e.g. #sdramThingZero - 133MS/s 32-bit Logic Analyzer vs. #sdramThing4.5 "Logic Analyzer", which relies heavily on cycle-accuracy). I have no idea what'd happen with e.g. trying to implement the 8088 bus with a more sophisticated processor like a PIC32, or Pi... Might have to use things like DMA or dedicated synchronous peripherals (like SPI) and a much faster clock.
