
weirdness revisited

A project log for Improbable AVR -> 8088 substitution for PC/XT

Probability this can work: 98%, working well: 50% A LOT of work, and utterly ridiculous.

Eric Hertz, 02/11/2017 at 05:09

UPDATE: Significant-ish rewriting...

---------

last time I worked on it... a few days ago, now...

I was trying to determine what was the cause for odd-data. As you may recall, all I was doing was writing the letter 'A' to every position on the screen, along with a color-attribute, then repeating that process, cycling through the color-attributes, incrementing it every second or so.

The result was odd-data. Sometimes the new values would be placed as expected, other times it seems data was not being written to a location. The result was a screen with somewhat random data... Mostly 'A' everywhere, but with various attributes, apparently from previous "fill"-attempts.

In the log before last I wrote a lot on my attempts to explain *why* this was happening.

The first thing was the thought that maybe this cheap-knockoff CGA card was expecting data to *only* be written during the horizontal/vertical retraces... (since that seems to be how the BIOS handles it). The theory being that the card might not have the more-sophisticated RAM-arbitration circuitry of the original IBM CGA card, which would allow writes during pixel-reads (showing "snow" on those occasions). Instead, maybe, this cheaper card's pixel-reads *block* write-attempts from the bus.

Thus, I added a wait for horizontal-retrace to the beginning of my process, and suddenly the data-errors aligned in vertical columns. Kinda makes sense... In fact, makes perfect sense.... In fact, exactly what I was expecting. Say the data-errors were caused by a card whose circuitry wouldn't allow write-access at the same time it was *reading* (to draw the next pixel in a line). Then there would be several writes which go through, then a read of a pixel (and a failed write), then several more writes, then a read, etc. "Beating". Makes some amount of sense.

Makes sense, as well, if you imagine that the dot-clock is faster than the bus-clock used for writing data... you'll get several read-pixels, but every once in a while [periodically] a write will come through. If that write happens at the time a pixel's being read, it would be ignored, otherwise they might be slightly misaligned and both would go through. (AND, if you believe that to be the issue, that could very well explain the newest problem, which I called "even weirder", and which, upon revising this log-entry, I never really get around to explaining.)

And now the "errors" would be aligned in vertical columns because the write-procedure waits until a horizontal retrace... so all writes in each row would be aligned to the left of the screen... right? So I continued my experiments on this theory.

BUT: There are SEVERAL problems with this theory...

Problem One: I didn't write only a *row* of data after the horizontal-retrace signal. Nor did I verify we were still in the horizontal-retrace before writing each byte. In fact, I wrote the entire screen's worth of data. So looking back there should be *NO* inherent guarantee of vertical-alignment of the errors due to the addition of retrace-waiting. In fact, it really shouldn't've changed *anything* regarding error-alignment, except through luck.

320 pixels are drawn in each row and there's some horizontal-porch time, as well, before the next line is drawn. But 40*25*2=2000 data-locations are written after that first horizontal-retrace, to fill the screen's character-memory. Assuming the bus-clock and the pixel-clock were the same, we'd also have to consider that each bus-transaction is a minimum of 4 bus-clocks, so now we're at 8000 pixel-times' worth of data. (And, I think the pixel-clock runs faster than the bus-clock.) We're talking each "fill"-process is at least an *order-of-magnitude* longer than a single row's being drawn. Probably more like *numerous* rows' being drawn, maybe even numerous frames. And, that entire fill occurs in one continuous burst after that first h-retrace. Thus, again, any vertical-alignment in the errors would amount to nothing more than luck that my character/attribute-writing routines, bus-transactions, and more just happen to be divisible by 320 pixel-clocks (plus the h-retrace time).
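The arithmetic above can be sketched in a few lines of C. This is only a back-of-the-envelope check using the rough figures from the paragraph (one bus clock treated as one pixel clock, porch and retrace time ignored, no loop overhead between writes); the names are made up for illustration and none of the numbers are measurements from the actual board.

```c
#include <assert.h>

/* Rough figures from the text above, not measured values. */
enum {
    COLS = 40,
    ROWS = 25,
    BYTES_PER_CELL = 2,         /* character byte + attribute byte */
    BUS_CLOCKS_PER_WRITE = 4,   /* minimum length of one bus transaction */
    PIXEL_CLOCKS_PER_ROW = 320  /* visible pixels only; porch/retrace ignored */
};

/* Bus clocks needed to fill the whole text buffer back-to-back. */
static long fill_clocks(void)
{
    return (long)COLS * ROWS * BYTES_PER_CELL * BUS_CLOCKS_PER_WRITE;
}

/* Whole scanlines that go by during one fill, ASSUMING one bus clock
 * lasts as long as one pixel clock (the simplification used above). */
static long scanlines_per_fill(long clocks_per_row)
{
    assert(clocks_per_row > 0);
    return fill_clocks() / clocks_per_row;
}
```

With these inputs `fill_clocks()` is 8000 and `scanlines_per_fill(PIXEL_CLOCKS_PER_ROW)` is 25, and that is a floor: any instructions between writes, or a pixel-clock faster than the bus-clock, only stretch the fill across more rows.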

There should be no guaranteed vertical-alignment of the errors. And yet, adding this horizontal-retrace checking *at the beginning* of the fill-process, caused exactly that. As I *mistakenly* expected it to.

I'm harping on this because, before I added that one mistakenly-placed singular h-retrace wait at the very beginning, the data was *definitely not* vertically-aligned. The purpose of adding testing of the h-retrace was to test my hypothesis that the errors were caused, as described earlier (in part at least), by the possibility of the CGA card's not allowing writes while it's reading. I knew it wouldn't solve *all* the errors, but the point was to test whether those errors aligned... With the understanding that how I *intended* to implement it was such that each *row* of data that I'd write would be synchronized with the h-retrace. But I Did Not Do That. I only synchronized *the beginning*.

In reality, that code-change should've had zero effect on vertical-error-alignment, as once that fill-process begins, it should take exactly the same number of clock-cycles to process as it did before (since, as-currently-implemented, there should be no wait-states inserted by the READY signal, no DMA, etc.). So, basically, it should just amount to nothing more than luck (good or bad?) that waiting for that first h-retrace happened to cause the fill-process to align the errors vertically, illegitimately "confirming" my hypothesis. In fact, it's something even weirder than luck, because, again, if it was a matter of exactly a certain number of writes corresponding to exactly some number of pixel-clocks, and repeating, then vertical-alignment of the errors should've happened before, as well, because, again, the code that actually *writes* the data was unchanged. It was just delayed slightly, at the beginning.

And, furthermore, it really *shouldn't* have caused a repeatable effect, in the first-place, as I mistakenly chose to look for the "in h-retrace" signal, rather than looking for the *edge* of it. AND even looking for the EDGE of the signal shouldn't cause a guarantee of alignment, at all, as that edge would come-through aligned on the *pixel* clock, but only sampled at a somewhat random rate aligned to the 4.77MHz *bus*-clock divided by some number larger than 4 (bus-clocks per transaction). "luck2".
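To make the level-versus-edge distinction concrete, here is a small off-target sketch in C. It assumes the "in retrace" flag is bit 0 of the CGA status port (0x3DA), which is how I understand the usual CGA documentation; `fake_port` and `read_status` are hypothetical stand-ins that replay canned samples, not the real AVR bus routines.

```c
#include <assert.h>
#include <stdint.h>

#define IN_RETRACE 0x01u  /* assumed: bit 0 of the CGA status port (0x3DA) */

/* Hypothetical stand-in for an I/O read of the status port: replays a
 * canned list of samples so the polling logic can be tried on a PC. */
typedef struct {
    const uint8_t *samples;
    unsigned len;
    unsigned pos;
} fake_port;

static uint8_t read_status(fake_port *p)
{
    assert(p->pos < p->len);  /* the canned data must not run out */
    return p->samples[p->pos++];
}

/* Level wait: returns as soon as the flag is seen set, even if the
 * retrace is already half over. Returns the number of samples taken. */
static unsigned wait_level(fake_port *p)
{
    while (!(read_status(p) & IN_RETRACE)) { }
    return p->pos;
}

/* Edge wait: first wait until the flag is clear, then until it is set,
 * so the return happens near the START of a retrace period. */
static unsigned wait_edge(fake_port *p)
{
    while (read_status(p) & IN_RETRACE) { }
    while (!(read_status(p) & IN_RETRACE)) { }
    return p->pos;
}
```

Fed the samples `1, 1, 0, 0, 0, 1, 1`, the level wait returns after the very first sample (somewhere inside a retrace that is already under way), while the edge wait skips ahead to the sixth sample, where the next retrace begins. Even then the "edge" is only known to within one polling interval, which is the sampling jitter described above.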

And, thereafter, if I make even the slightest change to the fill-process (even adding a single nop between each write), the entire alignment of write-errors should become completely skewed, and certainly not vertically-aligned.

"Luck3" happens to be, apparently, that darn-near every change I've implemented since (and I tried *many*, including in the fill process) still allows vertically-aligned errors. Again, good luck or bad luck? I'd say bad, in the end, as it led me down a rabbit-hole which had nothing to do with the reality of the situation. Presumptions of understanding based on a complete misunderstanding that'd been carried through numerous iterations of nothing but dumb-luck. (Why can't I win the friggin' lottery, instead?).

Another problem with the earlier theory: This vertically-aligned-error occurs regardless of whether I wait for the horizontal retrace *or* the vertical retrace (and again, not looking for the *edge*). ("luck 1a", maybe?).

----------------

Now I haven't even begun to scratch the surface of why the thing got "even weirder" thereafter... nor even mentioned what it was. But I haven't the energy at this point, and I briefly touched on it in a comment in the last log.

----------------

So, looking back, it would seem if I reimplemented the system with retrace-checking before *every* write, then it might well work as-expected. And, furthermore, the "even weirder" aspect may well disappear.
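A toy model of that idea, in C, for comparing "sync once at the start" against "sync before every write". Everything here is an assumption for illustration: the simulated card simply *drops* any write that lands outside retrace (that is the hypothesis, not verified behavior of the real card), the timing constants are made up, and the `sim_*` names are hypothetical rather than the actual AVR routines.

```c
#include <assert.h>
#include <stdint.h>

#define CELLS      (40 * 25)
#define IN_RETRACE 0x01u

enum { LINE_TICKS = 20, RETRACE_TICKS = 5 };  /* made-up toy timing */

typedef struct {
    unsigned long tick;        /* advances once per simulated bus access */
    uint8_t vram[CELLS * 2];
    unsigned dropped;          /* writes that landed outside retrace */
} sim_card;

static int sim_in_retrace(const sim_card *c)
{
    return c->tick % LINE_TICKS >= (unsigned long)(LINE_TICKS - RETRACE_TICKS);
}

static uint8_t sim_status(sim_card *c)
{
    uint8_t s = sim_in_retrace(c) ? IN_RETRACE : 0;
    c->tick++;
    return s;
}

/* HYPOTHESIS being modeled: writes outside retrace are silently lost. */
static void sim_write(sim_card *c, unsigned off, uint8_t v)
{
    if (sim_in_retrace(c)) c->vram[off] = v;
    else c->dropped++;
    c->tick++;
}

static void sim_wait_edge(sim_card *c)
{
    while (sim_status(c) & IN_RETRACE) { }
    while (!(sim_status(c) & IN_RETRACE)) { }
}

/* What was actually done: one (level) wait, then the whole screen. */
static void fill_once_synced(sim_card *c, uint8_t ch, uint8_t attr)
{
    while (!(sim_status(c) & IN_RETRACE)) { }
    for (unsigned i = 0; i < CELLS; i++) {
        sim_write(c, 2 * i, ch);
        sim_write(c, 2 * i + 1, attr);
    }
}

/* What was intended: wait for a retrace edge before every single write. */
static void fill_every_write_synced(sim_card *c, uint8_t ch, uint8_t attr)
{
    for (unsigned i = 0; i < CELLS; i++) {
        sim_wait_edge(c);
        sim_write(c, 2 * i, ch);
        sim_wait_edge(c);
        sim_write(c, 2 * i + 1, attr);
    }
}
```

In this model the once-synced fill loses most of its writes while the per-write version loses none, but only because the simulated write always lands one tick after the edge is seen. On real hardware the gap between sampling and writing could be much longer, so the same loop offers no such guarantee.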

(That's another question, though... The explanations of the in-retrace/OK-to-write signals are *vague*... Just because it's OK-to-write at the instant that's sampled doesn't mean that it'll still be OK-to-write numerous bus-cycles later! What if it's sampled *right* before changing to false? What if the DMA DRAM-refresh routine kicked in immediately after sampling OK-to-write, stalling the actual write-procedure indefinitely? The only thing I see that *could* prevent this is the fact that the BIOS routines appear to look for the h-retrace *edge* AND once that edge is detected, the interrupts are disabled. Does the DMA controller send an interrupt before beginning its transfers? Doesn't seem right, it's already got a hold-request output. Does the *timer* cause an interrupt that the processor then uses to trigger the already-set-up DMA controller?)

And, seeing as how all this "dumb luck" occurred, it's entirely plausible the explanations above are *completely* wrong. Maybe the write-errors have nothing to do with the pixel-reads and everything to do with not having the 8088-bus timing accurate-enough, or any number of other factors. Many of which were discussed in that past log (two prior).

The fact the characters apparently are visually-distorted based on the foreground/background colors suggests there may be an electrical-error (short? open-circuit?) on the card's signals, as well... So depending on which signals are shorted, that too could have some effect on written data (except, would most-likely not happen so apparently-randomly!) And more.
