I've written several past logs about the possibility that the T6A43 Z80-alike inserts a wait-state on M1 machine cycles.
These are my latest findings:
First, the setup:
I have a 41.667kb/s serial signal, pulse-width-modulated (a 1 is 16us high then 8us low; a 0 is 8us high then 16us low), and a 4800bps UART serial signal to compare it to.
41.667kb/s is too fast to process in realtime, so I first take 1024 samples, then post-process. A packet usually consists of 49 pwm-bits, and a frame generally spans around 272 samples. I use the INIR instruction to grab 256 samples at a time, and pad with a nop between INIRs. Thus, each sample should take 21 T-States (unless there are added wait states).
Yes, I disable LCD DMA and interrupts.
272 samples / 49 pwm-bits × 21T/sample × 41667 pwm-bits/sec gives 4.857 million T-States per second.
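Here's that arithmetic as a quick Python sketch. The function and parameter names are made up for illustration; the numbers are the ones above, and the 21T-per-sample figure assumes no wait-states.

```python
# Estimate the CPU clock from the PWM capture.
# Assumes each INIR sample costs exactly 21 T-states (no wait-states).

def clock_from_pwm(samples, pwm_bits, t_per_sample, bit_rate_hz):
    """T-states per second implied by a capture of a known-rate signal."""
    samples_per_bit = samples / pwm_bits          # ~5.55 samples per pwm-bit
    t_per_bit = samples_per_bit * t_per_sample    # T-states spanning one pwm-bit
    return t_per_bit * bit_rate_hz

PWM_BIT_RATE = 1 / 24e-6  # 16us + 8us per bit, i.e. ~41667 bits/s

est = clock_from_pwm(samples=272, pwm_bits=49, t_per_sample=21,
                     bit_rate_hz=PWM_BIT_RATE)
print(f"{est / 1e6:.3f} million T-states/s")
```

Swapping in a different `t_per_sample` shows how directly the estimate depends on the assumed cost of each INIR iteration.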
Then, in the same program I "autobaud" by measuring the number of T-States during a "?" received by the UART at 4800b/s. The use of "?" for autobaud makes it easy to see a distinct change between the start bit and the first data bit, and the last data-bit and the stop bit.
I measure the number of loops between detection of the start of the first bit and detection of the end of the last bit. Each loop is 35T-States long, assuming no wait-states. Though, the loop looking for the first bit is 29T-States.
The counting loops look like:
loop:
    inc hl        ;  6T
    in a,(7)      ; 11T
    and b         ;  4T (rxMask)
    cp d          ;  4T (rxOne)
    jp z, loop    ; 10T
If M1 waits were implemented everywhere, this loop would be 40T-States instead of 35, and the INIRs would be 23T instead of 21.
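Those totals can be tallied mechanically. This is only a sketch: the table layout and the `total_t` helper are made up for illustration, and it assumes the T6A43 follows the standard Z80 in treating both bytes of an ED-prefixed opcode (INIR) as M1 fetches.

```python
# T-states with and without a hypothetical 1T wait added to every M1 cycle.
# Each entry: (mnemonic, documented Z80 T-states, number of M1 opcode fetches)
LOOP = [
    ("inc hl",      6, 1),
    ("in a,(7)",   11, 1),
    ("and b",       4, 1),
    ("cp d",        4, 1),
    ("jp z,loop",  10, 1),
]
# Repeating INIR iteration; ED prefix + opcode = two M1 fetches (assumption above)
INIR_REPEAT = [("inir", 21, 2)]

def total_t(instrs, m1_wait=0):
    """Sum T-states, adding m1_wait extra T-states per M1 fetch."""
    return sum(t + m1_wait * m1 for _, t, m1 in instrs)

for name, instrs in (("counting loop", LOOP), ("INIR sample", INIR_REPEAT)):
    print(f"{name}: {total_t(instrs)}T without M1 waits, "
          f"{total_t(instrs, m1_wait=1)}T with")
```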
The autobaud measurement determines 993T/bit at 4800b/s, which amounts to 4.766Million T-States per second vs. the other measurement's 4.857. That's an error of about 2% which is enough to begin being concerned about UART timing.
So, I started looking into potential error sources and whether they could account for the difference. First and foremost: if an M1 wait-state had gone unaccounted-for, adding it in wouldn't bring those numbers *closer*; they'd in fact veer slightly further apart. With M1 waits the estimates become 5.45MHz (autobaud) vs 5.32MHz (pwm), about 2.4% apart, compared to 4.77MHz vs 4.86MHz, about 1.9% apart, without them. A minor difference, I suppose... I could've sworn it was further off than that.
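A sketch of that comparison, for anyone who wants to re-run it. The helper name is hypothetical, and the autobaud loop count per bit is recovered from the 993T figure (which was computed at 35T per loop) rather than taken from a raw log.

```python
# Compare the two clock estimates, with and without a 1T wait on every M1 cycle.
PWM_SAMPLES, PWM_BITS, PWM_RATE = 272, 49, 41667   # from the PWM capture
UART_BAUD = 4800
LOOPS_PER_BIT = 993 / 35   # loop iterations per UART bit (993T at 35T/loop)

def estimates(inir_t, loop_t):
    """Return (pwm_estimate, autobaud_estimate) in T-states per second."""
    pwm = PWM_SAMPLES / PWM_BITS * inir_t * PWM_RATE
    autobaud = LOOPS_PER_BIT * loop_t * UART_BAUD
    return pwm, autobaud

for label, inir_t, loop_t in (("no M1 waits", 21, 35), ("M1 waits", 23, 40)):
    pwm, autobaud = estimates(inir_t, loop_t)
    gap = (pwm - autobaud) / autobaud * 100   # positive: pwm estimate is higher
    print(f"{label:12s} pwm={pwm / 1e6:.2f}M  autobaud={autobaud / 1e6:.2f}M  "
          f"gap={gap:+.1f}%")
```

The sign of `gap` shows which method reports the higher clock in each case.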
So then I tried to account for the error elsewhere. E.g., what if the edge of the first UART bit occurred immediately after sampling? Then there'd be nearly 35T of error. And if the edge of the last bit occurred immediately before sampling, then there could be an additional nearly 35T of error.
But... at 993T/bit, or 7945T per 8-bit frame, that measurement error (70T at worst, under 1%) isn't enough to account for 2%.
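As a rough bound, sketched in Python: this takes the pessimistic case of a full loop of uncertainty at each of the two edges, as described above, and compares it to the frame length. The variable names are illustrative only.

```python
# Worst-case edge-detection error as a fraction of the measured frame.
LOOP_T = 35          # T-states per polling loop (no wait-states assumed)
T_PER_BIT = 993      # measured by autobaud
FRAME_BITS = 8       # bits between the two detected edges of '?'

frame_t = T_PER_BIT * FRAME_BITS
worst_case_t = 2 * LOOP_T            # pessimistic: one full loop at each edge
worst_case_pct = worst_case_t / frame_t * 100

print(f"up to {worst_case_t}T over {frame_t}T = {worst_case_pct:.2f}%")
```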
This is important, again, to my project because ultimately I won't have autobaud as an option and will have to generate my UART timing based on the 41.667kHz signal. I looked at it with a scope and measured darn-near exactly 41667Hz. So, it's plausible my 4800bps UART connected to my computer (via USB) is off by some UART-acceptable percentage. And/or it's possible M1-waits are a thing...
But I ran some numbers and found another plausibly-reasonable explanation...
Note, I went through all this because: If I autobaud accounting for M1-waits, and also account for M1-waits in my UART code, it sends/receives garbage.
The other plausible explanation I came up with for the 2% error between the 41.6k signal and the 4.8k signal is that I/O may have wait-states. If there is no M1 wait-state, but there is one wait-state for port reads, then the numbers drop from 2% error to 1%. If, plausibly, INIR has two wait-states (why?) then the measurements from the two sources align almost perfectly at
I did these calcs yesterday, and they did add-up. Why The Heck aren't they now?!
I thumb-typed this whole thing for no reason?!
This Is Really Frustrating.
I spent *weeks* making all the code switchable between M1-waits and no M1-waits, had to recount every friggin T-State in every friggin function *several* times, because at first I was *certain* M1-waits were the culprit, despite the fact autobaud and my UART code worked together without accounting for it. So, at first, I just deleted all the T-State counts that *didn't* include M1-waits... Then recalculated everything based on M1-waits, then it stopped working. So then I spent several days /again/ poring over all that code to put non-M1 counts and calcs *back*... Then switching between them was too much of a pain, so I rewrote the code with .equs that allowed for quickly switching... And, STILL, it only worked when *not* accounting for M1-waits... I mean, I've looked that code over *numerous* times.
Last night I *Finally* came up with an explanation... Got that error between autobaud and the 41.6k signal down to a fraction of a percent by a reasonable explanation (a wait-state on I/O transactions)... And now those numbers don't add up.
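One way to stop redoing this by hand is to tabulate the gap for any assumed T-state counts. A minimal sketch: the hypotheses listed and the way waits are charged to each instruction are guesses for illustration, not anything confirmed about the T6A43.

```python
# Gap between the pwm-based and autobaud-based clock estimates for
# assumed per-sample (INIR) and per-loop (autobaud) T-state counts.
PWM_T_FACTOR = 272 / 49 * 41667     # samples per pwm-bit times pwm bit rate
UART_T_FACTOR = 993 / 35 * 4800     # loops per UART bit times baud rate

def gap_pct(inir_t, loop_t):
    """Percent by which the pwm estimate exceeds the autobaud estimate."""
    return (PWM_T_FACTOR * inir_t) / (UART_T_FACTOR * loop_t) * 100 - 100

# (label, INIR T-states, loop T-states); all but the first are hypothetical
HYPOTHESES = [
    ("no waits at all",              21, 35),
    ("M1 waits only",                23, 40),
    ("1 wait on every port read",    22, 36),
    ("1 wait on IN, 2 on INIR",      23, 36),
]

for label, inir_t, loop_t in HYPOTHESES:
    print(f"{label:28s} INIR={inir_t}T loop={loop_t}T gap={gap_pct(inir_t, loop_t):+.2f}%")
```

Adding a row is all it takes to try another guess, which beats recounting everything again.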
Problem is: 2% error isn't a whole lot, but it accumulates in the case of UART bitbanging. At the end of 10 bits, it could be off by half a bit or more, which could be too much. But, more likely, the problem is that this 2% will be added to the error in my measurements and in my bitbanging delays... which I designed to be as error-free as possible, expecting maybe 2% worst-case. They might cancel out, or they might sum up, now to 4%. Heh. I could just try it, as is, forgetting the error, and it may or may not work. It may work most of the time, but fail seemingly randomly.
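For a rough sense of scale, here is the drift from clock error alone, as a Python sketch. It assumes (my assumption, not something measured here) a receiver that resynchronises on each start bit and samples mid-bit, so the last sample of a 10-bit frame lands 9.5 bit-times after the start edge. It ignores the per-bit errors in the delay loops themselves, which would add on top.

```python
# Accumulated drift at the last sample point of a bit-banged UART frame.
# Assumes resync on each start bit and mid-bit sampling (see note above).

def drift_bits(clock_error_pct, sample_point_bits=9.5):
    """Drift, in bit-times, at the given sample point for a given clock error."""
    return sample_point_bits * clock_error_pct / 100

for err in (2, 4):
    print(f"{err}% clock error: {drift_bits(err):.2f} bit-times of drift at the stop bit")
```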
THIS part *should* be pretty easy to "get right"... the pwm and autobaud sampling loops are *tiny* in comparison to the bitbanging loops. Surely I can account for a difference when dealing with 21T and 35T loops on the same source.
Maybe I need to run the sampling/post-processing on the same autobaud source and see how they align. SHEESH. This Just Keeps Dragging Out!
Oh, but worse... The UART code does NOT work when counting M1-waits, even though the error between the two methods is about the same, around 2%, regardless of M1... And the error is *reversed*: in one case pwm-sampling measures the higher clock frequency, in the other case autobaud does. What The Heck.
Same source... I need to measure the same source. *Sigh*
Hours later, autobaud now runs both ways, from the same source ('?' at 4800bps); sample-first, then count samples, and count loops between highs and lows in realtime.
Dang-near /exact/ same results.
4.82-4.84 million T-States when sampling, 4.74-4.76 when watching for edges in realtime.
I've checked the realtime edge-detection code several times since designing it, and again, now. It's a tiny bit confusing because the loops *start* with the increment, and exit after sampling. So, technically, if you think of each loop as a single unit, then the counting is off by one... so it's a bit confusing and could lead to an extra or missing count if not handled right... I can't count how many times (and ways) I've drawn a timing diagram to make sure... But, even still... even *Numerous* missing loop-counts don't account for 100kHz difference.
Is it plausible the RC oscillator varies /that/ much, consistently, (and so quickly!) based on the types of instructions executing?! Maybe due to voltage sagging with power-usage or something?!
This sh** be crazy!
So, if today's math is right, there's no confirmation either way regarding M1-waits. It's about the same percentage-error either way (just swapped).
And if today's experiment is right, that the clock frequency may vary somewhat dramatically depending on what instructions are executing, then even my /original/ autobaud/UART results (that the UART code produces/receives garbage when accounting for M1 waits) may be non-indicative, as well....
Again, the difference between the sampling and edge-detection methods is 21-23T states vs 35-40T... just a handful of instructions. But the UART bitbanging code is *huge* in comparison... About 200T-States to handle each bit, calculate the remaining delay, and then the delay loop itself. So, being mostly small instructions, an additional T-State on each results in a much greater difference in the end.
If this thing *does* use M1-waits, my accounting for them in the code may very well be overrun by the newly-discovered plausibility of inter-instruction clock-variance!
Despite ALL those weeks of careful calculations, it may've been just LUCK that my two systems worked together based on a bunk assumption, and *don't* based on the reality of the system!