Let's say my huge code is correct, that I didn't mis-count T-States, etc...
If the huge code works *without* counting M1-waits, but does *not* work when counting them, then most-likely that means the T6A43 does *not* have M1-waits. Right?
But, again, assuming my code and T-State counting is correct, then I should be getting the same results in measuring the T-States in a specific time duration via two different methods, which I don't.
So, now, here's an odd thing... Regardless of whether I count M1-waits, my two time-measurement methods disagree by about 2%. However, what *does* change, depending on whether or not I count the plausible M1-wait, is which timer-function reports having counted more T-States during the same time duration.
So, in the last log I did a lot of wild speculation...
And, here I do some possibly even wilder...
What if there is a half-cycle M1-wait?
This doesn't work with the standard Z-80, since the /Wait input is only sampled on the falling-edge of the clock, *and*, even if it were sampled again on the next rising-edge (during the previously-requested wait), then all the clock levels/edges thereafter would be swapped.
But... Why not?
As @ziggurat29 pointed out in a comment, the T6A43 ASIC (not VLSI, as I always mix-up... AS != Average Scale, AS=Application Specific).... The T6A43 ASIC is, exactly that, Application-Specific, AND, Far Newer than the original Z80...
So, what could that imply?
Well, first, 6MHz, as most claim the clock frequency to be, was well within the speeds achievable in the era this was designed.
Second, it seems to have been designed for an RC clock... I dunno off-hand how it's implemented, but an easy-enough way would be simply to discharge the capacitor through a transistor, then allow it to recharge through a pull-up resistor, until it reaches some threshold voltage, then discharge it again. Creating a sawtooth wave... which doesn't really do too nicely for a digital circuit like the Z80 which does things on both edges of the clock. So, the simple solution would be to run the sawtooth at Twice the frequency, then use a Toggle-Flip-Flop to divide that in half for a nice 50% square wave.
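Here's a tiny sketch of that divide-by-two idea, just to convince myself of the duty-cycle claim. It is not how the T6A43 actually does it (I don't know that); the 12MHz sawtooth rate is only an assumption chosen to land on the commonly claimed 6MHz, and the function names are made up. The point it shows: since the T-FF only cares about *when* each discharge happens, the lopsided shape of the ramp doesn't matter.

```python
def divided_clock_edges(saw_period_ns, n_ramps):
    """Times and levels of the T flip-flop output, toggled at each sawtooth discharge."""
    edges = []
    t = 0.0
    level = 0
    for _ in range(n_ramps):
        t += saw_period_ns   # capacitor ramps up to the threshold, then is discharged
        level ^= 1           # the T-FF toggles once per discharge
        edges.append((t, level))
    return edges


def duty_cycle(edges):
    """Fraction of time the divided output spends high, between first and last edge."""
    high = 0.0
    total = 0.0
    for (t0, level), (t1, _next_level) in zip(edges, edges[1:]):
        total += t1 - t0
        if level:
            high += t1 - t0
    return high / total


SAW_HZ = 12e6  # assumed: twice the commonly claimed 6MHz CPU clock
edges = divided_clock_edges(1e9 / SAW_HZ, 9)
output_period_ns = edges[2][0] - edges[0][0]  # rising edge to next rising edge

print("output period (ns):", output_period_ns)
print("duty cycle:", duty_cycle(edges))
```

Nine ramps give eight intervals, alternating high and low, each exactly one ramp long, which is where the 50% comes from.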
Now... If one happens to be designing an ASIC like this in some newish era where memories are *much* faster than the original Z80 was designed-for, BUT, there still remain vast quantities of those older slower memories, AND, in that same era tradeoffs were also being made for speed vs low-power... One *Might* consider speeding up the Z80-ASIC a bit for later end-product production-runs, but ALSO leave the option to use slower memories in present runs.
OK... So we come to my weird thought... in a bit... So, maybe the T6A43 is really quite capable of much higher clock frequencies than 6MHz... Then maybe even those newer-faster memories would need wait states...
Alright. Now, unlike a real Z80, we've got control of the Discharge transistor for the RC "oscillator"...
When a /Wait is detected, we could simply discharge that capacitor a little bit early, either at a lower threshold voltage, or, frankly, even *immediately* when that clock-edge is detected alongside the active-low wait, and keep doing so until the wait is no longer asserted. Thereafter our clock-periods return to normal. Now we can control our wait-state durations externally... Fractions of a typical clock cycle, or multiples.
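To put rough numbers on what a "fractional" wait-state would mean for one opcode fetch, here's a sketch that only models durations, not the analog ramp. It assumes the standard Z80 M1 cycle of four T-States with one wait-state (Tw) slipped in after T2, and the `wait_fraction` knob and function name are purely my invention for this speculation.

```python
def m1_state_boundaries(period_ns, wait_fraction):
    """Start times (ns) of T1, T2, Tw, T3, T4 in one opcode fetch, plus the end time.

    wait_fraction scales the length of the wait-state relative to a normal
    T-State: 1.0 is a full wait, 0.5 the speculated half wait, 0.0 no wait.
    """
    lengths = [
        period_ns,                  # T1
        period_ns,                  # T2
        period_ns * wait_fraction,  # Tw, shortened by the early discharge
        period_ns,                  # T3
        period_ns,                  # T4
    ]
    times = [0.0]
    for length in lengths:
        times.append(times[-1] + length)
    return times


PERIOD_NS = 1e9 / 6e6  # assumed 6MHz clock

for fraction in (0.0, 0.5, 1.0):
    end_ns = m1_state_boundaries(PERIOD_NS, fraction)[-1]
    print(fraction, "wait ->", end_ns / PERIOD_NS, "T-States per M1")
```

So, seen from a timer that counts in normal T-States, a half wait would make every opcode fetch look like 4.5 T-States instead of 4 or 5.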
This, I think, should be a pretty simple thing to do, when you've got gate-level access to the z80-internals. And even those *extremely* fast clock-cycles merely sampling the /wait shouldn't be a problem, since I'm guessing the entire CPU basically halts, holding its state, through wait-cycles.
The next thought, for me anyhow, is whether similar could be done with a real Z80... And, I think I've come up with external circuitry that could at least make for /half/ wait-states. The key, in this case, is whether the Z80 would freak-out if suddenly one clock pulse was missing or shifted slightly...? Why would it? There's no PLL generating multiples of the clock frequency in there, right? Presumably it does most everything based on its two clock edges, and anything it does in between is due to propagation-delays.
And, if it's true that the Wait states halt everything in the CPU short of the circuitry detecting when to return to normal, it would seem rather likely the clock signal input itself is basically *removed* from all except that circuitry during waits... So, seems rather likely the clock signal could actually run *much* faster during a wait... say double (since the wait-detection only samples on the falling edge). And that clock-doubling can be done by adding a T-FF and some glue logic. Much easier if you have access to the internal gates, but not particularly difficult if you don't.
Half an M1 wait-state would bring my timing measurements to nearly identical.
It may also explain why my system works without counting M1-waits, but *doesn't* if I do... Because, again, the error may be the same size between my two measurement schemes, but it's *swapped*... So, the half-M1-wait measurement-error in my system neglecting M1-waits isn't too bad, but increases quite a bit when using Full M1 waits. Why? The measurement loops are *tiny*, so M1 error *barely* affects their results, while the actual loops which rely on those measurements are comparatively *huge*, tallying up a LOT of additional T-States through a lot of small instructions. (5 additional T-States in 35 is quite a bit less error than 1 in 4).
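The two ratios in that parenthetical are easy enough to check, along with what a half wait would do to them. A minimal sketch, where reading "5 in 35" as five opcode fetches in a 35-T-State stretch is my own interpretation, and the helper name is hypothetical:

```python
def m1_wait_overhead(base_tstates, m1_cycles, wait_per_m1=1.0):
    """Extra T-States from M1 waits, as a fraction of the no-wait count."""
    return m1_cycles * wait_per_m1 / base_tstates


# Full M1 waits: the two cases quoted above.
in_35 = m1_wait_overhead(35, 5)   # 5 extra T-States on top of 35
in_4 = m1_wait_overhead(4, 1)     # 1 extra T-State on top of a 4-T-State instruction

# The same two cases with the speculated half-cycle wait.
in_35_half = m1_wait_overhead(35, 5, wait_per_m1=0.5)
in_4_half = m1_wait_overhead(4, 1, wait_per_m1=0.5)

print(f"full wait: {in_35:.1%} vs {in_4:.1%}")
print(f"half wait: {in_35_half:.1%} vs {in_4_half:.1%}")
```

Roughly 14% against 25% with full waits, and half of each with half waits. This says nothing about whether the T6A43 really does any of this; it only shows how unevenly the same per-fetch error lands on code made of long instructions versus code made of short ones.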
I dunno, it's a lot of speculation that, really, I don't /need/ to do since it does function, if for the wrong reasons. It'd be nice to hook up #OMNI 4 - a Kaypro 2x Logic Analyzer and get some real answers, heck, that thing was practically designed for this! Alas, that'll have to wait...
Of course, I couldn't.
My Omni4 has only one (of two) pods... which means I can only sample 8 bits, along with a clock and a trigger. Frankly, I'm very curious about the clock, itself... Does it stay relatively steady throughout various instructions? Well, I could be wrong, but I'm guessing the "clock" input is used to latch in data, as opposed to being sampled (multiple times per cycle)... So, 8 bits of data is a bit limiting in a logic analyzer. I don't think I'll be able to use its z80 instruction decoding.
OTOH, I don't *have* to use that software-decoding... and the OMNI4 can allegedly sample at 20MS/s... So, if I'm thoughtful about it, I'm sure I could determine a few things with what I've got.... and, maybe it's good I only have the one 8-bit pod, 'cause soldering 24 "pigtails" to probe seems a bit ridiculous, if not exhausting.
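A quick sanity check on whether that sample rate could even see a half-cycle wait, assuming the 6MHz and 20MS/s figures above are right (both are "allegedly", so treat the results the same way) and assuming the analyzer samples asynchronously on its own timebase:

```python
CPU_CLOCK_HZ = 6e6      # as most claim; the RC oscillator may well differ
SAMPLE_RATE_HZ = 20e6   # alleged OMNI4 maximum

clock_period_ns = 1e9 / CPU_CLOCK_HZ
sample_period_ns = 1e9 / SAMPLE_RATE_HZ

samples_per_cycle = clock_period_ns / sample_period_ns
samples_per_half_cycle = samples_per_cycle / 2

print(f"clock period:  {clock_period_ns:.1f} ns")
print(f"sample period: {sample_period_ns:.1f} ns")
print(f"samples per full cycle: {samples_per_cycle:.2f}")
print(f"samples per half cycle: {samples_per_half_cycle:.2f}")
```

That's a bit over three samples per clock cycle and under two per half-cycle, so a half-length wait would show up as a difference of only one or two samples. Doable, maybe, but it would take averaging over a lot of fetches rather than eyeballing a single one.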
It may happen... If I plan it right, it's a matter of soldering pigtails, doing a quick run, maybe taking some photos, then visual post-processing.