Here's the part I missed in well over a decade of dismissing implementing autobaud as "hokey at best":
Ten bits make up a typical serial frame. Sampling typically happens in the middle of each bit. Thus, the last bit in a frame can be off by nearly half a bit-time in either direction before it's misread. That's actually quite a bit of leeway.
Now, most UARTs determine where half a bit is located based on the falling edge of the Start bit. So, basically, each and every byte transmitted restarts/resynchronizes the receiver... so the error doesn't add up as you transmit more bytes.
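To put a rough number on that leeway, here's a back-of-envelope model (mine, not from any datasheet): with mid-bit sampling resynchronized at every start edge, the last data bit of an 8N1 frame is sampled 8.5 bit-times after the start edge, and the stop bit at 9.5.

```python
# Rough tolerance model for 8N1 with mid-bit sampling, resynchronized
# at every start-bit edge. (Back-of-envelope; not from any datasheet.)
def max_baud_mismatch(sample_point_bits):
    # The sample must land within +/- half a bit of its nominal center.
    return 0.5 / sample_point_bits

last_data_bit = max_baud_mismatch(8.5)  # ~5.9% total mismatch tolerated
stop_bit = max_baud_mismatch(9.5)       # ~5.3% total mismatch tolerated
# Split between both ends of the link, that's ~2.6% per device --
# roughly why datasheets treat ~2% as the comfortable limit.
```
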
Now, for your transmitter, which plausibly has some large amount of timing error, just use "two stop bits" (or, in other words, throw in some idle time) to allow the receiver to catch up.
In fact, presuming the device has a precise timebase generating its baud rate (e.g. a crystal oscillator, like most systems have these days), you might still be better off using some sort of frame-duration-measuring autobaud than trying to match an exact pre-defined baud rate with that crystal.
E.g. I recall many microcontroller datasheets with pages dedicated to tables of error percentages for generating certain baud rates from certain clock frequencies. As I recall, 2% or so error was considered about the reasonable limit.
So, yahknow, if you've got a 16MHz crystal you can generate the MIDI baud rate (31.25kbps, as I recall) perfectly; but say that same microcontroller is retransmitting that via RS-232 at 57.6kbps... now the closest you can get is 57.142kbps, due to a /8 prescaler...
That limit of acceptable error has to take into account that the /other/ device may /also/ be off by 2%, in the other direction!
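Those numbers can be sanity-checked with a sketch. This assumes an AVR-style baud generator (a /8-prescaled UART clock and a nearest-integer divisor); the exact divisor formula varies by part:

```python
# Sketch: why a 16MHz crystal nails MIDI but misses 57.6kbps.
# Assumes an AVR-style generator: UART clock = F_CPU/8, integer divisor.
F_CPU = 16_000_000

def closest_baud(target, prescale=8):
    base = F_CPU // prescale        # 2MHz UART clock
    div = round(base / target)      # nearest integer divisor
    actual = base / div
    return actual, 100 * (actual - target) / target

midi, midi_err = closest_baud(31_250)    # divisor 64: exact, 0% error
rs232, rs232_err = closest_baud(57_600)  # divisor 35: ~57142.9bps, ~-0.8%
```
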
So, maybe, if your system isn't capable of matching the expected baudrate exactly, it's plausible you'll get better results by e.g. counting system ticks between the start of one frame and its end, then just trying to match that...
In my case with the TI-86, I [FINALLY!] came up with a system for autobaud that is actually quite easy to tack on to my already-developed bitbanged UART code, and accurate despite the fact the CPU clock varies with external factors like temperature and battery level.
The host transmits a single '?'=0x3F. This gives a level change between the start bit and bit0, as well as a level change between bit7 and the stop bit (which is the same level as idle).
'?' = ---<Idle>-----_------__------<Idle>---
Why? Because the loop waiting to trigger on a start-bit's edge--which could take seconds to arrive, or never, requiring a user-abort or timeout--takes more instructions than the loops merely looking for the next edge. Since it only samples the pin once per loop, the delay between the start bit's arrival and its detection could be as much as 49 T-states. Whereas, once the start bit has triggered the autobaud routine, the loops detecting the first and last edges can only be off by 35 T-states each. So, even though the error would've been spread over more bits, it turns out (49+35)/9 is larger than 35*2/8... fewer bits, more accuracy, heh!
(Also, divide-by-8 is easy! Right-shift. Though, I already have my divide function)
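In Python-flavored pseudocode (the real thing is bitbanged Z80, and read_pin here is a hypothetical stand-in for sampling the input line), the whole scheme is just four polling loops:

```python
# Model of the '?'-based autobaud ('?' = 0x3F, sent LSB-first:
# start(0), six 1-bits, two 0-bits, stop(1)). Count polling loops
# between the start->bit0 edge and the bit7->stop edge: that span
# is exactly 8 bit-times.
def autobaud(read_pin, loop_cost_tstates=35):
    while read_pin() == 1:   # idle: wait for the start-bit's falling edge
        pass
    while read_pin() == 0:   # start bit: wait for the start->bit0 rising edge
        pass
    loops = 0
    while read_pin() == 1:   # bit0..bit5 (the six 1s)
        loops += 1
    while read_pin() == 0:   # bit6, bit7 (the two 0s)
        loops += 1           # ...exits on the bit7->stop rising edge
    return loops * loop_cost_tstates // 8   # T-states per bit
```

With a simulated pin sampled ~100 times per bit, the eight bit-times come out to ~800 loops, i.e. roughly 3500 T-states/bit at 35T per loop.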
OK, now: I measured 224 loops between detecting the first and last edges... multiplied by 35, that's 7840 T-states for 8 bits, or 980T/bit. I was expecting more like 480, but I'd forgotten I set my serial port's default to 4800bps, and forgot to disable the LCD refresh/DMA(!), and I also have a fresher set of batteries.
So, 980 T-states/bit at 4800bps gives 4.7 million T-states/sec... a 4.7MHz CPU clock (whoops, forgot to disable screen refreshes...).
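The arithmetic, spelled out:

```python
# Recomputing the measured numbers from above.
loops, loop_T, baud = 224, 35, 4800
span_T = loops * loop_T        # 7840 T-states across 8 bit-times
t_per_bit = span_T // 8        # 980 T-states per bit
cpu_hz = t_per_bit * baud      # 4,704,000: ~4.7MHz effective CPU clock
```
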
Now, my delay function is nowhere near accurate to 1 or even 10 T-states (in fact, it's accurate to within 21T), but what it /does/ do is account for the previous call's overshoot in the next call(s)... So, over 10 bits, though each bit's edges may jitter a bit, the overall frame is accurate to within 21 T-states, rather than that error accumulating.
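The carry-the-overshoot trick, modeled (the names and the 21T-granularity loop are illustrative, not my actual Z80 routine):

```python
# Model of an overshoot-carrying delay: each call can only delay in
# 21 T-state granules, but it remembers how far it overshot and shortens
# the next call, so error jitters per-bit instead of accumulating.
GRANULE = 21

class BitDelay:
    def __init__(self):
        self.carry = 0                           # T-states overshot last call
    def delay(self, target_T):
        want = target_T - self.carry
        actual = -(-want // GRANULE) * GRANULE   # round up to a whole granule
        self.carry = actual - want
        return actual

d = BitDelay()
frame_T = sum(d.delay(980) for _ in range(10))   # ten 980T bit delays
# frame_T stays within one granule of the ideal 9800 T-states
```
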
(LCD DMA could have a tremendous impact, though didn't interfere with a single bit in a couple sentences, at 9600bps! I might analyze this later.)
Of course, the measurement might've been off by as much as 2*35T (in 7840!), which, divided by 8 and rounded down, could add/subtract a few T-states from each bit, and be cumulative... but only within one byte frame.
So, overall, it's actually highly accurate in matching the computer's baud rate... regardless of its setting or the CPU frequency... or forgetting the LCD DMA...
Probably Overkill. But I already had everything /except/ the autobaud, which... was... daunting me for days until I realized it was friggin' easy.
LCD DMA... I wrote about some experiments in a past log that seemed to suggest the LCD driver does DMA bursts of (as I recall) 4 bytes at a time... at 200Hz refresh, it's got to load 1024×200 bytes per second. That seems like a heckofalot, and seems like it'd slow regular processing /dramatically/... Further, it seems to take 6 T-states for each 4-byte burst... which again seems like it'd add up fast. 204800×6/4=307,200 T-states/sec used for the screen!? But, actually, that's really not so much when you consider that earlier number: 6 T-states per 4-byte burst... well, the fastest Z80 instructions I'm aware of are 4 T-states... most are /much/ longer. INC HL is 6T. LD A,<number> is 7T... So, basically, each 4-byte LCD DMA burst is about one instruction's length. I wonder if they chose 4-byte bursts on purpose? (16 would make sense, being one LCD row! And now I'm at a bit of a loss as to how I came up with 4B/6T... was it this site? https://www.cemetech.net/forum/viewtopic.php?t=16765&sid=9b37cfc1a98d95a5a8671e960b68f11f
Well, sheeiiiit... that suggests what /would/ make sense: 16-byte (full-row) bursts, 64T/16B, 4T/byte... I know I didn't just pull 4bytes/6T outta my hat. I distinctly recall being a bit surprised the LCD controller seemed to be doing full byte transactions in a single clock cycle, but I also distinctly recall looking into the SRAM timings and seeing that it would be possible, being that the LCD controller is an entirely /separate/ system from the Z80, despite being in the same VLSI... weird.
OK, then, let's say I was /entirely/ wrong, and 64T it is... that's around an 8-instruction delay inserted almost randomly in my code... though, of course, it happens periodically during autobaud /and/ transmission (bit delays)... 64T out of 980/bit, I guess that's not a huge amount of added jitter. But, overall, there are /many/ such bursts in a serial frame, right? And, again, it happens during measurement as well as transmission... So, overall, that'd be inter-bit /jitter/ as opposed to cumulative error. Hmm... And overall error, I suppose, could then be off by as much as (64+35)*2... 198T/frame... which is nowhere near half a bit at 4800bps, so should be OK (and apparently was). 9600? 980/2=490T/bit, and now 198T is close to half of that... could be a problem... hmm... was I just lucky? Hmmm..... and where'd I come up with single-T/byte + 2T for overhead?!)
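Same back-of-envelope style for the DMA numbers above (both the old 4B/6T guess and the 16B/64T figure are as-recalled, not verified):

```python
# LCD-DMA back-of-envelope, using the as-recalled figures from above.
fb_bytes, refresh_hz = 1024, 200
dma_Bps = fb_bytes * refresh_hz      # 204800 bytes/sec to the LCD
old_guess_T = dma_Bps * 6 // 4       # 307200 T/s at 6T per 4-byte burst

# Worst-case serial-frame error if a 64T burst lands on each measured edge:
burst_T, loop_T = 64, 35
worst_frame_err = 2 * (burst_T + loop_T)    # 198 T-states
half_bit_4800 = 980 // 2                    # 490T: comfortable margin
half_bit_9600 = 490 // 2                    # 245T: 198 squeaks by... barely
```
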
For the end-goal, actually, I don't need this autobaud system, I already have a means to determine T/bit, there... that was the daunting part, trying to modify /that/ to work here. Wrong Direction. Once I figured that out, it occurred to me this autobaud thing could be /very/ handy, elsewhere...
So, then, why'd I bother with autobaud, if my system won't be using it? Because I really have no means of /testing/ many of the pieces making up /that/ system (like the bitbanged UART) as they're developed... So, for testing the UART, I need to connect to a computer, rather than the end-goal system. Heh. Which is why I was stuck for /days/ trying to figure out an autobaud-like system that works like it should in the end-goal system. But /that/ is completely unnecessary, because /that/ piece of the system was already developed and thoroughly tested long ago. Heh.
And then, after I wrote and tested autobaud and it worked perfectly at 9600bps, I went to back-up my work... and wouldn't yah know it, the dang thing /refuses/ to work with tilp, now. Last time I was near certain the recent flakiness was due to batteries... today I discovered my USB-Serial dongle has a rotted via to the DB-9.
The culprit took long enough to find that I had plenty of time to think about the FLASH backup plan that was /supposed/ to be attacked /first/ just for moments like these, but has long been backburnered. Then thoughts went even to SD-Card, and also to just sending the dang backups via my new UART code.
There seems to be a theme, here...
Ironically, maybe, much of the stuff causing me to backburner the FLASH-backup project has been developed for /this/ project (which needs backing-up!). E.g. I've already got reading/writing screenshots, and the vast majority of that code is the same for all 'variables' (more like files). It's also one of the main reasons for the initial backburnering, being that I thought it'd be way more difficult than it is... (this isn't the page I was looking for, there's another in there somewhere: http://jgmalcolm.com/z80/intermediate/vari)
So why am I hesitating on this? Well, /now/, I suppose, because the memory is near full. Heh. Oh yeah, and because that functionality relies on TI-OS calls to move the data around, which only works with the memories the OS was programmed to work with (RAM). Oh yeah, and because it means writing some UI to select which variables to backup/restore. Oh yeah, and because it means figuring out my own... shall we call it... Allocation-Table for Files... ATF. And probably some checksumming... Heh... Starting to sound like a huge undertaking.
Heh... then there's my absurd level of commenting... Port7's include-file is over 6KB, mostly due to my notes... it compiles to probably less than 100 bytes. Heck, it probably has only a few hundred bytes of actual code.
So, say I /did/ actually back that up to FLASH... then my project got large (like, say, this one, or maybe the flash-backup one, itself!) Then, it seems it'd be nice to have a way to view those files without actually moving their entirety to RAM... Heh! I mean, this project's getting huge, and I haven't even started! Then, folders... ... am I bouts to write a friggin' shell?!
Resoldering/bypassing that via is going to be tough. And that dongle was slow and flaky anyhow. Maybe I should think about smaller-scale flash, or UART backups. I think I've /maybe/ got a couple KB left to work in... heh! (Starting to understand why many larger projects were done on centralized mainframes back in the day!)