In the log 129. Counters strike ! I started considering the new version of the "counters". It's easier said than made and the many clock domains don't ease the design. But I managed to draft one "bit":
Each byte can be read with one byte select per counter (but the latency is quite high as the data ripple through many gates). Writing is another story because it's asynchronous so a byte clear precedes a selective set. Each byte has their own clock domain, selected among various sources, including the preceding counters. And the counter's value comes either from the local incrementer or the previous counter's value, for the cases of arbitrary frequency dividers (think: baud generator as a trivial example, the previous incrementer is left unused in this case).
This is quite scalable, the size of the pool of counters can be configured at will. A 8×8 bits block seems like a good compromise but nothing keeps one from changing that.
. . . . . . . . . . . .
Looking further, 2 main concerns arise:
- Latency : I don't want the I/O space to be too constrained or constraining. There might be a cacophony of wires, registers and decoders that would probably slow everything down. So let's consider the I/O space as "mostly asynchronous" from the core's point of view. This means that IN and OUT should have a "completion" flag that lets the core resume operation once the IO is done. Which means I must adapt/change/update the instruction's FSM...
- The register map. So far I have identified that each counter byte could have 2 addresses: one for read and write of the value, the other for control and status. Thus there is no constraint on the total number of counters: implement as many as you like. However, timing and scheduling becomes critical at this point so look at the previous point.