In the log 62. Floorplanning the diagram shows a zone on the left with incrementers and event matching logic. Since then, I have expanded and genericised the system. Let's just say I've been traumatised by the timer circuits of chips like the i8253 and microcontrollers: the features are often very anti-orthogonal, obscure, hard to configure... This is often due to "evolutions" of the platforms, such as the PIC16F family for example, which progressively introduced this, that, and oh, this new bit for added convenience... and after several generations, the datasheet and user manual become totally obscure. The circuit itself is a mess of logic gates.
What I propose is simple and flexible, based around 8 identical byte incrementers, and MUXes to select the clock sources. That's all.
Of course, the "smart" parts are in the MUXes and their organisation. The first thing is that the source of one incrementer can be the overflow of the preceding one, such that they are cascaded. This can count events on one byte, or two, or three or more... The 8 bytes become fully useful, because there are much fewer structural restrictions. The whole space can be partitioned into 8 independent counters or form a monolithic counter with 64 bits, or anything between. Each incrementer has its own configuration register that selects the source between zero (default), the neighbour (cascade) or any other source of event (internal or external). To ease resource allocation, a given signal may be available on 3 different incrementers (giving the pattern 2:3:3 that maximises the chances of finding consecutive available incrementers).
Incrementing a byte is not hard. There could be a "ripple" effect in a cascade of incrementers but this can be mitigated by circuits. However reading and writing the 8 bytes, such that they are available with the IN and OUT opcodes, requires more circuits. For example, reading requires 8 MUX8, similar to the register set but with only one read port. A write port can be avoided by only allowing clear/reset commands (for example with a write-only byte-wide register with 1 bit per incrementer). However this is not enough in the case of a frequency generator for example, where the value of the counter is reloaded from another register. For this purpose, one half of the registers can be written, and their value is copied to their neighbour when it overflows. Structurally, that makes 4 pairs of bytes.
Reading a cascade or simply one counter creates its own pile of problems if its clock is not synchronous to the core.
A clear-on-read byte can hold the overflow/saturation flags of the incrementers, if software polling is chosen.
Diagrams are required, I know.