As the system becomes more complex, it's trickier to evaluate the actual propagation times of the signals. Datasheets can't be relied upon because of the variations between manufacturers and batches, the PCBs add capacitance and we never know anyway.
The only way to know is to measure. Inject signals at the input and measure the delays with the scope. But it is still pretty tricky because there are combinational and sequential logic stages, and not all paths have the same lengths, so a little signal could influence the propagation time of the whole rest. Think about the carry input of the ALU, or the control signals of SHL unit.
My solution is almost simple and it requires the fabrication of a special circuit. A huge LFSR made of a lot of of shift registers (74HC164 for example) and some XOR gates. For suitable LFSR polynoms, look no further than http://ygdes.com/GHDL/lfsr4/
The clock input can be selected from several oscillators, and the LFSR output are scrambled/shuffled to randomize a bit the generated bits. Each output of the examined circuit can be checked with a 'scope and jitter can be measured. Since the project is a 16-bits CPU, it's easy to cycle through the 64K input combinations and see "offending" (lagging) signals a few times per second, which is suitable for visual observation on a 'scope.
From these observations, subcycles from the main clock can be allocated for each unit. For example I'm curious about the REAL propagation time of a carry through a chain of 74HC193. I can only estimate the time with the datasheet but the real delay will influence how long the instruction time will take.