So in my last log I said my DTACK# generation was "dumb, but working". It turns out that it was just dumb. It did work in the sense that it would pull DTACK# low after a configurable delay set by the shift register tap, but it didn't go high again when it should have. It simply produced a delayed copy of the input signal (we'll call this BCYCLE#, as per Alan D. Wilcox's 68000 Microcomputer Systems):
BCYCLE#: ‾‾‾‾\______/‾‾‾‾‾‾‾‾‾‾‾‾‾‾‾
DTACK#:  ‾‾‾‾‾‾‾‾‾\______/‾‾‾‾‾‾‾‾‾‾
This might not look too bad on the surface, but really DTACK# should be negated (made high) as soon as the CPU has indicated that it's stored the data on the bus. The effect of not doing so can be seen on sequential bus reads/writes:
                    A  B
BCYCLE#: ‾‾‾‾\______/‾‾\______/‾‾‾‾‾‾‾‾‾‾‾‾‾
DTACK#:  ‾‾‾‾‾‾‾‾‾\______/‾‾\______/‾‾‾‾‾‾‾‾
In this second scenario, the first bus cycle completes at time A, but because DTACK# doesn't follow suit it's still asserted low for some time into the next cycle, which starts at time B. The upshot was that DTACK# appeared continuously grounded after the first cycle in any given run, and using later and later outputs from the shift register had no effect on the operation.
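The failure is easy to reproduce in a toy model. The following is only a sketch, assuming the old circuit behaves as a pure delay line: each list entry stands for one clock period, and the helper names (`delayed_copy`, `draw`) and the cycle lengths are made up for the illustration rather than taken from the real hardware.

```python
# Toy model of the old scheme: DTACK# is just BCYCLE# delayed by a few clocks.
# Each list entry is one clock period; 1 = high (negated), 0 = low (asserted).

def delayed_copy(bcycle, delay):
    """Return BCYCLE# delayed by `delay` clocks (hypothetical helper)."""
    pipeline = [1] * delay              # register stages start high (idle bus)
    out = []
    for level in bcycle:
        out.append(pipeline[-1])        # the tapped output is the oldest stage
        pipeline = [level] + pipeline[:-1]  # clock edge: shift BCYCLE# in
    return out

def draw(levels):
    """Render a list of levels as a crude one-line waveform."""
    return "".join("‾" if level else "_" for level in levels)

# Two back-to-back bus cycles (assumed lengths, for illustration only).
bcycle = [1] * 4 + [0] * 7 + [1] * 3 + [0] * 7 + [1] * 8
dtack = delayed_copy(bcycle, delay=5)

second_cycle_start = 4 + 7 + 3          # index where BCYCLE# falls again (time B)

print("BCYCLE#:", draw(bcycle))
print("DTACK#: ", draw(dtack))
print("DTACK# at start of second cycle:", dtack[second_cycle_start])
```

In this model DTACK# is already 0 at the index where the second cycle begins, which is the stale acknowledge described above: the CPU sees DTACK# asserted before the delay for the new cycle has even started.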
Clearing DTACK on Time
I studied the circuit diagram in Wilcox's book and pondered its apparent complexity. Initially I'd not seen a good reason for negating various signals and the like, but having encountered and diagnosed the issue just detailed, I finally realised what was going on. The circuit in the book (which my solution is now based on: not identical, but close) uses the shift register in a different manner. The core concept revolves around using the clear input of the shift register to reset its outputs at the moment required.
By holding the serial data input (SDATA) high, the chip's output pins will always be high, except for the first few clock cycles after a reset, during which the 1 values need to propagate through from QA to QH, one clock at a time. This gives us the delay control needed, and an immediate reset to 0 is then available by asserting CRCLR# low. The final piece of the puzzle is to realise that these values are the opposite of what's required. By running BCYCLE# through an inverter and the output of the shift register through another, we get the values we desire.
Talking it through: when BCYCLE# goes low, CRCLR# on the shift register will go high, starting the propagation of 1s through the output pins. This gives us a delay based on which pin we tap into, and by inverting that 1 we get a delayed 0 which we can use for DTACK#. When BCYCLE# returns high, indicating the end of the bus cycle, CRCLR# will be driven low, resetting the shift register so that all pins output 0. Again, inverting this gives us our DTACK# high, and it goes high essentially at the same time as BCYCLE# does rather than after the delay that caused the previous bug.
BCYCLE#: ‾‾‾\______/‾‾\______/‾‾‾‾‾‾‾
CRCLR#:  ___/‾‾‾‾‾‾\__/‾‾‾‾‾‾\_______
QC:      XXXX__/‾‾‾\_____/‾‾‾\_______
DTACK#:  XXXX‾‾\___/‾‾‾‾‾\___/‾‾‾‾‾‾‾
The key lines to look at above are BCYCLE# and DTACK#. With those alone it's easier to see the difference between the previous setup (first pair) and this one (second pair):
BCYCLE#: ‾‾‾\______/‾‾\______/‾‾‾‾‾‾‾
DTACK#:  ‾‾‾‾‾‾‾‾\______/‾‾\______/‾‾

BCYCLE#: ‾‾‾\______/‾‾\______/‾‾‾‾‾‾‾
DTACK#:  XXXX‾‾\___/‾‾‾‾‾\___/‾‾‾‾‾‾‾
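The clear-based scheme can be sketched in Python in the same spirit. This is a simplified model under stated assumptions, not a description of the actual circuit: each list entry is one clock period sampled just after the clock edge, the clear is treated as taking effect within that same period, the register is assumed to start cleared (the real outputs are unknown at power-up, hence the X's in the diagram), and the function name `dtack_with_clear` is invented for the example.

```python
# Toy model of the clear-based scheme: SDATA tied high, CRCLR# = NOT BCYCLE#,
# DTACK# = NOT Qn. 1 = high, 0 = low; one list entry per clock period.

def dtack_with_clear(bcycle, tap):
    """Return DTACK# for a BCYCLE# trace; tap is 1 for QA up to 8 for QH."""
    q = [0] * 8                         # assumption: register starts cleared
    out = []
    for level in bcycle:
        clr_n = 1 - level               # first inverter drives CRCLR#
        if clr_n == 0:
            q = [0] * 8                 # clear asserted: all outputs drop to 0
        else:
            q = [1] + q[:-1]            # clock edge: shift a 1 in from SDATA
        out.append(1 - q[tap - 1])      # second inverter turns Qn into DTACK#
    return out

def draw(levels):
    """Render a list of levels as a crude one-line waveform."""
    return "".join("‾" if level else "_" for level in levels)

# Two back-to-back bus cycles (assumed lengths), tapping QC as in the diagram.
bcycle = [1] * 3 + [0] * 6 + [1] * 2 + [0] * 6 + [1] * 7
dtack = dtack_with_clear(bcycle, tap=3)

print("BCYCLE#:", draw(bcycle))
print("DTACK#: ", draw(dtack))
```

In this model DTACK# is high in every period where BCYCLE# is high, so the second cycle always starts with a fresh, negated DTACK#, and moving `tap` towards QH lengthens the delay before it is asserted.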