Having finally had a chance to get a 74AHCT139, I've been repeating my tests. With the 74AHCT139 instead of the 74HCT139 in the 512K ROM/RAM card, everything appears to be stable at 7.37MHz.
I also now understand why my timing calculations were off. I was not allowing for the skewed clock used to give a write hold time (the 65C816 having none). That, of course, takes about 30ns off the available time. Possibly the skew could be shortened, but some of the other devices, like the 16C550A, seemed to get upset without a reasonable hold duration.
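As a rough illustration of that budget, here is a minimal sketch in Python. Only the 7.37MHz clock and the roughly 30ns of skew come from the tests above. Treating about half a clock period as the usable access window is an assumption made for the example, and the function names are hypothetical.

```python
# Rough bus timing budget for a skewed-clock 65C816 card.
# From the post: 7.37 MHz clock, about 30 ns lost to the clock skew.
# ASSUMPTION: the usable access window is roughly half a clock period.

def period_ns(clock_hz: float) -> float:
    """Length of one full clock cycle in nanoseconds."""
    return 1e9 / clock_hz


def access_window_ns(clock_hz: float, skew_ns: float,
                     window_fraction: float = 0.5) -> float:
    """Time left for decode plus memory access once the skew is removed.

    window_fraction is the assumed share of the cycle available for the
    access (0.5 = half a cycle); it is illustrative, not measured.
    """
    return period_ns(clock_hz) * window_fraction - skew_ns


if __name__ == "__main__":
    clock_hz = 7.37e6   # from the post
    skew_ns = 30.0      # "about 30ns" from the post
    print(f"clock period:        {period_ns(clock_hz):.1f} ns")
    print(f"window without skew: {access_window_ns(clock_hz, 0.0):.1f} ns")
    print(f"window with skew:    {access_window_ns(clock_hz, skew_ns):.1f} ns")
```

Under those assumptions the skew removes a large share of the window, which would explain why the address decoder's propagation delay suddenly matters.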
Nevertheless, at 7.37MHz, even running 8-bit mode code generated by cc65, the board feels fast at most things. If there were a usable 16-bit mode compiler for this it would be way faster. Not only does the 65C816 execute instructions in far fewer clocks than the Z80, but in 16-bit mode it can do things like a 16-bit indexed add of memory to the accumulator, whereas the Z80 takes a significant hit the moment you can't juggle most stuff in registers.
The other option would be to replace the clock skewing with additional buffer chips and direction logic, but given that skewing works, I think this project is done for now.