It should have been obvious from the beginning, but it is easy to double the speed of scanning the state space when only the crossings are considered. This follows from the well-known fact that all the orbits come in pairs!
Each iteration must now detect both X==0 and X==-1: a hit on 0 is credited to the "primary orbit" and a hit on -1 (or "max") to the "secondary orbit".
The catch is that each iteration now has 2 tests and 2 conditional jumps (though rarely taken). That makes each iteration a bit slower, but the overall speedup is still a solid 2 when branch prediction is working well.
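To make the two-test loop concrete, here is a minimal sketch in C. The step function is not shown in this log, so `step()` below is an assumed stand-in: a w-bit add-with-carry recurrence. The names `state_t`, `scan_t` and `scan_orbit()` are made up for illustration. The sketch also assumes that a state with X==max on the walked orbit mirrors an X==0 crossing on its twin.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state: two w-bit registers plus a carry bit. */
typedef struct { uint32_t x, y, c; } state_t;

/* Assumed stand-in for the iteration (not taken from the log):
   t = X + Y + C, then X <- Y, Y <- low w bits of t, C <- carry out. */
static state_t step(state_t s, unsigned w) {
    uint32_t mask = (1u << w) - 1u;
    uint32_t t = s.x + s.y + s.c;
    state_t n = { s.y, t & mask, t >> w };
    return n;
}

typedef struct { uint64_t length, primary, secondary; } scan_t;

/* Walk from `start` until the same state comes back, running both
   tests on every iteration. `start` must lie on a cycle, otherwise
   the loop never terminates. */
static scan_t scan_orbit(state_t start, unsigned w) {
    assert(w >= 1 && w <= 16);           /* keeps t inside 32 bits */
    const uint32_t max = (1u << w) - 1u; /* "-1" in w bits */
    scan_t r = { 0, 0, 0 };
    state_t s = start;
    do {
        s = step(s, w);
        r.length++;
        if (s.x == 0)   r.primary++;     /* crossing on the walked orbit */
        if (s.x == max) r.secondary++;   /* assumed mirror crossing on the twin */
    } while (s.x != start.x || s.y != start.y || s.c != start.c);
    return r;
}
```

With this toy step and w = 2, starting from (X=0, Y=1, C=0), the walk returns to its start after 9 steps and reports 2 hits of each kind. A single pass therefore collects crossings for both orbits of the pair, at the cost of the second compare and branch in the loop body.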