It only occurs to me now that the algorithm I use in v2.9 can be way faster and efficient. The trick is to serialise the number of the output port on the respective net and each sink de-serialises it in parallel. The number of runs is then proportional on the log2 of the number of gates, and this can be even further reduced because std_logic has 9 states, and can encode 3 bits per cycle. The 9th state ('U') is then used to force the update of the signals.
A circuit with 4 inputs and 4 gates is mapped in 1 cycle, 2 cycles can test 64 signals, 3 cycles 512 signals... It's a crazy speedup !
The funniest side is that i came up with this algorithm almost 20 years ago now. I wonder why I didn't consider it until now...
A new twist can be added with the inclusion of some "signal integrity check". A linear function extends the range of the numbers to transmit, and adds a majority of numbers that are not valid. This should catch cases with driver conflicts because the library uses std_logic_vector and not std_ulogic_vector. If we consider that one more probe cycle is not a significant cost, then we can extend the coding space by a factor of 8. The linear function 5x+3 should work well enough. The function has another offset because the input number ("gate number") can be negative (for a port).