The problems I saw with usbboot seem to come from it not being designed to boot multiple targets at once. rpiboot would serve bootcode.bin to all of the nodes first, then serve the complete set of files to each node, one at a time, until no more Broadcom ROM boot devices were found on the bus. This would be fine, except that the Pi seems to hang if the transfers haven't completed within a certain time after bootcode.bin is sent. Since the failing node seems to die in the middle of a file transfer, I suspect a watchdog timer, or maybe a bug in one of the binaries involved in boot (bootcode.bin or start_cd.elf).
Whatever the root cause, the problem goes away as long as usbboot services only one node at a time. My fix prevents rpiboot from talking to any other Broadcom device in USB boot mode until it has finished booting the current one. Once it has successfully sent bootcode.bin to a device, it ignores all other devices while scanning for the second-stage boot device, until it has finished serving files to the current device. A timeout keeps it from hanging indefinitely if the current device fails to re-enumerate after bootcode.bin is sent.
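To make the scanning rule concrete, here is a minimal sketch of the selection logic in Python. It is not the actual patch to rpiboot (which is written in C); the class, method names, stage labels and the 5-second default timeout are all made up for illustration, and real USB enumeration is replaced by a plain list of devices.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Optional

# Illustrative stage labels; the real tool reads this from USB descriptors.
ROM_BOOT = "rom"          # waiting for bootcode.bin
SECOND_STAGE = "second"   # re-enumerated, waiting for the remaining files


@dataclass(frozen=True)
class Device:
    name: str
    stage: str


class OneAtATimeBooter:
    """Hypothetical sketch: finish one node before touching the next."""

    def __init__(self, reenumerate_timeout: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self.reenumerate_timeout = reenumerate_timeout  # assumed value
        self.clock = clock
        self.deadline: Optional[float] = None  # set while a node is mid-boot

    def pick(self, devices: Iterable[Device]) -> Optional[Device]:
        """Return the device to service on this scan, or None to keep polling."""
        devices = list(devices)
        if self.deadline is not None:
            # A node already has bootcode.bin: only a second-stage device
            # is eligible, and ROM-boot devices are ignored.
            for dev in devices:
                if dev.stage == SECOND_STAGE:
                    return dev
            if self.clock() < self.deadline:
                return None
            # The node never re-enumerated; give up on it and move on.
            self.deadline = None
        for dev in devices:
            if dev.stage == ROM_BOOT:
                return dev
        return None

    def bootcode_sent(self) -> None:
        """Call after bootcode.bin was sent successfully to a device."""
        self.deadline = self.clock() + self.reenumerate_timeout

    def files_served(self) -> None:
        """Call after the second-stage file transfer has finished."""
        self.deadline = None
```

The caller would run `pick()` on every bus scan, call `bootcode_sent()` after a successful first-stage transfer, and call `files_served()` once the node has all of its files. Because only one node ever holds bootcode.bin at a time, any second-stage device that shows up must be the current one.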
I think I have it working reliably now, but it will take some overnight testing to prove it works.
A better solution, perhaps for the future, would be to make usbboot multithreaded, spawning a handler thread for each ROM boot target. This should be straightforward, since the USB IDs are different for a Pi waiting for bootcode.bin (2763) and one that has re-enumerated after bootcode.bin executes (2764). It should also speed up the boot process.
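As a rough illustration of the one-handler-per-target idea, the sketch below uses Python's standard thread pool. Everything in it is hypothetical: `send_bootcode` and `serve_files` stand in for the real USB transfers, and identifying each target by a port string is my assumption about how a handler would find its own device again after it re-enumerates.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def boot_node(port: str,
              send_bootcode: Callable[[str], None],
              serve_files: Callable[[str], bool]) -> bool:
    """Handler for one target: first stage, then second stage, in order."""
    send_bootcode(port)
    return serve_files(port)


def boot_all(ports: List[str],
             send_bootcode: Callable[[str], None],
             serve_files: Callable[[str], bool]) -> Dict[str, bool]:
    """Boot every target in its own thread and collect per-port results."""
    with ThreadPoolExecutor(max_workers=max(1, len(ports))) as pool:
        futures = {
            port: pool.submit(boot_node, port, send_bootcode, serve_files)
            for port in ports
        }
        # result() blocks until that handler finishes and re-raises its errors.
        return {port: fut.result() for port, fut in futures.items()}
```

With one handler per node, no node sits waiting behind another node's file transfer, which is what should avoid the hang and shorten the total boot time.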