Sorry about the lack of updates. This was not because of a lack of interest or because of a lack of progress. Instead, this was due primarily due to a lack of advances. I've been working on the project but have not found anything particularly interesting.
VisualSFM has proven to be an annoying, if capable program. It's buggy, it's slow and it's extremely frustrating due to a lack of control or error messages. It seems to have been developed in a haphazard way by someone who doesn't care about providing information to the user. It doesn't fail gracefully and often requires redoing large amounts of work after a crash.
It's still the best solution I've discovered so far. But ideally, we could abandon it, as it's poorly designed and an awful package designed for research more than practical use. Further, it poorly supports native stereo vision and is extremely slow. It can take 24 hours to process a stream of photos. This is in contrast to the ZEDfu which can do real-time analysis of a video stream.
ZEDfu has proven itself to have serious drawbacks as well. It seems to scale poorly, if you use it for a sustained period of time, it starts "skipping" and losing tracking, this can be remedied by recording a video and processing it later, but that leaves open a problem of not knowing when you lose tracking.
The program also only goes through a video as a stream, meaning that even if it regains tracking later on, it cannot "recover" the middle chunk to create a contiguous map. Though you can start the mapping at any point in a video but it will only continue on from that point. This would give multiple maps ZEDfu is obviously best for short live maps.
Other tools may be useful to combine the multiple maps created by ZEDfu. I've been checking out point cloud systems that might automatically combine maps. This is an equally difficult problem to creating the maps (involving a lot of the same problems) but have found that Meshlab can do a decent job once it's been given some comparison points manually. This is less than ideal as it requires manual intervention, but it may be possible to advance their algorithms to create a better automated process.
On the hardware front, little work has been done. I'm primarily waiting for the next generation Realsense cameras and keeping an eye out for new SBCs that support multiple camera inputs at high speeds. Most of the work done could be done with limited CPU, just so long as two cameras could be encoded together into a single stream. The Compute module can do this for relatively low resolution video (1920 width is maximum, leaving 960 px per camera in sbs). It's also possible to run the Compute module at a lower framerate, alternating left or right frames (Giving 15fps per camera at 1080p). Neither of these are perfect, but due to 4k video, future SBCs may support a better video option.
The funny thing, is that nearly all chips on these dev boards support multiple cameras, they're just not exposed to headers. A custom board would easily solve this problem, but is beyond my current ability. Another option is an FPGA. Even a basic FPGA should be able to read in from 2 cameras, though they may not be able to combine and compress the video without going to a higher end FPGA. Further investigation will need to be done to get the hardware side figured out.