VisualSFM and related

A project log for DIY Stereo Camera

Creating an open source and economical Stereo camera dev kit. For VR and 3d video use.

Bryan Lyon 06/07/2016 at 17:22

My initial analysis of a few available software suites follows. The short version, for those not interested in reading: VisualSFM is interesting for recreating static scenes, but cannot (effectively) be used for more advanced reconstructions, and the other software has serious limitations that prevent its use in various circumstances.


A major player in the "Structure From Motion" (or photogrammetry) space is VisualSFM.

VisualSFM is a free-for-non-commercial-use application for Windows, OS X, and Linux which stitches many images together into a 3D point cloud. The interesting thing about VisualSFM is that it does not require any information about where the cameras were situated. VisualSFM itself isn't actually doing any of the photogrammetry work; it provides a GUI over the tools it uses, which include SiftGPU and PMVS. SiftGPU finds the camera positions while PMVS creates a point cloud from the matched photos.

When you run the program, it goes through all the image files and matches points between them to find where the cameras were in 3D space. This means it can work with stereo images just as well as single images. I thought processing time would be the problem, but it turns out that finding where the cameras were located is one of the quickest steps in the whole pipeline. Using a second camera immediately to the side of the first for each shot seems to add mere fractions of a second per camera. At that cost, optimization hardly seems necessary, but VisualSFM does have an advanced system that lets you submit photos in "pairs". I haven't examined this feature in much depth yet due to the negligible gain at this point. As we develop the camera further, this might be a good feature to understand.

VisualSFM cannot natively read video files and cannot handle SBS images on its own. This is not a major problem, as converting SBS videos into separate left and right image sequences is easily automated with other software.
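The split itself is just slicing each frame down the middle. A minimal sketch (the file paths and names are placeholders) using OpenCV for the video I/O and plain NumPy for the split:

```python
import os
import numpy as np

def split_sbs(frame: np.ndarray):
    """Split one side-by-side stereo frame into (left, right) halves."""
    half = frame.shape[1] // 2
    return frame[:, :half], frame[:, half:half * 2]

def extract_sbs_video(video_path: str, out_dir: str) -> int:
    """Dump numbered left_*/right_* PNG sequences from an SBS video file.

    Returns the number of frames written.
    """
    import cv2  # imported here so split_sbs stays usable without OpenCV
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    count = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        left, right = split_sbs(frame)
        cv2.imwrite(os.path.join(out_dir, f"left_{count:05d}.png"), left)
        cv2.imwrite(os.path.join(out_dir, f"right_{count:05d}.png"), right)
        count += 1
    cap.release()
    return count
```

VisualSFM can then be pointed at the resulting image folder as if the frames had come from independent cameras.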

The major issues with VisualSFM are as follows:

In general, VisualSFM is a good tool for taking something static and importing it feature-complete into a 3D environment, but it requires additional processing and skills to make a perfect environment. There is a great tutorial from a guy named Jesse who learned VisualSFM in a few weeks and created a complete workflow. You can find the tutorial at . I highly recommend it, as it gives you a very good idea of what is going on during each step of the process while also guiding you clearly through two extremely complicated programs.

In that same tutorial he mentions 123D Catch as a similar tool. The tool works almost exactly the same way, but the quality of the final results is much lower and the restrictions are much worse. For these reasons I haven't examined 123D Catch very extensively. I did feed it the same data, and it returned workable results, but the errors were much more evident in the 123D Catch output, and the quality was much lower overall.


ReconstructMe

I downloaded and ran ReconstructMe, but it requires cameras which I've already discounted as untenable. It cannot operate with the ZED or any of the other cameras I currently have, so I will have to revisit this one later. I do have a Kinect on the way just to see how it compares, but it will not meet our needs since it is a single camera.

StereoLabs ZED

The StereoLabs ZED SDK includes an SFM demo application. Unlike the other demos, this one does NOT provide source code (though all the other code examples are tied to the SDK anyway, which essentially makes the camera itself a DRM dongle). However, like the other demos, it runs in real time. You simply walk around an area while pointing the camera, and it creates a full point cloud and mesh automatically. This is very impressive, as VisualSFM can take an hour to work through a few dozen images (thanks in large part to all the manual steps required).

The trade-off is that the mesh seems to be highly inaccurate. Pointing at any solid-color surface tends to confuse the camera. This is especially bad with reflective surfaces, which would obviously pose a problem for any camera system relying solely on visual data for depth perception. Worse than reflective surfaces, though, is that images seem to be processed in pixel chunks, which leads to inaccurate depth maps: any object tends to "pull" part of the background up to where it is sitting. In addition, the errors are additive, meaning that as you record more video, instead of the mesh getting more and more accurate, it actually loses accuracy until it becomes a completely undifferentiated mess.
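The solid-color failure is inherent to matching-based depth, not unique to the ZED. Stereo matchers compare small pixel blocks along a row; on a textureless surface every candidate block looks identical, so the disparity (and therefore the depth) is ambiguous. A toy sum-of-absolute-differences matcher — nothing like StereoLabs' actual algorithm, just an illustration on 1D rows — makes this concrete:

```python
import numpy as np

def sad_costs(left_row, right_row, x, block=5, max_disp=16):
    """Cost of matching the left block at x against each candidate disparity."""
    half = block // 2
    patch = left_row[x - half: x + half + 1]
    costs = []
    for d in range(max_disp):
        candidate = right_row[x - d - half: x - d + half + 1]
        costs.append(np.abs(patch - candidate).sum())
    return np.array(costs)

rng = np.random.default_rng(1)
textured = rng.integers(0, 255, 64).astype(float)
flat = np.full(64, 128.0)  # a solid-color wall

true_disp = 7
# For a rectified pair, right-image features sit true_disp pixels to the left.
right_textured = np.roll(textured, -true_disp)

costs_textured = sad_costs(textured, right_textured, x=30)
costs_flat = sad_costs(flat, flat, x=30)
# costs_textured has a single clear minimum at d=7;
# costs_flat is identical for every disparity, so the depth is a coin flip.
```

The same block structure also explains the background "pull": a block straddling an object's edge gets assigned one disparity for the whole block, dragging nearby background pixels up to the foreground depth.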

These problems probably come from the "real time" nature of the application. The good news is that the program supports importing an SVO file. The bad news is that even when running off a video file, where real time would be unnecessary, the program still processes everything as if it were time-critical. There are very few configurable options, but perhaps this is something that could be fixed.

The way forward

I will continue to examine the existing tools. VisualSFM especially seems like a valuable avenue of investigation, particularly because many of the tools it relies on are open source. Perhaps swapping out the VisualSFM GUI for our own SFM toolchain could speed up processing. In addition, I'm going to do some research on the time/quality trade-offs. The fact that StereoLabs has live mesh generation is very interesting as well; perhaps given some time I could find a way to clean up the output of their algorithm, or speed up VisualSFM using some of the techniques from StereoLabs. (Note that while VisualSFM is closed source, the tools it relies on are open and can be swapped out as long as the command-line interface remains the same for VisualSFM to interact with them.)

Another option would be to implement our own system using OpenCV and the other resources out there. If we choose this path, I would want an abstraction library to remove the StereoLabs SDK dependency and allow other tools to slot in. This could be done immediately, but I think it makes more sense to build some prototypes using the SDK before implementing an abstraction library, as designing a library works best with a more complete understanding of which interfaces need to be exposed.
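As a sketch of what that abstraction might look like (every name here is hypothetical, not an existing API): a minimal capture interface that a ZED-backed class, an OpenCV-backed class, or a test stub could all implement.

```python
from abc import ABC, abstractmethod

import numpy as np

class StereoSource(ABC):
    """Hypothetical capture interface that hides the vendor SDK."""

    @abstractmethod
    def grab(self):
        """Return the next (left, right) frame pair as uint8 image arrays."""

    @abstractmethod
    def close(self):
        """Release the underlying device or file."""

class SyntheticSource(StereoSource):
    """Test stub: emits blank frames of a fixed size, no hardware needed."""

    def __init__(self, width=640, height=480):
        self._shape = (height, width, 3)

    def grab(self):
        frame = np.zeros(self._shape, dtype=np.uint8)
        return frame, frame.copy()

    def close(self):
        pass
```

A ZED-backed source wrapping the StereoLabs SDK and an SBS-file source wrapping cv2.VideoCapture would each be a few dozen lines behind the same two methods, so prototypes written against the SDK now wouldn't have to be thrown away later.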