Raspberry Pi powered optical tracking of human motion, and a software solution for processing and mapping data to animated characters.
Motion capture is typically used to capture human performances that drive animated characters in films and video games. These capture systems are often incredibly costly (far out of reach of small independent studios or hobbyists). There are some very interesting cheap alternatives, but quality is a prominent issue. I'd like to explore a middle-ground solution that achieves decent production quality results at a relatively low cost.
How does it work?
This motion capture system triangulates the 3D position of reflective spherical markers that have been placed in strategic positions on the human actor. It is called an optical system because it uses cameras to 'see' the markers. A typical system will consist of 6 or more cameras. Once the 3D positions of the markers have been recorded they can be used to reconstruct the motion of the actor in a 3D digital space.
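The core of that reconstruction is straightforward to sketch. Below is a minimal two-camera triangulation in Python/NumPy, assuming you already have calibrated camera positions and a ray direction from each camera towards the marker (the function and its name are my own illustration, not this project's code; it's the standard closest-point-between-two-rays construction):

```python
import numpy as np

def triangulate(origin_a, dir_a, origin_b, dir_b):
    """Midpoint of the shortest segment between two camera rays.

    Each ray is a camera centre plus a direction towards the marker,
    both expressed in world space (known from calibration).
    """
    oa, ob = np.asarray(origin_a, float), np.asarray(origin_b, float)
    a, b = np.asarray(dir_a, float), np.asarray(dir_b, float)
    w = oa - ob
    # Solve for scalars s, t minimising |(oa + s*a) - (ob + t*b)|
    A = np.array([[a @ a, -(a @ b)],
                  [a @ b, -(b @ b)]])
    rhs = np.array([-(w @ a), -(w @ b)])
    s, t = np.linalg.solve(A, rhs)
    # The rays rarely intersect exactly, so take the midpoint of the
    # closest approach as the marker position.
    return (oa + s * a + ob + t * b) / 2
```

With more than two cameras you would combine all views in one least-squares solve, and discard rays that are nearly parallel since they triangulate poorly.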
In previous logs I wrote about active markers which were powered by an LED and a coin cell. In this log we'll look at the options I explored when trying to build my own passive markers. Note that professional markers are readily available and can be purchased online, for a hefty price as with everything in the motion capture industry.
A marker consists of a core, a coating and a base. The core can be of various sizes and densities; it's useful to have a softer core that can be squished if an actor impacts a surface. The higher the camera resolution, the smaller the markers can be, and really small markers can be used to track facial and finger movements. The coating is often a highly reflective tape that is hand applied to the cores. A very good choice for the tape is 3M 7610 due to its super high reflectivity and flexibility. The base of the marker allows it to be attached to the subject in various ways.
I thought it would be easy to find some sort of soft core, but it was really difficult to find something consistently sized and appropriate for tape adhesion. In the end I went with 3D printing, as it was cheap and quick, and controlling the size is simple. I couldn't get my hands on 3M 7610, so I experimented with different tapes and settled on one with good flexibility and reflectivity (first marker in the above photo). Below is a shot from the camera feed of the above 3 markers. Clearly the tapes have very different reflective properties.
The combination of a 15mm diameter 3D printed core and the reflective tape gives very promising results. In a future log I'll discuss marker detection.
In my setup I have a dedicated gigabit POE switch that connects 4 cameras and the host PC; the switch also supplies power to the cameras. The control software runs on a single PC connected to all the cameras on the network, and handles tasks such as configuring cameras, managing recordings and triangulating markers.
I decided to use the Qt framework with C++ as the primary platform for writing the software. I always tend to favour Qt when it comes to interface heavy projects and there is a nice bonus that it has many useful libraries such as network communication, threading and image manipulation.
There are two modes in the software: Live and Take. Live mode lets you view the currently connected cameras and initiate recordings, while Take mode lets you go over and process previous recordings. A constant ping is broadcast from the host PC, allowing any cameras on the network to locate the host software and make a connection. You can see in the above screenshot that I have two cameras hooked up, each in video mode. Video mode provides a constant real-time video feed from the cameras so you can see exactly what each camera sees. However, this is purely for setup convenience: when recording, the cameras don't broadcast full video frames, just the marker positions.
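The discovery ping doesn't need to be anything fancy. As an illustration (the payload layout and magic bytes here are invented, not the project's actual protocol), a small fixed-format datagram broadcast by the host is enough for cameras to identify it and learn where to connect back:

```python
import struct

# Hypothetical ping payload: magic bytes, a protocol version, and the
# TCP port the camera should connect back to on the host.
PING_FORMAT = "!4sBH"  # network byte order: 4-byte magic, u8, u16
MAGIC = b"MCAP"

def encode_ping(version, host_port):
    """Build the datagram the host broadcasts over UDP."""
    return struct.pack(PING_FORMAT, MAGIC, version, host_port)

def decode_ping(payload):
    """Parse a received datagram; return None if it isn't one of ours."""
    magic, version, port = struct.unpack(PING_FORMAT, payload)
    if magic != MAGIC:
        return None
    return version, port
```

The camera firmware would listen on a known UDP port, and on a valid ping open a TCP connection to the sender's address at the advertised port.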
The video feed coming from the cameras is almost completely latency free, and when running at 100fps the result is ungodly smooth. As mentioned above, the raw feed isn't actually used in recordings, but it was an interesting experiment to gather and broadcast the frames anyway. I used the FFmpeg library to handle the H.264-encoded video being sent from the Pi.
The red blocks you see on the video feed are manually placed masks. There will inevitably be some areas in the video that are bright enough to be considered markers, so masking tells the camera to ignore those areas completely. Typically you will mask out the IR reflections on walls close to the camera, as well as the IR LEDs of other cameras.
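Applying a mask is conceptually trivial: before marker detection, the masked rectangles are forced to black so nothing inside them can cross the brightness threshold. A sketch in NumPy (my own illustration, not the project's firmware code):

```python
import numpy as np

def apply_masks(frame, masks):
    """Zero out rectangular regions so they can never register as markers.

    `frame` is a greyscale image array; each mask is (x, y, w, h) in pixels.
    """
    out = frame.copy()
    for x, y, w, h in masks:
        out[y:y + h, x:x + w] = 0  # rows are y, columns are x
    return out
```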
Take mode lets you view and process previous recordings. I have used OpenGL to drive the 3D viewport where, hopefully, the final triangulated marker positions will end up. The timeline allows you to scrub through all the frames in a recording, as well as select a time range to perform various operations.
I decided to write the software running on the Pi (which I will refer to as the firmware) in Python. Primarily this was because the excellent picamera is a Python library. Picamera gives you advanced access to Pi camera functionality and makes it very quick and easy to get a camera project running.
Of course pure Python is not capable of processing image data at any reasonable speed, and on the Pi it is painfully slow. Luckily it is very easy to write native C code and integrate it into Python modules. In a further log entry I will discuss the marker processing, which was written as native C integrated with Python.
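To give a flavour of what that native code has to do, here is a toy Python/NumPy version of the detection idea: threshold the greyscale frame and compute an intensity-weighted centroid. The real C code works per-pixel over the raw frame and handles multiple blobs; this sketch assumes a single marker in view:

```python
import numpy as np

def marker_centroid(frame, threshold=200):
    """Intensity-weighted centroid of pixels at or above threshold.

    Toy single-marker version of a detector: real marker processing
    groups bright pixels into blobs and emits one centroid per blob.
    """
    ys, xs = np.nonzero(frame >= threshold)
    if xs.size == 0:
        return None  # no marker visible
    weights = frame[ys, xs].astype(float)
    cx = (xs * weights).sum() / weights.sum()
    cy = (ys * weights).sum() / weights.sum()
    return cx, cy
```

Weighting by intensity gives sub-pixel precision, which matters a lot once these 2D positions feed into triangulation.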
As I've mentioned previously, this type of mocap system requires multiple cameras. You need at least 2 cameras to see a marker for 3D triangulation to be possible. When capturing a human there is a high chance markers will be obscured, requiring at least 6 cameras to maintain visibility. You could get away with fewer cameras if trying to capture something simpler, like a quad copter or other robotic vehicle. With that said, I have 4 cameras almost complete and I plan to do some testing before committing to another 6 (My ideal setup has 10 cameras).
The PCBs have arrived! I ended up ordering from PCBWay and received the boards within 5 days. I had no idea it was so quick and cheap to get some PCBs made; this really changed my whole mentality about making boards for future projects. I spent some time soldering on the components in stages and testing each stage as I went. The first relief was that the POE module was delivering 5V after plugging in the Ethernet cable. The Pi had no problems booting up and connected to the network just fine. I then hooked up the IR LEDs with no problems. Finally the RGB status LED magically worked too. Overall I was super happy that everything went as planned.
Besides the IR ring PCB hat there are some additional parts required to bring the build together. I had the camera/lens holder, spacers and mounting bracket 3D printed. There is also an extra long header attached to the Pi for connection with the hat. The hat could be 5mm lower to the Pi but I'm not space constrained and I'd rather have the extra space for potential heat sinking. The small black square is the IR pass-through filter which sits between the camera and the lens. It's a glass filter and has some beautiful characteristics but probably overkill for this use case. In the image below you can see the camera holder comes together nicely, all prepared for being attached to the hat.
I realised I needed some way to mount the cameras that allowed easy aiming. Originally I had planned to use a standard 1/4"-20 camera socket and purchase some cheap wall mounts, but in the end it was far cheaper, quicker and a closer fit for purpose to design something simple and print it. The mount uses bolts to clamp each axis while still allowing movement, and the hold is surprisingly strong. To mount it to a wall you could use double sided tape.
In the previous entry I stated that I had no idea how to hold all the components together, and my original plan was to use some wiring to connect the POE module and Ethernet jack to the Pi and IR ring. But as I got more confident with my PCB layout skills I decided to put everything onto a single board, and create a Pi Hat.
After playing with component orientations for hours I think the assembly comes together quite nicely.
The Ethernet jack connects to the POE module, which feeds the main 5V supply to both the Pi and the IR ring. The Ethernet data lines come out at a connector that needs to be fed back into the Ethernet jack on the Pi 3. As a fun extra, the Pi 3 can drive each channel of an RGB status LED. I've attached an exploded view so you can see how the components fit.
(I will write more about the IR pass-through filter in a future log.)
I found some IR LEDs from Digikey that would do the job: 475-1459-1-ND. They are high powered and have a 120 degree field of view, perfect for covering the entire area that can be seen by the camera. I whipped up a quick prototype using the LEDs to verify their effectiveness.
After some research I ended up picking Altium's CircuitMaker to lay out the PCB. It was fairly easy to use, even for someone with zero PCB design experience, and there is a wealth of information online to help out. I must admit though, I'm a little surprised at how 'clunky' CircuitMaker felt, definitely a much older school take on user interface design. Nevertheless the first draft of the IR board is done!
It is a very simple LED/resistor network with a connector on the back for the 5V power rail.
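Sizing the resistors for a network like this is just Ohm's law per LED string. The forward voltage and current below are placeholder values for illustration only; the real numbers come from the 475-1459-1-ND datasheet:

```python
def series_resistor(v_supply, v_forward, n_series, i_target):
    """Ohm's law for one LED string: R = (Vsupply - n*Vf) / I."""
    return (v_supply - n_series * v_forward) / i_target

# Assumed values for illustration: 5V rail, two IR LEDs in series at
# Vf ~= 1.5V each, 100mA per string (check the actual datasheet).
r = series_resistor(5.0, 1.5, 2, 0.100)   # ~= 20 ohms
# Power dissipated in the resistor: P = I^2 * R ~= 0.2W,
# so a 1/4W part would be marginal; a 1/2W part is safer.
p = 0.100 ** 2 * r
```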
One feature I would love to have for the cameras is power over Ethernet. Only requiring a single cable for data and power is really appealing, especially because a typical setup will require 6+ cameras. After some digging around I found a fairly noob friendly POE module, the PEM1305. It requires very minimal support circuitry and this model can deliver up to 2.59A at 5V. I experimented with the physical arrangement of the components in Fusion 360, though I'm not quite sure how everything is going to be held together.
The LED active markers are a very viable solution for motion capture, but I'd like to explore the alternative of using reflective passive markers. These markers are small spherical objects covered in a retro-reflective material, so they don't need any batteries to operate, which is nice when you need ~40 per actor. The idea is to shine a light source on the markers so they become highly visible compared to the background. For the retro-reflectivity to be effective, the light source must be as close to the camera sensor as possible. It is popular for motion capture cameras to use infra-red (IR) illumination instead of visible light, which gives an advantage when it comes to cutting out unwanted light sources. It also simply creates a nicer environment to work in, since you don't have spotlights shining at you.
This test rig uses a very powerful IR LED array. I don't know the exact specs (I got it from a friend who had it lying around) but it happily consumes 12 Watts continuously - hence the rather generous heat sink. I found some really cheap reflective tape and attached a few squares to the back of the room which is 4m away from the test rig.
You can definitely make out the reflective tape in the background. In another test I attached a couple of squares to my hand and moved as fast as I could to check for exposure issues.
Note that these recordings were made in the dark (no visible light) so the only light you see is infrared. There is definitely more light than needed coming from these LEDs, they light up a significant portion of the room. The reflective tape could also definitely be of higher quality.
As a side note, I was driving the LEDs with the PWM on the Pi and using the camera syncing trick I wrote about in a previous log. The black band that you see in the video is due to my really budget synchronization algorithm, and represents time where the camera caught the LED while it was off. It's a very nice trick to reduce power consumption on the LEDs, and the visible brightness to the camera is identical to having the LEDs continuously on.
So the question becomes: what sort of IR solution would be appropriate for the Pi? I picked up a Bright Pi (https://www.pi-supply.com/product/bright-pi-bright-white-ir-camera-light-raspberry-pi/) as it was easy to get hold of locally. It's a self assembly kit, and was the first thing I've ever soldered. As primarily a software guy, I don't get the opportunity to mess with building electronics very often. For this project the Bright Pi doesn't generate nearly enough illumination and suffers from unacceptably narrow beam LEDs. There are a few nice IR rings available that are intended for security cameras, but this all got me thinking: why not make my own IR ring circuit board? The Bright Pi is incredibly simple, and even though I know very little about electronics I could follow what was happening on the board. I've always wanted to design my own PCB and this seems like a good opportunity to learn.
Since the capture system will support relatively small rooms (~3m x ~3m), it is important that the cameras have a large field of view. The default Pi camera has a 62 degree horizontal field of view, but something closer to 90 degrees would be more reasonable here. Another thing to keep in mind is that only a portion of the sensor is used in the 1280x720 mode, so the lens will need to compensate for that.
The mobile adaptors act differently to the M12 lenses in that you don't remove the original Pi camera lens. So you end up running a needlessly complex system that will be worse off when it comes to distortion and optical artefacts. However the results were promising anyway.
It was really fun testing out the lenses, and it gives me ideas for other projects. The macro lens is especially interesting.
Default Pi camera:
Wide angle lens:
Fish eye lens:
The fish eye lens gives a significant view angle boost to the Pi Camera, and even though the lens is 180 degrees the camera only sees a portion of that. The final viewing angle is roughly 90 degrees horizontally.
Although I would prefer to have a smaller, more compact M12 mount lens, the fish eye lens worked really well. It's also incredibly cost effective compared to the alternatives.
Since this technique requires multiple (at least two) cameras to see the same marker before triangulation can occur, the more in sync the cameras are, the more accurate the triangulation can be. If the system runs at 100fps, the worst case two frames can be offset from each other is 5ms (half of the 10ms frame period). After doing a few tests by waving the active markers around, I think frame discrepancies of 1ms or less will provide acceptable results.
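To put numbers on why sync matters: a marker seen at two slightly different times appears in two slightly different places, and the disagreement between views scales with marker speed. The 5 m/s hand speed below is just an illustrative figure, not a measured one:

```python
def timing_error_mm(speed_m_per_s, offset_ms):
    """Apparent marker displacement between two cameras whose frames are
    offset_ms apart, for a marker moving at speed_m_per_s.
    (m/s) * (ms) conveniently comes out in millimetres."""
    return speed_m_per_s * offset_ms

# A fast hand movement can reach roughly 5 m/s:
unsynced = timing_error_mm(5, 5)  # worst case at 100fps: 25mm disagreement
target = timing_error_mm(5, 1)    # 1ms sync goal: 5mm, near marker size
```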
Commercial motion capture systems have various methods for syncing cameras, such as over Ethernet or by using dedicated cables. Many camera sensor chips have a sync line which is used to indicate the exact moment a frame should be taken. The Pi camera v2 uses the Sony IMX219 and there is indeed a sync pin on the module. Unfortunately it isn't exposed to the Pi in any way.
There is another option, and it's a very interesting feature exposed by the Pi camera drivers: minor frame rate adjustment. You can modify the current frame rate by tiny amounts, causing a slow down or speed up of the overall camera timing. This means you can adjust the frame rate until you are taking frames in sync with a master clock. If all the cameras adhere to the same master clock, then they will all be taking frames at the same time. The drawback to this method is that it will never be as precise as a direct sync pin, and it requires a small warm up period to achieve initial sync. However it is plenty to get within the goal of 1ms.
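The adjustment loop can be modelled as a simple feedback controller: measure the phase error between a frame's timestamp and the nearest master clock tick, then stretch or shrink the next frame period to chase it. This is a toy simulation of that idea (the gain and the controller itself are made up for illustration, not the actual firmware):

```python
def simulate_phase_lock(initial_offset_ms, period_ms=10.0, gain=0.1, steps=200):
    """Converge a camera's frame timing onto a master clock by tiny
    per-frame period tweaks (a crude proportional controller)."""
    offset = initial_offset_ms
    for _ in range(steps):
        # Phase error to the NEAREST master tick, wrapped into [-T/2, T/2)
        error = (offset + period_ms / 2) % period_ms - period_ms / 2
        # Lengthen/shorten this frame slightly to cancel part of the error
        offset -= gain * error
    # Return the remaining absolute phase error after the warm-up period
    return abs((offset + period_ms / 2) % period_ms - period_ms / 2)
```

With a gain of 0.1 the error shrinks by about 10% per frame, so at 100fps the warm-up to sub-millisecond sync takes well under a second, which matches the "small warm up period" trade-off described above.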
My test setup involves filming an LED that is being controlled by a PWM on the Pi. The LED is switching on 80 times a second and each time for 3ms. The camera is filming at 80fps. The PWM is acting as the master clock, so the camera is trying to sync each frame to the PWM cycle.
So what the heck is going on in this video? First let's cover the flickering in the background: it's due to the lighting of the room I filmed in and has no bearing on the actual test. Second, the Pi camera uses a rolling shutter, which means each row of pixels in the image isn't captured at exactly the same time. Starting from the top, rows are sent down to the Pi one at a time, and it seems to take roughly 6ms to record an entire frame. This is not really desirable behaviour, but I'm willing to work with it. The camera starts capturing the frame while the LED is off. About 3ms later the LED switches on and the camera starts picking up the lit LED. 3ms later the LED switches off again, which is also about the same time the camera finishes the frame. The whole system is idle for the remaining 6ms of the 80fps cycle. This is what you're seeing in the video. The thing to note is that the LED timing and the frame timing are almost exactly in sync (there is drift if you compare the beginning and end of the video, but that is just due to my poor PID controller ;).
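The timing described above can be checked with a quick model. Assuming each row is sampled at a single instant, spread evenly across the 6ms readout (a simplification of real exposure), the rows that catch the 3ms LED pulse work out to the bottom half of the frame, with the top half forming the black band:

```python
def lit_rows(rows=720, readout_ms=6.0, led_on_at_ms=3.0, led_off_at_ms=6.0):
    """First and last sensor rows that see the LED under a rolling shutter.

    Simplified model: each row is sampled at one instant, evenly spread
    over the readout time; a real exposure integrates over a window.
    """
    lit = [r for r in range(rows)
           if led_on_at_ms <= r / rows * readout_ms < led_off_at_ms]
    return lit[0], lit[-1]

# With the log's numbers (720 rows, 6ms readout, LED on from 3ms to 6ms)
# the bottom half of the frame is lit; the top half is the black band
# captured while the LED was off.
first, last = lit_rows()
```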
This test proves that by using the frame rate control on the Pi camera you can steadily hold the frame timing in sync with some form of master clock. I'll discuss the methods of syncing a master clock between all the cameras in a future entry.