Demo Video

Everyone wants to see the demo video! This video was rushed, so there's not much about the actual construction, just a demo of the functionality.

Software

I used Python, OpenCV, and a small app from OBS Studio so the camera can track faces and send the video to a video call.

Python spies on the video from the webcam. To make sure the video can get through to Zoom, WebEx, etc., I had to create a virtual webcam using OBS Studio. It took a few tries, but after running the installer several times, I was able to create a "virtual" webcam that my Python code could send images to. Just select the virtual camera source in Zoom and it works.

To track the faces in the video, I used OpenCV, which has built-in face tracking. I also implemented tracking for my headphones (which have bright blue bars on them, perfect for tracking) and for ArUco markers (similar to QR codes).

Tracking Faces

The face tracking algorithm is in my main Python program, here on GitHub.

To track my face reliably, I had to implement multiple steps of tracking:

  1. Front-face tracking 
    1. Python variable primary_cascade
  2. If 1 doesn't find a face, check with a different front-face tracking algorithm 
    1. Python variable secondary_cascade
  3. If 2 fails, try looking for a face in profile (from the side)
    1. Python variable tertiary_cascade
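The fallback order above can be sketched roughly like this. This is a minimal illustration, not the code from the repository: `find_face` and the `Box` and `Detector` types are hypothetical names, and each detector is assumed to be a callable that wraps something like OpenCV's `CascadeClassifier.detectMultiScale` and returns a list of `(x, y, width, height)` boxes. Picking the largest box when several are found is also an assumption.

```python
from typing import Callable, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels
Detector = Callable[[object], Sequence[Box]]  # hypothetical wrapper around a cascade


def find_face(frame: object, detectors: Sequence[Detector]) -> Optional[Box]:
    """Try each detector in order; return the largest box from the first that finds one."""
    for detect in detectors:
        boxes = list(detect(frame))
        if boxes:
            # Assumption: the largest detection is the face we care about.
            return max(boxes, key=lambda b: b[2] * b[3])
    return None  # none of the detectors found a face


# Stand-in detectors so the sketch runs without a camera or OpenCV.
primary_cascade = lambda frame: []                                     # finds nothing
secondary_cascade = lambda frame: [(10, 20, 50, 50), (0, 0, 80, 80)]   # finds two boxes
tertiary_cascade = lambda frame: [(5, 5, 10, 10)]                      # never reached here

face = find_face(None, [primary_cascade, secondary_cascade, tertiary_cascade])
```

Because the loop stops at the first detector that returns anything, the slower or less reliable cascades only run when the earlier ones come up empty.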

Then, depending on the result:

  1. If the face is in the deadband (center rectangle of the screen) do nothing.
    1. This prevents the camera from moving every time my head moves an inch
  2. If not in the deadband, move towards it
    1. I do a lot of filtering here. I implemented a crude PID system with "velocity" averaging
  3. If no face was found, but one was found recently, keep moving in the direction the camera was already moving
    1. The problem: whenever I walked out of frame too quickly, the camera usually caught me as I started moving but then lost me. This is a tricky control loop problem to solve: if your controller is too far off from the real value, no more measurements can be taken.
    2. The solution: I implemented a "momentum" value so that the camera proceeds in the direction the last face was seen. This momentum lasts for 1.5 seconds after a face was seen.
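Here is a rough single-axis sketch of the deadband and momentum rules above. It is not the crude PID from my program: a plain proportional term with exponential smoothing stands in for it, and the class name `AxisTracker` and the `deadband`, `gain`, and `smoothing` values are hypothetical. Only the 1.5 second momentum window comes from the text.

```python
from dataclasses import dataclass
from typing import Optional

MOMENTUM_SECONDS = 1.5  # how long to keep moving after the face is lost (from the text)


@dataclass
class AxisTracker:
    deadband: float = 0.15   # assumed: half-width of the deadband, as a fraction of the frame
    gain: float = 10.0       # assumed: commanded speed per unit of offset
    smoothing: float = 0.5   # assumed: weight given to the previous velocity
    velocity: float = 0.0
    last_seen: Optional[float] = None

    def update(self, offset: Optional[float], now: float) -> float:
        """Return the commanded velocity for one axis.

        offset is the face center relative to the frame center in [-1, 1],
        or None if no face was found in this frame.
        """
        if offset is not None:
            self.last_seen = now
            if abs(offset) <= self.deadband:
                self.velocity = 0.0  # face is inside the deadband: do nothing
            else:
                target = self.gain * offset
                # Average the new target with the old velocity to damp jitter.
                self.velocity = (self.smoothing * self.velocity
                                 + (1.0 - self.smoothing) * target)
        elif self.last_seen is None or now - self.last_seen > MOMENTUM_SECONDS:
            self.velocity = 0.0  # face lost for too long: stop
        # Otherwise the face was lost recently: keep the previous velocity.
        return self.velocity


pan = AxisTracker()
in_deadband = pan.update(0.05, now=0.0)   # small offset, no movement
moving = pan.update(0.8, now=0.1)         # face far to one side, start moving
coasting = pan.update(None, now=1.0)      # face lost 0.9 s ago, keep moving
stopped = pan.update(None, now=2.0)       # face lost 1.9 s ago, stop
```

One tracker per axis (pan and tilt) keeps the logic simple, and passing `now` in explicitly makes the timing easy to test without a real clock.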

Sometimes no face is seen for a while and the camera is left staring at the ceiling (very unhelpful). I implemented a timeout so that if no face has been seen for 7 seconds, the camera turns back to center (its power-up default). This sometimes helps it find my face again, if I'm somewhere near the middle of the room.
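The timeout can be sketched as a small helper. The function name `recenter_if_lost` and the 90 degree center angles are assumptions for illustration; only the 7 second timeout comes from the text.

```python
from typing import Optional, Tuple

LOST_TIMEOUT_SECONDS = 7.0          # from the text
CENTER: Tuple[float, float] = (90.0, 90.0)  # assumed pan/tilt angles for the power-up position


def recenter_if_lost(current: Tuple[float, float],
                     last_seen: Optional[float],
                     now: float) -> Tuple[float, float]:
    """Return the (pan, tilt) target: center if no face was seen recently, else unchanged."""
    if last_seen is None or now - last_seen >= LOST_TIMEOUT_SECONDS:
        return CENTER
    return current


still_tracking = recenter_if_lost((30.0, 120.0), last_seen=0.0, now=6.9)
back_to_center = recenter_if_lost((30.0, 120.0), last_seen=0.0, now=7.0)
```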

Hardware

The hardware for this project is pretty straightforward:

  • 2 Servos (MG996, $5 each, similar to these on Amazon)
  • Arduino Nano
  • Logitech 720p webcam
  • 3D printed gimbal parts

The hardware wasn't supposed to be the interesting part. The gimbal parts are just first versions; I didn't iterate on them or perfect them, and I'm sure there are improvements to be made.