Soln #4: Object Localization Methods for Gesture Recognition

To do gesture recognition, we first need to localize the object used to signal the gesture. Three methods were implemented and are compared below.

i) Object Detection

I used a hardware-optimized YOLO to detect an object, say a cell phone, and easily got 5-6 FPS on a 4GB Raspberry Pi 4B with a Movidius NCS 2. We also trained YOLO to detect a custom object such as a hand. But this solution is not ideal, as the NCS stick would hike up the product price, and without it vanilla YOLO gives hardly 1 FPS on a 4GB RPi 4B.

ii) Multi-scale Template Matching

Template matching is a 2D-convolution-based method for finding the location of a template image in a larger image. We can make template matching scale-invariant as well as translation-invariant, using the following steps.

1. Generate a binary mask of the template image using adaptive thresholding.
2. Grab a frame from the camera and generate its binary mask.
3. Generate a list, x, of linearly spaced points between 0 and 1.
4. Iteratively resize the input frame by each scale factor in x.
5. Match the template against each resized frame using cv2.matchTemplate.
6. Find the location of the best-matching area and draw a rectangle around it.

Hand Detection using Multi-Scale template matching

But to detect a gesture, which is a sequence of movements of an object, we need stable detection across all frames. In our experiments, multi-scale matching with a hand template did not detect the object consistently in every frame. Moreover, template matching is not ideal if you are trying to match rotated objects or objects that exhibit non-affine transformations.

iii) Object Color Masking using Computer Vision

Creating a mask for a particular color is a very compute-efficient way to identify an object by its color. We can then check the size and shape of the resulting contour to confirm the find. It is prudent to use an object with a distinct color to avoid false positives.

This method is not only highly efficient and accurate, it also paves the way for gesture recognition using purely mathematical models, making it an ideal solution for the edge. Hence, this method was chosen for object localization.

