Hand Gestures Recognition

CS 585 - HW 3
Thien Nguyen

Problem Definition

- Design and implement an algorithm that recognizes hand shapes (fist, thumb up) or gestures (hand movements, waving, drawing something, etc.)
- The program must take live video frames from a webcam as input and output an informative graphical display according to the shapes/gestures detected within the video frames.
- The algorithm must detect at least 4 different shapes/gestures and must also be able to detect up to 2 shapes/gestures at the same time.

Method and Implementation

The program is built mainly around the template matching function provided in the OpenCV library. It uses the following inputs:


- 8 binary template images.

- A lower and upper threshold value for skin color.

- A threshold value for accepting or rejecting a matching result (0.8 was used).


1. Each frame is converted into a binary frame based on the thresholds for skin color.

2. A sequence of noise reduction techniques is applied in an attempt to remove as much noise as possible from the binary frame. This includes erosion, dilation, and Gaussian blur.

3. Using resize() and matchTemplate() from the OpenCV library, each template image is resized to 8 different scales, ranging from half to double its original size, and matched against the input frame at each scale. If the matching result is above the set threshold, the program "recognizes" the shape; otherwise, it continues to the next template.

4. After acquiring the coordinates of the best matches, the program draws a square around each location where it found a template and a text label above it with the name of the template.

5. The annotated frame is then returned, and the frames are streamed continuously in real time.

Function descriptions

clean(img) - Takes a frame, binarizes it, and removes noise.

find_template(img, template) - Takes a frame and a template, and checks whether anything in the frame is recognized as the template (returns the "box" coordinates if the matching result exceeds the set threshold).

size_invariant(img, template) - Takes a frame and a template, resizes the template 8 times, and checks each resized template against the frame using find_template().

process(img) - Takes a frame and checks each of the 8 templates using size_invariant(). Draws boxes and text labels accordingly.


Testing video


Example outputs


Confusion Matrix



The biggest downside of this program is that it depends heavily on lighting conditions and works only for a narrow range of skin colors. Another major downside is that it matches the templates against any skin-colored region, not specifically the hand. The program therefore works well as long as the hand shape or gesture is prominent in the frame; otherwise, it is likely to match the templates against something else and produce unreliable results.

Credits and Bibliography

Skin Detection using Python and OpenCV by Adrian Rosebrock (accessed November 9, 2018)
URL: https://www.pyimagesearch.com/2014/08/18/skin-detection-step-step-example-using-python-opencv/

OpenCV template matching documentation (accessed November 9, 2018)
URL: https://docs.opencv.org/2.4/doc/tutorials/imgproc/histograms/template_matching/template_matching.html

OpenCV - Image Geometric Transformation (accessed November 9, 2018)
URL: https://docs.opencv.org/3.4.3/da/d6e/tutorial_py_geometric_transformations.html