Assignment Title

CS 585 HW 3
Kaihong Wang
Teammate names: Kaihong Wang, Yuankai He
Date: Oct 10, 2018

Problem Definition

The goal of this homework is to identify at least four different hand gestures in a video. The algorithm should recognize one-hand gestures, two-hand gestures, and whether a hand is waving. The result matters because the algorithm can be applied in situations where only video input is available. Because our algorithm depends on skin color detection, we assume that the background contains few or no skin-colored regions. We also assume that the scene is well lit with natural light, because the colors captured by the camera shift when the illumination has a different spectrum.

Method and Implementation

We used skin color detection and template matching for this assignment. We first open the camera and analyze each frame of the video as a single still image. Each frame is passed through skin color detection, then eroded and dilated, producing a uint8 binary image in which skin is white and the background is black.

We then use the binary image to locate the hand. We find the contours, pick the biggest one, compute its bounding rectangle, crop the bounded area, and draw the bounding rectangle on the original image.

With a cropped image of just the hand, we resize the templates using an image pyramid and convert all template images to binary using the same skin color detection algorithm. Finally, we compare the pixel values of each template with the cropped image to measure how much of their areas match. The gesture whose template gives the highest match (the highest normalized cross-correlation, or NCC, output) is printed to the terminal.

For two hands, we assume that the second hand's contour is similar in size to the biggest contour. Therefore, if another contour is close in size to the biggest one, we also draw its bounding rectangle and crop it for recognition.
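The segmentation and hand-localization steps can be illustrated with a minimal, dependency-free sketch. Several things here are assumptions and not taken from our implementation: the RGB thresholds in `is_skin` are a commonly used heuristic, connected-region pixel counts stand in for contour areas, erosion and dilation are omitted, and `size_ratio` is a hypothetical value for deciding when a second region is "close" in size to the biggest one. In an OpenCV pipeline, `cv2.findContours` and `cv2.boundingRect` would normally play the role of `find_regions`.

```python
from collections import deque


def is_skin(r, g, b):
    """Assumed RGB skin heuristic; the thresholds are illustrative only."""
    return (r > 95 and g > 40 and b > 20
            and max(r, g, b) - min(r, g, b) > 15
            and abs(r - g) > 15 and r > g and r > b)


def skin_mask(image):
    """Turn rows of (r, g, b) pixels into a binary mask: skin=255, background=0."""
    return [[255 if is_skin(*px) else 0 for px in row] for row in image]


def find_regions(mask):
    """Return [(pixel_count, (x, y, w, h)), ...] for 4-connected white regions,
    sorted with the largest region first."""
    if not mask:
        return []
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    regions = []
    for y0 in range(height):
        for x0 in range(width):
            if mask[y0][x0] == 0 or seen[y0][x0]:
                continue
            # Breadth-first flood fill over one white region.
            queue = deque([(y0, x0)])
            seen[y0][x0] = True
            size, min_x, max_x, min_y, max_y = 0, x0, x0, y0, y0
            while queue:
                y, x = queue.popleft()
                size += 1
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and mask[ny][nx] != 0 and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            box = (min_x, min_y, max_x - min_x + 1, max_y - min_y + 1)
            regions.append((size, box))
    regions.sort(key=lambda region: region[0], reverse=True)
    return regions


def hand_boxes(mask, size_ratio=0.5):
    """Bounding boxes of the largest region and of any region at least
    size_ratio times as large (assumed criterion for a second hand)."""
    regions = find_regions(mask)
    if not regions:
        return []
    largest = regions[0][0]
    return [box for size, box in regions if size >= size_ratio * largest]
```

Calling `hand_boxes(skin_mask(frame))` on a frame stored as rows of `(r, g, b)` tuples returns one box per candidate hand, largest first; each box can then be cropped and passed to template matching.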


In this system, we used the front camera of our laptop to record each frame and performed template matching (for both static and dynamic gestures) on each frame.
We used three different templates for each gesture and took the largest NCC output to identify the gesture, which improves the robustness of the model. Templates are shown below.
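The multi-template decision rule can be sketched as follows. This is a minimal illustration, assuming the zero-mean form of NCC and templates that have already been resized to the size of the cropped hand image; the report does not state which NCC variant was used, and the helper names `ncc` and `classify` are hypothetical. With OpenCV, `cv2.matchTemplate` with a normalized mode such as `cv2.TM_CCOEFF_NORMED` computes a comparable score.

```python
import math


def ncc(image_a, image_b):
    """Zero-mean normalized cross-correlation of two equal-size images
    given as rows of pixel values. Returns a value in [-1, 1]."""
    xs = [float(v) for row in image_a for v in row]
    ys = [float(v) for row in image_b for v in row]
    if len(xs) != len(ys) or not xs:
        raise ValueError("images must be non-empty and have the same size")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    denom = math.sqrt(sum(d * d for d in dx) * sum(d * d for d in dy))
    if denom == 0:
        return 0.0  # a constant image carries no shape information
    return sum(p * q for p, q in zip(dx, dy)) / denom


def classify(crop, templates):
    """templates maps a gesture label to a list of template images.
    Each gesture is scored by its best-matching template, and the
    gesture with the highest score wins."""
    best_label, best_score = None, -2.0
    for label, images in templates.items():
        score = max((ncc(crop, t) for t in images), default=-1.0)
        if score > best_score:
            best_label, best_score = label, score
    return best_label, best_score
```

Taking the maximum over several templates per gesture means a single template that matches the current hand pose well is enough to select that gesture, which is what makes the rule tolerant to variation in how a gesture is performed.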

We visually checked whether the system ran smoothly
and used confusion matrices to evaluate its performance.
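For reference, a confusion matrix can be tallied from per-trial labels as in the sketch below. The convention that rows are true labels and columns are predicted labels is an assumption, the helper names are hypothetical, and any labels passed in are up to the caller; nothing here reproduces our measured results.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count trials per (true, predicted) pair.
    Rows follow the true label, columns the predicted label (assumed convention)."""
    index = {name: i for i, name in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for truth, prediction in zip(true_labels, predicted_labels):
        matrix[index[truth]][index[prediction]] += 1
    return matrix


def accuracy(matrix):
    """Fraction of trials on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0
```

The diagonal entries count correct recognitions, so off-diagonal entries show directly which gestures or motions are confused with each other.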


Results of different gesture and motion detection are shown below:

[Table of result images with columns "Hand Shape Name" and "Result", covering: separate detection for gesture, motion detection, separate detection for motion]

A confusion matrix for gesture recognition is shown below:

Hand Shape    Peace    Fist    Thumb up    Thumb down

A confusion matrix for motion recognition is shown below:

Motion        Waving    Not Waving
Not Waving    0         8


Discuss your method and results:


In this experiment, we built a computer vision system that can identify some basic static and dynamic hand gestures.

Credits and Bibliography

(1) Qitong Wang was responsible for dynamic template matching.
(2) Kaihong Wang was responsible for image processing and computational optimization.
(3) Yuankai He was responsible for static template matching.
(4) Qitong, Kaihong and Yuankai discussed this assignment together regularly and exchanged opinions on handling video images and template matching.