CS585 HW2: Gesture-Detector

Yanru Chen
Yunchong Fu

Feb. 11, 2020

Problem Definition

This homework requires us to design and implement an algorithm that recognizes at least four specific hand shapes or gestures in live video read from a webcam. With the rapid growth of face recognition technology, this assignment lets us look into this developing area of computing from a technical perspective. By building the project, we also learn how to use the OpenCV library and apply its functions as needed.

Method and Implementation

The four gestures our system recognizes are rock, paper, scissor and three. As required, we use skin-color detection to find the hand and binary image analysis to locate the palm. For recognizing the gestures, we began by implementing template matching, but the results were not ideal: if something similar to one of the templates appears in the frame, the program is likely to mistake it for a gesture. This limitation greatly reduces the accuracy of hand shape detection, so we tried a different approach.

The basic idea of our method is simple: we count the vertices formed by the fingers while a gesture is being made, and classify the gesture by that count: 0 vertices is rock, 1 vertex is scissor, 2 or 3 vertices is three, and more than 4 is paper. We also take the angle between two fingers into consideration: we only count a vertex when the angle is greater than 30 degrees and less than 70 degrees. This method gives relatively high accuracy, thanks to the useful functions built into the OpenCV library. In the next section we explain how we use these functions to achieve the final result.
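The counting rule above can be sketched as a small lookup. This is a minimal sketch and not necessarily the code used in the program: the function name classify_gesture is hypothetical, and because the rule as stated leaves a count of exactly 4 unspecified, the sketch assumes that 4 or more vertices means paper.

```python
def classify_gesture(vertex_count):
    """Map the number of counted finger vertices to a gesture name."""
    if vertex_count < 0:
        raise ValueError("vertex_count must be non-negative")
    if vertex_count == 0:
        return "rock"
    if vertex_count == 1:
        return "scissor"
    if vertex_count in (2, 3):
        return "three"
    # Assumption: the report says "more than 4"; a count of exactly 4 is
    # treated as paper here so that every count maps to some gesture.
    return "paper"


if __name__ == "__main__":
    for n in range(6):
        print(n, classify_gesture(n))
```

Keeping the rule in one function makes it easy to adjust the thresholds if a gesture is misclassified in testing.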

OpenCV Methods

erode, dilate
We use erosion and dilation to improve the quality of the grayscale image. Note that we apply dilation twice to ensure that all noise in the foreground is eliminated.

GaussianBlur
Gaussian filtering is applied to get rid of Gaussian noise. We chose Gaussian filtering over other filtering methods because it minimizes the blurring.

findContours
The cv2 function used to find the contour of the object.

drawContours
Uses the contour found by findContours to draw the contour of the object.

convexHull
Finds the convex hull of the object (Sklansky's algorithm).

convexityDefects
Finds the convexity defects: regions that lie inside the convex hull but do not belong to the object, each marked by the contour point farthest from the hull boundary.

moments
Calculates the moments up to the 3rd order of the rasterized image.

putText
Writes out the result that the program determines, based on the given standard for the calculated number of convexity defects.


With the above functions we can go over the steps of the program. The program begins with skinDetect, which checks the color of the image to find the set of pixels within the skin-color limits, and defines this collection of pixels as the "object". We then process the recognized object with erosion, dilation and some blur to make it smooth and filled. After that, we use findContours and drawContours to locate and draw the contour of the object, which supports the later calculations. Once we have the contour, we calculate the angles of the fingers, locate and draw the defect points, and count the "vertices" of the finger angles. Lastly, using the vertex counts, we determine whether the gesture is a rock (fist), paper, scissor or three.


To assess the performance of the classifier, we performed 30 tests for each gesture and calculated the confusion matrix. The tests were performed individually, with the same background and the same camera. Accuracy, recall and precision are also listed below.


Example trials (columns: Trial, Success Detection, Failed Detection)

Confusion Matrix

                     Rock(Actual)   Scissor(Actual)   Paper(Actual)   Three(Actual)
Rock(Predicted)      28             2                 0               1
Scissor(Predicted)   1              23                0               3
Paper(Predicted)     0              1                 27              1
Three(Predicted)     1              4                 3               25

ROC Analysis

             TP   FP   FN   TN   Accuracy   Recall   Precision
Rock(Best)   28   3    2    0    93.33%     93.33%   90.3%
Paper        27   2    3    0    90%        90%      93.1%
Three        25   8    5    0    83.33%     83.33%   75.76%
Scissor      23   4    7    0    76.67%     76.67%   85.18%
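The per-gesture figures can be recomputed from the confusion matrix. The sketch below assumes a one-vs-rest reading of the matrix (rows are predictions, columns are actual gestures, 120 trials in total); the function name one_vs_rest is hypothetical. Under that reading TP, FP, FN, recall and precision agree with the table up to rounding, while TN comes out as 120 - TP - FP - FN rather than 0, and accuracy as (TP + TN) / 120, so the accuracy values differ from the Accuracy column, which matches TP / (TP + FN).

```python
LABELS = ["Rock", "Scissor", "Paper", "Three"]

# Rows are predicted gestures, columns are actual gestures.
CONFUSION = [
    [28, 2, 0, 1],
    [1, 23, 0, 3],
    [0, 1, 27, 1],
    [1, 4, 3, 25],
]


def one_vs_rest(confusion, k):
    """Return one-vs-rest counts and metrics for class index k."""
    total = sum(sum(row) for row in confusion)
    tp = confusion[k][k]
    fp = sum(confusion[k]) - tp                  # predicted k, actually other
    fn = sum(row[k] for row in confusion) - tp   # actually k, predicted other
    tn = total - tp - fp - fn
    return {
        "TP": tp, "FP": fp, "FN": fn, "TN": tn,
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }


if __name__ == "__main__":
    for k, label in enumerate(LABELS):
        m = one_vs_rest(CONFUSION, k)
        print(label, m["TP"], m["FP"], m["FN"], m["TN"],
              f"{m['accuracy']:.2%}", f"{m['recall']:.2%}",
              f"{m['precision']:.2%}")
```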


What are the strengths and weaknesses of your method?

The strength of our finger detection is that it is very time and space efficient: we calculate the angle formed by the start, end and far coordinates of each defect (using the cosine rule taught in class) and keep the defect only if the angle is within 10-70 degrees and d > 50. The weakness of this method is that it sometimes detects defects that are not between two fingers (e.g. at the wrist, and sometimes even on a fingertip).
The skin detection method that we implemented is easy to implement, but its result is not very accurate and its running time is too long, which results in significant lag between frames.
For noise cancellation, we first erode the frame once (after skin detection) and then dilate the eroded version for 2 iterations to maximize the contour of the hand. Finally, we blur the image using Gaussian blur to smooth the color transitions. We specifically chose Gaussian blur because it keeps the edge texture of the original image. This works well for noise cancellation.
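The angle test on the start, end and far coordinates mentioned above can be written with the cosine rule directly. This is a minimal sketch: the function names defect_angle and is_finger_gap are hypothetical, d is assumed to be the fourth value that convexityDefects returns for each defect, and the 10 to 70 degree range and d > 50 are taken from the description above.

```python
import math


def defect_angle(start, end, far):
    """Angle in degrees at the far point, between the start and end points."""
    a = math.hypot(end[0] - start[0], end[1] - start[1])   # side opposite far
    b = math.hypot(far[0] - start[0], far[1] - start[1])
    c = math.hypot(end[0] - far[0], end[1] - far[1])
    if b == 0 or c == 0:
        raise ValueError("far point coincides with start or end")
    # Cosine rule: a^2 = b^2 + c^2 - 2*b*c*cos(theta)
    cos_theta = (b * b + c * c - a * a) / (2 * b * c)
    cos_theta = max(-1.0, min(1.0, cos_theta))   # guard against rounding
    return math.degrees(math.acos(cos_theta))


def is_finger_gap(start, end, far, d, min_angle=10.0, max_angle=70.0, min_d=50):
    """True when the defect passes both the angle range and the d threshold."""
    return min_angle <= defect_angle(start, end, far) <= max_angle and d > min_d


if __name__ == "__main__":
    # Narrow, deep defect such as the gap between two raised fingers.
    print(is_finger_gap((-30, -80), (30, -80), (0, 0), d=20480))
    # Wide, shallow defect such as a dent along the wrist.
    print(is_finger_gap((-80, -10), (80, -10), (0, 0), d=2560))
```

Only a few distance computations are needed per defect, which is consistent with the efficiency noted above.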

Do your results show that your method is generally successful or are there limitations?
Describe what you expected to find in your experiments, and how that differed or was confirmed by your results.

Judging from the accuracy and detection rates above, most of which are equal to or above 80%, the methods used are generally successful. However, background handling and time efficiency are limitations: the program does not remove the background well if it contains too much noise (regions similar to skin color), and the skin color detection method takes too long to run. I expected the program to detect every rock gesture (horizontal and vertical), but the system fails to do so because of a limitation in the defect method.

How could your method be improved? What would you try (if you had more time) to overcome the failures/limitations of your work?

Our method could be improved by using a better, more efficient skin detection algorithm or implementation, and a more advanced noise cancellation algorithm with better masks and filters. If we had more time, we would use template matching and probably try motion detection and frame-to-frame differencing to detect moving objects.


This program is still very limited: it cannot tell apart gestures that have the same number of convexity defects. Still, it has fair accuracy for a student assignment. From our perspective, combining it with template matching may be a good approach to increase the variety of gestures the program can recognize. If given more time, we would experiment with more algorithms and methods.

Credits and Bibliography


1. Vezhnevets, Vladimir, Vassili Sazonov, and Alla Andreeva. "A survey on pixel-based skin color detection techniques." Proc. Graphicon. Vol. 3. 2003.
2. Kakumanu, Praveen, Sokratis Makrogiannis, and Nikolaos Bourbakis. "A survey of skin-color modeling and detection methods." Pattern recognition 40.3 (2007): 1106-1122.


OpenCV - Getting Started with Videos
OpenCV - Eroding and Dilating
OpenCV - Smoothing Images
OpenCV - Contours : Getting Started
OpenCV - Contours : More Functions
OpenCV - Drawing Functions

Gesture Recognization Program Based on OpenCV
Hand gesture recognition using python and opencv
Real-time Finger Detection