Your team will design and implement algorithms that recognize hand shapes or gestures, and create a graphical display that responds to the recognition of the hand shapes or gestures.

Qian Xiang

Tong Li


10/04/2018 - 10/09/2018




Visual Studio 2017

OpenCV 3.2

Problem Definition

Design and implement algorithms that recognize hand shapes (such as making a fist, thumbs up, thumbs down, pointing with an index finger, etc.) or gestures (such as waving with one or both hands, swinging, drawing something in the air, etc.).
Create a graphical display that responds to the recognition of the hand shapes or gestures.
Your algorithm should detect at least four different hand shapes or gestures.
It should also work on cases when two hands are both in view and recognize each one separately.
Template matching is required for this assignment.
We divided the task into the following sub-problems:
1. Pre-process the frames to separate the hands from the background and detect skin;
2. Decide how to use the templates;
3. Pre-process the templates;
4. Match each image (frame) against the templates.

Method and Implementation

1. Pre-process image (frame) and detect skin

First apply medianBlur to the image (the video frame), then convert it to HSV. Next, threshold the HSV image to obtain a binary image in which the bright region is the shape of the hand. To get a cleaner hand contour, use morphological operations to remove noise and sharpen the boundary of the hand.
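The skin-thresholding step can be sketched without OpenCV as a per-pixel test. This is only an illustration: in the actual program the conversion and thresholding would typically be done with cv::cvtColor and cv::inRange. The names `Bgr`, `Hsv`, `bgrToHsv`, `isSkin`, and `skinMask` are hypothetical, and the skin bounds (H up to 20, S at least 48, V at least 80) are assumed values that are not taken from our code and would need tuning for a given camera and lighting.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// 8-bit pixel types. Hsv follows OpenCV's 8-bit convention:
// H in [0, 180), S and V in [0, 255].
struct Bgr { uint8_t b, g, r; };
struct Hsv { int h, s, v; };

// Convert one BGR pixel to HSV (same value ranges as COLOR_BGR2HSV).
Hsv bgrToHsv(int b, int g, int r) {
    const int maxC = std::max({b, g, r});
    const int minC = std::min({b, g, r});
    const int delta = maxC - minC;
    double hue = 0.0;  // in degrees, [0, 360)
    if (delta != 0) {
        if (maxC == r)      hue = 60.0 * (g - b) / delta;
        else if (maxC == g) hue = 120.0 + 60.0 * (b - r) / delta;
        else                hue = 240.0 + 60.0 * (r - g) / delta;
        if (hue < 0.0) hue += 360.0;
    }
    const int s = (maxC == 0) ? 0 : static_cast<int>(255.0 * delta / maxC + 0.5);
    const int h = static_cast<int>(hue / 2.0 + 0.5) % 180;  // halve to fit 8 bits
    return {h, s, maxC};
}

// Assumed skin range; these bounds are illustrative, not the report's values.
bool isSkin(const Hsv& p) {
    return p.h <= 20 && p.s >= 48 && p.v >= 80;
}

// Build the binary image: 255 where the pixel looks like skin, 0 elsewhere.
std::vector<uint8_t> skinMask(const std::vector<Bgr>& pixels) {
    std::vector<uint8_t> mask;
    mask.reserve(pixels.size());
    for (const Bgr& p : pixels) {
        mask.push_back(isSkin(bgrToHsv(p.b, p.g, p.r)) ? 255 : 0);
    }
    return mask;
}
```

Thresholding in HSV rather than BGR keeps the hue test largely separate from brightness, which is presumably why the approach held up well under good lighting.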

2. Pre-process the templates

Convert the templates to grayscale images. Apply a threshold to them and use findContours to find the hand contour of each template.

3. Match image (frame) and template

Match the contours of the templates against the contours of the frame using matchShapes(). The lower the result, the better the match. Output the best-matching hand shape. If there are no contours in the image, output none.
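The selection rule ("lowest matchShapes() result wins, otherwise none") can be sketched as follows. The sketch takes the dissimilarity scores as plain numbers, so it does not call OpenCV itself. `TemplateScore` and `bestMatch` are hypothetical names, the template labels in the usage example are placeholders, and the `maxScore` cutoff is an added assumption: the rule described above only outputs none when the frame has no contours.

```cpp
#include <string>
#include <vector>

// One matchShapes() result: the template's label and its dissimilarity score.
struct TemplateScore {
    std::string name;
    double score;  // lower means more similar
};

// Return the label with the lowest score. Returns "none" when there are no
// scores (no contour in the frame) or when every score is at or above the
// assumed cutoff maxScore.
std::string bestMatch(const std::vector<TemplateScore>& scores, double maxScore) {
    std::string best = "none";
    double bestScore = maxScore;
    for (const TemplateScore& t : scores) {
        if (t.score < bestScore) {
            bestScore = t.score;
            best = t.name;
        }
    }
    return best;
}
```

For example, `bestMatch({{"fist", 0.42}, {"thumbs up", 0.08}}, 0.5)` selects "thumbs up", and an empty score list yields "none".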

4. Additional work done beyond the assignment

Gesture detection: waving hands
Since waving is a moving gesture, there is no single template to match it against. The idea is instead to track the position of the hand from the moment it enters the frame until it leaves. Record the position when a contour first appears in the frame, and keep recording the latest position of the hand until there is no object in the frame. If the position has changed, there is a wave in the frames.
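The position-tracking idea can be sketched as below, assuming the hand position is recorded once per frame (for example as the contour centroid) while a contour is visible. `Point`, `detectWave`, and the `minShift` threshold are hypothetical names and parameters introduced for illustration; the sketch also assumes image coordinates with y growing downward and reports directions as seen in the (unmirrored) frame.

```cpp
#include <algorithm>
#include <cmath>
#include <string>
#include <vector>

// Hand position in image coordinates (x grows rightward, y grows downward).
struct Point {
    double x, y;
};

// track holds the recorded positions from the frame where the hand appeared
// to the last frame before it disappeared. A wave is reported when the hand
// moved at least minShift pixels along its dominant axis.
std::string detectWave(const std::vector<Point>& track, double minShift) {
    if (track.size() < 2) return "none";  // hand never seen, or seen only once
    const double dx = track.back().x - track.front().x;
    const double dy = track.back().y - track.front().y;
    if (std::max(std::abs(dx), std::abs(dy)) < minShift) return "none";
    if (std::abs(dx) >= std::abs(dy)) {
        return dx > 0 ? "left to right" : "right to left";
    }
    return dy > 0 ? "top to bottom" : "bottom to top";
}
```

Comparing only the first and last positions keeps the check cheap; a small `minShift` guards against reporting a wave when a still hand merely jitters between frames.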


1. Use the laptop's camera to capture video.
2. Output results:
There are three windows that show the results. The first shows the original frame with the hand contours drawn on it. The other windows show the recognition result: if there is a match, they show the matched hand shape; otherwise they show none.
3. The precision:
The precision is pretty good. We evaluated a two-and-a-half-minute video containing 40 gesture changes with one hand and with two hands, and the accuracy rate was 100%.
4. Wave hand:
For this part, the output reports that there is a wave in the frame and also its direction: left to right or top to bottom.



Images to show match result

Result: screenshot 1

Result: screenshot 2

Result: screenshot 3

Result: screenshot 4

Result: screenshot 5

Result: screenshot 6

Result: screenshot 7

Result: screenshot 8

Result: statistical data 1

Result: statistical data 2

Result: wave hand 1

Result: wave hand 2

Result: wave hand 3


1. Strengths:

The program runs smoothly and responds quickly to changes in hand shape.

2. Weaknesses:

Each hand shape has only one template, so if the hand shape in the frame differs greatly from its template, our program cannot recognize it.
Because we first use skin detection to separate skin from the background, if the frame contains no hand but a face or an arm, the program will still detect that object, extract its contour, and match it against the four templates.

3. Limitation:

During the experiment, we used the laptop's built-in camera (called Lenovo easy camera) to capture video. It is sensitive to lighting: under good lighting conditions, the contours of the hands are clearer.

4. Expectation:

Use another method that detects hands only.
Try another camera and see whether the results (precision) are still good when the lighting is poor.

5. Potential future work:

Use more templates to match.


When we got this assignment, we first tried the matchTemplate() function to match the template against the frame, but the results were not good. If the template is bigger than the frame, or only slightly smaller, the results are very bad. Many examples show that when the template is cropped from a screenshot of the frame itself, the result is perfect.
After this failure, we tried matching contours instead and found the function matchShapes(). At first, we wanted to separate the background using the difference between the first frame and later frames, but the results were not good: the contours were never clear enough, and processing was very slow.
Then we converted the frame to HSV and applied a threshold to detect skin, which gave great precision. (Before the matching step, we erode and dilate the frame to remove some of the noise.)
Under good conditions, the precision of the program can reach 100%. We are really happy with the final result!
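The erode-then-dilate noise removal mentioned in the discussion is a morphological opening. A minimal sketch on a small binary mask with a 3x3 neighbourhood is given here for illustration; in OpenCV the same step would normally use cv::erode and cv::dilate. `Mask`, `morph3x3`, and `open3x3` are hypothetical names, and the 3x3 kernel size is an assumption.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Binary image: 0 = background, 255 = foreground (skin).
using Mask = std::vector<std::vector<uint8_t>>;

// Replace each pixel by the minimum (erode) or maximum (dilate) of its
// 3x3 neighbourhood. Neighbours outside the image are ignored.
Mask morph3x3(const Mask& in, bool erode) {
    const int rows = static_cast<int>(in.size());
    const int cols = rows > 0 ? static_cast<int>(in[0].size()) : 0;
    Mask out = in;
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            uint8_t v = in[r][c];
            for (int dr = -1; dr <= 1; ++dr) {
                for (int dc = -1; dc <= 1; ++dc) {
                    const int rr = r + dr, cc = c + dc;
                    if (rr < 0 || rr >= rows || cc < 0 || cc >= cols) continue;
                    v = erode ? std::min(v, in[rr][cc]) : std::max(v, in[rr][cc]);
                }
            }
            out[r][c] = v;
        }
    }
    return out;
}

// Opening: erosion removes small specks, dilation restores what survived.
Mask open3x3(const Mask& in) {
    return morph3x3(morph3x3(in, true), false);
}
```

Isolated noise pixels vanish during the erosion and never come back, while a solid hand-sized region shrinks and is then grown back to roughly its original outline.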

Credits and Bibliography

1. Help offered by the TA (the idea of separating two hands using an area threshold), 10/09/2018
2. OpenCV documentation, used to look up methods and functions: https://docs.opencv.org/3.2.0/annotated.html, 10/04/2018-10/09/2018
3. OpenCV functions for computing contours: https://blog.csdn.net/wangshuai610/article/details/79913600, 10/06/2018
4. matchShapes(): https://docs.opencv.org/2.4/modules/imgproc/doc/structural_analysis_and_shape_descriptors.html, 10/06/2018
5. An example of matchShapes(): https://blog.csdn.net/luoyouren/article/details/65633170, 10/06/2018