Final Project
Hand Posture Recognition against Complex Backgrounds

Qian Xiang

Tong Li


11/5/2018 - 12/7/2018




Visual Studio 2017

OpenCV 3.4.1

Problem Definition

1. Recognize different hand shapes for human-computer interaction;
2. Detect hands against complex backgrounds, for example when the hand faces the camera at an angle or overlaps other areas of skin;
3. Detect hands and recognize hand postures in video.


In the previous programming section, we designed and implemented an algorithm that classifies hand shapes using template matching. Although the results were good, the approach has strict limitations: the background must be simple and clean (since we use skin detection to find hands, other visible body parts such as the face or neck are detected as well), and the lighting must be good. We therefore want to take the project further by recognizing hands against more complex backgrounds, and to find a more efficient way to detect more than just four hand shapes. We also detect hands in video, since this could lay the foundation for an automated sign language translator.

Background Research

Hand tracking and posture recognition have been studied for a long time. In the papers we read, the traditional approach to hand tracking is usually image-differencing segmentation with filters, while the traditional approach to hand posture recognition is graph and template matching. Although this work laid a solid foundation for the topic, its results are not accurate or precise enough for our needs. More recently, researchers have used convolutional neural networks, which are trained on different postures or gestures and construct features automatically. With this method, however, they still have to segment each frame and remove background noise, which may reduce accuracy.

Our method

1. Use a pre-trained neural network model, credited to Satya Mallick and based on the paper “Hand Keypoint Detection in Single Images using Multiview Bootstrapping” from the CMU Perceptual Computing Lab. It handles the complex background problem and gives us 21 hand key points from which we compute and define postures.
2. Detect straight fingers by comparing the distances from the key points on each finger to the point in the middle of the hand, and by using the least squares method to compute a fit coefficient that shows whether the key points form a line.
3. Record the number of straight fingers and which fingers are straight.
4. Palm in or palm out: compare the key points on the thumb and the little finger.
5. Same hand shape at different angles with different meanings: compute the slope of a line formed by some key points and apply a threshold to separate them.
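The geometric tests in steps 2 to 5 can be sketched as below. This is a minimal illustration under stated assumptions, not our exact code. It assumes the common 21-point layout (0 = wrist, then four points per finger from base to tip, thumb first), and the names `Pt`, `linearity`, `isFingerStraight`, `countStraightFingers`, `thumbLeftOfLittle` and `angleDeg` are hypothetical, as is the 0.95 threshold. For the line test it uses an orthogonal least squares score taken from the 2x2 scatter matrix, which also works for vertical fingers, and it uses an angle from `atan2` instead of a raw slope so that vertical lines do not give an infinite value.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Pt { double x, y; };  // image coordinates (y grows downward)

// Assumed layout: 0 = wrist, thumb 1-4, index 5-8, middle 9-12,
// ring 13-16, little 17-20, each finger ordered from base to tip.
std::vector<Pt> fingerPoints(const std::vector<Pt>& hand, int finger) {
    assert(hand.size() == 21 && finger >= 0 && finger < 5);
    const int first = 1 + 4 * finger;
    return std::vector<Pt>(hand.begin() + first, hand.begin() + first + 4);
}

double dist(const Pt& a, const Pt& b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

// Linearity score in [0, 1]; 1 means the points are exactly collinear.
// Uses the eigenvalues of the scatter matrix (orthogonal least squares).
double linearity(const std::vector<Pt>& p) {
    if (p.size() < 2) return 0.0;
    const double n = static_cast<double>(p.size());
    double mx = 0.0, my = 0.0;
    for (const Pt& q : p) { mx += q.x; my += q.y; }
    mx /= n;
    my /= n;
    double sxx = 0.0, syy = 0.0, sxy = 0.0;
    for (const Pt& q : p) {
        sxx += (q.x - mx) * (q.x - mx);
        syy += (q.y - my) * (q.y - my);
        sxy += (q.x - mx) * (q.y - my);
    }
    const double tr = sxx + syy;
    if (tr <= 0.0) return 0.0;  // all points coincide
    const double d = std::sqrt((sxx - syy) * (sxx - syy) + 4.0 * sxy * sxy);
    const double lmax = 0.5 * (tr + d);
    const double lmin = 0.5 * (tr - d);
    return 1.0 - lmin / lmax;
}

// A finger counts as straight when its key points move steadily away from
// the palm center and lie close to one line. The threshold is a guess.
bool isFingerStraight(const std::vector<Pt>& hand, int finger, const Pt& palm,
                      double minLinearity = 0.95) {
    const std::vector<Pt> f = fingerPoints(hand, finger);
    for (std::size_t i = 1; i < f.size(); ++i) {
        if (dist(f[i], palm) <= dist(f[i - 1], palm)) return false;
    }
    return linearity(f) >= minLinearity;
}

int countStraightFingers(const std::vector<Pt>& hand, const Pt& palm) {
    int count = 0;
    for (int f = 0; f < 5; ++f) {
        if (isFingerStraight(hand, f, palm)) ++count;
    }
    return count;
}

// Compares thumb tip (4) and little finger tip (20). Mapping the result to
// palm in / palm out depends on which hand is shown and on image mirroring.
bool thumbLeftOfLittle(const std::vector<Pt>& hand) {
    assert(hand.size() == 21);
    return hand[4].x < hand[20].x;
}

// Angle in degrees of the line from a to b, to be compared with a threshold.
double angleDeg(const Pt& a, const Pt& b) {
    const double kPi = 3.14159265358979323846;
    return std::atan2(b.y - a.y, b.x - a.x) * 180.0 / kPi;
}
```

In use, the 21 points returned by the network for one frame would be copied into a `std::vector<Pt>`, a palm center would be chosen (for example from the wrist and finger base points), and the count and identity of straight fingers would then select the posture.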


Images or video clips (actually the frames of video)

video output



Input image-posture number 1

Output image-posture number 2

Input image-posture number 3

Output image-posture number 4

Input image-posture number 5

Output image-posture number 6

Input image-posture number 7

Output image-posture number 8

Input image-posture number 9

Output image-posture number 10

Input image-posture letter A

Output image-posture letter B

Input image-posture letter C

Output image-posture letter D

Input image-posture letter E

Output image-posture letter F

Input image-posture letter H

Output image-posture letter I

Input image-posture letter K

Output image-posture letter L

Input image-posture letter O

Output image-posture letter Q

Input image-posture letter S

Output image-posture letter V

Input image-posture letter X

Output image-posture letter Y

One of our failure cases: recognizing the letter R


1. Strengths:

Separates the hand from a complex background;
Distinguishes palm-in from palm-out hands;
Distinguishes postures that share a hand shape but have different meanings at different hand angles;
As long as the key points are detected accurately, accuracy can reach 90%.

2. Limitations:

When a finger is blocked by other fingers, the key points cannot be detected as accurately, and accuracy may fall to 40-50%.
We wanted to work on hand gestures as well, but we did not find a good way to define when a gesture starts or ends in a video.
Since we run the neural network on our laptops, it is quite slow.

3. Potential future work:

Run our code on a GPU; use a 3D model template to solve the blocking problem; combine our method with other techniques that track the hands together with other parts of the body, in order to detect more sign language postures and gestures.


We successfully defined 27 hand postures in our program. As long as the key points are detected accurately, recognition accuracy is about 90%.

Credits and Bibliography

1. Hand key point detection:, 11/9/2018
2. Least square method:, 11/18/2018
3. Some functions of OpenCV:, 11/15/2018