Final Project Proposal
Hand Posture Recognition against Complex Backgrounds

Qian Xiang

Tong Li


11/5/2018 - 11/19/2018




Visual Studio 2017

OpenCV 3.4.1

Problem Definition

Implement an efficient method that recognizes different hand shapes for human-computer interaction while tracking the hands against complex backgrounds. This could lay a good foundation for developing an automated sign language translator.


People have studied hand tracking and posture recognition for a long time. Long before 2005, researchers had implemented a Finger Counter Interface using a classifier and Posture Classification using elastic graph matching for hand posture recognition. For hand tracking, researchers implemented Visual Tracking of Bare Fingers using Image Differencing Segmentation (IDS) and Fast Rejection Filters (FRF), and Automatic 2D Hand Tracking using a temporal filtering framework. Although these works were meaningful at the time, their results are not accurate or precise enough to satisfy our current needs.

As a 2008 paper shows, researchers have developed a powerful hand and finger tracker based on Dynamic Hidden-State Shape Models (DHSSMs). The system can track and recognize the non-rigid motion of human hands. Their method uses a recursive Bayesian filtering method called DP-Tracking, combined with an exhaustive local search for matches between image features and model states. Although the main idea is similar to our project, we use a neural network (convolutional pose machines) and 3D hand motion capture, as described in the paper “Hand Keypoint Detection in Single Images using Multiview Bootstrapping”, which is a fundamentally different approach. Unlike DHSSMs, in which every possible structural change must be described in advance, our approach can recognize unanticipated hand postures in real-time video.
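Convolutional pose machines output one confidence map (heatmap) per key point, and a common way to read off each key point is to take the location of the maximum in its map and discard it if that peak is too weak. The C++ sketch below shows only this post-processing step. It assumes the network output has already been copied into a flat, row-major float buffer; the `DetectedPoint` struct, the `peaksFromHeatmaps` name, and the 0.1 confidence threshold are hypothetical choices, not part of OpenCV or of the paper. With OpenCV, the same per-map maximum could instead be found with `cv::minMaxLoc`.

```cpp
#include <cstddef>
#include <vector>

// One detected key point: location in heatmap pixels plus its confidence.
struct DetectedPoint {
    int x = -1;
    int y = -1;
    float confidence = 0.0f;
    bool found = false;
};

// Reads one key point per confidence map by taking the location of its maximum.
// `maps` holds `numPoints` row-major maps of size height x width (assumed layout).
// Points whose peak is below `threshold` are reported as not found.
std::vector<DetectedPoint> peaksFromHeatmaps(const std::vector<float>& maps, int numPoints,
                                             int height, int width, float threshold = 0.1f) {
    std::vector<DetectedPoint> points(numPoints);
    const std::size_t plane = static_cast<std::size_t>(height) * width;
    if (plane == 0 || maps.size() < static_cast<std::size_t>(numPoints) * plane) {
        return points;  // buffer too small for the stated shape: report nothing found
    }
    for (int k = 0; k < numPoints; ++k) {
        const float* m = maps.data() + static_cast<std::size_t>(k) * plane;
        std::size_t best = 0;
        for (std::size_t i = 1; i < plane; ++i) {
            if (m[i] > m[best]) best = i;  // argmax over this key point's map
        }
        DetectedPoint& p = points[k];
        p.confidence = m[best];
        if (p.confidence >= threshold) {
            p.x = static_cast<int>(best % width);
            p.y = static_cast<int>(best / width);
            p.found = true;
        }
    }
    return points;
}
```

The returned coordinates are in heatmap pixels; if the network's maps are smaller than the input frame, they would need to be scaled by the ratio of frame size to map size before drawing or further processing.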


Hand keypoint detection can solve the complex background problem, and using key points for hand posture detection adapts to many different conditions, for example when the hand is not facing the camera but is turned at an angle. Also, by judging how open the hand is, we can distinguish the number 5 from the stop signal, and so on. Once we have the key points of the hand, we can compute features from them to determine which posture it is. What's more, since we have key points for every frame, we can compute the movement of the key points to recognize not only postures but also gestures, such as waving a hand or punching with a fist.
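The posture and gesture rules described above can be sketched directly on the key points. The C++ sketch below assumes the 21-point layout used by OpenPose's hand model (point 0 is the wrist, followed by four points per finger from thumb to pinky, ordered from base joint to fingertip). It counts extended fingers with a simple distance heuristic, uses the angle between the index and pinky fingers as an openness measure, and flags waving when the wrist's horizontal position changes direction repeatedly across frames. All function names and thresholds are hypothetical, and the rule that "5" means spread fingers while "stop" means fingers held together is an illustrative assumption, not a definition from this proposal. `makeSyntheticHand` only builds fake key points for trying the rules without a camera.

```cpp
#include <array>
#include <cmath>
#include <cstddef>
#include <string>
#include <vector>

// Assumed layout (OpenPose hand model): 0 = wrist, then four points per finger
// for thumb, index, middle, ring, pinky, ordered from base joint to fingertip.
struct KeyPoint {
    double x;
    double y;
};
using HandKeyPoints = std::array<KeyPoint, 21>;

constexpr double kPi = 3.14159265358979323846;

// Euclidean distance between two key points.
inline double dist(const KeyPoint& a, const KeyPoint& b) {
    return std::hypot(a.x - b.x, a.y - b.y);
}

// Heuristic: a finger (0 = thumb ... 4 = pinky) is extended when its tip is
// clearly farther from the wrist than its second joint. 1.1 is a hypothetical margin.
inline bool isFingerExtended(const HandKeyPoints& kp, int finger) {
    const int base = 1 + 4 * finger;
    return dist(kp[0], kp[base + 3]) > 1.1 * dist(kp[0], kp[base + 1]);
}

inline int countExtendedFingers(const HandKeyPoints& kp) {
    int n = 0;
    for (int f = 0; f < 5; ++f) {
        if (isFingerExtended(kp, f)) ++n;
    }
    return n;
}

// Openness measure: angle in degrees between the index and pinky directions.
inline double fingerSpreadDegrees(const HandKeyPoints& kp) {
    const double ax = kp[8].x - kp[5].x, ay = kp[8].y - kp[5].y;
    const double bx = kp[20].x - kp[17].x, by = kp[20].y - kp[17].y;
    return std::abs(std::atan2(ax * by - ay * bx, ax * bx + ay * by)) * 180.0 / kPi;
}

// Hypothetical rules: count fingers; with all five extended, spread fingers
// are read as "5" and fingers held together as "stop".
inline std::string classifyPosture(const HandKeyPoints& kp, double spreadThresholdDeg = 25.0) {
    const int n = countExtendedFingers(kp);
    if (n == 0) return "fist";
    if (n == 5) return fingerSpreadDegrees(kp) > spreadThresholdDeg ? "5" : "stop";
    return std::to_string(n);
}

// Gesture sketch: "waving" = the wrist's x-coordinate reverses direction at least
// minReversals times across frames, ignoring moves smaller than minStep pixels.
inline bool isWaving(const std::vector<double>& wristX, double minStep = 5.0, int minReversals = 2) {
    if (wristX.empty()) return false;
    int reversals = 0;
    int lastDir = 0;            // -1 = left, +1 = right, 0 = not yet known
    double anchor = wristX[0];  // last position that counted as a move
    for (std::size_t i = 1; i < wristX.size(); ++i) {
        const double dx = wristX[i] - anchor;
        if (std::abs(dx) < minStep) continue;  // ignore jitter
        const int dir = dx > 0 ? 1 : -1;
        if (lastDir != 0 && dir != lastDir) ++reversals;
        lastDir = dir;
        anchor = wristX[i];
    }
    return reversals >= minReversals;
}

// Test helper only: lays each finger along a ray from the wrist at the given
// angle; curled fingers fold their tips back toward the palm.
inline HandKeyPoints makeSyntheticHand(const std::array<double, 5>& anglesDeg, bool extended) {
    const double open[4] = {3.0, 4.0, 5.0, 6.0};
    const double curled[4] = {3.0, 4.0, 3.5, 3.0};
    HandKeyPoints kp{};
    kp[0] = {0.0, 0.0};
    for (int f = 0; f < 5; ++f) {
        const double a = anglesDeg[f] * kPi / 180.0;
        for (int j = 0; j < 4; ++j) {
            const double r = extended ? open[j] : curled[j];
            kp[1 + 4 * f + j] = {r * std::cos(a), r * std::sin(a)};
        }
    }
    return kp;
}
```

A real classifier would also look at which fingers are extended rather than only how many, since several postures share the same finger count.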


We are currently testing key point detection on video and hand posture detection on still images. We will apply hand posture detection to video later.

Figure: Video input and output

Figure: Output key points

Figure: Input image (posture number 8)

Figure: Output image (posture number 8)