Hand Shape and Gesture Recognition

CS 585 HW 2
Nam Pham

Problem Definition

In this assignment, we build a system that takes in a video stream and recognizes hand shapes and hand gestures in the video. This is useful when we want to control a machine with body movement. One difficulty is skin color detection: how do we define skin color, and how do we make the algorithm robust to different lighting conditions? We also need to account for the hand's distance from the camera, since it changes the apparent size of the hand and can therefore affect recognition.

Method and Implementation

For this assignment, I rely mainly on skin color detection, motion detection, and template matching to recognize hand shapes and gestures.
For stationary shapes, I use skin color detection to get a binary image of the hand (and potentially the face as well). I then match templates of each shape I want to recognize against this binary image using normalized correlation, and report a detection only when the best score exceeds a threshold (usually 0.65 or 0.7).
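As a rough illustration of the skin color step, here is a minimal Python sketch. It is not the code used for this assignment: the function names is_skin and skin_mask are hypothetical, the RGB thresholds are an assumption (a commonly cited rule-based heuristic), and plain nested lists stand in for image arrays.

```python
def is_skin(r, g, b):
    """Return True if an RGB pixel passes a simple rule-based skin test.

    The thresholds are an assumption (a commonly cited RGB heuristic),
    not values taken from this report.
    """
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15  # enough spread between channels
        and abs(r - g) > 15                   # red clearly separated from green
        and r > g and r > b                   # red is the dominant channel
    )


def skin_mask(image):
    """Turn a 2D grid of (r, g, b) tuples into a binary mask (255 = skin, 0 = other)."""
    return [[255 if is_skin(*pixel) else 0 for pixel in row] for row in image]


if __name__ == "__main__":
    frame = [
        [(200, 120, 90), (10, 10, 10)],
        [(90, 200, 90), (220, 150, 120)],
    ]
    for row in skin_mask(frame):
        print(row)
```

A fixed RGB rule like this is cheap enough to run on every frame, but because it depends on absolute channel values it is sensitive to lighting changes.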
For moving gestures, I use frame-to-frame differencing to get the movement of the hand, and then use template matching to recognize the movement.
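Frame-to-frame differencing can be sketched as follows. This is only an illustration under stated assumptions: the function name frame_difference and the threshold of 50 are hypothetical, and the frames are assumed to be grayscale images stored as nested lists of equal size.

```python
def frame_difference(prev, curr, threshold=50):
    """Binary motion mask for two grayscale frames of equal size.

    A pixel is marked 255 when its intensity changed by more than
    `threshold` between the frames, and 0 otherwise. The default
    threshold is an illustrative assumption.
    """
    return [
        [255 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


if __name__ == "__main__":
    previous = [[10, 10], [200, 50]]
    current = [[10, 100], [100, 60]]
    print(frame_difference(previous, current))
```

The resulting mask is non-zero only where something moved, so a stationary face or background drops out before template matching.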
I also tried pyramid image sampling to account for the varying distance of the hand from the camera, but it made the video processing too laggy, so I removed it.
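For reference, an image pyramid can be sketched as below. This is a simplified illustration, not the removed implementation: the names downsample and build_pyramid are hypothetical, the number of levels is an assumption, and each level is produced by plain 2x2 averaging, whereas a Gaussian pyramid would blur with a Gaussian kernel before subsampling.

```python
def downsample(image):
    """Halve a grayscale image by averaging each 2x2 block (odd edges are dropped)."""
    height, width = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * y][2 * x] + image[2 * y][2 * x + 1]
             + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) // 4
            for x in range(width)
        ]
        for y in range(height)
    ]


def build_pyramid(image, levels=3):
    """Return [full size, half size, quarter size, ...] with at most `levels` entries."""
    pyramid = [image]
    for _ in range(levels - 1):
        last = pyramid[-1]
        if len(last) < 2 or len(last[0]) < 2:
            break  # too small to halve again
        pyramid.append(downsample(last))
    return pyramid


if __name__ == "__main__":
    image = [
        [0, 0, 8, 8],
        [0, 0, 8, 8],
        [4, 4, 12, 12],
        [4, 4, 12, 12],
    ]
    for level in build_pyramid(image):
        print(len(level), "x", len(level[0]), level)
```

Matching every template against every level repeats the template search once per level on each frame, which is consistent with the slowdown described above.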

I use mySkinDetect(src, dst) to detect skin and myFrameDifferencing(prev, curr, dst) to get the movement of the hand. I then use the OpenCV functions matchTemplate() and minMaxLoc() to find the location and value of the best match in the image, and rectangle() and putText() to give the user graphical feedback.
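To make the matching step concrete, the following pure-Python sketch computes a normalized correlation score at every template position and keeps the maximum, which is roughly the role matchTemplate() and minMaxLoc() play in my pipeline. The names normalized_correlation and best_match are hypothetical, and the particular formula (cross-correlation divided by the product of the patch and template norms, without mean subtraction) is an assumption, since OpenCV offers more than one normalized matching mode.

```python
import math


def normalized_correlation(patch, template):
    """Cross-correlation of two equal-size grids, divided by the product of their norms."""
    numerator = sum(p * t for p_row, t_row in zip(patch, template)
                    for p, t in zip(p_row, t_row))
    patch_energy = sum(p * p for row in patch for p in row)
    template_energy = sum(t * t for row in template for t in row)
    denominator = math.sqrt(patch_energy * template_energy)
    return numerator / denominator if denominator else 0.0


def best_match(image, template, threshold=0.7):
    """Slide the template over the image and return (location, score, detected).

    `location` is the (x, y) of the top-left corner of the best-scoring
    window, and `detected` is True when the score reaches `threshold`.
    """
    t_height, t_width = len(template), len(template[0])
    best_score, best_location = -1.0, None
    for y in range(len(image) - t_height + 1):
        for x in range(len(image[0]) - t_width + 1):
            patch = [row[x:x + t_width] for row in image[y:y + t_height]]
            score = normalized_correlation(patch, template)
            if score > best_score:
                best_score, best_location = score, (x, y)
    return best_location, best_score, best_score >= threshold


if __name__ == "__main__":
    template = [[255, 0], [255, 255]]
    image = [
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 255, 0, 0],
        [0, 255, 255, 0],
    ]
    print(best_match(image, template))
```

The returned location and the template size are what one would pass on to rectangle() to draw the bounding box, and the detected flag decides whether a label is drawn at all.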


I tried presenting the hand shapes and gestures to the system from multiple locations, orientations, and distances.


In ideal circumstances, my system can recognize appropriate hand shapes and gestures:

The system still recognizes the shape when the location of the hand changes (shown for thumbs up).

However, under different lighting, skin color detection does not work well and the hand is not recognized.

Also, the system sometimes produces false detections.

I also ran a series of trials; the resulting confusion matrix is shown here (rows are the hypothesized class, columns are the true class):

                     True class
Hypothesized class   Open palm   Victory sign   Thumbs up   Waving/Flapping   None
Open palm            15          2              0           5                 0
Victory sign         3           13             0           0                 3
Thumbs up            0           0              13          0                 0
Waving/Flapping      0           0              0           15                0
None                 6           9              7           3                 //
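Per-class recall and precision can be read off this matrix with a few lines of Python. The sketch below is only an illustration: the helper names recall and precision are hypothetical, the counts are copied from the table, and the uncounted None/None cell ("//") is represented as None and skipped, which is an assumption about how to treat it.

```python
CLASSES = ["Open palm", "Victory sign", "Thumbs up", "Waving/Flapping", "None"]

# Rows: hypothesized class; columns: true class (both in CLASSES order).
# The last cell is None because the None/None count was not recorded.
CONFUSION = [
    [15, 2, 0, 5, 0],
    [3, 13, 0, 0, 3],
    [0, 0, 13, 0, 0],
    [0, 0, 0, 15, 0],
    [6, 9, 7, 3, None],
]


def recall(matrix, k):
    """Fraction of trials whose true class is k that were labeled k."""
    column_total = sum(row[k] for row in matrix if row[k] is not None)
    return matrix[k][k] / column_total


def precision(matrix, k):
    """Fraction of trials labeled k whose true class is k."""
    row_total = sum(value for value in matrix[k] if value is not None)
    return matrix[k][k] / row_total


if __name__ == "__main__":
    # Skip the "None" class, whose diagonal count is unavailable.
    for k, name in enumerate(CLASSES[:-1]):
        print(f"{name}: recall={recall(CONFUSION, k):.2f}, "
              f"precision={precision(CONFUSION, k):.2f}")
```

Recall is computed down a column (the true class) and precision along a row (the hypothesized class), matching the orientation of the table.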


Discussion


This assignment was a good chance for me to try several basic computer vision techniques. They generally work well; however, I would like to make my algorithm more robust to factors such as lighting and the hand's distance from the camera. I regret that I did not have more time to research other computer vision techniques for this task; as it is, I could only implement the techniques readily available to me. Still, I think the system works well for a first attempt.

Credits and Bibliography