This part of the homework involves programmatically designing and implementing algorithms that recognize hand shapes or gestures, and creating a graphical display that responds to the recognition of the hand shapes or gestures.
The algorithm should detect at least four different hand shapes or gestures. It must use skin-color detection and binary image analysis (e.g. centroids, orientation, etc.) to distinguish hand shapes or gestures.
Method and Implementation
Here, I implemented the program in Python, adapting the helper skeleton code that was given to us in C++ in lab.
I analyzed a live video feed from the webcam to solve the problem mentioned above.
Some notable techniques used:
1) template matching (created templates of three static hand positions, and one dynamic gesture)
2) background differencing: D(x,y,t) = |I(x,y,t)-I(x,y,0)|
3) frame-to-frame differencing: D'(x,y,t) = |I(x,y,t)-I(x,y,t-1)| (see the differencing sketch after this list)
4) motion energy templates (union of binary difference images over a window of time)
5) skin-color detection (thresholding pixel values)
6) horizontal and vertical projections to find bounding boxes of "movement blobs" or "skin-color blobs"
7) tracking the position and orientation of moving objects
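To make the two differencing formulas concrete, here is a minimal sketch in Python with OpenCV (my own illustrative code, not the assignment's; the variable names and the threshold of 30 are example values):

```python
import cv2

# Minimal sketch of background differencing and frame-to-frame differencing.
cap = cv2.VideoCapture(0)

ok, first_frame = cap.read()                        # I(x,y,0)
first_gray = cv2.cvtColor(first_frame, cv2.COLOR_BGR2GRAY)
prev_gray = first_gray

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)  # I(x,y,t)

    # Background differencing: D(x,y,t) = |I(x,y,t) - I(x,y,0)|
    bg_diff = cv2.absdiff(gray, first_gray)

    # Frame-to-frame differencing: D'(x,y,t) = |I(x,y,t) - I(x,y,t-1)|
    frame_diff = cv2.absdiff(gray, prev_gray)
    prev_gray = gray

    # Threshold into binary difference images for later blob analysis.
    _, bg_mask = cv2.threshold(bg_diff, 30, 255, cv2.THRESH_BINARY)
    _, motion_mask = cv2.threshold(frame_diff, 30, 255, cv2.THRESH_BINARY)

    cv2.imshow("background difference", bg_mask)
    cv2.imshow("frame difference", motion_mask)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```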
The program handles 3 static hand signs:
1) A closed fist
2) An open palm
3) A peace sign (two fingers held up)
It also handles one dynamic hand gesture: a hand wave.
The program is executed by running "python hw2_gesture.py static" to obtain results for the three static hand signs, and "python hw2_gesture.py dynamic" for the hand gesture.
Static hand signs:
First off, the pre-captured templates are loaded in, converted to black and white, and resized appropriately. The video camera is then started, and the program reads in the current frame and overlays a rectangle marking the region in which the hand sign is recognized.
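As a rough sketch, the template-loading step might look like this (the file names and template size are illustrative assumptions, not the actual values used):

```python
import cv2

# Hypothetical template file names -- the real file names are not given in this report.
TEMPLATE_FILES = ["fist.png", "palm.png", "peace.png"]
TEMPLATE_SIZE = (120, 120)  # illustrative common size

templates = []
for path in TEMPLATE_FILES:
    tmpl = cv2.imread(path, cv2.IMREAD_GRAYSCALE)  # load and convert to black and white
    tmpl = cv2.resize(tmpl, TEMPLATE_SIZE)         # resize appropriately
    templates.append(tmpl)
```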
Then, I use background subtraction (averaging the frame sequence over the first 30 frames) to model the background and prepare the area for hand detection.
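A minimal sketch of that averaging step, assuming cap is the cv2.VideoCapture opened earlier:

```python
import numpy as np

# Average the first 30 frames to build a model of the static background.
acc = None
for _ in range(30):
    ok, frame = cap.read()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY).astype(np.float32)
    acc = gray if acc is None else acc + gray
background = (acc / 30).astype(np.uint8)
```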
A contour is found around the hand, and a bounding box is drawn that follows its shape.
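That step might look roughly like this (assuming OpenCV 4's two-value findContours return, and a binary mask produced by the background subtraction above):

```python
# Take the largest contour in the binary mask as the hand, then draw it
# together with its bounding box on the displayed frame.
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
if contours:
    hand = max(contours, key=cv2.contourArea)
    x, y, w, h = cv2.boundingRect(hand)
    cv2.drawContours(frame, [hand], -1, (0, 255, 0), 2)
    cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
```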
OpenCV's matchTemplate function is applied with each template to generate scores for how closely the user's hand sign matches each of the three templates. These scores are stored in an array; the maximum of these is the final result, which is displayed in the same live video feed if its value is above a certain threshold.
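A hedged sketch of the scoring step (the 0.6 threshold, the label strings, and the roi variable, meaning the grayscale region inside the overlay rectangle, are my assumptions):

```python
labels = ["fist", "palm", "peace"]
scores = []
for tmpl in templates:
    # TM_CCOEFF_NORMED yields a normalized score; higher means a better match.
    result = cv2.matchTemplate(roi, tmpl, cv2.TM_CCOEFF_NORMED)
    scores.append(result.max())

best = int(np.argmax(scores))
if scores[best] > 0.6:  # illustrative threshold
    cv2.putText(frame, labels[best], (10, 30),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 0, 255), 2)
```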
Dynamic hand gesture:
The video feed is read, and each frame is resized to the template size.
The frame-differencing function is applied to the previous and current frames to detect dynamic motion and ignore the rest (as in the differencing sketch above).
Then, skin-color detection is applied by thresholding pixel values, so that the current frame can be matched against our template.
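One common way to implement such skin-color thresholding is an HSV range check; the bounds below are illustrative values, not the exact ones from my program:

```python
# Keep only pixels whose hue/saturation/value fall in a rough skin-tone range.
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
lower_skin = np.array([0, 48, 80], dtype=np.uint8)
upper_skin = np.array([20, 255, 255], dtype=np.uint8)
skin_mask = cv2.inRange(hsv, lower_skin, upper_skin)
```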
After this, a contour is drawn that bounds the areas in the frame matching the skin color. I then compute the motion history, accumulating the frame differences over a certain number of pairs of frames. One feed displays the motion history, and another feed depicts the bounding box drawn to detect object movement that matches the skin color.
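The motion-history accumulation can be sketched as a running union of the last few binary difference images, matching the motion-energy idea listed earlier (the window size is an illustrative choice):

```python
from collections import deque

WINDOW = 10  # number of frame-difference images to accumulate (illustrative)
diff_history = deque(maxlen=WINDOW)

def update_motion_history(motion_mask):
    """OR together the most recent binary difference images (motion energy)."""
    diff_history.append(motion_mask)
    energy = np.zeros_like(motion_mask)
    for m in diff_history:
        energy = cv2.bitwise_or(energy, m)
    return energy
```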
OpenCV's matchTemplate function is used with the template to generate a score for how closely the user's hand wave matches the template hand wave, which was generated separately (using techniques similar to those from the lab). If the value is above a certain threshold, we can inform the user that it is a hand wave.
Here are the template images used:
3. Fist Sign:
4. Hand Wave:
Here are some results (please look at the top left of the image to see what the program indicates the hand sign is):
But sometimes a wrong result occurs due to errors in contour detection and threshold values:
My program did a fairly good job of detecting hand signs and gestures, usually provided there was enough contrast in the images (a dark background works best).
Being able to detect hand positions and general motion like this is powerful for image analysis!
Credits and Bibliography