Our objective for this assignment is to create a program that detects a user's hand shapes or gestures and drives a graphical display that responds to each recognized shape or gesture.
Method and Implementation
There are many ways to recognize a user's hand shapes or gestures, and several ways to improve on a basic approach.
- The most straightforward way to detect hand shapes is template matching. It is easy to implement and produces reasonably good results.
- In addition, skin color detection can improve template matching, for example by narrowing the search to skin-colored regions of the frame.
- Motion energy template matching enables detection of a moving gesture rather than only a static hand shape.
- With the pyramid method, the size of the hand on the screen is no longer a factor in template matching, which makes the method considerably more robust.
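The three refinements listed above can each be sketched in a few lines. The following is a minimal, dependency-free illustration, not the code used in this assignment: frames are plain nested lists rather than OpenCV "Mat" objects, the skin rule is a commonly used RGB heuristic chosen here as an assumption, the motion threshold of 30 is arbitrary, and the pyramid uses simple 2x2 averaging where a Gaussian pyramid would blur before subsampling. All function names are hypothetical.

```python
def skin_mask(frame):
    """frame: rows of (r, g, b) tuples in 0-255. Returns rows of 0/1 flags."""
    def is_skin(r, g, b):
        # Assumed RGB heuristic; the report does not state the rule actually used.
        return (r > 95 and g > 40 and b > 20
                and max(r, g, b) - min(r, g, b) > 15
                and abs(r - g) > 15 and r > g and r > b)
    return [[1 if is_skin(*px) else 0 for px in row] for row in frame]


def motion_energy(prev_gray, curr_gray, threshold=30):
    """One frame-difference step: flag pixels whose intensity changed a lot."""
    return [[1 if abs(c - p) > threshold else 0 for p, c in zip(prow, crow)]
            for prow, crow in zip(prev_gray, curr_gray)]


def downsample(gray):
    """Halve width and height by averaging each 2x2 block."""
    h = len(gray) // 2 * 2
    w = len(gray[0]) // 2 * 2
    return [[(gray[y][x] + gray[y][x + 1]
              + gray[y + 1][x] + gray[y + 1][x + 1]) // 4
             for x in range(0, w, 2)]
            for y in range(0, h, 2)]


def build_pyramid(gray, levels=3):
    """Return [full size, half size, quarter size, ...] up to `levels` images."""
    pyramid = [gray]
    for _ in range(levels - 1):
        last = pyramid[-1]
        if len(last) < 2 or len(last[0]) < 2:
            break  # too small to halve again
        pyramid.append(downsample(last))
    return pyramid
```

Matching a template against every level of `build_pyramid(frame)` is one way to tolerate a hand that appears larger or smaller than it did in the template screenshot.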
To implement template matching, I posed with different hand shapes and took screenshots of them to use as templates. After reading these templates into the program, I store each camera frame in a "Mat" variable using the OpenCV library. Each frame is compared against every template, and the template that most closely matches a region of the current camera image is selected as the result.
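The comparison loop described above can be sketched as follows. This is an illustrative version under stated assumptions, not the assignment's code: images are grayscale nested lists, similarity is the mean squared difference (the report does not say which measure was used), and the names `best_match` and `classify` are hypothetical. In an OpenCV program the sliding-window search would normally be delegated to `matchTemplate`.

```python
def mean_sq_diff(frame, template, top, left):
    """Mean squared difference between the template and one frame window."""
    total = 0
    for ty, trow in enumerate(template):
        for tx, t in enumerate(trow):
            d = frame[top + ty][left + tx] - t
            total += d * d
    return total / (len(template) * len(template[0]))


def best_match(frame, template):
    """Slide the template over the frame; return (score, top, left) or None."""
    fh, fw = len(frame), len(frame[0])
    th, tw = len(template), len(template[0])
    best = None
    for top in range(fh - th + 1):
        for left in range(fw - tw + 1):
            score = mean_sq_diff(frame, template, top, left)
            if best is None or score < best[0]:
                best = (score, top, left)
    return best  # None when the template is larger than the frame


def classify(frame, templates):
    """templates: dict of pose name -> template. Returns (name, (top, left), score)."""
    candidates = []
    for name, template in templates.items():
        match = best_match(frame, template)
        if match is not None:
            score, top, left = match
            candidates.append((score, name, top, left))
    if not candidates:
        return None
    score, name, top, left = min(candidates)  # lowest difference wins
    return name, (top, left), score
```

Dividing by the template area keeps scores comparable when the templates have different sizes; with a raw sum, larger templates would be penalized simply for covering more pixels.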
The metric of success is the confusion matrix, which is obtained by performing many trials and checking whether the program interprets my current hand pose correctly. The higher the accuracy, the more successful the program.
For this confusion matrix, I conducted 105 trials and recorded the program's output for each one. The overall accuracy is 75/105 ≈ 71.43%, which is reasonably successful.
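As a worked illustration of how accuracy is read off a confusion matrix, the sketch below sums the diagonal (correct classifications) and divides by the total number of trials. The 3x3 matrix and the labels "A", "B", "C" are invented purely for the example and are not the results recorded in this assignment; only the final 75/105 check uses figures from the report.

```python
def overall_accuracy(matrix):
    """Rows are the actual pose, columns the predicted pose."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_pose_accuracy(matrix, labels):
    """Fraction of trials of each actual pose that were recognized correctly."""
    return {label: (row[i] / sum(row) if sum(row) else 0.0)
            for i, (label, row) in enumerate(zip(labels, matrix))}


# Hypothetical counts, for illustration only.
example = [[8, 1, 1],
           [2, 6, 2],
           [0, 3, 7]]

print(overall_accuracy(example))                      # 21 correct out of 30
print(per_pose_accuracy(example, ["A", "B", "C"]))
print(round(75 / 105 * 100, 2))                       # figure quoted in the report
```

Large off-diagonal entries in a pair of rows, such as the 2 and 3 between "B" and "C" here, are what indicate two poses being confused with each other.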
From the spreadsheet, we can see that the "finger man" pose is recognized almost perfectly, because it is the most distinct from the other poses.
Also, note that the "Gotcha" and "Fist Hold" poses have a high chance of being mistaken for one another. This is because the two poses look very similar to the computer.
I think this assignment is an overall success, and there is room for further optimization given more time.