In this assignment, we need to build a system that takes in a video stream and recognizes hand shapes and hand gestures in the video. This is useful when we want to control a machine with body movement. One difficulty is skin color detection: how do we define skin color, and how do we make the algorithm robust to different lighting conditions? We also need to account for the distance of the hand from the camera, since this changes the apparent size of the hand and can affect recognition.
Method and Implementation
For this assignment, I mainly use skin color detection, motion detection, and template matching to recognize hand gestures.
For stationary shapes, I use skin color detection to get a binary image of the hand (and potentially the face as well). Then I use templates of the shapes I want to recognize to find the shape in the image, if any. I score each candidate with normalized correlation and accept a match only when the score passes a threshold (usually 0.65 or 0.7).
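As a rough illustration, the thresholded normalized correlation step could look like the sketch below. It is plain Python on 2D lists rather than the OpenCV call used in the assignment, and the function names (`normalized_correlation`, `best_match`, `detect`) are hypothetical. The text does not say which normalized correlation variant was used, so the version without mean subtraction here is an assumption.

```python
import math


def normalized_correlation(patch, template):
    """Normalized cross-correlation of two equally sized 2D lists (range 0..1 for binary input)."""
    num = sum(p * t
              for patch_row, template_row in zip(patch, template)
              for p, t in zip(patch_row, template_row))
    patch_energy = sum(p * p for row in patch for p in row)
    template_energy = sum(t * t for row in template for t in row)
    denom = math.sqrt(patch_energy * template_energy)
    # An all-zero patch (no skin pixels) has no defined score; treat it as no match.
    return num / denom if denom > 0 else 0.0


def best_match(image, template):
    """Slide the template over the image and return (best_score, (row, col))."""
    th, tw = len(template), len(template[0])
    best_score, best_loc = -1.0, None
    for r in range(len(image) - th + 1):
        for c in range(len(image[0]) - tw + 1):
            patch = [row[c:c + tw] for row in image[r:r + th]]
            score = normalized_correlation(patch, template)
            if score > best_score:
                best_score, best_loc = score, (r, c)
    return best_score, best_loc


def detect(image, template, threshold=0.65):
    """Return the top-left corner of the best match, or None if it is below the threshold."""
    score, loc = best_match(image, template)
    return loc if score >= threshold else None
```

The default threshold of 0.65 is the lower of the two values mentioned above. A higher threshold rejects more false detections but also misses more hands that differ slightly from the template.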
For moving gestures, I use frame-to-frame differencing to get the movement of the hand, and then use template matching to recognize the movement.
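Frame-to-frame differencing can be sketched as follows. This is a minimal plain-Python version operating on grayscale frames stored as 2D lists; the function name `frame_difference` and the threshold of 50 are assumptions for illustration, not values taken from the assignment code.

```python
def frame_difference(prev, curr, threshold=50):
    """Return a binary motion mask: 1 where the intensity changed by more than threshold."""
    return [
        [1 if abs(c - p) > threshold else 0 for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]
```

The resulting mask is 1 only where something moved between the two frames, so a stationary background and a stationary face drop out, while a waving hand leaves a region that can then be matched against a template.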
I also tried image pyramid sampling to account for the varying distance of the hand from the camera, but this made the video processing too slow, so I removed it.
I use mySkinDetect(src, dst) to detect skin pixels, and myFrameDifferencing(prev, curr, dst) to get the movement of the hand. I then use the OpenCV functions matchTemplate() and minMaxLoc() to get the location and value of the best match in the image, and use rectangle() and putText() to give users appropriate graphic feedback.
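For reference, the pyramid idea can be sketched like this. It is a simplified plain-Python version that halves the image by averaging 2x2 blocks; a Gaussian pyramid would blur before subsampling. The names `downsample` and `build_pyramid` and the `min_size` parameter are hypothetical.

```python
def downsample(image):
    """Halve width and height by averaging each 2x2 block of a 2D list."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(w)
        ]
        for r in range(h)
    ]


def build_pyramid(image, min_size=8):
    """Return [full size, half size, quarter size, ...] down to min_size pixels per side."""
    levels = [image]
    while len(levels[-1]) // 2 >= min_size and len(levels[-1][0]) // 2 >= min_size:
        levels.append(downsample(levels[-1]))
    return levels
```

Matching the same template against every level lets a hand that is closer to the camera (and therefore larger) shrink to the template's size at some level. Each level adds another full template-matching pass per frame, which is consistent with the slowdown described above.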
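The text does not give the rule inside mySkinDetect(), so the sketch below uses one commonly used RGB heuristic purely as an assumption. The threshold values, the names `is_skin` and `skin_mask`, and the (r, g, b) tuple layout are all illustrative; note that OpenCV frames are stored in BGR order by default, so channels would need reordering first.

```python
def is_skin(r, g, b):
    """Heuristic RGB skin test (assumed rule, not necessarily the one in mySkinDetect)."""
    return (
        r > 95 and g > 40 and b > 20              # bright enough in every channel
        and max(r, g, b) - min(r, g, b) > 15      # not gray
        and abs(r - g) > 15 and r > g and r > b   # red clearly dominates
    )


def skin_mask(image):
    """Convert a 2D list of (r, g, b) tuples into a binary mask (1 = skin)."""
    return [[1 if is_skin(*pixel) else 0 for pixel in row] for row in image]
```

Because the rule uses fixed thresholds on raw color values, a change in lighting shifts pixels across those thresholds, which is one reason a detector of this kind is sensitive to illumination.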
I tested the system by presenting hand shapes and gestures at multiple locations, orientations, and distances.
In ideal circumstances, my system recognizes the appropriate hand shapes and gestures.
The system still works when the hand changes location (shown for thumbs up).
However, under different lighting, skin color detection does not work well and the hand is not recognized.
The system also sometimes produces false detections.
I have also conducted trials and here's the resulting confusion matrix:
| Hypothesized class | Handshape | Open palm | Victory Sign | Thumbs up | Waving/Flapping | None |
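A confusion matrix of this kind can be tallied from a list of trial outcomes. The sketch below is a minimal illustration; the names `confusion_matrix` and `accuracy`, the (actual, hypothesized) pair format, and the layout with actual classes as rows and hypothesized classes as columns are assumptions, and the class labels are taken from the header above.

```python
from collections import Counter

CLASSES = ["Open palm", "Victory Sign", "Thumbs up", "Waving/Flapping", "None"]


def confusion_matrix(trials, classes=CLASSES):
    """Count trials into a matrix: rows are actual classes, columns are hypothesized classes."""
    counts = Counter(trials)  # trials is an iterable of (actual, hypothesized) pairs
    return [[counts[(actual, hyp)] for hyp in classes] for actual in classes]


def accuracy(matrix):
    """Fraction of trials on the diagonal, i.e. recognized as their actual class."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0
```

Off-diagonal entries in the "None" column count missed detections, while off-diagonal entries in the "None" row count false detections.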
Discussion
- The advantage of this method is that it is simple to implement and generally works well with common hand shapes and gestures.
- However, this method depends on external factors such as lighting, so it is not very robust.
- For future work, the first thing I would improve is the skin color detection, since I want a more reliable way to detect skin. I also want to make the code invariant to distance; as it is, my method is still distance dependent. Finally, I would like to try methods other than template matching to see how well they perform on this task.
This assignment has been a good chance for me to try several basic computer vision techniques. They generally work well, but I would like to make my algorithm more robust to factors such as lighting and distance. I regret not having more time to research other computer vision techniques for this task; as it is, I could only implement techniques readily available to me. Still, I think the system works well for a first attempt.