The goal of this assignment was to build a model, using image analysis algorithms and methods learned in class, that recognizes hand shapes or gestures and displays dynamic changes upon recognition. We anticipated difficulty in segmenting out just the hand, and were ultimately able to detect only hand shapes with 2-5 raised fingers, not gestures.
Method and Implementation
In creating the recognizer described above, we used skin-color segmentation, contour extraction, convex hull and convexity defect extraction, and the law of cosines. For skin detection, we converted each frame into the HSV colorspace and segmented out the skin-colored regions. After blurring and thresholding the image, we extracted the contours using the cv2.RETR_EXTERNAL retrieval mode specifically to ensure that only external contours were detected. Assuming that the hand produces the largest contour, we then computed the bounding box, convex hull, and convexity defects of that largest contour (the hand). Lastly, using this contour and hull/defect data, we applied the law of cosines to the angle at each defect point to determine the number of fingers present (assuming they are separated rather than held together). The predicted finger count is then displayed in the top-left corner of the frame.
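The finger-counting step above can be sketched as follows. This is a minimal, numpy-only illustration of the law-of-cosines idea, not our exact implementation: it assumes each convexity defect is given as a (start, end, far) triple of points, counts valleys whose angle is below 90 degrees, and uses the common valleys + 1 convention for the finger count.

```python
import numpy as np

def defect_angle(start, end, far):
    """Angle at the defect point `far` between the hull points `start`
    and `end`, computed via the law of cosines (in radians)."""
    a = np.linalg.norm(np.asarray(end) - np.asarray(start))   # opposite side
    b = np.linalg.norm(np.asarray(far) - np.asarray(start))
    c = np.linalg.norm(np.asarray(end) - np.asarray(far))
    return np.arccos((b**2 + c**2 - a**2) / (2 * b * c))

def count_fingers(defects):
    """Count raised fingers from a list of (start, end, far) defect
    triples: each valley narrower than 90 degrees separates two
    fingers, so the count is (narrow valleys) + 1."""
    valleys = sum(1 for s, e, f in defects
                  if defect_angle(s, e, f) < np.pi / 2)
    return valleys + 1 if valleys else 0
```

For example, a single deep valley between two fingertips, such as `((0, 0), (2, 0), (1, 5))`, yields a small angle and a count of 2, while a shallow notch with an angle wider than 90 degrees is ignored.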
During data collection, we collected 10 samples for each of the working shapes (2-5 fingers up), for a total of 40 samples.
Overall, our recognizer achieved an accuracy of 77.5%, staying above 70% for each of the four finger counts. Here is a video demo of the GUI in action.
Confusion Matrix (predicted class vs. true class)
In general, we believe our methodology was on the right track, but it would have benefited from a few additional refinements to increase accuracy. The pipeline of segmenting out the hand and finding its contours, hull, and defects was sound, and allowed the model to perform up to expectation, in that it was generally good at determining the number of raised fingers. However, we could have implemented more rigorous background differencing, or template matching against hands with different numbers of raised fingers, to improve the recognizer's performance.
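The background differencing mentioned above could look something like the following sketch. It is a hypothetical, simplified version (grayscale frames, a fixed threshold, and a pre-captured empty-scene background are all assumptions), shown here only to illustrate the idea:

```python
import numpy as np

def background_mask(frame, background, thresh=25):
    """Mark as foreground any pixel that differs from a pre-captured
    empty-scene background by more than `thresh` grayscale levels.
    Returns a binary mask (0 or 255) the same shape as the frame."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return (diff > thresh).astype(np.uint8) * 255
```

Applied before skin detection, such a mask would suppress skin-like colors in the static background (e.g. wooden furniture) that can otherwise confuse an HSV-only segmentation.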
As for next steps, to improve the hand shape recognizer we would implement the algorithms mentioned above. For gesture detection, we could use motion energy together with the hand contours to tell, for example, whether the hand was waving or motioning to come over. With more time, we would hope to improve the robustness of our model in recognizing both shapes and gestures alike.
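A minimal sketch of the motion energy idea, under the assumption of grayscale frames and a fixed difference threshold (both illustrative choices, not a fixed design):

```python
import numpy as np

def motion_energy(frames, thresh=25):
    """Motion energy image: the union of thresholded differences
    between consecutive frames, highlighting every pixel where
    motion occurred over the window. Returns a 0/255 mask."""
    mei = np.zeros(frames[0].shape, dtype=np.uint8)
    for prev, curr in zip(frames, frames[1:]):
        diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
        mei |= (diff > thresh).astype(np.uint8)
    return mei * 255
```

A side-to-side wave would light up a wide horizontal band in such an image, while a stationary hand would leave it mostly empty, which is the kind of cue a gesture classifier could use.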
Using the computer vision techniques we learned in just the first couple of weeks of class, we were able to build a reasonably accurate finger-counting model. While it is far from perfect, we know how we could go about improving its recognition capabilities, and could do so given more time. Lastly, having learned how to do this in just a few weeks, we are both extremely excited for the rest of the semester to come.
Credits and Bibliography
Hand Gesture Recognition, Date of Access: Feb 15, 2021
CS585 Spring 2021 Piazza