Problem Definition
Use a template matching algorithm to detect different types of hand shapes. Detection should run in real time.
Method and Implementation
First, use the skin detection method from lab 2 to find the skin regions in the raw image, and set all other (background) pixels to black. Apply the same method to the hand shape templates. The result is a set of images that contain only skin-colored pixels on a black background. Note that these new images (including the templates) are still RGB images.
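The masking step can be sketched as follows. This is a minimal sketch under assumptions, not the lab 2 detector: the per-pixel RGB threshold rule is a stand-in, and `skin_mask` and `keep_skin_only` are hypothetical names. Images are assumed to be H x W x 3 `uint8` arrays in RGB channel order (OpenCV loads images as BGR, so the channels would need to be swapped first).

```python
import numpy as np

def skin_mask(rgb):
    """Return a boolean H x W mask of pixels that pass a simple RGB skin rule.

    Assumption: this threshold rule stands in for the lab 2 skin detector.
    """
    img = rgb.astype(np.int32)  # avoid uint8 wrap-around in the differences
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    spread = img.max(axis=-1) - img.min(axis=-1)
    return (
        (r > 95) & (g > 40) & (b > 20)
        & (spread > 15)
        & (np.abs(r - g) > 15)
        & (r > g) & (r > b)
    )

def keep_skin_only(rgb):
    """Set every non-skin pixel to black; the output is still a 3-channel image."""
    out = rgb.copy()
    out[~skin_mask(rgb)] = 0
    return out
```

The same function would be applied to both the camera frame and each template, so that both sides of the match contain only skin pixels on black.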
Second, resize the hand shape templates. We only detect hand shapes at a certain distance from the camera, so a single fixed template size is sufficient.
Then apply the template matching algorithm to the raw image read from the camera, using the background-removed templates. There is one template per hand shape, so template matching runs four times and produces four result matrices. Because we use the correlation coefficient method, the values in each result matrix range from -1 to 1. The larger the value, the better the template matches that sub-image.
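To illustrate why the scores fall between -1 and 1, here is a minimal NumPy sketch of a normalized correlation coefficient score. It assumes the normalized variant of the method (the one that yields values in that range) and, for simplicity, single-channel arrays; `ccoeff_normed` and `match_scores` are hypothetical names. The explicit loop is for illustration only and is far too slow for real-time use; in OpenCV, `cv2.matchTemplate` with the `cv2.TM_CCOEFF_NORMED` mode computes this kind of score at every offset.

```python
import numpy as np

def ccoeff_normed(patch, template):
    """Normalized correlation coefficient between two same-shaped arrays."""
    p = patch.astype(np.float64) - patch.mean()
    t = template.astype(np.float64) - template.mean()
    denom = np.sqrt((p * p).sum() * (t * t).sum())
    if denom == 0:  # a flat patch or template has no pattern to correlate
        return 0.0
    return float((p * t).sum() / denom)

def match_scores(image, template):
    """Score the template at every position of the image (slow, illustrative)."""
    ih, iw = image.shape[:2]
    th, tw = template.shape[:2]
    scores = np.empty((ih - th + 1, iw - tw + 1))
    for y in range(scores.shape[0]):
        for x in range(scores.shape[1]):
            scores[y, x] = ccoeff_normed(image[y:y + th, x:x + tw], template)
    return scores
```

Entry `(y, x)` of the returned matrix is the score of the window whose top-left corner sits at row `y`, column `x` of the image.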
The next step is to decide which hand shape the raw image contains. Take the highest value in each result matrix (the four result matrices are not normalized), then take the largest of these four values. If it is greater than 0.5 (a manually chosen threshold), find the position of that maximum in its result matrix, map it to the corresponding location in the raw image, and draw a rectangle of that template's size on the raw image. If the largest value is less than 0.5, the program reports that it cannot find a known hand shape in the raw image.
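The decision step might look like the sketch below. `classify` and its argument layout are hypothetical, and treating a score of exactly 0.5 as "not found" is an assumption, since the text only covers the strictly greater and strictly smaller cases.

```python
import numpy as np

THRESHOLD = 0.5  # manually chosen threshold from the text

def classify(results, template_sizes, threshold=THRESHOLD):
    """Pick the best-matching hand shape from per-template score matrices.

    results: list of 2-D score arrays, one per template.
    template_sizes: list of (height, width) tuples, one per template.
    Returns (shape_index, (x, y, width, height)), or None if no score
    exceeds the threshold.
    """
    peaks = [float(r.max()) for r in results]
    best = int(np.argmax(peaks))
    if peaks[best] <= threshold:  # assumption: exactly 0.5 counts as no match
        return None
    # Index (y, x) of the result matrix is the top-left corner of the
    # matched window in the raw image.
    y, x = np.unravel_index(int(results[best].argmax()), results[best].shape)
    h, w = template_sizes[best]
    return best, (int(x), int(y), w, h)
```

The returned box could then be drawn on the frame, for example with `cv2.rectangle`.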
Experiments
Run the program and make different hand shapes in front of the camera.
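For each test, hold the hand shape for at least 1 second, then change it or keep it. Repeat the test for every hand shape (the confusion matrix in Results records 15 or 16 trials per true class) and record the results. If any frame in a test is not recognized correctly, count that test as a false detection.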
Build a confusion matrix from these tests. The matrix is 5 x 5: each column is the true hand shape of the test, and each row is the detected hand shape. The extra column and extra row record the case where no hand shape is present.
The detection rate is the number of frames the program can process per second.
Use the true positive and false positive rates to describe the accuracy for each hand shape. The true positive rate is the number of correctly detected tests of a shape divided by the total number of tests of that shape. The false positive rate is the sum of the other cells in that shape's row (every cell except the correctly detected one) divided by the total number of tests of all other hand shapes. For hand shape i, located at row i and column i of matrix M, with $\mathrm{sumofCol}(j) = \sum_k M(k, j)$: $TP = M(i, i) / \mathrm{sumofCol}(i)$ and $FP = \sum_{j \ne i} M(i, j) / \sum_{j \ne i} \mathrm{sumofCol}(j)$.
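As a worked example of these two rates, the sketch below applies them to the counts from the confusion matrix in the Results section. The function name `rates` and the class labels are illustrative choices, not part of the original program.

```python
def rates(matrix, i):
    """Return (TP rate, FP rate) for class i.

    matrix[r][c] counts tests with detected class r and true class c.
    """
    n = len(matrix)
    col_sums = [sum(matrix[r][c] for r in range(n)) for c in range(n)]
    tp = matrix[i][i] / col_sums[i]
    false_hits = sum(matrix[i][j] for j in range(n) if j != i)
    other_tests = sum(col_sums[j] for j in range(n) if j != i)
    return tp, false_hits / other_tests

# Counts copied from the confusion matrix in the Results section
# (rows: detected class, columns: true class).
M = [
    [12, 2, 1, 1, 5],
    [0, 10, 0, 0, 0],
    [2, 1, 13, 2, 2],
    [1, 2, 1, 12, 1],
    [0, 0, 0, 1, 7],
]

for i, name in enumerate(["Shape 1", "Shape 2", "Shape 3", "Shape 4", "Not Exist"]):
    tp, fp = rates(M, i)
    print(f"{name}: TP = {tp:.2f}, FP = {fp:.2f}")
```

Note that the Shape 4 column of the table sums to 16 while the other columns sum to 15, so values computed this way can differ slightly from reported coordinates that assume 15 tests per class.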
Running time per frame is the reciprocal of the detection rate.
Special requirements, instructions, and explanations:
1. The hand shape in the captured raw image should be the same size as the templates.
2. The room lighting should be cold light. Warm light affects the detection result.
3. A face visible to the camera may affect the result, especially when there is no hand shape in front of the camera.
4. We use the template matching function in OpenCV. A self-implemented version is too slow to use.
5. We use the correlation coefficient method to compute the template match.
6. If the room is so dark that skin color cannot be detected, the detection result is affected.
Results
| | True Shape 1 | True Shape 2 | True Shape 3 | True Shape 4 | True Not Exist |
|---|---|---|---|---|---|
| Detected Shape 1 | 12 | 2 | 1 | 1 | 5 |
| Detected Shape 2 | 0 | 10 | 0 | 0 | 0 |
| Detected Shape 3 | 2 | 1 | 13 | 2 | 2 |
| Detected Shape 4 | 1 | 2 | 1 | 12 | 1 |
| Detected Not Exist | 0 | 0 | 0 | 1 | 7 |
ROC coordinates are listed as (TP rate, FP rate).
Shape 1 ROC coord = (0.80, 0.15)
Shape 2 ROC coord = (0.67, 0.00)
Shape 3 ROC coord = (0.87, 0.12)
Shape 4 ROC coord = (0.80, 0.08)
Not Exist ROC coord = (0.47, 0.02)
Frames per second = 2 (i7-6550U, 2.2 GHz)
Estimated running time = 0.5 s per cycle
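| Trial | Real-time Image |
|---|---|
| Shape 1 - Hand Closed | (image not available) |
| Shape 2 - Hand Opened | (image not available) |
| Shape 3 - Thumb Down | (image not available) |
| Shape 4 - Thumb Up | (image not available) |
| Report Terminal | (image not available) |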
Discussion
Discuss your method and results:
This method is generally successful but still has some limitations. Ideally, the program would find my hand at any distance from the camera and locate it more precisely, even under poor conditions (warm lighting and many skin-colored objects in the background). The speed of the method should also be improved; the current version is too slow.
Advantages:
- The method still detects hand shapes well even when there is a face in the image.
- The background does not affect the result much, because the method removes the background before matching.
- When there is no hand in front of the camera, the method does not easily mistake other objects for a hand shape.
Disadvantages:
- It is affected by the light color and by other objects whose color is similar to skin.
- It takes a long time to run (because template matching runs on RGB images, which is more expensive than on grayscale images).
Potential future work:
- Use grayscale images rather than RGB images for template matching. This will improve matching speed.
- Use a better algorithm to build the background mask. Skin color alone is not a good choice for building the mask, because it is easily affected by ambient light. Using motion detection to get a boundary first and then combining it with skin detection would be a good idea.
- A CNN may be a good choice for this problem. It could help with hand rotation and with varying distance. Worth a try.
Conclusions
This method is generally successful, but it fails when the hand is rotated and requires the hand to be at a certain distance from the camera.
Combined with other techniques, it could be implemented better.
Credits and Bibliography
Credit any joint work or discussions with your classmates.