Image Processing

CS 585 HW 2
Shuzhe Luan
Long Guo
02/12/2020

Problem Definition

Give a concise description of current problem. What needs to be solved? Why is the result useful? Do you make any assumptions? What are the anticipated difficulties?

Method and Implementation

Frames were captured from the default camera. First, a bilateral filter was applied to reduce noise while preserving the edges of the image. To detect skin in the video stream, we converted each frame to the YCbCr color space, where Y is the luma component and Cb and Cr are the blue-difference and red-difference chroma components. We then applied a Gaussian blur to the Cr component and used Otsu thresholding to turn it into a binary image. On the binary image, we used findContours on the “white” regions and took the convex hull to determine the boundary of the gesture, which was then cropped and scaled to a predetermined size.
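The thresholding step can be illustrated without OpenCV. The sketch below is not our project code: it computes Cr for a handful of made-up RGB pixels using the conversion formula documented for OpenCV's YCrCb color space, then picks an Otsu threshold in plain Python. The pixel values and the helper names cr_channel and otsu_threshold are assumptions chosen for illustration, and the blur step is omitted.

```python
def cr_channel(r, g, b):
    """Red-difference chroma of one 8-bit RGB pixel (OpenCV's YCrCb formula)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    return min(255, max(0, round((r - y) * 0.713 + 128)))


def otsu_threshold(values):
    """Return the threshold t in 0..255 that maximizes between-class variance.

    Pixels with value > t are treated as foreground ("white").
    """
    hist = [0] * 256
    for v in values:
        hist[v] += 1
    total = len(values)
    total_sum = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels at or below the candidate threshold
    sum0 = 0  # sum of their intensities
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue
        mean0 = sum0 / w0
        mean1 = (total_sum - sum0) / w1
        # Between-class variance, up to a constant factor of 1 / total**2.
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


# Hypothetical pixels: three skin-like tones, then three bluish background tones.
pixels = [(224, 172, 105), (198, 134, 66), (241, 194, 125),
          (40, 60, 200), (30, 90, 180), (20, 40, 120)]
cr = [cr_channel(*p) for p in pixels]
t = otsu_threshold(cr)
mask = [255 if v > t else 0 for v in cr]
```

Skin tones tend to have a higher Cr than most backgrounds, which is why thresholding this single channel can separate the hand when the scene is simple. When anything else in the frame has a similar red-difference value, the two classes overlap and the threshold is less reliable.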

We also tried a background subtractor to improve performance in more general environments. We used createBackgroundSubtractorMOG2, a Gaussian mixture-based background/foreground segmentation algorithm that uses a history of frames to compute a foreground mask. However, during testing we found that it added a lot of noise on clean backgrounds and did not perform well in general environments, which made skin detection even harder.

Gesture Recognition

After cropping the gesture from the captured frame, we compared it with pre-stored gestures (thumbs up, thumbs down, one, victory, five) using matchTemplate to compute a correlation coefficient, and returned the gesture with the highest score. matchTemplate offers several matching methods, and we found that TM_CCORR_NORMED gave the best correlation results.
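To make the matching score concrete, the sketch below computes normalized cross-correlation in plain Python for the case where the crop and the template have the same size, which is the situation after scaling to a predetermined size. It is an illustration under assumptions rather than our implementation: the 3x3 masks are made up, and ccorr_normed and classify are hypothetical helper names.

```python
import math


def ccorr_normed(image, template):
    """Normalized cross-correlation of two equal-size grayscale images.

    Both arguments are lists of rows of pixel intensities. The score is
    sum(I * T) / sqrt(sum(I^2) * sum(T^2)), the quantity TM_CCORR_NORMED
    reports when image and template have identical dimensions.
    """
    num = sum_i2 = sum_t2 = 0.0
    for img_row, tpl_row in zip(image, template):
        for i, t in zip(img_row, tpl_row):
            num += i * t
            sum_i2 += i * i
            sum_t2 += t * t
    denom = math.sqrt(sum_i2 * sum_t2)
    return num / denom if denom else 0.0


def classify(crop, templates):
    """Return (label, score) of the stored template that best matches crop."""
    scores = {label: ccorr_normed(crop, tpl) for label, tpl in templates.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical 3x3 binary masks standing in for the stored gesture images.
templates = {
    "one": [[0, 255, 0], [0, 255, 0], [0, 255, 0]],
    "five": [[255, 0, 255], [255, 255, 255], [0, 255, 0]],
}
crop = [[0, 255, 0], [0, 255, 0], [0, 200, 0]]
label, score = classify(crop, templates)
```

Because this score does not subtract the mean intensity, a crop that is mostly white correlates fairly well with any template. TM_CCOEFF_NORMED removes the means first, which is one of the alternative modes worth comparing.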


Describe your experiments, including the number of tests that you performed, and the relevant parameter values.

Define your evaluation metrics, e.g., detection rates, accuracy, running time.


With five candidate gestures, we achieved good accuracy on clean backgrounds (with no other body parts in view). We can also detect whether a gesture is present in the frame. Because the algorithm classifies every frame (30 fps), it is difficult to compute accuracy and recall over the live stream, and measuring them on static images would not reflect live performance.


Trial Template Image Result Image
trial 1: Thumbs up
trial 2: Thumbs down
trial 3: One
trial 4: Victory
trial 5: Five

Confusion Matrix

Accuracy = 52.4%
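
For reference, overall accuracy is the sum of the diagonal of the confusion matrix divided by the total number of trials. The sketch below shows that calculation on a made-up 3x3 matrix; the counts are purely illustrative and are not the results reported above, and accuracy and per_class_recall are hypothetical helper names.

```python
def accuracy(confusion):
    """Fraction of trials on the diagonal (rows = true label, columns = predicted)."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return correct / total if total else 0.0


def per_class_recall(confusion):
    """Recall for each true class: diagonal entry divided by its row sum."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(confusion)]


# Illustrative counts only, NOT the measured results of this report.
toy = [
    [4, 1, 0],
    [2, 3, 0],
    [1, 1, 3],
]
acc = accuracy(toy)
recalls = per_class_recall(toy)
```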


Discuss your method and results:


Our algorithm works well when the background is simple and no other body parts are visible to the camera. To handle other cases, however, we need to change our classification algorithm and try other methods to find the face, possibly a CNN.

Credits and Bibliography