Assignment Title

CS 585 HW 2
Yize Xie U14485891
Date 02/12/2020

Problem Definition

Design and implement algorithms that recognize hand shapes (such as making a fist, thumbs up, thumbs down, pointing with an index finger etc.) or gestures (such as waving with one or both hands, swinging, drawing something in the air etc.) and create a graphical display that responds to the recognition of the hand shapes or gestures.

Method and Implementation

We use template matching to recognize different hand shapes. As the matching criterion, we chose the normalized correlation coefficient introduced in lecture, which is robust because it is insensitive to uniform changes in brightness and contrast. Our algorithm has several stages.

Preprocessing:

We chose 5 templates: thumb-up, thumb-down, yay, palm, and fist. Each is converted to a grayscale image using our skin detection algorithm (introduced in lab), which uses the RGB values to decide whether a pixel belongs to human skin.

That is, we convert each skin pixel to its grayscale value and set each background pixel to 0 (black). For example:

Before processing:
[Image: fist template before processing]
After processing:
[Image: fist template after processing]
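
The preprocessing step can be sketched in a few lines of Python. This is a minimal illustration and not the code used in the assignment: the report only says the skin test uses RGB values, so the specific thresholds in `is_skin` (a commonly used RGB rule) and the luma weights for the gray conversion are assumptions, and the function names are hypothetical.

```python
def is_skin(r, g, b):
    """Rule-based RGB skin test. The thresholds are assumed, not taken from the report."""
    return (
        r > 95 and g > 40 and b > 20
        and max(r, g, b) - min(r, g, b) > 15
        and abs(r - g) > 15
        and r > g and r > b
    )


def skin_to_gray(image):
    """Map skin pixels to a gray level and every other pixel to 0 (black).

    `image` is a list of rows; each row is a list of (r, g, b) tuples in 0..255.
    """
    gray = []
    for row in image:
        gray.append([
            # Standard luma weights for RGB-to-gray conversion (assumed here).
            round(0.299 * r + 0.587 * g + 0.114 * b) if is_skin(r, g, b) else 0
            for (r, g, b) in row
        ])
    return gray
```

Applying the same function to both the templates and the camera frames keeps the two comparable, since the background is black in both.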

Template matching:

We capture a frame from the user's camera every 1/30 s, which ideally gives 30 FPS. For each captured frame:

  1. We first convert it to grayscale with the same skin detection algorithm used in preprocessing.
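  2. We calculate the normalized correlation r with each template using the following formula, where s_i and m_i are the i-th pixel values of the captured image and the template, \bar{s} and \bar{m} are their means, \sigma_s and \sigma_m are their standard deviations, and n is the number of pixels:
     $r = \frac{1}{n} \frac{\sum_{i=1}^{n} (s_i - \bar{s})(m_i - \bar{m})}{\sigma_s \sigma_m}$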
  3. We select the best match. If its correlation coefficient is greater than a preset threshold, we report that the gesture of the corresponding template is detected.
  4. We draw a rectangle that reflects our detection. Each gesture is shown with a different rectangle color. For example, thumb-down is red and palm is blue.
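
Steps 2 and 3 can be sketched as follows. This is a hedged illustration under stated assumptions, not the assignment's implementation: it compares one image patch with templates of the same size, computes the correlation coefficient as the covariance divided by the product of the two standard deviations, and uses a hypothetical threshold of 0.5 because the report does not state the value. The names `normalized_correlation` and `best_match` are made up for this example.

```python
import math


def normalized_correlation(patch, template):
    """Correlation coefficient in [-1, 1] between two equal-size grayscale images."""
    s = [v for row in patch for v in row]
    m = [v for row in template for v in row]
    if len(s) != len(m):
        raise ValueError("patch and template must have the same size")
    n = len(s)
    mean_s = sum(s) / n
    mean_m = sum(m) / n
    sigma_s = math.sqrt(sum((v - mean_s) ** 2 for v in s) / n)
    sigma_m = math.sqrt(sum((v - mean_m) ** 2 for v in m) / n)
    if sigma_s == 0 or sigma_m == 0:
        return 0.0  # a constant image has no pattern to correlate with
    cov = sum((a - mean_s) * (b - mean_m) for a, b in zip(s, m)) / n
    return cov / (sigma_s * sigma_m)


def best_match(patch, templates, threshold=0.5):
    """Return (label, r) for the best template, or (None, r) if r is not above threshold.

    `templates` maps a gesture name to a grayscale image; 0.5 is an assumed threshold.
    """
    label, score = max(
        ((name, normalized_correlation(patch, t)) for name, t in templates.items()),
        key=lambda pair: pair[1],
    )
    return (label if score > threshold else None), score
```

Because the mean is subtracted and the result is divided by both standard deviations, a template that is uniformly brighter or higher in contrast than the patch still scores 1.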

Sample detection:
[Image: Screenshot 2020-02-12 11.11.39 PM]

Experiments

I conducted the following experiment: I tested the program with 4 different true gestures, 10 times each, and recorded the detected result.

Results

The confusion matrix is shown below. Columns are true labels and rows are detected labels, both in the order thumb-up, thumb-down, palm, fist.

              thumb-up  thumb-down  palm  fist
  thumb-up           6           1     0     0
  thumb-down         4           8     3     4
  palm               …
  fist               …
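
As a rough worked example, precision and recall for the first two gestures can be computed from the matrix. The sketch assumes the values 6, 1, 0, 0 and 4, 8, 3, 4 are the detected thumb-up and detected thumb-down rows, and that each true gesture was shown 10 times as described in the experiment. The helper names are hypothetical.

```python
TRIALS_PER_GESTURE = 10  # from the experiment description

# Detected-label rows; columns are the true labels in the order
# thumb-up, thumb-down, palm, fist. Reading these values as rows is an assumption.
detected_rows = {
    "thumb-up": [6, 1, 0, 0],
    "thumb-down": [4, 8, 3, 4],
}
column_of = {"thumb-up": 0, "thumb-down": 1}


def precision(label):
    """Fraction of detections of `label` whose true gesture was `label`."""
    row = detected_rows[label]
    return row[column_of[label]] / sum(row)


def recall(label):
    """Fraction of the true `label` trials that were detected as `label`."""
    return detected_rows[label][column_of[label]] / TRIALS_PER_GESTURE


for name in detected_rows:
    print(f"{name}: precision={precision(name):.2f}, recall={recall(name):.2f}")
```

Under these assumptions thumb-down has a recall of 8/10 but a precision of only 8/19, since 11 of its 19 detections came from other gestures.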

We also compared our computation with the OpenCV library function matchTemplate, and the two give the same output. However, OpenCV's implementation is much faster because it uses the discrete Fourier transform to accelerate the multiplication step.
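
The speed-up can be motivated by the correlation theorem, stated here as a sketch. The notation is an assumption for this example: I is the zero-padded image, T the zero-padded real-valued template, \mathcal{F} the discrete Fourier transform, and the shift is taken circularly.

```latex
(I \star T)[u] \;=\; \sum_{x} I[x + u]\, T[x]
\;=\; \mathcal{F}^{-1}\!\left\{ \mathcal{F}\{I\} \cdot \overline{\mathcal{F}\{T\}} \right\}[u]
```

Evaluating the sum directly costs on the order of N M multiplications for an image with N pixels and a template with M pixels, while the transform route costs on the order of N log N with a fast Fourier transform.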

Discussion

Interestingly, our algorithm is somewhat negative and pessimistic: it tends to recognize everything as thumb-down, which may be caused by the lighting conditions or by our templates.

Credits and Bibliography

My teammates are Wenxing Liu and Weifan Chen.