Assignment 2

CS 585 HW 2
Sameer Chaturvedi
February 12, 2020

Problem Definition

This part of the homework involves programmatically designing and implementing algorithms that recognize hand shapes or gestures, and creating a graphical display that responds to the recognition of the hand shapes or gestures.
The algorithm should detect at least four different hand shapes or gestures. It must use skin-color detection and binary image analysis (e.g. centroids, orientation, etc.) to distinguish hand shapes or gestures.

Method and Implementation

Here, I implemented the program in Python, adapting the helper skeleton code given to us in C++ from lab.
I analyzed a live video feed from the webcam to solve the problem mentioned above.

Some notable techniques used:
1) template matching (created templates of three static hand positions, and one dynamic gesture)
2) background differencing: D(x,y,t) = |I(x,y,t)-I(x,y,0)|
3) frame-to-frame differencing: D'(x,y,t) = |I(x,y,t)-I(x,y,t-1)|
4) motion energy templates (union of binary difference images over a window of time)
5) skin-color detection (thresholding pixel values)
6) horizontal and vertical projections to find bounding boxes of "movement blobs" or "skin-color blobs"
7) tracking the position and orientation of moving objects
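Two of these building blocks, frame-to-frame differencing and skin-color thresholding, can be sketched in plain NumPy. This is a minimal illustration, not my actual code: the difference threshold and the RGB skin rule below are illustrative values, not the ones tuned in my program.

```python
import numpy as np

def frame_difference(curr, prev, thresh=30):
    """Binary frame-to-frame difference: D'(x,y,t) = |I(x,y,t) - I(x,y,t-1)|,
    thresholded to a 0/1 motion mask."""
    diff = np.abs(curr.astype(np.int16) - prev.astype(np.int16))
    return (diff > thresh).astype(np.uint8)

def skin_mask(bgr):
    """Simple per-pixel RGB skin rule (illustrative thresholds):
    R > 95, G > 40, B > 20, with red dominant over green and blue."""
    b = bgr[..., 0].astype(np.int16)
    g = bgr[..., 1].astype(np.int16)
    r = bgr[..., 2].astype(np.int16)
    return ((r > 95) & (g > 40) & (b > 20) & (r > g) & (r > b)).astype(np.uint8)
```

Background differencing is the same operation as frame_difference, but with the first (or averaged) frame held fixed as the reference image.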

The program handles 3 static hand signs:
1) A closed fist
2) An open palm
3) A peace sign (two fingers held up)
It also handles one dynamic hand gesture: hand wave
The program is executed by running "python static" to obtain results for the three static hand signs, and "python dynamic" for the hand gesture.

static hand signs:

First off, pre-captured templates are loaded in and converted to black and white, and resized appropriately. The video camera is started, and the program reads in the current frame and overlays a rectangle in which the hand sign is recognized.
Then, I use background subtraction (averaging the first 30 frames of the sequence to model the background) to isolate the hand region for detection.
A contour is drawn around the hand, with a bounding box that follows its shape.
OpenCV's matchTemplate function is run against each of the three templates to score how closely the user's hand sign matches it. These scores are stored in an array, and the maximum score determines the final result, which is displayed in the same live video feed if the value is above a certain threshold.
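The scoring-and-picking step can be sketched as follows. The score here is the normalized correlation coefficient, the statistic behind cv2.matchTemplate's TM_CCOEFF_NORMED mode; the function names and the 0.5 threshold are illustrative, not taken from my code.

```python
import numpy as np

def match_score(region, template):
    """Normalized correlation coefficient between an image region and a
    same-sized template (what TM_CCOEFF_NORMED computes at one location)."""
    a = region.astype(np.float64) - region.mean()
    b = template.astype(np.float64) - template.mean()
    denom = np.sqrt((a * a).sum() * (b * b).sum())
    return float((a * b).sum() / denom) if denom else 0.0

def classify(region, templates, threshold=0.5):
    """Score the region against each named template; return the best-scoring
    label if it clears the threshold, else None (no sign recognized)."""
    scores = {name: match_score(region, t) for name, t in templates.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None
```

In the real program, OpenCV slides the template over the whole search window and the maximum response is taken, but the per-location score is the same idea.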

dynamic hand gesture:

The video feed is read and resized to the template size. The frame-differencing function is applied to the previous and current frames to detect dynamic motion and ignore the rest. Then, skin-color detection is thresholded and used to match the current frame against our template.
After this, a contour is drawn that bounds the areas in the frame matching the skin color. I then compute the motion history by accumulating the frame differences over a certain number of pairs of frames. One feed displays the motion history, and another shows the bounding box drawn around object movement that matches the skin color.
OpenCV's matchTemplate function is used with the template to score how closely the user's hand wave matches the template hand wave, which was generated separately (using similar techniques from lab). If the score is above a certain threshold, we can inform the user that it is a hand wave.
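The motion-history accumulation amounts to taking the union of binary frame differences over a sliding window of recent frame pairs. A plain-NumPy sketch, with an illustrative window size and threshold (my program's actual values may differ):

```python
import numpy as np
from collections import deque

def motion_energy(frames, thresh=30, window=5):
    """Motion-energy template: for each new frame, OR together the binary
    frame differences of the last `window` frame pairs. Returns one
    accumulated 0/1 energy image per frame pair."""
    diffs = deque(maxlen=window)   # rolling buffer of binary difference images
    energy = []
    prev = frames[0]
    for curr in frames[1:]:
        d = np.abs(curr.astype(np.int16) - prev.astype(np.int16)) > thresh
        diffs.append(d)
        energy.append(np.logical_or.reduce(list(diffs)).astype(np.uint8))
        prev = curr
    return energy
```

The final energy image is what gets matched against the wave template: a sweeping hand leaves a wide band of motion, while a static hand leaves almost nothing.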


Here are the template images used:

1. Palm:

2. Peace:

3. Fist Sign:

4. Hand Wave:

Here are some results (please look at the top left of the image to see what the program indicates the hand sign is):

But sometimes, a wrong result occurs due to some errors in contour detection and threshold values:


My program did a fairly good job of detecting hand signs and gestures, provided there was enough contrast in the images (a dark background works best).
Being able to detect hand positions and general motion like this is a powerful image-analysis capability!

Credits and Bibliography

Assignment 1

CS 585 HW 1
Sameer Chaturvedi
January 29, 2020

Problem Definition

This part of the homework involves programmatically modifying an image of a face. It has three parts:
1. Create a grayscale image of your face by converting your color image using one of the conversions we discussed in class last week.
2. Flip your face image horizontally, i.e. left to right, right to left.
3. Come up with a third way of manipulating your face that produces an interesting output. For example, you may create a blurred image of your grayscale face by assigning to each pixel the average grayscale pixel value of itself and its 8 neighbors. Hint: You may have to run your program a few times to make the blurring noticeable.

Method and Implementation

Here, I implemented the program in C++ and analyzed the image (my face) as a matrix of pixel values. These values were manipulated to solve the three parts mentioned above.

1. The grayscale() function converts the image into grayscale by taking the BGR pixel values and using them to calculate the grayscale value using the formula V = (B+G+R)/3.
2. The flip() function works by swapping pixel values from the first half of the image to its complement in the other half.
3. Here, I chose to mirror the face to create a new face that looks like a one-eyed cyclops, and then tinted the whole image red.
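All three manipulations reduce to simple per-pixel array operations. My actual implementation is in C++, but the idea can be sketched compactly in NumPy; the function names (including red_cyclops) and the halve-blue-and-green red tint are illustrative, not lifted from my code.

```python
import numpy as np

def to_grayscale(bgr):
    """V = (B + G + R) / 3 for each pixel."""
    return (bgr.sum(axis=2) // 3).astype(np.uint8)

def flip_horizontal(img):
    """Swap each pixel with its horizontal complement (left-right flip)."""
    return img[:, ::-1]

def red_cyclops(bgr):
    """Mirror the left half of the face onto the right half, then tint red
    by halving the blue and green channels (illustrative tint)."""
    out = bgr.copy()
    w = bgr.shape[1]
    out[:, w - w // 2:] = bgr[:, :w // 2][:, ::-1]  # reflected left half
    out[..., :2] //= 2                              # dim B and G -> red cast
    return out
```

Repeatedly applying a 3x3 neighborhood average, as the assignment's blur hint suggests, is the same kind of per-pixel operation with a small window instead of a single pixel.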


Below are the resultant images generated by my code for the source image for each part:


1. Grayscale:

2. Flip:

3. Red Cyclops:


Images can be analyzed as matrices of pixel values, and these values can be programmatically manipulated to perform, say, face manipulation.

Credits and Bibliography