CS585 Assignments

Assignment 1

The original image was converted to grayscale using the ITU-R 601-2 luma transform. Edge detection was then performed by convolving the grayscale image with a 3x3 edge detection kernel.
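The two steps can be sketched with NumPy alone. The luma weights below are the ITU-R 601-2 ones (L = 0.299 R + 0.587 G + 0.114 B). The kernel is an assumption, since the write-up does not say which 3x3 edge kernel was used; the function names and the edge-replicated padding are also illustrative choices, not the submitted code.

```python
import numpy as np

# ITU-R 601-2 luma weights for the R, G and B channels.
LUMA_WEIGHTS = np.array([0.299, 0.587, 0.114])

# Assumed kernel: a common Laplacian-style edge kernel whose entries sum to zero.
# The original write-up does not specify the kernel values.
EDGE_KERNEL = np.array([[-1, -1, -1],
                        [-1,  8, -1],
                        [-1, -1, -1]], dtype=float)


def to_grayscale(rgb):
    """Convert an (H, W, 3) RGB array to an (H, W) float luma array."""
    return rgb[..., :3].astype(float) @ LUMA_WEIGHTS


def convolve3x3(gray, kernel):
    """Convolve with a 3x3 kernel; edge-replicated padding keeps the input size."""
    flipped = kernel[::-1, ::-1]  # true convolution flips the kernel
    padded = np.pad(gray, 1, mode="edge")
    h, w = gray.shape
    out = np.zeros((h, w), dtype=float)
    for dy in range(3):
        for dx in range(3):
            out += flipped[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def detect_edges(rgb):
    """Grayscale conversion followed by the edge convolution, as an 8-bit image."""
    response = convolve3x3(to_grayscale(rgb), EDGE_KERNEL)
    return np.clip(np.abs(response), 0, 255).astype(np.uint8)
```

Because the assumed kernel sums to zero, flat regions map to 0 and only pixels next to an intensity change produce a response.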

Assignment 2

Team Members: Rishab Nayak, Jason Li

Problem Definition

The goal of A2 is to design and implement algorithms that recognize hand shapes or gestures, and to create a graphical display that responds to the recognized shapes or gestures. This helps us learn how to read camera frames in real time, perform computations on them, analyze image features, and tie it all together in a graphical application. I anticipated difficulties in detecting skin color, segmenting the image, and distinguishing hand shapes. I recognized only static hand shapes, not gestures.

Method and Implementation

I implemented an algorithm that counts the fingers visible in the camera frame, provided more than one finger is shown, using convexity defects and the cosine theorem.

The algorithm only works when the hand is the only object in the frame. First, I perform skin detection by converting the image to the HSV color space and using the cv2.inRange function to segment out the skin-colored areas of the frame.

I then apply a 10x10 blur to the image (to improve the thresholding output) and threshold the result to obtain a binary image. I find contours on this binary image using the cv2.RETR_EXTERNAL retrieval mode, so that only the outermost contours (such as the outline of the hand) are returned and contours nested inside them are ignored.

Then, I find the largest contour, which ideally is the outline of the hand, and calculate its bounding rectangle and convex hull. I use cv2.convexityDefects to find the defects in the convex hull (the gaps between fingers), and then use the cosine theorem to calculate the angle at each defect. I assume that if the angle is less than 90 degrees, the defect marks a finger.

I then display the number of fingers along with the detected hull on the graphical display using cv2.putText.

Results

The algorithm was tested on hand shapes showing 5, 4, 3 and 2 fingers, and classified 31 of the 40 tests correctly (77.5%). The confusion matrix below covers the 40 tests, 10 per shape. Rows represent true values and columns represent predicted values (each row sums to the 10 tests for that shape).

True \ Predicted   5   4   3   2
5                  9   1   0   0
4                  0   8   1   1
3                  0   2   7   1
2                  0   0   3   7
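
A short sketch of how a per-shape breakdown can be read off the matrix. It treats each row as one true shape, which is an assumption based on every row summing to the 10 tests per shape; the function names are illustrative.

```python
LABELS = [5, 4, 3, 2]

# Confusion matrix copied from the table; each row is assumed to be one true shape.
CONFUSION = [
    [9, 1, 0, 0],
    [0, 8, 1, 1],
    [0, 2, 7, 1],
    [0, 0, 3, 7],
]


def overall_accuracy(cm):
    """Fraction of all tests that land on the diagonal."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total


def per_shape_rate(cm, labels):
    """Fraction of each row's tests that were classified as that row's shape."""
    return {label: cm[i][i] / sum(cm[i]) for i, label in enumerate(labels)}


print(overall_accuracy(CONFUSION))        # 0.775
print(per_shape_rate(CONFUSION, LABELS))  # {5: 0.9, 4: 0.8, 3: 0.7, 2: 0.7}
```

The breakdown shows the error concentrating on the 3 and 2 finger shapes, which are most often confused with each other.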