# Assignment 4

CS 585 HW 4
Jiangshan Luo
Teammates: Shijie Zhao, Jamie Nelson
Oct. 31, 2018

## Problem Definition

There are three main parts to this assignment:

1) In the first part of the assignment, we are given frames of a person playing the piano. Our task is to segment the left and right hands in each frame.

The second and third parts of this assignment involve tracking objects in a video.

2) In the second part, we are given the frames of a video of bats flying through the sky. We are also given the segmentation of those bats as (x, y) coordinates in a CSV file. We need to use a Kalman filter to track the bats through each frame.

3) Finally, in the third part, we are given the frames of a video of cells moving in a petri dish. We have to segment the cells ourselves and use a Kalman filter to track each cell through the frames of the video.

For each of these sections, we have to create a visual representation of our results.

## Method and Implementation

### Part 1 - Segmentation of Hands

1. Pre-processing

   First, we load all the image frames into a list of matrices. Then we compute the per-pixel average over all frames and store it as a new image. The average image is shown below:
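A minimal sketch of the averaging step, assuming the frames are NumPy arrays of identical shape. The function name `average_image` and the tiny stand-in frames are hypothetical and are not taken from our source code.

```python
import numpy as np

def average_image(frames):
    """Return the per-pixel mean of a list of equally sized frames."""
    # Stack into one (num_frames, H, W[, C]) array and average in floating
    # point so that summing many uint8 frames cannot overflow.
    stack = np.stack(frames).astype(np.float64)
    return stack.mean(axis=0)

# Hypothetical stand-in data: three tiny 2x2 grayscale "frames".
frames = [np.full((2, 2), value, dtype=np.uint8) for value in (10, 20, 60)]
avg = average_image(frames)
```

Averaging in floating point matters here because the frames are usually stored as 8-bit integers.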

2. Segmentation

   Segmentation has two major steps. First, we compute the difference between each frame and the average image to obtain the motion energy in that frame. This removes the static regions (see the figure below).

   As the figure shows, many noise blobs and irrelevant body regions remain. Second, to isolate the hand regions, we apply skin color detection to the result (figure below). Finally, we keep the three largest blobs, which are the hand and head regions, since these regions are significantly larger than the noise blobs.
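The motion-energy and blob-selection steps can be sketched roughly as follows. This is an illustration under stated assumptions, not our actual code: the threshold of 30, the 4-connectivity, and the names `motion_mask` and `largest_blobs` are all hypothetical, and the skin color detection step is left out.

```python
from collections import deque
import numpy as np

def motion_mask(frame, average, threshold=30):
    """Mark pixels whose absolute difference from the average image is large."""
    diff = np.abs(frame.astype(np.float64) - average.astype(np.float64))
    if diff.ndim == 3:            # color frame: take the strongest channel
        diff = diff.max(axis=2)
    return diff > threshold

def largest_blobs(mask, k=3):
    """Return the k largest 4-connected regions as lists of (row, col)."""
    seen = np.zeros(mask.shape, dtype=bool)
    rows, cols = mask.shape
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r, c] or seen[r, c]:
                continue
            # Breadth-first flood fill from this unvisited foreground pixel.
            queue, blob = deque([(r, c)]), []
            seen[r, c] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    inside = 0 <= ny < rows and 0 <= nx < cols
                    if inside and mask[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return sorted(blobs, key=len, reverse=True)[:k]

# Hypothetical stand-in data: an empty average image and one frame with
# four moving regions of different sizes.
average = np.zeros((6, 8), dtype=np.uint8)
frame = average.copy()
frame[4:6, 4:7] = 200   # 6-pixel blob
frame[0:2, 0:2] = 200   # 4-pixel blob
frame[3, 0:2] = 200     # 2-pixel blob
frame[0, 5] = 200       # 1-pixel noise blob
blobs = largest_blobs(motion_mask(frame, average), k=3)
```

In this toy frame the single-pixel noise blob is the one that gets dropped, which mirrors the reasoning that the hand and head regions are much larger than the noise.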

3. Visual representation

   Bounding boxes are drawn with a label above each one identifying the hand (left, right, or overlapping hands). The example shown below is a special case in which the two hands overlap.
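As a rough illustration of the box computation, the sketch below derives a bounding box from a boolean blob mask and checks whether two boxes overlap. Both helper names are hypothetical, and the box-overlap test is only one plausible way to detect the overlapping-hands case; it is not necessarily how our code decides the label.

```python
import numpy as np

def bounding_box(mask):
    """Return (x_min, y_min, x_max, y_max) of the True pixels in a mask."""
    ys, xs = np.nonzero(mask)
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

def boxes_overlap(a, b):
    """True if two (x_min, y_min, x_max, y_max) boxes share any pixel."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

# Hypothetical stand-in blob: rows 1-2, columns 2-4 of a 5x6 mask.
mask = np.zeros((5, 6), dtype=bool)
mask[1:3, 2:5] = True
box = bounding_box(mask)
```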

### Part 2 - Tracking Bats

1. Pre-processing

   Load the frames and the localization data and store them in data structures (Python lists and pandas DataFrames).

2. Tracking

   - Prediction - Kalman Filter: We implemented a Kalman filter. Many exceptions and special conditions need to be handled. For example, if an object is moving fast enough that it is estimated to be out of the frame, and it has no measurement in the next frame, its track is discarded.
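For reference, a constant-velocity Kalman filter over (x, y) measurements might look like the sketch below. The state layout `[x, y, vx, vy]`, the noise values, the initial uncertainty, and the names `ConstantVelocityKalman` and `in_frame` are assumptions made for illustration; they are not taken from our `Kalman_Filter` implementation.

```python
import numpy as np

class ConstantVelocityKalman:
    """Kalman filter over the state [x, y, vx, vy] with (x, y) measurements."""

    def __init__(self, x, y, process_var=1.0, measurement_var=1.0):
        self.state = np.array([x, y, 0.0, 0.0], dtype=float)
        self.P = np.eye(4) * 10.0                  # initial uncertainty (assumed)
        self.F = np.array([[1, 0, 1, 0],           # x += vx, one frame per step
                           [0, 1, 0, 1],           # y += vy
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)
        self.H = np.array([[1, 0, 0, 0],           # only the position is measured
                           [0, 1, 0, 0]], dtype=float)
        self.Q = np.eye(4) * process_var           # prediction (process) noise
        self.R = np.eye(2) * measurement_var       # measurement noise

    def predict(self):
        """Advance the state one frame and return the predicted (x, y)."""
        self.state = self.F @ self.state
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.state[:2].copy()

    def update(self, measurement):
        """Correct the state with an associated (x, y) measurement."""
        innovation = np.asarray(measurement, dtype=float) - self.H @ self.state
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.state = self.state + K @ innovation
        self.P = (np.eye(4) - K @ self.H) @ self.P

def in_frame(point, width, height):
    """True if an (x, y) point lies inside a width x height frame."""
    return 0 <= point[0] < width and 0 <= point[1] < height

# Hypothetical object moving 2 px right and 1 px down per frame.
kf = ConstantVelocityKalman(0, 0, process_var=0.01, measurement_var=0.1)
for t in range(1, 21):
    kf.predict()
    kf.update((2 * t, t))
pred = kf.predict()   # prediction for frame 21, close to (42, 21)
```

A track whose prediction fails `in_frame` and that receives no measurement in the next frame would be the case in which the track is dropped.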

   - Data Association - Hungarian/Greedy Algorithm: First, we implemented a greedy algorithm to match the predictions with the measurements found in the image. We used the Euclidean distance between the two sets of (x, y) values to build a cost matrix. Given the cost matrix, we formed the pairs by assigning the first prediction its lowest-cost measurement and then continuing through the list of predictions in order.
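A minimal sketch of this greedy matching is given below. The function names mirror `calculate_cost` and `greedy` from our function list, but the bodies are reconstructions for illustration only. In particular, skipping measurements that are already taken is an assumption.

```python
import numpy as np

def calculate_cost(predictions, measurements):
    """Euclidean distance between every prediction and every measurement."""
    p = np.asarray(predictions, dtype=float)[:, None, :]    # shape (P, 1, 2)
    m = np.asarray(measurements, dtype=float)[None, :, :]   # shape (1, M, 2)
    return np.linalg.norm(p - m, axis=2)                     # shape (P, M)

def greedy(cost):
    """Visit predictions in order; each takes its nearest free measurement."""
    pairs, taken = [], set()
    for i, row in enumerate(cost):
        free = [j for j in range(len(row)) if j not in taken]
        if not free:
            break                       # more predictions than measurements
        j = min(free, key=lambda col: row[col])
        taken.add(j)
        pairs.append((i, j))
    return pairs

# Hypothetical stand-in points: two predictions and three measurements.
predictions = [(0, 0), (10, 0)]
measurements = [(9, 0), (1, 0), (30, 30)]
cost = calculate_cost(predictions, measurements)
pairs = greedy(cost)
```

Here the third measurement stays unmatched, which in a tracker would typically start a new track.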

We also implemented the Hungarian algorithm using a library. However, it was slow because of its O(n^3) time complexity, and its pairings were no better than those of the greedy algorithm. This is why we decided to use the greedy algorithm.
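For comparison, an optimal assignment can be obtained with SciPy's `linear_sum_assignment`. We assume SciPy here for illustration; the report does not name the library we used. The cost matrix below is contrived so that the two methods disagree, and it says nothing about how often such cases occur in the bat data.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Hypothetical cost matrix: rows are predictions, columns are measurements.
cost = np.array([[1.0, 2.0],
                 [1.5, 10.0]])

# Optimal assignment: minimizes the summed cost over all pairs.
rows, cols = linear_sum_assignment(cost)
optimal_pairs = list(zip(rows.tolist(), cols.tolist()))
optimal_total = float(cost[rows, cols].sum())

# Greedy in row order: row 0 takes column 0 (cost 1.0), which forces
# row 1 onto column 1 (cost 10.0).
greedy_pairs = [(0, 0), (1, 1)]
greedy_total = float(sum(cost[i, j] for i, j in greedy_pairs))
```

When objects are well separated, the nearest measurement for each prediction is usually unambiguous and both methods return the same pairs.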

3. Visual representation

   After tracking, we have a list of time-ordered sequences, each containing the trace of one object across the image frames. We then use `cv2.line()` to draw the trace of each object step by step. Finally, we write the visualized frames to a video, which is provided in the Results section.

### Part 3 - Tracking Cells

1. Pre-processing

   Same as Part 2, except that only the cell images are available, so a segmentation step is required.

2. Segmentation

   To segment the cells, we used the OpenCV function `findContours`. The function returned small fragments of each cell. We merged the fragments by filling each fragment's bounding box with a solid color and then running `findContours` again. This returned one merged region per area of the image, which we treated as one cell. We then took the center of the bounding box as the (x, y) position used to track the cell.

3. Tracking

   Same as Part 2.

4. Visual representation

   Same as Part 2.

### Functions created

- `skinDetect`: finds the pixels with skin color.
- `three_largest_blobs`: retrieves the three largest blobs in the image.
- `preprocess`: loads the frames and the location data.
- `visualize_track`: draws the traces of the objects in the frames.
- `output_visualization`: generates a video of the output.
- `calculate_cost`: computes the cost matrix for two sets of points.
- `greedy`: gives the first predicted value (the object tracked for the longest time) the highest priority and assigns it the measurement at the smallest distance, then does the same for the remaining predictions in order.
- `data_association`: formulates the 2D assignment problem and obtains a globally optimal assignment.
- `Kalman_Filter`: predicts the measurements and their covariances to estimate the validation gates, then performs tracking by updating the state of each object and its covariance from the assignment result.

## Experiments

The experiments were implemented in Python. The source code can be found here:

The resources we used:

1. Piano Images
2. Bat Images
3. Bats Localization
4. Cell Images

## Results

The results for the three parts are shown below:

## Discussion

- In Part 1, we detected the hand positions in all of the provided image frames, including the overlapping cases. However, it is still difficult to segment the hand blobs completely because of the variance in pixel color. Strategies other than skin color detection might achieve better performance.
- Kalman Filter: We implemented our own Kalman filter, but it is very limited. For example, the validation gate is not implemented, so we have to use the greedy method for data association. There is also no easy way to estimate the several error covariance matrices. Because we use the ground-truth positions of the objects in the bat tracking problem, we set the measurement error to be small. For the cell tracking problem, because we are not sure how the measurement error and the prediction error differ, the measurement and prediction covariance matrices are set to be identical. The implementation is far from perfect because of limited time, but it still produced satisfactory results.
- In Part 3, the segmentation was extremely difficult given the brightness values and the shape of the cells. The edge of each cell was brighter than its interior, which made it hard to distinguish the cell using traditional methods.
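The effect of the covariance choices described in the Kalman Filter item above can be illustrated with a scalar Kalman gain. This is a simplified one-dimensional picture, not our matrix implementation. The function name and the variance values are hypothetical.

```python
def kalman_gain(prediction_var, measurement_var):
    """Scalar Kalman gain: the weight given to the measurement in the update."""
    return prediction_var / (prediction_var + measurement_var)

# Small measurement error (bat data): the update almost copies the measurement.
gain_trusting_measurement = kalman_gain(prediction_var=1.0, measurement_var=0.01)

# Equal variances (cell data): prediction and measurement are weighted equally.
gain_equal = kalman_gain(prediction_var=1.0, measurement_var=1.0)
```

A gain near 1 means the filter follows the measurements closely, while a gain of 0.5 averages the prediction and the measurement.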

## Conclusion

We successfully recognized and visualized the hand positions of the pianist in Part 1. For Parts 2 and 3, the Kalman filter combined with the data association methods works fairly well: we segmented the cells (Part 3) and tracked and visualized the traces of all the detected objects.

## Credits and Bibliography

CS 585 - Lab 7 Solution - Teaching Fellow Yifu Hu
Classmates: Shijie Zhao, Jamie Nelson