Segmentation of hands & Tracking bats and cells

Qian Xiang

Tong Li


10/17/2018 - 10/30/2018




Visual Studio 2017

OpenCV 3.2

Problem Definition for Part 1: Segmentation of hands

In this part, you are given a sequence of video frames in which a person is playing the piano with both hands. Try to develop an algorithm to identify the pianist's hands. Portions of the hands are sometimes in deep shadow, which creates a challenging imaging situation.
1. Pre-process frames to separate the hands from the background and detect skin;
2. Locate the hands with bounding rectangles.

Problem Definition for Part 2: Tracking bats and cells

The goal of this part of the programming assignment is for you to learn more about the practical issues that arise when designing a tracking system. You are asked to track moving objects in video sequences, i.e., identifying the same object from frame to frame:
You may consider two frames at a time (or, more ambitiously, use multiple hypothesis tracking (MHT) with more than two frames).
You may use a greedy, suboptimal bipartite matching algorithm (or, more ambitiously, implement an optimal data association algorithm).
To estimate the state of each tracked object, you may use an alpha-beta filter (or, more ambitiously, a Kalman filter).
Bat Dataset
The bat dataset shows bats in flight, where the bats appear bright against a dark sky. The dataset includes both grayscale and false-color images from this thermal image sequence as well as their segmentation and localization data. The segmentation of the bat dataset is provided in a set of label maps. The detections are given in a comma delimited file, one for each frame.
Cell Dataset
The cell dataset shows cells moving in a petri dish; the brightness of the pixels within the cells is very similar to that of the background. The dataset includes both the original raw data and normalized data. Images in the raw data have different overall brightness levels and contrast.
1. Get the centroids of the objects
2. Predict the position of each object in the next frame from its positions in previous frames using a Kalman filter
3. Associate each predicted position with a real detected object in the next frame using the Hungarian algorithm
4. When an object disappears, we first assume occlusion: if a detection reappears where the Kalman filter predicts, the track jumps there. If this does not happen within 10 frames (this value is tunable), we consider the object lost and stop tracking it.

Method and Implementation For Part 1:

1. Pre-process images to get ROI

Since the scene in the images is static, we used absdiff() to compute the difference between each image and the average of the 19 images (built with add() to sum all images together, then dividing by 19). We convert the difference to grayscale, threshold it, and convert it to a binary image. The binary image then gives the region of interest of the image (the pianist).
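The averaging and differencing step can be sketched on plain pixel buffers (in the actual program this is cv::add, cv::absdiff, and cv::threshold on cv::Mat; here vectors of grayscale values stand in for images):

```cpp
#include <vector>
#include <cstdlib>

// Sketch of the background-subtraction step on grayscale pixel buffers.
using Image = std::vector<int>;  // grayscale pixels, 0..255, row-major

// Per-pixel mean of all frames, as built with add() and a divide.
Image averageImage(const std::vector<Image>& frames) {
    Image avg(frames[0].size(), 0);
    for (size_t p = 0; p < avg.size(); ++p) {
        long sum = 0;
        for (const Image& f : frames) sum += f[p];
        avg[p] = (int)(sum / (long)frames.size());
    }
    return avg;
}

// |frame - background| > thresh -> foreground (255), else background (0).
Image diffMask(const Image& frame, const Image& bg, int thresh) {
    Image mask(frame.size());
    for (size_t p = 0; p < frame.size(); ++p)
        mask[p] = std::abs(frame[p] - bg[p]) > thresh ? 255 : 0;
    return mask;
}
```

Pixels belonging to the static background cancel out in the difference, so only the pianist survives the threshold.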

The average image of 19 images

The difference between the first frame and average

Region of interest of the first image

2. Detect skin

Use the ROI image to detect skin. In this part, we add an extra condition to the skinDetect method: (R - B) > 100. Before adding this condition, many other regions were detected as skin, such as hair, clothes, and the keyboard. Using Photoshop, we found that parts of the hands have a characteristic property: R - B > 100. This condition eliminates much of the noise in the image, and we can always detect some part of both hands (plus some hair).
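The resulting skin test can be sketched as a per-pixel predicate. The base conditions below are a common RGB skin heuristic and may differ from the exact constants in the course's skinDetect; only the (R - B) > 100 term is the addition described above:

```cpp
#include <algorithm>

// Per-pixel skin test. The first group of conditions is a widely used
// RGB skin heuristic (assumed here, not copied from our source file);
// the final (r - b) > 100 term is the extra condition found with
// Photoshop that suppresses hair, clothes, and keyboard pixels.
bool isSkin(int r, int g, int b) {
    bool base = r > 95 && g > 40 && b > 20 &&
                r > g && r > b &&
                (std::max({r, g, b}) - std::min({r, g, b})) > 15;
    return base && (r - b) > 100;
}
```

A hand pixel in light (e.g. R=220, G=140, B=60) passes, while a paler skin-toned pixel with R - B below 100 is rejected.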

Skin of the first image

3. Find Contours and draw the bounding rectangle

Use findContours() to find the contours of each white region in the skin image, then use boundingRect() to locate those regions. We discard rectangles on the right side of the image; on the left side, we keep the two largest rectangles and enlarge them appropriately so that they cover the hands.
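This step can be approximated without OpenCV by flood-filling the white regions of the binary mask and recording each region's extent (a stand-in for findContours() plus boundingRect(), not the actual project code):

```cpp
#include <vector>
#include <queue>
#include <algorithm>
#include <utility>

// Label 4-connected white regions in a binary mask (row-major, w x h,
// values 0 or 255) and return their bounding boxes, largest first,
// as when picking the two hand rectangles.
struct Box {
    int x0, y0, x1, y1;
    int area() const { return (x1 - x0 + 1) * (y1 - y0 + 1); }
};

std::vector<Box> boundingBoxes(std::vector<int> mask, int w, int h) {
    std::vector<Box> boxes;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x) {
            if (mask[y * w + x] != 255) continue;
            Box b{x, y, x, y};
            std::queue<std::pair<int, int>> q;  // BFS flood fill
            q.push({x, y});
            mask[y * w + x] = 0;                // mark visited
            while (!q.empty()) {
                auto [cx, cy] = q.front(); q.pop();
                b.x0 = std::min(b.x0, cx); b.x1 = std::max(b.x1, cx);
                b.y0 = std::min(b.y0, cy); b.y1 = std::max(b.y1, cy);
                const int dx[4] = {1, -1, 0, 0}, dy[4] = {0, 0, 1, -1};
                for (int k = 0; k < 4; ++k) {
                    int nx = cx + dx[k], ny = cy + dy[k];
                    if (nx >= 0 && nx < w && ny >= 0 && ny < h &&
                        mask[ny * w + nx] == 255) {
                        mask[ny * w + nx] = 0;
                        q.push({nx, ny});
                    }
                }
            }
            boxes.push_back(b);
        }
    std::sort(boxes.begin(), boxes.end(),
              [](const Box& a, const Box& b) { return a.area() > b.area(); });
    return boxes;
}
```

Keeping the two largest boxes then corresponds to picking the two hands.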

Hands located in image

Method and Implementation For Part 2:

We implemented Kalman filter and Hungarian algorithm to track the bats and cells.
1. Convert file to Mat (method from TA Yifu Hu)
2. Get centroids (method from TA Yifu Hu)
3. At time 0, before any tracking has happened, every point starts its own track.
4. Configure our own Kalman filter, based on the KalmanFilter() provided by OpenCV, to predict the location of every tracked object.
5. Use the Hungarian algorithm to compute the optimal assignment, selecting the detections that minimize the total distance within the track cluster.
6. Update the status and attributes of each filter, which includes initializing new tracks when new objects are detected and removing old tracks when objects are lost.
7. Repeat the steps above until the last frame.
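The greedy, suboptimal alternative to the Hungarian solver mentioned in the problem statement can be sketched as follows, with the distance gate playing the role of the spurious-detection threshold described later (plain C++, illustrative names, not the project code):

```cpp
#include <vector>
#include <cmath>

// Greedy nearest-neighbour association: repeatedly match the globally
// closest (prediction, detection) pair, ignoring pairs farther apart
// than maxDist (the gating threshold).
struct Pt { double x, y; };

// Returns, for each prediction, the index of its matched detection,
// or -1 if nothing lies within the gate.
std::vector<int> greedyAssign(const std::vector<Pt>& preds,
                              const std::vector<Pt>& dets,
                              double maxDist) {
    std::vector<int> match(preds.size(), -1);
    std::vector<bool> used(dets.size(), false);
    for (size_t round = 0; round < preds.size(); ++round) {
        double best = maxDist;
        int bi = -1, bj = -1;
        for (size_t i = 0; i < preds.size(); ++i) {
            if (match[i] != -1) continue;
            for (size_t j = 0; j < dets.size(); ++j) {
                if (used[j]) continue;
                double d = std::hypot(preds[i].x - dets[j].x,
                                      preds[i].y - dets[j].y);
                if (d < best) { best = d; bi = (int)i; bj = (int)j; }
            }
        }
        if (bi < 0) break;                   // nothing left within the gate
        match[bi] = bj;
        used[bj] = true;
    }
    return match;
}
```

Unlike the Hungarian algorithm, this does not minimize the total distance, but it is simple and usually adequate when objects are well separated.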
Two main classes are shown below:
class Hungarian
void assignmentoptimal(int *assignment, double *cost, double *distMatrix, int nOfRows, int nOfColumns);
void buildassignmentvector(int *assignment, bool *starMatrix, int nOfRows, int nOfColumns);
void computeassignmentcost(int *assignment, double *cost, double *distMatrix, int nOfRows);
void step2a(int *assignment, double *distMatrix, bool *starMatrix, bool *newStarMatrix, bool *primeMatrix, bool *coveredColumns, bool *coveredRows, int nOfRows, int nOfColumns, int minDim);
void step2b(int *assignment, double *distMatrix, bool *starMatrix, bool *newStarMatrix, bool *primeMatrix, bool *coveredColumns, bool *coveredRows, int nOfRows, int nOfColumns, int minDim);
void step3(int *assignment, double *distMatrix, bool *starMatrix, bool *newStarMatrix, bool *primeMatrix, bool *coveredColumns, bool *coveredRows, int nOfRows, int nOfColumns, int minDim);
void step4(int *assignment, double *distMatrix, bool *starMatrix, bool *newStarMatrix, bool *primeMatrix, bool *coveredColumns, bool *coveredRows, int nOfRows, int nOfColumns, int minDim, int row, int col);
void step5(int *assignment, double *distMatrix, bool *starMatrix, bool *newStarMatrix, bool *primeMatrix, bool *coveredColumns, bool *coveredRows, int nOfRows, int nOfColumns, int minDim);
double Solve(vector<vector<double> >& DistMatrix, vector<int>& Assignment);
class TKalmanFilter
KalmanFilter* kalman;
double deltatime;
Point2f LastResult;
TKalmanFilter(Point2f p, float dt = 0.2, float Accel_noise_mag = 0.5);
Point2f GetPrediction();
Point2f Update(Point2f p, bool DataCorrect);

Experiments for Part 1

1. Input: 19 images
2. Output:
Images with the located hands marked by yellow and green rectangles. (The images are too large to fit in my screen window, so I saved them to files to check the results.)
3. The precision:
The precision is quite good: in every image, two rectangles locate the hands, even when the hands occlude each other.

Experiments for Part 2

For the bat dataset, we used the provided segmentation and localization data. The segmentation of the bat dataset is provided in a set of label maps. There is one number per pixel, delimited by commas. Pixels with the value 0 are background. The maps are 1024 by 1024. The detections are given in a comma delimited file, one for each frame. There is one point per line. Each point is given as the X coordinate followed by the Y coordinate, delimited by commas.
For the cell dataset, we changed the method of getting centroids. Since the input is images, we blur each image first, then use adaptive thresholding to segment the cells. In order to eliminate noise before computing the centroids, we dilate and erode, and then use connectedComponentsWithStats() to eliminate components smaller than 800 pixels.

We assigned each bat or cell a new random color and drew the flight trajectories of the bat or cell in that color.

Results For Part 1

Result for img15

Result for img17

Result for img20

Result for img24

Result for img26

Result for img27

Result for img31

Results For Part 2

Result for bat

Result for cell

Discussion For Part 1

1. Strengths:

Our method can locate hands accurately

2. Weaknesses:

The skin detection only detects a small part of each hand, so the rectangle cannot always include the whole hand.

3. Limitation:

Our method performs well under these conditions (static background, pianist wearing long sleeves); if the conditions change, it may not perform as well.

4. Expectation:

Use some other method that detects the hands only.

5. Potential future work:

Detect a larger portion of each hand.

Discussion For Part 2

1. Dataset:

For the cell dataset, we have to find the centroids ourselves. Since accurate cell segmentation is very challenging, two other approaches we tried, image moments and bounding boxes, turned out not to be as good as adaptive thresholding with connected components. We therefore focused mainly on the multi-object tracking task and on detecting the birth of new cells, rather than on segmentation.

2. Entering and exiting:

Entering: Check the unassigned detections for new centroids before the update step.
Exiting: Stop tracking a centroid if its predicted x or y value falls outside the image.

3. Occlusion:

We set the maximum number of frames a track may go without receiving a measurement to 10.
If the point reappears where the Kalman filter predicts, the track simply jumps there. If this does not happen within 10 frames, we consider the object lost and stop tracking it.
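The bookkeeping described above can be sketched as a per-track counter (illustrative names; the real implementation keeps this state inside the track objects):

```cpp
// Each track counts consecutive frames without a matched detection.
// A match resets the counter; exceeding maxSkipped (10 in our runs)
// marks the track as lost, and it is removed from tracking.
struct Track {
    int skipped = 0;
    bool alive = true;
    void onFrame(bool matched, int maxSkipped) {
        if (matched) { skipped = 0; return; }    // detection found: reset
        if (++skipped > maxSkipped) alive = false;  // lost for good
    }
};
```

While `skipped` is nonzero but below the limit, the track's position is carried forward by the Kalman prediction, which is what lets an occluded bat be picked up again when it reappears.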

4. Spurious detection:

We set the threshold distance to 60 (for bats) and 500 (for cells). If a prediction and a detection are separated by more than the threshold, we do not match them, even when no nearer match exists.

5. Challenging part:

We succeeded in tracking multiple objects smoothly and accurately in most cases. One minor problem is that when an object flies over a previously drawn path, that path is erased.

6. Potential future work:

Implementing multiple hypothesis tracking (MHT) is a direction for future work.

Conclusions For Part 1

For this part, we successfully detect the hands in every image. Only the parts of the hands in light are used in the detection process, and the bounding rectangles that enclose the hands are then drawn from the detected parts, with some extension in each direction. Using absdiff() in such a static scene is a simple and effective way to eliminate the background and the piano. If the scene were not static, or if the pianist wore a T-shirt or other clothing exposing the arms, it would be difficult to locate only the hands.

Conclusions For Part 2

As the results above show, the Kalman filter works well together with the Hungarian algorithm: the Kalman filter predicts object positions accurately, and the Hungarian data association works efficiently. Although it fails to re-track objects after occlusion in a few cases, it still produces smooth and reliable tracks. Given accurate object centroids, the algorithm could track the objects and draw their trajectories even better in real time.

Credits and Bibliography

1. Help offered by the TA (the idea of how to compute the mean of the images and how to use it), 10/24/2018
2. OpenCV documents to find some methods and functions: https://docs.opencv.org/3.2.0/annotated.html, 10/18/2018-10/30/2018
3. Some functions of OpenCV to compute contours: https://blog.csdn.net/wangshuai610/article/details/79913600, 10/06/2018
4. OpenCV documents of Kalman Filter: http://docs.opencv.org/master/dd/d6a/classcv_1_1KalmanFilter.html#gsc.tab=0, 10/17/2018
5. Improve Kalman Filter performance: http://answers.opencv.org/question/73190/how-to-improve-kalman-filter-performance/, 10/20/2018
6. Hungarian algorithm: https://github.com/Smorodov/Multitarget-tracker, 10/22/2018