In this part, you are given a sequence of video frames in which a person is playing the piano with both hands.
Try to develop an algorithm to identify the pianist's hands. Portions of the hands are sometimes in deep shadow, which creates a challenging imaging situation.
1. Pre-process the frames to separate the hands from the background and detect skin.
2. Locate the hands with bounding rectangles.
The goal of this part of the programming assignment is for you to learn more about the practical issues that arise when designing a tracking system. You are asked to track moving objects in video sequences, i.e., identifying the same object from frame to frame:
You may consider two frames at a time (or, more ambitiously, use multiple hypothesis tracking (MHT) with more than two frames).
You may use a greedy, suboptimal bipartite matching algorithm (or, more ambitiously, implement an optimal data association algorithm).
To estimate the state of each tracked object, you may use an alpha-beta filter (or, more ambitiously, a Kalman filter).
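As an illustration of the simplest option above, here is a minimal sketch of an alpha-beta filter for a 2D point. The function names `alpha_beta_step` and `smooth_track`, the gains `alpha=0.5` and `beta=0.1`, and the unit time step are assumptions chosen for the example, not values from the assignment.

```python
def alpha_beta_step(pos, vel, meas, alpha=0.5, beta=0.1, dt=1.0):
    """One alpha-beta update for a 2D point; returns (new_pos, new_vel).

    The gains alpha and beta are illustrative assumptions.
    """
    new_pos, new_vel = [], []
    for p, v, z in zip(pos, vel, meas):
        predicted = p + v * dt      # constant-velocity prediction
        residual = z - predicted    # measurement minus prediction
        new_pos.append(predicted + alpha * residual)
        new_vel.append(v + (beta / dt) * residual)
    return tuple(new_pos), tuple(new_vel)


def smooth_track(measurements, alpha=0.5, beta=0.1):
    """Run the filter over a list of (x, y) measurements, one per frame."""
    pos, vel = measurements[0], (0.0, 0.0)
    for meas in measurements[1:]:
        pos, vel = alpha_beta_step(pos, vel, meas, alpha, beta)
    return pos, vel
```

Larger gains follow the measurements more closely; smaller gains smooth more but react more slowly to changes in direction.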
The bat dataset shows bats in flight, where the bats appear bright against a dark sky. The dataset includes both grayscale and false-color images from this thermal image sequence as well as their segmentation and localization data. The segmentation of the bat dataset is provided in a set of label maps. The detections are given in a comma delimited file, one for each frame.
The cell dataset shows cells moving on a petri dish. The brightness of the pixels within the cells is very similar to that of the background. The dataset includes both the original raw data and the normalized data. Images in the raw data have different overall brightness levels and contrast.
1. Get the centroids of the objects.
2. Predict the position of each object in the next frame from its positions in previous frames using a Kalman filter.
3. Associate each predicted position with a detected object in the next frame using the Hungarian algorithm.
4. If there are too many bats, we first consider occlusion: if the point reappears where the Kalman filter predicts, we jump there. If this does not happen within 10 frames (this value can be changed), we consider the bat lost and stop tracking it.
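The track bookkeeping in step 4 can be sketched roughly as follows. This is not the implementation used for the results: to stay self-contained it swaps in a plain constant-velocity prediction for the Kalman filter and a greedy nearest-pair matcher for the Hungarian algorithm. The names `greedy_match` and `update_tracks`, the track dictionary layout, and the gate distance `GATE` are hypothetical.

```python
import math

MAX_MISSED = 10  # frames a track may go unmatched before it is dropped
GATE = 30.0      # hypothetical maximum match distance, in pixels


def greedy_match(predictions, detections, gate=GATE):
    """Greedy, suboptimal matching: repeatedly take the closest free pair."""
    pairs = sorted(
        (math.hypot(p[0] - d[0], p[1] - d[1]), i, j)
        for i, p in enumerate(predictions)
        for j, d in enumerate(detections)
    )
    matches, used_tracks, used_dets = {}, set(), set()
    for dist, i, j in pairs:
        if dist > gate:
            break  # all remaining pairs are even farther apart
        if i not in used_tracks and j not in used_dets:
            matches[i] = j
            used_tracks.add(i)
            used_dets.add(j)
    return matches


def update_tracks(tracks, detections):
    """Advance tracks by one frame. Each track is a dict: pos, vel, missed."""
    predictions = [
        (t["pos"][0] + t["vel"][0], t["pos"][1] + t["vel"][1]) for t in tracks
    ]
    matches = greedy_match(predictions, detections)
    survivors = []
    for i, track in enumerate(tracks):
        if i in matches:
            new_pos = detections[matches[i]]
            track["vel"] = (new_pos[0] - track["pos"][0],
                            new_pos[1] - track["pos"][1])
            track["pos"] = new_pos
            track["missed"] = 0
        else:
            track["pos"] = predictions[i]  # coast on the prediction
            track["missed"] += 1
        if track["missed"] <= MAX_MISSED:
            survivors.append(track)
    matched_dets = set(matches.values())
    for j, det in enumerate(detections):
        if j not in matched_dets:  # unmatched detection starts a new track
            survivors.append({"pos": det, "vel": (0.0, 0.0), "missed": 0})
    return survivors
```

A track that is unmatched keeps moving along its predicted path, so a detection that reappears near the prediction can be picked up again by the matcher.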
The average image of 19 images
The difference between the first frame and average
Region of interest of the first image
Skin of the first image
Hands located in image
1. Input: 19 images
2. Output: images with the located hands marked by yellow and green rectangles. (The images are too large to fit in the window on my computer, so I saved them to files to view the results.)
3. The precision:
The precision is pretty good. Every image has two rectangles locating the hands, even when the hands occlude each other.
For the bat dataset, we used the provided segmentation and localization data. The segmentation of the bat dataset is provided in a set of label maps.
There is one number per pixel, delimited by commas. Pixels with the value 0 are background. The maps are 1024 by 1024.
The detections are given in a comma delimited file, one for each frame. There is one point per line. Each point is given as the X coordinate followed by the Y coordinate, delimited by commas.
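A minimal sketch of reading these two formats with the Python standard library is shown below. The function names are hypothetical, and the sketch assumes that each line of a label map holds one image row and that labels are whole numbers; neither detail is stated in the dataset description.

```python
import csv
import io


def parse_detections(text):
    """Parse one frame's detections: one 'x,y' point per line."""
    points = []
    for row in csv.reader(io.StringIO(text)):
        if len(row) >= 2:  # skip blank lines
            points.append((float(row[0]), float(row[1])))
    return points


def centroids_from_label_map(text):
    """Return {label: (cx, cy)} for a comma-delimited label map.

    Assumes one image row per line; value 0 is background.
    """
    sums = {}
    for y, row in enumerate(csv.reader(io.StringIO(text))):
        for x, cell in enumerate(row):
            if not cell.strip():
                continue
            label = int(float(cell))
            if label == 0:
                continue
            sx, sy, n = sums.get(label, (0, 0, 0))
            sums[label] = (sx + x, sy + y, n + 1)
    return {label: (sx / n, sy / n) for label, (sx, sy, n) in sums.items()}
```

In practice the text would come from reading one file per frame, for example with `open(path).read()`.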
For the cell dataset, we changed the method for obtaining centroids. Since the input consists of images, we first blur each image and then use an adaptive threshold to segment the cells. To eliminate noise and obtain the centroids, we dilate and erode, and then use connectedComponentsWithStats() to eliminate components whose size is smaller than 800.
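To show what the final size filter does, here is a small pure-Python sketch that finds connected components in a binary mask and keeps only those above a minimum area, returning each one's area and centroid. It is a stand-in for illustration, not the OpenCV call itself: the function name `components_with_min_area` is hypothetical, and it uses 4-connectivity as a simplifying assumption.

```python
from collections import deque


def components_with_min_area(mask, min_area):
    """Return [(area, (cx, cy)), ...] for components with area >= min_area.

    mask is a list of rows of 0/1 values; components are 4-connected.
    """
    height, width = len(mask), len(mask[0])
    seen = [[False] * width for _ in range(height)]
    results = []
    for y0 in range(height):
        for x0 in range(width):
            if not mask[y0][x0] or seen[y0][x0]:
                continue
            # Breadth-first flood fill from an unvisited foreground pixel.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            area = sum_x = sum_y = 0
            while queue:
                x, y = queue.popleft()
                area += 1
                sum_x += x
                sum_y += y
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < width and 0 <= ny < height
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if area >= min_area:  # drop small components as noise
                results.append((area, (sum_x / area, sum_y / area)))
    return results
```

With the threshold from the text, the call would be `components_with_min_area(mask, 800)` on the thresholded image.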
We assigned each bat or cell a new random color and drew the flight trajectories of the bat or cell in that color.
Result for img15
Result for img17
Result for img20
Result for img24
Result for img26
Result for img27
Result for img31
Result for bat
Result for cell
For this part, we successfully detected the hands in every image. Only the parts of the hands that are in light are used in the detection process; the bounding rectangles that enclose the hands are then drawn from the detected parts, extended somewhat in each direction. Using absdiff() in such a static scene is a simple and useful way to eliminate the background and the piano. If the scene is not static, or if the pianist wears a T-shirt or other clothing that exposes the arms, it will be difficult to locate only the hands.
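The background-removal idea can be sketched on small grayscale images stored as nested lists. This is a simplified illustration under stated assumptions: the helper names, the threshold, and the padding are hypothetical, and the sketch produces one padded box around all foreground pixels rather than separating the two hands or detecting skin.

```python
def mean_image(frames):
    """Pixel-wise average of equally sized grayscale frames."""
    n = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    return [[sum(f[y][x] for f in frames) / n for x in range(width)]
            for y in range(height)]


def abs_diff(a, b):
    """Pixel-wise absolute difference of two images."""
    return [[abs(pa - pb) for pa, pb in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def padded_bounding_box(diff, threshold, pad):
    """Box (x_min, y_min, x_max, y_max) around pixels above threshold.

    The box is extended by pad pixels in each direction and clamped to the
    image. Returns None when no pixel exceeds the threshold.
    """
    height, width = len(diff), len(diff[0])
    xs = [x for row in diff for x, v in enumerate(row) if v > threshold]
    ys = [y for y, row in enumerate(diff) if any(v > threshold for v in row)]
    if not xs:
        return None
    return (max(min(xs) - pad, 0), max(min(ys) - pad, 0),
            min(max(xs) + pad, width - 1), min(max(ys) + pad, height - 1))
```

A typical use is `padded_bounding_box(abs_diff(frame, mean_image(frames)), threshold, pad)`, where static pixels such as the piano largely cancel out in the difference.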
As the results above show, the Kalman filter implementation works well with the Hungarian algorithm.
The Kalman filter predicts the position of the objects accurately and the Hungarian data association method works efficiently.
Although it fails to re-track objects after occlusion in a small number of cases, it is still very successful, producing smooth and reliable tracks.
Given accurate positions of the objects' centroids, the algorithm can track the objects and draw the flight trajectories better in real time.
1. Help offered by the TA (the idea of how to compute the mean of the images and how to use it). 10/24/2018
2. OpenCV documentation, used to find some methods and functions: https://docs.opencv.org/3.2.0/annotated.html, 10/18/2018-10/30/2018
3. Some functions of OpenCV to compute contours: https://blog.csdn.net/wangshuai610/article/details/79913600, 10/06/2018
4. OpenCV documentation for the Kalman filter: http://docs.opencv.org/master/dd/d6a/classcv_1_1KalmanFilter.html#gsc.tab=0, 10/17/2018
5. Improve Kalman Filter performance: http://answers.opencv.org/question/73190/how-to-improve-kalman-filter-performance/, 10/20/2018
6. Hungarian algorithm: https://github.com/Smorodov/Multitarget-tracker, 10/22/2018