Problem Definition
The goal of this assignment is to learn more about the practical issues that arise when designing a tracking system. More specifically, we are going to track animals in video sequences, i.e., identify the same animal from frame to frame.
To achieve this goal, we first need a model that predicts the position of an object in the next frame based on its positions in previous frames. Filters such as the alpha-beta filter or the Kalman filter can do this. Each predicted position must then be associated with an actual detected object in the next frame using a data association algorithm.
Difficulties include poor object detections, objects occluding each other, etc.
Method and Implementation
- First, find the centroid of each detected object in every frame.
- In C++, create a tracking object class to store all the data related to each object under tracking, including its position, velocity, and filter parameters.
- For the first frame, create an instance of the tracking object class for each detected object. Add them to the tracking object list.
- Initialize an alpha-beta filter to predict the position of each tracked object in the next frame.
- Data association: use a greedy method to find the detected object in the next frame with the minimum distance to the predicted position.
- Update the filter parameters as well as the tracking object list. This includes adding newly detected objects to the tracking object list if no existing tracking object is associated with them, and removing objects from the list if we lose track of them.
- Repeat these steps all the way through the last frame. (A simplified sketch of this loop is given below.)
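A minimal, self-contained sketch of this per-frame loop is shown below. It is deliberately simplified and hypothetical rather than my actual implementation: plain constant-velocity prediction stands in for the alpha-beta filter, and the distance-gate and miss-count values are assumed.

#include <cmath>
#include <utility>
#include <vector>
using namespace std;

struct Track {
    double x, y, vx, vy;   // position and velocity
    int missed;            // consecutive frames without a match
};

int main() {
    // frames[t] holds the detected centroids (x, y) of frame t (toy data)
    vector<vector<pair<double, double>>> frames = {
        {{10, 10}, {50, 50}}, {{12, 11}, {51, 52}}, {{14, 12}}};

    const double maxDist = 20.0;   // matching threshold (assumed value)
    const int maxMissed = 3;       // drop a track after this many misses

    vector<Track> tracks;
    for (auto& p : frames[0])      // one track per first-frame detection
        tracks.push_back({p.first, p.second, 0, 0, 0});

    for (size_t t = 1; t < frames.size(); ++t) {
        vector<bool> used(frames[t].size(), false);
        for (auto& tr : tracks) {
            // predict the next position (constant velocity)
            double px = tr.x + tr.vx, py = tr.y + tr.vy;
            // greedy association: nearest unused detection within the gate
            int best = -1;
            double bestD = maxDist;
            for (size_t j = 0; j < frames[t].size(); ++j) {
                if (used[j]) continue;
                double d = hypot(frames[t][j].first - px, frames[t][j].second - py);
                if (d < bestD) { bestD = d; best = (int)j; }
            }
            // update the track state, or count a miss
            if (best >= 0) {
                used[best] = true;
                tr.vx = frames[t][best].first - tr.x;
                tr.vy = frames[t][best].second - tr.y;
                tr.x = frames[t][best].first;
                tr.y = frames[t][best].second;
                tr.missed = 0;
            } else {
                tr.x = px; tr.y = py; ++tr.missed;
            }
        }
        // drop lost tracks; start new tracks for unmatched detections
        for (size_t i = tracks.size(); i-- > 0; )
            if (tracks[i].missed > maxMissed) tracks.erase(tracks.begin() + (long)i);
        for (size_t j = 0; j < frames[t].size(); ++j)
            if (!used[j])
                tracks.push_back({frames[t][j].first, frames[t][j].second, 0, 0, 0});
    }
    return 0;
}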
The class and functions I created to implement the above algorithm include:
- The Tracking Object Class: detectedObject
class detectedObject {
public:
    detectedObject(int size_x, int size_y);
    ~detectedObject();
    abFilterParam abFparam;              // alpha-beta filter parameters
    vector<int> currentLocation;         // current centroid {x, y}
    int currentArea;
    vector<double> currentVelocity;
    vector<vector<int>> locations;       // location history
    vector<int> areas;                   // area history
    vector<vector<double>> velocities;   // velocity history
    Mat trajectoryImg;                   // the image of its trajectory
    int firstAppearFrameID;
    int firstAppearPointID;
    unsigned char rgb[3];                // per-object RGB color
};
void readFromTxt(String filename, vector<vector<int>>& points)
is used to read centroid data written in a text file into a vector of vectors of int.
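A sketch of what such a reader could look like, assuming one comma-delimited point per line (as in the detection files described under Experiments) and std::string in place of the OpenCV String in my signature:

// Sketch of a comma-delimited centroid reader (assumed file format: "x,y" per line).
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
using namespace std;

void readFromTxt(string filename, vector<vector<int>>& points) {
    ifstream in(filename);
    string line;
    while (getline(in, line)) {
        stringstream ss(line);
        string field;
        vector<int> point;
        while (getline(ss, field, ','))   // split each line on commas
            point.push_back(stoi(field));
        if (!point.empty()) points.push_back(point);
    }
}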
vector<int> alphaBetaFilter_predict(detectedObject& deObj)
is used to predict the position of a given tracking object based on its current location and its alpha-beta filter parameters.
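The prediction step of an alpha-beta filter is a constant-velocity extrapolation, x_pred = x + v * dt. A sketch with dt = 1 frame, using a trimmed stand-in type for my actual class:

// Sketch of the alpha-beta prediction step (dt = 1 frame assumed).
#include <vector>
using namespace std;

struct detectedObjectLite {          // trimmed stand-in for the real class
    vector<int> currentLocation;     // {x, y}
    vector<double> currentVelocity;  // {vx, vy}
};

vector<int> alphaBetaFilter_predict(detectedObjectLite& deObj) {
    vector<int> pred(2);
    for (int i = 0; i < 2; ++i)      // x_pred = x + v * dt, with dt = 1
        pred[i] = (int)(deObj.currentLocation[i] + deObj.currentVelocity[i]);
    return pred;
}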
int matchingPoints(int trackObjID, vector<double>& veloc, vector<int>& point_pred,
    vector<vector<int>>& points_curr, unordered_map<int, vector<int>>& assignedObjLists,
    unordered_map<int, int>& points_current_remains);
finds the best match among the detected objects in the next frame for a given tracking object, based on its predicted position and other properties.
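A sketch of the greedy idea behind it, reduced to the core nearest-neighbor search (the distance gate maxDist is an assumed value, and the real function also maintains the two hash maps of assignments):

// Sketch of greedy nearest-neighbor matching with a distance gate.
// Returns the index of the nearest unassigned detection, or -1 if none
// lies within maxDist of the predicted point.
#include <cmath>
#include <vector>
using namespace std;

int matchNearest(const vector<int>& point_pred,
                 const vector<vector<int>>& points_curr,
                 const vector<bool>& assigned,
                 double maxDist = 20.0) {
    int best = -1;
    double bestD = maxDist;
    for (size_t j = 0; j < points_curr.size(); ++j) {
        if (assigned[j]) continue;   // each detection is matched at most once
        double d = hypot((double)points_curr[j][0] - point_pred[0],
                         (double)points_curr[j][1] - point_pred[1]);
        if (d < bestD) { bestD = d; best = (int)j; }
    }
    return best;
}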
void alphaBetaFilter_update(detectedObject& deObj, vector<int> newPoint)
is used to update the alpha-beta filter parameters of a tracking object based on the matching result.
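The update step corrects the prediction by a fraction of the residual r (measured minus predicted position): x <- x_pred + alpha * r and v <- v_pred + (beta / dt) * r. A one-coordinate sketch with dt = 1:

// Sketch of the alpha-beta update step along one axis (dt = 1 frame).
struct State { double x, v; };       // position and velocity along one axis

State alphaBetaUpdate(State pred, double measured, double alpha, double beta) {
    double r = measured - pred.x;    // residual (innovation)
    pred.x += alpha * r;             // corrected position estimate
    pred.v += beta * r;              // corrected velocity (beta / dt, dt = 1)
    return pred;
}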
Experiments
I implemented the above tracking algorithm, which can handle general multi-object tracking in video sequences. The INPUT to my algorithm is the centroids of the detected objects in each frame.
For the "bat" and "fish" dataset, the first step is to calculate their centroids and write them into text files.
For the "bat" dataset, I used the segmentation and detections provided by Prof. Betke. The segmentation is provided in a set of label maps. There is one number per pixel, delimited by commas. Pixels with the value 0 are background. The maps are 1024 by 1024. The detections are given in a comma delimited file, one for each frame. There is one point per line. Each point is given as the X coordinate followed by the Y coordinate, delimited by commas.
For the "fish" dataset, since only the contours of the fishes are provided, I first use "red color" detection to keep the contours,
and then use thresholding to convert it into binary image. Finally use OpenCV's connectedComponentsWithStats
function
to obtain the centroid of each fish contour.
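A sketch of this preprocessing, in which inRange combines the red-color detection and thresholding in one call; the BGR threshold values are assumptions, and label 0 returned by connectedComponentsWithStats is the background:

// Sketch of the fish-centroid extraction (threshold values are assumed).
#include <opencv2/opencv.hpp>
#include <vector>
using namespace cv;
using namespace std;

vector<vector<int>> fishCentroids(const Mat& frame /* BGR image */) {
    // keep "red" pixels: low blue/green, high red (assumed BGR ranges)
    Mat mask;
    inRange(frame, Scalar(0, 0, 100), Scalar(80, 80, 255), mask);

    // mask is already binary; label its connected components
    Mat labels, stats, centroids;
    int n = connectedComponentsWithStats(mask, labels, stats, centroids);

    vector<vector<int>> points;
    for (int i = 1; i < n; ++i)   // skip label 0, the background
        points.push_back({(int)centroids.at<double>(i, 0),
                          (int)centroids.at<double>(i, 1)});
    return points;
}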
With these centroids as INPUT, the algorithm can track the objects and draw the tracking results dynamically in real time.
Results
The tracking results for both datasets are shown below.
NOTE: These videos are deliberately slowed down so the differences can be seen clearly; the algorithm actually runs in real time.
In the THIRD AND FIFTH VIDEOS, red dots represent tracked objects, green dots are their corresponding positions predicted by the alpha-beta filter, and blue dots are the matching results in the next frame.
Discussion
- As shown in the videos, the tracking results are good when the objects are detected correctly and are not occluded. With appropriate parameters, the alpha-beta filter predicts the positions of the tracked objects very accurately.
- For spurious detections, a distance threshold is set in the matching function so that a predicted point is not matched to a faraway point even when there is no nearby match; in this case, the object is removed from the tracking object list.
- For future work, we first need to improve the detection quality, and then use more frames to improve the results when objects occlude each other.
Conclusions
The implemented alpha-beta filter predicts the positions of the objects fairly well, and in practice the greedy data association method works efficiently. As a whole, this tracker works well provided that the detections are accurate.