Assignment Title

CS 585 Assignment 4
Yicheng Li
Zilin Zhang
Fangya Xu
Feb 2020

Problem Definition

The goal of this part of the programming assignment is for you to learn more about the practical issues that arise when designing a tracking system. You are asked to track moving objects in video sequences, i.e., to identify the same object from frame to frame.

Method and Implementation


Before we segment the objects, whether bats or cells, we have to preprocess the images properly so that the segmentation step can succeed.

First, we convert each image to grayscale. Because the bat images and the cell images have different intensity characteristics, we use different methods to convert them into binary images. In the bat images, the background brightness increases from the top of the image to the bottom, so we use a fairly high threshold to find the bats. In the cell images, the histogram is concentrated in a narrow range of intensities, so we have to choose the threshold carefully to separate the objects from the background.
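The preprocessing described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the function names `to_gray` and `to_binary` are hypothetical, the image is represented as nested lists rather than an OpenCV array, the luma weights are the common ITU-R BT.601 ones, and the threshold passed in the usage note is a placeholder, not the value used in our experiments.

```python
def to_gray(rgb_image):
    """Convert rows of (R, G, B) tuples to rows of 0-255 gray values.

    Uses the BT.601 luma weights (an assumption; any reasonable
    grayscale conversion would serve the same purpose here).
    """
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def to_binary(gray_image, threshold):
    """Mark pixels strictly brighter than `threshold` as foreground (255)."""
    return [
        [255 if value > threshold else 0 for value in row]
        for row in gray_image
    ]
```

For example, `to_binary(to_gray(frame), 180)` would keep only the brightest pixels of a hypothetical `frame`; the bat and cell sequences would each get their own threshold.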


Next we segment the objects, which is the first step in tracking them. We use Canny edge detection to find the edges of the objects. Then we find the contours of each object, as we learned in class, and find the centroid of each object. This part also requires two different methods. In the bat images, the range of object radii is large, so we use a flexible radius. In the cell images, we can limit the object radius to a relatively narrow range to make sure the cells are segmented.
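A minimal sketch of the centroid and radius filtering step is shown below. It assumes each contour is already available as a list of (x, y) points, and it approximates the object radius as the largest distance from the centroid to a contour point; both the helper names and this radius definition are assumptions made for illustration, not necessarily what our implementation does.

```python
from math import hypot


def centroid_and_radius(points):
    """Return ((cx, cy), radius) for a non-empty list of (x, y) points."""
    n = len(points)
    cx = sum(x for x, _ in points) / n
    cy = sum(y for _, y in points) / n
    # Radius approximated as the farthest contour point from the centroid.
    radius = max(hypot(x - cx, y - cy) for x, y in points)
    return (cx, cy), radius


def detect_centroids(contours, min_radius, max_radius):
    """Keep the centroids of contours whose radius lies in the given range."""
    centroids = []
    for points in contours:
        centroid, radius = centroid_and_radius(points)
        if min_radius <= radius <= max_radius:
            centroids.append(centroid)
    return centroids
```

With this structure, the bat sequence would be given a wide `[min_radius, max_radius]` range and the cell sequence a narrow one.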

Object Tracking

For the tracking part, we use a series of methods to track each object and predict its path. At the beginning, we give every object a unique trackID to distinguish the objects from each other. For each frame, we calculate the distance between every existing track and every detected object to determine whether they are the same object. Then we use the Hungarian algorithm to assign every detection to the correct trackID and its prediction. Finally, we use the assignment results to maintain the tracks and to handle unassigned track predictions and unassigned detections.
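The assignment and track maintenance steps can be sketched as below. Two caveats apply. First, for brevity the optimal assignment is found by brute force over permutations, which yields the same minimum total cost as the Hungarian algorithm but is only practical for a handful of objects; a real implementation would call a Hungarian solver such as `scipy.optimize.linear_sum_assignment`. Second, the function names and the track dictionary layout are hypothetical. The constants 160 pixels and 30 frames are the values reported in the experiments section.

```python
from itertools import permutations
from math import hypot, inf

MAX_DISTANCE = 160        # pixels; pairs farther apart are never matched
MAX_SKIPPED_FRAMES = 30   # frames a track may go undetected before removal


def assign(track_positions, detections, max_distance=MAX_DISTANCE):
    """Return (matches, unmatched_track_idx, unmatched_detection_idx)."""
    n_t, n_d = len(track_positions), len(detections)
    size = max(n_t, n_d)
    # Pad to a square cost matrix; dummy rows/columns cost nothing.
    cost = [[0.0] * size for _ in range(size)]
    for i, (tx, ty) in enumerate(track_positions):
        for j, (dx, dy) in enumerate(detections):
            cost[i][j] = hypot(tx - dx, ty - dy)
    best_total, best_perm = inf, ()
    for perm in permutations(range(size)):  # brute-force stand-in
        total = sum(cost[i][perm[i]] for i in range(size))
        if total < best_total:
            best_total, best_perm = total, perm
    matches = [(i, j) for i, j in enumerate(best_perm)
               if i < n_t and j < n_d and cost[i][j] <= max_distance]
    matched_t = {i for i, _ in matches}
    matched_d = {j for _, j in matches}
    lost = [i for i in range(n_t) if i not in matched_t]
    new = [j for j in range(n_d) if j not in matched_d]
    return matches, lost, new


def update_tracks(tracks, detections, next_id):
    """Update `tracks` (id -> {"pos", "skipped"}) in place; return next free id."""
    ids = list(tracks)
    matches, lost, new = assign([tracks[t]["pos"] for t in ids], detections)
    for i, j in matches:
        tracks[ids[i]] = {"pos": detections[j], "skipped": 0}
    for i in lost:
        tracks[ids[i]]["skipped"] += 1
        if tracks[ids[i]]["skipped"] > MAX_SKIPPED_FRAMES:
            del tracks[ids[i]]  # object has been missing for too long
    for j in new:
        tracks[next_id] = {"pos": detections[j], "skipped": 0}
        next_id += 1
    return next_id
```

Calling `update_tracks` once per frame with that frame's centroids keeps the trackIDs stable: matched tracks move to the new position, unmatched tracks accumulate skipped frames, and unmatched detections start new tracks.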

Tracks Prediction

At the end of the iteration for each frame, we apply a Kalman filter to predict and then correct (update) the tracks. We recursively calculate the predicted state vector from the previous state vector and the state transition matrix, which captures the state transition from one time step to the next. Then we calculate the predicted covariance matrix from the state transition matrix, the previous covariance matrix, and the process noise matrix. We repeat this estimation for each frame to obtain the best estimate of the state.
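In standard Kalman filter notation, the prediction step described above can be written as follows. The symbols are assumptions introduced here for illustration: F is the state transition matrix, Q the process noise matrix, and the subscript k|k-1 denotes the estimate at frame k given the observations up to frame k-1.

```latex
\hat{x}_{k|k-1} = F \, \hat{x}_{k-1|k-1}
\qquad
P_{k|k-1} = F \, P_{k-1|k-1} \, F^{\mathsf{T}} + Q
```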


After we find the predicted state, we perform the correction. We first calculate the innovation covariance matrix from the observation matrix, the predicted covariance matrix, and the measurement noise matrix, which has the measurement variances along its diagonal. Then we use it to calculate the Kalman gain matrix. Finally, we calculate the corrected state vector, which is the result we want.
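To make the predict and correct cycle concrete, here is a minimal sketch for a single coordinate with a constant-velocity model. It is not our tracker's code: the state is reduced to [position, velocity] so that the matrix algebra can be written out by hand, and the noise values `q` and `r`, the initial covariance, and the function names are all assumptions chosen for illustration.

```python
def predict(x, P, dt=1.0, q=1e-3):
    """Prediction: x' = F x and P' = F P F^T + Q, with F = [[1, dt], [0, 1]]."""
    p, v = x
    (a, b), (c, d) = P
    x_pred = [p + dt * v, v]
    P_pred = [
        [a + dt * (b + c) + dt * dt * d + q, b + dt * d],
        [c + dt * d, d + q],
    ]
    return x_pred, P_pred


def correct(x_pred, P_pred, z, r=1.0):
    """Correction with a position-only observation, H = [1, 0]."""
    (a, b), (c, d) = P_pred
    s = a + r                  # innovation covariance S = H P' H^T + R
    k0, k1 = a / s, c / s      # Kalman gain K = P' H^T / S
    y = z - x_pred[0]          # innovation: measurement minus prediction
    x = [x_pred[0] + k0 * y, x_pred[1] + k1 * y]
    P = [                      # P = (I - K H) P'
        [(1 - k0) * a, (1 - k0) * b],
        [c - k1 * a, d - k1 * b],
    ]
    return x, P


def run_filter(measurements, dt=1.0):
    """Run predict/correct over a sequence of position measurements."""
    x = [0.0, 0.0]                          # unknown start: zero state
    P = [[1000.0, 0.0], [0.0, 1000.0]]      # large initial uncertainty
    for z in measurements:
        x, P = predict(x, P, dt)
        x, P = correct(x, P, z)
    return x
```

Feeding `run_filter` positions that grow by a fixed amount per frame lets the velocity estimate settle near that amount, which is what allows a track to be extrapolated through frames where the object is not detected.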

Experiments and Results

Bats images

Cells images


  • In the preprocessing part, we ran into some problems. First, we found that we had to use two different methods to separate the objects from the background in the two image sets. The reason is that in the bat images, a bat can fly close to the camera, so its radius can be very large, and when it flies far away from the camera its radius can be very small. The cell images are a little different. The microscope and the culture dish are fixed, so the range of cell radii is narrow. We therefore realized that we have to use different methods in different situations, and to do this we have to study the images carefully.
  • When tracking objects, we also ran into some trouble. First, we had to decide the maximum number of frames a tracked object may go undetected before its track is dropped. We first tried 60, but found that in the bat frames bats come and go frequently, and with such a large maximum we had to maintain many traces, which costs extra time. So we changed the number of frames to 30. Then we had to choose a suitable length for every trace, because drawing every trace in full clutters the frame and makes it hard to read. Lastly, we needed to find the distance used to determine whether two objects are the same or not. We use Euclidean distance and found that 160 pixels is a suitable threshold.
  • Potential future work. How could your method be improved? What would you try (if you had more time) to overcome the failures/limitations of your work?