CS 585 Image and Video Computing (Spring 2020) : Assignment 4 - Programming

CS 585 HW 4
Allison Mann
Siddharth Mysore
Date: April 3rd 2020

NOTE: This page uses JavaScript packages that need to be loaded from mathjax.org for TeX display, so the page may not display correctly if viewed offline.


Problem Definition

The goal of this programming assignment was to learn more about the practical issues that arise when designing a tracking system. We are tasked with tracking moving objects in video sequences, i.e., identifying the same object from frame to frame. Two datasets are provided on which to demonstrate our work: the bat dataset and the cell dataset.


Method and Implementation

Object Localization

  1. Bat dataset

    For the bat dataset, we opted to use the localizations provided in the project resources.
  2. Cell dataset

    The cell dataset requires us to perform segmentation for tracking. The grayscale values of the cell objects and dish background are very similar, so we used a variety of techniques to measure cell centroids for the Kalman filter. To differentiate the region of interest (the cell's dish) from the background, we use a hand-crafted mask. This is possible because the dish and camera do not move throughout the video. Any detections outside of this region are not reported. We split the dish into two regions. The cells in the bottom region (bottom ~70% of each frame) are much easier to localize because there is a larger separation between the absolute grayscale values of the dish and the cells. Here, we use absolute thresholding. For the top region, absolute thresholding is not possible without missing many cells or erroneously picking up the background. Here, we use edge detection to detect the cell boundary, dilate the result, and then apply a lower absolute threshold to the pixels close to the detected boundary. We combine the detections for the top and bottom regions and perform connected components analysis to extract the centroids of the detections.

    Example Localizations of Cell Dataset

    The original frame in the dataset
    Binary cell detections
    Cell centroids after connected components analysis
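
The thresholding and connected components steps described above can be sketched as follows. This is a minimal illustration, not the code used for the assignment: it covers only the absolute-thresholding branch (no edge detection or dilation), the function name `detect_centroids` and its parameters are hypothetical, and it assumes that cells are brighter than the dish.

```python
from collections import deque

import numpy as np


def detect_centroids(frame, dish_mask, threshold):
    """Threshold a grayscale frame inside a mask and return blob centroids.

    frame: 2D array of grayscale values.
    dish_mask: 2D boolean array, True inside the region of interest.
    threshold: pixels strictly above this value count as cell pixels
               (assumption: cells are brighter than the dish; flip the
               comparison if they are darker).
    Returns a list of (x, y) centroids, i.e. (column, row).
    """
    binary = (frame > threshold) & dish_mask
    visited = np.zeros_like(binary, dtype=bool)
    rows, cols = binary.shape
    centroids = []
    for r in range(rows):
        for c in range(cols):
            if not binary[r, c] or visited[r, c]:
                continue
            # Breadth-first flood fill over the 4-connected neighbours.
            queue = deque([(r, c)])
            visited[r, c] = True
            pixels = []
            while queue:
                y, x = queue.popleft()
                pixels.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and binary[ny, nx] and not visited[ny, nx]):
                        visited[ny, nx] = True
                        queue.append((ny, nx))
            ys, xs = zip(*pixels)
            centroids.append((sum(xs) / len(xs), sum(ys) / len(ys)))
    return centroids
```

Detections outside the mask are discarded before labelling, which mirrors the rule that anything outside the dish is not reported.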

Object Tracking

For tracking, we use an Extended Kalman Filter (EKF). The EKF extends the standard Kalman filter by linearizing the process and measurement models around the current state estimate. Here it operates on a discrete approximation of a continuous-time system.

The EKF operates with the following assumptions:

  1. The prior state is represented by a Gaussian distribution, i.e. $ p(x) \sim N(\mu, \Sigma) $
  2. The continuous process model is $\dot{\bf x} = f({\bf x},n)$ where $n \sim N(0,Q)$
  3. The measurement model is $z = h({\bf x},v)$ where $v \sim N(0,R)$

For the purposes of the EKF used in this project, we assume that the internal state representation of an object is given by a 4D vector. The first two dimensions represent the estimated position in Cartesian space, while the last two represent the estimated velocity, i.e. $$ {\bf x_t} = [x_t, y_t, \dot{x}_t, \dot{y}_t]^\top $$ The EKF allows us to track the internal state estimate as well as the covariance of the estimate. While the covariance is not directly used in our work (for simplicity), it could conceivably be used to weight candidate matches more intelligently during bipartite matching.
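
One way the covariance could be used for matching is a Mahalanobis distance between a detection and the predicted position. The sketch below is an illustration of that idea only and is not part of our implementation. The function name `mahalanobis_sq` is hypothetical, and `C` and `R` are the measurement matrix and measurement noise covariance defined in the Measurement Update section.

```python
import numpy as np

# Measurement matrix: picks the (x, y) position out of [x, y, vx, vy].
C = np.hstack([np.eye(2), np.zeros((2, 2))])


def mahalanobis_sq(mu, Sigma_bar, z, R):
    """Squared Mahalanobis distance between detection z and prediction mu.

    mu: predicted 4D state, Sigma_bar: predicted 4x4 covariance,
    z: detected (x, y) position, R: 2x2 measurement noise covariance.
    """
    innovation = z - C @ mu
    S = C @ Sigma_bar @ C.T + R  # covariance of the innovation
    return float(innovation @ np.linalg.solve(S, innovation))
```

Unlike a plain Euclidean distance, this value shrinks along directions in which the filter is uncertain, so a detection far from an uncertain track can still score as a plausible match.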

Process Update

The process update seeks to estimate the new position of the tracked object after some time $\delta t$ has elapsed. This would be given by $ {\bf x_t} = {\bf x_{t-1}} + f({\bf x_{t-1}},n_t)\delta t $.

By further assuming that the object velocity is constant, barring a measurement indicating otherwise, we can simplify the dynamics to be: $$ f({\bf x},n) \approx {\bf Ax} + n$$ $$\text{where }{\bf A} = \begin{bmatrix} {\bf 0}_{2 \times 2} & {\bf I}_{2 \times 2} \\ {\bf 0}_{2 \times 2} & {\bf 0}_{2 \times 2} \end{bmatrix}$$ With an assumption of unchanging noise ($\frac{\partial f}{\partial n}$ is constant), we can then represent the state and covariance estimate process update as a discretized one-step Euler integration: $$ {\bf \mu}_t = {\bf F}_t{\bf x}_{t-1} $$ $$ \bar{\bf \Sigma}_t = {\bf F}_t {\bf \Sigma}_{t-1} {\bf F}_t^\top + {\bf Q} $$ where ${\bf F}_t = {\bf I} + {\bf A}\,\delta t$ follows from substituting the linear dynamics into the Euler step.
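
The process update can be sketched in a few lines of NumPy. This is a minimal illustration rather than the contents of EKFfilter.py: the function names are hypothetical, and it assumes ${\bf F}_t = {\bf I} + {\bf A}\,\delta t$, which is what one-step Euler integration of the constant-velocity model gives.

```python
import numpy as np


def constant_velocity_F(dt):
    """Discrete transition matrix F = I + A*dt for state [x, y, vx, vy]."""
    A = np.zeros((4, 4))
    A[:2, 2:] = np.eye(2)  # the derivative of position is velocity
    return np.eye(4) + A * dt


def predict(x, Sigma, Q, dt=1.0):
    """One process update: returns the predicted mean and covariance."""
    F = constant_velocity_F(dt)
    mu = F @ x                       # mu_t = F x_{t-1}
    Sigma_bar = F @ Sigma @ F.T + Q  # predicted covariance
    return mu, Sigma_bar
```

With `dt=1.0` (one frame), the predicted position is simply the previous position plus the previous velocity, and the velocity is carried over unchanged.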

Measurement Update

A general measurement model ${\bf z}_t = h({\bf x}_t, v)$ can be linearly approximated as ${\bf z}_t \approx h({\bf \mu}_t,0) + {\bf C}_t({\bf x}_t - {\bf \mu}_t) + v$. For both the bat and cell datasets, we observe the (x,y) Cartesian coordinates of the centroids of the bats or cells in a fixed reference frame, so we can represent the measurement (a.k.a. observation) model as $ {\bf z}_t = {\bf C} \tilde{\bf x}_t $, where ${\bf z}_t$ is the measurement (in this case the (x,y) Cartesian coordinates of a bat or cell), $\tilde{\bf x}_t$ represents the true state of the object at instance $t$, and ${\bf C} = \begin{bmatrix} {\bf I}_{2 \times 2} & {\bf 0}_{2 \times 2} \end{bmatrix}$.

Following a similar derivation as with the standard Kalman filter, we can derive the Kalman gain and state estimate updates for this system as: $$ {\bf K}_t = \bar{\bf \Sigma}_t {\bf C}^\top \left({\bf C} \bar{\bf \Sigma}_t {\bf C}^\top + {\bf R} \right)^{-1} $$ $$ {\bf x}_t = {\bf \mu}_t + {\bf K}_t({\bf z}_t - {\bf C \mu}_t) $$ $$ {\bf \Sigma}_t = \bar{\bf \Sigma}_t - {\bf K}_t{\bf C} \bar{\bf \Sigma}_t$$ In the absence of a measurement, the current state estimate is taken as the state estimated by the process model, i.e. ${\bf x}_t = {\bf \mu}_t$ and ${\bf \Sigma}_t = \bar{\bf \Sigma}_t$.
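
The three update equations translate directly into NumPy. The following is a sketch under the same notation, not the code in EKFfilter.py, and the function name `measurement_update` is hypothetical.

```python
import numpy as np

# Measurement matrix: only the (x, y) position is observed.
C = np.hstack([np.eye(2), np.zeros((2, 2))])


def measurement_update(mu, Sigma_bar, z, R):
    """Fuse a position measurement z into the predicted state.

    mu, Sigma_bar: predicted mean (4,) and covariance (4, 4).
    z: measured (x, y) position, R: 2x2 measurement noise covariance.
    """
    S = C @ Sigma_bar @ C.T + R             # innovation covariance
    K = Sigma_bar @ C.T @ np.linalg.inv(S)  # Kalman gain
    x = mu + K @ (z - C @ mu)               # corrected state estimate
    Sigma = Sigma_bar - K @ C @ Sigma_bar   # corrected covariance
    return x, Sigma
```

When no detection is assigned to a track, this function is simply not called and the predicted `mu` and `Sigma_bar` are carried forward as the new estimate.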

Tracking Pipeline


The full tracking pipeline is implemented (from scratch) in EKFfilter.py. Two classes are defined: EKF2D, which handles an individual 2D EKF, and BatchTracker, which handles a batch of EKF2D objects. Together, they implement the following:
  1. Initialize EKF tracker(s) at the provided initial position(s)
  2. For each new frame, perform a process update to predict the current position(s), $\mu_t$, of the object(s)
  3. Find the $n$ nearest neighbors amongst the detections for frame $t$ for each object
  4. Greedily assign detections as measurements for objects based on how close the detection is to the predicted object position, provided that the distance between the detection and prediction is below a set threshold
  5. Given a detection assignment, update the internal state estimate of the object via a measurement update
  6. If an object cannot greedily claim any of its $n$ nearest neighbors, it is determined to not have an associated detection for the frame and the internal state is updated with the results of the process update
  7. For any detections that have not been assigned to any of the object trackers, a new tracker is spawned at the detected position
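
Steps 3, 4, 6 and 7 can be sketched as a single greedy matching function. This is a hedged illustration: the function name `greedy_assign` and its parameters are hypothetical, and it assumes that the globally closest prediction-detection pairs are resolved first, which the description above does not specify.

```python
import math


def greedy_assign(predictions, detections, max_dist, n_neighbors=3):
    """Greedily match predicted positions to detections.

    predictions, detections: lists of (x, y) positions.
    Returns (matches, unmatched), where matches maps a prediction index
    to a detection index and unmatched lists unclaimed detection indices.
    """
    candidates = []
    for i, (px, py) in enumerate(predictions):
        dists = sorted(
            (math.hypot(px - dx, py - dy), j)
            for j, (dx, dy) in enumerate(detections)
        )
        # Keep only the n nearest detections that pass the distance gate.
        for d, j in dists[:n_neighbors]:
            if d < max_dist:
                candidates.append((d, i, j))

    matches, used = {}, set()
    # Assumption: the closest pairs claim their detection first.
    for d, i, j in sorted(candidates):
        if i not in matches and j not in used:
            matches[i] = j
            used.add(j)

    unmatched = [j for j in range(len(detections)) if j not in used]
    return matches, unmatched
```

Objects missing from `matches` keep their process-update estimate for the frame, and each index in `unmatched` would spawn a new tracker.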


Experiments

Object Tracking

Object tracking was developed primarily with the bat dataset as a reference. The main priority was to establish a filter capable not only of following a specific object but also of ignoring 'distracting' measurements as appropriate.

Consider the following track, which follows a single bat over the full bat sequence:


The video clip shows that the tracker is able to successfully track this bat through the full video sequence. For most of the track there is little ambiguity. Around the middle of the video, however, a number of bats overlap and the detections are dropped. Two consecutive frames, 828 and 829, are analyzed to show how the filter handles the case where no good detection is available (828) and the case where a detection is received (829). These specific frames are analyzed because they illustrate the workings of the EKF well.

EKF Prediction and Updates

Frame 828
Frame 829
In Frame 828, none of the nearest detections (marked in red) are a good match for the tracked object (previous position marked in yellow), so the state estimate is updated with just the process update estimate (cyan).
In Frame 829, one of the nearest detections (marked in green) is a good match for the tracked object, so the state estimate is corrected by the measurement update to yield a new state estimate (blue).

Generally, achieving good tracking required tuning the filter's noise parameters. We tuned the noise parameters associated with the internal state representation and the measurements, $Q$ and $R$ respectively, until good tracking was achievable. This was mostly done through trial and error, guided by the intuition that a lower $Q$ value indicates higher trust in the internal state estimate and a lower $R$ value indicates higher trust in the measured positions. Trusting the internal state too much would not allow the state to evolve accurately with new measurements, while relying too heavily on measurements would cause tracks to be lost or corrupted in cases of overlap or missing detections. As our results show, the values we arrived at allowed for reasonably good tracking.
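
As a scalar illustration of this intuition (a simplified one-dimensional case with $C = 1$ and unit transition, not the 4D filter used here), let $\bar{\sigma}^2 = \sigma_{t-1}^2 + q$ be the predicted variance and $r$ the measurement noise variance. The gain and state update then reduce to $$ K = \frac{\bar{\sigma}^2}{\bar{\sigma}^2 + r}, \qquad x_t = \mu_t + K(z_t - \mu_t) $$ As $r \to 0$ the gain approaches 1 and the estimate follows the measurement, while as $r \to \infty$ the gain approaches 0 and the estimate stays at the prediction. Increasing $q$ raises $\bar{\sigma}^2$ and therefore pushes $K$ towards 1. For example, with $\bar{\sigma}^2 = 4$, choosing $r = 1$ gives $K = 0.8$, whereas $r = 16$ gives $K = 0.2$. The numbers are illustrative only and are not the values used in our filter.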


Results

Bat dataset

Generally speaking, the tracking for the bats dataset appears decent, as demonstrated in the following video:



While we believe the tracking results for the bats dataset are generally good, tuning the filters to work well in general led to failures in a few individual cases:

Tracking issues on Bats dataset

Correct behavior when tracking the bats in the white and red ellipses thus far.
The tracks for the bats marked by the white and red ellipses switch and never recover. This is also around when the track for the bat marked by the green ellipse is lost due to occlusion by the bat marked by the magenta ellipse.
The red- and white-marked bats have completely switched tracks, thus invalidating both of their track histories. The magenta-marked bat has been tracked well, but the green-marked bat lost its track and a new track is initiated when a new detection is available.

Cells dataset

The tracking on the cells dataset is more chaotic because the cells don't move at constant velocity and cells will collide and split. However, the Kalman filter manages to correctly track every cell for a large portion of the video:

Tracking issues on Cells dataset

The tracker performs relatively well on cells that split into multiple cells, as it creates a new track quickly. However, it may be too quick to create erroneous new tracks, especially when cells drastically change velocity or when the segmentation is imperfect. This is shown especially in this frame, where 3 separate cells that were each previously detected as a single cell split into multiple tracks.

Incorrect new tracks in a frame from the cell dataset

As a result, cells are mostly followed correctly, but a cell may change tracks in the middle of the video, switching to a track that was incorrectly spawned earlier.

Cell tracking is much more successful in regions near the top of the dish, where cells are more separated. It has the most issues in regions with a lot of cell activity, where many cells are clumped together in a small region moving erratically.

Example of highly erroneous tracks in high density areas

Discussion

As shown in the videos, the tracking results are most successful when the detection of the objects is correct and when the objects are not occluded or densely packed. Under good conditions, and once the parameters are tuned appropriately, the Kalman filter can accurately predict the position of the tracked objects.

For spurious detections, a distance threshold applied during measurement matching ensures that a predicted point is not matched with a distant measurement when there is no nearby match. If an object then goes too many frames without a measurement update, it is removed from the active tracks.
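
The removal rule can be sketched as a per-frame bookkeeping step. This is an illustration under stated assumptions: the names `Track`, `prune_tracks` and `max_missed` are hypothetical, and the sketch assumes the counter tracks consecutive missed frames and resets whenever a measurement is matched.

```python
from dataclasses import dataclass


@dataclass
class Track:
    track_id: int
    missed_frames: int = 0  # consecutive frames without a measurement


def prune_tracks(tracks, matched_ids, max_missed):
    """Update miss counters and drop tracks that went unmatched too long.

    tracks: list of Track objects active before this frame.
    matched_ids: set of track ids that received a measurement this frame.
    max_missed: largest number of consecutive misses that is tolerated.
    """
    kept = []
    for track in tracks:
        if track.track_id in matched_ids:
            track.missed_frames = 0  # assumption: a match resets the count
        else:
            track.missed_frames += 1
        if track.missed_frames <= max_missed:
            kept.append(track)
    return kept
```

A small `max_missed` removes stale tracks quickly but also drops objects that are only briefly occluded, so the value would need tuning per dataset.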

While it is difficult to provide an objective analysis without ground-truth information, visually it appears that our method allows for reasonable tracking even in situations where objects touch and occlude each other or when new objects are introduced into the frame, provided that the dynamics of the objects are reasonable. This is plainly observed in the bats dataset, where the movement of the bats, while sometimes erratic, is mostly regular. However, we still face issues when dealing with sudden large shifts in object dynamics, as observed in the cells dataset.

With more tuning of the filter's noise parameters, it may be possible to achieve better tracking. Our methodology is sound, however, and is amenable to further tuning.

