Groupmates: Megan Van Welie, Rebecca Graber
This assignment required us to identify, segment, and track eels and crabs moving around a tank, and to use this tracking data to draw conclusions about the relative activity level, and possibly even health, of the eels and crabs across video files. If such data could be reliably extracted, it would eliminate a tedious (and subjective) task from the workflows of researchers interested in animal behavior over a large corpus of video data.
Unfortunately, we were not able to generate highly reliable data: there were many challenges in the source data, and even using a fairly wide range of techniques (discussed below), we still ran into serious issues with both false positives (noise, ripples in the water, etc.) and false negatives (portions of the eels' bodies were simply too hard to differentiate from the tank).
Our tracking workflow began by using template matching to locate the tank in the image and crop away all superfluous data from the video frames. This worked well in the two test videos, and we expect it to continue to work fairly well for similar setups, though it will likely require some tuning for substantially different camera setups or cases where the tank has been rotated.
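The tank-locating step can be sketched as a brute-force sum-of-squared-differences template search. This is a minimal NumPy sketch, not our actual implementation; the grayscale assumption and all names are illustrative:

```python
import numpy as np

def locate_template(frame, template):
    """Return the (row, col) of the best SSD match of template in frame.

    Assumes 2-D (grayscale) NumPy arrays; exhaustive search for clarity.
    """
    fh, fw = frame.shape
    th, tw = template.shape
    best, best_ssd = (0, 0), np.inf
    for r in range(fh - th + 1):
        for c in range(fw - tw + 1):
            patch = frame[r:r + th, c:c + tw].astype(float)
            ssd = np.sum((patch - template.astype(float)) ** 2)
            if ssd < best_ssd:
                best_ssd, best = ssd, (r, c)
    return best

def crop_to_tank(frame, template):
    """Slice the frame down to the region matching the tank template."""
    r, c = locate_template(frame, template)
    th, tw = template.shape
    return frame[r:r + th, c:c + tw]
```

In practice a library routine (e.g. OpenCV's template matching) would be used instead of the exhaustive Python loops, but the principle is the same.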
Next, we performed background subtraction, using a rolling 10-frame median background image (we tested various frame ranges and mean versus median, and found that a 10-frame median worked best). This resulted in a fairly useful indicator for where eel activity was occurring, but was less useful for the crabs, which tend to move very slowly.
Even in the case of the eels, however, we found that this background subtraction was insufficient to achieve reliably contiguous segmentation, because under certain lighting conditions the eels very closely resemble the back of the tank. In an attempt to improve our detection, we added frame-differencing motion detection data to the binary image fed into our segmentation algorithm. This helped somewhat, but failed to entirely fill in the gaps.
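Combining the two sources of evidence amounts to OR-ing the binary masks; a hedged sketch (both thresholds are illustrative, not our tuned values):

```python
import numpy as np

def detection_mask(prev_frame, curr_frame, background,
                   bg_thresh=30, diff_thresh=15):
    """Union of background-subtraction and frame-differencing evidence.

    All inputs are grayscale NumPy arrays of the same shape.
    """
    curr = curr_frame.astype(float)
    # Pixels that differ from the rolling background estimate.
    bg_mask = np.abs(curr - background.astype(float)) > bg_thresh
    # Pixels that changed since the previous frame (motion evidence).
    motion_mask = np.abs(curr - prev_frame.astype(float)) > diff_thresh
    return bg_mask | motion_mask
```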
We implemented a generic entity tracking algorithm that takes in a vector of State objects representing the objects seen in the previous frame, a vector of State objects representing the objects seen in the current frame, and a distance function that compares two State objects and outputs a floating-point distance value. Currently, State objects contain only basic per-object data, like centroid, area, and pixel mask, but our implementation is relatively easy to extend, as more fields can simply be added to the State class to support new comparison operations.
The algorithm outputs the best matching pairs of states, continuing until every state object in the second input set has a match. Intuitively, this serves to ensure that every object we see in the current frame gets matched with the ancestor object from the previous frame that has minimal distance from it, as defined by the distance function. In this case, our distance function took the percentage of intersecting pixels between the two objects (reasoning that eels do not move quickly enough to have zero intersection with themselves between consecutive frames).
We found this produced more reliable results than centroid distance comparisons, which seemed to be very noisy. Note that it is entirely possible for an object from the previous frame to pass its ID on to two or more children in the next frame, assuming it is the best distance match to all of them; this is not a bug, but rather by design. We chose to allow for this behavior because we noticed that, in spite of our best efforts, eels would often be detected as multiple nearby fragments of non-background rather than as a single contiguous non-background entity due to their high degree of similarity in appearance to the back of the tank.
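The matching scheme described above can be sketched as follows. This is a simplified sketch, not our actual implementation: the State fields are reduced to those mentioned in the text, and reading "percentage of intersecting pixels" as one minus the overlap fraction is our interpretation here.

```python
from dataclasses import dataclass

@dataclass
class State:
    id: int
    centroid: tuple    # (row, col) centre of mass of the segment
    area: int          # pixel count
    pixels: frozenset  # set of (row, col) coordinates in the segment

def overlap_distance(a, b):
    """1 minus the fraction of shared pixels: 0 for identical masks,
    1 for disjoint ones."""
    union = len(a.pixels | b.pixels)
    if union == 0:
        return 1.0
    return 1.0 - len(a.pixels & b.pixels) / union

def match_states(prev_states, curr_states, distance=overlap_distance):
    """Greedy matching: each current state inherits the ID of the
    previous state at minimal distance. One previous state may pass its
    ID to several current fragments -- by design, since a single eel is
    often detected as multiple nearby fragments."""
    for curr in curr_states:
        if prev_states:
            curr.id = min(prev_states, key=lambda p: distance(p, curr)).id
    return curr_states
```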
In a similar vein, we attempted to minimize detection of a single eel as multiple objects by assigning the same ID to sufficiently nearby segments. We implemented this functionality by using the entity tracking algorithm again, this time feeding it the same frame twice and searching for segments that had low centroid-distance matches to segments other than themselves. These segments were then assigned the same ID.
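The fragment-merging idea reduces to a pairwise centroid-distance pass; a minimal sketch (the Segment class and distance threshold are illustrative, not our actual code or tuned value):

```python
import math
from dataclasses import dataclass

@dataclass
class Segment:
    id: int
    centroid: tuple  # (row, col)

def merge_nearby_segments(segments, max_dist=20.0):
    """Give the same ID to segments whose centroids lie within max_dist,
    so fragments of one eel are treated as a single entity."""
    for i, a in enumerate(segments):
        for b in segments[i + 1:]:
            if math.dist(a.centroid, b.centroid) <= max_dist:
                b.id = a.id
    return segments
```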
Given the level of false negatives and false positives in the data, extracting fine-grained activity information from the video data proved extremely difficult; however, we still attempted to do so using the following techniques:
Centroid measurement over time: this method was incredibly noisy for the eels, as their centroids tended to move radically between frames, owing to their wavy, oblong shape. However, it did show promise for the crabs.
Skeletonization: this method showed some promise for the eels when we were able to obtain an unfragmented segment, as it yielded a simpler representation of the eel's bodily position, which was somewhat less subject to noise, and therefore useful for estimating head, centroid, and tail positions.
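Given a skeleton mask, head and tail candidates can be read off as skeleton pixels with exactly one neighbour; a sketch of that endpoint-extraction step (this assumes the skeleton has already been computed by a thinning routine, which we do not reproduce here):

```python
import numpy as np

def skeleton_endpoints(skel):
    """Return (row, col) pixels of a binary skeleton that have exactly
    one 8-connected neighbour -- candidate head/tail positions."""
    skel = skel.astype(bool)
    ends = []
    for r, c in zip(*np.nonzero(skel)):
        # 3x3 neighbourhood clipped at the image border.
        window = skel[max(r - 1, 0):r + 2, max(c - 1, 0):c + 2]
        if window.sum() == 2:  # the pixel itself plus one neighbour
            ends.append((r, c))
    return ends
```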
We tried to use the following other methods in the course of the project, but eventually abandoned them due to poor results:
Flood fill: we attempted to use flood fill on the original image data, seeded by our background-subtracted mask, in order to fill in gaps in eel segments, but were unable to find any combination of sensitivity parameters that produced an appreciable benefit.
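The flood-fill attempt can be sketched as a BFS region growth with an intensity tolerance; the tolerance here is the kind of sensitivity parameter we tuned without success (this sketch and its names are illustrative, not our actual code):

```python
from collections import deque
import numpy as np

def tolerant_flood_fill(image, seed, tol=10):
    """Grow a region from `seed` to 4-connected pixels whose intensity
    is within `tol` of the seed pixel's intensity."""
    h, w = image.shape
    seed_val = float(image[seed])
    filled = np.zeros((h, w), dtype=bool)
    filled[seed] = True
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            if (0 <= nr < h and 0 <= nc < w and not filled[nr, nc]
                    and abs(float(image[nr, nc]) - seed_val) <= tol):
                filled[nr, nc] = True
                queue.append((nr, nc))
    return filled
```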
Remembering static objects: we attempted to keep track of objects that we were interested in (i.e., had been detected as eels/crabs before) but had stopped moving, and therefore had become part of what our algorithm treated as the background for subtraction purposes. We tried to accomplish this by comparing the previous and current frame color values at locations that had contained objects in the previous frame but hadn't been detected as containing anything in the current frame. This simply didn't work in most cases, and had the side effect of leaving "ghost objects" in places where an eel that was very close in color to the tank wall had passed, as the algorithm was unable to tell whether the eel had moved on.
Ultimately, the data we harvested was far too noisy to be useful in characterizing any fine-grained animal activity; however, we were able to generate sets of frame intervals during which we believed (based on the large amount of activity detected) the animals were particularly active. It is possible that these metrics could even be used for very rough activity characterization, as certain environmental conditions may well impact the frequency and duration of activity for eels and crabs.