The primary goal of this assignment was to design and implement algorithms to delineate objects in video images and analyze their shapes.
Method and Implementation
Part 1: Segmentation
Bats: Three methods of segmentation were explored in processing the bats dataset. The first was a single-value percentile threshold method: given the histogram of a grayscale image, it includes the highest-intensity bins until the desired percentile of the image's pixels is covered. All pixels falling in the included bins are set to white, and all others to black. The second method was an adaptive variation of the percentile threshold, which varies the target percentile linearly across a given range as a function of the y-coordinate. The final segmentation method included a heavier manual component. After observing that the background of the dataset remained constant throughout the video, we used GIMP, an image editor, to manually isolate the background of the image. Subtracting the isolated background from each frame, followed by a simple threshold, produced a clean segmentation.
void applyThreshold(Mat& src, Mat& dst, int threshold);
void percentileThreshold(Mat& src, Mat& dst, vector<…>);
void adaptiveLinearPercentileThreshold(Mat& src, Mat& dst, vector<…>);
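The core of the percentile threshold can be sketched as follows. This is a simplified, OpenCV-free illustration of the idea described above, operating on a flat vector of grayscale bytes rather than a Mat; the function name and parameters are our own, not the original implementation's.

```cpp
#include <cstdint>
#include <vector>

// Percentile threshold sketch: walk the histogram from the brightest bin
// downward until the chosen fraction of pixels is covered, then binarize.
std::vector<uint8_t> percentileThreshold(const std::vector<uint8_t>& img,
                                         double percentile) {
    int hist[256] = {0};
    for (uint8_t p : img) hist[p]++;

    // Accumulate from the high-intensity end until `percentile` of all
    // pixels are included; that bin becomes the cutoff.
    const std::size_t target =
        static_cast<std::size_t>(percentile * img.size());
    std::size_t covered = 0;
    int cutoff = 255;
    for (int b = 255; b >= 0; --b) {
        covered += hist[b];
        cutoff = b;
        if (covered >= target) break;
    }

    std::vector<uint8_t> out(img.size());
    for (std::size_t i = 0; i < img.size(); ++i)
        out[i] = (img[i] >= cutoff) ? 255 : 0;
    return out;
}
```

The adaptive variant would simply recompute `percentile` per row before applying the cutoff.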
Cells: Building on the bats segmentation, we first attempted to segment the cells using a similar percentile-based threshold method. The cells have distinct high-intensity border regions surrounding darker interiors, so we combined an upper and a lower percentile threshold into a single image in an attempt to isolate the inner and outer regions of each cell together. Continuing this effort, we implemented region growing for uncertain lower-percentile regions, seeded from regions isolated with confident upper-percentile thresholds.
void regionGrowingLowerPercentileThreshold(Mat& src, Mat& dst, vector<…>);
void applyThreshold(Mat& src, Mat& dst, int threshold, bool above);
void applyUpperAndLowerPercentileThreshold(Mat& src, Mat& dst, vector<…>);
void applyLowerPercentileThreshold(Mat& src, Mat& dst, vector<…>);
void applyUpperPercentileThreshold(Mat& src, Mat& dst, vector<…>);
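The seeded growth step can be illustrated with the following sketch. It takes plain thresholds rather than percentiles and uses a flat row-major byte vector instead of a Mat; pixels passing the strict (upper-percentile) threshold act as seeds, and the region grows through N4 neighbors that pass only the looser threshold. All names here are our simplification, not the original code.

```cpp
#include <cstdint>
#include <queue>
#include <vector>

// Seeded region growing: strict-threshold pixels are certain seeds;
// loose-threshold pixels join only if connected to a seed.
std::vector<uint8_t> growFromSeeds(const std::vector<uint8_t>& img,
                                   int rows, int cols,
                                   int strict, int loose) {
    std::vector<uint8_t> out(img.size(), 0);
    std::queue<int> frontier;

    // Seed with high-confidence (upper-percentile) pixels.
    for (int i = 0; i < rows * cols; ++i)
        if (img[i] >= strict) { out[i] = 255; frontier.push(i); }

    // Grow through "uncertain" pixels passing only the loose threshold.
    const int dr[4] = {-1, 1, 0, 0}, dc[4] = {0, 0, -1, 1};
    while (!frontier.empty()) {
        int i = frontier.front(); frontier.pop();
        int r = i / cols, c = i % cols;
        for (int k = 0; k < 4; ++k) {
            int nr = r + dr[k], nc = c + dc[k];
            if (nr < 0 || nr >= rows || nc < 0 || nc >= cols) continue;
            int j = nr * cols + nc;
            if (out[j] == 0 && img[j] >= loose) {
                out[j] = 255;
                frontier.push(j);
            }
        }
    }
    return out;
}
```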
Fish: The basic segmentation algorithm used was variable thresholding: different thresholds were applied to different regions of the image to account for the distinct difference in lighting between the right and left sides. These thresholds and regions were determined heuristically rather than as a continuous function of position, because the lighting changed abruptly between regions. Before thresholding, any pixels with a substantially larger green component or a too-light teal color were removed to account for some of the brighter elements of the background, especially the green plant and teal rocks. In addition, background subtraction was used to remove other known background elements whose coloring was more similar to the fish. The background image was created outside of the image-processing pipeline, as many of the fish were moving too slowly to avoid becoming part of a historically aggregated mask. Finally, the segmented image was eroded with a rectangular filter to remove single-pixel segments without removing segments that were long but not tall, then dilated again to restore the shape of the fish.
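The final erode-then-dilate step amounts to morphological opening with a wide, flat kernel. A minimal sketch on a flat binary byte image, assuming a 1x3 horizontal structuring element (our choice for illustration; the original kernel dimensions may differ):

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One pass of erosion (min) or dilation (max) with a 1x3 horizontal
// kernel over a row-major 0/255 binary image.
static std::vector<uint8_t> morph1x3(const std::vector<uint8_t>& img,
                                     int rows, int cols, bool erode) {
    std::vector<uint8_t> out(img.size());
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            // Out-of-bounds neighbors act as the identity for each op.
            uint8_t l = (c > 0) ? img[r * cols + c - 1]
                                : static_cast<uint8_t>(erode ? 255 : 0);
            uint8_t m = img[r * cols + c];
            uint8_t rt = (c + 1 < cols) ? img[r * cols + c + 1]
                                        : static_cast<uint8_t>(erode ? 255 : 0);
            out[r * cols + c] =
                erode ? std::min({l, m, rt}) : std::max({l, m, rt});
        }
    }
    return out;
}

// Opening: erosion removes single-pixel noise (tall-but-not-long specks),
// dilation restores the surviving long horizontal shapes.
std::vector<uint8_t> open1x3(const std::vector<uint8_t>& img,
                             int rows, int cols) {
    return morph1x3(morph1x3(img, rows, cols, true), rows, cols, false);
}
```

An isolated white pixel is erased, while a horizontal run of three or more pixels survives and is restored to its original extent.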
Part 2: Connected Component Labeling
void colorize(Mat& in, Mat& out)
void make_sticky(Mat& prev, Mat& curr, Mat& out)
void myFindContours(Mat& input, Mat& output)
To implement connected component labeling, we used the iterative algorithm discussed in class (the one that was diagrammed as a flowchart in the in-class handout). To merge labels, we implemented an algorithm that picks an arbitrary pair of labels known to be the same, and then greedily attempts to grow the equivalence class until it cannot any longer. It then picks another pair, repeating until every label is a member of an equivalence class.
The output format of our algorithm differs somewhat from the stock OpenCV algorithm: we output a 1-channel, 32-bit matrix in which the intensity of a given pixel encodes the connected component it belongs to.
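The two-pass structure can be sketched as below, on a flat binary byte image with an int label per pixel, matching the output convention just described. For brevity this sketch merges equivalent labels with a standard union-find rather than the greedy equivalence-class growing our code actually uses; all names are illustrative.

```cpp
#include <cstdint>
#include <numeric>
#include <vector>

// Minimal union-find with path halving for label equivalences.
struct UnionFind {
    std::vector<int> parent;
    explicit UnionFind(int n) : parent(n) {
        std::iota(parent.begin(), parent.end(), 0);
    }
    int find(int x) {
        while (parent[x] != x) x = parent[x] = parent[parent[x]];
        return x;
    }
    void unite(int a, int b) { parent[find(a)] = find(b); }
};

// Two-pass connected component labeling, N4 connectivity.
// Background = 0; each foreground pixel gets its component's label.
std::vector<int> labelComponents(const std::vector<uint8_t>& img,
                                 int rows, int cols) {
    std::vector<int> labels(img.size(), 0);
    UnionFind uf(rows * cols / 2 + 2);  // upper bound on provisional labels
    int next = 1;
    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            if (img[r * cols + c] == 0) continue;
            int up = (r > 0) ? labels[(r - 1) * cols + c] : 0;
            int left = (c > 0) ? labels[r * cols + c - 1] : 0;
            if (!up && !left) {
                labels[r * cols + c] = next++;   // new provisional label
            } else if (up && left) {
                labels[r * cols + c] = up;
                uf.unite(up, left);              // record the equivalence
            } else {
                labels[r * cols + c] = up ? up : left;
            }
        }
    }
    // Second pass: collapse each label to its equivalence-class root.
    for (int& l : labels) if (l) l = uf.find(l);
    return labels;
}
```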
In order to achieve persistence of labels between video frames, we also implemented the make_sticky function. It takes the contour results from the previous frame and from the current frame, and attempts to determine which contours in the current frame correspond to contours from the previous frame. We accomplish this by comparing the contents of the two contour frames and finding areas that intersect. For any such intersection, we assume that the component from the previous frame is the same component as the one it intersects in the current frame, and we copy its label forward.
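The label-propagation idea behind make_sticky can be sketched like this, on flat int label images (0 = background). Wherever a current-frame component overlaps a previous-frame component, the old label is carried forward; components with no overlap keep their new label. The first-overlap-wins matching and the names here are our simplification, not the original function.

```cpp
#include <unordered_map>
#include <vector>

// Carry labels forward from the previous frame's labeling to the
// current frame's labeling wherever components spatially intersect.
std::vector<int> carryLabels(const std::vector<int>& prev,
                             const std::vector<int>& curr) {
    std::unordered_map<int, int> remap;  // current label -> previous label
    for (std::size_t i = 0; i < curr.size(); ++i)
        if (curr[i] && prev[i] && !remap.count(curr[i]))
            remap[curr[i]] = prev[i];

    std::vector<int> out(curr.size(), 0);
    for (std::size_t i = 0; i < curr.size(); ++i) {
        auto it = remap.find(curr[i]);
        out[i] = (it != remap.end()) ? it->second : curr[i];
    }
    return out;
}
```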
Finally, for easy visualization, we implemented the colorize function, which essentially hashes a component ID number onto a consistent place in the RGB color space by setting each channel equal to (component_id * some large number) mod 255, with different large numbers for each channel.
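The colorize hashing reduces to one line per channel. A sketch, where the three large multipliers are arbitrary choices of ours, not necessarily the constants used in the original:

```cpp
#include <array>
#include <cstdint>

// Hash a component ID to a stable RGB color: each channel is
// (id * large constant) mod 255, with a different constant per channel.
std::array<uint8_t, 3> idToColor(unsigned id) {
    return {
        static_cast<uint8_t>((id * 2654435761u) % 255),
        static_cast<uint8_t>((id * 40503u) % 255),
        static_cast<uint8_t>((id * 65521u) % 255),
    };
}
```

The same ID always maps to the same color, so a component keeps its color across frames as long as make_sticky preserves its label.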
Part 3: Boundary Detection and Object Characterization
Boundary Detection: Boundary detection was implemented using the algorithm discussed in class. After finding a start point, the pixel immediately to its left was chosen as the first "outside" pixel. Starting from this outside pixel, the N8 neighbors of the current point were searched until another pixel of the desired value was found; the pixel examined just before it became the new outside pixel. This process was repeated until the algorithm returned to the start point, and was then run for every segment label in order to find the boundaries of all objects.
myGetBoundaries(Mat& src, unordered_map<…>);
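The walk described above (Moore neighbor tracing) can be sketched for a single object on a flat binary byte image; the names and the simple stop-at-start criterion are our illustration, not the original implementation.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Trace the outer boundary of the first object found in raster order.
// Background = 0, foreground = nonzero; returns (row, col) pairs.
std::vector<std::pair<int, int>> traceBoundary(
        const std::vector<uint8_t>& img, int rows, int cols) {
    // Clockwise N8 offsets: N, NE, E, SE, S, SW, W, NW.
    static const int dr[8] = {-1, -1, 0, 1, 1, 1, 0, -1};
    static const int dc[8] = {0, 1, 1, 1, 0, -1, -1, -1};
    auto fg = [&](int r, int c) {
        return r >= 0 && r < rows && c >= 0 && c < cols &&
               img[r * cols + c] != 0;
    };

    std::vector<std::pair<int, int>> boundary;
    int sr = -1, sc = -1;
    for (int i = 0; i < rows * cols && sr < 0; ++i)
        if (img[i]) { sr = i / cols; sc = i % cols; }
    if (sr < 0) return boundary;  // empty image

    int cr = sr, cc = sc;      // current boundary pixel
    int br = sr, bc = sc - 1;  // outside pixel, initially to the west
    boundary.push_back({cr, cc});
    for (int guard = 0; guard < 4 * rows * cols + 8; ++guard) {
        int d = 0;  // direction from current pixel to the outside pixel
        for (int j = 0; j < 8; ++j)
            if (cr + dr[j] == br && cc + dc[j] == bc) { d = j; break; }
        int k = 1;  // scan clockwise from just past the outside pixel
        for (; k <= 8; ++k)
            if (fg(cr + dr[(d + k) % 8], cc + dc[(d + k) % 8])) break;
        if (k > 8) break;               // isolated single pixel
        br = cr + dr[(d + k + 7) % 8];  // last empty cell examined
        bc = cc + dc[(d + k + 7) % 8];
        cr += dr[(d + k) % 8];
        cc += dc[(d + k) % 8];
        if (cr == sr && cc == sc) break;  // walked back to the start
        boundary.push_back({cr, cc});
    }
    return boundary;
}
```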
Characterization: Based primarily on the first and second image moments, we implemented several characterization functions that calculate the circularity, orientation, area, and compactness of each segmented region. These metrics, combined with prior knowledge of the datasets, allow us to make simple determinations of the types of objects we segment. Specifically, we used circularity with a threshold of 0.1 to roughly determine when the bats' wings were spread or folded.
double circularity(Mat& src, int obj_id);
double orientation(Mat& src, int obj_id);
double compactness(Mat& src, int obj_id);
Point centroid(Mat& src, int obj_id);
int area(Mat& src, int obj_id);
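The moment computations behind these functions can be sketched for a single object on a flat binary mask. Here circularity is taken as the ratio E_min/E_max of the second central moments (1 for a rotationally symmetric blob, near 0 for elongated shapes) and orientation as the axis of least second moment; the struct and parameter layout are our illustration, assuming this is the definition the original functions use.

```cpp
#include <cmath>
#include <cstdint>
#include <vector>

struct ShapeStats { double area, orientation, circularity; };

// Compute area, orientation, and circularity from the first and second
// central moments of a single-object binary mask (nonzero = object).
ShapeStats characterize(const std::vector<uint8_t>& mask,
                        int rows, int cols) {
    // First moments: pixel count and centroid.
    double n = 0, xbar = 0, ybar = 0;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            if (mask[r * cols + c]) { n++; xbar += c; ybar += r; }
    xbar /= n; ybar /= n;

    // Second central moments about the centroid.
    double mu20 = 0, mu02 = 0, mu11 = 0;
    for (int r = 0; r < rows; ++r)
        for (int c = 0; c < cols; ++c)
            if (mask[r * cols + c]) {
                mu20 += (c - xbar) * (c - xbar);
                mu02 += (r - ybar) * (r - ybar);
                mu11 += (c - xbar) * (r - ybar);
            }

    double root = std::sqrt((mu20 - mu02) * (mu20 - mu02)
                            + 4 * mu11 * mu11);
    ShapeStats s;
    s.area = n;
    s.orientation = 0.5 * std::atan2(2 * mu11, mu20 - mu02);
    s.circularity = (mu20 + mu02 - root) / (mu20 + mu02 + root);
    return s;
}
```

A thin horizontal line yields circularity near 0 and orientation 0, which is the kind of separation the 0.1 wing threshold relies on.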
Fish: The algorithm used to segment the fish was very successful in areas with a "quiet" background, but produced considerable noise where there were many background objects such as plants. Opening the images helped remove some of the noise, but especially large, bright background objects remained in the segmented images; opening also eroded the smaller fish, making them difficult to identify. The odd sizes of the remaining background objects produced wide enough variation in circularity and orientation that identifying the fish by low circularity, as we had expected to do, was not feasible. A better method might have been to divide the image into more regions to make the thresholding more granular. For example, in the lower-left corner of the images, where all the fish were bright blue, a higher threshold would have eliminated much of the noise from the leaves of the plant. It would be interesting to try this with more images: presumably, over a longer period of time, the fish would move more, allowing us to create a background mask from historical data.
Connected Components: This algorithm has several limitations. When two connected components collide and then separate, only one of the labels is preserved, with the other component receiving an entirely new label. Similarly, moving objects that break into disconnected regions and then reform, due to limitations in our thresholding, exhibit unstable labeling behavior.
The strengths of our methodology relied heavily on prior knowledge of the dataset. In both the fish and bat datasets, we leveraged the fact that the background remained static across frames to handle a large part of the segmentation. Though this method was largely effective, our automated approaches showed some limitations. For example, we were unable to completely merge the interior and exterior of the cells, which prevented us from identifying as single connected components the objects a human observer would label as one.
Given more time, we believe that our characterization methodology could be greatly improved. Though we implemented useful functions, including circularity and compactness, our final results do not use these metrics to their full potential in characterizing the segmented objects. Additionally, more could be done to reduce the manual component of the segmentation by automatically sampling known background regions to create the base background subtraction mask.
Our main message from this assignment is that segmentation is a difficult problem that greatly varies in its challenges across different datasets.
Credits and Bibliography