Object Shape Analysis and Segmentation

CS 585 HW 3
Aaron Jacob Varghese
Shivam Satwah
02/24/2020


Binary Image Analysis

Problem Definition

Given a binary image (e.g. the hand or tumor images), we find the connected components and label each object. We then detect the boundary and skeleton of each object and compute its area, orientation, circularity, and compactness.

Method and Implementation

  1. Connected Component Labeling: We used the sequential labeling algorithm to identify and label connected components, following the rules described in class. The function returns a matrix in which each pixel holds its component's label, with label 0 reserved for the background. If the image is noisy, as in the third hand image file, it is first eroded to filter the noise out.

  2. After connected component labeling, we have the individual objects in the binary image; the area of each object is simply its pixel count. We filter out small objects using an area threshold, which for the test images is half the area of the largest component.

  3. The image is padded by 1 px so that the boundary following algorithm discussed in class can be applied along the image edges. We run the boundary follower on the label matrix with a specified target label: it locates the target component and traces its boundary using Moore's method. The resulting border contours are drawn on the labeled image.

  4. After sequential connected component labeling, the contour of each component is passed to a property function that computes its orientation together with Emin and Emax, using the formulas given here (a sketch of this computation appears after this list):

    $$Emin = \frac{a+c}{2} - \frac{a-c}{2}\left(\frac{a-c}{\sqrt{(a-c)^2 + b^2}}\right) - \frac{b}{2}\left(\frac{b}{\sqrt{(a-c)^2 + b^2}}\right)$$ $$Emax = \frac{a+c}{2} + \frac{a-c}{2}\left(\frac{a-c}{\sqrt{(a-c)^2 + b^2}}\right) + \frac{b}{2}\left(\frac{b}{\sqrt{(a-c)^2 + b^2}}\right)$$

    Then the circularity is computed as Emin/Emax. For compactness, the perimeter is taken as the length of the contour list and the area as the number of labeled pixels in the matrix; the compactness is then computed as: $$Compactness = \frac{Perimeter^2}{Area}.$$

  5. Skeletonization is done using morphological operators: the image is repeatedly eroded, and the pixels removed at each step that an opening cannot restore are accumulated into the skeleton, until no foreground pixels remain (see the sketch after this list).
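
Below is a minimal sketch of the property computation from step 4, assuming NumPy, a label matrix from step 1, and a contour length from step 3; the function and argument names are illustrative, not the exact ones in our code.

```python
import numpy as np

def shape_properties(labels, target, contour_length):
    """Area, orientation, circularity, and compactness of one labeled component."""
    ys, xs = np.nonzero(labels == target)
    area = xs.size

    # central second moments (coordinates taken relative to the centroid)
    x = xs - xs.mean()
    y = ys - ys.mean()
    a = np.sum(x * x)
    b = 2.0 * np.sum(x * y)
    c = np.sum(y * y)

    # orientation of the axis of least second moment
    orientation = 0.5 * np.arctan2(b, a - c)

    # Emin and Emax from the closed-form expressions above
    root = np.sqrt((a - c) ** 2 + b ** 2)  # zero only for rotationally symmetric blobs
    emin = (a + c) / 2 - ((a - c) / 2) * ((a - c) / root) - (b / 2) * (b / root)
    emax = (a + c) / 2 + ((a - c) / 2) * ((a - c) / root) + (b / 2) * (b / root)

    circularity = emin / emax
    compactness = contour_length ** 2 / area  # Perimeter^2 / Area
    return area, orientation, circularity, compactness
```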

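For step 5, the sketch below shows skeletonization by repeated erosion, written here with OpenCV's morphology primitives rather than our own operators; the 3x3 cross structuring element is an assumption, not necessarily what we used.

```python
import cv2
import numpy as np

def morphological_skeleton(binary):
    """Morphological skeleton of a 0/255 binary image via repeated erosion."""
    kernel = cv2.getStructuringElement(cv2.MORPH_CROSS, (3, 3))
    skeleton = np.zeros_like(binary)
    img = binary.copy()
    while cv2.countNonZero(img) > 0:
        eroded = cv2.erode(img, kernel)
        opened = cv2.dilate(eroded, kernel)
        # pixels that erosion removes but opening cannot restore belong to the skeleton
        skeleton = cv2.bitwise_or(skeleton, cv2.subtract(img, opened))
        img = eroded
    return skeleton
```
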
Experiments and Results

We tested our implementation on the four images from the course website. Our Part 1 implementation works successfully for these images, as shown in the table below.

Examples   | Source  | Labeling & Boundary | Skeleton
Example 1  | [image] | [image]             | [image]
Example 2  | [image] | [image]             | [image]
Example 3  | [image] | [image]             | [image]
Example 4  | [image] | [image]             | [image]

For area, orientation, circularity, and compactness, we report only the values for the object in the second example; the other examples contain multiple objects, and listing these statistics for each would be tedious.

Area: 37322

Orientation: -0.1533

Circularity: 0.5029

Compactness: 58.056

Discussion

For preprocessing, we use dilation and erosion to remove noise and fill holes in the objects. Our border and skeleton algorithms become slow when there are too many objects in the image, so we filter out smaller objects relative to the primary component.
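
A minimal sketch of this preprocessing using OpenCV's built-in morphology; the input file name, threshold value, and 5x5 kernel are placeholders rather than our actual settings.

```python
import cv2
import numpy as np

# Placeholder input; any noisy binary mask from the data set would do here.
gray = cv2.imread("hand_3.png", cv2.IMREAD_GRAYSCALE)
_, binary = cv2.threshold(gray, 127, 255, cv2.THRESH_BINARY)

kernel = np.ones((5, 5), np.uint8)

# opening (erosion then dilation) removes small speckle noise
opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, kernel)

# closing (dilation then erosion) fills small holes inside the objects
cleaned = cv2.morphologyEx(opened, cv2.MORPH_CLOSE, kernel)
```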


Segmentation

Problem Definition

We are required to segment objects from the frames of several videos and perform a different analysis depending on the video.

Method and Implementation

  1. For the piano dataset, we first run the skin detection code to find the hands (and other objects of similar color) in each frame. We then take the mean of all frames and keep the resulting average image. We subtract this mean image from each frame of interest to detect the pianist's hands, which are the parts that have moved. We apply an absolute threshold to convert the difference into a binary image and use the component labeling algorithm from Part 1 to keep the two largest components. We also tried adaptive thresholding here but were not able to get it working successfully.

  2. For the bat dataset, we follow a similar approach: we subtract the mean of all images from the frame of interest and apply an absolute threshold, which produces the required segmentation of the bats in that frame. The interesting problem is deciding whether a bat's wings are spread or folded. We find the centroid of the object in question and check whether it lies within the object's contour: if the centroid lies outside the contour, the wings are folded; if it lies within, the wings are spread. This is denoted on the image as "--" (spread) or "/\" (closed); a sketch of the centroid test follows this list.

  3. Pedestrian detection - For this video, we first created a mean image from all the frames, which removes the humans from the scene to a reasonable extent. We then iterate through the frames, subtract the mean image from each one to extract just the humans, and run absolute and adaptive thresholding separately on the difference. Adaptive thresholding gave better results than the absolute threshold, which makes sense since it adapts to local brightness variation; a sketch of this background subtraction pipeline follows this list. The occlusion and mean-image issues we encountered are covered in the Discussion below.
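
A minimal sketch of the mean-image background subtraction shared by all three datasets, assuming grayscale frames stored as files; the directory name and the threshold value of 30 are placeholders, not our exact settings.

```python
import glob
import cv2
import numpy as np

# Placeholder frame location; adjust to wherever the dataset frames live.
paths = sorted(glob.glob("frames/*.png"))
frames = [cv2.imread(p, cv2.IMREAD_GRAYSCALE).astype(np.float32) for p in paths]

mean_img = np.mean(frames, axis=0).astype(np.float32)  # estimate of the static background

for frame in frames:
    diff = cv2.absdiff(frame, mean_img)                # moving objects stand out
    diff = diff.astype(np.uint8)

    # absolute threshold; cv2.adaptiveThreshold is the alternative we compared against
    _, moving = cv2.threshold(diff, 30, 255, cv2.THRESH_BINARY)
    # `moving` is then passed to the connected component labeling from Part 1
```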

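For the wing test in step 2, the sketch below checks whether a bat's centroid falls inside its contour, here using OpenCV's contour moments and pointPolygonTest instead of our own routines; `moving` is assumed to be the binary bat mask from the previous sketch.

```python
import cv2

# OpenCV 4 return signature; `moving` is a 0/255 binary mask of the bats
contours, _ = cv2.findContours(moving, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)

for cnt in contours:
    m = cv2.moments(cnt)
    if m["m00"] == 0:
        continue                                  # skip degenerate blobs
    cx, cy = m["m10"] / m["m00"], m["m01"] / m["m00"]

    # positive: centroid inside the contour; negative: outside; zero: on the edge
    inside = cv2.pointPolygonTest(cnt, (cx, cy), False) > 0
    label = "--" if inside else "/\\"             # "--" wings spread, "/\" wings folded
```
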
Experiments and Results

Examples   | Processed
Piano      | [image]
Bat        | [image]
Pedestrian | [image]

Discussion

In the piano dataset, the color and brightness of the piano keys are similar to the skin color of the hands, which made it challenging to get hand segmentation working. We tried multiple methods and were finally able to separate the hands from the background (the piano keys) using a different skin detection technique.
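
For reference, one common skin detection rule is a fixed range in HSV space, sketched below; the color space, bounds, and file name are illustrative assumptions and not necessarily the exact technique we ended up using.

```python
import cv2
import numpy as np

frame = cv2.imread("piano_frame.png")            # placeholder file name
hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

# loose skin range in HSV; common starting values, not tuned for this data
lower = np.array([0, 40, 60], dtype=np.uint8)
upper = np.array([25, 180, 255], dtype=np.uint8)

skin_mask = cv2.inRange(hsv, lower, upper)       # 255 where a pixel looks like skin
```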

In the bat dataset, some bats are extremely small, which makes it difficult to correctly determine the state of the wings (open or closed) using the centroid approach we took.

In the pedestrian dataset, there was a lot of occlusion from the sign post in the center of the video. Also, two people stayed in the same position for a long time, which biased the mean image. We also tried adaptive thresholding; it improved on the absolute threshold but still did not yield fully correct segmentations here.


Conclusion

We were able to implement all the required algorithms and developed a deeper understanding of how and why they work.

For the piano data set, the approach we created is able to segment the hands out of the background efficiently, with good results.

For the bat data set, we detected most of the bats with absolute thresholding, and our approach to determining the wing state also yields good results.

For the pedestrian data set, we were unable to generate good results, but our approach seems like a reasonable start toward pedestrian segmentation and tracking.


Credits and Bibliography