CS585 Assignment 3

CS 585 HW 3
E Chengyuan
teammate: Li Jun
Feb 24 2020


Problem Definition: Object Shape Analysis and Segmentation

The goal of this assignment is to design and implement algorithms that delineate objects in video images and analyze their shapes.

Problem 1: Binary Image Analysis

Problem Definition

The data for this part are binary images of hands and a stained tissue sample of a breast cancer tumor.

  1. Implement a connected component labeling algorithm and apply it to the data below. Show your results by displaying the detected components/objects with different colors. The recursive connected component labeling algorithm sometimes does not work properly if the image regions are large.
  2. Implement the boundary following algorithm and apply it for the relevant regions/objects.
  3. For each relevant region/object, compute the area, orientation, and circularity (Emin/Emax). Also, identify and count the boundary pixels of each region, and compute compactness, the ratio of the area to the perimeter.
  4. Implement a skeleton finding algorithm. Apply it to the relevant regions/objects.


Problem 2 Segmentation

Problem Definition

There are 3 parts to this assignment:

1) In the first part of the assignment we are given frames of a person playing the piano. Our task is to segment the left and right hand from each frame.

2) In the second part, we are given the frames of a video of bats flying through the sky; the bats appear bright against a dark sky. We need to identify the bats by drawing bounding boxes around them. Both grayscale and false-color images of this sequence are available; I chose to use the grayscale images.

3) In the third part, we detect pedestrians in a sequence of frames of people walking on a street, and we develop an algorithm to count the number of people in each frame.

For each of these parts, we create visual representations of our results and output them as video.


Method and Implementation for Problem 1

Problem 1: Binary Image Analysis
1. Implemented the StackConnectedComponents(label) method. When an image region is very large, the recursive version can exceed the maximum call-stack depth and crash the program; keeping the pending pixels on an explicit stack avoids this.
2. The labeled output of 'tumor_fold.png' contained a large number of components, so I applied an erosion operation to reduce the number of labels.
3. Implemented the boundary following algorithm (Moore-neighborhood border tracing) and applied it to the labeled images.
4. Used the contours found for each object, together with several library methods, to compute the area, orientation, circularity, number of boundary pixels, and compactness.
5. Implemented a skeleton finding algorithm (morphological skeleton).
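
The explicit-stack labeling from step 1 can be sketched as below. This is a minimal illustration, not the submitted StackConnectedComponents code: the function name `label_components`, the use of plain nested lists in place of an image array, and the 8-connectivity default are all assumptions made so the example runs without OpenCV.

```python
def label_components(binary, connectivity=8):
    """Label foreground (truthy) pixels with an explicit stack instead of recursion.

    Returns (labels, count): labels is a 2D list where 0 means background
    and 1..count identify the components in raster-scan order.
    """
    rows, cols = len(binary), len(binary[0])
    if connectivity == 8:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]

    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c] or labels[r][c]:
                continue  # background, or already part of a component
            count += 1
            labels[r][c] = count
            stack = [(r, c)]  # pending pixels of the current component
            while stack:
                cr, cc = stack.pop()
                for dr, dc in offsets:
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and binary[nr][nc] and not labels[nr][nc]):
                        labels[nr][nc] = count  # mark before pushing
                        stack.append((nr, nc))
    return labels, count


image = [
    [1, 1, 0, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
labels, count = label_components(image)
```

Marking a pixel before pushing it keeps each pixel on the stack at most once, so memory grows with the region size rather than with call depth.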

Method and Implementation for Problem 2

Part 1 - Segmentation of Hands

  1. Pre-processing
     We first load all the image frames into a list. We then average the pixel values over all frames and store the result as a new average image.

  2. Segmentation
     Segmentation takes two steps. 1) Compute the difference between each frame and the average image to get the motion energy in that frame, which removes the static regions. Because the scene is static, we use absdiff() for this, convert the difference to grayscale, and threshold it into a binary image. The binary image serves as the region of interest (ROI), which makes the segmentation easier. 2) Apply skin color detection, then pick the three largest blobs (left hand, right hand, and head); these regions are significantly larger than the noise blobs.

  3. Visual representation
     Bounding boxes are drawn with a label for each hand (left or right).
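
The average-image and difference steps can be sketched as follows. This is a dependency-free illustration on grayscale frames stored as nested lists, not the OpenCV code used for the assignment; the names `mean_frame` and `motion_mask` and the default threshold of 30 are assumptions, and the skin color step is not covered.

```python
def mean_frame(frames):
    """Per-pixel average over a list of equally sized grayscale frames."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    return [[sum(f[r][c] for f in frames) / n for c in range(cols)]
            for r in range(rows)]


def motion_mask(frame, background, threshold=30):
    """Binary mask: 1 where |frame - background| exceeds the threshold."""
    return [[1 if abs(p - b) > threshold else 0
             for p, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]


# Three tiny frames: a bright "moving" pixel on a static dark background.
frames = [
    [[200, 10, 10], [10, 10, 10]],
    [[10, 200, 10], [10, 10, 10]],
    [[10, 10, 10], [10, 10, 200]],
]
background = mean_frame(frames)
mask = motion_mask(frames[0], background, threshold=100)
```

With only a few frames, the moving object leaks into the average and leaves faint ghosts at the positions it visited, so the threshold has to be high enough to reject them; averaging over more frames reduces this effect.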


Part 2 - Detection of Pedestrians

We use the HOG person detector. It moves a sliding detection window across the image and, at each window position, computes a HOG descriptor. The descriptor is passed to a trained SVM, which classifies it as either “person” or “not a person”. To recognize people at different scales, the image is subsampled to multiple sizes and each subsampled image is searched.

We detect people with the hog.detectMultiScale function. The original settings were winStride=(4, 4), padding=(8, 8), scale=1.1. We then changed winStride to (2, 2) because, with the previous setting, some small people in the frame were not detected; (2, 2) does much better on small pedestrians. The HOG method tends to be accurate, but it still requires the parameters of detectMultiScale to be set properly.

Finally, we apply non-maxima suppression to the bounding boxes: if the overlap ratio between two boxes is greater than the threshold, the boxes overlap sufficiently and we suppress the current bounding box.
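
The suppression rule can be illustrated with a small greedy routine. This is a sketch under stated assumptions rather than the code used for the assignment: boxes are (x1, y1, x2, y2) tuples, larger boxes are kept first, the overlap ratio is taken as the intersection area divided by the area of the box being tested, and the name `non_max_suppression` and the 0.5 default threshold are hypothetical.

```python
def non_max_suppression(boxes, overlap_thresh=0.5):
    """Greedy non-maxima suppression over (x1, y1, x2, y2) boxes."""
    def area(b):
        return max(0, b[2] - b[0]) * max(0, b[3] - b[1])

    kept = []
    # Visit boxes from largest to smallest so big detections win.
    for box in sorted(boxes, key=area, reverse=True):
        box_area = area(box)
        if box_area == 0:
            continue  # degenerate box, nothing to keep
        suppressed = False
        for k in kept:
            inter_w = min(box[2], k[2]) - max(box[0], k[0])
            inter_h = min(box[3], k[3]) - max(box[1], k[1])
            inter = max(0, inter_w) * max(0, inter_h)
            # Overlap ratio: how much of the current box is covered by a kept box.
            if inter / box_area > overlap_thresh:
                suppressed = True
                break
        if not suppressed:
            kept.append(box)
    return kept


detections = [(10, 10, 110, 210), (15, 15, 105, 205), (300, 50, 380, 220)]
people = non_max_suppression(detections)
person_count = len(people)
```

The number of boxes that survive suppression gives the per-frame people count.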

Part 3 - Detection of Bats


The procedure is as follows:

1. Convert the input image to grayscale.
2. Apply a Gaussian blur to reduce noise.
3. Threshold the blurred image to reveal the bright regions.
4. Perform a series of erosions and dilations to remove small blobs of noise from the thresholded image. We later removed this step because it sharply reduced the number of bats detected.
5. Perform a connected component analysis on the thresholded image (neighbors=8, background=8), then initialize a mask to store only the "large" components.
6. Loop over the unique components. If a component is the background label, ignore it; otherwise, construct its label mask and count its pixels.
7. Find the contours in the mask, then sort them from left to right.
8. Draw the bounding boxes.
9. Visualize the result.
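
Steps 5 to 8 can be sketched on a label image as follows. This is an illustrative, dependency-free version, not the library-based code used for the assignment: the function name `component_boxes`, the `min_pixels` cutoff, and the choice of 0 as the background label are assumptions, and the label image is taken as already computed.

```python
def component_boxes(labels, min_pixels=3, background=0):
    """Bounding boxes (min_x, min_y, max_x, max_y) of large components.

    Components with fewer than min_pixels pixels are treated as noise.
    Boxes use inclusive pixel coordinates and are sorted left to right.
    """
    stats = {}  # label -> [pixel_count, min_x, min_y, max_x, max_y]
    for y, row in enumerate(labels):
        for x, lab in enumerate(row):
            if lab == background:
                continue  # skip the background label
            s = stats.setdefault(lab, [0, x, y, x, y])
            s[0] += 1
            s[1] = min(s[1], x)
            s[2] = min(s[2], y)
            s[3] = max(s[3], x)
            s[4] = max(s[4], y)
    boxes = [tuple(s[1:]) for s in stats.values() if s[0] >= min_pixels]
    return sorted(boxes)  # tuples sort by min_x first, i.e. left to right


label_image = [
    [0, 0, 0, 2, 2],
    [1, 0, 0, 2, 2],
    [0, 0, 0, 0, 0],
    [0, 3, 3, 3, 0],
]
bat_boxes = component_boxes(label_image, min_pixels=3)
```

Filtering by pixel count plays the role that erosion and dilation would otherwise play in removing noise, without shrinking the small, dim blobs that are real bats.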


Results

Results of Binary Image Analysis

Results of hands segmentation

Results of pedestrian detection

Results of bats detection


Discussion

For the Binary Analysis

Strengths and weaknesses: The connected component labeling, Moore-neighbor tracing, and skeletonization methods worked well on the binary images.
The erosion operation effectively reduced the number of components.
The circularity value indicates how closely an object resembles a circle. For example, Circularity(fist) = 0.8964601250403593 is higher than Circularity(hand) = 0.7861079583218299, which matches the rounder shape of the fist.
The boundary following algorithm relies heavily on the choice of start point, which currently has to be made by hand.
Future work: Solve the start point selection problem for the boundary following method.
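
For reference, circularity as Emin/Emax can be computed directly from the second moments of a region. The sketch below uses the standard moment formulas on a list of pixel coordinates; it is not the library-based computation used for the assignment, and the name `shape_moments` is hypothetical.

```python
import math


def shape_moments(pixels):
    """Area, orientation, and circularity (Emin/Emax) of a pixel region.

    pixels is a list of (x, y) foreground coordinates. The orientation is
    the angle of the axis of least second moment, in radians, measured in
    image coordinates (y grows downward).
    """
    n = len(pixels)
    cx = sum(x for x, _ in pixels) / n
    cy = sum(y for _, y in pixels) / n
    # Central second moments about the centroid.
    a = sum((x - cx) ** 2 for x, _ in pixels)
    b = 2 * sum((x - cx) * (y - cy) for x, y in pixels)
    c = sum((y - cy) ** 2 for _, y in pixels)
    theta = 0.5 * math.atan2(b, a - c)
    root = math.sqrt(b * b + (a - c) ** 2)
    e_min = (a + c - root) / 2
    e_max = (a + c + root) / 2
    circularity = e_min / e_max if e_max > 0 else 1.0
    return {"area": n, "orientation": theta, "circularity": circularity}


square = [(x, y) for x in range(5) for y in range(5)]
bar = [(x, y) for x in range(6) for y in range(2)]
square_stats = shape_moments(square)
bar_stats = shape_moments(bar)
```

A square gives a circularity of 1 because its second moments are equal in every direction, while the elongated bar scores much lower; this is the same trend as the fist and hand values above.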

For the Hand Segmentation

Overall, the performance is very good, and no obvious corrections or fixes are needed. We detected the positions of the hands in all of the image frames provided.

For the Bat Detection

Bat detection has historically been a hard task. The algorithm did well but leaves room for improvement, for example on dim, dark bats in the background. We also still need to extend the algorithm to identify whether a bat's wings are open or not.

For the Pedestrian Detection

The algorithm does a satisfying job. The "small" people in the back are also detected, though not in every case. The redundancy among the bounding boxes is acceptable but could be improved so that no unnecessary overlapping boxes remain.


Conclusions

Connected component labeling is very useful for labeling objects in binary images, and it works better when combined with morphological operations such as erosion and dilation. To compute the orientation line, area, perimeter, and other useful values, we first need to find the contour points of each object.
We successfully recognized and visualized the hand positions of the pianist, detected bats in the night sky, and detected pedestrians in the street with reasonably high accuracy. Visualizations as images and videos are also provided. Overall, the assignment is a success.


Credits and Bibliography

https://docs.opencv.org/master/d6/d00/tutorial_py_root.html
CS 585 Lab
https://yongyuan.name/blog/pedestrian-detection-opencv.html

Collaborated with my teammate: Li Jun.