Object Shape Analysis and Segmentation

CS 585 HW 3
Jithvan Ariyaratne & Nam Pham | Assignment Directory
Date: 2/24/2020

Problem Definition

The goal of this assignment is to design and implement algorithms that delineate objects in video images and analyze their shapes.

Problem 1: Binary Image Analysis

  1. Implement a connected component labeling algorithm and apply it to the data below. Show your results by displaying the detected components/objects with different colors. The recursive connected component labeling algorithm sometimes does not work properly if the image regions are large. See below for an alternate variant of this algorithm. If you have trouble implementing connected component labeling, you may use an OpenCV library function, but you will be assessed a small 5 point penalty.
  2. If your output contains a large number of components, apply a technique to reduce the number of components, e.g., filtering by component size, or erosion.
  3. Implement the boundary following algorithm and apply it for the relevant regions/objects.
  4. For each relevant region/object, compute the area, orientation, and circularity (Emin/Emax). Also, identify and count the boundary pixels of each region, and compute compactness, the ratio of the area to the perimeter.
  5. Implement a skeleton finding algorithm. Apply it to the relevant regions/objects.

Problem 2: Segmentation

Dataset 1: Piano In this sequence of a person playing the piano, try to develop an algorithm to identify the pianist's hands. Portions of the hands are sometimes in deep shadow, which creates a challenging imaging situation.

Dataset 2: Bats In the following sequence of bats in flight, the bats appear bright against a dark sky. We have provided both gray scale and false color images from this sequence; you may use whichever images you prefer. Can you use characteristics of the connected components you find in order to determine if the bats have their wings spread or folded? Can you identify regions that contain multiple bats?

Dataset 3: Pedestrian In the following sequence of people walking on a street, try to develop an algorithm to count the number of people in each frame. As a challenge, can you track each person individually?

Method and Implementation

Problem 1: Binary Image Analysis

Connected component detection:

We implemented the sequential labeling algorithm, which makes two passes over the image to label all the connected components. We also implemented a union-find data structure to keep track of which labels belong to the same component.
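
The two-pass idea can be sketched as follows. This is a minimal illustration rather than the submitted code: it assumes 4-connectivity and a binary image stored as a list of rows of 0/1, and the names `find`, `union`, and `label_components` are hypothetical.

```python
def find(parent, x):
    """Return the root label of x, shortening the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def union(parent, a, b):
    """Merge the sets containing labels a and b (the smaller root wins)."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)


def label_components(binary):
    """Two-pass, 4-connected labeling (an assumed connectivity).

    Returns (labels, count): labels has the shape of the input, with 0 for
    background and 1..count for the components.
    """
    rows, cols = len(binary), len(binary[0])
    labels = [[0] * cols for _ in range(rows)]
    parent = [0]  # index 0 stands for the background and is never merged
    # Pass 1: assign provisional labels and record equivalences.
    for r in range(rows):
        for c in range(cols):
            if not binary[r][c]:
                continue
            up = labels[r - 1][c] if r > 0 else 0
            left = labels[r][c - 1] if c > 0 else 0
            if up and left:
                labels[r][c] = min(up, left)
                union(parent, up, left)
            elif up or left:
                labels[r][c] = up or left
            else:
                parent.append(len(parent))  # new provisional label
                labels[r][c] = len(parent) - 1
    # Pass 2: replace each provisional label with a compact final label.
    final = {}
    for r in range(rows):
        for c in range(cols):
            if labels[r][c]:
                root = find(parent, labels[r][c])
                labels[r][c] = final.setdefault(root, len(final) + 1)
    return labels, len(final)
```

On a U-shaped region the two arms first receive different provisional labels; the union recorded at the bottom of the U makes the second pass give both arms the same final label.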

Boundary Following algorithm:

We implemented the boundary following algorithm as specified in lecture. We use the color codes from part 1 to identify the regions of interest, and then trace the boundary of each region with the algorithm.
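
A common form of boundary following is Moore-neighbor tracing, sketched below. It is an assumption that this matches the lecture version; the mask layout (list of rows of 0/1) and the name `trace_boundary` are hypothetical, and the sketch traces only the first region met in raster order.

```python
# 8 neighbours in clockwise order on screen, starting from west: (d_row, d_col)
OFFSETS = [(0, -1), (-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1)]


def trace_boundary(mask):
    """Trace the outer boundary of the first region met in raster order.

    Returns boundary pixels as (row, col) in clockwise order, or [] if the
    mask has no foreground pixel.
    """
    rows, cols = len(mask), len(mask[0])

    def is_set(r, c):
        return 0 <= r < rows and 0 <= c < cols and bool(mask[r][c])

    start = next(((r, c) for r in range(rows) for c in range(cols)
                  if mask[r][c]), None)
    if start is None:
        return []
    boundary = [start]
    cur, back = start, 0  # back: direction of a known background neighbour (west)
    for _ in range(8 * rows * cols):  # safety bound on the number of steps
        # Scan clockwise, starting just after the backtrack direction.
        for k in range(1, 8):
            d = (back + k) % 8
            nxt = (cur[0] + OFFSETS[d][0], cur[1] + OFFSETS[d][1])
            if is_set(*nxt):
                break
        else:
            return boundary  # isolated pixel: no foreground neighbour
        # Closed loop: back at the start and about to repeat the first step.
        if cur == start and len(boundary) > 1 and nxt == boundary[1]:
            return boundary[:-1]
        # The neighbour examined just before nxt is background; it becomes
        # the backtrack pixel for the next step.
        prev = (cur[0] + OFFSETS[d - 1][0], cur[1] + OFFSETS[d - 1][1])
        back = OFFSETS.index((prev[0] - nxt[0], prev[1] - nxt[1]))
        boundary.append(nxt)
        cur = nxt
    return boundary
```

For a filled 3x3 square the trace visits the 8 outer pixels and skips the center, so the length of the returned list can serve as a simple perimeter count.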

Skeleton Detection algorithm:

We implemented a topological skeleton algorithm: we repeatedly erode the image until only the skeleton remains.
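
One standard erosion-based formulation is the morphological skeleton, which collects at each erosion step the pixels that an opening would remove. The sketch below shows that technique; it is an assumption that it resembles the submitted implementation. Foreground is represented as a set of (row, col) pairs, and the cross-shaped structuring element is an illustrative choice.

```python
# 3x3 cross structuring element as (d_row, d_col) offsets
CROSS = [(0, 0), (-1, 0), (1, 0), (0, -1), (0, 1)]


def erode(pixels):
    """Keep a pixel only if all of its cross neighbours are foreground."""
    return {(r, c) for (r, c) in pixels
            if all((r + dr, c + dc) in pixels for dr, dc in CROSS)}


def dilate(pixels):
    """Grow the foreground by the cross structuring element."""
    return {(r + dr, c + dc) for (r, c) in pixels for dr, dc in CROSS}


def morphological_skeleton(pixels):
    """Union, over successive erosions, of what an opening would remove."""
    skeleton = set()
    current = set(pixels)
    while current:
        eroded = erode(current)
        opened = dilate(eroded)          # opening = erosion then dilation
        skeleton |= current - opened     # pixels the opening cannot rebuild
        current = eroded
    return skeleton
```

For a filled 3x3 square this returns the four corners and the center; for a one-pixel-wide line it returns the line itself. The result is not guaranteed to be connected, which is a known property of this formulation.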

Problem 2: Segmentation

Dataset 1: Piano

Crop the frame so that it contains only the keyboard.

  cropped = resized[top_y:top_y + rect_height, top_x:top_x + rect_width]

Apply BGR skin detection, which detects the hands but also parts of the keyboard, and use Canny edge detection to find the edges between the hands and the keyboard. Removing the edge pixels from the skin mask separates the two.

  # apply skin detection
  skin_detection = skinDetectionBGR(cropped)
  # edge detection
  edges = cv2.Canny(cropped,100,200)
  reverse_edge = cv2.bitwise_not(edges)
  # apply edges to hand
  edged_hand = cv2.bitwise_and(reverse_edge, skin_detection)

Apply YCrCb skin detection to get only the hands. Perform an opening to remove noise, dilate the result, and then use it as a mask to keep only the hand regions.

  # use YCrCb skin detection to identify hand regions
  skin2 = skinDetectionYCrCb(cropped)
  noiseless_skin = cv2.morphologyEx(skin2, cv2.MORPH_OPEN, open_kernel)
  dilated_skin = cv2.dilate(noiseless_skin, dilate_kernel, iterations=1)

  # use YCrCb as a filter
  final = cv2.bitwise_and(dilated_skin, edged_hand)

We then find the contours in this final image and draw them on the original frame.

  # find contours
  contours, hierarchy = cv2.findContours(final, cv2.RETR_LIST, cv2.CHAIN_APPROX_SIMPLE)
  cv2.drawContours(resized, contours, -1, (0, 255, 0), 1, offset=top_left)
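
The helper `skinDetectionYCrCb` is not shown above. A minimal sketch of what such a helper might do is given here in plain Python; the function name `skin_mask_ycrcb`, the image layout (rows of (B, G, R) tuples), and the Cr/Cb bounds are all assumptions. The bounds are commonly quoted skin ranges, not values taken from the submitted code.

```python
def skin_mask_ycrcb(image_bgr, cr_range=(133, 173), cb_range=(77, 127)):
    """Return a 0/255 mask of pixels whose Cr and Cb fall inside the ranges.

    image_bgr is a list of rows of (B, G, R) tuples with values in 0..255.
    The default ranges are assumed skin bounds and would need tuning.
    """
    mask = []
    for row in image_bgr:
        out = []
        for b, g, r in row:
            y = 0.299 * r + 0.587 * g + 0.114 * b   # luma
            cr = (r - y) * 0.713 + 128              # red-difference chroma
            cb = (b - y) * 0.564 + 128              # blue-difference chroma
            is_skin = (cr_range[0] <= cr <= cr_range[1]
                       and cb_range[0] <= cb <= cb_range[1])
            out.append(255 if is_skin else 0)
        mask.append(out)
    return mask
```

With OpenCV the same idea is usually written as `cv2.cvtColor(img, cv2.COLOR_BGR2YCrCb)` followed by `cv2.inRange`. Thresholding only the chroma channels is what makes this approach less sensitive to shadows than a BGR rule.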

Dataset 2: Bats

Apply an adaptive threshold to segment the larger bats.

    adaptive_threshold = cv2.adaptiveThreshold(resized, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 11, -5)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (3, 3))
    adaptive_threshold = cv2.morphologyEx(adaptive_threshold, cv2.MORPH_OPEN, kernel)

Detect motion to pick up the bats that are too small to detect with the adaptive threshold.

    non_dup = non_dup_diff(resized, resized_prev)
    ret,thresh_frame_diff = cv2.threshold(non_dup, 5, 255, cv2.THRESH_BINARY)

Final binary image: OR the motion mask with the adaptive threshold mask to get the most coherent image of the bats.

final = cv2.bitwise_or(adaptive_threshold, thresh_frame_diff)
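
The motion step and the OR combination can be illustrated in plain Python on grayscale frames stored as lists of rows. This is a sketch only: `motion_mask` and `mask_or` are hypothetical names, and the simple absolute difference used here is an assumption, since the helper `non_dup_diff` is not shown.

```python
def motion_mask(frame, prev_frame, threshold=5):
    """0/255 mask of pixels whose gray level changed by more than threshold."""
    return [[255 if abs(a - b) > threshold else 0
             for a, b in zip(row, prev_row)]
            for row, prev_row in zip(frame, prev_frame)]


def mask_or(mask_a, mask_b):
    """Pixelwise OR of two 0/255 masks of the same size."""
    return [[255 if (a or b) else 0 for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(mask_a, mask_b)]
```

The OR keeps a pixel if either cue fires, so a small bat that is missed by the intensity threshold still appears when it moves, and a large slow bat still appears when it is bright.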

Process the final image: label its connected components, then trace the boundaries of the largest ones.

    colored, code_map, top_codes = seqLabel(final)
    top_5_contours = findBoundary(code_map, top_codes)

For each of the largest bats, we decide whether the wings are folded or spread from the circularity of its region: we consider the wings folded when circularity is highest. To detect regions containing multiple bats, we loop over every pixel in the frame to find where the pixel density is greatest.
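
Circularity in the Emin/Emax sense can be computed from the second moments of a region. The sketch below assumes a region given as a list of (x, y) pixel coordinates; the name `shape_moments` is hypothetical, and the code is not taken from the submission.

```python
import math


def shape_moments(pixels):
    """Return (area, orientation in radians, circularity Emin/Emax).

    pixels is a non-empty iterable of (x, y) coordinates of one region.
    """
    pts = list(pixels)
    area = len(pts)
    xbar = sum(x for x, _ in pts) / area
    ybar = sum(y for _, y in pts) / area
    # Second moments about the centroid.
    a = sum((x - xbar) ** 2 for x, _ in pts)
    b = 2 * sum((x - xbar) * (y - ybar) for x, y in pts)
    c = sum((y - ybar) ** 2 for _, y in pts)
    root = math.hypot(b, a - c)
    e_min = (a + c) / 2 - root / 2   # moment about the axis of least inertia
    e_max = (a + c) / 2 + root / 2   # moment about the perpendicular axis
    theta = 0.5 * math.atan2(b, a - c)
    circularity = e_min / e_max if e_max > 0 else 1.0
    return area, theta, circularity
```

A filled square gives a circularity of 1 and a straight line gives 0, so a bat with folded wings would be expected to score closer to 1 than one with its wings spread.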

Dataset 3: Pedestrian

We first take the absolute difference between a reference frame with no people and the current frame. We then convert the difference to grayscale, apply a Gaussian blur, and threshold the result. Next we dilate the thresholded image and detect the contours of the dilated image. Finally, we count the tracking rectangles in the image.

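while True:
    # difference between the empty reference frame and the current frame
    diff = cv2.absdiff(og_img, img)
    gray = cv2.cvtColor(diff, cv2.COLOR_BGR2GRAY)
    # blur to suppress pixel noise before thresholding
    blur = cv2.GaussianBlur(gray, (5,5), 0)
    _, thresh = cv2.threshold(blur, 40, 255, cv2.THRESH_BINARY)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (10, 10))
    # note: with iterations=0 OpenCV may skip the dilation entirely;
    # iterations=1 is likely what was intended
    dilated = cv2.dilate(thresh, kernel, iterations=0)
    contours, _ = cv2.findContours(dilated, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    # number of people counted in this frame
    count = 0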
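
The counting step that follows is not shown in the excerpt. One plausible version, sketched here under assumptions, counts the bounding rectangles whose area exceeds a minimum so that small noise blobs are not counted as people. The name `count_people` and the `min_area` value are hypothetical.

```python
def count_people(boxes, min_area=900):
    """Count bounding boxes (x, y, w, h) with an area of at least min_area.

    min_area is an assumed threshold and would need tuning per video.
    """
    count = 0
    for (x, y, w, h) in boxes:
        if w * h >= min_area:
            count += 1
    return count
```

With OpenCV the boxes would typically come from `cv2.boundingRect(contour)` for each contour found in the dilated image. Two people walking close together still produce a single box, so this count is a lower bound in crowded frames.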


Experiments

Problem 1: Binary Image Analysis

We tried implementing flood fill, but it seems much less efficient than sequential labeling (many more loop iterations). We also tried to implement the skeleton detection algorithm as described in lecture, but computing each pixel's distance to the background takes a long time.

Problem 2: Segmentation

Dataset 1: Piano
We tried subtracting a frame of the keyboard with no hands in it, but that still picked up key shadows. We also implemented a trackbar to position the target area so that we can use skin detection efficiently.

Dataset 2: Bats
We tried different parameters for motion detection, but some bats were too far away to detect effectively.

Dataset 3: Pedestrian
We tried using only motion detection, but that wasn't as effective as subtracting a background frame with no people. We also tried edge detection to mask viable contours, because some people would be extremely close to each other.


Results

Problem 1: Binary Image Analysis

[Result images for open-bw-full, open-bw-partial, open_fist-bw, and tumor-fold. One table cell is marked "No Second Skeleton".]

Problem 2: Segmentation

[Result images for Dataset 1: Piano, Dataset 2: Bats, and Dataset 3: Pedestrian.]

Discussion


  1. For the piano problem, we used skin detection in both the BGR and YCrCb color spaces to detect the hands, then used one binary image as a mask on the other to isolate the hands. For the bats problem, we used adaptive thresholding and motion detection to detect the bats. For the pedestrian problem, we used background subtraction to detect people.
  2. We used the iterative sequential labeling algorithm to find the connected components.
  3. After segmentation we often need to perform an opening to remove noise from the image. Sometimes we also need to perform a closing to connect components that were fragmented during detection. We also filter out the smaller components and process only the 5 largest.
  4. We use area to distinguish real objects from noise, and we use circularity to detect the state of the bats' wings.
  5. A great lesson we learnt is how to combine information from multiple techniques to inform our final decision. In this assignment we often used bitwise_and or bitwise_or to combine the binary images produced by several techniques. Had we had more time, we would have tried to solve the people counting problem more thoroughly (detecting when people merge together) and to detect hands more reliably for the piano problem.
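
The size filter mentioned in item 3 can be sketched as follows, assuming a label image stored as a list of rows in which 0 is background. The name `keep_largest` is hypothetical and the code is an illustration, not the submitted implementation.

```python
from collections import Counter


def keep_largest(labels, k=5):
    """Zero out every component except the k with the most pixels."""
    sizes = Counter(v for row in labels for v in row if v)
    keep = {label for label, _ in sizes.most_common(k)}
    return [[v if v in keep else 0 for v in row] for row in labels]
```

Filtering by rank rather than by a fixed pixel count avoids choosing a size threshold, at the cost of always keeping k components even when some of them are noise.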


Conclusions

We learnt that different scenarios call for different image detection techniques, and that combining several techniques allowed us to solve the problems.

Credits and Bibliography

CS 585: Lecture 3-6 (1/28/2020 - 2/6/2020)
CS 585: Lab 2-5 (1/31/2020 - 2/21/2020)