CS 585 Homework 3
Yize Xie U14485891
- Implement a connected component labeling algorithm.
- Apply a technique to reduce the number of components.
- Implement the boundary following algorithm.
- For each relevant region/object, compute the area, orientation, circularity (Emin/Emax), and compactness.
- Implement a skeleton finding algorithm.
- Implement two segmentation algorithms.
- Apply the tools you implemented for part 1 to analyze your data.
Method and Implementation
Connected Component Labeling
We use the Depth-First Search (DFS) algorithm for connected component labeling, implemented with an explicit stack.
1. Ensure the image is binarized.
2. If the stack is empty, go to step 3; otherwise, go to step 4.
3. Find an unprocessed pixel, push it onto the stack, and mark it as a new component. If all pixels have been processed, terminate.
4. Pop the top element of the stack, push its unprocessed neighbors onto the stack, and mark them with the same label as the popped element.
5. Go back to step 2.
If a component's area is below a threshold we set, the component is discarded.
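The labeling procedure above, together with the area filter, can be sketched in Python as follows; the `min_area` value is an illustrative placeholder, not the threshold we actually used:

```python
import numpy as np

def label_components(binary, min_area=50):
    """Stack-based DFS connected component labeling (8-connectivity).

    binary: 2D array of 0/1. Components smaller than min_area pixels
    are discarded (their labels reset to 0). Returns a label image.
    """
    rows, cols = binary.shape
    labels = np.zeros((rows, cols), dtype=np.int32)
    next_label = 0
    for r in range(rows):
        for c in range(cols):
            if binary[r, c] and labels[r, c] == 0:
                next_label += 1                 # start a new component
                stack = [(r, c)]
                labels[r, c] = next_label
                pixels = []
                while stack:
                    y, x = stack.pop()
                    pixels.append((y, x))
                    for dy in (-1, 0, 1):       # visit all 8 neighbors
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and binary[ny, nx] and labels[ny, nx] == 0):
                                labels[ny, nx] = next_label
                                stack.append((ny, nx))
                if len(pixels) < min_area:      # discard tiny components
                    for y, x in pixels:
                        labels[y, x] = 0
    return labels
```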
We implement Moore-Neighbor Tracing with Jacob's stopping criterion, which is well suited to boundary detection on 8-connected components. In the algorithm, we keep track of an inner (boundary) pixel and an outer (backtrack) pixel; the pointers move clockwise around the Moore neighborhood until they return to the starting pixel in the same manner they first entered it.
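A minimal sketch of Moore-Neighbor tracing with Jacob's stopping criterion, assuming a binary NumPy image and tracing only the first component found in raster order:

```python
import numpy as np

# Moore neighborhood in clockwise order, starting from west: (dr, dc)
NEIGH = [(0, -1), (-1, -1), (-1, 0), (-1, 1),
         (0, 1), (1, 1), (1, 0), (1, -1)]

def trace_boundary(img):
    """Return the clockwise boundary pixel list of the first component."""
    rows, cols = img.shape
    start = None
    for r in range(rows):                      # raster scan for start pixel
        for c in range(cols):
            if img[r, c]:
                start = (r, c)
                break
        if start is not None:
            break
    if start is None:
        return []
    init_prev = (start[0], start[1] - 1)       # we "entered" start from the west
    prev, cur = init_prev, start
    boundary = [start]
    while True:
        d = (prev[0] - cur[0], prev[1] - cur[1])
        i = NEIGH.index(d)
        nxt = None
        for k in range(1, 9):                  # scan neighbors clockwise from prev
            j = (i + k) % 8
            cand = (cur[0] + NEIGH[j][0], cur[1] + NEIGH[j][1])
            if 0 <= cand[0] < rows and 0 <= cand[1] < cols and img[cand]:
                nxt = cand
                prev = (cur[0] + NEIGH[(j - 1) % 8][0],
                        cur[1] + NEIGH[(j - 1) % 8][1])
                break
        if nxt is None:                        # isolated single pixel
            return boundary
        # Jacob's criterion: stop when start is re-entered from the
        # same direction as the first entry
        if nxt == start and prev == init_prev:
            return boundary
        boundary.append(nxt)
        cur = nxt
```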
Our skeleton finding algorithm is a thinning method: we erode the image gradually with a cross-shaped kernel, and what remains across the erosion levels is exactly the skeleton we need.
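One common way to realize this erosion-based idea is Lantuéjoul's morphological skeleton; the sketch below uses the 3x3 cross kernel described above, though the actual implementation may differ in detail:

```python
import numpy as np

def erode(img):
    """Binary erosion with a 3x3 cross structuring element."""
    p = np.pad(img, 1)
    return (p[1:-1, 1:-1] & p[:-2, 1:-1] & p[2:, 1:-1]
            & p[1:-1, :-2] & p[1:-1, 2:])

def dilate(img):
    """Binary dilation with the same cross kernel."""
    p = np.pad(img, 1)
    return (p[1:-1, 1:-1] | p[:-2, 1:-1] | p[2:, 1:-1]
            | p[1:-1, :-2] | p[1:-1, 2:])

def skeleton(img):
    """Lantuejoul's skeleton: at each erosion level, keep the pixels
    that an opening at that level would remove."""
    img = img.astype(bool)
    skel = np.zeros_like(img)
    while img.any():
        eroded = erode(img)
        opened = dilate(eroded)        # opening of img with the cross
        skel |= img & ~opened
        img = eroded
    return skel
```

A one-pixel-wide line is already its own skeleton, which makes a handy sanity check.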
Experiments and Results
Below are samples of Fist and Palm.
As shown in the images above, our algorithm successfully detects the component, boundary, and skeleton. Orientation is measured in degrees counter-clockwise from the top-to-bottom axis. Reasonably, the fist has a higher circularity than the palm.
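The area, orientation, and circularity (Emin/Emax) of a region can be computed from its second moments, as in this sketch; note the angle here is measured from the image x-axis, so a constant offset converts it to the top-to-bottom convention used above:

```python
import math
import numpy as np

def region_properties(mask):
    """Area, orientation (degrees), and circularity Emin/Emax
    from the second central moments of a binary region."""
    ys, xs = np.nonzero(mask)
    area = len(xs)
    ybar, xbar = ys.mean(), xs.mean()
    dy, dx = ys - ybar, xs - xbar
    a = (dx * dx).sum()                 # second moments
    b = 2 * (dx * dy).sum()
    c = (dy * dy).sum()
    theta = 0.5 * math.atan2(b, a - c)  # axis of least inertia
    root = math.sqrt(b * b + (a - c) ** 2)
    emin = (a + c - root) / 2           # least inertia
    emax = (a + c + root) / 2           # greatest inertia
    circularity = emin / emax if emax else 1.0
    return area, math.degrees(theta), circularity
```

A thin line gives circularity near 0, while a symmetric blob gives circularity near 1.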
White balance: We want to perform skin detection later, but the orange light in the hall makes everything look like skin. Therefore, we first white-balance the images so that colors look natural.
Compute background: Identifying what is moving cleans up the image to a great extent. Therefore, we compute the background by combining parts of two frames. This should be acceptable in a real-world scenario because we can take a photo before the musician arrives.
Get difference: For each frame, we calculate the difference from the background and segment it with an absolute-threshold algorithm.
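The difference-and-threshold step can be sketched as follows; the threshold `t=30` is an illustrative value, not our tuned parameter:

```python
import numpy as np

def segment_moving(frame, background, t=30):
    """Absolute-threshold segmentation of the frame/background difference.

    Cast to a signed type before subtracting so uint8 values don't wrap.
    """
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return np.where(diff > t, 255, 0).astype(np.uint8)
```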
Recover color: The thresholded image is grayscale, but skin detection requires color. We use the binarized image as a mask on the original frame to recover color.
Skin detection: Skin detection is based on HSV values, and to deal with shadows we run an extra detection pass with the brightness augmented. Morphological OPEN and CLOSE are applied to reduce noise. The circularity of hands typically ranges from 0.2 to 0.7.
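A minimal sketch of an HSV range check for skin, assuming OpenCV-style ranges (H in 0-179, S and V in 0-255); the bounds are illustrative placeholders rather than our tuned values, and the morphology and brightness-augmented pass are omitted:

```python
import numpy as np

def skin_mask(hsv, h_max=25, s_min=40, v_min=60):
    """Mark pixels whose hue/saturation/value fall in a skin-like range.

    hsv: H x W x 3 array. The bounds are placeholder values.
    """
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    return ((h <= h_max) & (s >= s_min) & (v >= v_min)).astype(np.uint8) * 255
```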
Identify two hands: The last step is to classify which hand is on top and which is on the bottom. The trick is that the two hands are expected to occupy similar amounts of space.
- Compute background: On the grayscale image there is a lot of noise in the bottom part, but the pattern is that bats are always brighter than the background. So we simply take the per-pixel minimum over the frames as the background of this dataset.
- Get difference: We use the absolute difference from the background to extract bats from the original images.
- Absolute threshold: An absolute threshold is enough here to separate bats from small noise and gives a good result.
- Adaptive threshold: An adaptive threshold is used to detect the heads of bats (separating them from the wings). This is useful because later we need to decide whether a single component contains multiple bats.
- CCL: After reducing noise with OPEN and DILATE, we run connected component labeling; each component represents a bat.
- Process exceptions: Exceptions occur when multiple bats overlap. Our assumption is that, in the heat map, the head of each bat is brighter than the rest of its body. So if we detect multiple heads in the same component, we draw thicker rectangles to indicate multiple bats.
- Calculate circularity: The higher the circularity, the whiter the rectangle is drawn and the more likely the bat is holding its wings in. Conversely, the lower the circularity, the more likely the bat has its wings spread.
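The per-pixel-minimum background from the steps above can be sketched as:

```python
import numpy as np

def min_background(frames):
    """Per-pixel minimum over a list of frames.

    Since bats are always brighter than the background, the minimum
    across frames removes them and leaves only the background.
    """
    return np.min(np.stack(frames), axis=0)
```

Subtracting this background from each frame then isolates the bats for thresholding.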
- Compute background: In this scenario we expect the background to be the average of all frames. However, because two people chat in the middle of the scene for a long time, the average is inaccurate there, so we use part of the first frame to fill in the middle region. This should be acceptable in real time because with more data the averaged frame becomes more accurate.
- Get difference: We use the absolute difference from the background to extract pedestrians from the original images. However, there is still a lot of noise.
- Segmentation: We use an adaptive threshold here to extract humans. Although we already selected the best parameters, morphological OPEN and CLOSE are still applied to reduce noise.
- CCL: This image, although still noisy, is the best we can get before connected component labeling. We run CCL and then filter out components that are too small to be a human; regions with width > 15 and height > 37 are typically kept because they look like humans.
- Object tracking: We use a simple but workable object-tracking algorithm (without any OpenCV tracking functions): a component that is close to a component detected in the previous frame is assumed to come from the same object, and we mark the same object with the same color.
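The closeness-based matching idea can be sketched as a greedy nearest-centroid match between consecutive frames; the `max_dist` gate is an illustrative value:

```python
import math

def match_components(prev_centroids, curr_centroids, max_dist=30.0):
    """Greedily match current components to previous-frame components.

    Each centroid is a (y, x) pair. Returns {curr_index: prev_index}
    for pairs closer than max_dist; unmatched components get new IDs.
    """
    matches = {}
    used = set()
    for i, (cy, cx) in enumerate(curr_centroids):
        best, best_d = None, max_dist
        for j, (py, px) in enumerate(prev_centroids):
            if j in used:                      # each previous component
                continue                       # matches at most once
            d = math.hypot(cy - py, cx - px)
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            matches[i] = best
            used.add(best)
    return matches
```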
We get very good results on every dataset. With more time, we would try to detect humans behind obstacles, or to separate humans from each other when they are close together.
We are very happy to find that our bat detection algorithm can detect some bats that are really hard for a human to observe.
We learned a lot from this homework.
Credits and Bibliography
My teammates are Wenxing Liu and Weifan Chen.