CS585 Fall of 2018, Assignment 3

Assignment 3

YIDA XIN

Part 1. Paper-and-pencil Assignment: Morphology, Hausdorff Distance
Please see the in-class paper submission. Here is the PDF version.

Part 2. Programming Assignment
[Acknowledgment of collaboration with Dina Bashkirova and Vitali Petsiuk]
Note that all capitalized names in the description below are my own (Yida Xin's); my teammates may well have called the same tasks by different names. I will nevertheless try to point out the naming differences that I am aware of:

At the highest level, we would like to accomplish the task of Hand Tracking. To do so, we broke it down into two subtasks, namely Hand Detection Against Noisy Background and Hand-Gesture Recognition (a.k.a. Hand Classification). I later came to realize that these two subtasks can be carried out simultaneously via Template Matching: as long as each “hand” to be detected carries a gesture, and as long as we specify a different template for each such gesture, a single program can detect the hand and recognize its gesture at the same time. This allows us to focus on only the Hand Detection subtask.

To detect the hand against its noisy background in a video sequence, we thought of using Frame-to-frame Differencing, and I made an attempt at implementing it. In my brief implementation (so far), I keep the pixel-wise absolute difference between the current frame and the previous frame. This yields a gray-scale map of the same shape as the two frames being compared. In theory, if there is no noise, this map captures the motion of the moving objects when those objects move rather fast. When the objects move only slightly, it instead captures the contours of the objects that we would like to recognize, because slight movements do not change the “insides” of the objects very much but do reveal their “boundaries.”
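The differencing step described above can be sketched in a few lines. This is a minimal illustration that assumes the frames are already gray-scale NumPy arrays of type uint8; the function name frame_difference and the synthetic 8x8 frames are made up for this example and are not taken from my actual code.

```python
import numpy as np

def frame_difference(prev_gray, curr_gray):
    """Pixel-wise absolute difference of two gray-scale frames of equal shape."""
    if prev_gray.shape != curr_gray.shape:
        raise ValueError("frames must have the same shape")
    # Widen to a signed type first so that uint8 subtraction cannot wrap around.
    diff = np.abs(curr_gray.astype(np.int16) - prev_gray.astype(np.int16))
    return diff.astype(np.uint8)

# Synthetic frames (assumption): a bright 4x4 square moves one pixel to the right.
prev_frame = np.zeros((8, 8), dtype=np.uint8)
curr_frame = np.zeros((8, 8), dtype=np.uint8)
prev_frame[2:6, 2:6] = 200
curr_frame[2:6, 3:7] = 200

diff = frame_difference(prev_frame, curr_frame)
print(diff)
```

In this toy case only the left and right edges of the square are nonzero in the difference, while its interior cancels out, which is the “boundaries but not insides” effect for slight movements.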

Consequently, this real-time Hand Detection task is further divided into three subtasks. Namely, at any given time: (1) capture two consecutive video frames, in real time, and convert them both to their gray-scale counterparts; (2) compute the difference-frame of these two frames, which is in gray scale, and then convert this gray-scale difference-frame into its binary counterpart; (3) match this binary difference-frame against the library of existing templates, where each such template is a binary image as well. Note that whenever we try to hold our hand steady next to our head, facing the camera, the binary difference-frames are, in theory, supposed to look like the contours of all the moving objects, again because only the “boundaries” of those moving objects are supposed to be captured. Also note that the templates themselves are, in theory, binary contours as well.
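Steps (2) and (3) can be sketched as follows. This is only an outline under stated assumptions: the threshold value of 30, the function names, the two 4x4 templates, and the pixel-agreement score are all placeholders invented for illustration. In particular, the score is not the measure our system reports; it merely stands in for whatever matching measure is plugged in.

```python
import numpy as np

def binarize(gray, threshold=30):
    """Step (2): 255 where the gray value exceeds the threshold, 0 elsewhere."""
    return np.where(gray > threshold, 255, 0).astype(np.uint8)

def pixel_agreement(a, b):
    """Placeholder score: fraction of pixels on which two binary images agree."""
    return float(np.mean(a == b))

def best_template(binary_frame, templates, score=pixel_agreement):
    """Step (3): return (label, score) of the best-scoring template."""
    best_label, best_score = None, float("-inf")
    for label, template in templates.items():
        s = score(binary_frame, template)
        if s > best_score:
            best_label, best_score = label, s
    return best_label, best_score

# Hypothetical 4x4 binary templates, same size as the difference-frame.
ring = np.array([[255, 255, 255, 255],
                 [255,   0,   0, 255],
                 [255,   0,   0, 255],
                 [255, 255, 255, 255]], dtype=np.uint8)
bar = np.zeros((4, 4), dtype=np.uint8)
bar[:, 1] = 255
templates = {"ring": ring, "bar": bar}

# Synthetic gray-scale difference-frame: a ring plus one noisy interior pixel.
gray_diff = np.array([[90, 80, 85, 95],
                      [70,  5, 10, 88],
                      [92,  0, 40, 75],
                      [81, 79, 83, 77]], dtype=np.uint8)

label, s = best_template(binarize(gray_diff), templates)
print(label, s)
```

Because each template stands for one gesture, the label of the best match doubles as the recognized gesture, which is why detection and recognition can be handled together.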

For my part, my code contains a brief implementation of (1) and (2). This implementation does not yet work well, however, because the camera captures too much noise in real time and the frames are therefore noisy. I also observe that many contours are picked up that are not supposed to be, e.g. the contours of the ceiling tiles, the contours of my eyebrows and my nose, etc. These should also be regarded as part of the “noise.” Consequently, some de-noising algorithm needs to be implemented, either as a separate subroutine or as a direct consequence of applying one or several forms of Morphology. For (3), algorithms have been implemented that detect skin-colored objects, detect the hand in each template image via skin-color detection, and extract the contour of the hand in each template. Once again, all these contours are binary images, where the contour itself is made of white pixels and all else is made of black pixels.
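As one possible form of the Morphology-based de-noising mentioned above, a morphological opening (erosion followed by dilation) removes isolated specks while leaving larger blobs intact, and a contour can then be taken as the mask minus its erosion. The sketch below is not our implementation; the 3x3 square structuring element, the function names, and the synthetic mask are assumptions made purely for illustration.

```python
import numpy as np

def erode(mask):
    """3x3 erosion: a pixel survives only if its whole 3x3 neighborhood is set."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.ones_like(mask, dtype=bool)
    for dy in range(3):
        for dx in range(3):
            out &= padded[dy:dy + h, dx:dx + w]
    return out

def dilate(mask):
    """3x3 dilation: a pixel is set if any pixel in its 3x3 neighborhood is set."""
    h, w = mask.shape
    padded = np.pad(mask, 1, mode="constant", constant_values=False)
    out = np.zeros_like(mask, dtype=bool)
    for dy in range(3):
        for dx in range(3):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def opening(mask):
    """Erosion followed by dilation; removes blobs smaller than the 3x3 element."""
    return dilate(erode(mask))

def contour(mask):
    """Inner boundary: pixels of the mask that erosion would remove."""
    return mask & ~erode(mask)

# Synthetic binary mask (assumption): a 5x5 "hand" blob plus one noise pixel.
mask = np.zeros((9, 9), dtype=bool)
mask[2:7, 2:7] = True
mask[0, 8] = True  # isolated speck of noise

cleaned = opening(mask)
edge = contour(cleaned)
print(edge.astype(np.uint8) * 255)  # white contour on a black background
```

Here the noise pixel disappears after the opening and the blob keeps its original size. Fine structures such as eyebrow contours would need a larger structuring element or a different filter, so this is only a starting point.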

Some examples of the visual results that we have been able to obtain so far are displayed below:

Examples of original hand-template images
Examples of binary hand-template images
Examples of binary hand-template contours

The latest version of our hand-tracking system yields the performance below:

The numbers displayed are Normalized Correlation Coefficients. Another version, not shown here, implements Normalized Cross-correlations.
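To make the difference between the two measures concrete, here is a small sketch using their standard definitions for two equally sized patches: the cross-correlation is normalized by the patch energies, and the correlation coefficient does the same after subtracting each patch's mean. Whether our code computes them in exactly this way is an assumption; the function names and the 2x2 patches are made up for this example.

```python
import numpy as np

def normalized_cross_correlation(a, b):
    """sum(a*b) / sqrt(sum(a^2) * sum(b^2)); returns 0.0 if either patch is all zero."""
    a = np.asarray(a, dtype=np.float64).ravel()
    b = np.asarray(b, dtype=np.float64).ravel()
    denom = np.sqrt(np.sum(a * a) * np.sum(b * b))
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0

def normalized_correlation_coefficient(a, b):
    """Same as above, but each patch has its mean subtracted first."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return normalized_cross_correlation(a - a.mean(), b - b.mean())

# Hypothetical 2x2 binary patches.
template = np.array([[0, 255], [255, 0]], dtype=np.uint8)
inverted = 255 - template                      # white and black swapped
all_white = np.full((2, 2), 255, dtype=np.uint8)

for name, patch in [("same", template), ("inverted", inverted), ("all white", all_white)]:
    print(name,
          normalized_cross_correlation(template, patch),
          normalized_correlation_coefficient(template, patch))
```

The all-white patch is the instructive case: the cross-correlation still gives it a fairly high score, whereas the correlation coefficient gives it zero because a flat patch has no variation left after mean subtraction. The coefficient also ranges over [-1, 1], so an inverted patch scores -1 rather than 0.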