# Assignment One

For this assignment, I designed a fast and efficient algorithm for converting an RGB color image to a greyscale image. The algorithm applies the following formula to each pixel to perform the conversion:

```c
dst_pixel = (src_pixel[0] >> 2) + (src_pixel[1] >> 1) + (src_pixel[2] >> 3);
```

The above algorithm is much faster than a typical implementation that assigns floating-point weights to the channel values and multiplies them, since we are only shifting bits.
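As a rough illustration (the original was written in C-style code), the per-pixel formula can be sketched in Python. The shift weights 1/4, 1/2, and 1/8 loosely approximate the standard luminance weights 0.299, 0.587, and 0.114:

```python
def rgb_to_grey(r, g, b):
    """Approximate luminance with bit shifts: R/4 + G/2 + B/8."""
    return (r >> 2) + (g >> 1) + (b >> 3)

# Apply the formula to every pixel of a tiny 2x2 test image.
image = [[(200, 100, 40), (0, 0, 0)],
         [(255, 255, 255), (10, 20, 30)]]
grey = [[rgb_to_grey(*px) for px in row] for row in image]
```

Note that the shift weights sum to 0.875 rather than 1, so pure white (255, 255, 255) maps to 221 instead of 255: the output is slightly darkened.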

The output looked as follows:

| Input Image | Output Image |
|-------------|--------------|
|             |              |

# Assignment Two

## Problem Definition

Given a game where two people use their hands as "guns", how can we detect when a gun is fired and which player fired it?

## Method and Implementation

To determine which gun belonged to which player, I created two template images: one a leftward-facing hand and the other a rightward-facing hand. I then performed two template matches per frame to determine where each player's gun was. This could later be used to infer where the gun's bullets would be headed, but for now it was used only to filter out unneeded points. To detect a gun firing, I performed a second template match using the first template image with a motion blur applied. This second template was matched against the summed energy of successive frames. Although in principle this should work, it ended up being rather finicky.

The crucial parts of the game are the following:

1. Extract the green channel from the RGB frame.
2. Template match the green-channel frame with the hand template image.
3. Filter out pixels that are not within the bounding box.
4. Subtract the previous frame from the current frame.
5. Sum the energy of successive frames.
6. Template match the motion-blurred hand against the motion energy of successive frames within the bounding box of the hand.
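The steps above can be sketched on plain nested lists as a toy stand-in for real frames. The actual implementation presumably used a computer vision library; the sum-of-squared-differences (SSD) score used here is just one possible matching metric:

```python
def green_channel(frame):
    """Extract the green channel from a frame of (r, g, b) pixels."""
    return [[px[1] for px in row] for row in frame]

def frame_diff(prev, curr):
    """Absolute per-pixel difference between successive frames."""
    return [[abs(c - p) for p, c in zip(pr, cr)]
            for pr, cr in zip(prev, curr)]

def accumulate(energy, diff):
    """Sum motion energy over successive frame differences."""
    return [[e + d for e, d in zip(er, dr)]
            for er, dr in zip(energy, diff)]

def match_template(image, template):
    """Return (row, col, score) of the best SSD match of template in image."""
    ih, iw = len(image), len(image[0])
    th, tw = len(template), len(template[0])
    best = None
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            ssd = sum((image[r + i][c + j] - template[i][j]) ** 2
                      for i in range(th) for j in range(tw))
            if best is None or ssd < best[2]:
                best = (r, c, ssd)
    return best
```

In the real pipeline, `match_template` would run once per player on the green channel, and once more with the motion-blurred template against the accumulated motion energy.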

## Experiments

The most important parameters of this game are the template images and the maximum difference coefficient allowed between a template image and a frame. The template images were therefore carefully constructed to mimic experimental results, and the coefficients were set to minimize the number of incorrect guesses. I would rather the game fail to detect a gun than detect something that is not a gun, or worse, detect the red player's gun as the blue player's.

To evaluate my game, I simply counted the number of times it could not find a player and the number of times it identified a player (both when it should and when it should not). I measured whether the game detected a shot being fired in a similar fashion.

## Results

### Results

| Trial    | Success | Fail |
|----------|---------|------|
| Blue Gun |         |      |
| Red Gun  |         |      |
| Shooting |         |      |

### ROC

| Trial                  | True | False |
|------------------------|------|-------|
| Blue Gun present       | 9    | 1     |
| Blue Gun not present   | 0    | 10    |
| Red Gun present        | 8    | 2     |
| Red Gun not present    | 0    | 10    |
| Shooting occurred      | 6    | 4     |
| Shooting did not occur | 3    | 7     |

Keep in mind that all trials were run under optimal conditions (including the distances).

## Discussion

• What are the strengths and weaknesses of your method?
• Hand tracking works very well at certain distances (as you might expect), as does determining whether the user has fired. However, too far away or too close and the game makes no detection at all.
• Potential future work. How could your method be improved? What would you try (if you had more time) to overcome the failures/limitations of your work?

I would most likely create a template pyramid to account for differences in size. I would also track the hand over the entire time the gun is "shot" instead of just at the start and end.
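The proposed template pyramid could be sketched as repeated nearest-neighbour downsampling of a single template. Integer strides only here; a real implementation would use proper image resizing:

```python
def downsample(img, factor):
    """Nearest-neighbour downsample by an integer factor (keep every
    factor-th row and column)."""
    return [row[::factor] for row in img[::factor]]

def pyramid(template, levels=3):
    """Template at scales 1, 1/2, 1/4, ... so matching can tolerate
    hands at different distances from the camera."""
    return [downsample(template, 2 ** i) for i in range(levels)]
```

Matching each pyramid level against the frame and keeping the best score would let the detector tolerate players standing closer or further away.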

## Conclusions

In conclusion, I created a fun, if simple, game. I have ideas for improving the interface, but for now a gun not being tracked means that you "missed" the other player, which makes sense: if you are too far away for the template matching to work, you would not be pointing your gun at the other person. When the system does not correctly identify a gun firing, that is attributed to a misfire or a jam. I used a combination of many techniques, all involving template matching, but extended template matching to allow the tracking of unique objects, giving context to specific actions so that they can be attributed to a specific player. I think the game was a success, although it was quite a lot of work for one person, so in the future I will probably (hopefully) work with a group.

# Assignment Three

## Problem Definition

Given three data sets {bats, cells, aquarium}, find a way to segment the objects of interest in a fast and efficient manner.

## Method and Implementation

For all three data sets, we used adaptive thresholding. Percentile thresholding also worked well for the bat data set, but we found that adaptive thresholding combined with a dilation worked far better and more consistently. For the bat data set, we converted the source image to greyscale, applied an adaptive threshold to the greyscale image, and then performed a dilation. Finally, we calculated the bounds and area of each component and used the area to decide whether we think it is a bat: the area must be greater than 50 but less than 500 pixels. For the cells we followed the same process, except the area must be greater than 250. Finally, for the fish we calculated the circularity and required it to be greater than 0.5.
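A minimal sketch of the area and circularity filters described above; the blob measurements (area, perimeter) are assumed to come from an earlier connected-components pass, which is not shown:

```python
import math

def is_bat(area):
    """Bats: area strictly between 50 and 500 pixels."""
    return 50 < area < 500

def is_cell(area):
    """Cells: area greater than 250 pixels."""
    return area > 250

def circularity(area, perimeter):
    """4*pi*A / P^2: 1.0 for a perfect circle, smaller for elongated shapes."""
    return 4 * math.pi * area / (perimeter ** 2)

def is_fish(area, perimeter):
    """Fish: circularity greater than 0.5."""
    return circularity(area, perimeter) > 0.5
```

For a perfect circle of radius r (area pi*r^2, perimeter 2*pi*r) the circularity is exactly 1, so the 0.5 cutoff admits roundish blobs while rejecting long thin ones.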

## Experiments

The experiments simply asked, for each data set, whether the algorithm found close to the right number of objects of interest, and how many of the detected objects were actually the object of interest. Thus we have three confusion matrices (one for each data set).

## Results

### Results

| Output   | Example Output 1 | Example Output 2 |
|----------|------------------|------------------|
| Bats     |                  |                  |
| Cells    |                  |                  |
| Aquarium |                  |                  |

|          | Bats | Not Bats |
|----------|------|----------|
| Bats     | 30   | 9        |
| Not Bats | 5    | ?        |

|           | Cells | Not Cells |
|-----------|-------|-----------|
| Cells     | 6     | 0         |
| Not Cells | 2     | ?         |

|          | Fish | Not Fish |
|----------|------|----------|
| Fish     | 30   | 4        |
| Not Fish | 15   | ?        |

## Discussion

• What are the strengths and weaknesses of your method?
• Cell detection works very well, as you can see. Bat detection works well except in the corners, which are very noisy. Fish detection seems to be the worst: we detect most fish but also detect many objects that are not fish.
• Potential future work. How could your method be improved? What would you try (if you had more time) to overcome the failures/limitations of your work?

I would most likely design better filters that use information about the object of interest to filter out objects that are not that object, for example perimeter-to-area ratios and/or compactness. We also relied on the cheap trick of downsampling the cell data to get it to work well; instead, I would like a more mathematically sound approach to modelling the data.

# Assignment Four

## Problem Definition

Given footage of eels and crabs swimming in a water tank:

• Segment the water tanks out
• Segment/Track the eels and the crabs
• Output when eels are in the tank area
• Output movement information of the eels

## Method and Implementation

To segment the water tanks out from the rest of the footage, I designed an algorithm that used simple thresholding to isolate the brightest areas of the footage. I then calculated the area and bounding boxes of the connected components and chose the two largest areas that were square-like. To segment and track the eels I used optical flow. Since eels are generally moving, I only looked at areas where the optical flow vectors had a magnitude of at least 0.5. The optical flow could then be used to keep track of which eel is which in cases where there is more than one. Since the crabs don't move much, we segmented them using thresholding and tracked them using centroids. Although not presently finished, it would be simple to determine when an eel enters the tank area because we have a very robust and accurate tracker. The movement of an eel is related to its optical flow; we could also calculate the moments of the eels and use those to determine the frequency of the eels' oscillations.
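The magnitude gate on the optical-flow field can be sketched as follows. The (dx, dy) vectors themselves would come from an optical-flow routine (not shown); 0.5 is the threshold quoted above:

```python
import math

def moving_mask(flow, min_mag=0.5):
    """Mark pixels whose flow vector magnitude is at least min_mag,
    i.e. pixels that are likely part of a moving eel."""
    return [[math.hypot(dx, dy) >= min_mag for dx, dy in row]
            for row in flow]

# Illustrative 2x2 flow field: only the bottom row is moving fast enough.
flow = [[(0.0, 0.0), (0.3, 0.3)],
        [(0.6, 0.0), (1.0, 1.0)]]
mask = moving_mask(flow)
```

The boolean mask can then feed the connected-components step so that only moving regions are considered eel candidates.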

## Experiments

The experiments asked whether we could accurately track an eel.

## Results

### Results

| Output | Example Output 1 | Example Output 2 |
|--------|------------------|------------------|
| Eels   |                  |                  |
| Crabs  |                  |                  |

|          | Eels | Not Eels |
|----------|------|----------|
| Eels     | 3    | 0        |
| Not Eels | 5    | ?        |

|           | Crabs | Not Crabs |
|-----------|-------|-----------|
| Crabs     | 3     | 2         |
| Not Crabs | 0     | ?         |

## Discussion

• What are the strengths and weaknesses of your method?
• We spent a lot of time on the segmentation of the eels/crabs, so that is obviously where our project is strongest. We believed that the most important step was a good segmentation: although the rest of the project was simple to implement, the results from that implementation would be meaningless if we could not deliver a good segmentation/tracking approach. Due to time constraints, we were not really able to focus on the latter part of the assignment.
• Potential future work. How could your method be improved? What would you try (if you had more time) to overcome the failures/limitations of your work?

I would most likely refine the segmentation further, as that seems to be the most important part of any computer vision process. Tracking a small number of objects is not difficult if we can get a very good segmentation. That said, I think we needed to focus more of our time on the remaining aspects of the assignment.

## Conclusions

Any project that uses computer vision relies on a good segmentation. This is intuitive: we need a good understanding of what an object is in order to extract meaningful data about it. Any information about a blob is only as good as the algorithm that generated that blob. Segmentation is the hardest part to implement accurately, but with a good segmentation everything else just follows.

# Art Vision

## Background and Objective

Large landscape paintings are almost always drawn following the principles of perspective. This project will try to extract the lines that are "parallel" on the buildings and construct the perspective lines and their converging point on the painting. We can then evaluate the perspective accuracy of the painting and even recreate a perspective-correct version of it. After we get the converging/vanishing point, we will run a segmentation method to extract the objects, people, and buildings in the foreground from the background. We will find a way to tell the distance of the objects in the painting from the viewpoint. We can then use the z-values from the previous step to create a 2.5D space containing several layers at different distances from the viewpoint of the painting. The space will be able to be rotated, transformed, and modified.

1. Vanishing Point
• Input: Raw Image
• Output: Converging Lines, Horizon, Vanishing Point
2. Segmentation
• Input: Raw Image
• Output: Connected Components
3. 2.5D Conversion
• Input: Raw Image, Vanishing Point, Segmentation
• Output: Layers of Image

## Tools

• OpenCV - Image Processing
• OpenGL - Visualization
• Qt - Window and Program Management

## Data Source

After browsing many paintings from WikiArt and some other online galleries, we decided to use Renaissance perspective paintings with people in the foreground and buildings at the sides and in the background. We also want the paintings to be realistic and sharp, and we want the buildings and other constructions to have parallel lines that help us determine the converging point. Additionally, the objects and people in the foreground must differ significantly from the background, which enables decent segmentation. Here are a few paintings we might be using:

## Converging Lines and Focal Point

We expect to be able to generate a set of lines that help define the projection of the image. We expect that edge detection might be helpful to accomplish this. Once we construct the edges we can determine the focal point of the image and calculate the different layers within the image.
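One way to sketch the vanishing-point step: represent each detected line in homogeneous form and intersect pairs of scene-parallel lines. The point-pair inputs below are illustrative; a real pipeline would recover the lines from the edge map (for example via a Hough transform):

```python
def line_through(p, q):
    """Homogeneous line coefficients (a, b, c) with a*x + b*y + c = 0
    for the line through points p and q."""
    (x1, y1), (x2, y2) = p, q
    return (y1 - y2, x2 - x1, x1 * y2 - x2 * y1)

def intersect(l1, l2):
    """Intersection of two homogeneous lines; None if they are parallel."""
    a1, b1, c1 = l1
    a2, b2, c2 = l2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        return None  # parallel in the image plane: no finite vanishing point
    x = (b1 * c2 - b2 * c1) / det
    y = (c1 * a2 - c2 * a1) / det
    return (x, y)
```

With more than two candidate lines, intersecting all pairs and taking a robust average (or a least-squares fit) would give a more stable vanishing-point estimate.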

## 2.5D Conversion

The result of the third step will be something like the following figure. The people in the foreground are extracted as separate layers and spaced in the 3D space as flat objects, with the background placed at the far end of the field of view.

## State of the Art

We have carefully studied this topic. There are not many research projects on it; the closest work we have found is a paper called "A generative model for 2.5D vision: Estimating appearance, transformation, illumination, transparency and occlusion" by Jojic and Frey. We found a lot of material about linear perspective and aerial perspective in paintings that may help our project. Wikipedia has a useful article on 3D reconstruction. It discusses several different ways of reconstructing a 3D scene from a 2D image which may be of use, including reconstructing a scene using distortions to calculate the perspective of the image. This is very similar to the approach that we discussed together. Since projecting a 3D point onto a 2D plane is an irreversible process, we need some notion of size, or at least of size ratios. We will not be able to give accurate readings in terms of distance, but it is very possible to output normalized coordinates. To do so we need knowledge of the size ratios of a person, in particular the typical proportions of a person as drawn by Renaissance artists. To use this information we must be able to accurately determine which "blobs" of pixels represent a person. This article goes through several different techniques for detecting humans, although sadly many rely on video data. The circular Hough transform looks somewhat promising, as it should allow us to detect human faces. From there we may be able to infer whether a person is closer or further away depending on the size of the circles detected (or we may choose to go a step further and find the exact dimensions of the face).
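The circle-size idea can be sketched under a pinhole-camera assumption, where the apparent radius of a face scales inversely with its depth. The function name and numbers are illustrative, and only relative (normalized) depths are meaningful:

```python
def relative_depth(radius, reference_radius, reference_depth=1.0):
    """Depth of a face relative to a reference face of known depth.

    Under a pinhole model, apparent radius is proportional to 1/depth,
    so depth follows from the ratio of detected Hough-circle radii.
    """
    return reference_depth * reference_radius / radius

# A face with half the apparent radius is twice as far away.
```

Combined with assumed Renaissance body proportions, these ratios could place each segmented figure on its own layer of the 2.5D space.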