Seam carving (also known as liquid rescaling) is an algorithm for image retargeting: the problem of displaying images without distortion on media of various sizes (cell phones, projection screens) using document standards, such as HTML, that already support dynamic changes in page layout and text but not in images.
We implemented the seam carving algorithm and used it to perform the following image manipulation tasks:
- Content Aware Image Resizing
- Image Enlargement
- Object Deletion
- Video Object Deletion
Method and Implementation
We divided the implementation of this project into ordered steps and the method of each is described below:
The algorithm uses an energy function to identify optimal paths of low-importance pixels, which it can remove to resize the image. A pixel's energy depends on how much the color changes among its surrounding pixels. The traditional energy of the pixel at position (x, y) is simply the difference between the pixel values at (x-1, y) and (x+1, y).
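As a rough illustration of the traditional energy described above, the following sketch computes the absolute difference between each pixel's left and right neighbors. The function name `simple_energy`, the summation over color channels, and the replicated border handling are assumptions made for this example and are not taken from our code.

```python
import numpy as np

def simple_energy(image: np.ndarray) -> np.ndarray:
    """Energy at (x, y) = |I(x-1, y) - I(x+1, y)|, summed over color channels."""
    img = image.astype(np.float64)
    if img.ndim == 2:
        # Treat a grayscale image as a single-channel color image.
        img = img[:, :, np.newaxis]
    # Assumption: replicate the border columns so edge pixels have two neighbors.
    padded = np.pad(img, ((0, 0), (1, 1), (0, 0)), mode="edge")
    left = padded[:, :-2, :]
    right = padded[:, 2:, :]
    return np.abs(right - left).sum(axis=2)
```

The result has the same height and width as the input, so it can be used directly as the energy matrix for seam search.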
We used the forward energy algorithm, which predicts which pixels will become adjacent after a seam is removed and uses that to suggest the best seam to remove. We have to consider which pixels are brought together by the removal of a particular pixel. This depends on whether the current pixel is connected to the seam from the top-left, top, or top-right.
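The three cases above (seam arriving from the top-left, top, or top-right) can be sketched as follows for vertical seams on a grayscale image. This is a minimal sketch of the commonly published forward energy formulation, not necessarily identical to our code; the function name `forward_energy_cost`, the replicated borders, and the initialization of the first row are assumptions.

```python
import numpy as np

def forward_energy_cost(gray: np.ndarray) -> np.ndarray:
    """Cumulative forward-energy map for vertical seams of a grayscale image."""
    img = gray.astype(np.float64)
    h, w = img.shape
    # Left/right neighbors of every pixel (assumption: replicated borders).
    padded = np.pad(img, ((0, 0), (1, 1)), mode="edge")
    left, right = padded[:, :-2], padded[:, 2:]
    # Removing pixel (i, j) makes its left and right neighbors adjacent.
    c_up = np.abs(right - left)
    cost = np.zeros((h, w))
    cost[0] = c_up[0]  # assumption: the first row only pays the left-right cost
    for i in range(1, h):
        up = img[i - 1]
        # A diagonal step additionally brings the pixel above next to a side neighbor.
        c_left = c_up[i] + np.abs(up - left[i])
        c_right = c_up[i] + np.abs(up - right[i])
        # Pad with infinity so seams cannot leave the image.
        prev = np.pad(cost[i - 1], 1, mode="constant", constant_values=np.inf)
        candidates = np.stack([
            prev[:-2] + c_left,     # seam comes from the top-left
            prev[1:-1] + c_up[i],   # seam comes from directly above
            prev[2:] + c_right,     # seam comes from the top-right
        ])
        cost[i] = candidates.min(axis=0)
    return cost
```

The smallest value in the last row of the returned map marks the end of the seam whose removal introduces the least new edge energy.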
Seam Carving Algorithm
Once the energy matrix is calculated for the entire image, our goal in each iteration is to find the lowest-energy seam passing through the entire height or width of the image. A greedy approach may seem natural at first, but it can get stuck in a high-energy region of the image in later stages, as shown in the image below.
This problem can be solved efficiently with a dynamic programming approach, with the recurrence relation shown below.
At each pixel, we look at the minimum energy seams encountered so far, ending at the three neighbor pixels in the row just before the target pixel. We then associate the target pixel with the least energy seam among these. At the end of the algorithm, the minimum energy seam in the entire image is represented by the least energy pixel in the last row. The rest of the seam can be constructed using the stored pointers.
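The procedure described above can be sketched as follows. The recurrence assumed here is the standard one, M(i, j) = e(i, j) + min(M(i-1, j-1), M(i-1, j), M(i-1, j+1)), where e is the energy matrix and M the cumulative cost. The names `find_vertical_seam` and `remove_vertical_seam` are hypothetical and chosen for this example only.

```python
import numpy as np

def find_vertical_seam(energy: np.ndarray) -> np.ndarray:
    """Return, for each row, the column index of the minimum-energy vertical seam."""
    h, w = energy.shape
    cost = energy.astype(np.float64).copy()
    back = np.zeros((h, w), dtype=np.int64)  # stored pointers to the previous row
    for i in range(1, h):
        # Pad with infinity so seams cannot leave the image.
        prev = np.pad(cost[i - 1], 1, mode="constant", constant_values=np.inf)
        neighbors = np.stack([prev[:-2], prev[1:-1], prev[2:]])
        choice = neighbors.argmin(axis=0)      # 0: top-left, 1: top, 2: top-right
        back[i] = np.arange(w) + choice - 1
        cost[i] += neighbors.min(axis=0)
    # The least-cost pixel in the last row ends the best seam; follow pointers up.
    seam = np.empty(h, dtype=np.int64)
    seam[-1] = cost[-1].argmin()
    for i in range(h - 1, 0, -1):
        seam[i - 1] = back[i, seam[i]]
    return seam

def remove_vertical_seam(image: np.ndarray, seam: np.ndarray) -> np.ndarray:
    """Delete one pixel per row, shrinking the image width by one."""
    h, w = image.shape[:2]
    keep = np.ones((h, w), dtype=bool)
    keep[np.arange(h), seam] = False
    return image[keep].reshape((h, w - 1) + image.shape[2:])
```

Horizontal seams can be handled with the same functions by transposing the image and the energy matrix first.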
Image Enlargement
We can apply the same technique of finding low-energy seams to enlarge the image as well. It relies on the same principle: the human eye is unlikely to notice low-energy content in an image, so adding a small number of such seams is imperceptible.
- Run the original seam carving algorithm on a duplicate of the image to find n seams, and store them.
- Iterate through the stored seams, inserting new pixels at the location of each seam element.
- Set each new pixel to the average of its left and right neighbors.
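The insertion step from the list above can be sketched for a single vertical seam as follows. The function name `insert_vertical_seam` is hypothetical. The sketch assumes that the new pixel is placed immediately to the right of the seam pixel, so that its left neighbor is the seam pixel itself, and that a seam pixel on the right border is simply duplicated.

```python
import numpy as np

def insert_vertical_seam(image: np.ndarray, seam: np.ndarray) -> np.ndarray:
    """Widen the image by one column, adding one averaged pixel per row."""
    h, w = image.shape[:2]
    out = np.empty((h, w + 1) + image.shape[2:], dtype=image.dtype)
    for i in range(h):
        j = int(seam[i])
        left = image[i, j].astype(np.float64)
        # Assumption: at the right border there is no right neighbor, so reuse the seam pixel.
        right = image[i, min(j + 1, w - 1)].astype(np.float64)
        out[i, : j + 1] = image[i, : j + 1]
        out[i, j + 1] = (left + right) / 2  # cast back to the image dtype on assignment
        out[i, j + 2 :] = image[i, j + 1 :]
    return out
```

When several stored seams are inserted one after another, the column indices of the remaining seams have to be shifted to account for the pixels already added; that bookkeeping is left out of this sketch.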
Object Deletion
For object deletion, we first used a Mask R-CNN model pre-trained on the COCO dataset. The model takes an image as input, produces regions of interest, classifies those regions and finally generates masks for them.
Using this, we can generate masks for target objects in the given image. We superimpose the mask on top of the calculated energy matrix, giving negative weights to the area corresponding to the objects in the image. By doing this, we force the seam to pass through the object we would like to remove.
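Superimposing the mask on the energy matrix can be sketched as below. The function name `apply_removal_mask` and the constant weight of -1000 are assumptions for illustration; the mask is taken to be a binary array of the same height and width as the energy matrix, such as one produced by a segmentation network.

```python
import numpy as np

def apply_removal_mask(energy: np.ndarray, mask: np.ndarray,
                       weight: float = -1000.0) -> np.ndarray:
    """Overwrite the energy of masked pixels with a large negative weight."""
    out = energy.astype(np.float64).copy()
    # Assumption: any nonzero mask value marks a pixel of the object to remove.
    out[mask.astype(bool)] = weight
    return out
```

Because the masked pixels now have by far the lowest energy, minimum-energy seams are drawn through them until the object has been carved away.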
Video Object Deletion
This was accomplished by automating the mask generation process using Mask R-CNN. Given the video and the target class of the object to be removed, the algorithm automatically runs the network on each frame and generates a mask, which is then applied to the energy matrix and used to delete the object.
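The per-frame automation can be sketched as a simple loop. Everything named here is hypothetical: `segment` stands in for a call to the segmentation network that returns a binary mask for the target class, and `remove_object` stands in for the masked seam carving step. Neither is an actual function from our code or from any library.

```python
from typing import Callable, Iterable, List
import numpy as np

def delete_object_from_video(
    frames: Iterable[np.ndarray],
    segment: Callable[[np.ndarray], np.ndarray],
    remove_object: Callable[[np.ndarray, np.ndarray], np.ndarray],
) -> List[np.ndarray]:
    """Generate a mask for every frame and carve the masked object out of it."""
    results = []
    for frame in frames:
        mask = segment(frame)  # hypothetical: binary mask of the target class
        if mask.any():
            # Only frames in which the object was detected are modified.
            frame = remove_object(frame, mask)
        results.append(frame)
    return results
```

Frames in which an object was removed end up narrower than the others, so a final step that brings every frame back to a common size is still needed before the frames can be written out as a video.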
We experimented to find the best possible energy function, since we also intended to use the algorithm for a task as complex as video object deletion. Of all the energy functions we tried, the forward energy function performed best. The others were the traditional energy function and a convolution filter energy function.
The traditional energy function produced many visual artifacts, especially between a pixel and the row above it where the color changes sharply. Horizontal distortions were also fairly visible. It was, however, the fastest of the energy functions to compute.
The convolution filter energy function, as implemented in our code, convolves a 3x3 filter over the whole image; the resulting values form the energy matrix. The filter is quite similar to those used for edge detection, since the underlying idea of detecting large changes is the same. This filter gave better results: the horizontal distortions were no longer visible, although some visual artifacts remained, especially with the row above. Computing the energy matrix took longer than with the traditional energy function.
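A convolution-based energy of this kind can be sketched as follows. The exact 3x3 filter used in our code is not reproduced here; the Sobel kernels, the function names `filter3x3` and `convolution_energy`, and the replicated borders are assumptions chosen because Sobel is a typical edge-detection filter.

```python
import numpy as np

# Assumption: Sobel kernels stand in for the 3x3 filter.
SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T

def filter3x3(gray: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide a 3x3 kernel over a grayscale image (borders replicated)."""
    h, w = gray.shape
    padded = np.pad(gray.astype(np.float64), 1, mode="edge")
    out = np.zeros((h, w))
    # Accumulate one shifted, weighted copy of the image per kernel entry.
    # The kernel is not flipped (cross-correlation); for Sobel this only
    # changes the sign, which the absolute value below removes.
    for di in range(3):
        for dj in range(3):
            out += kernel[di, dj] * padded[di:di + h, dj:dj + w]
    return out

def convolution_energy(gray: np.ndarray) -> np.ndarray:
    """Energy = |horizontal gradient| + |vertical gradient|."""
    return np.abs(filter3x3(gray, SOBEL_X)) + np.abs(filter3x3(gray, SOBEL_Y))
```

Each pixel now depends on all eight of its neighbors instead of two, which is consistent with the longer running time compared to the traditional energy function.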
We ran our model on multiple images, resizing them and deleting objects. Some of our results are listed below:
|Trial|Source Image|Result Image|
|Remove person, handbag|||
|Remove person, motorcycle|||
Video Object Deletion
We removed all objects shaped like human beings from the Computer Vision Homework 3 video using Mask R-CNN, and then resized the frames by inserting low-energy seams.
- The overall method works fairly well on several images that we fed it
- There are some cases where the unique placement of background colors and patterns causes the seam carving to malfunction, as shown in the image above, where we tried to delete the car.
- Additionally, the mask generation process is not perfect and hence the object removal is not always very precise. This results in some artifacts in the image after an object is deleted.
- This could be improved to some extent by using a more powerful model for segmentation, although it would be at a cost to the overall runtime.
- Another thing we could do is add a manual mask editing GUI, which would allow us to fine-tune the generated mask. This would be at the cost of automation, however.
- We were unable to achieve real-time speeds, as the algorithm takes a while to run (up to 2 seconds per seam on a 600x400 image). Several factors may contribute to this:
  - We used an object-oriented approach, which may add extra overhead to the function calls.
  - The program is implemented in Python. A C++ implementation would most likely be faster.
  - Even a slightly faster energy function would improve the speed of the program considerably.
- We attempted to parallelize part of the algorithm (updating the minimum energy of all pixels in a row in parallel), but the added overhead of the CPU calls actually decreased performance, so we abandoned it.
- For video object deletion we achieved surprisingly promising results: all human beings are completely removed from the frames for most of the video. We observed that the pole was again a problem for the R-CNN, which failed to recognize objects behind it, so glimpses of human presence remain.
- This application of computer vision can be used in filmmaking, especially for object removal. When films are shot on a set, objects such as harnesses and extra support crew constantly have to be removed from the shot.
- This application of resizing images is also useful for browsers when the content must be resized according to the device size.
We have built an automated system for content-aware image resizing that can also function as an object removal mechanism and content enhancer. With further improvement in the mask that we create, it would be possible to enhance/remove content with good precision. This project helped us appreciate the potential of computer vision systems.