Eric Missimer, Ryan Fleisher, George Pabst
Even with society's ever-expanding reliance on technology and the fast-growing trends of the computer age, there are still many who are unable to participate. A large portion of these people are those with physical disabilities. Fortunately, Margit Betke and the Image and Video Computing group at Boston University developed and released software called Camera Mouse, which allows users who cannot operate a mouse because of physical impairments to control the mouse pointer with a portion of their face via any inexpensive web camera. Although the software has already helped tens of thousands of people, we felt that there were still improvements to be made, specifically to give users complete independence in their computer experience.
First, we found that an individual with disabilities needs an assistant to select tracking points for them. Also, sporadic, involuntary movements would register as voluntary movements and cause the program to lose tracking points on the person's face. In addition, the program would often drop tracking points on the face over time, and the tracking box would slide off to the side. Finally, the only way to register mouse clicks was for the user to "hover" over a selected area.
We decided to improve Camera Mouse so that the program can select tracking points automatically, without any user input. Users, especially those who are disabled and cannot control a mouse, then no longer have to rely on outside assistance to initialize the tracking points. To avoid losing the tracked feature due to sporadic movements, we track multiple feature points of the face. This allows for smoother movement and decreases the chance that tracking is lost. Finally, we give the user a choice of how to trigger mouse clicks: left and right clicking by blinking the left or right eye, left clicking by blinking both eyes, or the previous hovering method.
The first addition we made to the Camera Mouse software is the ability to automatically find multiple tracking points. The original version of the software could only track one point, and it had to be set manually. Camera Mouse was made in the spirit of giving people who are physically disabled the opportunity to use a computer without assistance, so we felt that automatically setting the tracking points would allow even more independence for the user. Because the tracking points are selected automatically, the program must determine the best points on the face to track. Intuitively, these are the most complex points of the face, such as eyes/eyebrows, parts of the nose, and dips/crevices in the cheeks. We define the most complex points as the points which have the highest correlation coefficient with respect to other points within that region. Our program, when initially run, performs a calibration tutorial which prompts the user to place his or her head into a small region, and then finds up to the three best tracking points. These tracking points are ranked by a coefficient we assign to each of them, and work in conjunction to minimize the loss of tracking points due to sporadic movement. The tutorial also includes customized smoothing based on the range over which the user can move his or her head.
The second addition we made to the Camera Mouse software is a feature that lets the user select how to trigger a mouse click. Originally, the software was designed so that the user clicked by hovering over an object. However, this was problematic: users could too easily trigger mouse clicks when they weren't desired, and there were no separate left- and right-click capabilities. We added a step in the tutorial for users to select whether they would like to trigger mouse clicks with the default hovering method or with voluntary eye blinks. We detect blinks by looking for large changes in the image, that is, by checking whether the intensity changes by more than a set threshold value.
The user is given a time limit to put the center of his or her face into a box drawn on the camera image feed. Once the time limit expires, the program processes the current frame and calculates the Sobel gradient map. The map is then used to find the top areas where the image changes most abruptly (i.e., where the gradient values are highest). We expect these spots to be the easiest to track from frame to frame.
Our program uses multiple tracking points in order to add redundancy to the tracking. At every iteration the program finds the tracking point with the best match in the camera image. The "best match" is determined using the normalized correlation coefficient comparing the template saved in the initial set-up with the current frame. The best match is also used to recalibrate the other tracking points by specifying a search area near the original displacements from the best match for a template search.
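The matching step described above can be sketched as follows. This is a minimal pure-Python illustration, not the Camera Mouse code: the function names `ncc` and `best_match`, the list-of-rows grayscale image representation, and the square search window are all assumptions made for the example.

```python
import math

def ncc(template, patch):
    """Normalized correlation coefficient of two equal-sized grayscale patches.

    Both arguments are lists of rows of numbers. The result lies in [-1, 1];
    0.0 is returned when either patch has no contrast (zero variance).
    """
    t = [v for row in template for v in row]
    p = [v for row in patch for v in row]
    if len(t) != len(p):
        raise ValueError("patches must have the same size")
    mean_t = sum(t) / len(t)
    mean_p = sum(p) / len(p)
    num = sum((a - mean_t) * (b - mean_p) for a, b in zip(t, p))
    den = math.sqrt(sum((a - mean_t) ** 2 for a in t) *
                    sum((b - mean_p) ** 2 for b in p))
    return num / den if den > 0 else 0.0

def best_match(frame, template, center, radius):
    """Search around `center` (row, col of a patch's top-left corner).

    Every top-left position within `radius` pixels of `center` (clipped to
    the frame) is scored; returns (best_score, (row, col)).
    """
    th, tw = len(template), len(template[0])
    r_lo = max(0, center[0] - radius)
    r_hi = min(len(frame) - th, center[0] + radius)
    c_lo = max(0, center[1] - radius)
    c_hi = min(len(frame[0]) - tw, center[1] + radius)
    best = (-2.0, center)  # below any possible score
    for r in range(r_lo, r_hi + 1):
        for c in range(c_lo, c_hi + 1):
            patch = [row[c:c + tw] for row in frame[r:r + th]]
            score = ncc(template, patch)
            if score > best[0]:
                best = (score, (r, c))
    return best
```

With several templates, calling `best_match` once per tracking point and taking the point with the highest score gives the "best match" that the text uses to re-anchor the others.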
If all the tracking points become occluded (e.g., by a hand), the tracker can recover once one of the original tracking points happens to be matched again by our algorithm. While all the points are lost, the locations at which the algorithm matches them jump around. Once the image stabilizes, the tracking points are more likely to be found again, even if the face is in a different position than before, for example after an unintentional jerk during which the points moved.
To calibrate the user's range of motion against the size of the screen (i.e., to make sure it is possible to click on all parts of the screen), we again give the user a time limit, with the instruction to move the head left, right, up and down. Taking the maximum and minimum coordinates reached by each tracking point, we adjust the sensitivity of the mouse appropriately.
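One way to turn the recorded extremes into a mouse sensitivity is a linear map from the observed range to the screen. The sketch below assumes exactly that: `make_cursor_mapper`, its parameters, and the clamping behaviour are hypothetical and only illustrate the idea.

```python
def make_cursor_mapper(samples, screen_w, screen_h):
    """Build a function mapping a tracked (x, y) position to a screen pixel.

    `samples` are the (x, y) positions of one tracking point recorded while
    the user moved the head as far as possible in each direction.
    """
    xs = [x for x, _ in samples]
    ys = [y for _, y in samples]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    if x_max == x_min or y_max == y_min:
        raise ValueError("no head movement recorded during calibration")

    def to_screen(x, y):
        # Normalize to [0, 1] and clamp so the cursor never leaves the screen.
        u = min(1.0, max(0.0, (x - x_min) / (x_max - x_min)))
        v = min(1.0, max(0.0, (y - y_min) / (y_max - y_min)))
        return round(u * (screen_w - 1)), round(v * (screen_h - 1))

    return to_screen
```

A user with a small range of motion gets a steeper map (higher sensitivity), so the extremes of head movement still reach the screen edges.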
To detect blinking, we use template matching with the normalized correlation coefficient. We take an image of the user's open eyes and use it as a template. We then track the general location of the eyes and check frame by frame for blinking. If the open-eye template is not matched in the image (using the normalized correlation coefficient as the matching metric), we conclude that the user's eyes are closed. If the eyes stay closed for a certain number of frames, a blink is registered and a mouse click is sent to the system.
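The frame-counting logic can be sketched as a small state machine. The class name `BlinkDetector` and the default values for `match_threshold` and `min_closed_frames` are assumptions chosen for illustration; the text does not give the actual threshold or frame count.

```python
class BlinkDetector:
    """Turn per-frame open-eye match scores into voluntary-blink events."""

    def __init__(self, match_threshold=0.6, min_closed_frames=5):
        self.match_threshold = match_threshold      # below this: eye closed
        self.min_closed_frames = min_closed_frames  # frames needed for a blink
        self.closed_frames = 0
        self.fired = False

    def update(self, score):
        """Feed the open-eye template score of one frame.

        Returns True exactly once per sustained closure, on the frame where
        the eye has been closed for `min_closed_frames` consecutive frames.
        """
        if score >= self.match_threshold:
            # Eye is open again: reset so the next closure can fire.
            self.closed_frames = 0
            self.fired = False
            return False
        self.closed_frames += 1
        if self.closed_frames >= self.min_closed_frames and not self.fired:
            self.fired = True
            return True
        return False
```

Requiring several consecutive closed frames is what separates a deliberate blink from a natural one; running one detector per eye would allow the left and right eyes to be mapped to different clicks.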
First, we instructed the user to position their head within a trapezoid on the screen in order to scan a lower portion of the person's face. We then constructed a Sobel gradient map of the captured image. For every other possible tracking position, we summed the absolute value of the gradient within that tracking point's window. We chose the best tracking points by picking those with the highest gradient sum, while enforcing a minimum distance between any two chosen points. We found this to be the best method of finding tracking points, because points with a large change in intensity have a higher Sobel gradient. These points tended to be more complex, and therefore easier to track.
In the next part of the tutorial, we calibrated the mouse by tracking how far the person could move in each direction, and mapped this small search space to the dimensions of the screen. We determined the speed and accuracy of control based on these measurements.
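The selection procedure above can be sketched in plain Python. Several details here are assumptions rather than facts from the project: the gradient magnitude is approximated as |Gx| + |Gy|, "every other position" is read as a stride of two pixels (`step=2`), and the names `sobel_magnitude` and `select_tracking_points` and all default parameters are hypothetical.

```python
def sobel_magnitude(img):
    """Return |Gx| + |Gy| per pixel for a list-of-rows image; borders are 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            gx = (img[r - 1][c + 1] + 2 * img[r][c + 1] + img[r + 1][c + 1]
                  - img[r - 1][c - 1] - 2 * img[r][c - 1] - img[r + 1][c - 1])
            gy = (img[r + 1][c - 1] + 2 * img[r + 1][c] + img[r + 1][c + 1]
                  - img[r - 1][c - 1] - 2 * img[r - 1][c] - img[r - 1][c + 1])
            out[r][c] = abs(gx) + abs(gy)
    return out

def select_tracking_points(img, window=3, count=3, min_dist=4, step=2):
    """Pick up to `count` window corners (row, col) with the largest gradient sum.

    Candidates are visited in descending order of score, and a candidate is
    kept only if it is at least `min_dist` pixels from every point kept so far.
    """
    grad = sobel_magnitude(img)
    h, w = len(img), len(img[0])
    candidates = []
    for r in range(0, h - window + 1, step):
        for c in range(0, w - window + 1, step):
            score = sum(grad[rr][cc]
                        for rr in range(r, r + window)
                        for cc in range(c, c + window))
            candidates.append((score, r, c))
    candidates.sort(key=lambda s: -s[0])  # highest gradient sum first
    chosen = []
    for _score, r, c in candidates:
        if all((r - pr) ** 2 + (c - pc) ** 2 >= min_dist ** 2
               for pr, pc in chosen):
            chosen.append((r, c))
            if len(chosen) == count:
                break
    return chosen
```

The minimum-distance check keeps the chosen points from clustering on a single strong feature, which matters because the points are meant to back each other up.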
To track points, for the first iteration of tracking, all the points start their template match search at their previous location. After that iteration, the point with the highest normalized correlation coefficient is used to seed the positions. Seeding the positions involves using the best matched tracking point to recalibrate the other tracking points by specifying a search area near the original displacements from the best matched tracking point for a template search of the other points.
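The seeding step can be illustrated with a short sketch. It assumes that the "original displacements" are the offsets between points at set-up time and that each search area is a square of half-width `radius`; the function name `seed_search_areas` and its parameters are hypothetical.

```python
def seed_search_areas(initial_points, best_index, best_position, radius):
    """Predict a search rectangle for each point from the best-matched point.

    initial_points: (x, y) of every tracking point at set-up.
    best_index:     index of the point with the highest correlation score.
    best_position:  where that point was found in the current frame.
    Returns one (x_min, y_min, x_max, y_max) rectangle per point.
    """
    bx0, by0 = initial_points[best_index]
    bx, by = best_position
    areas = []
    for x0, y0 in initial_points:
        # Re-apply the set-up displacement at the best point's new position.
        cx = bx + (x0 - bx0)
        cy = by + (y0 - by0)
        areas.append((cx - radius, cy - radius, cx + radius, cy + radius))
    return areas
```

A real implementation would also clip each rectangle to the frame. The sketch assumes the face only translates, so the offsets between points stay fixed as the head moves.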
We determined which clicking method the user wants by having the user move the mouse to a specific quadrant of the screen. To select the tracking points of the eyes, we used a template matching algorithm similar to the one we used to find the face tracking points. We initially find the user's eyes by having them position their eyes within a certain bounding box. To determine whether a mouse event is triggered, we take a template of the open eye and compare it to the expected eye location. If the normalized correlation coefficient is low, we conclude that the eye is closed. After the eye has been closed for a preset amount of time, the closure is treated as a voluntary eye blink, and the mouse event is triggered.
Video: Finding tracking points on paper
Video: Occluding tracking points
Despite our success with our first attempt at increasing the usability for the Camera Mouse software, there are still many improvements that could be made to the system. These are some of the features that could be expanded upon:
- Create a better algorithm for finding tracking points (more complex feature points)
- Detect when tracking points have been lost and start the initialization process again
- Update the vectors between points to take into account the user moving closer to or farther from the camera
- Adjusting the vector between points to take into account movement
- More accurate and better smoothing
- Have the program run at start up
- Remove time constraints from tutorial to remove pressure from the user
- Track the head and find feature points without requiring the user to position themselves in front of the camera
- Implement the hover click as an alternative to the blinking click
- Determine tracking point size and/or increase search area based off of computer speed