Entity Selection and Ranking for Data Mining Applications


Expert-management portals like linkedin.com, odesk.com and guru.com are representative sites that allow people to advertise their work or skills to the broader public. For example, LinkedIn has more than 120 million members, which allows potential employers, collaborators, etc. to discover individuals or groups of individuals with the desired expertise. Similarly, review-management sites like Amazon or Yelp collect large numbers of reviews about products or services. For example, the Kindle has more than 30,000 reviews on Amazon. Naturally, users cannot read all of these reviews, and they are helped significantly by the identification of a small subset of reviews that is sufficiently informative. Finally, as online social and media networks grow in importance as sources of news and other information, there is an urgent need for tools that automatically identify and recommend important nodes of the network that specific users may need to follow to fully exploit the power of online social media. In each of these scenarios, given a collection of entities (e.g., reviews about a product, experts that declare certain skills, network nodes or edges), the goal is to identify a subset of important entities (e.g., useful reviews, competent experts, influential nodes, respectively).

Existing work on recommender systems attempts to identify important entities either by entity ranking or by entity selection. Entity-ranking methods associate a score with each entity, but they ignore the redundancy among the highly scored entities. Entity-selection methods try to overcome this drawback by evaluating the desirability of a group of entities taken together; they attempt to identify the best subset of entities, while ignoring other subsets that may be equally good or almost as good as the best subset. Against this background, this project aims to overcome the drawbacks of existing entity-selection and entity-ranking methods by integrating both into a common framework that allows entity ranking based on entity selection and entity selection based on entity ranking. In the resulting framework, the scores of individual entities are determined in part by the number of good groups of entities they can be part of, and good groups of entities consist of entities with high scores.
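As a rough illustration of ranking based on selection, the sketch below scores each entity by the number of good groups it belongs to. It is not the project's method: the definition of a "good" group (any group of at most max_size experts whose combined skills cover a required skill set), the brute-force enumeration, and all names and data are assumptions made only for this example. It also covers only one direction of the framework; the idea that good groups should in turn consist of high-scoring entities is not modeled here.

```python
from itertools import combinations


def good_groups(experts, required, max_size):
    """Enumerate every group of at most max_size experts covering `required`.

    Assumption: a group is "good" if the union of its members' skills
    contains all required skills. Brute force, so only for tiny inputs.
    """
    names = sorted(experts)
    groups = []
    for size in range(1, max_size + 1):
        for group in combinations(names, size):
            covered = set().union(*(experts[name] for name in group))
            if required <= covered:
                groups.append(group)
    return groups


def participation_scores(experts, groups):
    """Score each expert by the number of good groups that include them."""
    scores = {name: 0 for name in experts}
    for group in groups:
        for name in group:
            scores[name] += 1
    return scores


# Hypothetical toy data: expert name -> declared skills.
experts = {
    "ana": {"python", "statistics"},
    "bo": {"python"},
    "chen": {"statistics", "databases"},
    "dee": {"databases"},
}
required = {"python", "statistics", "databases"}

groups = good_groups(experts, required, max_size=2)
scores = participation_scores(experts, groups)

# Rank by descending score, breaking ties alphabetically.
ranking = sorted(scores, key=lambda name: (-scores[name], name))
print(groups)
print(ranking)
```

In this toy instance no single expert covers all three skills, so every good group is a pair, and experts who appear in more covering pairs rank higher. Exhaustive enumeration grows exponentially with the number of entities, which is why exploring the solution space efficiently is the hard part of the problem.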

The main challenge addressed by this work is how to explore the solution space of combinatorial problems in order to identify subsets of entities that participate in many good solutions. The resulting practical methods for exploring the solution space of combinatorial problems have applications in expert-management systems, the management of online product reviews, and network analysis (including physical and social networks).

NSF Award III # 1218437, PI, 500K