Research Statement and Highlights

In the era of "big data", the goal of my research is to find efficient and principled methods for extracting useful knowledge from large data collections. My work addresses data-analysis problems that arise in real applications (e.g., the Web, e-commerce, social networks, and the Internet). For these problems, I look for clean computational formulations that capture the real-life setting, and I design efficient algorithms for solving them. In terms of problem formulations, I am interested in simply stated, yet technically challenging, questions. On the technical side, I am interested in the design and analysis of algorithms as well as in their practical performance on real datasets.

My work focuses on problems related to (a) social networks, (b) online reviews, (c) expertise management systems and (d) privacy-preserving data analysis. While my research is always motivated by practical applications, there are some fundamental research questions that span all the above application domains.

Social networks

This thrust of my research aims to address the following (simply stated) question: "Given a (social) network, which are the most important nodes in that network?"

In 2010, when I started working on this question, existing research addressed it in two ways: (i) By associating every node with an importance score and ranking the nodes in decreasing order of importance; the centrality of a node is an example of such a score, because it measures the shortest paths of the network that pass through that node. (ii) By identifying a set of influential nodes, i.e., a group of nodes such that, once they adopt a product or an idea, the spread of that product (or idea) in the network is potentially maximized.

My work has advanced these existing approaches in two ways. First, we pointed out that associating a centrality score with every node hides valuable information about the interactions between nodes. To address this shortcoming, we introduced the notion of group centrality. Second, we pointed out that prior work on identifying influential groups of nodes was based on the potential rather than the observed influence of the nodes in the network. For example, a node may be potentially influential (e.g., by having a large neighborhood), while in practice its neighbors do not follow its example. To address this issue, we introduced the notion of effector nodes of a social network. Effectors are the nodes that are the most probable initiators of a trend or an epidemic. Thus, our work was the first to propose an a posteriori analysis of a propagation and the identification of the key nodes for a particular observed propagation.
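To make the point about per-node scores concrete, here is a minimal sketch of one possible group-level score: the number of shortest paths between nodes outside a group that pass through at least one group member. The function names, the undirected adjacency-list input, and this particular definition are assumptions chosen for illustration; they are not taken from the publications listed below.

```python
from collections import deque
from itertools import combinations


def count_shortest_paths(adj, source, blocked=frozenset()):
    """BFS from `source` that never enters a node in `blocked`.

    Returns (dist, sigma): dist[v] is the hop distance from `source` and
    sigma[v] is the number of distinct shortest paths from `source` to v.
    """
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v in blocked:
                continue
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma


def group_path_coverage(adj, group):
    """Count shortest paths, between pairs of nodes outside `group`,
    that pass through at least one node of `group` (illustrative score)."""
    group = frozenset(group)
    outside = [v for v in adj if v not in group]
    covered = 0
    for s, t in combinations(outside, 2):
        dist_all, sigma_all = count_shortest_paths(adj, s)
        if t not in dist_all:
            continue  # s and t are disconnected
        dist_cut, sigma_cut = count_shortest_paths(adj, s, blocked=group)
        # Shortest s-t paths that avoid the group entirely.
        avoiding = sigma_cut[t] if dist_cut.get(t) == dist_all[t] else 0
        covered += sigma_all[t] - avoiding
    return covered


# Hypothetical toy network: a chain s - a - b - t.
chain = {"s": ["a"], "a": ["s", "b"], "b": ["a", "t"], "t": ["b"]}
```

On this chain, nodes a and b each score 2 on their own, while the group {a, b} scores 1: the two nodes lie on the same paths, which is the kind of interaction that individual scores cannot show.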

Representative Publications

A. Gionis, E. Terzi, P. Tsaparas: Opinion maximization in social networks. SIAM Data Mining Conference (SDM), 2013.

V. Ishakian, D. Erdos, E. Terzi, A. Bestavros: A Framework for the Evaluation and Management of Network Centrality. SIAM Data Mining Conference (SDM), 2012.

D. Erdos, V. Ishakian, A. Lapets, E. Terzi, A. Bestavros: The FilterPlacement Problem and its Application to Minimizing Information Multiplicity. International Conference on Very Large Databases (VLDB), 2012.

T. Lappas, E. Terzi, D. Gunopulos, H. Mannila: Finding Effectors in Social Networks. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2010.

Management of online reviews

In online marketplaces (e.g., Amazon) or online review-management systems (e.g., Yelp), there are many products for which thousands of reviews exist. A natural question is therefore the following: "Which reviews should one read before deciding to purchase a product or use a service?"

Existing work addressed this question by ranking reviews based on importance scores (e.g., helpfulness votes) that are computed independently for every review. As a result, the top-scored reviews may be redundant, i.e., they may all comment on the same aspects of a particular product.

In contrast to existing work, my work in the area has been among the very first to evaluate the quality or informativeness of groups of reviews, rather than of every review independently of the rest. For example, we have developed methods that identify a small set of non-redundant reviews that cover most of the aspects of a product, or a set of reviews that accurately represents the distribution of opinions (per product aspect) in the underlying collection. Finally, we have also considered the issue of reviewer motivation, and we have developed review-ranking mechanisms that motivate reviewers to write useful reviews, i.e., reviews that contribute significant new information to the existing collection.
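Selecting a small set of non-redundant reviews that covers a product's aspects can be viewed as a set-cover problem. The sketch below uses the standard greedy set-cover heuristic as a stand-in; it is not necessarily the method of the publications listed below, and the function name, the input format (a mapping from review id to the set of aspects it mentions), and the sample data are hypothetical.

```python
def greedy_review_cover(reviews, max_reviews=None):
    """Greedily pick reviews until every aspect is covered.

    `reviews` maps a review id to the set of product aspects it discusses.
    At each step the review covering the most still-uncovered aspects wins;
    ties go to the review that appears first in the mapping.
    """
    uncovered = set().union(*reviews.values()) if reviews else set()
    chosen = []
    while uncovered and (max_reviews is None or len(chosen) < max_reviews):
        best = max(reviews, key=lambda r: len(reviews[r] & uncovered))
        gain = reviews[best] & uncovered
        if not gain:
            break
        chosen.append(best)
        uncovered -= gain
    return chosen


# Hypothetical reviews of a phone, tagged with the aspects they mention.
sample_reviews = {
    "r1": {"battery", "screen"},
    "r2": {"battery"},
    "r3": {"price", "screen", "camera"},
    "r4": {"camera"},
}
```

With this data, a ranking by independent per-review scores could easily place r1 and r2 at the top even though both discuss the battery, whereas the greedy cover picks r3 and then r1 and stops once all four aspects are covered.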

Representative Publications

A. Gionis, T. Lappas, E. Terzi: Estimating Entity Importance via Counting Set Covers. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2012.

T. Lappas, M. Crovella, E. Terzi: Selecting a Set of Characteristic Reviews. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2012.

T. Lappas, E. Terzi: Toward a fair review-management system. Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD), 2011.

P. Tsaparas, A. Ntoulas, E. Terzi: Selecting a comprehensive set of reviews. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2011.

Expertise management systems

The proliferation of expertise management systems (e.g., LinkedIn, oDesk, etc.) has raised questions like: "Which is the most appropriate expert or group of experts to complete a given task?" As in the case of online reviews, existing methods mostly focused on evaluating experts individually, by associating each of them with a performance score.

Our work on team formation was the first to address the problem of hiring a team of experts who have different backgrounds and yet a good history of collaborating with one another. In a continuation of this work, we have developed a score for ranking experts based on the number of good teams they can potentially participate in. This latter work is the first to integrate the ranking and group-selection paradigms into a single score, which measures the importance of an entity by considering the quality of all the groups that the entity can participate in.
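As a rough illustration of scoring an expert by the teams it can join, the sketch below counts, by brute force, every team of bounded size that covers all skills a task requires, and credits each member of such a team. This is an exponential-time toy under assumed inputs (a mapping from expert to skill set, and a size bound), not the counting algorithm of the publications listed below; all names and data are hypothetical.

```python
from itertools import combinations


def cover_counts(experts, task, max_team_size):
    """For each expert, count the teams of at most `max_team_size` members
    that include the expert and jointly possess every skill in `task`.

    `experts` maps an expert name to the set of skills that expert has.
    """
    counts = {name: 0 for name in experts}
    for size in range(1, max_team_size + 1):
        for team in combinations(experts, size):
            skills = set().union(*(experts[name] for name in team))
            if task <= skills:  # the team covers the whole task
                for name in team:
                    counts[name] += 1
    return counts


# Hypothetical experts and a task that needs two skills.
sample_experts = {
    "ann": {"ml", "db"},
    "bob": {"db"},
    "cat": {"ml"},
    "dan": {"ui"},
}
sample_task = {"ml", "db"}
```

Here ann belongs to four covering teams of size at most two, bob and cat to two each, and dan to one (only as a passenger on ann's team), so a single number per expert reflects how that expert combines with the others.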

Representative Publications

A. Gionis, T. Lappas, E. Terzi: Estimating Entity Importance via Counting Set Covers. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2012.

T. Lappas, K. Liu, E. Terzi: Finding a team of experts in social networks. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2009.

Privacy in social networks

Privacy-preserving data management and data sharing have been a major concern, particularly in the context of social networks. My first work in the area of privacy and social networks was the first paper to introduce the notion of anonymity in graphs. My subsequent work in the area has had two thrusts. The first studies, models, and quantifies users' information-sharing behaviors. As part of this work, we have developed a privacy score that measures the tendency of a user to share his/her private information online. In the second thrust, we are exploring the degree to which revealing certain pieces of information about graph data reveals information about individual nodes or their connections to others.

Representative Publications

D. Erdos, R. Gemulla, E. Terzi: Reconstructing Graphs from Neighborhood Data. IEEE International Conference on Data Mining (ICDM), Brussels, Belgium, December 2012.

G. Gursun, N. Ruchansky, E. Terzi, M. Crovella: Routing State Distance: A Path-based Metric for Network Analysis. Internet Measurement Conference (IMC), 2012.

G. Gursun, N. Ruchansky, E. Terzi, M. Crovella: Inferring Visibility: Who is (not) talking to whom. ACM SIGCOMM, 2012.

E. Zheleva, E. Terzi, L. Getoor: Privacy in Social Networks. Morgan & Claypool Publishers, 2012.

M. Hay, K. Liu, G. Miklau, J. Pei, E. Terzi: Privacy-aware Data Management in Information Networks. Tutorial at the ACM Conference on Management of Data (SIGMOD), 2011.

K. Liu, E. Terzi: A framework for computing the privacy score of users in online social networks. ACM Transactions on Knowledge Discovery from Data (TKDD).

K. Clarkson, K. Liu, E. Terzi: Towards Identity Anonymization in Social Networks. Book chapter in Link Mining: Models, Algorithms and Applications. Editors: C. Faloutsos, J. Han and P. Yu.

N. Vuokko, E. Terzi: Reconstructing randomized social networks. SIAM Data Mining Conference (SDM), 2010.

K. LeFevre, E. Terzi: GraSS: Graph Structure Summarization. SIAM Data Mining Conference (SDM), 2010.

K. Liu, E. Terzi: Towards identity anonymization on graphs. ACM International Conference on Management of Data (SIGMOD), 2008.