Research Overview

Evimaria Terzi, 2018

In this era of ‘‘big data", the goal of my research is to find efficient
and principled methods for extracting useful knowledge from large data collections.
I usually start with data-analysis problems that arise in real
applications. Then, I focus on finding clean computational formulations to model these problems and also
on designing efficient algorithms for solving them. In terms of problem
formulations, I am interested in simply stated, yet
technically challenging, questions. Technically, I am
interested both in the design and analysis of the
algorithms as well as their practical performance on real datasets.

My research agenda focuses on the development of principled and simple algorithms that are efficient,
robust to noise and can handle both noisy and partially-observed data.
Although at different times I am working on different application domains,
I always stay faithful to this vision.

Application-specific research themes

Team formation in expertise management systems.

The proliferation of expertise management systems (e.g.,LinkedIn,
Upwork} etc.) has led to research on team-formation problems.
Our work on team formation (T. Lappas, K. Liu, E. Terzi: Finding a team of experts in social networks. ACM SIGKDD 2009)
was the first to address
the problem of hiring a team of experts with different
skills and good history of collaborating together.
Ever since, there has been a lot of work in the data-mining community
around this topic. Our more recent work on this topic focuses on: (a) the development of
team-formation methods that have their roots in practical observations as obtained by social
scientists [1], (b) the development of algorithms for identifying the
team members that need to be hired/fired and
outsourced by a company to achieve optimal cost for projects that arrive in
an online fashion [2], and (c) problems related to the minimum
alteration of existing, yet problematic, teams so that their operation becomes smoother [3].

My recent collaboration with BU Spark! and the Hariri Institute of
Computing (HIC) has given my research on team formation an additional, more practical,
dimension as we are applying our team-formation algorithms to the teams
formed in student hackathons. In this setting, there are often students (usually from under-represented
groups and minorities) who oftain fail to find teammates. Our algorithms
provide recommendations for participants before the event so that everyone has an initial set of people to
talk to the day of the event. At the moment, we apply our algorithms to hackathons in the Boston area,
but we plan to license our software to hackathons throughout the country.

In the future, I will consider questions like: ‘‘Given team performance how can we evaluate the
ability level of each individual team member?" or ‘‘Can we predict the
performance of a newly-formed teams?’’.
The difficulty of these questions is due to the fact that the performance
of a team is not a summation of its individual
experts. Rather, it is defined by an unknown function, which will be
part of the learning task.

Recent Related papers

Urban data analysis

My previous work on unconventional types of recommender systems
increased my interest in recommendation of routes in urban
environments (WSDM'14 and Syst. ’16) and subsequently the development of algorithmic
techniques for analyzing
data that capture urban activity. Our recent work [1] has developed new algorithmic
tools for identifying popular routes that city dwellers take in a city. This helped us find city hotspots
both for locals and tourists. Motivated by the widespread use of
ride-hailing platforms (e.g., Uber or Lyft), we have developed
a framework for recommending to drivers strategies that maximize their
expected earnings [2].Our methods for this problem allow for significant improvement of the
earnings of drivers when compared to naive drivers that are not strategic or simply follow price surge.
A key component of our framework is that it enables us to rigorously
reason about and analyze the sensitivity of our results to perturbations in the input data.

In the future, I will continue developing algorithmic techniques for
analyzing data coming from different applications
designed to improve different modes of peoples’ lives.

Related recent papers

Social networks

The analysis of social-network data has been a recurring theme in my research.
Currently, I am interested in some recurring questions
related to network sparsification [1] or the identification of functional communities [2].
More recently, motivated by the rise of echo-chambers I have worked on the
problem of modifying the network so that such phenomena (and their effects) are minimized [3].
I expect that these problems will become vital in the future and
my work on social networks will focus more and more on themes with such direct impact on society.

Related recent papers

Cross-Cutting Research Themes

Active data mining

In many applications the data come in the form of aggregates
as massive scale makes fine-grained data collection
and storage intractable.
In other applications, we have access to only a small percentage of the data points.
Most of the times, one needs to perform analysis of
such incomplete or aggregated data either to perform prediction or
other data-analysis tasks (e.g., matrix completion or clustering).
Over the last years my research has focused on the development of techniques
that appropriately query the data (e.g., in practice by asking specific
feedback from users, or obtaining specific measurements)
in order to better perform a specific data-mining
task cite{chaudhari18markov,golshan14unveiling,malmi17active,mavroforakis17active,ruchansky15matrix}.
Although some of these papers are a bit older, this
direction became quite dominant in my research over the last years.

The end goal of this research direction is to develop a generalized
framework for what I call active data mining.
This framework will allow us to strategically query unseen data with the
goal to optimize for the data-mining task
at hand (e.g., clustering, community detection, etc.).

Related recent papers
  1. SDM 2014.

Robust data mining

In our recent work on ride-hailing platforms[1], we developed a framework that
enabled us to rigorously reason about and analyze the sensitivity of our
results to perturbations in the input data, without ever
performing these perturbations. Motivated by this, I plan to further
investigate the application of similar ideas to
other data-mining problems. On a high level, this research thrust will
address long-standing questions in data mining wrt the impact
of noise in the results. For this, we will build upon existing work on robust optimization, yet we will focus on
the robust versions of data-mining problems, which have not been studied rigorously before.

Related recent papers


Application of primal-dual algorithms in data-mining applications

A very recent and exciting direction to my research has been the
application of primal-dual algorithms to a large number of data-mining
problems. Although many of the algorithms we use in the data-mining
community have their roots in some primal-dual algorithm, we – as a
data-mining community – rarely go back to the original linear-program
formulation of our problems. The goal of my future research in this domain is to
do exactly that as we have ample preliminary evidence that shows that these
algorithms can become very efficient and useful when they take into consideration the structure
imposed by the practical problems we use them for.