Tue-Thur 12:30-2:00 pm
Room: MCS B031
The general theme of the course will be on algorithmic issues related to data mining. The specific focus of the course will be on models and algorithms for network data (e.g., social networks, collaboration networks, biological networks, etc).
The course will have a seminar format and it will be based on recently published material. The list of recommended reading will be constantly updated.
Evimaria Terzi, evimaria@cs.bu.edu
www.cs.bu.edu/~evimaria
Office Hours: Monday 3:30 pm - 5:00 pm and Tuesday 11:00 am -12:30.
Contact me by e-mail to arrange an appointment if necessary.
Working knowledge of programming and data structures. Familiarity with basic algorithmic concepts, probability, statistics and linear algebra. Programming projects will require knowledge of C (or C++), java, python.
Special thanks are due to Aris Anagnostopoulos (University of Rome), Jure Leskovec (Stanford), Panayiotis Tsaparas (Microsoft Research) for providing useful slides from their conference presentations and lectures on the topic.
Jan 14 | What is this class about / Introductory lecture | .pptx,.ppt,.pdf |
Jan 19 | Generative models for social networks | .pptx,.ppt,.pdf |
Jan 21 | Jon Kleinberg: The small world phenomenon: An algorithmic prespective | |
B. Bollobas: Mathematical Results in Scale-Free random Graphs | ||
J. Leskovec, J. Kleinberg, C. Faloutsos. Graphs over Time:Densification Laws, Shrinking Diameters and Possible Explanations | .pptx,.ppt,.pdf | |
Jan 26 | D. Kempe, J. Kleinberg, E. Tardos: Maximizing the spread ofinfluence in social networks | .pptx,.ppt,.pdf |
Jan 28 | J. Byers, M. Mitzenmacher, G. Zervas: Adaptive Weighing Designsfor Keyword Value Computation | by G. Zervas |
Feb 02 | H. Mannila, E. Terzi: Finding links and initiators: agraph-reconstruction problem | .pptx,.ppt,.pdf |
T. Lappas, K. Liu, E. Terzi: Finding a team of experts in social networks | ||
Feb 04 | K. Liu, E. Terzi: Towards identity anonymization on graphs | .pptx,.ppt,.pdf |
K. Liu, E. Terzi: A framework for computing the privacy score ofusers in online social networks | by E. Terzi | |
Feb 09 | E. Zheleva, L. Getoor: To join or not to join: the illusion ofprivacy in social networks with mixed public and private user profiles | by Mark Moses |
Feb 09 | L. Backstrom, C. Dwork, J. Kleinberg:Wherefore artthoou r3579x?: anonymized social networks, hidden patterns and structuralsteganography | by Xuan Zhang |
Feb 11 | A. Narayanan, V. Shmatikov: De-anonymizing social networks | byLiang Zhu |
Feb 11 | J. Leskovec et. al.: Cost-effective outbreak detection in networks | by David LaPalomento |
Feb 16 | No Class | |
Feb 18 | P. Rusmevichientong, S. Zhu, D. Selinger: Identifying early buyersfrom purchase data | by Vinita Shinde |
Feb 18 | M. Mathioudakis, N. Koudas: Efficient Identification of Startersand Followers in Social Media | by Justin Davis |
Feb 23 | Hartline et. al.: Optimal marketing strategies for social networks | by Joseph Akinwumi |
Feb 23 | A. Anagnostopoulos, R. Kumar, M. Mahdian: Influence and correlationin social networks | by George Brova |
Feb 25 | G. Kossinets, J. Kleinberg, D. Watts: The structure of informationpathways in a social communication network | by Min Ye |
Mar 02 | Social influence and diffusion of user-created content | by LilyWong |
Mar 02 | J. Leskovec, L. Backstrom, J. Kleinberg: Meme-tracking and thedynamics of news cycle | by Davide House |
Mar 04 | J. Leskovec, D. Huttenlocher, J. Kleinberg: Signed networks insocial media | by Rachel Katz |
Mar 04 | S. Marvel, J. Kleinberg, S. Strogatz: The Energy Landscape ofSocial Balance | by Marco Winkler |
Mar 09, 11 | No class - Spring Break | |
Mar 16 | N. Vuokko, E. Terzi: Reconstructing randomized graphs | byP. Scheffler |
Mar 16 | Some topic on antrhopological issues and social networks | by Conrad Nied |
Mar 18 | Herzenstein et al: The democratization of personal consumer loans | by Michel Machado |
Mar 18 | J. Kleinberg, S. Suri, E. Tardos, T. Wexler: Strategic NetworkFormation with Structural Holes | by Jackie Crescimanno |
Mar 21 | Connections between social and affiliation networks | by D. Erdos |
Mar 23 | Information Asymmetries in Pay-Per-Bid Auctions: how Swoopo MakesBank | by George Zervas |
Mar 30 | Problems and Algorithms for Probabilistic Graphs | by MichalisPotamias |
Apr 01 | by N. Triandopoulos | |
Project Presentations | ||
Apr 06 | Mark Moses, Xuan Zhang, Liang Zhu | |
Apr 08 | David LaPalomento, Justin Davis, Joseph Akinwumi, Vinita Shinde | |
Apr 13 | Min Ye, Lily Wong, George Brova | |
April 15 | Conrad Nied, Rachel Katz, Michel Machado | |
Apr 20 | Phil Sheffer, David House, Jackie Crescimanno | |
Apr 22 | No class | |
Apr 27 | Marco Winkler | |
April 29 | Poster Presentations | |
Students that take the course for credit will be required to complete a course project. Completion of the project constitutes 50% of the overall grade. In the course of preparing the project the students will have to do one presentation of some papers (preferrably related to their project). The presentation will be 20% of the credit. Finally, a reaction paper that summarizes the initial thoughts of the students with respect to their topic will be another 20% of the total credit. Finally, active class participation will be the additional 10%.
One class presentation
One reaction paper
One project
The grading will be based 20% on the presentation, 20% on the reaction paper, 50% on the project and 10% on class participation. Here is a more detailed description of each of these components.
Incompletes will not be given.
Late Assignment Policy: There will be a penalty of 10% per day, up to three days late. After that no credit will be given.
The course is primarily based on recent material (from the past 5-10 years); this means that most of it exists in the form of papers on the Web, and the existing literature raises a lot of interesting issues that have yet to be explored.
In the course of preparing the project the students will have to do one presentation of some papers (preferrably related to their project). The presentation will either be an initial attempt to familiarize the stronitt rubinfeldudents with the area they are going to be working on for the rest of the semester. A student may decide to do a presentation on a topic irrelevant to his/her project theme.
As a way to get everyone thinking about the research issues underlying the course, there will be a short reaction paper of roughly 5 pages in length. You can work in groups of up to 3 people on the paper.
The reaction paper should be structured as follows. First, you should read at least two closely related papers relevant to a particular section of the course. You should then write approximately 5 pages in which you address the following points:
What is main technical content of the papers?
Why is it interesting in relation to the corresponding section of the course?
What are the weakness of the papers, and how could they be improved?
What are some promising further research questions in the direction of the papers, and how could they be pursued?
Reaction papers should not just be summaries of the papers you read; most of your text should be focused on synthesis of the underlying ideas, and your own perspective on the papers. To make this concrete, you should make sure that you devote much of the content to the last bullet above: promising directions for further research. In particular, the reaction paper should contain at least some amount of each of the following types of content:
A proposal for a model or algorithm – potentially extending, varying, or improving something in the papers you’ve read – together with some mathematical analysis of it.
A test of a model or algorithm (either your own or something from one of the papers) on a dataset or on simulated data.
The reaction paper should be considered as a very good way to explore a potential project topic.
The final piece of the work for the course will be a project. You can work on this in groups of up to 3 people, and it is largely up to you to define the topic and scope of the project.
The first step in the project will be a short ‘proposal.’ This is meant just to be a brief description of what you’re intending for the project – about 2 pages in length, with a discussion of relevant background work and tentative plans for how you’ll proceed. If your project is based on your reaction paper, then you don’t need to repeat things you’ve said in the reaction paper – it’s enough to describe how you’re planning to turn the ideas from the reaction paper into a larger project.
The basic genres of project are the following:
An experimental evaluation of an algorithm, model, or measure on an interesting dataset. The datasets on the course home page suggest some possible domains in which to think about such experiments; but you can also assemble your own data.
A simple Facebook application.
A theoretical project that considers an algorithm, model, or measure in
the area of some course topic, and derives rigorous results about it.
An extended, critical survey of one the course topics, going into significant depth and offering a novel perspective on the area.
As with the reaction paper, the project should contain at least some amount of mathematical analysis, and some experimentation on real or synthetic data.
The result of the project will typically be a 10-15 page paper, describing the approach, the results, and the related work. The references on the course home page serve as examples of what such papers tend to look like; of course, the overall form of the paper will depend on the nature of the project.
The final stage will be a presentation of the projects in class by each group. The exact schedule for the project presentations will be worked out later in the semester.
David Easley and Jon Kleinberg: Networks, Crowds, and Markets: Reasoning About a Highly Connected World.
Hossam Sharara, Lisa Singh, Lise Getoor, Janet Mann: The Dynamics of Actor Loyalty to Groups in Affiliation Networks. International Conference on Advances in Social Network Analysis (ASONAM) 2009.
Elena Zheleva, Hossam Sharara, Lise Getoor: Co-evolution of social and affiliation networks. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) 2009.
Silvio Lattazi, D. Sivakumar: Affiliation networks. ACM Symposium on Theory of Computing (STOC), 2009.
Jure Leskovec, Christos Faloutsos: Scalable modeling of real graphs using Kronecker multiplication. International Conference on Machile Learning (ICML), 2007.
Jure Leskovec, Jon Kleinberg and Christos Faloutsos Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) 2005.
M. E. J. Newman: Power laws, Pareto distributions and Zipf’s law, Contemporary Physics.
R. Albert and L.A. Barabasi, Statistical Mechanics of Complex Networks, Rev. Mod. Phys. 74, 47-97 (2002).
B. Bollobas: Mathematical Results in Scale-Free random Graphs.
D. S. Callaway, J. E. Hopcroft, J. M. Kleinberg, M. E. J. Newman, and S. H. Strogatz: Are randomly grown graphs really random? Phys. Rev. E 64, 041902 (2001).
D.J. Watts: Dynamics and Small-World Phenomenon. American Journal of Sociology, Vol. 105, Number 2, 493-527, 1999
Watts, D. J. and S. H. Strogatz: Collective dynamics of ’small-world’ networks. Nature 393:440-42, 1998
Michael Mathioudakis, Nick Koudas: Efficient identification of starters and followers in social media. Extended DataBase Technology Conference (EDBT), 2009.
Theodoros Lappas, Kun Liu, Evimaria Terzi: Finding a team of experts in social networks. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) 2009.
Heikki Mannila, Evimaria Terzi: Finding links and initiators: a graph-reconstruction problem. SIAM Data Mining Conference (SDM) 2009.
Deepayan Chakrabarti, Yang Wang, Chenxi Wang, Jure Leskovec, Christos Faloutsos: Epidemic thresholds in real networks. ACM Transactions on Information and Systems Security, 2008.
Amit Goyal, Francesco Bonchi, Laks V. S. Lakshmanan: Discovering leaders from community actions. ACM Conference on Information and Knowledge Management (CIKM) 2008.
Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, Natalie S. Glance: Cost-effective outbreak detection in networks. ACM International Conference on Knowledge Discovery and Data Mining (KDD), 2007.
Hanghang Tong, Christos Faloutsos: Center-piece subgraphs: problem definition and fast solutions. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2006.
David Kempe, Jon Kleinberg, Eva Tardos: Maximizing the spread of influence through a social network. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2003.
Pedro Domings, Matthew Richardson: Mining the network value of customers. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2001.
Lei Zou, Lei Chen, M. Tamer Oszu: K-automorphism: a general framework for privacy preserving network publication. Proceedings of Very Large DataBases (PVLDB) 2009.
Elena Zheleva, Lise Getoor: To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles. International Conference on World Wide Web (WWW), 2009.
Arvind Narayanan, Vitaly Shmatikov: De-anonymizing Social Networks. IEEE Symposium on Security and Privacy 2009.
Michael Hay, Chao Li, Gerome Miklau, David Jensen: Accurate Estimation of the Degree Distribution of Private Networks. IEEE International Conference on Data Mining (ICDM) 2009.
Kun Liu, Evimaria Terzi: A framework for computing the privacy score of users in online social networks. IEEE International Conference on Data Mining (ICDM) 2009.
X. Ying and X. Wu: Graph generation with prescribed feature constraints. Siam Data Mining Conference (SDM), 2009.
Kun Liu, Evimaria Terzi: Towards identity anonymization on graphs, ACM International Conference on Management of Data (SIGMOD) 2008.
Michael Hay, Gerome Miklau, David Jensen, Don Towsley, Philipp Weis: Resisting structural identification in anonymized social networks. Conference on Very Large Databases (VLDB) 2008.
Lars Backstrom, Cynthia Dwork, Jon Kleinberg: Wherefore art thoou r3579x?: anonymized social networks, hidden patterns and structural steganography, International Conference on World Wide Web (WWW), 2007.
All course participants must adhere to the CAS Academic Conduct Code. All instances of adacemic dishonesty will be reported to the academic conduct committee