CAS CS 591, Advanced Topics in Data Mining, Spring 2010

Schedule

Tue-Thur 12:30-2:00 pm

Room: MCS B031

Course Outline

The general theme of the course will be on algorithmic issues related to data mining. The specific focus of the course will be on models and algorithms for network data (e.g., social networks, collaboration networks, biological networks, etc).

The course will have a seminar format and it will be based on recently published material. The list of recommended reading will be constantly updated.

Instructor

Evimaria Terzi, evimaria@cs.bu.edu
www.cs.bu.edu/~evimaria
Office Hours: Monday 3:30 pm - 5:00 pm and Tuesday 11:00 am -12:30.
Contact me by e-mail to arrange an appointment if necessary.

Prerequisites

Working knowledge of programming and data structures. Familiarity with basic algorithmic concepts, probability, statistics and linear algebra. Programming projects will require knowledge of C (or C++), java, python.

Special thanks

Special thanks are due to Aris Anagnostopoulos (University of Rome), Jure Leskovec (Stanford), Panayiotis Tsaparas (Microsoft Research) for providing useful slides from their conference presentations and lectures on the topic.

Syllabus

Jan 14 What is this class about / Introductory lecture.pptx,.ppt,.pdf
Jan 19 Generative models for social networks .pptx,.ppt,.pdf
Jan 21 Jon Kleinberg: The small world phenomenon: An algorithmic prespective
B. Bollobas: Mathematical Results in Scale-Free random Graphs
J. Leskovec, J. Kleinberg, C. Faloutsos. Graphs over Time:Densification Laws, Shrinking Diameters and Possible Explanations.pptx,.ppt,.pdf
Jan 26 D. Kempe, J. Kleinberg, E. Tardos: Maximizing the spread ofinfluence in social networks .pptx,.ppt,.pdf
Jan 28 J. Byers, M. Mitzenmacher, G. Zervas: Adaptive Weighing Designsfor Keyword Value Computation by G. Zervas
Feb 02 H. Mannila, E. Terzi: Finding links and initiators: agraph-reconstruction problem .pptx,.ppt,.pdf
T. Lappas, K. Liu, E. Terzi: Finding a team of experts in social networks
Feb 04 K. Liu, E. Terzi: Towards identity anonymization on graphs .pptx,.ppt,.pdf
K. Liu, E. Terzi: A framework for computing the privacy score ofusers in online social networks by E. Terzi
Feb 09 E. Zheleva, L. Getoor: To join or not to join: the illusion ofprivacy in social networks with mixed public and private user profiles by Mark Moses
Feb 09 L. Backstrom, C. Dwork, J. Kleinberg:Wherefore artthoou r3579x?: anonymized social networks, hidden patterns and structuralsteganography by Xuan Zhang
Feb 11 A. Narayanan, V. Shmatikov: De-anonymizing social networks byLiang Zhu
Feb 11 J. Leskovec et. al.: Cost-effective outbreak detection in networks by David LaPalomento
Feb 16 No Class
Feb 18 P. Rusmevichientong, S. Zhu, D. Selinger: Identifying early buyersfrom purchase data by Vinita Shinde
Feb 18 M. Mathioudakis, N. Koudas: Efficient Identification of Startersand Followers in Social Mediaby Justin Davis
Feb 23 Hartline et. al.: Optimal marketing strategies for social networks by Joseph Akinwumi
Feb 23 A. Anagnostopoulos, R. Kumar, M. Mahdian: Influence and correlationin social networks by George Brova
Feb 25 G. Kossinets, J. Kleinberg, D. Watts: The structure of informationpathways in a social communication network by Min Ye
Mar 02 Social influence and diffusion of user-created contentby LilyWong
Mar 02 J. Leskovec, L. Backstrom, J. Kleinberg: Meme-tracking and thedynamics of news cycle by Davide House
Mar 04 J. Leskovec, D. Huttenlocher, J. Kleinberg: Signed networks insocial media by Rachel Katz
Mar 04 S. Marvel, J. Kleinberg, S. Strogatz: The Energy Landscape ofSocial Balance by Marco Winkler
Mar 09, 11 No class - Spring Break
Mar 16 N. Vuokko, E. Terzi: Reconstructing randomized graphsbyP. Scheffler
Mar 16 Some topic on antrhopological issues and social networks by Conrad Nied
Mar 18 Herzenstein et al: The democratization of personal consumer loans by Michel Machado
Mar 18 J. Kleinberg, S. Suri, E. Tardos, T. Wexler: Strategic NetworkFormation with Structural Holes by Jackie Crescimanno
Mar 21 Connections between social and affiliation networks by D. Erdos
Mar 23 Information Asymmetries in Pay-Per-Bid Auctions: how Swoopo MakesBank by George Zervas
Mar 30 Problems and Algorithms for Probabilistic Graphs by MichalisPotamias
Apr 01 by N. Triandopoulos
Project Presentations
Apr 06 Mark Moses, Xuan Zhang, Liang Zhu
Apr 08 David LaPalomento, Justin Davis, Joseph Akinwumi, Vinita Shinde
Apr 13 Min Ye, Lily Wong, George Brova
April 15 Conrad Nied, Rachel Katz, Michel Machado
Apr 20 Phil Sheffer, David House, Jackie Crescimanno
Apr 22 No class
Apr 27 Marco Winkler
April 29 Poster Presentations

Workload

Students that take the course for credit will be required to complete a course project. Completion of the project constitutes 50% of the overall grade. In the course of preparing the project the students will have to do one presentation of some papers (preferrably related to their project). The presentation will be 20% of the credit. Finally, a reaction paper that summarizes the initial thoughts of the students with respect to their topic will be another 20% of the total credit. Finally, active class participation will be the additional 10%.

  1. One class presentation

  2. One reaction paper

  3. One project

The grading will be based 20% on the presentation, 20% on the reaction paper, 50% on the project and 10% on class participation. Here is a more detailed description of each of these components.

Incompletes will not be given.

Late Assignment Policy: There will be a penalty of 10% per day, up to three days late. After that no credit will be given.

Class presentation

The course is primarily based on recent material (from the past 5-10 years); this means that most of it exists in the form of papers on the Web, and the existing literature raises a lot of interesting issues that have yet to be explored.

In the course of preparing the project the students will have to do one presentation of some papers (preferrably related to their project). The presentation will either be an initial attempt to familiarize the stronitt rubinfeldudents with the area they are going to be working on for the rest of the semester. A student may decide to do a presentation on a topic irrelevant to his/her project theme.

Reaction paper

As a way to get everyone thinking about the research issues underlying the course, there will be a short reaction paper of roughly 5 pages in length. You can work in groups of up to 3 people on the paper.

The reaction paper should be structured as follows. First, you should read at least two closely related papers relevant to a particular section of the course. You should then write approximately 5 pages in which you address the following points:

Reaction papers should not just be summaries of the papers you read; most of your text should be focused on synthesis of the underlying ideas, and your own perspective on the papers. To make this concrete, you should make sure that you devote much of the content to the last bullet above: promising directions for further research. In particular, the reaction paper should contain at least some amount of each of the following types of content:

The reaction paper should be considered as a very good way to explore a potential project topic.

Project

The final piece of the work for the course will be a project. You can work on this in groups of up to 3 people, and it is largely up to you to define the topic and scope of the project.

The first step in the project will be a short ‘proposal.’ This is meant just to be a brief description of what you’re intending for the project – about 2 pages in length, with a discussion of relevant background work and tentative plans for how you’ll proceed. If your project is based on your reaction paper, then you don’t need to repeat things you’ve said in the reaction paper – it’s enough to describe how you’re planning to turn the ideas from the reaction paper into a larger project.

The basic genres of project are the following:

As with the reaction paper, the project should contain at least some amount of mathematical analysis, and some experimentation on real or synthetic data.

The result of the project will typically be a 10-15 page paper, describing the approach, the results, and the related work. The references on the course home page serve as examples of what such papers tend to look like; of course, the overall form of the paper will depend on the nature of the project.

The final stage will be a presentation of the projects in class by each group. The exact schedule for the project presentations will be worked out later in the semester.

Reading material

David Easley and Jon Kleinberg: Networks, Crowds, and Markets: Reasoning About a Highly Connected World.

Generative models for social networks

Hossam Sharara, Lisa Singh, Lise Getoor, Janet Mann: The Dynamics of Actor Loyalty to Groups in Affiliation Networks. International Conference on Advances in Social Network Analysis (ASONAM) 2009.

Elena Zheleva, Hossam Sharara, Lise Getoor: Co-evolution of social and affiliation networks. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) 2009.

Silvio Lattazi, D. Sivakumar: Affiliation networks. ACM Symposium on Theory of Computing (STOC), 2009.

Jure Leskovec, Christos Faloutsos: Scalable modeling of real graphs using Kronecker multiplication. International Conference on Machile Learning (ICML), 2007.

Jure Leskovec, Jon Kleinberg and Christos Faloutsos Graphs over Time: Densification Laws, Shrinking Diameters and Possible Explanations. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) 2005.

M. E. J. Newman: Power laws, Pareto distributions and Zipf’s law, Contemporary Physics.

R. Albert and L.A. Barabasi, Statistical Mechanics of Complex Networks, Rev. Mod. Phys. 74, 47-97 (2002).

B. Bollobas: Mathematical Results in Scale-Free random Graphs.

D. S. Callaway, J. E. Hopcroft, J. M. Kleinberg, M. E. J. Newman, and S. H. Strogatz: Are randomly grown graphs really random? Phys. Rev. E 64, 041902 (2001).

D.J. Watts: Dynamics and Small-World Phenomenon. American Journal of Sociology, Vol. 105, Number 2, 493-527, 1999

Watts, D. J. and S. H. Strogatz: Collective dynamics of ’small-world’ networks. Nature 393:440-42, 1998

Information propagation / collaboration

Michael Mathioudakis, Nick Koudas: Efficient identification of starters and followers in social media. Extended DataBase Technology Conference (EDBT), 2009.

Theodoros Lappas, Kun Liu, Evimaria Terzi: Finding a team of experts in social networks. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD) 2009.

Heikki Mannila, Evimaria Terzi: Finding links and initiators: a graph-reconstruction problem. SIAM Data Mining Conference (SDM) 2009.

Deepayan Chakrabarti, Yang Wang, Chenxi Wang, Jure Leskovec, Christos Faloutsos: Epidemic thresholds in real networks. ACM Transactions on Information and Systems Security, 2008.

Amit Goyal, Francesco Bonchi, Laks V. S. Lakshmanan: Discovering leaders from community actions. ACM Conference on Information and Knowledge Management (CIKM) 2008.

Jure Leskovec, Andreas Krause, Carlos Guestrin, Christos Faloutsos, Jeanne VanBriesen, Natalie S. Glance: Cost-effective outbreak detection in networks. ACM International Conference on Knowledge Discovery and Data Mining (KDD), 2007.

Hanghang Tong, Christos Faloutsos: Center-piece subgraphs: problem definition and fast solutions. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2006.

David Kempe, Jon Kleinberg, Eva Tardos: Maximizing the spread of influence through a social network. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2003.

Pedro Domings, Matthew Richardson: Mining the network value of customers. ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD), 2001.

Privacy and social networks

Lei Zou, Lei Chen, M. Tamer Oszu: K-automorphism: a general framework for privacy preserving network publication. Proceedings of Very Large DataBases (PVLDB) 2009.

Elena Zheleva, Lise Getoor: To join or not to join: the illusion of privacy in social networks with mixed public and private user profiles. International Conference on World Wide Web (WWW), 2009.

Arvind Narayanan, Vitaly Shmatikov: De-anonymizing Social Networks. IEEE Symposium on Security and Privacy 2009.

Michael Hay, Chao Li, Gerome Miklau, David Jensen: Accurate Estimation of the Degree Distribution of Private Networks. IEEE International Conference on Data Mining (ICDM) 2009.

Kun Liu, Evimaria Terzi: A framework for computing the privacy score of users in online social networks. IEEE International Conference on Data Mining (ICDM) 2009.

X. Ying and X. Wu: Graph generation with prescribed feature constraints. Siam Data Mining Conference (SDM), 2009.

Kun Liu, Evimaria Terzi: Towards identity anonymization on graphs, ACM International Conference on Management of Data (SIGMOD) 2008.

Michael Hay, Gerome Miklau, David Jensen, Don Towsley, Philipp Weis: Resisting structural identification in anonymized social networks. Conference on Very Large Databases (VLDB) 2008.

Lars Backstrom, Cynthia Dwork, Jon Kleinberg: Wherefore art thoou r3579x?: anonymized social networks, hidden patterns and structural steganography, International Conference on World Wide Web (WWW), 2007.

Collaborations/Academic Honesty

All course participants must adhere to the CAS Academic Conduct Code. All instances of adacemic dishonesty will be reported to the academic conduct committee