[Download CV]

  • Office: MCS 206
  • Email: vkalavri@bu.edu
  • Office hours (Fall 2020): TBD

Vasiliki (Vasia) Kalavri

Assistant Professor
Department of Computer Science, Boston University

About me

My research interests include distributed stream processing and large-scale graph analytics.

Before coming to BU, I was a member of the Systems Group at ETH Zurich, where I worked on Strymon, a system for predictive datacenter analytics. In September 2017 I was awarded the ETH Zurich Postdoctoral Fellowship for my research project "Automatic Scaling of Distributed Streaming Computations Using Graph Analytics on Real-Time Monitoring Data". I received my PhD from KTH, Stockholm, and UCL, Belgium, where I was admitted to a double doctoral program as an EMJD-DC fellow. My thesis, "Performance Optimization Techniques and Tools for Distributed Graph Processing", received the IBM Innovation Award 2017. During my PhD studies, I also spent time at DIMA TU Berlin, Telefonica Research Barcelona, and data Artisans.


Apache Flink Book

Fabian Hueske and I recently published Stream Processing with Apache Flink with O'Reilly Media. If you read it, please leave a review or send us feedback and errata.

News

3 August 2020

New Survey Paper Available

Our paper "A Survey on the Evolution of Stream Processing Systems" is now available on arXiv.

14 July 2020

Outstanding New Research Direction Award at HotStorage'20

Our USENIX HotStorage'20 paper "In support of workload-aware streaming state management" won the Outstanding New Research Direction Award and was a finalist for the Best Presentation Award.

4 June 2020

New SIGOPS Blog Post

I contributed an article to the SIGOPS Blog where I discuss the evolution and future of stream processing systems.

5 May 2020

Guest Lecture at UT Austin

I gave a guest lecture over Zoom on large-scale stream processing in Vijay Chidambaram's distributed systems class.

[Video recording] [Slides].

27 April 2020

Paper Accepted at USENIX HotStorage'20

Our position paper, "In support of workload-aware streaming state management", was accepted for presentation at the upcoming 12th USENIX Workshop on Hot Topics in Storage and File Systems. The workshop will take place on July 13, 2020, and will be virtually co-located with USENIX ATC'20.

4 February 2020

Tutorial Accepted at SIGMOD'20

We will be presenting the tutorial "Beyond Analytics: the Evolution of Stream Processing Systems" at ACM SIGMOD'20.

[Tutorial website] [Paper].

20 January 2020

Keynote at NEDB Day 2020

I will be one of the keynote speakers at the North East Database Day 2020.

Research

I am broadly interested in three dimensions of big data analytics and stream processing:

Systems

How can we design and implement scalable data processing systems whose capabilities stretch beyond those of traditional data management platforms? My recent work in this area includes understanding the performance of streaming dataflows and enabling accurate automatic scaling of streaming jobs.

Algorithms

How can we represent, partition, summarize, and analyze possibly unbounded data of various formats and originating from diverse, distributed sources? My recent work in this area includes a distributed graph summarization technique and a survey of streaming graph partitioning methods in the context of data-parallel continuous processing.

Programming models

How can we achieve end-to-end, efficient big data processing while providing expressive, high-level programming models, accessible to data scientists and non-expert users? My recent work in this area includes a survey of high-level programming abstractions for distributed graph processing.

For a full list of publications, visit my Google Scholar profile.

Teaching

Fall 2020: CAS CS210 Computer Systems

NOTE: The class is full. Fill out this form to sign up for the waitlist.

Spring 2020: CS 591 K1 Data Stream Processing and Analytics

Spring 2019: Data Stream Processing and Analytics (ETH Zurich)

Summer Schools and Conference Tutorials

Service

Conferences, workshops, competitions

IEEE ICDE 2021 (PC Member), ACM SIGMOD 2020 Student Research Competition (Judge), EuroSys 2021 (Travel Grant co-Chair), ACM DEBS 2020 (PC Member), ICDE 2020 (Demonstration Track), EDBT 2020 (Demonstration Track), CCGrid 2019 (Applications and Data Science track co-Chair), OPODIS 2018 (PC Member), Middleware Doctoral Symposium (ACM/IFIP Middleware 2020), GRADES-NDA 2020 (co-located with SIGMOD 2020), USENIX HotStorage 2020, DBPL 2019 (co-located with PLDI 2019), GRADES-NDA 2019 (co-located with SIGMOD 2019), DBTest 2018 (co-located with SIGMOD 2018), GRADES-NDA 2018 (co-located with SIGMOD 2018), GABB 2018 (co-located with IPDPS 2018), GABB 2017 (co-located with IPDPS 2017), DEEM 2017 (co-located with SIGMOD 2017).

Industrial Conferences

Flink Forward Berlin 2019, Flink Forward San Francisco 2017, Berlin Buzzwords 2017, Flink Forward Berlin 2016, Berlin Buzzwords 2016

Talks

From data stream management to distributed dataflows and beyond at North East Database Day 2020. [slides]

Three steps is all you need: fast, accurate, automatic scaling decisions for distributed streaming dataflows at USENIX OSDI 2018. [slides] [audio]

Predictive Datacenter Analytics with Strymon at QCon San Francisco 2017. [slides] [video]

Online performance analysis of distributed dataflow systems at O'Reilly Velocity London 2017. [slides] [video]

Graphs as Streams: Rethinking Graph Processing in the Streaming Era at Berlin Buzzwords 2016. [slides] [video]

Demystifying Distributed Graph Processing at dotScale 2016. [slides] [video]