[Download CV]

  • Office: MCS 227F
  • Email: vkalavri@bu.edu
  • Office hours: Mon 4pm-5:30pm, Thu 5pm-6pm

Vasiliki (Vasia) Kalavri

Assistant Professor
Department of Computer Science, Boston University

About me

My research interests include distributed stream processing and large-scale graph analytics.

Before coming to BU, I was a member of the Systems Group at ETH Zurich, where I worked on Strymon, a system for predictive datacenter analytics. In September 2017 I was awarded the ETH Zurich Postdoctoral Fellowship for my research project "Automatic Scaling of Distributed Streaming Computations Using Graph Analytics on Real-Time Monitoring Data". I received my PhD from KTH (Stockholm) and UCL (Belgium) through a double doctoral program, to which I was admitted as an EMJD-DC fellow. My thesis, "Performance Optimization Techniques and Tools for Distributed Graph Processing", received the IBM Innovation Award 2017. During my PhD studies, I also spent time at DIMA TU Berlin, Telefonica Research Barcelona, and data Artisans.

Apache Flink Book

Fabian Hueske and I recently published Stream Processing with Apache Flink with O'Reilly Media. If you read it, please leave a review or send us feedback and errata.

Prospective students

I'm recruiting motivated PhD students to join my research lab in Fall 2022. The application deadline is December 15, and GRE scores are not required. The Graduate School of Arts & Sciences offers application fee waivers for first-generation applicants and women. Apply here.


22 April 2022

3 papers accepted at SIGMOD'22 workshops

CASP Systems Lab will be presenting papers at the BiDEDE'22, aiDM'22, and DEEM'22 workshops.

18 April 2022

Talk at Samsung MSL

I gave a Samsung MSL Tech Talk titled "Next-generation state management for streaming workloads".

18 January 2022

Paper accepted at EuroSys’22

Our paper "A New Benchmark Harness for Systematic and Robust Evaluation of Streaming State Stores" has been accepted for presentation at the EuroSys'22 conference.

16 December 2021

Research Incubation Awards from the Red Hat Collaboratory

My research lab has received three Research Incubation Awards from the BU Red Hat Collaboratory.

13 December 2021

Paper accepted at ACM SAC'22

Our paper "Learning on streaming graphs with experience replay" has been accepted to appear at the 2022 ACM/SIGAPP Symposium on Applied Computing (SAC'22). See the preprint pdf here.

3 November 2021

Postdoc position available

We invite applications for a postdoc position in self-managed and power-efficient stream processing systems. Applications will be reviewed on a rolling basis starting November 22 until the position is filled. More information and instructions on how to apply can be found here.

1 September 2021

Awards from Google and Samsung

I am pleased to announce two recently awarded gifts: (i) a 2021 Data Acquisition, Processing and Analysis (DAPA) Award from Google and (ii) a Samsung Memory Solutions Lab award to support our ongoing work on Streaming Graph Representation Learning.

20 August 2021

Talk at the 14th TUC meeting

I gave a talk titled "Learning to partition unbounded graph streams" at the LDBC TUC meeting, co-located with VLDB'21. The slides are available here.

28 May 2021

Hariri FRP Award

Our Focused Research Program proposal, "Continuous Analysis of Mobile Health Data among Medically Vulnerable Populations", has been selected for an award by the Hariri Institute for Computing.

19 May 2021

Co-Chairing ACM SIGMOD'22 SRC

I am excited to serve as co-chair of the SIGMOD Student Research Competition together with Yongjoo Park.

12 March 2021

The CASP Systems Lab has a website!

After an eventful first year, my research lab, Complex Analytics & Scalable Processing, finally has a website. You can also follow us on Twitter @CASPSystems.


My research focuses on various aspects of (distributed) data-centric systems.
More recently, I have been working on self-managed stream processing systems and graph streaming systems.

Self-managed streaming systems

Stream processing research has come a long way, and streaming systems have matured considerably since their invention almost three decades ago. What will the next generation of stream processing systems look like? This is the question my most recent and ongoing work aims to answer. We envision a next generation of stream processing systems that are not only scalable and reliable, but also capable of self-management and automatic reconfiguration without downtime.

Performance analysis and modeling

As stream processing applications are long-running, it is only a matter of time before any initial configuration becomes out of tune. If the system cannot adapt to workload changes, the result can be data loss, idle resources, and SLO violations. To avoid such situations, we need to continuously monitor system operation, identify changing conditions, and react. See our recent work on understanding the performance of streaming dataflows by generalizing online critical path analysis.
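
As a rough illustration of the underlying idea, the classic (offline) critical path of an execution can be computed as the longest path through an activity graph: nodes are activities such as processing or data transfer with measured durations, and edges mean "must finish before". The sketch below assumes such a DAG is already available; the function name, activity names, and durations are hypothetical. It does not implement the online, generalized analysis mentioned above, only the basic computation it builds on.

```python
from collections import defaultdict, deque


def critical_path(activities, edges):
    """Return (total_duration, path) of the longest path in an activity DAG.

    activities: dict mapping activity id -> duration (e.g., milliseconds)
    edges: list of (u, v) pairs meaning v can only start after u finishes
    """
    succ = defaultdict(list)
    indeg = {a: 0 for a in activities}
    for u, v in edges:
        succ[u].append(v)
        indeg[v] += 1

    # finish[a]: length of the longest path ending at (and including) a
    finish = dict(activities)
    pred = {a: None for a in activities}

    # Kahn's algorithm: each node is relaxed only after all its predecessors
    queue = deque(a for a, d in indeg.items() if d == 0)
    while queue:
        u = queue.popleft()
        for v in succ[u]:
            if finish[u] + activities[v] > finish[v]:
                finish[v] = finish[u] + activities[v]
                pred[v] = u
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)

    # Walk back from the activity that finishes last
    end = max(finish, key=finish.get)
    path, node = [], end
    while node is not None:
        path.append(node)
        node = pred[node]
    return finish[end], path[::-1]
```

For example, with a hypothetical trace where `read` (2) feeds both `parse` (5) and `shuffle` (1), which both feed `aggregate` (3), the critical path is `read -> parse -> aggregate` with length 10: speeding up `shuffle` would not shorten the execution, while speeding up `parse` would.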

Automatic optimization and reconfiguration

One of the biggest challenges streaming systems face is dealing with variance in input workloads. Unlike batch processing and database management systems, a stream processing system has no control over the stream arrival rate (or order). How can streaming systems continuously satisfy QoS requirements without wasting resources? Instead of relying on simplified predictive system models and noisy, externally observed metrics, we proposed a white-box approach. DS2 is an automatic scaling controller that leverages system instrumentation and operator dependencies to extract accurate metrics and provide automatic elasticity with accuracy, stability, performance, and safety.
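
The white-box idea can be illustrated with a simplified, rate-based estimate. The sketch assumes that instrumentation reports, for each operator, a per-instance *true processing rate* (records processed divided by the time spent doing useful work, excluding time spent waiting) and a selectivity (output records per input record). Target rates are then propagated from the sources through the dataflow in topological order, and each operator gets just enough instances to sustain its target input rate. All names are hypothetical, and the sketch omits the measurement aggregation and decision-stability concerns that a real scaling controller has to handle.

```python
import math


def estimate_parallelism(topo_order, upstream, true_proc_rate, selectivity, source_rates):
    """Estimate how many instances each non-source operator needs.

    topo_order:     operator names in topological order
    upstream:       dict op -> list of upstream ops (empty list for sources)
    true_proc_rate: dict op -> records/s one busy instance can process
    selectivity:    dict op -> output records per input record
    source_rates:   dict source op -> target output rate in records/s
    """
    target_out = {}
    parallelism = {}
    for op in topo_order:
        if not upstream[op]:
            # Sources emit at the externally given target rate.
            target_out[op] = source_rates[op]
            continue
        # An operator must keep up with everything its upstream ops emit.
        target_in = sum(target_out[u] for u in upstream[op])
        parallelism[op] = math.ceil(target_in / true_proc_rate[op])
        target_out[op] = target_in * selectivity[op]
    return parallelism
```

For a hypothetical pipeline `src -> flatmap -> count` with a 10,000 records/s source, a `flatmap` that processes 4,000 records/s per instance and emits 3 records per input, and a `count` that processes 9,000 records/s per instance, the estimate is 3 `flatmap` instances and 4 `count` instances. Because the rates are measured on useful time only, the estimate does not depend on how idle or backpressured the current deployment happens to be.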

Workload-aware state management

Distributed instrumentation informs a range of reconfiguration decisions and enables stream processors to manage their resource allocation. This model relies on an external controller that continuously monitors the system’s performance and sends control commands to the stream processor’s cluster manager module. There are, however, additional optimization opportunities if we look at how stream processing systems perform fundamental internal operations, such as state management. Since streaming dataflow operators are instantiated once and run for a long time, their state access patterns and state size bounds are largely known in advance (or can be learned). We recently proposed workload-aware state management: using configurable state stores that support different layouts and data types, and leveraging knowledge about operator state characteristics.
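
To make the idea of matching state layout to operator characteristics concrete, here is a toy sketch. It assumes each operator declares a hypothetical `StateProfile` describing how it accesses state: a running aggregate performs read-modify-write point updates on individual keys, while a window that buffers raw elements only appends and later scans everything at once. The store classes and the selection rule are illustrative assumptions, not the design of any particular system.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class StateProfile:
    """Hypothetical declaration of an operator's state access pattern."""
    point_updates: bool    # read-modify-write on single keys (e.g., running counts)
    append_and_scan: bool  # append values, then read them all at once (e.g., window buffers)


class HashStore:
    """Keyed layout suited to point lookups and in-place updates."""

    def __init__(self):
        self._data = {}

    def update(self, key, fn, default):
        self._data[key] = fn(self._data.get(key, default))

    def get(self, key):
        return self._data.get(key)


class AppendLogStore:
    """Per-key append-only lists: cheap appends and bulk scan-and-clear."""

    def __init__(self):
        self._data = defaultdict(list)

    def append(self, key, value):
        self._data[key].append(value)

    def scan_and_clear(self, key):
        # Return all buffered values for the key and drop them (e.g., on window firing).
        return self._data.pop(key, [])


def choose_store(profile: StateProfile):
    """Pick a layout from the declared profile instead of one generic store for all."""
    if profile.append_and_scan and not profile.point_updates:
        return AppendLogStore()
    return HashStore()
```

A counting operator would declare `StateProfile(point_updates=True, append_and_scan=False)` and get a `HashStore`, while a window buffer would declare the opposite profile and get an `AppendLogStore`, whose appends never need to read existing state.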

Graph streaming systems

Continuous analysis of graph streams is an emerging application area, where events indicate edge and vertex additions, deletions, and modifications. Even though many specialized systems exist for dynamic graph processing, modern stream processors do not inherently support graph streaming use cases. Graph streaming computations are challenging to implement with today’s stream processors because graph streaming algorithms do not fit the dataflow model well. Dataflows push data through a series of operators that apply transformations until they produce the end result. Such a model is not suitable for graph computations, which instead require multiple passes over the graph state. This challenge can be addressed either by using a cyclic dataflow model or by implementing single-pass graph streaming algorithms. See our recent survey preprint for an overview of the area.
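
A classic example of a single-pass graph streaming algorithm is connected components over a stream of edges using a union-find (disjoint-set) structure: each edge is examined once and then discarded, and the only state kept is one parent pointer and one size counter per vertex seen so far. The sketch below is a minimal single-process version with a hypothetical class name; it assumes an insert-only stream, since edge deletions cannot be handled by union-find and require different techniques.

```python
class StreamingConnectedComponents:
    """Single-pass connected components over an insert-only edge stream."""

    def __init__(self):
        self.parent = {}  # vertex -> parent vertex in the union-find forest
        self.size = {}    # root vertex -> number of vertices in its component

    def _find(self, v):
        # Register vertices lazily the first time they appear in the stream.
        self.parent.setdefault(v, v)
        self.size.setdefault(v, 1)
        root = v
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every vertex on the path directly at the root.
        while v != root:
            nxt = self.parent[v]
            self.parent[v] = root
            v = nxt
        return root

    def add_edge(self, u, v):
        """Process one streamed edge; it does not need to be stored afterwards."""
        ru, rv = self._find(u), self._find(v)
        if ru == rv:
            return
        # Union by size keeps the trees shallow.
        if self.size[ru] < self.size[rv]:
            ru, rv = rv, ru
        self.parent[rv] = ru
        self.size[ru] += self.size[rv]

    def connected(self, u, v):
        return self._find(u) == self._find(v)

    def num_components(self):
        return sum(1 for v in self.parent if self._find(v) == v)
```

After streaming the edges `(1, 2)`, `(3, 4)`, `(2, 3)`, `(5, 6)`, vertices 1 and 4 are connected, 1 and 5 are not, and there are two components. The component state can be queried at any point in the stream without a second pass over the edges.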

The field of graph streaming systems is still in its infancy, and many fundamental problems remain to be solved. While there is abundant work on graph streaming algorithms, most of this theoretical work is not readily applicable to the streaming model that streaming systems use in practice. See our recent experimental study on streaming graph partitioning algorithms for an example.

For a full list of publications, visit my Google Scholar profile.


Spring 2022: CAS CS551 Streaming and Event-Driven Systems [Tentative syllabus]

Fall 2021: CAS CS210 Computer Systems

Spring 2021: CS 591 K1 Data Stream Processing and Analytics

Fall 2020: CAS CS210 Computer Systems

Spring 2020: CS 591 K1 Data Stream Processing and Analytics

Spring 2019: Data Stream Processing and Analytics (ETH Zurich)

Summer Schools and Conference Tutorials


Conferences, workshops, competitions

IEEE ICDE 2022 (Area Chair), ACM SIGMOD 2022, VLDB 2022, HAOC'21 (PC Member), GRADES-NDA 2021 (co-Chair), ACM DEBS 2021 (PC Member), USENIX ATC 2021 (PC Member), ACM/IFIP Middleware 2021 (PC Member), IEEE ICDE 2021 (PC Member), ACM SIGMOD 2020 Student Research Competition (Judge), EuroSys 2021 (AMA Session coordinator), ACM DEBS 2020 (PC Member), ICDE 2020 (Demonstration Track), EDBT 2020 (Demonstration Track), CCGrid 2019 (Applications and Data Science track co-Chair), OPODIS 2018 (PC Member), Middleware Doctoral Symposium (ACM/IFIP Middleware 2020), GRADES-NDA 2020 (co-located with SIGMOD 2020), USENIX HotStorage 2020, DBPL 2019 (co-located with PLDI 2019), GRADES-NDA 2019 (co-located with SIGMOD 2019), DBTest 2018 (co-located with SIGMOD 2018), GRADES-NDA 2018 (co-located with SIGMOD 2018), GABB 2018 (co-located with IPDPS 2018), GABB 2017 (co-located with IPDPS 2017), DEEM 2017 (co-located with SIGMOD 2017).

Industrial Conferences

Flink Forward Berlin 2019, Flink Forward San Francisco 2017, Berlin Buzzwords 2017, Flink Forward Berlin 2016, Berlin Buzzwords 2016


From data stream management to distributed dataflows and beyond at North East Database Day 2020. [slides]

Three steps is all you need: fast, accurate, automatic scaling decisions for distributed streaming dataflows at USENIX OSDI 2018. [slides] [audio]

Predictive Datacenter Analytics with Strymon at QCon San Francisco 2017. [slides] [video]

Online performance analysis of distributed dataflow systems at O'Reilly Velocity London 2017. [slides] [video]

Graphs as Streams: Rethinking Graph Processing in the Streaming Era at Berlin Buzzwords 2016. [slides] [video]

Demystifying Distributed Graph Processing at dotScale 2016. [slides] [video]