My research interests include distributed stream processing and large-scale graph analytics.
Before coming to BU, I was a member of the Systems Group at ETH Zurich, where I worked on Strymon, a system for predictive datacenter analytics. In September 2017 I was awarded the ETH Zurich Postdoctoral Fellowship for my research project "Automatic Scaling of Distributed Streaming Computations Using Graph Analytics on Real-Time Monitoring Data". I did my PhD at KTH, Stockholm, and UCL (Université catholique de Louvain), Belgium, where I was admitted to a double doctoral program as an EMJD-DC fellow. My thesis, "Performance Optimization Techniques and Tools for Distributed Graph Processing", received the IBM Innovation Award 2017. During my PhD I also spent time at DIMA TU Berlin, Telefonica Research Barcelona, and data Artisans.
- Our tutorial "Beyond Analytics: The Evolution of Stream Processing Systems" has been accepted at SIGMOD'20!
- I will be one of the keynote speakers at the North East Database Day 2020.
- New paper "Practice of Streaming and Dynamic Graphs: Concepts, Models, Systems, and Parallelism" is now available on arXiv.
- Demo paper accepted at BIRTE'19!
Our paper "FASTER State Management for Timely Dataflow" by Matthew Brookes, Vasiliki Kalavri, and John Liagouris has been accepted at BIRTE'19.
- Stream Processing with Apache Flink is now available!
I am a committer and PMC member of Apache Flink, an open-source framework and distributed execution engine for stream processing. I have written a book about the system together with Fabian Hueske.
Check it out!
We describe fundamental concepts of parallel stream processing and discuss how streaming analytics differs from traditional batch data analysis. The book targets software engineers, data engineers, and system administrators who want to learn the basics of Flink's DataStream API, including the structure and components of a typical Flink streaming application.
I am broadly interested in three dimensions of big data analytics and stream processing:
Systems: how to design and implement scalable data processing systems whose capabilities stretch beyond those of traditional data management platforms? My recent work in this area includes understanding the performance of streaming dataflows and enabling accurate automatic scaling of streaming jobs.
Algorithms: how to represent, partition, summarize, and analyze possibly unbounded data of various formats and originating from diverse, distributed sources? My recent work in this area includes a distributed graph summarization technique and a survey of streaming graph partitioning methods in the context of data-parallel continuous processing.
Programming models: how to achieve end-to-end, efficient big data processing while providing expressive, high-level programming models, accessible to data scientists and non-expert users?
My recent work in this area includes a survey of high-level programming abstractions for distributed graph processing.
For a full list of publications, visit my Google Scholar page.
M. Hoffmann, A. Lattuada, F. McSherry, V. Kalavri, J. Liagouris, T. Roscoe, Megaphone: Live state migration for distributed streaming dataflows, Proc. VLDB Endow. (2019). [pdf]
V. Kalavri, J. Liagouris, M. Hoffmann, D. Dimitrova, M. Forshaw, T. Roscoe, Three steps is all you need: fast, accurate, automatic scaling decisions for distributed streaming dataflows, in 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI '18). [pdf] [slides]
Z. Abbas, V. Kalavri, P. Carbone, V. Vlassov, Streaming Graph Partitioning: An Experimental Study, Proc. VLDB Endow. 11, 11 (2018). [pdf]
M. Hoffmann, A. Lattuada, J. Liagouris, V. Kalavri, D. Dimitrova, S. Wicki, Z. Chothia, T. Roscoe, SnailTrail: Generalizing Critical Paths for Online Analysis of Distributed Dataflows, in 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI '18). [pdf] [slides]
V. Kalavri, V. Vlassov, S. Haridi, High-Level Programming Abstractions for Distributed Graph Processing, in IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 2, pp. 305-324, Feb. 2018. [pdf]
Spring 2020: [CS 591 K1] Data Stream Processing and Analytics
Spring 2019 (ETH Zurich): Data Stream Processing and Analytics
3rd International Summer School on Data Science, Croatia, 2018.
Big Data Analytics Summer School, Stockholm, 2017 & 2018.
Invited tutorial at the 31st British International Conference on Databases (BICOD), London, 2017.
Apache Flink tutorials at BOSS workshop (VLDB), 2016 & 2017.
2nd Int’l ScaDS Summer School on Big Data, Germany, 2016.
EIT Summer School on Cloud and Big Data, Sweden, 2016.
2017, IBM Innovation Award.
In recognition of an outstanding PhD thesis that presents an original contribution to informatics or its applications.
2017, ETH Zurich Postdoctoral Fellowship.
Research project: Automatic scaling of distributed streaming computations using graph analytics on real-time monitoring data.
2012, Erasmus Mundus Doctoral Fellowship.
Hosts: KTH Royal Institute of Technology and Université catholique de Louvain.
For more presentations, visit my Slideshare profile.
From data stream management to distributed dataflows and beyond at North East Database Day 2020. [slides]
Workshop on Performance Debugging in Modern Computer Systems (co-located with ICDCS 2020)
Middleware Doctoral Symposium (ACM/IFIP Middleware 2020)
GRADES-NDA 2020 (co-located with SIGMOD 2020)
USENIX HotStorage 2020
DBPL 2019 (co-located with PLDI 2019)
GRADES-NDA 2019 (co-located with SIGMOD 2019)
DBTest 2018 (co-located with SIGMOD 2018)
GRADES-NDA 2018 (co-located with SIGMOD 2018)
GABB 2018 (co-located with IPDPS 2018)
GABB 2017 (co-located with IPDPS 2017)
DEEM 2017 (co-located with SIGMOD 2017)
Flink Forward Berlin 2019
Flink Forward San Francisco 2017
Berlin Buzzwords 2017
Flink Forward Berlin 2016
Berlin Buzzwords 2016