Research Project

Heterogeneity-Aware Adaptive Placement in Hybrid Edge–Cloud Stream Processing

This project explores how datacenter-centric stream-processing engines such as Apache Flink can be extended to support hybrid edge–cloud analytics without forcing users to rewrite applications. The goal is to make heterogeneous deployments practical under variable WAN bandwidth, shifting workloads, and constrained edge resources.

Tags: Apache Flink · Hybrid Edge–Cloud · Adaptive Routing · Query Rewriting · Fault Tolerance · Event-Time Processing

Overview

Modern analytics pipelines increasingly ingest data at the edge—through cameras, sensors, wearables, drones, and other devices—while still relying on cloud-hosted systems for storage, aggregation, and decision-making. Traditional dataflow engines assume relatively homogeneous resources and stable datacenter conditions, which makes them poorly suited for deployments where WAN links fluctuate, edge devices differ in compute capacity, and input rates vary over time.

This work introduces a hybrid edge–cloud stream-processing framework built on Apache Flink that automatically rewrites applications into independently deployable segments and dynamically shifts selected operators between edge and cloud. The design preserves the usability benefits of mature stream engines while adding the adaptability required for heterogeneous wide-area settings.

Core Ideas

Automatic Query Rewriting

Flink DataStream programs are transformed into independent query segments that can be scheduled, monitored, and reconfigured separately.
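
The segmentation idea can be pictured with a minimal sketch, assuming a toy linear pipeline representation: cut the operator chain before each stateful operator, so that every segment begins with a stateless prefix that can be placed freely. The `Operator` type and the operator names are illustrative, not the framework's actual API.

```python
from dataclasses import dataclass

@dataclass
class Operator:
    name: str
    stateful: bool

def split_into_segments(pipeline):
    """Split a linear operator chain into deployable segments,
    cutting before each stateful operator so the stateless prefix
    of every segment is free to run at the edge or in the cloud."""
    segments, current = [], []
    for op in pipeline:
        if op.stateful and current:
            segments.append(current)
            current = []
        current.append(op)
    if current:
        segments.append(current)
    return segments

pipeline = [
    Operator("parse", stateful=False),
    Operator("filter", stateful=False),
    Operator("window-aggregate", stateful=True),
    Operator("alert", stateful=False),
]
print([[op.name for op in seg] for seg in split_into_segments(pipeline)])
# [['parse', 'filter'], ['window-aggregate', 'alert']]
```

A real rewriter would operate on Flink's job graph rather than a flat list, but the cut criterion is the same: state pins an operator's placement, so segment boundaries fall just before stateful stages.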

Adaptive Routing

Stateless operators before the first stateful stage can execute at the edge or in the cloud, allowing the system to shift computation as network and workload conditions change.

Near-Zero-Downtime Reconfiguration

Reconfiguration is driven by routing changes rather than full task migration, which avoids stop-and-restart behavior and keeps the pipeline live.
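
A toy sketch of this routing-based reconfiguration, with hypothetical names: warm copies of a stateless segment exist at both sites, and reconfiguration atomically flips where new records are sent, so no task is ever stopped or restarted.

```python
import threading

class SegmentRouter:
    """Routes records to either the edge or cloud copy of a stateless
    segment; reconfiguration changes only the routing decision."""
    def __init__(self, initial="edge"):
        self._target = initial
        self._lock = threading.Lock()

    def route(self, record):
        with self._lock:
            return (self._target, record)

    def reconfigure(self, new_target):
        # Operator instances at both sites stay warm; only new
        # records see the new destination.
        with self._lock:
            self._target = new_target

router = SegmentRouter()
print(router.route("r1"))   # ('edge', 'r1')
router.reconfigure("cloud")
print(router.route("r2"))   # ('cloud', 'r2')
```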

Compatibility with Existing Semantics

The framework is designed to preserve event-time processing, watermark propagation, and checkpoint-based fault-tolerance guarantees expected from modern stream engines.
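
One concrete consequence for watermark handling: when a downstream segment can receive the same logical stream over both an edge path and a cloud path, its event-time clock may only advance as fast as the slowest path. This matches the standard min-over-input-channels rule that Flink itself applies; the sketch below assumes watermarks are plain integer timestamps.

```python
def merged_watermark(channel_watermarks):
    """Watermark of a segment fed by several physical paths (e.g. an
    edge path and a cloud path carrying the same logical stream):
    event-time progress is bounded by the slowest path, so the merged
    watermark is the minimum over all input channels."""
    return min(channel_watermarks.values())

print(merged_watermark({"edge": 1_700_000_150, "cloud": 1_700_000_120}))
# 1700000120
```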

How the Framework Works

Adaptive Placement Policy

The routing policy distinguishes between two main bottlenecks: network bottlenecks and edge compute bottlenecks. When WAN bandwidth becomes the limiting factor, the system moves suitable operators toward the edge to reduce traffic volume. When bandwidth is sufficient but the edge cannot keep up, more work is shifted back to the cloud. This process is repeated until the pipeline sustains the target throughput without backpressure.
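
The bottleneck-driven process above can be sketched as a single policy step that is re-run whenever measurements refresh. The 0.9 utilization thresholds, the utilization inputs, and the one-operator-per-step granularity are illustrative assumptions; in the real system only stateless operators ahead of the first stateful stage would be movable.

```python
def placement_step(wan_utilization, edge_cpu_utilization, edge_ops, cloud_ops):
    """One iteration of a bottleneck-driven placement policy.
    edge_ops and cloud_ops list movable (stateless) operators in
    source-to-sink order, split at the current placement boundary."""
    if wan_utilization > 0.9 and cloud_ops:
        # Network-bound: pull the next operator to the edge to shrink
        # the volume of data crossing the WAN link.
        edge_ops.append(cloud_ops.pop(0))
        return "moved toward edge"
    if edge_cpu_utilization > 0.9 and edge_ops:
        # Edge-compute-bound: push work back to the cloud.
        cloud_ops.insert(0, edge_ops.pop())
        return "moved toward cloud"
    return "stable"

edge_ops, cloud_ops = ["parse"], ["filter", "project"]
print(placement_step(0.95, 0.30, edge_ops, cloud_ops), edge_ops, cloud_ops)
# moved toward edge ['parse', 'filter'] ['project']
```

Repeating the step until it returns "stable" mirrors the iteration described above: placement settles once the pipeline sustains the target throughput without backpressure.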

Representative Application Scenarios

Air Quality Monitoring

Continuous PM2.5 sensor monitoring with downstream alerting and analytics.

Taxi Route Monitoring

Streaming trip reports to identify frequent routes for mapping and pricing systems.

Smart Grid Anomaly Detection

Monitoring smart-meter streams to detect and escalate abnormal household energy spikes.

Sentiment Analysis

Geo-tagged social media processing feeding downstream advertising and analytics pipelines.

Video-Assisted Traffic Monitoring

Analyzing continuous video feeds from cameras to detect traffic violations and anomalies.

Implementation Notes

The framework is implemented on Apache Flink, with Apache Kafka serving as the message broker between segments. Query rewriting is performed automatically, allowing users to submit Flink applications without manually splitting pipelines or deciding placement in advance. The implementation also includes mechanisms for routing control, watermark handling across segments, and scalable policy evaluation.
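
One way the segment wiring might look, under the hypothetical assumption that each cut between consecutive segments is materialized as its own Kafka topic, so segments exchange data through the broker rather than direct network channels. The naming scheme below is illustrative, not the framework's actual convention.

```python
def boundary_topics(job_name, segments):
    """Name one Kafka topic per cut between consecutive segments:
    segment i writes its output to the topic for boundary (i, i+1),
    and segment i+1 reads its input from the same topic."""
    return {
        (i, i + 1): f"{job_name}-segment-{i}-to-{i + 1}"
        for i in range(len(segments) - 1)
    }

print(boundary_topics("air-quality", ["ingest", "aggregate", "alert"]))
# {(0, 1): 'air-quality-segment-0-to-1', (1, 2): 'air-quality-segment-1-to-2'}
```

Decoupling segments through the broker is what lets each one be scheduled, monitored, and reconfigured independently, as described under Core Ideas.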

Research Significance

This work shows that hybrid edge–cloud analytics does not necessarily require a clean-slate processing engine. By extending mature dataflow systems with adaptive placement, it is possible to retain the ecosystem advantages of tools like Apache Flink while adding the dynamic behavior needed for real-world heterogeneous deployments.

Note: This page summarizes an ongoing research effort. Since the work is not yet published under a public project identity, this page intentionally avoids using the internal system name and other details from the draft manuscript.