LIANG GUO
The general topic of my Ph.D. study is on Engineering Network Resources and Traffic to Achieve Better Quality-of-Service (QoS). This includes research on routing, scheduling and end-system control, although the dominant part of my thesis is specifically on network scheduling. In this statement of research, I describe my thesis work on Scheduling TCP Flows through Internet Traffic Managers: Modeling and Practice, as well as other major research projects I have worked on. Figure 1 depicts where the projects are located in our proposed scalable QoS-aware network architecture. I also discuss directions for future research.
The past few decades have witnessed the great success of the Internet as well as its limitations. The current ``best effort'' Internet does not provide any of network-level mechanisms to guarantee end-user service requirements, such as reliability, bandwidth or delay. One of the major difficulties of providing QoS is to handle the unpredictability inside the network. The main theme of my Ph.D. thesis is on designing mechanisms for next generation networks that are more predictable and controllable. Nevertheless, such enhancement of predictability should not significantly sacrifice scalability, since it is unanimously agreed upon that the success of the current Internet is largely due to its scalability. To this end, we propose a network architecture [1] in which some special network elements -- called Traffic Managers (TMs), are strategically placed (e.g., in front of clients/servers or at exchange/peering points between administrative domains). Network traffic through these TM's is then characterized, regulated and classified so that internal routers are capable of managing traffic to achieve specific performance goals.
Our first step toward traffic management is inspired by some interesting results from previous job scheduling studies. It was shown that in a system with highly variable job length distribution, few long jobs (or ``elephants'') request most of the system service. In such system, providing rapid response to interactive jobs which place frequent but small demands (or ``mice''), can significantly reduce the overall system average response time without penalizing long jobs. Measurements from Internet indicate that network flows, whether defined at port-to-port or at network-to-network level, possess such high variability in size (bytes). It is thus natural to apply controls inside the network that favor short flow transfers. Such service scheme can be implemented through differentiated control performed at TM's.
However, compared to job scheduling, network flow scheduling is equipped with less detailed information such as how much packets the flow will transmit. We resort to a simple volume counting heuristic to determine the class a flow belongs to. That is, new flows are always allocated to the highest priority level. Once a flow has sent more than a certain amount of packets (bytes), the priority of the flow is reduced to the next lower level. Usually, to maintain scalability, the number of priority levels defined inside the network should be much less than that in a job scheduling system.
The ad-hoc way of flow classification makes it hard to
directly apply traditional job scheduling models
to analyze network scheduling problems.
To this end, in [2]  and [3],
we propose a Multi-Level feedback
queue with Discriminatory Process Sharing
scheduling (ML-DPS) model which captures the
essence of network scheduling problem if
bandwidth is shared fairly among different
flows. We are interested in obtaining the average system
performance, or the average response time
of a flow of certain size
. Closed-form
solution for such problem with Poisson session
arrivals and general flow size distribution is shown
to be difficult to solve.
In our solution, we approximate the DPS queue by
independent processor sharing queues such that
the average waiting time at each level of queues
is linear to the amount of service received at
that level. We then utilize
Kleinrock's conservation law for time-shared system
to solve for the average response time for jobs
of different sizes.
The conservation law states
that, the average amount
of work to be served, represented by a function
of the average response time
, is a constant,
regardless of the scheduling algorithm.
can then be represented by a set of linear
equations and can be solved accordingly.
Through simulation, the approximation works very
well under heavy-tailed flow size distributions.
To achieve the end-to-end performance goals, it is important to understand the way end-system transport protocols react to changes of network conditions. Our research toward this end includes analysis of traffic generated by TCP [6], the dominant Internet transmission control protocol, and proposals on improving TCP's transient behavior to achieve fair and efficient bandwidth sharing in a highly dynamic network [7,8].
TCP exploits an end-to-end window-based feedback control mechanism to estimate suitable transmission rate at which it is allowed to send. Thus, in the beginning of the connection or when network condition changes drastically, it may take a transient period for TCP to adjust its window to the appropriate value. Therefore, generally short TCP connections exhibit much higher variability than long TCP connections in terms of the traffic pattern and the overall transmission time. Moreover, through a simple Markovian model, we found [6] that when congestion becomes severe, TCP itself can produce traffic that exhibits long-range dependence over limited timescales, especially when the length of the connection is short.
In steady-state, TCP uses Additive-Increase-Multiplicative-Decrease (AIMD) control to oscillate its congestion window around the desired value. Parameters of AIMD are engineered for efficient implementation. However, they also make TCP traffic exhibit high variability. Previous proposals that try to improve traffic smoothness usually make TCP respond slower to transient condition changes due to the inevitable tension between stability and responsiveness in a control system. We observe that by maintaining certain history information such as the window value at the previous congestion epoch, we are able to design control rules that generate smoother traffic without being detrimental to traffic responsiveness [7,8]. We exploit the entire space of control rules that utilize such history information and propose a spectrum of TCP-friendly congestion control algorithms. As a bonus, due to the history information, two users can adjust their window increase rates to converge to fairness faster than without such information.
The goal of QoS routing is to obtain feasible paths that satisfy end-system performance requirements. My research involves (1) computing communication paths (or routes) under performance/QoS constraints for one-to-one communication (i.e. QoS Unicast Routing); (2) computing QoS paths that leads to multiple destinations (i.e. QoS Multicast Routing); (3) computing these QoS multicast paths in a scalable way through a hierarchical organization of the network; (4) designing a scalable QoS multicast protocol that allows for dynamic member joins and leaves.
Many real-time applications require the transmission of diverse traffic streams with diverse QoS requirements such as bounded delay, bounded delay variation (jitter), and guaranteed loss probability. Thus, we need to find an efficient QoS routing algorithm that is able to find low-cost paths that can satisfy varying QoS. The QoS routing problem can be formulated as a constrained shortest (least-cost) path routing problem and is known to be NP-hard.
For one-to-one (unicast) communication, we proposed a fast algorithm [9], namely SSR+DCCR to find a near-optimal solution. This algorithm first uses Lagrangian relaxation method to define a tighter bound on path cost. It then exploits the salient ability of a non-linear function of link metrics to accurately search for the optimal feasible solution. Such hybrid algorithm is beneficial due to both the fast convergence of the Lagrangian method and the accuracy of the non-linear cost function, and thus can return a near-optimal solution in a very short time.
I have also been working on the problem of routing traffic of multicast applications requiring QoS guarantees. However, even a related simpler problem (the Steiner problem of finding a low-cost tree without considering QoS constraints) has been proved to be computationally expensive. We proposed an efficient algorithm, namely QDMR, which provides good solutions for large networks [10]. This algorithm dynamically adjusts its low-cost multicast path construction policy based on how far the current on-tree node is from violating the QoS delay bound. This on-line feature guarantees the time efficiency as well as the cost effectiveness of our proposed algorithm.
Hierarchical topology aggregation is a common technique for scaling to large networks. In our recent research [11], we have also investigated the interaction among the topology aggregation scheme, the path selection policy, and the workload distribution. We compared several topology aggregation schemes that are different in terms of the accuracy of the routing information they provide. By means of the Z-iteration method which numerically solves dynamic flow models of multi-service connection-oriented networks [12], we considered realistic skewed workloads as well as uniformly distributed workloads. Our simulation results led us to an interesting conclusion: under skewed workload, a less aggressive (and hence more costly) aggregation scheme, which results in a more detailed view of the network presented to the path selection algorithm, may lead to an overly restrictive selection of paths. This in turn adversely affects performance due to traffic concentration.
We are also interested in designing a scalable QoS-sensitive multicast protocol that allows for dynamic member joins and leaves. To this end, we are seeking a scheme that has little storage requirement and communication overheads. Through analysis and extensive simulations, we have compared several proposed schemes in terms of their capability of managing network resources and their protocol overhead. We also investigated the effect of different link cost functions on the performance of multicast routing algorithms [13]. In light of these performance studies, we are planing to study probabilistic multicasting techniques and scalable feedback control methods to control the message overhead brought by coordination in distributed environments.
The importance of network QoS is well-accepted by both the research community and the commercial service providers. However the deployment of QoS in the Internet is still highly controversial due to concerns of scalability, robustness and stability. I am interested in continuing my research on attacking these problems. I also plan to extend my research to other emerging Internet applications and services that have other QoS requirements.
In the area of routing, recent development in network measurement techniques has equipped the end-systems with enriched knowledge on network structures. This opens up a whole new space for performing scalable QoS routing at the end-systems. An example is the End-system Multicast approach which gets around the difficulty of deploying multicast at network level. It is worthwhile to delve into this promising area of research.
The efficiency of end-system control strongly depends on the accuracy of network measurement and characterization techniques. The ubiquitous self-similarity shown in network traffic, topology and user access pattern brings significant challenges to traditional parsimonious models. Moreover, optimal control rules under traditional Markovian assumptions may not work well for self-similar models. Exploiting the self-similar property to achieve more efficient controls is another direction I plan to pursue.
Our proposed differentiated scheduling of mice and elephants need to be extended for aggregated traffic to enhance scalability. Characterizing the behavior of aggregated traffic from mice and elephants is certainly a stepstone for such extension. Moreover, control theory is a promising approach for system stability analysis.
There are other functionalities that the Traffic Managers architecture can provide. For example, the expanding demand from wireless applications has spurred the issues of differentiating lossy links from congested links. TM's can help the end-system achieve better throughput by avoiding unnecessary rate reductions.
Last but not least, I am always interested in modeling and statistical analysis techniques. The flexible Internet service model accommodates a great diversity of applications, from text-based e-mail to multimedia video-conference. It however hinders the development of a parsimonious analytical model for the Internet. Pioneer work by Frank Kelly [14] illustrates how techniques from control theory and statistical queueing networks can be applied to optimal traffic control. His work inspires me to re-utilize classical queueing models for solving emerging network resource management and sharing problems.
This document was generated using the LaTeX2HTML translator Version 99.2beta6 (1.42)
Copyright © 1993, 1994, 1995, 1996,
Nikos Drakos,
Computer Based Learning Unit, University of Leeds.
Copyright © 1997, 1998, 1999,
Ross Moore,
Mathematics Department, Macquarie University, Sydney.
The command line arguments were:
latex2html -split 0 research
The translation was initiated by Liang Guo on 2002-11-07