Khaled A. Harfoush - Research Statement

I am strongly driven by the desire to produce research with long-lasting impact, and I believe that good researchers should know not only how to approach a problem at hand but also how to spot important problems in the field.

My general area of research interest is Networking, a constantly evolving field with ample research opportunities. My previous research projects have focused on two complementary problems. The first concerned the end-to-end characterization of network properties through the construction of compact, efficient network models that capture network dynamics. The availability of such models would assist in the development of network-aware services. The second problem concerned the adaptation of control strategies of transport protocols and network applications at massively accessed Internet servers in order to utilize shared network resources more efficiently and optimize the content delivery process.

My initial approach to solving these (as well as other) problems has typically involved modeling and mathematical analysis to generate exact results. This is followed by more empirical approaches involving implementation and deployment, which enable experimental evaluation under realistic conditions and assumptions.
 

1. Overview

Most applications nowadays do not provide service guarantees and thus hosts may experience varying performance over the lifetime of a connection. Network-aware applications attempt to react to changes in network resource availability and/or network performance. This reaction to network conditions is essential to better utilize network resources and optimize content delivery, especially because of the considerable strain on Internet resources imposed by the phenomenal growth of the World Wide Web. In order for Network-aware applications to function properly they need accurate, efficient and scalable models capturing network conditions and properties over interesting parts of the network.

Massively Accessed Scalable Servers (a.k.a. Mass servers) are popular Internet servers, which produce a substantial fraction of the traffic flowing through the network. Mass servers are uniquely positioned (1) to observe and diagnose network conditions by tracking the flows that they generate, and (2) to manage and control network resources by better regulating and scheduling the traffic they inject into the network.

It is desirable to achieve these goals over a wide spectrum of time scales. Over shorter time scales, a Mass server can minimize packet loss by smoothing the (bursty) process of injecting packets into the network. Over longer time scales, a Mass server can perform aggregate congestion control by wisely bundling like connections to avoid the burstiness that results from competition among flows.

Network diagnostic models can play a big role in Internet characterization. They can also optimize the deployment of a variety of applications and services. Examples include Server Selection, Overlay Network Organization, Admission Control, Flow Scheduling and Cache/Replica Placement. The various implications and benefits of Network models provide a large pool of research opportunities and promise significant research impacts.

I have made a number of research contributions related to the above general goals. My recent contributions have been in the areas of Internet measurements and diagnosis, Network modeling and management, and control protocols and services. For example:

Prior to my work on networking-related problems, I also worked on various problems in the fields of Neural Networks and Speech Recognition. A full list of my publications is available on my Web page (http://cs-people.bu.edu/harfoush).

In the following sections I describe three representative pieces of work that I have done. In each section I will highlight the problem, describe the solution approach and the impact of my research.
 

2. Metric-Induced Network Topologies

    Problem: The development and deployment of distributed network-aware applications and services over the Internet require the ability to compile and maintain a model of the underlying network resources with respect to (one or more) characteristic properties of interest. To be manageable, such models must be compact, and must enable a representation of properties along temporal, spatial, and measurement resolution dimensions.

    Approach: We proposed a general framework for the construction of such metric-induced models using end-to-end measurements. We instantiated our approach using one such property, packet loss rates, and presented an analytical framework for the characterization of Internet loss topologies. From the perspective of a server, the loss topology is a logical tree rooted at the server with clients at its leaves, in which edges represent lossy paths between a pair of internal network nodes. We showed how end-to-end unicast packet probing techniques can be used to (1) infer a loss topology and (2) identify the loss rates of links in an existing loss topology. We reported simulation, implementation, and Internet deployment results that show the effectiveness of our approach and its robustness in terms of accuracy and convergence over a wide range of network conditions. One contribution of this work is a mechanism to integrate metric-induced models collected at different hosts into one larger model. Another is a mechanism to integrate different metric-induced models collected from the same host at different points in time. These integration mechanisms allow more network details to be uncovered.
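    To make the idea of labeling a loss topology from end-to-end probes concrete, the following is a minimal sketch and not the estimator from the papers cited here. It assumes a simplified model in which a back-to-back probe pair sent to two clients meets the same fate on the path segment the clients share and sees independent losses on the two branches. Under that assumption, P(A) = s*a, P(B) = s*b and P(A and B) = s*a*b, so the shared-segment pass rate is s = P(A)P(B)/P(A and B). The function name and input format are hypothetical.

```python
def shared_pass_rate(outcomes):
    """Estimate the pass rate of the path segment shared by clients A and B.

    outcomes: list of (a_ok, b_ok) booleans, one per back-to-back probe pair.
    Assumption (illustrative only): both packets of a pair meet the same fate
    on the shared segment, and losses on the two branches are independent.
    """
    n = len(outcomes)
    if n == 0:
        raise ValueError("need at least one probe pair")
    p_a = sum(1 for a, _ in outcomes if a) / n          # P(A received)
    p_b = sum(1 for _, b in outcomes if b) / n          # P(B received)
    p_ab = sum(1 for a, b in outcomes if a and b) / n   # P(both received)
    if p_ab == 0:
        raise ValueError("no pair was fully received; estimate undefined")
    # s = (s*a)(s*b) / (s*a*b)
    return p_a * p_b / p_ab
```

    The loss rate of the shared segment is then one minus the returned value; the branch pass rates follow as P(A)/s and P(B)/s under the same assumptions.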

    Output and Impact: This work has resulted in two publications. The framework itself is described in a paper to appear in INFOCOM'02 [1] and the framework implementation in the Linux kernel (a.k.a. the Periscope Toolkit) is described in a paper to appear in PAM 2002 [2]. Periscope is being used by a number of researchers investigating various network-aware Internet and Peer-to-Peer applications.
     
     

3. End-to-end Characterization of Shared Loss Rates
    Problem: Current Internet transport protocols make end-to-end measurements and maintain per-connection state to regulate the use of shared network resources. When two or more such connections share a common endpoint, there is an opportunity to correlate the end-to-end measurements made by these protocols to better diagnose and control the use of shared resources.

    Approach: We developed packet-probing techniques to determine whether a pair of connections experience shared congestion. Our extensive simulation results demonstrated that the conditional (Bayesian) probing approach we employ provides superior accuracy, converges faster, and tolerates a wider range of network conditions than recently proposed memoryless (Markovian) probing approaches [5] for addressing this opportunity.

    Output and Impact: This work appeared in the proceedings of ICNP 2000 [4] and the Bayesian probing techniques are now part of the Periscope toolkit.
     
     

4. Bottleneck Bandwidth Along Targeted Path Segments
    Problem: Accurate measurement of network bandwidth is crucial for flexible Internet applications and protocols, which actively manage and dynamically adapt to changing utilization of network resources. These applications must do so to perform tasks such as distributing and delivering high-bandwidth media, scheduling service requests and performing admission control. Extensive work has focused on two approaches to measuring bandwidth: measuring it hop-by-hop, and measuring it end-to-end along a path. Unfortunately, best-practice techniques for the former are inefficient and techniques for the latter are only able to observe bottlenecks visible at end-to-end scope.
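    For context, one well-known end-to-end technique is packet-pair dispersion: two equal-sized packets sent back to back are spread apart by the slowest link, so the bottleneck bandwidth is approximately the packet size divided by the inter-arrival gap at the receiver. The sketch below illustrates only this conventional end-to-end estimate, which by construction sees only the bottleneck of the whole path; it is not the targeted sub-path method of this work. The function name, the use of a median to damp cross-traffic noise, and the input format are assumptions made for illustration.

```python
import statistics


def packet_pair_bandwidth(packet_size_bytes, arrival_gaps_s):
    """Estimate end-to-end bottleneck bandwidth in bits per second.

    packet_size_bytes: size of each probe packet in a pair.
    arrival_gaps_s: receiver-side gaps (seconds) between the two packets
                    of each pair; non-positive gaps are discarded.
    """
    gaps = [g for g in arrival_gaps_s if g > 0]
    if not gaps:
        raise ValueError("need at least one positive inter-arrival gap")
    # The median is a simple (assumed) filter against gaps that were
    # stretched or compressed by cross traffic.
    typical_gap = statistics.median(gaps)
    return packet_size_bytes * 8 / typical_gap
```

    For example, 1500-byte probes arriving 1.2 ms apart correspond to roughly 10 Mbit/s on the path's tightest link.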

    Approach: We developed and simulated end-to-end probing methods, which can measure bottleneck bandwidth along arbitrary, targeted sub-paths of a path between two end-points in the network (including sub-paths shared by a set of flows). As another important contribution, we described a number of practical applications which we foresee as standing to benefit from solutions to this problem, especially in emerging, flexible network architectures such as overlay networks, ad-hoc networks, peer-to-peer architectures and massively accessed content servers.

    Output and Impact: This work has been submitted for publication to SIGCOMM 2002 [3] (and is available as a Technical Report).
     
     

5. Future Research Directions

As a follow-up of my thesis research, there are many directions that I would like to explore further. Let me briefly discuss some of these directions:

Investigate whether it is possible to diagnose network conditions by simply inspecting original flow packets, without injecting additional probe packets into the network. For which metrics is this possible? How accurate are the results?

The MINT (Metric-Induced Network Topologies) framework abstracts any metric based on three properties: Monotonicity, Separability and Symmetry. Are there other properties that can be exploited? If so, what is their impact on the framework's inference, labeling and integration procedures?

Different applications need different views of the network (different metrics of interest, different diagnostic resolutions, etc.). Given an appropriate network diagnosis, what is the optimal way for an application to use the information?

Cartouche probes are used to infer the bottleneck bandwidth over a path prefix. Investigate ways to extend Cartouche probes to estimate the bottleneck bandwidth along arbitrary path segments.

In addition to the above thesis-inspired problems, I am also very interested in pursuing research in other areas of networking:

Wireless information access is becoming increasingly important, which has led to a growing interest in wireless mobile ad-hoc networks. The highly dynamic nature of these networks, the potentially complex communication environment and the varying capability needs of the participants limit the ability of traditional Internet protocols to manage mobile ad-hoc network resources. This research field is very promising and offers a wide range of research opportunities, from assigning participants unique IP addresses to security implications. I am scheduled to install a wireless lab in the School of Management at Boston University in the Spring 2002 semester. The lab will serve as a test bed for various wireless experiments.

In order to provide Quality of Service, the Integrated Services paradigm provides per-flow guarantees but suffers from scalability problems due to the need to store per-flow state at core routers. The Differentiated Services paradigm, on the other hand, tries to solve these scalability issues by removing per-flow state from core routers. Traffic entering the network is classified and conditioned at the network boundaries, and packets are assigned service bits. Core routers inspect the packets' service bits and schedule packets accordingly, providing guarantees to flow aggregates rather than to individual flows. Both Integrated and Differentiated Services need router support. End-to-end admission control aims to provide statistical guarantees to flows without router support by relying on network diagnosis information. The field is relatively new and needs a lot of investigation. It has the potential to be deployed in today's Internet.

While I am eager to establish my identity as an independent researcher, I believe that research is a collaborative effort. It is collaborative on many fronts: between industry, funding agencies and academia (faculty members and students). As a future academic, I will strive to bring research parties together, bring many research ideas to the table and influence my students to be sharp and independent thinkers.
 
 

References

[1] Bestavros, Azer; Byers, John; Harfoush, Khaled. Inference and Labeling of Metric-Induced Network Topologies. To appear in Proceedings of IEEE INFOCOM 2002, New York City, New York, June 2002.

[2] Harfoush, Khaled; Bestavros, Azer; Byers, John. PeriScope: An Active Probing API. To appear in Proceedings of PAM 2002, Passive and Active Measurement Workshop, Fort Collins, Colorado, March 2002.

[3] Harfoush, Khaled; Bestavros, Azer; Byers, John. Measuring Bottleneck Bandwidth of Targeted Path Segments. Technical Report BUCS-TR-2001-016 and submitted to ACM SIGCOMM 2002 for publication.

[4] Harfoush, Khaled; Bestavros, Azer; Byers, John. Unicast-based Characterization of Network Loss Topologies. In Proceedings of ICNP 2000: The 6th IEEE International Conference on Network Protocols (ICNP), Osaka, Japan, October 2000.

[5] D. Rubenstein, J. Kurose and D. Towsley. Detecting Shared Congestion of Flows via End-to-end Measurement. In Proceedings of ACM SIGMETRICS'00, Santa Clara, CA, June 2000.