Homework 3
(due Thursday, April 4, 2002)

IMPORTANT: This assignment must be submitted using WebCT (log on with your ACS password, not your CS or CS-NT password). Late assignments will not be accepted by the system. I highly recommend you submit your assignment before the due date, rather than waiting until the last minute. I will not accept printed or emailed homework.

For this assignment, you are to perform some network research and report your results in Excel and PowerPoint.  I will call on three or more students at random to make presentations to the class.

You are to measure latency and throughput across the Internet.  Recall that latency is the amount of time it takes between your request and the first response (e.g., "the first car out of the car wash").  Throughput is a measure of how many units can be transferred in a given amount of time (e.g., how many cars go through the car wash per hour).  Recall that, with the car-wash example, increasing the number of washers increases throughput, but not latency.
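The car-wash analogy can be put into numbers with a small Python sketch.  The 10-minute wash time and the function names are made up for illustration; the point is only that adding washers changes the second function's result and not the first.

```python
# Toy car-wash model; every number here is invented for illustration.
WASH_MINUTES = 10  # time for one car to pass through one washer


def latency_minutes(washers: int) -> float:
    """Time until the first car comes out; it does not depend on washer count."""
    return float(WASH_MINUTES)


def throughput_per_hour(washers: int) -> float:
    """Cars finished per hour when every washer is kept busy."""
    return washers * 60 / WASH_MINUTES


for washers in (1, 2, 4):
    print(washers, latency_minutes(washers), throughput_per_hour(washers))
```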

To do your research, you need to use the UNIX command ping.  Read the man page for ping first.  You want to use ping with the -s option.  Also, you should specify the size and number of packets you are sending, in that order, after the host name.  For example,

            ping -s www.ati.tn 64 10

sends ten 64-byte packets to www.ati.tn
 

Latency

Test the latency to a variety of locations, some nearby and some far away.  Test each site at least five different dates and times during normal business hours, five different dates and times during the evening, and five different dates and times during the weekend.  For each test, send ten packets using ping and record each of the ten times reported.  Do not record the minimum, average, and maximum times reported.  If you get packet loss, do not record your data, but try your test again later.  Put the times for each site into a separate column in an Excel spreadsheet.  Fill up the first 50 rows (after any header rows) with the business-hours data, the next 50 rows with the evening data, and the last 50 rows with the weekend data.  Here is an image of what a data worksheet might look like (note that the sites and numbers are made up):  sample-latency.gif
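If you would rather not retype the ten times by hand, a short Python sketch along the following lines could pull them out of saved ping output.  It assumes the output looks like the ping -s replies shown in the sample session later in this handout (lines containing "icmp_seq=0. time=30. ms"); the function name and the regular expression are illustrative, and other versions of ping may print a different format.

```python
import re

# Matches reply lines such as
#   64 bytes from host (1.2.3.4): icmp_seq=0. time=30. ms
# The summary line (min/avg/max) has no "icmp_seq=" and is skipped.
REPLY = re.compile(r"icmp_seq=(\d+)\.\s+time=(\d+(?:\.\d+)?)\.?\s*ms")


def round_trip_times(ping_output: str) -> list[float]:
    """Return the per-packet times in ms, in the order they were printed."""
    return [float(match.group(2)) for match in REPLY.finditer(ping_output)]


sample = """\
64 bytes from cliff.unet.brandeis.edu (129.64.99.34): icmp_seq=0. time=30. ms
64 bytes from cliff.unet.brandeis.edu (129.64.99.34): icmp_seq=1. time=25. ms
round-trip (ms)  min/avg/max = 19/25/35
"""
print(round_trip_times(sample))
```

The resulting list can then be pasted into the column for that site.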

You should test (1) three nearby sites (in or around Boston); (2) three sites on the West Coast of the United States; and (3) three sites far away, e.g., in Asia.  When you are done, you should have a spreadsheet with nine columns, each containing 150 data points.  Format the spreadsheet for readability:  in particular, use borders and adjust the column widths.
 

Finding Sites

You should choose your own sites.  I recommend searching at www.google.com.  It's very important that the sites actually are in the locations you think they are; some West Coast companies, for instance, actually host their web servers on the East Coast.  So how do you find out where the sites are?  Well, first, I would explore the company or institution's web site a little for addresses and the like.  Next, run a few trial experiments.  You should find that sites that are at the same distance will have similar latencies.  For instance, local sites should have about 20-30 ms latency.  Sites on the West Coast should have substantially longer latencies, and sites across the world will have even longer latencies.

Here are some locations that I found (you are free to use these, although you may also choose others):

brandeis.edu
www.calstate.edu
www.uoregon.edu
www.adobe.com
www.tsinghua.edu.cn

If you have trouble reaching any of your chosen sites after several attempts, try another site.
 

Statistical Analysis

You should perform statistical analysis (as described in the next paragraph) on two of the three sites for each distance (close, medium distance, and far), for a total of six data sets.  You may have trouble reaching some sites at some times.  (I'm asking you to do an extra site for each distance in case you have persistent problems getting good data from one or more sites.)

When you are done collecting data, reduce the number of data sets.  I recommend you make a copy of your raw data (give the file another name).  Next, in your original Excel file, delete the columns you are not going to analyze, leaving six sites (six columns).

From the "Tools" menu in Excel, choose "Data Analysis"; then choose "Descriptive Statistics."  (If you do not see that option, choose "Tools ... Add-Ins," and add the Data Analysis tools, listed as the Analysis ToolPak.)  Select the first 50 rows of all of your data columns as your input range and check "Summary Statistics."  Once you press "O.K.," you should get a new sheet containing your statistical results.  Excel repeats the label column ("Mean," "Standard Deviation," and so on) next to every data column; you should remove the repeats, leaving only the first.

Repeat the statistical analyses for evening hours and for weekend hours (i.e., the second and third groups of 50 rows).

Then, cut and paste the statistical information into one worksheet and delete the old ones.  You should move the columns around so that all of the data sets for each site are adjacent, and the columns are grouped by distance.  Format the new worksheet for readability.  Here is an example:  sample-stats.gif

Make three column graphs, one for each distance (nearby, West Coast, and far), each with two series:  median values and standard deviations.  (Standard deviation is a measure of how much variation there is in the data.)  The graph should be appropriately labeled and formatted to make for a nice presentation.  Here is a sample chart:  sample-chart.gif
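As a cross-check on the Excel numbers, the median and standard deviation for each group of 50 rows can also be computed with a few lines of Python.  This is only a sketch: the function name and labels are made up, and it uses the sample standard deviation (statistics.stdev), which is what I would expect Excel's Descriptive Statistics tool to report.

```python
import statistics


def summarize(column: list[float], group_size: int = 50) -> dict[str, tuple[float, float]]:
    """Split one site's column into business/evening/weekend groups and
    return (median, sample standard deviation) for each group."""
    labels = ("business", "evening", "weekend")
    if len(column) != group_size * len(labels):
        raise ValueError("expected %d values" % (group_size * len(labels)))
    summary = {}
    for i, label in enumerate(labels):
        group = column[i * group_size:(i + 1) * group_size]
        summary[label] = (statistics.median(group), statistics.stdev(group))
    return summary


# Made-up data with 3 values per group instead of 50, to keep the demo short.
demo = [20.0, 22.0, 24.0, 30.0, 30.0, 30.0, 40.0, 50.0, 60.0]
print(summarize(demo, group_size=3))
```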

Make sure you name all of your worksheet tabs appropriately (e.g., "Data," "Stats," "Chart (Close)," "Chart (Medium)," "Chart (Far)").  Delete all empty worksheets (e.g., "Sheet 2," "Sheet 3").

Save your file as latency.xls
 

Throughput:

You should choose three sites to perform your throughput tests (you may use three of the six used above for latency testing).

Start by determining the largest packet size you can ping (it may be different for each site).  Start with a packet size of 100 bytes and increase by factors of ten (i.e., 1000, 10000, ...) until the site will not accept any larger packets.  Once you have a general idea, test different sizes to get within 100 bytes of the actual maximum packet size (e.g., you may find a site accepts 1400-byte packets, but not 1500-byte packets).  When you are doing this testing, I suggest you only send 2 packets at a time.  If you send too many large packets, the site may refuse your pings temporarily.  If this happens, wait a few minutes and try again.
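The search described above (grow by factors of ten, then narrow to within 100 bytes) can be sketched in Python as follows.  The accepts function is hypothetical: it stands for "I pinged the site with two packets of this size and got replies," which you would do by hand or with your own wrapper.  The sketch also assumes that a site accepting one size accepts every smaller size, and the ceiling parameter is just a made-up safety stop.

```python
from typing import Callable


def find_max_size(accepts: Callable[[int], bool], start: int = 100,
                  tolerance: int = 100, ceiling: int = 1_000_000) -> int:
    """Return an accepted size within `tolerance` bytes of the largest one."""
    if not accepts(start):
        raise ValueError("site rejects even the starting size")
    low, high = start, start * 10   # low is known to work; high is untested
    while accepts(high):            # grow by factors of ten: 1000, 10000, ...
        if high >= ceiling:
            raise ValueError("no upper limit found below the ceiling")
        low, high = high, high * 10
    # Here accepts(low) is True and accepts(high) is False; halve the gap.
    while high - low > tolerance:
        mid = (low + high) // 2
        if accepts(mid):
            low = mid
        else:
            high = mid
    return low


# Stand-in for a real site whose (made-up) limit is 1472 bytes.
print(find_max_size(lambda size: size <= 1472))
```

Halving the gap keeps the number of test pings small, which matters because a site may start refusing pings if it sees too many large packets.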

Once you have determined the largest-size packets you can send, run two tests for each site:  send ten 56-byte packets and send ten packets close to the largest size (e.g., if the largest size is 1400 bytes, you might send 1356-byte packets).

As before, perform five tests with ten packets at different times and dates during business hours, in the evening, and on the weekend.  You should have one column for each packet size for each site, for a total of six columns of data.  The header rows will list the sites and packet sizes, and underneath will be 150 data points.  Again, format the spreadsheet for readability.  Here is a sample:  sample-tput.gif

As with the latency data, remove one of the three sites and perform statistical analyses on the data for the remaining two.  Remember to format your new worksheet for readability and remove extraneous columns.

Using the median values, determine throughput to each site.  Put the following formula somewhere on the descriptive-statistics worksheet:  throughput = ([size of largest packet] - [size of small packet]) / ([median time for largest packet] - [median time for small packet]).  With sizes in bytes and times in milliseconds, this gives bytes per millisecond, so you also need to multiply by 1000 to convert from milliseconds to seconds, and multiply by 8 to convert from bytes to bits.  The results should be six numbers (one for each remaining site at each time of day), each expressing throughput in bits per second.

(For example, assume your largest-size packets are 1024 bytes and your small packets are 16 bytes, your median time for the large packets is 198 milliseconds, and your median time for the small packets is 190 milliseconds.  Then your throughput = 1000 x 8 x (1024 - 16) / (198 - 190) = 1008000 bits per second = 1.008 Mbps.)
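The same calculation can be written as a short Python function, which is handy for checking your spreadsheet formula.  The function and parameter names are my own; the arithmetic is the formula given above.

```python
def throughput_bps(large_bytes: float, small_bytes: float,
                   large_ms: float, small_ms: float) -> float:
    """Throughput in bits per second from two packet sizes (bytes)
    and their median round-trip times (milliseconds)."""
    if large_ms <= small_ms:
        # A zero or negative time difference gives a meaningless result.
        raise ValueError("large packets must take longer than small ones")
    bytes_per_ms = (large_bytes - small_bytes) / (large_ms - small_ms)
    return bytes_per_ms * 1000 * 8   # ms -> s, bytes -> bits


print(throughput_bps(1024, 16, 198, 190))
```

If the two median times come out equal (or the small packets come out slower), the formula breaks down, so the sketch raises an error rather than returning a number; in that case your data are probably too noisy and you should look at them again.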

Create one column graph showing the throughput value for each site at each time of day (one series with six data points in total).  Format and label your graph for legibility.

As before, give your worksheet tabs meaningful names and delete empty worksheets.

Save your file as throughput.xls
 

PowerPoint Presentation:

Finally, create a PowerPoint presentation reporting your results.  Include each of your graphs on a slide.  You should also have a title page and at least four other slides, for a total of at least nine slides.  You may find it easiest to begin by using the PowerPoint Wizard.  Animate at least four items in your presentation, and include at least two images (either clip art or photographs or the like).

Your presentation should address the results of your research.  What did you find out about the latency to different geographical locations?  Is latency different at different times of the day?  How much do latency measurements vary?  Likewise report and analyze your throughput results.  What is your conclusion about how well the Internet works?

You should also include in your presentation whatever information you have about the sites.  Make sure you include the name of the server that answered your ping and the IP address.  You also should use dig and/or search the institution's web site to find the address (city, state, country).

Save your file as network.ppt
 

Extra Credit:

As mentioned previously, I will select three people at random to make presentations.  Those selected will get extra credit for their presentation.  However, if I select someone who is not present, or is not prepared, he or she may lose points.  Accordingly, be prepared to show your presentation and discuss your results during any class after the due date of this assignment.  If you are anxious about public speaking, feel free to come see me beforehand and I'll help you prepare.  Please note that I am not expecting highly polished presentations; I just want to be convinced that you learned something from this assignment.
 

Logging Experiments:

IMPORTANT:  You need to log all experiments with gsubmit at the time you perform the experiments.  If you have not logged your experiments, you will receive no more than half credit for this assignment.

Each time you do a series of experiments, start a script file with the script command.  You can do all nine of your latency sites and your three throughput sites in one script session.  Don't worry if you make mistakes or have to redo things while your script is running.  Recall that you end your script session with the exit command.

Name your script files as follows:  yourlogin-month-day-militarytime.txt  Military time means on a 24-hour clock.  With military time, use four digits and don't use a.m. or p.m. or a colon (:).  The morning starts at 0000 and ends at 1159.  Noon is 1200, one o'clock in the afternoon is 1300 and so on.  Eight seventeen in the morning is 0817 military time.  Nine fifteen at night is 2115.
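If you want to double-check a file name, the naming rule can be expressed in a couple of lines of Python.  This is just an illustration (the function name is made up); you will normally type the name yourself when you run script.

```python
from datetime import datetime


def script_name(login: str, started: datetime) -> str:
    """Build yourlogin-month-day-militarytime.txt for a session start time."""
    # %H%M gives the 24-hour clock as four digits with no colon, e.g. 0817.
    return f"{login}-{started.month}-{started.day}-{started:%H%M}.txt"


print(script_name("stevec", datetime(2002, 3, 14, 22, 51)))
```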

So, for example, a script session begun on March 14 at 10:51 p.m. might look like this:
 

(stevec) ~ % script stevec-3-14-2251.txt
Script started, file is stevec-3-14-2251.txt
(stevec) ~ % ping -s brandeis.edu 56 10
PING brandeis.edu: 56 data bytes
64 bytes from cliff.unet.brandeis.edu (129.64.99.34): icmp_seq=0. time=30. ms
64 bytes from cliff.unet.brandeis.edu (129.64.99.34): icmp_seq=1. time=25. ms
64 bytes from cliff.unet.brandeis.edu (129.64.99.34): icmp_seq=2. time=24. ms
64 bytes from cliff.unet.brandeis.edu (129.64.99.34): icmp_seq=3. time=25. ms
64 bytes from cliff.unet.brandeis.edu (129.64.99.34): icmp_seq=4. time=35. ms
64 bytes from cliff.unet.brandeis.edu (129.64.99.34): icmp_seq=5. time=22. ms
64 bytes from cliff.unet.brandeis.edu (129.64.99.34): icmp_seq=6. time=22. ms
64 bytes from cliff.unet.brandeis.edu (129.64.99.34): icmp_seq=7. time=19. ms
64 bytes from cliff.unet.brandeis.edu (129.64.99.34): icmp_seq=8. time=23. ms
64 bytes from cliff.unet.brandeis.edu (129.64.99.34): icmp_seq=9. time=27. ms

----brandeis.edu PING Statistics----
10 packets transmitted, 10 packets received, 0% packet loss
round-trip (ms)  min/avg/max = 19/25/35
(stevec) ~ % ping -s www.tsinghua.edu.cn 56 10
PING www.tsinghua.edu.cn: 56 data bytes
64 bytes from 166.111.4.100: icmp_seq=0. time=487. ms
64 bytes from 166.111.4.100: icmp_seq=1. time=914. ms
64 bytes from 166.111.4.100: icmp_seq=2. time=446. ms
64 bytes from 166.111.4.100: icmp_seq=3. time=474. ms
64 bytes from 166.111.4.100: icmp_seq=4. time=463. ms
64 bytes from 166.111.4.100: icmp_seq=5. time=454. ms
64 bytes from 166.111.4.100: icmp_seq=6. time=484. ms
64 bytes from 166.111.4.100: icmp_seq=7. time=480. ms
64 bytes from 166.111.4.100: icmp_seq=8. time=465. ms
64 bytes from 166.111.4.100: icmp_seq=9. time=430. ms

----www.tsinghua.edu.cn PING Statistics----
10 packets transmitted, 10 packets received, 0% packet loss
round-trip (ms)  min/avg/max = 430/509/914
(stevec) ~ % exit
exit
Script done, file is stevec-3-14-2251.txt
(stevec) ~ % 

After creating such a script file, gsubmit it as follows (note that the first time you use gsubmit it will ask you for your name and ID number):
 
(stevec) ~ % gsubmit cs101a1 stevec-3-14-2251.txt 
stevec submitted ./stevec-3-14-2251.txt at Wed Mar 13 23:56:15 2002
(stevec) ~ % 

Submission:

Submit all three files (latency.xls, throughput.xls, and network.ppt) through WebCT.  Late submissions will not be accepted, so get started early.