Assignment 16: Dictionary Operations

due by 9:00 p.m. EST on Thursday 3/27/25

Preliminaries

In your work on this assignment, make sure to abide by the collaboration policies of the course.

For each problem in this problem set, we will be writing or evaluating some Python code. You are encouraged to use the Spyder IDE which will be discussed/presented in class, but you are welcome to use another IDE if you choose.

If you have questions while working on this assignment, please post them on Piazza! This is the best way to get a quick response from your classmates and the course staff.

Programming Guidelines

Refer to the class Coding Standards for important style guidelines. The grader will be awarding/deducting points for writing code that conforms to these standards.
Every program file must begin with a descriptive header comment that includes your name, username/BU email, and a brief description of the work contained in the file.
Every function must include a descriptive docstring that explains what the function does and identifies/defines each of the parameters to the function.
Your functions must have the exact names specified below, or we won’t be able to test them. Note in particular that the case of the letters matters (all of them should be lowercase), and that some of the names include an underscore character (_).
Make sure that your functions return the specified value, rather than printing it. None of these functions should use a print statement.
If a function takes more than one input, you must keep the inputs in the order that we have specified.
You should not use any Python features that we have not discussed in class or read about in the textbook.
Your functions do not need to handle bad inputs – inputs with a type or value that doesn’t correspond to the description of the inputs provided in the problem.
You must test your work before you submit it. You can prove to yourself whether it works correctly – or not – and make corrections before submission. If you need help testing your code, please ask the course staff!
Do not submit work with syntax errors. Syntax errors will cause the Gradescope autograder to fail, resulting in a grade of 0.
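
As an illustration of the header comment and docstring guidelines above, here is a minimal sketch. The file name, author line, and function are hypothetical placeholders, and the exact layout expected by the class Coding Standards may differ, so treat this as one possible shape rather than the required one.

```python
# File: example_module.py  (hypothetical file name)
# Author: Your Name (yourlogin@bu.edu)  -- placeholder values
# Description: A single example function illustrating a header
#              comment and a descriptive docstring.

def rectangle_area(width, height):
    """Return the area of a rectangle.

    Parameters:
    width -- the width of the rectangle (a non-negative number)
    height -- the height of the rectangle (a non-negative number)
    """
    return width * height


if __name__ == '__main__':
    # Test cases go inside this block, not in the global scope.
    print(rectangle_area(3, 4))
```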

Warnings: Individual Work and Academic Conduct!!

This is an individual assignment. You may discuss the problem statement/requirements, Python syntax, test cases, and error messages with your classmates. However, each student must write their own code without copying or referring to other students’ work.
It is strictly forbidden to use any code that you find on online websites, including but not limited to CourseHero, Chegg, or any other sites that publish homework solutions.
It is strictly forbidden to use any generative AI (e.g., ChatGPT or any similar tools) to write solutions for any assignment.

Students who submit work that is not authentically their own individual work will earn a grade of 0 on this assignment and a reprimand from the office of the Dean.

Task 1: Dictionary Operations

80 points; individual-only

Begin a new file called a16_dict_operations.py. Include your test cases in this file. For each question below, you will write a function that produces the required result.

For each function, you should write multiple test cases, to ensure that your function works correctly.

Write the function print_dict(d) which will print out all (key, value) pairs in the parameter d (a dictionary) one per line, in ascending order by key.

For example:
```
if __name__ == '__main__':

    d1 = {'BU': 5, 'BC': 3, 'Harvard': 4, 'Northeastern': 1} # alpha keys
    print_dict(d1)
```
will produce the following output:
```
BC: 3
BU: 5
Harvard: 4
Northeastern: 1
```
For example:
```
if __name__ == '__main__':

    d2 = {8:5, 2:7, 3:4, 9:2, 5:13} # numeric keys
    print_dict(d2)
```
will produce the following output:
```
2: 7
3: 4
5: 13
8: 5
9: 2
```
Hints:
- Begin with a basic loop to iterate over all keys in the dictionary. Inside the loop, print out the key and value on a single line, separated by a colon :.
- Once you get the basic loop working, add statements before the loop to obtain the keys as a list and sort them. Then, change your basic loop to iterate over the list elements (i.e., the sorted keys) to obtain the corresponding values.
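
The second hint can be sketched as follows. This is an illustration of the technique only, using a hypothetical dictionary named ages rather than the assignment's data, and it does not format the output the way print_dict must.

```python
# Hypothetical data, unrelated to the assignment's examples.
ages = {'Maya': 31, 'Ali': 27, 'Chen': 45}

keys = list(ages.keys())   # obtain the keys as a list
keys.sort()                # sort the list in place, in ascending order

for name in keys:
    # Look up the value that belongs to each key, in sorted order.
    print(name, ages[name])
```

The loop visits Ali, then Chen, then Maya, because the list of keys was sorted before the loop began.
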
In the game of Scrabble, players alternate turns placing words on the game board. Each letter has its own point value, and the score of the play depends on which letters you can play.

Here are the letters and scores used in the game:

Write the function scrabble_score(word) which processes the string word and returns the number of points that word scores in the game of Scrabble.

For example:
```
>>> scrabble_score('hello')
8
>>> scrabble_score('world')
9
>>> scrabble_score('jubilant')
17
```
Copy the following dictionary containing all lower case letters and their corresponding Scrabble scores into your function definition:
```
letter_scores = {
    'a': 1, 'b': 3, 'c': 3, 'd': 2, 'e': 1, 'f': 4, 'g': 2, 'h': 4, 'i': 1, 'j': 8,
    'k': 5, 'l': 1, 'm': 3, 'n': 1, 'o': 1, 'p': 3, 'q': 10, 'r': 1, 's': 1, 't': 1,
    'u': 1, 'v': 4, 'w': 4, 'x': 8, 'y': 4, 'z': 10
}
```
```
In your function, use a definite loop to process each letter in the parameter word, and use the dictionary of letter_scores to retrieve the corresponding score for that letter. Use the accumulator pattern to add up the scores for each letter and return the result.

Notes:
- Your function should work just as well with upper or lower case letters. There is no need to change the dictionary to do this! You can accomplish this by converting the word to lower case before processing it in the loop.
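
The accumulator pattern with a dictionary lookup can be sketched on a different problem. The function order_total and its made-up prices dictionary are hypothetical and not part of the assignment; the sketch assumes every item name appears as a key.

```python
def order_total(items):
    """Return the total price of the item names in the list items.
    (Hypothetical example function; not part of the assignment.)"""
    prices = {'apple': 2, 'bread': 4, 'milk': 3}   # made-up prices

    total = 0                            # the accumulator starts at zero
    for item in items:                   # definite loop over each element
        total += prices[item.lower()]    # lower-case the key before the lookup
    return total
```

For example, order_total(['Apple', 'MILK', 'bread']) returns 9, since converting each name to lower case lets the lower-case keys match.
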
Write a function best_word(words) that will evaluate a list of words, and return the one word with the highest Scrabble score.

For example:
```
>>> words = ['nitwit', 'blubber', 'oddment', 'tweak']
>>> best_word(words)
'blubber'
```
Use the list-of-lists optimization technique, and your scrabble_score function to find the best scoring word from the list, and return that word.
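
As a rough sketch, and assuming the list-of-lists technique means pairing each item with its score in a [score, item] sublist and then taking the maximum, the idea looks like this on a different problem. The function longest_word is hypothetical, and it uses len as the score in place of scrabble_score.

```python
def longest_word(words):
    """Return the longest word in the list words.
    (Hypothetical example function; not part of the assignment.)"""
    # Build sublists of the form [score, item]; here the score is len(w).
    scored = [[len(w), w] for w in words]
    # max compares sublists element by element, so the score is compared first.
    best = max(scored)
    return best[1]   # the item is stored at index 1 of the sublist
```

When two items have the same score, max falls back to comparing the items themselves, so the word that sorts later alphabetically is returned.
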
Write the function letter_counts(text) which processes the string text and returns a dictionary containing the counts of every letter occurring in text.

For example:
```
>>> d = letter_counts('testing 1 2 3!')
>>> d  # show the contents of this dictionary
```
will produce the following output:
```
{'a': 0, 'b': 0, 'c': 0, 'd': 0, 'e': 1, 'f': 0, 'g': 1, 'h': 0, 'i': 1, 
'j': 0, 'k': 0, 'l': 0, 'm': 0, 'n': 1, 'o': 0, 'p': 0, 'q': 0, 'r': 0, 
's': 1, 't': 2, 'u': 0, 'v': 0, 'w': 0, 'x': 0, 'y': 0, 'z': 0}
```
Notes:
- The result shows only the lower case value of each letter in text. Before you process the text, obtain a lower case version of it so that you can count all letters without worrying about case.
- Notice as well that non-letters (e.g., numerals, spaces, punctuation marks) from text are excluded from the count.
- The counts for letters that are not present in text are set to 0 in the result. There are many ways to accomplish this, at least one of which requires only a single line of code. Think of this as initializing an accumulator variable. Hint: think about how a list comprehension might help accomplish this. You can convert a list of sublists into a dict using this syntax:
```
d = dict( list of sublists here )
```
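
A small sketch of this syntax, using a hypothetical tally of vowels rather than the full alphabet:

```python
# Hypothetical example: start a tally for each vowel at zero.
pairs = [[v, 0] for v in 'aeiou']   # [['a', 0], ['e', 0], ...]
tally = dict(pairs)                 # each sublist becomes a key: value pair

# Update the tally only for characters that are keys in the dictionary.
for ch in 'Queue'.lower():
    if ch in tally:
        tally[ch] += 1
```

After the loop, tally holds 2 for 'e' and 'u' and 0 for the other vowels; the 'q' is skipped because it is not a key.
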
Write the function letter_frequencies(text) that returns a dictionary containing the frequency of each letter in the parameter text.

For example:
```
>>> d = letter_frequencies('hello, world')
>>> d # show the contents of this dictionary
```
will produce the following output:
```
{'a': 0.0, 'b': 0.0, 'c': 0.0, 'd': 0.1, 'e': 0.1, 'f': 0.0, 'g': 0.0, 
'h': 0.1, 'i': 0.0, 'j': 0.0, 'k': 0.0, 'l': 0.3, 'm': 0.0, 'n': 0.0, 
'o': 0.2, 'p': 0.0, 'q': 0.0, 'r': 0.1, 's': 0.0, 't': 0.0, 'u': 0.0,
'v': 0.0, 'w': 0.1, 'x': 0.0, 'y': 0.0, 'z': 0.0}
```
Notes/Hints:
- Begin by re-using (calling) your letter_counts(text) function from above to obtain the letter counts as a local variable (of type dictionary). You will process this dictionary to calculate the frequencies.
- Compute the “total number of letters”, which is the denominator used to compute letter frequencies. Calculate this by processing the dictionary of letter counts, to accumulate the total number of letters.
  
  In the above example, there are 10 letters total (i.e., we do not include the space or the comma).
- Create a new dictionary to contain the letter frequencies.
- The frequency of a letter is the count of occurrences of that letter divided by the total number of all letters.
  
  In the above example, the letter ‘l’ occurs 3 times and there are 10 letters total, so the frequency of ‘l’ is 0.3.
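
The same two-pass idea (accumulate a total, then divide each count by it) can be sketched on unrelated data. The function proportions is hypothetical, and the sketch assumes the total is greater than zero.

```python
def proportions(counts):
    """Return a dictionary mapping each key of counts to its share of the total.
    (Hypothetical example function; assumes the total is greater than 0.)"""
    total = 0
    for key in counts:           # first pass: accumulate the denominator
        total += counts[key]

    result = {}                  # new dictionary for the proportions
    for key in counts:           # second pass: divide each count by the total
        result[key] = counts[key] / total
    return result
```

For example, proportions({'red': 3, 'blue': 1}) returns {'red': 0.75, 'blue': 0.25}.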

Task 2: Processing a CSV Log File

optional task: because you like a challenge and solving puzzles is intuitively satisfying (a small bonus opportunity)

Background: The “Domain Name System” (DNS) is an internet protocol/service that resolves (translates) hostnames (e.g., www.bu.edu) into Internet Protocol (IP) addresses (e.g., 128.197.253.126). Computers require IP addresses to send/receive data over Internet Protocol.

Here is a CSV file containing a stratified sample from an old, anonymized DNS log: dnslogs_sample.csv. The file contains records, one per line, of DNS requests – where a user’s computer requests the IP address corresponding to a hostname.

Here are a couple of lines from this log file, to give you an idea of the format:

    '15-Jan-2008', '09:07:22.076', '192.168.103.202#1025', 'www.weather.com'
    '15-Jan-2008', '09:48:20.381', '192.168.227.54#2184', 'www.facebook.com'

Each line contains the same 4 fields: the date, the time, the requesting IP address (and port number), and the requested hostname.

To do

Do this work in your file a16_dns_logs.py.

Write a function process_dns_logs(dns_filename) to process a CSV log file in this format and return a dictionary in which the keys are top-level domains (e.g., aol.com) and the values are lists of the IP addresses that requested a hostname in that domain (e.g., aol.com was requested by the IP addresses ['192.168.208.10', '192.168.211.253', '192.168.229.247']).

Here is a subset of the result dictionary the function will return:

    >>> dns_filename = "dnslogs_sample.csv"
    >>> d = process_dns_logs(dns_filename)
    >>> d # evaluate the dictionary to see its contents:
    {'atdmt.com': ['192.168.215.145', '192.168.230.186'],
     'optonline.net': ['192.168.211.3'],
     'google.com': ['192.168.165.49', '192.168.280.242', '192.168.221.29', '192.168.180.224'],
     'yimg.com': ['192.168.215.108'] }

Notice that in this result, we use the top-level domain, which includes only the last part of the hostname (usually the part in the format domain.ext).

The function should open the file, process each line as a single record, and extract a list of fields from that record. It should further process the fields to obtain the top-level domain and requesting IP address. For example, this line:

    '15-Jan-2008', '12:18:29.822', '192.168.223.159#1147', 'www.bankofamerica.com'

should be processed to obtain top_level_domain of bankofamerica.com and ip_address of 192.168.223.159. Work out how to do these transformations by experimenting at the console and using string methods with which you are already familiar.
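
As a starting point for that experimentation, here is a sketch of a few string methods applied to hypothetical values. The strings below are made up, and whether the fields in the actual file carry surrounding spaces or quotation marks is an assumption to check against the data.

```python
# Hypothetical raw field, assumed to carry a leading space and quotes.
raw_field = " 'mail.example.org'"
hostname = raw_field.strip().strip("'")   # remove whitespace, then quotes

# Hypothetical address-and-port value in the form address#port.
requester = '10.0.0.7#5353'
address = requester.split('#')[0]         # keep the part before the '#'

# Split the hostname on dots and rejoin the last two pieces.
parts = hostname.split('.')
domain = parts[-2] + '.' + parts[-1]
```

Here address is '10.0.0.7' and domain is 'example.org'. Rejoining the last two pieces assumes the hostname contains at least one dot.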

Accumulating the dictionary

For each line in the CSV file, you will either add or update an entry in the dictionary, as follows:

If top_level_domain exists as a key in the dictionary, you will append the ip_address to the list of IP addresses that requested this TLD.
If the top_level_domain does not exist as a key in the dictionary, you will insert it into the dictionary, along with a list containing the ip_address that requested this TLD.

hint: To create a list containing a value, place square brackets around that value, e.g.: ["hello"] creates a list containing the string "hello".
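
The add-or-update rule above can be sketched on a different grouping problem. The function group_by_first_letter is hypothetical and not part of the assignment; it assumes every word is a non-empty string.

```python
def group_by_first_letter(words):
    """Return a dictionary mapping each first letter to the list of words
    that start with it. (Hypothetical example; not part of the assignment.)"""
    groups = {}
    for word in words:
        letter = word[0]
        if letter in groups:
            groups[letter].append(word)   # key exists: append to its list
        else:
            groups[letter] = [word]       # new key: start a one-item list
    return groups
```

For example, group_by_first_letter(['ant', 'bee', 'ape']) returns {'a': ['ant', 'ape'], 'b': ['bee']}.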

After you have processed the CSV file and obtained a dictionary, use your print_dict function from above to print it out, for example:

    >>> dns_filename = 'dnslogs_sample.csv'
    >>> dns = process_dns_logs(dns_filename)
    >>> print_dict(dns)
    2mdn.net: ['192.168.159.53']
    adrevolver.com: ['192.168.233.86']
    advertising.com: ['192.168.234.221', '192.168.276.242']
    alphabet-boys.com: ['192.168.253.39']
    aol.com: ['192.168.208.10', '192.168.211.253', '192.168.229.247']
    atdmt.com: ['192.168.215.145', '192.168.230.186']
    bankofamerica.com: ['192.168.223.159']
    boston.com: ['192.168.279.70']
    cnn.net: ['192.168.253.69']
    collegehumor.com: ['192.168.195.62']

To test your work, add a main section to your python file. For example:

    if __name__ == '__main__':

        dns_filename = 'dnslogs_sample.csv'
        dns = process_dns_logs(dns_filename)
        print_dict(dns)

Submitting Your Work

20 points; will be assigned by code review

Be sure to name your files correctly!

You will submit two files for this assignment: a16_dict_operations.py and a16_dns_logs.py

When you upload the files, the autograder will test your program.

Notes:

Upload these files to Gradescope before the deadline.
When you upload, the autograder script will process your file(s).
You may resubmit as many times as you like before the deadline, and only the grade from the last submission will be counted.

Warning: Beware of Global print statements

The autograder script cannot handle print statements in the global scope, and their inclusion causes this error:

The autograder failed to execute correctly. Please ensure that your submission is valid. Contact your course staff for help in debugging this issue. Make sure to include a link to this page so that they can help you most effectively.
You can prevent this error by not having any print statements in the global scope. Instead, create an if __name__ == '__main__': section at the bottom of the file, and put any test cases/print statements in that controlled block.
print statements inside of functions do not cause this problem.
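
A minimal sketch of the layout described in this warning, using a hypothetical function named double:

```python
def double(n):
    """Return twice the value of n. (Hypothetical example function.)"""
    return 2 * n

# A call such as print(double(4)) written here, at the left margin,
# would run in the global scope, which is what the warning describes.

if __name__ == '__main__':
    # Runs only when this file is executed directly, not when it is imported.
    print(double(4))
```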