All problems due by 11:59 p.m. on Tuesday, April 29, 2025.
In your work on this assignment, make sure to abide by the collaboration policies of the course. All of the problems in this assignment are individual-only problems that you must complete on your own.
If you have questions, please come to office hours, post them on
Piazza, or email cs460-staff@cs.bu.edu
.
Make sure to submit your work on Gradescope, following the procedures found at the end of Part I and Part II.
40 points total
Create a subfolder called ps5
within your
cs460
folder, and put all of the files for this assignment in that folder.
This part of the assignment will all be completed in a single PDF file. To create it, you should do the following:
Access the template that we have created by clicking on this link and signing into your Google account as needed.
When asked, click on the Make a copy button, which will save a copy of the template file to your Google Drive.
Select File->Rename, and change the name of the file to
ps5_partI
.
Add your work for all of the problems from Part I to this file.
Once you have completed Part I, choose File->Download->PDF
document, and save the PDF file on your machine. The resulting
PDF file (ps5_partI.pdf
) is the one that you will submit. See
the submission guidelines at the end of Part I.
25 points total
Consider the following sequence of log records written by a system that uses undo-redo logging:
LSN |
record contents |
---|---|
5 |
txn: 1; BEGIN |
10 |
txn: 1; item: C; old: 300; new: 350; olsn: 0 |
20 |
txn: 2; BEGIN |
30 |
txn: 2; item: D; old: 400; new: 420; olsn: 0 |
40 |
txn: 1; item: A; old: 100; new: 120; olsn: 0 |
50 |
txn: 2; item: B; old: 200; new: 240; olsn: 0 |
60 |
txn: 1; item: A; old: 120; new: 150; olsn: 40 |
70 |
txn: 1; COMMIT |
80 |
txn: 2; item: D; old: 420; new: 480; olsn: 30 |
90 |
txn: 2; item: A; old: 150; new: 190; olsn: 60 |
(9 points) If a crash occurs and log record 90 is the last one
to make it to disk, what steps would be performed during recovery
if the system is performing undo-redo logging and the
on-disk datum LSNs are not consulted? (In other words, you
should assume that the system is not performing logical
logging, and thus you don’t need to worry about redoing or
undoing a change unnecessarily.) Complete the table provided in
ps5_partI
to show how each log record would be handled during
both the backward and forward passes.
Guidelines:
Each cell of the table should include one of the following actions:
If the action is undo or redo, you should also include the appropriate assignment (e.g., you would write X = 800 if data item X is given a value of 800).
(9 points) If a crash occurs and log record 90 is the last one to make it to disk, what steps would be performed during recovery if the system is performing undo-redo logging and the on-disk datum LSNs are consulted (i.e., the system is performing logical logging, despite the presence of the old and new values in the update log records)?
Complete the table provided in ps5_partI
to show how each log
record would be handled during both the backward and forward
passes. You should assume that the datum LSNs at the start of
recovery are the following:
In addition, you should assume that the recovery subsystem does not perform any actions that the LSNs indicate are unnecessary.
Guidelines:
Each cell of the table should include one of the following actions:
Important: Make sure that you don’t put skip for cases that are more accurately described using don’t undo or don’t redo.
If the action is undo or redo, you should also include both the assignment for the data item (see above) and the assignment for the datum LSN (e.g., you would write datumLSN(X) = 100 if the datum LSN of item X is given a value of 100).
(7 points) We will complete the material needed for this part of the question on April 25.
Now assume that a dynamic checkpoint had occurred between log records 40 and 50:
LSN |
record contents |
---|---|
5 |
txn: 1; BEGIN |
10 |
txn: 1; item: C; old: 300; new: 350; olsn: 0 |
20 |
txn: 2; BEGIN |
30 |
txn: 2; item: D; old: 400; new: 420; olsn: 0 |
40 |
txn: 1; item: A; old: 100; new: 120; olsn: 0 |
45 |
CHECKPOINT (with appropriate additional info) |
50 |
txn: 2; item: B; old: 200; new: 240; olsn: 0 |
60 |
txn: 1; item: A; old: 120; new: 150; olsn: 40 |
70 |
txn: 1; COMMIT |
80 |
txn: 2; item: D; old: 420; new: 480; olsn: 30 |
90 |
txn: 2; item: A; old: 150; new: 190; olsn: 60 |
You should assume that the checkpoint record includes the appropriate additional information, as discussed in lecture.
How (if at all) would the presence of that checkpoint record change which log records are considered during the backward pass? Explain briefly.
How (if at all) would the presence of that checkpoint record change which log records are considered during the forward pass? Explain briefly.
15 points total; 5 points each part
Consider again the following sequence of log records:
LSN |
record contents |
---|---|
5 |
txn: 1; BEGIN |
10 |
txn: 1; item: C; old: 300; new: 350; olsn: 0 |
20 |
txn: 2; BEGIN |
30 |
txn: 2; item: D; old: 400; new: 420; olsn: 0 |
40 |
txn: 1; item: A; old: 100; new: 120; olsn: 0 |
50 |
txn: 2; item: B; old: 200; new: 240; olsn: 0 |
60 |
txn: 1; item: A; old: 120; new: 150; olsn: 40 |
70 |
txn: 1; COMMIT |
80 |
txn: 2; item: D; old: 420; new: 480; olsn: 30 |
90 |
txn: 2; item: A; old: 150; new: 190; olsn: 60 |
This log was created by a system that uses undo-redo logging. If a crash occurs and log record 90 is the last one to make it to disk, what are all possible on-disk values of each of the data items (A, B, C, and D) after the crash but before recovery?
How would your answer to part 1 change if the system were using redo-only logging instead of undo-redo? Briefly explain the reasons for any changes. (You should assume that none of the data items – A, B, C, D – are on the same page.)
How would your answer to part 1 change if the system were using undo-only logging? Briefly explain the reasons for any changes. (Here again, you should assume that none of the data items are on the same page. In addition, you should assume that when the DBMS forces dirty database pages to disk, it forces only those pages that must go to disk in order for undo-only logging to work correctly.)
Once you have completed Part I in Google Drive, choose
File->Download as->PDF document, and save the resulting file
(ps5_partI.pdf
) on your machine.
Login to Gradescope and click on the box for CS 460.
Click on the name PS 5: Part I in the list of assignments. You should see a pop-up window labeled Submit Assignment. (If you don’t see it, click the Submit or Resubmit button at the bottom of the page.)
Choose the Submit PDF option, and then click the Select PDF
button and find the ps5_partI.pdf
that you created in step 1.
Then click the Upload PDF button.
You should see an outline of the problems along with thumbnails of the pages from your uploaded PDF. For each problem in the outline:
As you do so, click on the magnifying glass icon for each page and doublecheck that the pages that you see contain the work that you want us to grade.
Once you have assigned pages to all of the problems in the question outline, click the Submit button in the lower-right corner of the window.
You should see a box saying that your submission was successful.
Click the (x)
button to close that box.
You can use the Resubmit button at the bottom of the page to resubmit your work as many times as needed before the final deadline.
Important
It is your responsibility to ensure that the correct version of a file is on Gradescope before the final deadline. We will not accept any file after the submission window for a given assignment has closed, so please check your submission carefully using the steps outlined above.
If you are unable to access Gradescope and there is enough
time to do so, wait an hour or two and then try again. If you
are unable to submit and it is close to the deadline, email
your homework before the deadline to
cs460-staff@cs.bu.edu
50 points total
In this part of the assignment, you will write queries for a MongoDB version of our movie database – one that uses the data model outlined in lecture.
You should begin by following our directions to install and configure both MongoDB and the version of the movie database that you will be using.
In addition, you should download the template that you will use for
your queries by clicking on the following link and saving the file in
your ps5
folder:
If the browser doesn’t allow you to choose where to download the file, right-click the above link and use Save link as... or the equivalent option.
See our separate instructions for the steps needed to perform your queries.
Remember: When typing a query in MongoDB Compass, you can allow your query to span multiple lines if you use Shift-Return or Shift-Enter at the end of a given line.
If you’re using a Mac, you should disable smart quotes, because they may lead to errors in MongoDB and in our testing. There are instructions for doing so here.
ps5_queries.py
is a Python file, so you could use a Python IDE
to edit it, but a regular text editor like TextEdit or Notepad++
would also be fine. However, if you use a text editor, you must
ensure that you save it as a plain-text
file.
Construct the MongoDB method calls needed to solve the problems given
below. Test each method call in MONGOSH
to make sure that it works.
Once you have finalized the method call for a given problem, copy
the call into your ps5_queries.py
file, putting it between
the triple quotes provided for that query’s variable. We have
included a sample query to show you what the format of your
answers should look like.
Each of the problems must be solved by means of a single query
(i.e., a single method call). The results of the query should
include only the requested information, with no extraneous fields.
In particular, you should exclude the _id
field from the results
unless the problem indicates otherwise.
You do not need to worry about the order of the fields in the results, nor the places in which line breaks or spaces appear.
Unless the problem indicates otherwise, you may only use aspects of the MongoDB query language that we discussed in lecture.
Your queries should only use information provided in the problem itself. In addition, they should work for any MongoDB database that follows the schema that we discussed in lecture.
Make sure to follow the guidelines above.
In Problem Set 1, you wrote a SQL query to find information about two Oscar nominees born overseas: Sebastian Stan and Felicity Jones. Write a MongoDB query to find the same information. The result of your query should be documents that each contain the name and place of birth of one of those people.
In Problem Set 1, you wrote a SQL query to find information about Oscar-winning movies whose names begin with the word “A”. Write a MongoDB query to solve a similar problem – but instead of limiting the results to Oscar winners, your query should find all movies whose names begin with the word “A”. The result of your query should be documents that each contain the name and year of one of those movies.
As in PS 1, your query should employ pattern-matching with an appropriate pattern. Make sure that you only include movies with names in which the very first word is “A”. It is not enough for the movie’s name to begin with the letter “A”; it must begin with the word “A”.
Write a MongoDB query to find the names of people who won a Best Director Oscar in the 1990s (i.e., an Oscar of type “BEST-DIRECTOR” in the 10 years from 1990 to 1999). If a given director won multiple Best Director Oscars during that time period, they should appear only once in your results. You should assume that the names of the relevant people are unique.
Hints:
You should use the single-purpose aggregation method called
distinct
that we covered covered in lecture, not an
aggregation pipeline. This method will ensure that a given
director name appears at most once in the results. Recall that
the distinct
method produces an array of values – in this
case, an array of strings.
Make sure that you start with the correct collection of documents!
You will need to use dot notation for the name of the field
that you pass into the distinct
method, and therefore you
will need to surround that field name with quotes.
Because year values are integers, you will not be able to perform pattern matching on them. Instead, you will need a combination of two inequalities.
In Problem Set 1, you wrote a SQL query to find the shortest top-grossing movie (i.e., the movie with an earnings rank of 200 or less whose runtime is the smallest). Write a MongoDB query to solve the same problem. You may assume that there is a single shortest top-grosser, and thus you don’t need to worry about breaking ties. The result of your query should be a single document containing the name and runtime of the relevant movie.
Hint: You will need to use an aggregation pipeline.
In Problem Set 1, you wrote a SQL query to find find all movies that have won at least 4 of the Oscars in the database. Write a MongoDB query to solve a similar problem. The result of your query should be documents with the following fields:
one called name
for the name of the movie
one called num_oscars
for the number of Oscars the movie
has won
one called awards
whose value is an array containing the
types of awards that the movie has won. If a given movie
won the same award multiple times (e.g., if it had two directors
and they both won Best Director), the type should show up
only once in the array. Note that this means that the
value of num_oscars
may not always be the same as the
number of values in the awards
array.
Sort the results in descending order by the number of Oscars won. If multiple movies are tied for number of Oscars, sort them in ascending order by name.
Hints:
Remember that the order of the fields within a given document doesn’t matter.
For the purposes of this problem, you should assume that all movies that have won an Oscar have a unique name.
In lecture, we mentioned an accumulator called $addToSet
that can be used to construct an array of values. It will be
useful for this problem.
Write a MongoDB query to determine how people in our database were
born in Italy from 1960 to the present. You should use the
single-purpose aggregation method called count
that we covered
covered in lecture, not an aggregation pipeline, and thus
the result of your query will be a single number.
You should assume that all people born in Italy have the string “Italy” at the very end of their place of birth. Make sure that you don’t include people that have “Italy” at some other position within their place of birth.
Hints/notes:
You should ignore the “DeprecationWarning” message that you
will receive when you use the count
method.
Date of birth values are stored as strings, and all people born from 1960 to the present have a date of birth that is greater than or equal to “1960-01-01”.
Now let’s find out more about the people in our database who were born in Italy. This time, you should focus on all people born in Italy, not just those born since 1960. Use a MongoDB aggregation pipeline to produce a single document with the following fields:
one called num_italians
for the number of people born
in Italy (see the previous problem for the assumptions
you should make about the places of birth of these people)
one called oldest_dob
for the date of birth of the Italian
who was born the longest time ago
one called newest_dob
for the date of birth of the Italian
who was born closest to today.
In Problem Set 1, you wrote a SQL query to find, for each movie rating, the shortest runtime, longest runtime and average runtime of the movies with that rating. Write a MongoDB query to solve a similar problem.
The result of your query should be documents with the following fields:
one called rating
for the value of the rating being
summarized (see below for important guidelines about these
ratings)
one called num_movies
for the number of
movies with that rating
one called shortest
for shortest runtime of movies
with that rating
one called longest
for longest runtime of movies
with that rating
one called average
for average runtime of movies
with that rating.
Important: As was the case for the comparable problem in PS 1, you should not include:
movies without a rating; to exclude them, you can make use of the fact that the MongoDB movie database omits the rating field if a movie doesn’t have a rating
any rating that is associated with fewer than 10 of the movies in our database.
In Problem Set 1, you wrote a SQL query to determine how many movies that won Best Picture did not win any of the other Oscars in our database. Write a MongoDB query to solve a similar problem: finding the names of those movies. The result of your query should be documents that each contain a single field called name who value is the name of one of the Best-Picture winners that didn’t win any of the other Oscars in our database.
Hint: In MongoDB, this problem is simpler than it might seem! Think about how you could use the pipeline stages and operators that we’ve covered to find all movies whose only Oscar is of type “BEST-PICTURE”.
Which actors are associated with the most profitable movies? Write a MongoDB query that focuses on all actors who have appeared in at least 5 of the top-grossing movies (movies with an earnings rank of 200 or less), and that finds the 15 actors from that group of actors whose top-grossing movies have the smallest average earnings rank. The result of your query should be documents with the following fields:
one called money_maker
for the name of the person
one called num_top_grossers
for the number of top-grossing
movies the person has appeared in
one called avg_ranking
whose value is the average earnings
rank of the top-grossing movies in which the person has appeared.
Sort the results in ascending order by the average ranking of the movies. If multiple people are tied for average ranking, sort them in descending order by the number of top grossers. If they are still tied, sort them in ascending order by name.
For the purposes of this query, you may assume that person names are unique.
Login to Gradescope by clicking the link in the left-hand navigation bar, and click on the box for CS 460.
Submit your ps5_queries.py
file using these steps:
Click on PS 5: Part II in the list of assignments. You should see a pop-up window with a box labeled DRAG & DROP. (If you don’t see it, click the Submit or Resubmit button at the bottom of the page.)
Add your file to the box labeled DRAG & DROP. You can either drag and drop the file from its folder into the box, or you can click on the box itself and browse for the file.
Click the Upload button.
You should see a box saying that your submission was successful.
Click the (x)
button to close that box.
The Autograder will perform some tests on your file. Once it is done, check the results to ensure that the tests were passed. If one or more of the tests did not pass, the name of that test will be in red, and there should be a message describing the failure. Based on those messages, make any necessary changes. Feel free to ask a staff member for help.
Notes:
You should see results for each query. If you don’t see any results for a given query, it probably means that you have a syntax or logic error in your query, and you should attempt to fix it and resubmit.
You should keep making changes as needed until you get full credit for a given query. There will be no partial credit awarded for an incorrect query.
Make sure that each query is logically correct, and that it will work for any instance of the movie database that follows the data model outlined in lecture. We reserve the right to ultimately run your queries on a slightly different version of the database to ensure that your queries are logically correct.
If needed, use the Resubmit button at the bottom of the page to resubmit your work.
Near the top of the page, click on the box labeled Code. Then click on the name of your file to view its contents. Check to make sure that the file contains the work that you want us to grade.
Important
It is your responsibility to ensure that the correct version of a file is on Gradescope before the final deadline. We will not accept any file after the submission window for a given assignment has closed, so please check your submission carefully using the steps outlined above.
If you are unable to access Gradescope and there is enough
time to do so, wait an hour or two and then try again. If you
are unable to submit and it is close to the deadline, email
your homework before the deadline to
cs460-staff@cs.bu.edu
Last updated on April 25, 2025.