Part I due by 11:59 p.m. on Tuesday, February 4, 2025
Part II due by 11:59 p.m. on Tuesday, February 18, 2025
If you haven’t already done so, you should complete Lab 0 before beginning this assignment.
In your work on this assignment, make sure to abide by the collaboration policies of the course.
If you have questions, please come to office hours, post them on
Piazza, or email cs460-staff@cs.bu.edu
.
Make sure to submit your work on Gradescope, following the procedures found at the end of Part I and Part II.
30 points total
If you haven’t already created a folder named cs460
for your
work in this course, follow these
instructions to do so.
Then create a subfolder called ps1
within your cs460
folder,
and put all of the files for this assignment in that folder.
The problems from Part I will all be completed in a single PDF file. To create it, you should do the following:
Access the template that we have created by clicking on this link and signing into your Google account as needed.
When asked, click on the Make a copy button, which will save a copy of the template file to your Google Drive.
Select File->Rename, and change the name of the file to
ps1_partI
.
Add your work for the problems from Part I to this file.
Once you have completed all of these problems, choose
File->Download->PDF document, and save the PDF file in your
ps1
folder. The resulting PDF file (ps1_partI.pdf
) is the one that
you will submit. See the submission guidelines at the end of Part I.
10 points total
Important
This problem, like most of the problems in this assignment, is individual-only. In general, unless a problem or section of an assignment is explicitly labeled as pair-optional, you should assume that it is individual-only. See our collaboration policies for details about both types of problems.
Before going any further, make sure to follow the instructions above to create your own copy of the Google Drive document that you should use for Part I of the assignment.
The ER diagram shown below is part of the design of a database that
includes information about actors and the parts that they perform in
plays, musicals, movies and TV shows. Each actor has a name and a unique id.
Each part can be uniquely identified by the combination of the id of the
corresponding show and the name of the role, and the database also keeps
track of how many lines of dialogue that each part has (the attribute called
line count
).
In the above version of the ER diagram, there are no constraints on the relationships between actors and parts.
In the ps1_partI
template that we’ve provided on Google Drive (see
above), we’ve included the beginnings of three separate versions of
this ER diagram.
(2 points) Edit the first version of the diagram, adding the connections needed to create an ER diagram whose only constraint is that every actor performs at most one (i.e., 0 or 1) part. To do so, you should:
Click on the diagam and then click the Edit link that appears below the diagram.
From the collection of eight connectors that we have provided below the diagram, select the appropriate connectors and use them to connect the two entity sets to the relationship set. Adjust the connectors as needed to make the connections.
Click the Save & Close button.
(2 points) Edit the second version of the diagram, adding the connections needed to create an ER diagram whose only constraint is that every actor performs exactly one part.
(2 points) Edit the third version of the diagram, adding the connections needed to create an ER diagram whose only constraints are (1) that every actor performs at least one part, and (2) that every part is performed by at most one actor.
(4 points) Consider your answer to part 1—the ER diagram for the situation in which every actor performs at most one part.
Is it possible to convert that diagram into a relational schema that has only two tables/relations, or would three be needed? Explain your answer briefly, and then specify the schema of the two or three tables.
Give the schema of each table in the form relation_name(attr_name1, attr_name2, ...).
Indicate the primary-key attribute(s) of each relation by underlining them.
12 points total
The ER diagram shown below in Figure 2-1 represents information that is to be stored in a database for a literary agency—i.e., a company that provides agents for authors.
Figure 2-1:
Authors can write books, for which they receive royalties, and they can be managed by agents. Books can be published by publishers.
(2 points) At least one of the relationship sets captures a many-to-one relationship. Which one(s)? In your answer, you should specify the direction of each such relationship (e.g., ________ is many-to-one from _________ to ___________).
(4 points) Fill in the table that we have provided to describe, for each relationship set in the diagram, any constraints on the relationships associated with that relationship set. Use words that describe the problem domain (e.g., Each course meets in at most one room...) rather than technical terminology.
(6 points) Transform this diagram into a relational schema by following the procedure discussed in lecture and fill in the table that we have provided. Here are some additional guidelines:
When appropriate, you should combine relations as discussed in lecture.
In the first column of the table, give the schema of each relation in the form relation_name(attr_name1, attr_name2, ...). Feel free to rename attributes as needed to make them easier to understand.
Indicate the primary-key attribute(s) of each relation by underlining them.
If a relation has one or more foreign-key attributes, specify them in the second column of the table, and specify which attribute each foreign key refers to. For example, if you were working with the MajorsIn relation from the university database covered in lecture, its foreign keys would be specified as follows:
8 points total; 2 pts. each part
We will finish the material needed for this problem in lecture on Wednesday, January 29.
Relation R has attributes a, b and c. Relation S has attributes a, b, and d. You are given the following instances of these relations:
Relation R
a |
b |
c |
---|---|---|
1 |
2 |
3 |
3 |
4 |
3 |
7 |
6 |
5 |
Relation S
a |
b |
d |
---|---|---|
2 |
3 |
1 |
3 |
4 |
9 |
7 |
6 |
4 |
For each of the following questions, use the Insert->Table menu option in Google Drive to insert an appropriately sized table for the answer, and then fill in the cells of the table with the correct values.
Make sure that each table includes a header row with appropriate column names.
Once you have completed Part I in Google Drive, choose
File->Download->PDF document, and save the resulting file
(ps1_partI.pdf
) in your ps1
folder.
Login to Gradescope by clicking the link in the left-hand navigation bar. When logging in, make sure that you use the School Credentials option and select Boston University.
Once you are in logged in, click on the box for CS 460. (If
you don’t see that box, email cs460-staff@bu.edu
ASAP and ask to
be added to the course on Gradescope. If the deadline is fast
approaching and you don’t have access to CS 460 on Gradescope,
email your PDF to cs460-staff@cs.bu.edu
before the
deadline.)
Click on the name PS 1: Part I in the list of assignments. You should see a pop-up window labeled Submit Assignment. (If you don’t see it, click the Submit or Resubmit button at the bottom of the page.)
Choose the Submit PDF option, and then click the Select PDF
button and find the ps1_partI.pdf
that you created in step 1.
Then click the Upload PDF button.
You should see a question outline along with thumbnails of the pages from your uploaded PDF. For each question in the outline:
As you do so, click on the magnifying glass icon for each page and doublecheck that the pages that you see contain the work that you want us to grade.
Once you have assigned pages to all of the problems in the question outline, click the Submit button in the lower-right corner of the window. You should see a box saying that your submission was successful.
You can use the Resubmit button at the bottom of the page to resubmit your work as many times as needed before the final deadline.
Important
It is your responsibility to ensure that the correct version of a file is on Gradescope before the final deadline. We will not accept any file after the submission window for a given assignment has closed, so please check your submission carefully using the steps outlined above.
If you are unable to access Gradescope and there is enough
time to do so, wait an hour or two and then try again. If you
are unable to submit and it is close to the deadline, email
your homework before the deadline to
cs460-staff@cs.bu.edu
70 points total
The problem domain for this assignment is movie trivia.
Before you get started, you should carefully read over the full description of the database. Please do so now!
Go to sqlitebrowser.org/dl.
Find the appropriate download link for your machine/operating system, and download and install that version of DB Browser for SQLite.
If you have a Mac, you should download the appropriate Mac .dmg file. Double-click on the file, which will open up a window. Then drag the application’s icon to your Applications folder.
If you have a Windows machine, you should download the Standard installer for 64-bit Windows (which is an .msi file). Double-click on the file to run the installer. You should be able to use all of the default options.
If you are unable to install DB Browser on your own machine, you can use it on the virtual desktop.
Download the following files into your ps1
folder:
If your browser doesn’t allow you to specify where the file should be saved, try right-clicking on the link above and choosing Save as... or Save link as..., which should produce a dialog box that allows you to choose the correct folder for the file.
Launch the program.
Click the Open Database button, and find and open the movie.sqlite
database file that you downloaded above.
To explore the schema of the database, click on the Database Structure tab, and then click on the arrows to the left of the table names.
To explore the contents of the tables, click on the Browse Data tab, and then choose the appropriate table from the drop-down menu.
Use the Execute SQL tab to perform queries on the database. Enter your SQL command in the space provided, and press F5 or Ctrl-R to run it. (There is also a small button that you can click; it has a triangular shape that looks like the Play button of a music player.)
ps1_queries.py
is a Python file, so you could use a Python IDE
to edit it, but a regular text editor like TextEdit or Notepad++
would also be fine. However, if you use a text editor, you must
ensure that you save it as a plain-text
file.
Construct the SQL commands needed to solve the problems given below. Test each SQL command in DB Browser for SQLite to make sure that it works.
Once you have finalized the SQL command for a given problem, copy
the command into your ps1_queries.py
file, putting it between
the triple quotes provided for that problem’s variable. We have
included a sample query to show you what the format of your
answers should look like.
Unless otherwise stated, each of the problems must be solved by means of a single query (i.e., each query should have a single semi-colon). Use nested subqueries in the WHERE clause as needed. However, you should NOT use nested subqueries in the SELECT clause or FROM clause.
You must limit yourself to aspects of SQL that we have discussed in lecture, unless the problem indicates otherwise. At a minimum, failure to do so will result in a score of 0 for the corresponding problem.
Your queries should not use a LIMIT
clause.
Your queries should only use the keyword DISTINCT
if it is
necessary to obtain the correct results. Similarly, you should
only use an outer join when it is strictly necessary.
Make sure that the results produced by your queries contain exact answers to the problems. We should not have to infer the answer from the results. For example, if we ask you how many movies meet a given criterion, we want a number, not a list of the movies. In addition, your results should not include any extraneous information.
5 points
Make sure to read the important guidelines above before you get started!
A number of this year’s Oscar nominees were born overseas, including Sebastian Stan (nominated for Best Actor for his work in The Apprentice) and Felicity Jones (nominated for Best Supporting Actress for The Brutalist). Write a query to determine where these two actors were born. The result of the query should be tuples of the form (name of person, place of birth).
Hint: If your initial query does not produce any results, you may
want to reconsider the logical operator (AND
, OR
, NOT
) that you
are using in your WHERE
clause.
5 points
Two of the movies nominated for Oscars this year have names that begin with the word “A”: A Complete Unknown and A Complete Pain. Our current database doesn’t include either of those movies yet, but it does include a number of other movies whose names begin with the word “A”.
Write a query to find the names of all Oscar-winning movies in the database that begin with the word “A”, and the years in which those movies were released. A movie should be considered Oscar-winning if it is associated with any of the Oscars in the database.
Your query should employ pattern-matching with an appropriate pattern. Make sure that you only include movies with names in which the very first word is “A”. It is not enough for the movie’s name to begin with the letter “A”; it must begin with the word “A”.
The result of your query should be tuples of the form (movie name, movie year), and a given movie should appear at most once in the results.
5 points
Two of this year’s Best Picture nominees are international: the movie Emilia Perez from France and the movie I’m Still Here from Brazil. Although our database doesn’t include a movie’s country of origin, one sign that a movie may be international is if its director was born outside of the United States.
Write a query to find the names of all Best Picture winners directed by a foreign-born director, along with the name of the director and the year in which the movie won Best Picture. The result of the query should be tuples of the form (movie name, director name, year of Best Picture win).
Hint: You should assume that all foreign-born directors have a
non-null pob
value that does not end with the string 'USA'
.
5 points
Write a query to find the shortest top-grossing movie (i.e., the movie with an earnings rank of less than or equal to 200 whose runtime is the smallest). If two or more top grossers are tied for the shortest runtime, include them all. The result of your query should be tuples of the form (movie name, year, runtime).
Important
For full credit, your answer must use a subquery. Although SQLite provides a way to solve this problem without a subquery, that technique is not standard SQL, and thus it should not be used as part of your answer.
5 points
Isabella Rossellini is an Italian actress who has been nominated for Best Supporting Actress for her role in Conclave. Write a query to determine the number of people born in Italy who have won any of the Oscars in our database. The result of the query should be a single number.
Hint: You should assume that all people born in Italy have a pob
value that ends with the string 'Italy'
.
5 points
This year, The Brutalist has been nominated for 5 of the 6 types of Oscars that we store in our database (Best Picture, Best Actor, Best Supporting Actor, Best Supporting Actress, and Best Director). Write a query to find all movies that have won at least 4 of the Oscars in the database. The results of your query should be tuples of the form (movie name, number of Oscars, year in which the Oscars were won). Sort the results in descending order by the number of Oscars won. If multiple movies are tied for number of Oscars, sort them in ascending order by name.
Notes:
There are movies in the database that have the same name, but you should assume that the combination of name and year (both movie name and movie year and movie name and Oscar year) is unique.
When creating subgroups, you are allowed to use more than one attribute as the basis of the subgroups.
The year in which a movie wins an Oscar is typically one year later than the year in which the movie was released. For example, a movie that wins an Oscar in 2025 was released in 2024.
If a movie wins two Oscars of the same type (e.g., two Best Director Oscars), each of those Oscars should contribute to the count for that movie.
5 points
Most but not all of the older movies in our database were included because they won one or more Oscars. Write a query that determines how many Oscars were won by each movie in the database released before 1940. The result of the query should be tuples of the form (movie name, number of Oscars). If a movie has won no awards, it should still appear in the table with a value of 0 for the number of Oscars.
Notes:
You may assume that all movies before 1940 have a unique name.
The last two notes in the previous problem also apply here.
5 points
It seems likely that some movie ratings should tend to have shorter movies than others. For example, you might guess that G-rated movies – which are often aimed at children – would typically be shorter than R-rated ones. Write a query to determine, for each movie rating, the shortest runtime, longest runtime and average runtime of the movies with that rating. You should exclude movies with a NULL rating, as well as any movie rating that is associated with fewer than 10 of the movies in our database. The result of the query should be tuples of the form (rating, shortest runtime, longest runtime, average runtime).
5 points
The Oscars have been awarded for over 90 years, but in their early
years, there were no awards for supporting actor or supporting
actress. Write a query that lists the winners of the Oscars in our
database from the 1935 Academy Awards – 90 years ago. The result
of your query should be four tuples of the form (award type, person name,
movie name). In the case of the Best Picture winner, the person name
should be NULL
. Sort the results by the type of the Oscar.
Hint: You will need to take special steps to deal with the fact that
Oscar
tuples for Best Picture winners have a NULL
value for
person_id
.
5 points
Movies that win Best Picture often win other Oscars as well. For example, last year’s Best Picture winner, Oppenheimer, also won a number of other awards, including Best Actor, Best Supporting Actor, and Best Director. Write a query that determines how many Best Picture winners did not win any of the other Oscars in our database. The result of your query should be a single number.
5 points
Write a query to determine how many of the movies that have won an Oscar have runtimes that are longer than the average runtime of all of the movies in the database. The result of your query should be a single number.
5 points
The classic movie Fargo was released in 1996 and won multiple Oscars, including Best Picture. It was directed by Joel Coen and Ethan Coen, who are brothers. Both of the Coen brothers are in our database, as is the movie Fargo. However, our database only credits Joel Coen as the director of this movie. It omits the fact that Ethan Coen also directed it.
Write a SQL command that updates our database to include the fact that Ethan Coen is one of the directors of Fargo.
Hints:
This is the only question in which your answer will not be a SELECT command. You will need to figure out the correct type of SQL command to use.
You will need to begin by using one or more SELECT commands to obtain the information needed to capture the relationship between Ethan Coen and Fargo. For other problems in Part II, using preliminary queries to look up information and then using it as part of your final command isn’t allowed, but it’s necessary to do so for this problem.
The preliminary queries that you perform for this problem should not be included in your solution. Rather, your answer should be the single SQL command that adds the relationship between Ethan Coen and Fargo to our database.
10 points total; pair-optional
This is the only problem of the assignment that you may complete with a partner. See the rules for working with a partner on pair-optional problems for details about how this type of collaboration must be structured.
Getting started
Your work on this problem should go in a separate PDF file called
ps1_problem16.pdf
. To create it, you should:
Access the template that we have created by clicking on this link and signing into your Google account as needed.
When asked, click on the Make a copy button, which will save a copy of the template file to your Google Drive.
Select File->Rename, and change the name of the file to
ps1_problem16
.
Add your work for this problem to this file.
When you are finished, choose File->Download->PDF document, and
save the PDF file on your machine. The resulting PDF file
(ps1_problem16.pdf
) should be submitted following the
instructions below.
The problems
Write a relational-algebra query for each of the following
problems. Above, you will write a SQL query for these problems;
here you should solve them using relational algebra instead.
See below for some additional guidelines.
(3 points) problem 8 (Italian Oscar winners), but instead of producing a count, your query should produce the names of the people. For the purposes of this problem, you may assume that person names are unique and that relational algebra includes the same pattern-matching operators as SQL.
(4 points) problem 10 (Oscars for old movies), but instead of
producing counts, your query should produce tuples of the form
(movie name, award type). If a given movie won multiple awards, it
should have one tuple for each award; if it did not win any of the
awards in the database, it should have a single tuple in which the
second value of the tuple is NULL
.
(3 points) problem 13 (Only Best Picture), but instead of
producing a count, your query should produce the ids of the
movies. Hint: Take advantage of the set difference operator
(-
).
Additional guidelines
To form your queries, you should use the templates that we have
provided for each of the relational-algebra operators at the top
of ps1_problem16
. Copy the appropriate template(s), and replace
the placeholders with the appropriate expressions, attributes
and/or predicates to form each query. We have provided an example
in ps1_queries
of using two of the templates to form a single
query.
For a given problem, you are allowed to perform a sequence of two
or more queries in which the results of a given query are assigned
to a variable using the assignment operator (<–), and then that
variable is used in a subsequent query. We have provided an
example of this in ps1_queries
, and we recommend that you do
this whenever a given query would otherwise become too long or
difficult to read.
Note that using a sequence of queries is not allowed in your SQL answers, but it is allowed here.
You will make submissions to two separate assignments on Gradescope. The steps needed for the two submissions are different, so please make sure to follow carefully the procedures outlined below.
PS 1: Problems 4-15
Submit your ps1_queries.py
file using these steps:
Click on the name of the assignment in the list of assignments. You should see a pop-up window with a box labeled DRAG & DROP. (If you don’t see it, click the Submit or Resubmit button at the bottom of the page.)
Add your file to the box labeled DRAG & DROP. You can either drag and drop the file from its folder into the box, or you can click on the box itself and browse for the file.
Click the Upload button.
You should see a box saying that your submission was successful.
Click the (x)
button to close that box.
The Autograder will perform some tests on your file. Once it is done, check the results to ensure that the tests were passed. If one or more of the tests did not pass, the name of that test will be in red, and there should be a message describing the failure. Based on those messages, make any necessary changes. Feel free to ask a staff member for help.
Notes:
You will not see a complete Autograder score when you submit. That is because additional tests for at least some of the problems will be run later, after the final deadline for the submission has passed. For such problems, it is important to realize that passing all of the initial tests does not necessarily mean that you will ultimately get full credit on the problem. You should always run your own tests to convince yourself that the logic of your solutions is correct.
If you get a message saying that the Autograder failed to
execute correctly, it is likely the case that one of your
queries is taking too long to execute. Make sure that you have
checked each query in DB Browser to ensure that it executes
correctly, and that you have copied the correct query into
your ps1_queries.py
file. If you have a query that is
causing DB Browser to hang, you should either remove that
query from your file before submitting it, or you should
comment it out by putting a #
symbol at the beginning of
each of line of the query.
If needed, use the Resubmit button at the bottom of the page to resubmit your work.
Near the top of the page, click on the box labeled Code. Then click on the name of your file to view its contents. Check to make sure that it contains the queries that you want us to grade.
PS 1: Problem 16
IMPORTANT: If you chose to work on problem 16 with a partner, only one person from the pair should submit the file, and that person should add the other person as a group member following step 6 below.
Submit your ps1_problem16.pdf
file using these steps:
If you still need to create a PDF file, open your file
on Google Drive, choose File->Download->PDF document, and
save the PDF file in your ps1
folder.
Click on the name of the assignment in the list of assignments on Gradescope. You should see a pop-up window labeled Submit Assignment. (If you don’t see it, click the Submit or Resubmit button at the bottom of the page.)
Choose the Submit PDF option, and then click the Select PDF button and find the PDF file that you created. Then click the Upload PDF button.
You should see a question outline along with thumbnails of the pages from your uploaded PDF. For each question in the outline:
As you do so, click on the magnifying glass icon for each page and doublecheck that the pages that you see contain the work that you want us to grade.
Once you have assigned pages to all of the questions in the question outline, click the Submit button in the lower-right corner of the window. You should see a box saying that your submission was successful.
If you worked with a partner and you are the one who is submitting the file:
Click on the Group Members link that appears at the bottom of the page.
In the pop-up box that appears, click on the Add Student drop-down menu.
Type your partner’s name or choose it from the drop-down menu.
Click the Add button.
Check to ensure that your partner’s name now appears along with your name under the Group heading in the upper-right corner of the page.
Important
It is your responsibility to ensure that the correct version of every file is on Gradescope before the final deadline. We will not accept any file after the submission window for a given assignment has closed, so please check your submissions carefully using the steps outlined above.
If you are unable to access Gradescope and there is enough
time to do so, wait an hour or two and then try again. If you
are unable to submit and it is close to the deadline, email
your homework before the deadline to
cs460-staff@cs.bu.edu
Last updated on February 27, 2025.