Part I due by 11:59 p.m. on Tuesday, February 3, 2026
Part II due by 11:59 p.m. on Tuesday, February 10, 2026
If you haven’t already done so, you should complete Lab 0 before beginning this assignment.
In your work on this assignment, make sure to abide by the collaboration policies of the course.
If you have questions, please come to office hours, post them on
Piazza, or email cs460-staff@cs.bu.edu.
Make sure to submit your work on Gradescope, following the procedures found at the end of Part I and Part II.
30 points total
If you haven’t already created a folder named cs460 for your
work in this course, follow these
instructions to do so.
Then create a subfolder called ps1 within your cs460 folder,
and put all of the files for this assignment in that folder.
Recommended: Configure your browser so that it asks you where each downloaded file should be saved. Both Chrome and Edge allow you to do this as follows:
The problems from Part I will all be completed in a single PDF file. To create it, you should do the following:
Make sure you are signed into your BU Google Drive account. Note that signing into BU Google Mail should accomplish the same thing.
Access the template that we have created by clicking on this link.
When asked, click on the Make a copy button, which will save a copy of the template file to your Google Drive.
Select File->Rename, and change the name of the file to
ps1_partI.
Add your work for the problems from Part I to this file.
Once you have completed all of these problems, choose
File->Download->PDF document, and save the PDF file in your ps1
folder. The resulting PDF file (ps1_partI.pdf) is the one that you
will submit. See the submission guidelines at the end of Part I.
10 points total
Important
This problem, like most of the problems in this assignment, is individual-only. In general, unless a problem or section of an assignment is explicitly labeled as pair-optional, you should assume that it is individual-only. See our collaboration policies for details about both types of problems.
Before going any further, make sure to follow the instructions above to create your own copy of the Google Drive document that you should use for Part I of the assignment.
The ER diagram shown below is part of the design of a database that
includes information about actors and the parts that they perform in
plays, musicals, movies and TV shows. Each actor has a name and a
unique id. Each part can be uniquely identified by the combination of
the id of the corresponding show and the name of the role, and the
database also keeps track of how many lines of dialogue that each part
has (the attribute called line count).

In the above version of the ER diagram, there are no constraints on the relationships between actors and parts.
In the ps1_partI template that we’ve provided on Google Drive (see
above), we’ve included the beginnings of three separate versions of
this ER diagram.
(2 points) Edit the first version of the diagram, adding the connections needed to create an ER diagram whose only constraint is that every actor performs at least one (i.e., 1 or more) part. To do so, you should:
Click on the diagam and then click the Edit link that appears below the diagram.
From the collection of eight connectors that we have provided below the diagram, select the appropriate connectors and use them to connect the two entity sets to the relationship set. Adjust the connectors as needed to make the connections.
Click the Save & Close button.
(2 points) Edit the second version of the diagram, adding the connections needed to create an ER diagram whose only constraint is that every part is performed by exactly one actor.
(2 points) Edit the third version of the diagram, adding the connections needed to create an ER diagram whose only constraints are (1) that every actor performs at most one part, and (2) that every part is performed by at least one actor.
(4 points) Consider your answer to part 1—the ER diagram for the situation in which every actor has at least one part.
Is it possible to convert that diagram into a relational schema that has only two tables/relations, or would three be needed? Explain your answer briefly, and then specify the schema of the two or three tables.
Give the schema of each table in the form relation_name(attr_name1, attr_name2, ...).
Indicate the primary-key attribute(s) of each relation by underlining them.
12 points total
The ER diagram shown below in Figure 2-1 represents a music database – one that stores information about recording artists, recording labels (e.g., EMI or Sony), songs, and albums.
Figure 2-1:

Note that the diagram includes relationship sets for four types of relationships:
Answer the following questions about this diagram:
(2 points) At least one of the relationship sets captures a many-to-one relationship. Which one(s)? In your answer, you should specify the direction of each such relationship (e.g., ________ is many-to-one from _________ to ___________).
(4 points) Fill in the table that we have provided to describe, for each relationship set in the diagram, any constraints on the relationships associated with that relationship set. Use words that describe the problem domain (e.g., Each course meets in at most one room...) rather than technical terminology.
(6 points) Transform this diagram into a relational schema by following the procedure discussed in lecture and fill in the table that we have provided. Here are some additional guidelines:
When appropriate, you should combine relations as discussed in lecture. Note: You should assume that most artists belong to a label.
In the first column of the table, give the schema of each
relation in the form relation_name(attr_name1, attr_name2,
...). Feel free to rename attributes as needed to make them
easier to understand. However, if you do so, make sure that
you do not use a period or space in an attribute name.
Rather, you should use a combination of letters and underscore
(_) characters.
Indicate the primary-key attribute(s) of each relation by underlining them.
If a relation has one or more foreign-key attributes, specify them in the second column of the table, and specify which attribute each foreign key refers to. For example, if you were working with the MajorsIn relation from the university database covered in lecture, its foreign keys would be specified as follows:
8 points total; 2 pts. each part
We will finish the material needed for this problem in lecture on Friday, January 30.
Relation R has attributes a, b and c. Relation S has attributes d, a, and b. You are given the following instances of these relations:
Relation R
|
a |
b |
c |
|---|---|---|
|
1 |
2 |
3 |
|
3 |
4 |
6 |
|
5 |
6 |
1 |
Relation S
|
d |
a |
b |
|---|---|---|
|
3 |
2 |
1 |
|
5 |
3 |
4 |
|
6 |
1 |
2 |
For each of the following questions, use the Insert->Table menu option in Google Drive to insert an appropriately sized table for the answer, and then fill in the cells of the table with the correct values.
Make sure that each table includes a header row with appropriate column names.
Once you have completed Part I in Google Drive, choose
File->Download->PDF document, and save the resulting file
(ps1_partI.pdf) in your ps1 folder.
Login to Gradescope by clicking the link in the left-hand navigation bar. When logging in, make sure that you use the School Credentials option and select Boston University.
Once you are in logged in, click on the box for CS 460. (If
you don’t see that box, email cs460-staff@bu.edu ASAP and ask to
be added to the course on Gradescope. If the deadline is fast
approaching and you don’t have access to CS 460 on Gradescope,
email your PDF to cs460-staff@cs.bu.edu before the
deadline.)
Click on the name PS 1: Part I in the list of assignments. You should see a pop-up window labeled Submit Assignment. (If you don’t see it, click the Submit or Resubmit button at the bottom of the page.)
Choose the Submit PDF option, and then click the Select PDF
button and find the ps1_partI.pdf that you created in step 1.
Then click the Upload PDF button.
You should see a question outline along with thumbnails of the pages from your uploaded PDF. For each question in the outline:
As you do so, click on the magnifying glass icon for each page and doublecheck that the pages that you see contain the work that you want us to grade.
Once you have assigned pages to all of the problems in the question outline, click the Submit button in the lower-right corner of the window. You should see a box saying that your submission was successful.
You can use the Resubmit button at the bottom of the page to resubmit your work as many times as needed before the final deadline.
Important
It is your responsibility to ensure that the correct version of a file is on Gradescope before the final deadline. We will not accept any file after the submission window for a given assignment has closed, so please check your submission carefully using the steps outlined above.
If you are unable to access Gradescope and there is enough
time to do so, wait an hour or two and then try again. If you
are unable to submit and it is close to the deadline, email
your homework before the deadline to
cs460-staff@cs.bu.edu
70 points total
We will complete the material needed for this part of the assignment during the week of February 2.
The problem domain for this part of the assignment is movie trivia.
Before you get started, you should carefully read over the full description of the database. Please do so now!
If you haven’t already done so, download and install DB Browser for SQLite by following these instructions.
Download the following files into your ps1 folder:
If your browser doesn’t allow you to specify where the file should be saved, try right-clicking on the link above and choosing Save as... or Save link as..., which should produce a dialog box that allows you to choose the correct folder for the file.
Launch the program.
Click the Open Database button, and find and open the movie.sqlite
database file that you downloaded above.
To explore the schema of the database, click on the Database Structure tab, and then click on the arrows to the left of the table names.
To explore the contents of the tables, click on the Browse Data tab, and then choose the appropriate table from the drop-down menu.
Use the Execute SQL tab to perform queries on the database. Enter your SQL command in the space provided, and press F5 or Ctrl-R to run it. (There is also a small button that you can click; it has a triangular shape that looks like the Play button of a music player.)
ps1_queries.py is a Python file, so you could use a Python IDE
to edit it, but a regular text editor like TextEdit or Notepad++
would also be fine. However, if you use a text editor, you must
ensure that you save it as a plain-text
file.
Construct the SQL commands needed to solve the problems given below. Test each SQL command in DB Browser for SQLite to make sure that it works.
Once you have finalized the SQL command for a given problem, copy
the command into your ps1_queries.py file, putting it between
the triple quotes provided for that problem’s variable. We have
included a sample query to show you what the format of your
answers should look like.
Unless otherwise stated, each of the problems must be solved by means of a single query (i.e., each query should have a single semi-colon). Use nested subqueries in the WHERE clause as needed. However, you should NOT use nested subqueries in the SELECT clause or FROM clause.
You must limit yourself to aspects of SQL that we have discussed in lecture, unless the problem indicates otherwise. At a minimum, failure to do so will result in a score of 0 for the corresponding problem. See our course policies for more detail.
Your queries should not use a LIMIT clause.
Your queries should only use the keyword DISTINCT if it is
necessary to obtain the correct results. Similarly, you should
only use an outer join when it is strictly necessary.
Make sure that the results produced by your queries contain exact answers to the problems. We should not have to infer the answer from the results. For example, if we ask you how many movies meet a given criterion, we want a number, not a list of the movies. In addition, your results should not include any extraneous information.
5 points
Important
Make sure to read the important guidelines above before you get started! In particular, note that you may only use aspects of SQL that we have discussed in lecture, and you should NOT use nested subqueries in the SELECT clause or FROM clause.
Jessie Buckley and Chloe Zhao have both been nominated for an Oscar for their work in the movie Hamnet – Buckley for Best Actress and Zhao for Best Director. Write a single query to find the places of birth and dates of birth for these two people. The result of the query should be tuples of the form (name of person, place of birth, date of birth).
Hint: If your initial query does not produce any results, you may
want to reconsider the logical operator (AND, OR, NOT) that you
are using in your WHERE clause.
5 points
One of this year’s nominees for Best Picture is the horror movie
Sinners. Have there been any previous horror movies that won Best
Picture? Write a query to find out! You should assume that any horror
movie has the letter 'H' somewhere in the value of its genre
attribute. The results of your query should be tuples of the form
(movie name, year in which the Oscar was awarded).
5 points
Jacobi Jupe is a 12-year old British actor who plays the role of
Hamnet in the movie of the same name. Jupe hasn’t been added to our
database yet, but once he is, he will be the youngest person from the
UK in the database. Write a query to find the name and date of birth
of the current youngest person from the UK – i.e., the one whose date
of birth is largest. You should assume that all people from the UK
have the letters 'UK' at the very end of their place of birth. If
multiple people are tied for being the youngest, your query should
include all of them. The results of your query should be tuples of the
form (person name, date of birth).
Important
For full credit, your answer must use a subquery. Although SQLite provides a way to solve this problem without a subquery, that technique is not standard SQL, and thus it should not be used as part of your answer.
5 points
How does the number of top-grossing movies (i.e., movies whose earnings rank is not null) change over time? Write a query that determines, for every year from 2015 until the present, the number of top-grossing movies from that year. The result of your query should be tuples of the form (year, number of top grossers for that year). Order the results from 2025 to 2015.
Important: Because of the pandemic, the year 2020 had no
top-grossing movies. However, you should still include that year
in your results with a count of 0. As a result, your query should
not include a predicate in the WHERE clause that excludes movies
with an earnings rank of NULL, because doing so would eliminate all
movies from 2020 – and thus 2020 would not show up in the results!
Instead, you should use the necessary aggregate function in such a way
that it won’t count movies whose earnings rank is NULL.
5 points
Which directors have been most successful at the box office? Write a query that finds all directors who have directed at least four of the 200 top-grossing films. The result of the query should consist of tuples of the form (director, number of top grossers). List them in order of the number of top-grossers they have directed, from the highest number to the lowest. If multiple directors have the same number of top grossers, sort them in ascending order by director name. You may assume that all directors in the database have a unique name.
5 points
Write a query that finds the number of foreign-born directors who have directed one or more movies in which a foreign-born actor or actress appeared. The result of your query should be a single number.
Hints:
'USA' at the very end of their pob field.NULL
value for pob.5 points
Jessie Buckley is from Ireland. Write a query that summarizes, for
each person in the database who was born in Ireland, the Oscars in the
database (if any) that have been won by that person. You should
assume that the relevant people have 'Ireland' at the very end of
their place of birth. The result of the query should be tuples of the
form (person name, Oscar type). If a relevant person has won no awards,
they should still have a single row in the results with a value of NULL
for the type.
5 points
Now let’s focus on directors who have been less successful at the box office. Write a query that finds the number of directors in the database who have not directed any of 200 top-grossing movies. The result of the query should be a single number.
5 points
Write a query that determines the number of movies in the database that are shorter than all of the movies that have been named Best Picture? Hint: You will need a subquery.
5 points
Write a query that determines, for each movie rating included in the database, the following values: the average runtime of all movies associated with that rating, and the number of Oscars (if any) won by movies with that rating. If a given rating has no Oscars associated with it, it should still be included in the results with a count of 0. You should exclude movies whose rating is NULL. The result of the query should be tuples of the form (rating, average runtime, number of Oscars won).
'One'5 points
The movie One Battle After Another has been nominated for a number
of Oscars. Write a query to determine the names and years of all
Oscar-winning movies whose names begin with 'One'. You should
use pattern-matching as needed, and make sure that you only include
movies in which the first word of the name is exactly 'One'. The
result of your query should be tuples of the form (movie name, year in
which the movie was released). If a given movie has won more than one
Oscar, it should appear only once in your results.
Hint: Think about how to construct a pattern that obtains movie
names whose first word is 'One' without getting ones whose first
word begins with 'One' (e.g., 'Onetime') or ones that have 'One'
somewhere after the very beginning.
5 points
The PG-13 rating is a relatively recent one. Write a query to find the oldest movie(s) in the database that have a PG-13 rating – i.e., the movie or movies with a PG-13 rating whose year is furthest from this year. The result of the query should be one or more tuples of the form (movie name, year).
Important
For full credit, your answer must use a subquery. Although SQLite provides a way to solve this problem without a subquery, that technique is not standard SQL, and thus it should not be used as part of your answer.
10 points total; pair-optional
This is the only problem of the assignment that you may complete with a partner. See the rules for working with a partner on pair-optional problems for details about how this type of collaboration must be structured.
Getting started
Your work on this problem should go in a separate PDF file called
ps1_problem16.pdf. To create it, you should:
Access the template that we have created by clicking on this link and signing into your Google account as needed.
When asked, click on the Make a copy button, which will save a copy of the template file to your Google Drive.
Select File->Rename, and change the name of the file to
ps1_problem16.
Add your work for this problem to this file.
When you are finished, choose File->Download->PDF document, and
save the PDF file on your machine. The resulting PDF file
(ps1_problem16.pdf) should be submitted following the
instructions below.
The problems
Write a relational-algebra query for each of the following
problems. Above, you will write a SQL query for these problems;
here you should solve them using relational algebra instead.
See below for some additional guidelines.
(3 points) problem 5 (horrorful Best Pictures). You may assume that relational algebra includes the same pattern-matching operators as SQL.
(4 points) problem 10 (Oscars for the Irish).
(3 points) problem 11
(directors with no top grossers), but instead of
producing a count, your query should produce the ids of the
directors. Hint: Take advantage of the set difference operator
(-).
Additional guidelines
To form your queries, you should use the templates that we have
provided for each of the relational-algebra operators at the top
of ps1_problem16. Copy the appropriate template(s), and replace
the placeholders with the appropriate expressions, attributes
and/or predicates to form each query. We have provided an example
in ps1_queries of using two of the templates to form a single
query.
For a given problem, you are allowed to perform a sequence of two
or more queries in which the results of a given query are assigned
to a variable using the assignment operator (<–), and then that
variable is used in a subsequent query. We have provided an
example of this in ps1_queries, and we recommend that you do
this whenever a given query would otherwise become too long or
difficult to read.
Note that using a sequence of queries is not allowed in your SQL answers, but it is allowed here.
Coming soon!
Last updated on February 5, 2026.