Both parts due by 11:59 p.m. on Tuesday, September 30, 2025.
In your work on this assignment, make sure to abide by the collaboration policies of the course.
If you have questions, please come to office hours, post them on
Piazza, or email cs460-staff@cs.bu.edu
.
Make sure to submit your work on Gradescope, following the procedures found at the end of Part I and Part II.
40 points total
Create a subfolder called ps2
within your
cs460
folder, and put all of the files for this assignment in that
folder.
This part of the assignment will all be completed in a single PDF file. To create it, you should do the following:
Access the template that we have created by clicking on this link and signing into your Google account as needed.
When asked, click on the Make a copy button, which will save a copy of the template file to your Google Drive.
Select File->Rename, and change the name of the file to
ps2_partI
.
Add your work for the problems from Part I to this file.
Once you have completed all of these problems, choose
File->Download->PDF document, and save the PDF file in your ps2
folder. The resulting PDF file (ps2_partI.pdf
) is the one that
you will submit. See the submission guidelines at the end of Part I.
19 points total
Recall the Person
table from our movie database in
PS 1. Assume that we are using a modified
version of that table with the following schema:
Person(id CHAR(7), name VARCHAR(25), dob CHAR(10), num_kids INTEGER)
where num_kids
is a new field for the number of children that the
person has. In addition, we have omitted the pob
field.
Consider the following tuple from that modified version of the table:
('1289434', 'Emily Blunt', '02/23/1983', 2)
(3 points) What would this tuple look like if we stored it in a
fixed-length record? In the 1.1 and 1.2 section of ps2_partI
(see above), put your answer in the table labeled record contents.
You should observe the following conventions:
Give each data value or metadata value its own cell of the table. Adjust the widths of the cells as needed to better fit the sizes of the values, and delete any cells that are not needed.
Use a number sign ('#'
) as a delimiter when it is necessary
to record the end of a variable-length field’s value.
Use hyphens ('-'
) for any “wasted” bytes (i.e, bytes that
are part of the record’s representation but are not actually
storing useful data or metadata).
To illustrate these conventions, imagine that we were working with
the Enrolled
table in our university database, which has the
following schema:
Enrolled(student_id CHAR(9), course_name VARCHAR(10), credit_status VARCHAR(10));
If we wanted take the tuple
('U00000006', 'CS 460', 'ugrad')
and show what it would look like using a fixed-length record, we would fill in the table as follows:
(2 points) What is the length in bytes of the record from part 1? Assume that we are using:
four-byte integer field values
one-byte characters – including any digit
characters that are part of a CHAR
or VARCHAR
.
Put your final answer in the box labeled length in bytes, and show your work in the box below the answer.
(3 points) What would this tuple look like if we stored it in a variable-length record in which each field is preceded by its length?
In the 1.3 and 1.4 section of ps2_partI
, put your answer in
the table labeled record contents.
In addition to the conventions that we specified for part 1, you should also give each metadata value its own cell of the table. Change the background color of cells containing metadata to distinguish them from cells containing actual data values. You can do so by using the icon that looks like a paint can in the menu bar at the top of Google Docs.
In addition to the assumptions about the sizes of characters and integers that we gave you in part 2, you should assume that integers used for metadata are two bytes long (not four bytes).
(2 points) What is the length in bytes of the record from part 3? Make the same assumptions stated in parts 2 and 3. Put your final answer in the box labeled length in bytes, and show your work in the box below the answer.
(4 points) What would this tuple look like if we stored it in a variable-length record that begins with a header of offsets?
In the 1.5 and 1.6 section of ps2_partI
, put your answer in
the table labeled record contents. Use the same conventions that
we specified for parts 1 and 3, and use the same assumptions about
the sizes of characters, integer field values, and integer metadata
that we gave you in parts 2 and 3.
Note: In your work on this problem set, you should assume that primary-key values are stored in the same way as any other field, as we showed in the lecture notes on record formats and in Task 1.3 of Lab 3. The lecture notes on marshalling (which we will start before this problem set is due) take a different approach when it comes to storing a primary key, but you should not use that approach on this problem set.
(2 points) What is the length in bytes of the record from part 5? Put your final answer in the box labeled length in bytes, and show your work in the box below the answer.
(3 points) Now consider the following Person
tuple:
('0048918', 'Sean Baker', NULL, 0)
The NULL
value for dob
reflects the fact that our database is
missing his date of birth.
What would this tuple look like if we stored it in a variable-length record that begins with a header of offsets?
In the 1.7 section of ps2_partI
, put your answer in
the table labeled record contents. You should use:
the approach to NULL
values that we took in lecture
the same conventions that we specified for parts 1 and 3
the same assumptions about the sizes of characters, integer field values, and integer metadata that we gave you in parts 2 and 3.
There is no separate length-computation question for this record.
21 points total; 7 points each part
Let’s say that you want to insert items with the following sequence of keys into a collection of records that uses some form of indexing:
6, 15, 8, 17, 11, 12, 2, 25, 20, 10, 30, 3, 27
Insert this key sequence into an initially empty B-tree of order 2.
In section 2.1 of ps2_partI
, show the tree after each
insertion that causes a split of one or more nodes, and the final
tree.
We have given you a sample diagram that includes nodes of different sizes. Make copies of the diagram so that you can use separate diagrams for the results of each insertion that causes a split, and for the final tree. Note that you do not need to keep the shape of the tree that we have given you. Rather, you should edit it as needed: deleting or adding nodes and edges, replacing the Xs with keys, adding or removing keys, and making whatever other changes are needed.
Insert this same key sequence into an initially empty B+tree (note
the +) of order 2. In section 2.2 of ps2_partI
, show the tree
after each insertion that causes a split of one or more nodes, and
the final tree. Here again, you should make copies of the diagram
that we have given you and edit them as needed.
Insert this same key sequence into a hash table that uses linear
hashing. In section 2.3 of ps2_partI
, use the tables that we
have provided to show the state of the table before and after each
increase in the number of buckets, as well as the final state of
the table.
Important details:
The table should use the hash function h(x) = x, and it should start out with two empty buckets.
Within a given bucket, please list the keys in the order in which they were inserted.
Assume that a bucket is added whenever the number of items in the table exceeds three times the number of buckets.
An item that causes the table to grow should appear in both the before and after tables for that increase. See the PS 2 FAQ for more details.
Login to Gradescope by clicking the link in the left-hand navigation bar. Once you are in logged in, click on the box for CS 460.
Submit your ps2_partI.pdf
file using these steps:
If you still need to create a PDF file, open your file
on Google Drive, choose File->Download->PDF document, and
save the PDF file in your ps2
folder.
Click on PS 2: Part I in the list of assignments on Gradescope. You should see a pop-up window labeled Submit Assignment. (If you don’t see it, click the Submit or Resubmit button at the bottom of the page.)
Choose the Submit PDF option, and then click the Select PDF
button and find the ps1_partI.pdf
that you created in step 1.
Then click the Upload PDF button.
You should see an outline of the problems along with thumbnails of the pages from your uploaded PDF. For each problem in the outline:
As you do so, click on the magnifying glass icon for each page and doublecheck that the pages that you see contain the work that you want us to grade.
Once you have assigned pages to all of the problems in the question outline, click the Submit button in the lower-right corner of the window.
You should see a box saying that your submission was successful.
Click the (x)
button to close that box.
You can use the Resubmit button at the bottom of the page to resubmit your work as many times as needed before the final deadline.
Important
It is your responsibility to ensure that the correct version of a file is on Gradescope before the final deadline. We will not accept any file after the submission window for a given assignment has closed, so please check your submission carefully using the steps outlined above.
If you are unable to access Gradescope and there is enough
time to do so, wait an hour or two and then try again. If you
are unable to submit and it is close to the deadline, email
your homework before the deadline to
cs460-staff@cs.bu.edu
60 points total
30 points total; 6 points each part
In this problem, you will write some additional SQL queries for the movie database that you worked with in PS 1.
Download the following file into your ps2
folder:
If your browser doesn’t allow you to specify where the file should be saved, try right-clicking on the link above and choosing Save as... or Save link as..., which should produce a dialog box that allows you to choose the correct folder for the file.
Once you have finalized the SQL command for a given problem, copy the
command into your ps2_queries.py
file, putting it between the
triple quotes provided for that problem’s variable. We have
included a sample query to show you what the format of your
answers should look like.
In your work on this problem, you should follow the same guidelines that we gave you in PS 1. In particular, note that you may only use aspects of SQL that we have discussed in lecture, and you should NOT use nested subqueries in the SELECT clause or FROM clause.
Make sure to read and follow the guidelines referred to above.
Many of the actors in our database were born in one of two states: New York and California. (This may stem from the fact that many actors live and raise families in those states, and children of actors often become actors themselves!) Write a query that finds the total number of actors born in those two states. The result of your query should be a single number.
Notes:
You should assume that all people born in New York state have the
string 'New York, USA'
at the end of their pob
string, and
all people born in California have the string 'California, USA'
at the end of their pob
.
You should consider someone an actor if they have acted in one or more of the movies in our database.
Write a query to find the youngest person in the database who was
born somewhere in New York state. You should use the assumption
mentioned above about the pob
values of such people. If there
are multiple people tied as the youngest person from New York
state, your query should find all of them. The results should be
one or more tuples of the form (name, date of birth).
Write a query that finds, for each movie rating in the database, the number of Oscars in our database associated with movies that have that rating. The result of the query should be tuples of the form (movie rating, number of Oscars), sorted in descending order by the number of Oscars. If a given rating is associated with none of the Oscars in the database, it should still appear in the results with a count of 0.
Notes:
Your query should exclude movies that have a rating of NULL.
It’s possible that there are no ratings in the database that are associated with 0 Oscars, but your query should still work correctly if there were such a rating.
Write a query to find all movie ratings that are associated with
none of the top-grossing movies – i.e., with none of the movies
that have a non-NULL value for earnings_rank
. The result of the
query should be one or more tuples, each of which contains a single
value that is one of the relevant ratings. Here again, you should
exclude movies that have a rating of NULL.
Write a query to find all people in the database who have won the same type of Oscar two years in a row. The result should be tuples with four attributes: the name of the person, the type of the Oscar that they won two years in a row, the year of first win, and the year of second win.
Hints:
FROM
clause to include multiple
instances of at least one of the tables in the database.WHERE
clause can include the standard
arithmetic operators (+
, -
, *
, /
, and %
). 30 points; pair-optional
This is the only problem of the assignment that you may complete with a partner. See the rules for working with a partner on pair-optional problems for details about how this type of collaboration must be structured.
In this problem, you will write several methods in Java that are related to the second type of variable-length record discussed in lecture: the one in which each record begins with a header of offsets.
Download the following file into your ps2
folder:
Problem4.java
If your browser doesn’t allow you to specify where the file should be saved, try right-clicking on the link above and choosing Save as... or Save link as..., which should produce a dialog box that allows you to choose the correct folder for the file.
Launch VS Code on your laptop.
In VS Code, select the File->Open Folder or File->Open menu
option, and use the resulting dialog box to find and open your
ps2
folder. (Note: You must open the folder; it is not
sufficient to simply open the Java file.)
The name of the folder should appear in the Explorer pane on the left-hand side of the VS Code window, along with a list of all of its contents.
Click on the name Problem4.java
in the Explorer pane, which
will open the file that you need to modify.
You will implement three static methods in the Problem4
class.
Important guidelines
We’ve provided the headers of the methods that you will implement. You must not change these headers in any way.
We have included an import
statement for the class called
Arrays
from the java.util
package. This will allow you to
use the built-in method called Arrays.toString
for testing
purposes. In addition, you may use the following basic Java
classes: String
, Integer
, Double
, System
and
IllegalArgumentException
. You must not use any other
classes or add any other import
statements. In addition,
you must not use any Java features that were not present in
Java 8.
We have included a main
method at the bottom of the class
that you can use for testing. Simply add the appropriate test
code to main
and run Problem4
to see if you obtain the
correct results.
Note: If you have trouble running Problem4
using the Run
link or Run button, you should be able to compile and run it
from the command line of the Terminal as follows:
to compile:
javac *.java
to run:
java Problem4
Here are your tasks:
Implement the method called computeOffsets()
whose header we
have provided. It takes an array called fieldVals
that contains
a sequence of field values, and it should create and return an
array of integers for the offsets that would appear at the
beginning of a variable-length record for those fields, based on
the conventions discussed in lecture.
The fieldVals
array is an array of type Object
, which means it
can store references to any type of object. For this problem,
valid elements of fieldVals
fall into one of four categories:
a reference to a String
object that represents a string field
value (e.g., a value of type VARCHAR
or a CHAR
in SQL)
a reference to an Integer
object that represents an integer
field value (e.g., a value of type INTEGER
in SQL)
a reference to a Double
object representing a floating-point
field value (e.g., a value of type REAL
in SQL)
the Java literal null
representing a field value of NULL
.
For example, if you run the following test code (adding it to the
main
method):
Object[] vals1 = {"012345", 200, 3.14, null}; int[] offsets1 = computeOffsets(vals1); System.out.println(Arrays.toString(offsets1));
you should see:
[10, 16, 20, -1, 28]
Important guidelines:
Your method should throw an IllegalArgumentException
for
any of the following cases:
the value of the parameter fieldVals
is null
the array that the parameter refers to has a length 0
one or more elements of the array are invalid because they don’t fall into one of the four categories of valid elements outlined above.
In this method and the other parts of this problem, you should take the same approach to computing the offsets that we used in the lecture notes on record formats and in Task 1.3 of Lab 3. In particular, you should:
Assume that we are using four-byte integer field values, two-byte integers for offsets, and one-byte characters.
Assume that floating-point field values are stored using eight-byte values.
Assume that the primary key is stored in the same way as any other field. As a result, you do not need to worry about which of the field values is the primary key.
Use the same approach to NULL
values that we took in
lecture.
You should determine what type of value a given field has by
using Java’s instanceof
operator. For example, consider
again the array from the test above:
Object[] vals1 = {"012345", 200, 3.14, null};
Given this definition:
The expression (vals1[0] instanceof String)
would evaluate
to true
, but (vals1[0] instanceof Integer)
and
(vals1[0] instanceof Double)
would both evaluate to false
.
The expression (vals1[1] instanceof Integer)
would evaluate
to true
, but (vals1[1] instanceof String)
and
(vals1[1] instanceof Double)
would both evaluate to false
.
The expression (vals1[2] instanceof Double)
would evaluate
to true
, but (vals1[2] instanceof String)
and
(vals1[1] instanceof Integer)
would both evaluate to
false
.
Implement the method called recordString()
whose header we have
provided. It takes an array that contains a sequence of field
values, and it should construct and return a string
representation of the variable-length record that would be
created for those fields. In the string that you return, the
components of the record (both offsets and field values) should be
surrounded by vertical bar (|
) characters.
For example, if you run the following test code (adding it to the
main
method):
Object[] vals1 = {"012345", 200, 3.14, null}; System.out.println(recordString(vals1));
you should see:
|10|16|20|-1|28|012345|200|3.14|
Important guidelines:
Your method must use your computeOffsets()
method to
determine the necessary offsets.
Remember that you are limited in the Java classes that you can
use. In particular, you should not use the StringBuilder
class. Instead, you can build the return value using string
concatenation.
Your method should throw an IllegalArgumentException
for the
same reasons mentioned in the previous method. However,
because your computeOffsets
method should already perform
the necessary error-checking, you shouldn’t need to repeat it
here – provided that you are careful to call computeOffsets
at the start of your new method.
Implement the method called fieldLength()
whose header we have
provided. It takes an integer i
and an array of integers
offsets
, and it should determine and return the length of the
field whose index is i
in a variable-length record whose header
contains the specified offsets. Special case: If the offsets
indicate that the specified field has a value of null
, the
method should return -1.
For example, if you run the following test code:
Object[] vals1 = {"012345", 200, 3.14, null}; int[] offsets1 = computeOffsets(vals1); System.out.println(Arrays.toString(offsets1)); System.out.println(fieldLength(0, offsets1)); System.out.println(fieldLength(1, offsets1)); System.out.println(fieldLength(2, offsets1)); System.out.println(fieldLength(3, offsets1));
you should see:
[10, 16, 20, -1, 28] 6 4 8 -1
Note that you will need to use the provided offsets to compute the length of the specified field. For example:
We can determine that the length of field 0 is 6 by noting that the offset of field 0 itself is 10, the offset of the next field is 16, and 16 – 10 = 6.
Determining the length of field 2 is trickier. Although we know that the offset of field 2 is 20, the offset of the next field is -1, which doesn’t help us. Instead, we need to continue processing offsets until we find one that is not -1. In this case, we find the offset 28, and 28 – 20 = 8, which we can return as the length of field 2.
Important guidelines:
You shouldn’t make any assumptions about the number of NULL values that a record can include.
Your method should throw an IllegalArgumentException
for
any of the following cases:
the value of the offsets
parameter is null
the value of i
is negative
the value of i
is too big in light of the number of
offsets in the array. For example, given the test code
provided above, the call fieldLength(4, offsets1)
should
throw an exception because if a record has 5 offsets,
it would not include a field whose index is 4.
If none of the above cases hold, you may assume
that the integers in the offsets
array are valid offsets
for a well-formed variable-length record.
Login to Gradescope by clicking the link in the left-hand navigation bar, and click on the box for CS 460.
You should submit only the following two files:
ps2_queries.py
Problem4.java
Note: If you worked on Problem 4 with a partner, you should each make your own submission of your joint work, making sure to include your partner’s name and email in the comments at the top of the file.
Here are the steps:
Click on PS 2: Part II in the list of assignments. You should see a pop-up window with a box labeled DRAG & DROP. (If you don’t see it, click the Submit or Resubmit button at the bottom of the page.)
Add your files to the box labeled DRAG & DROP. You can either drag and drop the files from their folder into the box, or you can click on the box itself and browse for the files.
Click the Upload button.
You should see a box saying that your submission was successful.
Click the (x)
button to close that box.
The Autograder will perform some tests on your files. Once it is done, check the results to ensure that the tests were passed. If one or more of the tests did not pass, the name of that test will be in red, and there should be a message describing the failure. Based on those messages, make any necessary changes. Feel free to ask a staff member for help.
Notes:
You will not see a complete Autograder score when you submit. That is because additional tests for at least some of the problems will be run later, after the final deadline for the submission has passed. For such problems, it is important to realize that passing all of the initial tests does not necessarily mean that you will ultimately get full credit on the problem. You should always run your own tests to convince yourself that the logic of your solutions is correct.
If you get a message saying that the Autograder failed to
execute correctly, it is possible that one of your queries
from Problem 3 is taking too long to execute. Make sure that
you have checked each query in DB Browser to ensure that it
executes correctly, and that you have copied the correct query
into your ps2_queries.py
file. If you have a query that is
causing DB Browser to hang, you should either remove that
query from your file before submitting it, or you should
comment it out by putting a #
symbol at the beginning of
each of line of the query.
For Problem 4, you should see some preliminary results. If you’re not seeing any results at all for that problem, it may mean that you’re using features of Java that are not supported by the version of Java that we’re using in the Autograder. In particular, you should NOT:
add any new import statements to the top of your file
use methods from built-in classes like String
that were
added in versions of Java after version 8.
You can determine whether a method is supported by Java 8
by checking if is found in the
API docs.
If needed, use the Resubmit button at the bottom of the page to resubmit your work. Important: Every time that you make a submission, you should submit all of the files for that Gradescope assignment, even if some of them have not changed since your last submission.
Near the top of the page, click on the box labeled Code. Then click on the name of each file to view its contents. Check to make sure that the files contain the code that you want us to grade.
Important
It is your responsibility to ensure that the correct version of a file is on Gradescope before the final deadline. We will not accept any file after the submission window for a given assignment has closed, so please check your submission carefully using the steps outlined above.
If you are unable to access Gradescope and there is enough
time to do so, wait an hour or two and then try again. If you
are unable to submit and it is close to the deadline, email
your homework before the deadline to
cs460-staff@cs.bu.edu
Last updated on September 27, 2025.