Part I due by 11:59 p.m. on Tuesday, February 20, 2024.
Part II due by 11:59 p.m. on Tuesday, March 5, 2024.
In your work on this assignment, make sure to abide by the collaboration policies of the course.
If you have questions, please come to office hours, post them on
Piazza, or email cs460-staff@cs.bu.edu
.
Make sure to submit your work on Gradescope, following the procedures found at the end of Part I and Part II.
40 points total
Create a subfolder called ps2
within your
cs460
folder, and put all of the files for this assignment in that
folder.
This part of the assignment will all be completed in a single PDF file. To create it, you should do the following:
Access the template that we have created by clicking on this link and signing into your Google account as needed.
When asked, click on the Make a copy button, which will save a copy of the template file to your Google Drive.
Select File->Rename, and change the name of the file to
ps2_partI
.
Add your work for the problems from Part I to this file.
Once you have completed all of these problems, choose
File->Download->PDF document, and save the PDF file in your ps2
folder. The resulting PDF file (ps2_partI.pdf
) is the one that
you will submit. See the submission guidelines at the end of Part I.
19 points total
Recall the Person
table from our movie database in
Problem Set 1. Assume that we are using a modified
version of that table with the following schema:
Person(id CHAR(6), name VARCHAR(20), dob CHAR(10), num_kids INTEGER)
where num_kids
is a new field for the number of children that the
person has. In addition, we have omitted the pob
field.
Consider the following tuple from that modified version of the table:
('748620', 'Paul Rudd', '04/06/1969', 1)
(3 points) What would this tuple look like if we stored it in a
fixed-length record? In the 1.1 and 1.2 section of ps2_partI
(see above), put your answer in the table labeled record contents.
You should observe the following conventions:
Give each data value its own cell of the table. Adjust the widths of the cells as needed to better fit the sizes of the values, and delete any cells that are not needed.
Use a number sign ('#'
) as a delimiter when it is necessary
to record the end of a variable-length field’s value.
Use hyphens ('-'
) for any “wasted” bytes (i.e, bytes that
are part of the record’s representation but are not actually
storing useful data or metadata).
To illustrate these conventions, imagine that we were working with
the Enrolled
table in our university database, which has the
following schema:
Enrolled(student_id CHAR(9), course_name VARCHAR(10), credit_status VARCHAR(10));
If we wanted to show what the tuple
('U00000006', 'CS 460', 'ugrad')
would look like using a fixed-length record, we would fill in the table as follows:
(2 points) What is the length in bytes of the record from part 1? Assume that we are using:
four-byte integer field values
one-byte characters – including any digit
characters that are part of a CHAR
or VARCHAR
.
Put your final answer in the box labeled length in bytes, and show your work in the box below the answer.
(3 points) What would this tuple look like if we stored it in a variable-length record in which each field is preceded by its length?
In the 1.3 and 1.4 section of ps2_partI
, put your answer in
the table labeled record contents.
In addition to the conventions that we specified for part 1, you should also give each metadata value its own cell of the table. Change the background color of cells containing metadata to distinguish them from cells containing actual data values. You can do so by using the icon that looks like a paint can in the menu bar at the top of Google Docs.
In addition to the assumptions about the sizes of characters and integers that we gave you in part 2, you should assume that integers used for metadata are two bytes long (not four bytes).
(2 points) What is the length in bytes of the record from part 3? Make the same assumptions stated in parts 2 and 3. Put your final answer in the box labeled length in bytes, and show your work in the box below the answer.
(4 points) What would this tuple look like if we stored it in a variable-length record that begins with a header of offsets?
In the 1.5 and 1.6 section of ps2_partI
, put your answer in
the table labeled record contents. Use the same conventions that
we specified for parts 1 and 3, and use the same assumptions about
the sizes of characters, integer field values, and integer metadata
that we gave you in parts 2 and 3.
(2 points) What is the length in bytes of the record from part 5? Put your final answer in the box labeled length in bytes, and show your work in the box below the answer.
(3 points) Now consider the following Person
tuple:
('151726', 'Issa Rae', NULL, 0)
The NULL
value for dob
reflects the fact that this version of
the database is missing her date of birth.
What would this tuple look like if we stored it in a variable-length record that begins with a header of offsets?
In the 1.7 section of ps2_partI
, put your answer in
the table labeled record contents. You should use:
the approach to NULL
values that we took in lecture
the same conventions that we specified for parts 1 and 3
the same assumptions about the sizes of characters, integer field values, and integer metadata that we gave you in parts 2 and 3.
There is no separate length-computation question for this record.
21 points total; 7 points each part
Let’s say that you want to insert items with the following sequence of keys into a collection of records that uses some form of indexing:
9, 13, 16, 22, 7, 21, 25, 18, 20, 14, 27, 19, 4
Insert this key sequence into an initially empty B-tree of order 2.
In section 2.1 of ps2_partI
, show the tree after each
insertion that causes a split of one or more nodes, and the final
tree.
We have given you a sample diagram that includes nodes of different sizes. Make copies of the diagram so that you can use separate diagrams for the results of each insertion that causes a split, and for the final tree. Note that you do not need to keep the shape of the tree that we have given you. Rather, you should edit it as needed: deleting or adding nodes and edges, replacing the Xs with keys, adding or removing keys, and making whatever other changes are needed.
Insert this same key sequence into an initially empty B+tree (note
the +) of order 2. In section 2.2 of ps2_partI
, show the tree
after each insertion that causes a split of one or more nodes, and
the final tree. Here again, you should make copies of the diagram
that we have given you and edit them as needed.
Insert this same key sequence into a hash table that uses linear
hashing. The table should use the hash function h(x) = x, and it
should start out with two empty buckets. Assume that a bucket is
added whenever the number of items in the table exceeds three
times the number of buckets. In section 2.3 of ps2_partI
, use
the tables that we have provided to show the state of the table
before and after each increase in the number of buckets, as well
as the final state of the table.
Login to Gradescope by clicking the link in the left-hand navigation bar. Once you are in logged in, click on the box for CS 460.
Submit your ps2_partI.pdf
file using these steps:
If you still need to create a PDF file, open your file
on Google Drive, choose File->Download->PDF document, and
save the PDF file in your ps2
folder.
Click on PS 2: Part I in the list of assignments on Gradescope. You should see a pop-up window labeled Submit Assignment. (If you don’t see it, click the Submit or Resubmit button at the bottom of the page.)
Choose the Submit PDF option, and then click the Select PDF
button and find the ps1_partI.pdf
that you created in step 1.
Then click the Upload PDF button.
You should see an outline of the problems along with thumbnails of the pages from your uploaded PDF. For each problem in the outline:
As you do so, click on the magnifying glass icon for each page and doublecheck that the pages that you see contain the work that you want us to grade.
Once you have assigned pages to all of the problems in the question outline, click the Submit button in the lower-right corner of the window.
You should see a box saying that your submission was successful.
Click the (x)
button to close that box.
You can use the Resubmit button at the bottom of the page to resubmit your work as many times as needed before the final deadline.
Important
It is your responsibility to ensure that the correct version of a file is on Gradescope before the final deadline. We will not accept any file after the submission window for a given assignment has closed, so please check your submission carefully using the steps outlined above.
If you are unable to access Gradescope and there is enough
time to do so, wait an hour or two and then try again. If you
are unable to submit and it is close to the deadline, email
your homework before the deadline to
cs460-staff@cs.bu.edu
60 points total
30 points; pair-optional
This is the only problem of the assignment that you may complete with a partner. See the rules for working with a partner on pair-optional problems for details about how this type of collaboration must be structured.
In this problem, you will write a series of methods that can be used
to create an XML version of the Person
table from Problem Set
1. Your methods will be part of a larger program that uses
the JDBC framework to connect to the SQLite database
that you used in PS 1 and to execute the SQL queries needed to extract
the necessary data.
You should begin by downloading the following zip file:
problem3.zip
Unzip/extract the contents of the file.
Depending on your system, after extracting the contents you will either have:
a folder named problem3
that contains all of the files that
you need for this problem
an outer folder called problem3
that contains an inner folder named
problem3
that contains all of the Java files that you need for this
problem.
Take the problem3
folder that actually contains the necessary
files and drag it into your ps2
folder so that you can easily find
and open it from within VS Code.
Read through our overview of the JDBC framework.
Launch VS Code on your laptop.
In VS Code, select the File->Open Folder or File->Open menu
option, and use the resulting dialog box to find and open the
problem3
folder that you created above – the one that contains the
provided files. (Note: You must open the folder; it is not
sufficient to simply open one of the Java files in the folder.)
The name of the folder should appear in the Explorer pane on the left-hand side of the VS Code window, along with a list of all of its contents.
Click on the name XMLforPeople.java
in the Explorer pane, which
will open the file that you need to modify.
Review all of the code that we’ve provided before you start writing any new code. See below for some additional information on what we’ve given you.
The class that you will be completing is called XMLforPeople
.
In this class we’ve given you:
the constructor for the class, which takes the name of a SQLite
file that should contain a relational database with the schema
outlined in Problem Set 1; the constructor
establishes a connection to the SQLite database, and it stores the
resulting Connection
object in a field called db
.
a helper method called simpleElem()
, which takes as inputs the
name and value of a simple element and returns a string of the
form "<name>value</name>"
. This method should only be used
to form simple elements – ones that do not have any child
elements.
a helper method called resultsFor()
, which takes as input a string
presenting a SQL query for the movie database and returns a
ResultSet
object that can be used to process the results of that
query. You are welcome to use this method in the code that you
write, although doing so is not required.
a method called idFor()
, which takes as input the name of a
person and performs a query to find and return the person’s id;
this method will be useful when testing the methods that you
write.
a method called createFile()
, which performs a query to obtain
the id numbers of all of the movies in the Movie
table, and
which processes them one at a time
the main
method, which runs the full program.
Note that many of these methods – and all of the methods that you
will implement – include a throws
clause in their method
header. This clause is needed because the code included in these
methods may throw the exception(s) mentioned in the throws
clause,
and instead of catching them, we are simply declaring that they may be
thrown.
You will implement a number of non-static methods of the
XMLforPeople
class.
Important
We’ve provided the headers of the methods that you will implement. You must not change these headers in any way.
In the code that you write, you must limit yourself to the packages that we’ve imported at the top of the starter file. You must not use classes from any other Java package. In addition, you must not use any Java features that were not present in Java 8.
We have given you a separate Java class TestDriver
that you
can use when testing each method. Simply add the appropriate
test code to the main
method of this class, and run TestDriver
to see if you obtain the correct output.
Note: If you are having trouble running TestDriver
using the
Run link or Run button, you should be able to compile and run it
from the command line of the Terminal as follows:
to compile:
javac -cp 'lib/*' *.java
to run on Windows:
java -cp 'lib/*;.' TestDriver
to run on macOS:
java -cp 'lib/*:.' TestDriver
The two commands for running the program are almost identical,
but in the Windows version there is a semi-colon (;
) before
the period in the string after -cp
, whereas the macOS version
uses a colon (:
).
Here are your tasks:
Implement the method called fieldsFor()
whose header we have
provided. It takes a string representing the id number of a
person, and it should return a string containing a sequence of XML
elements – one for each non-null field (i.e., column value) in
that person’s tuple. You may assume that the person’s name will
never be null. If the person’s dob or pob has a null value, that
field should not be included in the returned string. If there
is no person with the specified id, the method should simply
return the empty string.
For example, if you run the following test code (adding it to the
main
method in TestDriver.java
):
XMLforPeople xml = new XMLforPeople("movie.sqlite"); System.out.println(xml.fieldsFor(xml.idFor("Julianne Moore"))); System.out.println(xml.fieldsFor("1234567")); // no person with that id System.out.println(xml.fieldsFor(xml.idFor("Elisabeth Moss")));
you should see:
<name>Julianne Moore</name> <dob>1960-12-03</dob> <pob>Fayetteville, North Carolina, USA</pob> <name>Elisabeth Moss</name> <dob>1983-01-01</dob>
Notes:
In our database, Elisabeth Moss has a null value for her pob, and thus that element is not included in the string returned for her.
You should see an extra blank line when you print the results
of any call that produces one or more fields (including the
blank line after the results for Elisabeth Moss shown above),
because the string returned by the method should end with a
newline (see below), and the println
method adds its own
newline.
The second println
statement prints only a blank line
because there is no person whose id is 1234567, and thus the
call xml.fieldsFor("1234567")
returns an empty string. As a
result, there are two blank lines after the results for
Julianne Moore: one after her fields, and one from the
printing of the empty string.
Important guidelines:
You must begin by performing the appropriate SQL query.
Use the idFor()
method as a model for what you should do.
When processing the results, make sure to follow the approach given in our JDBC overview.
The elements for the non-null fields should appear in the same
order as the corresponding columns in the Person
table:
You must use the provided simpleElem()
method to form the
XML element for a given field.
Each element must be on its own line, which you can accomplish
by including a newline character ("\n"
) after each element.
Each element must be preceded by exactly four spaces, and there should be no extra spaces at the end of a given line.
Implement the method called movieElemsFrom()
whose header we
have provided. It takes a ResultSet
object for a SQL query that
produces tuples of the form (movie year, movie name), and it
creates and returns a string containing the XML for a sequence of
0 or more complex elements of type movie
that contain nested
child elements of type year
and name
. If the ResultSet
object has no results, the method should simply return the empty
string.
For example, if you run the following test code:
XMLforPeople xml = new XMLforPeople("movie.sqlite"); String query1 = "SELECT year, name FROM Movie WHERE year = 2020;"; ResultSet results1 = xml.resultsFor(query1); System.out.println(xml.movieElemsFrom(results1)); String query2 = "SELECT year, name FROM Movie WHERE earnings_rank = 1;"; ResultSet results2 = xml.resultsFor(query2); System.out.println(xml.movieElemsFrom(results2));
you should see:
<movie> <year>2020</year> <name>Nomadland</name> </movie> <movie> <year>2020</year> <name>The Father</name> </movie> <movie> <year>2020</year> <name>Judas and the Black Messiah</name> </movie> <movie> <year>2020</year> <name>Minari</name> </movie> <movie> <year>2015</year> <name>Star Wars: The Force Awakens</name> </movie>
Here again, you should see an extra blank line when you print the
results of any call that produces one or more elements (including
the blank lines shown above), because the string returned by the
method should end with a newline (see below), and the println
method adds its own newline.
Important guidelines:
You shouldn’t make any assumptions about the number of movies
in the ResultSet
. Rather, your code should be able to handle an
arbitrary number of results. See the createFile()
method
for a model of how to do this. If there are no results in
the ResultSet
, the method should return an empty string.
You must use the provided simpleElem()
method to form the
XML for each simple element. However, you should not use
simpleElem
to form the outer movie
elements, which are not
simple because they have children.
The returned string should be formatted as shown above. The
outer start tag, outer end tag, and each child element
should be on its own line, with a newline character ("\n"
)
at the end of each line.
The outer start tag and outer end tag should each be preceded by exactly six spaces, but the child elements should each be preceded by exactly eight spaces. There should be no extra spaces at the end of a given line.
Implement the method called actedIn()
whose header we have
provided. It takes a string representing the id number of a
person, and it should return a string containing the XML for a
single complex element named actedIn
that includes nested child
elements of type movie
for all of the movies in the database in
which the person has acted, ordered by the year of the movie and
(in the case of ties) the name of the movie. If there is no person
with the specified id or if the person has not acted in any of the
movies in the database, the method should simply return the empty
string.
For example, if you run the following test code:
XMLforPeople xml = new XMLforPeople("movie.sqlite"); System.out.println(xml.actedIn(xml.idFor("Geena Davis"))); System.out.println(xml.actedIn("1234567")); System.out.println(xml.actedIn(xml.idFor("Denzel Washington")));
you should see:
<actedIn> <movie> <year>1988</year> <name>Accidental Tourist, The</name> </movie> <movie> <year>1999</year> <name>Stuart Little</name> </movie> </actedIn> <actedIn> <movie> <year>1989</year> <name>Glory</name> </movie> <movie> <year>1993</year> <name>Philadelphia</name> </movie> <movie> <year>2001</year> <name>Training Day</name> </movie> <movie> <year>2016</year> <name>Fences</name> </movie> </actedIn>
Here again, you should see an extra blank line when you print the
results of any call that produces one or more elements (including
the blank lines shown above), because the string returned by the
method ends with a newline, and the println
method adds its own
newline.
Note that the second println
statement prints only a blank line
because there is no person whose id is 1234567, and thus the call
xml.actedIn("1234567")
returns an empty string.
Important guidelines:
You should begin by performing the appropriate SQL query to
get the necessary ResultSet
. Make sure to use an ORDER BY
clause in your query to get the necessary ordering.
After performing the query, almost all of the remaining work
will be done by making the appropriate call to your
movieElemsFrom()
method. You must use that method to
produce the XML for the movie
elements (if any) for the results
in the ResultSet
produced by your query.
Given the string returned by movieElemsFrom()
, you should
take the necessary steps to return either:
a string containing the XML for the appropriate actedIn
element
OR
an empty string, if there is no person with the specified id or if the person did not act in any of the movies in the database.
The returned string should be formatted as shown above. The
outer start tag, outer end tag, and each child element
should be on its own line, with a newline character ("\n"
)
at the end of each line.
You should not use the simpleElem
helper method in this
method.
The outer start tag and outer end tag should each be preceded by
exactly four spaces. The child elements should have the
leading spaces that were given to them by movieElemsFrom
.
There should be no extra spaces at the end of a given line.
Implement the method called directed()
whose header we have
provided. It takes a string representing the id number of a
person, and it should return a string containing the XML for a
single complex element named directed
that includes nested child
elements of type movie
for all of the movies in the database
that the person has directed, ordered by the year of the movie and
(in the case of ties) the name of the movie. If there is no person
with the specified id or if the person has not directed any of the
movies in the database, the method should simply return the empty
string.
For example, if you run the following test code:
XMLforPeople xml = new XMLforPeople("movie.sqlite"); System.out.println(xml.directed(xml.idFor("Geena Davis"))); System.out.println(xml.directed("1234567")); System.out.println(xml.directed(xml.idFor("Denzel Washington")));
you should see:
<directed> <movie> <year>2016</year> <name>Fences</name> </movie> </directed>
Here again, you should see an extra blank line when you print the
results of any call that produces one or more elements (including
the blank lines shown above), because the string returned by the
method ends with a newline, and the println
method adds its own
newline.
Note that the first and second println
statements print only a
blank line because Geena Davis has not directed any of the movies
in the database and there is no person whose id is 1234567, and
thus the corresponding calls to directed
return an empty string.
The guidelines for the previous method also apply here, but for
directed
elements instead of actedIn
elements.
Implement the method called elementFor()
whose header we have
provided. It takes a string representing the id number of a person,
and it should return a string containing the XML for a single
complex element named person
that includes nested child elements
for:
the non-null fields in the person’s tuple (use fieldsFor()
to get these)
the movies (if any) in which the person has acted (use
actedIn()
to get these)
the movies (if any) that the person has directed (use
directed()
to get these).
In addition, each person
element must have an attribute
named id
for the person’s id number.
For example, if you run the following test code:
XMLforPeople xml = new XMLforPeople("movie.sqlite"); System.out.println(xml.elementFor(xml.idFor("Elisabeth Moss"))); System.out.println(xml.elementFor("1234567")); System.out.println(xml.elementFor(xml.idFor("Denzel Washington")));
you should see:
<person id="0005253"> <name>Elisabeth Moss</name> <dob>1983-01-01</dob> <actedIn> <movie> <year>1999</year> <name>Girl, Interrupted</name> </movie> </actedIn> </person> <person id="0000243"> <name>Denzel Washington</name> <dob>1954-12-28</dob> <pob>Mount Vernon, New York, USA</pob> <actedIn> <movie> <year>1989</year> <name>Glory</name> </movie> <movie> <year>1993</year> <name>Philadelphia</name> </movie> <movie> <year>2001</year> <name>Training Day</name> </movie> <movie> <year>2016</year> <name>Fences</name> </movie> </actedIn> <directed> <movie> <year>2016</year> <name>Fences</name> </movie> </directed> </person>
Here again, you should see an extra blank line when you print the
results of any call that produces one or more elements (including
the blank lines shown above), because the string returned by the
method ends with a newline, and the println
method adds its own
newline.
Note that the second println
statement prints only a blank line
because there is no person whose id is 1234567, and thus the call
xml.elementFor("1234567")
returns an empty string.
Important guidelines:
This method should not perform any queries of its own.
Rather, you must use your previous methods to obtain the
necessary child elements. (Note that you should not
use the simpleElem
method in this method.)
Don’t forget to include the person’s id as an attribute within
the start tag for person
, as shown above. In order to
include the quotes around the id, you will need to use the
escape sequence "\""
for each double-quote character.
The outer start tag and outer end tag should each be on their own line preceded by exactly two spaces and followed by a newline character. The child elements should have the same spacing and formatting described in the earlier method specifications, so you won’t need to add any new spaces to them. Once again, there should be no extra spaces at the end of a given line.
Once you have completed and tested all of your methods, running the
XMLforPeople
program should create a file named people.xml
that
represents the entire Person
table in XML!
30 points total
This problem asks you to construct XPath and XQuery queries for an XML version of our entire movie database. The schema of this XML database is described here.
To allow you to check your work, we’ll make use of a freely available XML DBMS called BaseX.
You should begin by downloading the following files into your folder for this assignment:
Make sure to put the files in your ps2
folder. If your browser
doesn’t allow you to specify where the file should be saved, try
right-clicking on the link above and choosing Save as... or Save
link as..., which should produce a dialog box that allows you to
choose the correct folder for the file.
Once you have completed the steps outlined above, you can perform queries by taking the following steps:
Start up the BaseX GUI by double-clicking on the JAR file that
you downloaded above (BaseX851.jar
).
Select the Database->New menu option, click the Browse button,
and use the resulting dialog box to find the imdb.xml
file that you downloaded above.
Click Open to select the file, and click OK to create the database.
To execute a query, enter it in the Editor pane in BaseX, and click the green play button to execute it. (You can also use Ctrl+Enter or Ctrl+Return for this purpose.)
The results (if any) will be displayed in the Result pane.
If you have trouble getting BaseX to work on your machine, here are some possible options:
On newer Macbooks, try downloading a newer version of the BaseX
JAR file: Base107.jar
On MacOS, try running the program from the Terminal.
First, use cd
to navigate to the appropriate folder – the one that
contains the downloaded JAR file.
Then, entering the following from the command line:
java -jar BaseX107.jar
Or, if you have the original JAR file:
java -jar BaseX851.jar
If you are able to launch BaseX but you can’t open the database file, you may need to give Java permission to access the disk. One set of instructions for doing so on Macs can be found here.
If all else fails, you can run BaseX on the virtual desktop. Once you are in the virtual desktop, you can:
imdb.xml
file.If you’re using a Mac, you should disable smart quotes, because they may lead to errors in BaseX and in our testing. There are instructions for doing so here.
ps2_queries.py
is a Python file, so you could use a Python IDE
to edit it, but a regular text editor like TextEdit or Notepad++
would also be fine. However, if you use a text editor, you must
ensure that you save it as a plain-text
file.
Construct the XQuery commands needed to solve the problems given below. Test each command in BaseX to make sure that it works.
Once you have finalized the XQuery command for a given problem, copy
the command into your ps2_queries.py
file, putting it between
the triple quotes provided for that query’s variable. We have
included a sample query to show you what the format of your
answers should look like.
Each of the problems must be solved by means of a single query. Unless the problem specifies otherwise, you may use either a standalone XPath expression or an XQuery FLWOR expression.
The only place that you may use a subquery (i.e., a nested FLWOR
expression) is in the results
clause of an outer FLWOR
expression. You should NOT have a nested FLWOR expression in a
for
clause or a let
clause.
The order of the clauses in each query/subquery must follow the
FLWOR acronym: a for
clause (F), followed optionally by a let
clause (L), followed optionally by a where
clause (W), followed
optionally by an order by
clause (O), followed by a return
clause
(R). You should not put the clauses in a different order –
e.g., for
, followed by let
, followed by another for
, etc. BaseX
may allow you to do this, but it is never necessary to do so, and
such a query will often fail to run to completion in the Autograder.
Your queries should only use information provided in the problem itself. In addition, they should work for any XML database that follows the schema that we have specified.
When the results of a query include nested child elements, those child elements must be in the specified order with respect to each other. See the example results that are provided for each such problem.
You do not need to worry about indenting and line breaks in the results of your queries.
Make sure to read and follow the guidelines given above.
It is worth noting that our movie database was last updated in September of last year, so it doesn’t include information about the most recent movies.
Write a standalone XPath expression (not a FLWOR
expression) to find the names of all people in the database whose
name is Sam. The results of the query should be a collection of
name
elements with the names of these people.
Hints:
You should use the contains
function to perform
substring matching, as discussed in lecture.
Your results should not include people whose last name is Sam, or whose name includes Sam at the beginning of a longer name like Samuel or Samantha.
In Problem Set
1, you wrote a
SQL query to find information about Jodie Foster’s Oscars. Write
a FLWOR expression to solve the same problem. The results of the
query should be new elements of type foster_oscar
that include
three child elements: the name of the movie that she won the Oscar
for, followed by the award type, followed by the year in which the
Oscar was awarded. For example:
<foster_oscar> <name>Silence of the Lambs, The</name> <type>BEST-ACTRESS</type> <year>1992</year> </foster_oscar>
Hint: You will need to use curly braces and commas as part of your
return
clause. See the examples in the lecture notes for
models of how to transform the elements selected by a query
into new types of elements.
Note: Remember that you do not need to worry about indenting or line breaks in the results of the queries for this or any other problem.
In Problem Set 1, you
wrote a SQL query to find, for every year from 2010 until the
present, the average runtime of movies from that year. Write a
FLWOR expression to solve a similar problem, but we will also include
the longest and shortest runtimes for each year. The results of the
query should be new elements of type runtime_info
that include:
a nested child element for the year; see below for more details about this element
a new child element of type average
that has as its
value the average runtime of movies in the database from that
year
a new child element of type longest
that has as its
value the longest runtime of movies from that year
a new child element of type shortest
that has as its
value the shortest runtime of movies from that year.
For example:
<runtime_info> <year>2013</year> <average>121.5</average> <longest>161</longest> <shortest>91</shortest> </runtime_info>
Order the results by year, from 2010 to the present.
Hints:
To ensure that you only consider a given year once, you should
use the distinct-values
function. For example, to iterate
over distinct Oscar types, you could do the following:
for $t in distinct-values(//oscar/type)
distinct-values
gives you the text values of the
corresponding elements, not the elements themselves. As a
result, your return
clause will need to construct new year
elements by adding back in the begin and end tags that were
removed by distinct-values
.
You will need to use the built-in avg()
, min()
and max()
functions. See the lecture notes for an example of a query that
uses avg()
.
In Problem Set 1,
you wrote a SQL query to find all directors who have directed at
least four of the 200 top-grossing films. For this problem, you
should write a FLWOR expression to solve the same problem. The
results of the query should be new elements of type top_director
that include the following child elements:
the name
element of the director
a new child element called num_top_grossers
that has as its
value the number of top-grossing films that the person directed
a new child element called movie
that has as its text value
the name of the associated movie
for each top-grossing movie that the person has directed, a
new child element called movie
that has as its text value
the name and earnings rank of the movie, separated by a hyphen (-
).
For example:
<top_director> <name>Anthony Russo</name> <num_top_grossers>4</num_top_grossers> <movie>Avengers: Endgame - 2</movie> <movie>Avengers: Infinity War - 8</movie> <movie>Captain America: Civil War - 40</movie> <movie>Captain America: The Winter Soldier - 134</movie> </top_director>
Order the results by the name of the director.
Hints:
You will need to use the string()
function to obtain the
values of some elements, without their begin and end
tags.
Your return
clause will need to include the string literal
"-"
. Note that you should not need to include any
spaces in this string literal, because XQuery will automatically
add a space between string values that are separated by a comma
in the return
clause.
You will need to use the built-in count()
function. See the
lecture notes for examples of queries that use it.
You will need to use a subquery (i.e., a nested FLWOR expression).
In Problem Set 1, you
wrote a SQL query to summarize the Oscars won by biography movies
– i.e., movies whose genre attribute includes the letter
'B'
somewhere in its value. Write a FLWOR expression
to solve the same problem. The results of the query should be new
elements of type biopic
that include the following child elements:
name
element of the movieyear
element of the movieaward
that has as its text value the type of the Oscar.For example:
<biopic> <name>Oppenheimer</name> <year>2023</year> </biopic> <biopic> <name>Spotlight</name> <year>2015</year> <award>BEST-PICTURE</award> </biopic>
Order the results by the name of the movie.
Hints:
To handle the fact that not all biopics have won an Oscar, we
recommend that you use a subquery to produce the award
elements (if any) for a given biopic. That way, if there are
one or more Oscar associated with the movie, the subquery will
produce the necessary award
elements, and if there aren’t
any Oscars associated with the movie, the subquery will simply
produce no results.
Here again, you will need to use the string()
function to
obtain the values of some elements, without their begin and
end tags.
Login to Gradescope by clicking the link in the left-hand navigation bar, and click on the box for CS 460.
You should submit only the following two files:
XMLforPeople.java
ps2_queries.py
Here are the steps:
Click on PS 2: Part II in the list of assignments. You should see a pop-up window with a box labeled DRAG & DROP. (If you don’t see it, click the Submit or Resubmit button at the bottom of the page.)
Add your files to the box labeled DRAG & DROP. You can either drag and drop the files from their folder into the box, or you can click on the box itself and browse for the files.
Click the Upload button.
You should see a box saying that your submission was successful.
Click the (x)
button to close that box.
The Autograder will perform some tests on your files. Once it is done, check the results to ensure that the tests were passed. If one or more of the tests did not pass, the name of that test will be in red, and there should be a message describing the failure. Based on those messages, make any necessary changes. Feel free to ask a staff member for help.
Notes:
You will not see a complete Autograder score when you submit. That is because additional tests for at least some of the problems will be run later, after the final deadline for the submission has passed. For such problems, it is important to realize that passing all of the initial tests does not necessarily mean that you will ultimately get full credit on the problem. You should always run your own tests to convince yourself that the logic of your solutions is correct.
For Problem 3, you should see some preliminary results. If you’re not seeing any results at all for that problem, it may mean that you’re using features of Java that are not supported by the version of Java that we’re using in the Autograder. In particular, you should NOT:
add any new import statements to the top of your file
use methods from built-in classes like String
that were
added in versions of Java after version 8.
You can determine whether a method is supported by Java 8
by checking if is found in the
API docs.
For Problem 4:
You should see test results for each query. If you don’t see any results for a given query, it probably means that one or more of the following is true:
You have a syntax or logic error in your query.
You have a nested FLWOR expression in a for
clause
or a let
clause. As stated in the guidelines, you are
not allowed to do that on this assignment.
The order of the clauses in your query/subquery do not follow the order specified by the FLWOR acronym. BaseX may allow you to do this, but it is never necessary to do so, and such a query will often fail to run to completion in the Autograder. See the guidelines for more details.
You should keep making changes as needed until you get full credit for a given query. There will be no partial credit awarded for an incorrect query.
Make sure that each query is logically correct, and that it will work for any instance of the movie database that follows the same schema. We reserve the right to ultimately run your queries on a slightly different version of the database to ensure that your queries are logically correct.
If needed, use the Resubmit button at the bottom of the page to resubmit your work. Important: Every time that you make a submission, you should submit all of the files for that Gradescope assignment, even if some of them have not changed since your last submission.
Near the top of the page, click on the box labeled Code. Then click on the name of each file to view its contents. Check to make sure that the files contain the code that you want us to grade.
Important
It is your responsibility to ensure that the correct version of a file is on Gradescope before the final deadline. We will not accept any file after the submission window for a given assignment has closed, so please check your submission carefully using the steps outlined above.
If you are unable to access Gradescope and there is enough
time to do so, wait an hour or two and then try again. If you
are unable to submit and it is close to the deadline, email
your homework before the deadline to
cs460-staff@cs.bu.edu
Last updated on March 1, 2024.