Problem Set 2 FAQ

Problem 1
Problem 2
Problem 3
Problem 4

Problem 1

Does the integer metadata for a field count towards its allocated length?

Integer metadata values like field lengths and offsets are completely separate from the actual data. They don’t use up the bytes reserved for the field itself, but they do contribute to the length of the record.

For example, in lecture we looked at the way that the record ('1234567', 'comp sci', 200) would be represented using variable-length records with a header of offsets. In that example, we used 2-byte offsets but 4-byte integer values, so the length of the record ended up being 27 bytes:
- 8 bytes for the four offsets (4*2 = 8)
- 7 bytes for ‘1234567’
- 8 bytes for ‘comp sci’
- 4 bytes for the integer 200
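
The arithmetic above can be checked with a short sketch. It assumes one byte per character, 2-byte offsets, 4-byte integers, and a header with one offset per field plus one marking the end of the record (which is how three fields give four offsets). The class and method names are hypothetical and are not part of the assignment code.

```java
public class RecordLengthSketch {
    static final int OFFSET_BYTES = 2; // size of each offset in the header
    static final int INT_BYTES = 4;    // size of each integer field value

    // Length in bytes of a variable-length record with a header of offsets.
    // Assumes every field is either a String (1 byte per char) or an Integer.
    static int recordLength(Object... fields) {
        // one offset per field, plus one for the end of the record
        int length = (fields.length + 1) * OFFSET_BYTES;
        for (Object field : fields) {
            if (field instanceof Integer) {
                length += INT_BYTES;
            } else {
                length += ((String) field).length();
            }
        }
        return length;
    }

    public static void main(String[] args) {
        System.out.println(recordLength("1234567", "comp sci", 200));
    }
}
```

For the lecture example, the call in main computes 8 + 7 + 8 + 4 = 27.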

When do we need to use a # delimiter in a fixed-length record?

We use a # as a delimiter at the end of a string value in a fixed-length record if:
- the string is in a variable-length field (e.g., a VARCHAR) and
- the string is shorter than the maximum length of the variable-length field.
We don’t need those delimiter characters in any other circumstance.

When we do include a # character, it uses up 1 byte of the field’s allocated number of bytes.

For example, let’s say that we have a VARCHAR(10) field as part of a fixed-length record.

If you needed to store the string 'abc' in that field, you would fill in 10 cells of the answer table using the string 'abc#------' (abc, followed by #, followed by 6 hyphens), and the overall length of that field would be 10.

If you needed to store the string '0123456789' in that field, you would fill in 10 cells of the answer table using the string '0123456789' with no # and no hyphens, since the value itself already uses up all 10 of the available bytes.
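
The padding rule can be sketched in a few lines of Java. This is only an illustration of the rule described above; the helper name padVarchar is hypothetical, and it assumes one byte per character.

```java
public class FixedFieldSketch {
    // Returns the contents of a VARCHAR(maxLength) field inside a
    // fixed-length record: the value, then a # delimiter and hyphens
    // as filler, but only if the value is shorter than maxLength.
    static String padVarchar(String value, int maxLength) {
        if (value.length() > maxLength) {
            throw new IllegalArgumentException("value is too long for the field");
        }
        if (value.length() == maxLength) {
            return value; // the field is full, so no delimiter is needed
        }
        StringBuilder field = new StringBuilder(value).append('#');
        while (field.length() < maxLength) {
            field.append('-');
        }
        return field.toString();
    }

    public static void main(String[] args) {
        System.out.println(padVarchar("abc", 10));
        System.out.println(padVarchar("0123456789", 10));
    }
}
```

Either way, the returned string is always exactly maxLength characters long, which is what makes the record fixed-length.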

Problem 2

In 2.3, when showing the before-and-after pictures for each increase in the number of buckets, should we include the key that causes the increase?

Yes. The problem states that a split should occur whenever the total number of items is greater than (3 * number of buckets). This means that the split occurs after we go over the threshold, not before.

In the “before” table, you should include the key of the item that takes you over the threshold, and then the “after” table will also include that key.

For example, in the lecture notes we used the slightly different rule that a split occurs when the total number of items is greater than (2 * number of buckets). There, you’ll see that we have a “before” picture with 3 buckets and 7 items, and then an “after” picture with 4 buckets and the same 7 items.
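
To see why the triggering key appears in both pictures, it can help to write the rule as a condition that is checked after each insertion. The sketch below is only an illustration; the method name and the factor parameter (3 in the problem, 2 in the lecture notes) are hypothetical.

```java
public class SplitRuleSketch {
    // True if a split should occur, given the counts AFTER an insertion.
    static boolean shouldSplit(int numItems, int numBuckets, int factor) {
        return numItems > factor * numBuckets;
    }

    public static void main(String[] args) {
        // Lecture rule (factor 2): 6 items in 3 buckets is still fine,
        // but the 7th item takes us over the threshold.
        System.out.println(shouldSplit(6, 3, 2));
        System.out.println(shouldSplit(7, 3, 2));
    }
}
```

Because the test uses the count that already includes the new item, the "before" picture has to show that item too.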

Problem 3

I’m getting an error when I try to compile/run my code. It says something like “Exception in thread “main” java.sql.SQLException: No suitable driver found for jdbc:sqlite:movie.sqlite”. What can I do?

In VS Code, make sure that you have opened the correct folder – the one that actually contains the Java files. If you have done it correctly, when you look at the Explorer pane, the top-level folder should be your problem3 folder, and the Java files should all be in that folder. If necessary, try using File->Close Folder and then try reopening the correct folder.

If this doesn’t work, you can use VS Code on the virtual desktop.

I’m failing one of the Autograder tests, but my return value looks like it matches the expected return value. What should I do?

Double-check that you:
- have the correct number of spaces at the beginning of each line
- have exactly one (and not more than one!) newline character at the end of each line – and no spaces before the newline!
- have no extra spaces after each newline character.
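
Stray spaces and newlines are hard to spot by eye. One option is a small debugging helper that makes them visible before you print your return value. The helper below is a hypothetical sketch and is not part of the provided code.

```java
public class WhitespaceDebugSketch {
    // Replaces each space with a middle dot and each newline character
    // with the two visible characters \n, so that the whitespace in a
    // string can be inspected when it is printed.
    static String visible(String s) {
        return s.replace(" ", "\u00B7").replace("\n", "\\n");
    }

    public static void main(String[] args) {
        String actual = "  some line \n"; // note the space before the newline
        System.out.println(visible(actual));
    }
}
```

Printing both your return value and the expected value this way makes a trailing space or an extra newline stand out.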

When I test movieElemsFor or one of the other methods that calls it, I seem to be missing one of the movies that should be in the results. Why could that be?

If you are using JDBC to process a SQL query that can have multiple rows of results, you need to use a while loop. As mentioned in our overview, the structure of these loops looks something like this:
```
while (results.next()) {
    // code to process each row goes here
}
```
If your method is missing one row of the results, it’s possible that you are calling next() before the loop in order to determine whether the query has any results. You shouldn’t do that! Here’s why: If the query has one or more rows of results, the initial call to next() will position the ResultSet object on the first row of results, and then the call to next() in the header of the while loop will advance the ResultSet to the second row of results. And because the code to process each row is inside the loop, you won’t actually process the first row!

In order to handle the case in which the query has no results, we recommend that you do the following:
- Before beginning the loop, initialize your result variable to the empty string.
- Inside the loop, concatenate the strings needed for the current row of results.
- If there are no results, the while loop condition will be false the very first time that it is evaluated, and thus the loop will perform 0 repetitions. In that case, your result variable will still have the empty string as its value at the end of the method – and that is the correct return value when there are no results!
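
The recommended pattern can be sketched as follows. The method name namesFrom and the use of column 1 are hypothetical, chosen only for illustration. The fakeResults helper is also an assumption: it is an in-memory stand-in for a ResultSet (built with java.lang.reflect.Proxy) so that the sketch runs without a database. In your own code, the ResultSet would come from executing your query.

```java
import java.lang.reflect.Proxy;
import java.sql.ResultSet;
import java.sql.SQLException;

public class ResultLoopSketch {
    // Builds one line per row, and returns "" if there are no rows.
    // Note that next() is called ONLY in the loop header.
    static String namesFrom(ResultSet results) throws SQLException {
        String result = "";
        while (results.next()) {
            result += results.getString(1) + "\n";
        }
        return result;
    }

    // Hypothetical stand-in for a real ResultSet; it supports only
    // next() and getString(), which is all that namesFrom needs.
    static ResultSet fakeResults(String... rows) {
        int[] pos = { -1 }; // index of the current row
        return (ResultSet) Proxy.newProxyInstance(
            ResultSet.class.getClassLoader(),
            new Class<?>[] { ResultSet.class },
            (proxy, method, args) -> {
                switch (method.getName()) {
                    case "next":
                        pos[0]++;
                        return pos[0] < rows.length;
                    case "getString":
                        return rows[pos[0]];
                    default:
                        throw new UnsupportedOperationException(method.getName());
                }
            });
    }

    public static void main(String[] args) throws SQLException {
        System.out.print(namesFrom(fakeResults("first row", "second row")));
        System.out.println(namesFrom(fakeResults()).equals(""));
    }
}
```

If you added a call to results.next() before the while loop in namesFrom, the first row would be skipped, which is exactly the symptom described above.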

When I test for an empty string, should I be using == or the equals() method?

Because strings are objects, you should use the equals method to test if two strings are equivalent to each other.
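
Here is a minimal sketch of the difference. The method name isEmptyResult is hypothetical. The == operator compares references, so it can be false even when both strings contain the same characters.

```java
public class EmptyStringSketch {
    // Compares the contents of the string, not its reference.
    static boolean isEmptyResult(String result) {
        return result.equals("");
    }

    public static void main(String[] args) {
        String built = new String(""); // a distinct object with no characters
        System.out.println(built == "");          // false: different objects
        System.out.println(isEmptyResult(built)); // true: same contents
    }
}
```

Calling result.isEmpty() is an equivalent way to perform this particular test.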

Problem 4

Note: In addition to the questions below, we strongly encourage you to consult the solutions to Lab 5, which included a number of query problems that are similar to the ones in this assignment.

For query 1, I’m not sure how to use contains to test for people whose name is Sam.

Here’s a hint: We tell you that you shouldn’t include people whose last name is Sam, or whose name includes Sam at the beginning of a longer name like Samuel or Samantha. However, it would be okay if the results of your XPath expression included people whose middle name is Sam. What substring could you look for in the name that would give you people whose first name or middle name is exactly Sam, but not people whose last name is Sam?

My query has a syntax error, but I can’t find it. Do you have any suggestions?

Here are some things to double-check:
- Variable names should begin with a dollar sign ($).
- You should have commas between different components of a for clause, let clause and return clause.
- Each condition in a where clause should be a boolean expression – something that would give you true or false – and multiple conditions should be separated by either and or or.
- If you are defining new start and end tags in your return clause, make sure that your end tag includes a forward slash (/). For example, the following is incorrect:
```
<location>{ string($b) }<location>
```
  Instead, it should be:
```
<location>{ string($b) }</location>
```

My query doesn’t have any syntax errors (BaseX says it is OK), but I’m not getting any results or I’m missing some of the results that I should be seeing. Do you know why that would be?

One thing to double-check is that all of your element and attribute names are correct. To remind yourself of what the elements and attributes look like in our database, you can consult our description of its schema.

For example, if you want to obtain all movie elements, the correct XPath expression is:
```
//movie
```
If you accidentally included an s at the end of the word (//movies), you wouldn’t produce an error, but you wouldn’t get any results!

In addition:
- Make sure that all attribute names are preceded by an @ symbol.
- If you use a predicate in an XPath expression, make sure it is in the correct position within the expression. For example, in Lab 5 we considered the following XPath expression:
```
//account[branch = $b]/balance
```
  The following incorrect version of this expression would not produce any results:
```
//account/balance[branch = $b]
```
- If you are testing for the presence of a single ID value within the value of an IDREFS attribute, make sure that you use the contains function and not the = operator.
- When using contains, make sure that the first input is the larger string and the second input is the substring that you are looking for.
See the next question for some other suggestions for debugging your query.
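
If you would like to experiment with the predicate-position and contains points outside of BaseX, the sketch below uses Java’s built-in javax.xml.xpath package, which supports XPath 1.0 only. The tiny XML document, its branch names, and its owner IDs are all made up for illustration; they are not the Lab 5 data. The literal 'Allston' stands in for the variable $b, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class PredicatePositionSketch {
    // Hypothetical sample data with two accounts.
    static final String XML =
        "<bank>"
        + "<account owners=\"c1 c2\"><branch>Allston</branch><balance>100</balance></account>"
        + "<account owners=\"c3\"><branch>Brighton</branch><balance>250</balance></account>"
        + "</bank>";

    // Returns the number of nodes selected by an XPath 1.0 expression.
    static int countMatches(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(XML)),
                XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // The predicate on account works; the same predicate on balance
        // selects nothing, because balance has no branch child.
        System.out.println(countMatches("//account[branch = 'Allston']/balance"));
        System.out.println(countMatches("//account/balance[branch = 'Allston']"));
        // For a multi-valued attribute, contains finds the ID but = does not.
        System.out.println(countMatches("//account[contains(@owners, 'c2')]"));
        System.out.println(countMatches("//account[@owners = 'c2']"));
    }
}
```

Note that neither of the mistaken expressions raises an error. They simply select zero nodes, which is why this kind of bug is easy to miss.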

My query produces some results, but they’re not fully correct. Do you have any suggestions for how to debug my query?

One thing that can help is to try executing simpler versions of your query to see if the various components of the query are giving you what you expect them to.

For example, consider the following incorrect query, which is similar to one from Lab 5:
```
for $b in distinct-values(//branch)
let $balances := //account/balance[branch = $b]
return <branch>
       {
          <location>{ $b }</location>,
          <total_balance>{ sum($balances) }</total_balance>
       }         
       </branch>
```
You could start by running just the XPath expression used in the for clause:
```
distinct-values(//branch)
```
If you did so, you would see that it gives you the expected results: the names of all of the branches in the bank database from Lab 5.

Next, you could simplify the query as follows to see if the let clause is producing the correct results:
```
for $b in distinct-values(//branch)
let $balances := //account/balance[branch = $b]
return ($b, $balances)
```
If you ran this simpler query, you would only see the branch names, which indicates that $balances is always being assigned an empty sequence. This would help you to realize that you need to revise the XPath expression in the let clause.

Next, you could try a revised version of that XPath expression:
```
for $b in distinct-values(//branch)
let $balances := //account[branch = $b]/balance
return ($b, $balances)
```
This time, you would see the correct balance elements for each branch, so you would know that the let clause is functioning correctly.

Finally, you could restore the original return clause and confirm that it works as well.

I know we need to use curly braces in some of our return clauses, but I’m not sure when we need them.

You only need them if you are creating new types of elements. For example, consider the following query from Lab 5:
```
for $c in //customer
let $c_accounts := //account[contains(@owners, $c/@customer_num)]
return <customer>
       {
          $c/name,
          for $a in $c_accounts
          return <account>
                 { string($a/branch), "-", string($a/balance) }
                 </account>
       }         
       </customer>
```
We’re creating new elements of type customer, so we need to specify the start and end tags of those elements, and we also need curly braces inside of those tags so that XQuery will evaluate the expressions needed to produce the contents of the new elements.

In addition, we are creating new elements of type account, so we need to specify their start and end tags and have curly braces for their contents as well.

However, we are using the existing name child element of the customer element assigned to $c – without changing its existing tags – so we can simply say $c/name, without specifying start and end tags and without an additional set of curly braces.

On query 4, I’m not sure what components I need to include in my FLWOR expression. Can you point me to a similar problem that I can use as a model?

Yes. In Lab 5, query problem 4 is somewhat similar to query 4 in the assignment. In the original version of that lab problem, we wanted to produce new elements for every customer in the database. In the solutions, we also show how we could modify the query so that it only produces a new element for customers that have at least one account. That modified query is a good starting point for what you need to do in query 4 of the assignment.

Last updated on March 5, 2024.