Problem Set 3 FAQ

Part II: All problems
Problem 5
Problem 7
Problem 8

Part II: All problems

I’m able to compile the starter code using the specified instructions, but I’m not able to run it. Any suggestions?

Try the following:
1. Uninstall your current installation of VS Code along with any programs that include the words “Java” or “JDK” in their names.
2. Reinstall VS Code using the Alternative method described in Lab 0.
If this doesn’t work, you can use VS Code on the virtual desktop.
I’m able to compile and test my code on my own machine, but when I upload it to Gradescope, the Autograder is saying that the tests cannot be run. Do you have any suggestions?

Make sure that you haven’t added any import statements to the top of any of your files. You should only be using the code that we have provided, the classes from Berkeley DB, and classes like ArrayList that are available in the java.util package. You should NOT be using classes from any other Java package.

In addition, make sure that you are only using Java features that were present in Java 8.
Can you review the big picture of how we will be using Berkeley DB databases for this assignment?

Each table is stored in its own Berkeley DB database (i.e., its own B+tree of key/value pairs). Each row/tuple in a given table will be represented by a single key/value pair in that table’s B+tree. The key portion of the key/value pair will be the value of the primary-key attribute in that row. The value portion of the key/value pair will be a marshalled form of the remaining values in that row. See the lecture notes on the logical-to-physical mapping for more detail.
I’m not sure how to initialize and open the BDB environment.

You don’t have to! It’s already done for you in the code that we’ve given you. Please make sure that you fully understand the code we’ve given you so that you don’t do unnecessary work or write code that doesn’t work with our code.
Do we need to explicitly open the BDB database for a given table?

No! It’s already done for you by the open method/function that we have given you for Table objects. Your code should call that method as needed, and it will take care of opening the underlying BDB database.
Should we be closing tables after we are done with them?

No. You should leave tables open so that they will stay in the table cache for later use.
In the starter code that you have given us, the execute methods have some exceptions in their headers’ throws clause:
```
public void execute() throws DatabaseException, DeadlockException {
```
Can we add exceptions to this clause?

No. Because these methods are implementing the abstract execute method found in the superclass, the header must stay the same. Instead, you should follow the approach that we’ve shown you in the CreateStatement.execute() and DropStatement.execute() methods: put code that could throw an exception in a try block, and include appropriate error-handling code in the accompanying catch block.
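Here is a minimal sketch of that pattern. It does not use the starter code: SketchDbException, SketchStatement, SketchInsertStatement, riskyStep, and demo are all hypothetical stand-ins, and the messages are made up for illustration.

```java
import java.io.IOException;

// Hypothetical stand-ins for illustration; these are NOT the starter-code classes.
class SketchDbException extends Exception {
    SketchDbException(String msg) { super(msg); }
}

abstract class SketchStatement {
    // Subclasses cannot add new checked exceptions to this header.
    abstract void execute() throws SketchDbException;
}

class SketchInsertStatement extends SketchStatement {
    private final boolean fail;
    String lastMessage = "";

    SketchInsertStatement(boolean fail) { this.fail = fail; }

    // A step that can throw a checked exception that execute() does not declare.
    private void riskyStep() throws IOException {
        if (fail) {
            throw new IOException("simulated failure");
        }
    }

    @Override
    void execute() throws SketchDbException {
        try {
            riskyStep();
            lastMessage = "Added 1 row.";
        } catch (Exception e) {
            // Handle the problem here instead of widening the throws clause.
            lastMessage = "Could not insert row: " + e.getMessage();
        }
    }

    // Convenience wrapper so the sketch can be exercised without a try block.
    static String demo(boolean fail) {
        SketchInsertStatement stmt = new SketchInsertStatement(fail);
        try {
            stmt.execute();
        } catch (SketchDbException e) {
            return "unexpected: " + e.getMessage();
        }
        return stmt.lastMessage;
    }
}
```

The IOException never leaves execute(), so the header can stay exactly as it was given.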
My program crashed when I was testing it, and now I get an error message whenever I try to run it.

After running the program the first time, you should see a folder called db within your dbms folder. This is the home directory for the Berkeley DB environment, and it will be used to store the files that BDB creates for your database. If your program crashes for any reason, these files may be corrupted. As a result, we recommend that after a crash, you either remove all files from this db directory or simply delete the entire directory.

Note that you may also need to do this if you closed VS Code while you were in the middle of running the program – before entering q to quit the program.

If you end up needing to remove the files in the db folder, you will lose any databases that you created for testing. As a result, you will need to re-enter the CREATE TABLE and INSERT commands needed to build them.

Problem 5

Should the marshall() method add the key-value pair for the row represented by the InsertRow object to the B+tree (i.e., to the Berkeley DB database)?

No. Your marshall() method will not interact with Berkeley DB at all.

Rather, it should do the following:
1. Determine the correct offset values and store them in the array to which the offsets field refers.
2. Write the appropriate values into the buffers represented by the keyBuffer and valueBuffer fields.
See the Notes section in Problem 3 for more details on both of these tasks.

The code that you write in Problem 4 for the execute() method of the InsertStatement class will take care of creating the DatabaseEntry objects for the key/value pair and adding them to the Berkeley DB database.
How can I get information about a given column in the InsertRow object – e.g., the type of the column?

You should take the Table object stored in the field called table and:
- use its getColumn() method to get the Column object for the column in question
- use the appropriate method in the Column object to get the information that you need.
How many bytes do REAL values take up?

8 bytes, since we are using the Java type double for them, and double values are 8 bytes. Note that you can find the number of bytes needed for any fixed-length column by:
- using the Table object’s getColumn() method to get the Column object for the column in question
- using the getLength() method in the Column object.
I’m trying to test my marshall() method using the example in the assignment, but I get an error that says:

Movie: a table with this name already exists. Could not create table Movie.

What should I do?

You only need to enter the CREATE TABLE command for a given table once. After that, you should just enter INSERT commands to test your marshall() method.
In the assignment, you give us one example of marshalling and the output that the debugging print statement should produce. Could you give us another one?

Sure – but make sure that you also perform additional tests that you devise!

If you enter the following commands:
```
CREATE TABLE Student (name VARCHAR(64), id CHAR(5) PRIMARY KEY);
INSERT INTO Student VALUES ('Alan Turing', '12345');
```
you should see:
- for the offsets field:
  [6, -2, 17]
  
  Because there are two columns, there are three offsets:
  - The 6 is the offset of the value of the first column (name).
  - The -2 indicates that the second column (id) is the primary key.
  - The 17 is the offset of the end of the record, which is 6 + 11 = 17 in this particular row.
- for the key buffer (i.e., the keyBuffer field):
  [49, 50, 51, 52, 53]
  
  The numbers in this byte array represent the ASCII codes for the characters in the id value '12345': 49 for the character '1', 50 for the character '2', etc.
- for the value buffer (i.e., the valueBuffer field):
  [0, 6, -1, -2, 0, 17, 65, 108, 97, 110, 32, 84, 117, 114, 105, 110, 103]
  
  This byte array begins with 6 bytes for the offset table: [0, 6] for the first offset, [-1, -2] for the special -2 offset, and [0, 17] for the offset of the end of the record.
  
  The remaining 11 bytes represent the ASCII codes for 'Alan Turing': 65 for 'A', 108 for 'l', etc.
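If you want to check the byte values above for yourself, the following sketch rebuilds both buffers by hand. It uses java.io.DataOutputStream as a stand-in for the RowOutput class (an assumption made so that the example runs on its own), and the class name StudentRowSketch is hypothetical. It hard-codes the offsets instead of computing them, so it is not a marshall() implementation.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class StudentRowSketch {
    // The key buffer holds only the primary-key value, '12345'.
    static byte[] keyBuffer() {
        return "12345".getBytes(StandardCharsets.US_ASCII);
    }

    // The value buffer holds the offset table followed by the non-key values.
    static byte[] valueBuffer() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeShort(6);             // offset of name: 3 offsets * 2 bytes each
            out.writeShort(-2);            // id is the primary key
            out.writeShort(17);            // end of record: 6 + 11
            out.writeBytes("Alan Turing"); // 11 bytes, one per character
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            // Writing to an in-memory stream should not fail.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(keyBuffer()));
        System.out.println(Arrays.toString(valueBuffer()));
    }
}
```

Each offset is written as a two-byte value, which is why -2 shows up as the pair [-1, -2] in the value buffer.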
I know that my marshall() method should start by figuring out the offset values and storing them in the offsets array. However, I’m having trouble writing the loop that figures out the offsets in all possible cases. Do you have any suggestions?

One thing that can help is to have an accumulator variable that keeps track of the current offset – i.e., how many bytes you are from the start of the record.

The initial value for this accumulator will be based on the number of offsets, since they take up the beginning of the record.

As your loop for computing offsets executes, you will update this accumulator as needed, and use it as needed for the values in the offsets array. We say “as needed” because special cases like null values mean that not every column causes the current offset to increase.

Tracing through concrete cases on paper – and figuring out how the value of the accumulator and the values in offsets array should change over time for those concrete cases – should allow you to figure out the necessary logic.
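One way to picture the accumulator is the simplified sketch below. It makes several assumptions that are not part of the assignment: the byte length of each column’s value is passed in as an array (your code would get this information from the Column objects and the values themselves), a negative length stands for a null value, and null values are recorded with a marker of -1. Only the -2 marker for the primary key comes from the example above. The names OffsetSketch and computeOffsets are hypothetical.

```java
final class OffsetSketch {
    static final int IS_PKEY = -2;         // matches the Student example
    static final int IS_NULL = -1;         // assumption for this sketch
    static final int BYTES_PER_OFFSET = 2; // each offset is a two-byte value

    // valueLengths[i] is the byte length of column i's value (negative if null).
    static int[] computeOffsets(int[] valueLengths, int pkeyIndex) {
        int numCols = valueLengths.length;
        int[] offsets = new int[numCols + 1];

        // The offset table comes first, so values start right after it.
        int current = (numCols + 1) * BYTES_PER_OFFSET;

        for (int i = 0; i < numCols; i++) {
            if (i == pkeyIndex) {
                offsets[i] = IS_PKEY;       // stored in the key, not the value
            } else if (valueLengths[i] < 0) {
                offsets[i] = IS_NULL;       // nothing stored, accumulator unchanged
            } else {
                offsets[i] = current;
                current += valueLengths[i]; // only stored values advance it
            }
        }
        offsets[numCols] = current;         // offset of the end of the record
        return offsets;
    }
}
```

For the Student row, computeOffsets(new int[] {11, 5}, 1) gives [6, -2, 17], which agrees with the trace in the previous answer.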
I’m trying to debug my marshall() method by creating a table and inserting rows into it, but I’m getting an error saying that it cannot insert the row.

This means that an exception is being thrown somewhere in the execute() method for InsertStatement – and that the exception is being caught by the catch block that we provided at the end of that method.

It’s possible that the exception is being thrown by your marshall() method, or it may be getting thrown by one of the statements in execute() itself.

You might try adding temporary print() statements at various points in marshall() to see how much progress it makes and whether it fully completes. That would allow you to rule out the possibility that the exception is being thrown by marshall().

Sometimes you can get an exception if BDB ends up in a bad state for some reason (e.g., because you did not quit the program cleanly by using q). In such cases, you should remove the db subdirectory, rerun the program, and recreate the table(s) that you are using for testing. See the last question in the previous section (Part II: All problems) for more details.
In InsertRow, I’ve included one or more helper methods that are called by marshall. However, when I try to compile my code, I get an error message that says something like “Unhandled exception type IOException”. What am I doing wrong?

If you write a helper method that uses one or more of the RowOutput methods, you must include a throws clause in the header of the method like the one we’ve given you for the marshall method:

public void marshall() throws IOException {
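As a rough illustration, here is a helper with its own throws clause. The sketch uses java.io.DataOutputStream in place of RowOutput (an assumption made so that it compiles on its own), and the names HelperThrowsSketch, writeOffsets, marshallOffsets, and demo are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

final class HelperThrowsSketch {
    // writeShort() can throw IOException, so this helper must declare it.
    private static void writeOffsets(DataOutputStream out, int[] offsets) throws IOException {
        for (int offset : offsets) {
            out.writeShort(offset);
        }
    }

    // The caller declares it too, just like the marshall() header.
    static byte[] marshallOffsets(int[] offsets) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        writeOffsets(new DataOutputStream(bytes), offsets);
        return bytes.toByteArray();
    }

    // Wrapper for trying the sketch without a throws clause of your own.
    static byte[] demo() {
        try {
            return marshallOffsets(new int[] {6, -2, 17});
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Without the throws clause on writeOffsets, the compiler reports the unhandled IOException at the call to writeShort().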

Problem 7

I don’t understand how the WHERE-clause evaluation code works. How can it access the values in the rows?

The WHERE clause is represented by a tree structure that is constructed by the parser. There’s more info about it in part 2 of the DBMS code overview that accompanies the assignment.

The WHERE-clause evaluation code is able to access the values in the rows because the TableIterator constructor goes through all of the columns in the WHERE clause and connects each column from the corresponding table to the table iterator.

Then, when the WHERE clause is evaluated, the getValue() method is invoked to get the value of each column, and this method in turn invokes the table iterator’s getColumnVal() method. As a result, all you need to do for WHERE-clause evaluation is to implement this getColumnVal() method.
Where does the unmarshalled data actually go? Do we need to store it somewhere?

No. At any point in time, the DBMS only has access to a single row from a given table. The execute() method that you write for SELECT commands will call the printAll() method in the TableIterator object for the table, and printAll() will use the TableIterator’s other methods (including the method that you are writing) to access the appropriate rows from the table one at a time, and to get the column values for the current row and print them.
With CHAR fields, the adjustValue() method in Column objects (which gets called in InsertStatement.execute()) pads values that are shorter than the specified width with spaces. Are we supposed to be able to get back the original width when we unmarshall the value?

No. The DBMS adds spaces as needed to bring CHAR values to their specified width, and you should unmarshall the value with the added spaces. When writing a query involving such values, you can use the LIKE operator (e.g., foo LIKE 'val%') if you don’t want to enter the extra spaces.
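To see why the padding matters when you write queries, consider the small sketch below. It is not the adjustValue() code; CharPaddingSketch and padTo are hypothetical names, and the width of 5 is just an example.

```java
final class CharPaddingSketch {
    // Pads value with trailing spaces until it is width characters long.
    static String padTo(String value, int width) {
        StringBuilder padded = new StringBuilder(value);
        while (padded.length() < width) {
            padded.append(' ');
        }
        return padded.toString();
    }

    public static void main(String[] args) {
        String stored = padTo("val", 5);               // what a CHAR(5) column would hold
        System.out.println("[" + stored + "]");
        System.out.println(stored.equals("val"));      // exact comparison with the unpadded text
        System.out.println(stored.startsWith("val"));  // prefix comparison, like LIKE 'val%'
    }
}
```

The stored value is "val" followed by two spaces, so an exact comparison with 'val' does not match, while a prefix pattern does.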
In the getColumnVal() method, how do we access the raw data that we need to unmarshall?

When implementing getColumnVal(), you can assume that the DatabaseEntry objects that are represented by the TableIterator’s key and value fields hold the key/value pair for the row whose column value you need to unmarshall.

This works because the calls to cursor.getNext() that are made in the TableIterator’s first() and next() methods take those DatabaseEntry objects as inputs, and cursor.getNext() uses them to return the next key/value pair.

Given the current key/value pair, your getColumnVal() method will need to use one or two RowInput objects to unmarshall the value of the specified column.

For example, to create a RowInput object that is based on the value portion of the current key/value pair, you would do something like the following:
```
RowInput valIn = new RowInput(this.value.getData());
```
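To get a feel for what unmarshalling involves, the sketch below pulls the name value back out of the Student value buffer from the Problem 5 example. It uses java.nio.ByteBuffer rather than RowInput (an assumption made so that it runs on its own), it only handles a non-null string column stored in the value, and it assumes that any negative offset marks a column with no bytes in the value buffer. The names UnmarshallSketch and readString are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class UnmarshallSketch {
    static final int BYTES_PER_OFFSET = 2;

    // Reads the string for column colIndex from a value buffer with numCols columns.
    static String readString(byte[] value, int colIndex, int numCols) {
        ByteBuffer buf = ByteBuffer.wrap(value); // big-endian by default

        // The column's own offset says where its bytes start.
        int start = buf.getShort(colIndex * BYTES_PER_OFFSET);

        // Its bytes end at the next non-negative offset in the table.
        // The final offset (end of record) is never negative, so this loop finds one.
        int end = -1;
        for (int i = colIndex + 1; i <= numCols && end < 0; i++) {
            end = buf.getShort(i * BYTES_PER_OFFSET);
        }

        return new String(value, start, end - start, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] value = {0, 6, -1, -2, 0, 17,
                        65, 108, 97, 110, 32, 84, 117, 114, 105, 110, 103};
        System.out.println(readString(value, 0, 2));
    }
}
```

Note that the offset after name is the -2 marker for the primary key, so the sketch has to skip ahead to the end-of-record offset (17) to learn the length of the value.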

Problem 8

When our execute() method creates the TableIterator object, what should we be passing in for the third parameter of the constructor – the one called evalWhere?

You should always pass in true for this parameter.
Do we need to worry about the possibility that the column names in the WHERE clause of the SELECT command (if any) may not be valid columns for the table mentioned in the command?

No. You may assume that the column names (if any) that are included in the command are valid.

However, you do need to handle the special cases mentioned in the problem.
I’m able to successfully perform at least some SELECT commands, but I’m getting an error message when quitting that says “database closed while still referenced by other handles.” How can I fix this?

Two things to check in the execute() method of SelectStatement:
1. Make sure that you use the local variable iter that we have given you at the beginning of the method for the TableIterator, rather than declaring your own.
2. Make sure that you don’t return early (using a return statement). It is crucial that you execute the code that we’ve given you at the bottom of that method, which closes the table iterator – and thus its underlying cursor.
When I try to perform a SELECT command, I’m not seeing any rows, and I’m getting an error message that says something about the buffer being null. What am I doing wrong?

Make sure that your getColumnVal() method is not creating its own DatabaseEntry objects for the key and the value. The TableIterator constructor already creates DatabaseEntry objects for the key and value and assigns those objects to appropriately named fields, and you must use those fields in your method.

Last updated on April 24, 2025.