Imagine that we are working with the Room
table from the university
database that we used in Lab 2 and Lab 3.
That table has the following schema:
Room(id CHAR(4), name VARCHAR(20), capacity INT)
Consider the following tuple from that table:
('0123', 'PHO 206', 199)
If we use the fixed-length record format discussed in lecture, what would the record look like for this tuple?
Note that we use a special delimiter (#
) to indicate the end of
the VARCHAR
value and a hyphen (-
) to indicate a “wasted” byte
(i.e., a byte that is part of the record but isn’t storing
useful data or metadata).
With the exception of the delimiter, no per-record metadata is
needed, because each field has the same length in every
record (including the VARCHAR
, which is given its maximum
length).
What is the length in bytes of this record if we assume that:
CHAR
and INT
fields
always have the same length, and fixed-length records give
VARCHAR
fields their maximum length.
len(id) + max_len(name) + len(capacity) = 4 + 20 + 4 = 28
Now assume that we are using the third type of variable-length record discussed in lecture, in which each record begins with a header of field offsets. What will the record look like for the above tuple? In addition to one-byte characters and four-byte integer data values, you should assume that we use two-byte integers for integer metadata like lengths and offsets.
Note that we include the offset of the end of the record, since we may need that value to compute the length of one of the fields.
What is the length in bytes of the record from part 3?
VARCHAR
value, but we need to include four per-record
metadata values (the offsets), each of which is 2 bytes.
len(id) + len('PHO 206') + len(capacity) + metadata = 4 + 7 + 4 + 4*2 = 23
Now imagine that the room didn’t have a name and we used a
value of NULL
to indicate this:
('0123', NULL, 199)
How would the answer to part 3 change?
We use a special offset of -1 for the NULL
value. It is
the second offset, because the NULL
value is the value of the
second field.
Consider the following diagram, which shows a portion of a B-tree of order 2:
We are only showing the keys, which we assume are integers. We are not showing the values associated with each key.
What would the tree look like after we insert each of the following?
Consider the following diagram, which shows a portion of a B+tree (note the + symbol!) of order 2:
Note that this tree is similar to the initial tree in Task 2, but some keys are missing, and other keys appear in two places.
Why does it make sense that we have repeated keys?
When a node is split, the middle key is sent up and inserted in the parent, but the corresponding key-value pair remains at the leaf level, in the new node that is created as part of the split.
What would the tree look like after we insert each of the following?
Things worth noting:
When we split a leaf node in a B+tree of order m, the middle item of the 2m + 1 items goes into the new node, and its key is sent up and inserted into the parent. That’s because full items (i.e., full key-value pairs) are only present in the leaf nodes of a B+tree, so we can only send up the key. The item itself must stay at the leaf level.
When we split an internal node, the middle key is only sent up. It does not go into the new node.
Another type of index structure used by a DBMS is an on-disk hash table. For the purposes of this exercise, assume that:
The hash table is being maintained using dynamic linear hashing.
The items have integer keys, and we are using the following hash function:
h(k) = k % 10
We grow the table whenever the number of items (f) becomes more than twice the number of buckets (n).
Here is the current state of the table:
0 [1000, 972] 1 [713]
Note: With only two buckets, we use the rightmost 1 bit of the hash code’s binary value to determine which bucket it belongs in. That’s because log2 2 = 1. More generally, for a table with n buckets, we use the rightmost i bits, where i is obtained by computing log2 n and rounding up if the result is not an integer.
What would the table look like after we insert each of the following?
an item (i.e., a key-value pair) whose key is 12
0 [1000, 972, 12] 1 [713]
an item whose key is 436
0 [1000, 972, 12, 436] 1 [713]
We now have f = 5 > 2*2, so we have to grow the table by 1 bucket. n += 1 yields n = 3.
Now we must test to see whether we need to add to i. i = log2 3, 1 < i < 2, so we round up to i = 2. Now we grow the table by a bucket to produce bucket 10, and add a leading zero to the current hash values.
00 [1000, 972, 12, 436] 01 [713] 10 []
To determine which bucket to split/rehash, we note that keys that would have belonged in bucket 01 if it had been there were put in bucket 00 instead, so we split bucket 00 to obtain:
00 [1000] (because 1000 hashes to 0000, or 00) 01 [713] (we don't need to look at the contents of this bucket at all) 10 [972, 12, 436] (972 and 12 hash to 0010; 436 hashes to 0110 or 10)
an item whose key is 113
00 [1000] 01 [713, 113] 10 [972, 12, 436]
an item whose key is 116
00 [1000] 01 [713, 113] 10 [972, 12, 436, 116]
We now have f = 7 > 2*3, so we have to grow the table by 1 bucket. n += 1 yields n = 4.
i = log2 4 = 2, which is unchanged, so we add bucket 11:
00 [1000] 01 [713, 113] 10 [972, 12, 436, 116] 11 []
To determine which bucket to split/rehash, we note that keys that would have belonged in bucket 11 if it had been there were put in bucket 01 instead, so we split bucket 01 to obtain:
00 [1000] (we don't need to look at the contents of this bucket at all) 01 [] (nothing remains after rehashing!) 10 [972, 12, 436, 116] (we don't need to look at the contents of this bucket at all) 11 [713, 113] (they both hash to 0011 or 11)
Notice that linear hashing doesn’t necessarily split most efficiently. We’re splitting a bucket with only 2 values, and we’re not even splitting it evenly. If we were limited to, say, 3 values per bucket, we would have to use an overflow bucket to store 116 because it is the fourth value with a hash of 10 (and so we have to “chain” on an overflow bucket):
00 [1000] 01 [] 10 [972, 12, 436]---[116] 11 [713, 113]
Note: You don’t have to worry about overflow buckets in the linear-hashing problem in the problem set!
Consider the Room
table from Task 1:
Room(id CHAR(4), name VARCHAR(20), capacity INT)
and assume that we are again inserting the following record into that table:
('0123', 'PHO 206', 199)
If we use the second type of variable-length record discussed in lecture – in which each field is preceded by its length – what would the record for the above tuple look like?
What is the length in bytes of this record? Use the same assumptions about the sizes of characters, integer data values, and integer metadata that we made in Task 1.
VARCHAR
value, but there are now three per-record
metadata values (the lengths), each of which is 2 bytes.
len(id) + len('PHO 206') + len(capacity) + metadata = 4 + 7 + 4 + 3*2 = 21
Insert the following sequence of keys into an initially empty B-tree of order 2:
51, 3, 10, 77, 20, 40, 34, 28, 61, 80, 68, 93, 90, 97, 14
------- | *51 | ------- ----------- | *3 | 51 | ----------- ----------------- | 3 | *10 | 51 | ----------------- ---------------------- | 3 | 10 | 51 | *77 | ---------------------- ------- | *20 | +-----+ / \ ----------- ----------- | 3 | 10 | | 51 | 77 | ----------- ----------- ------ | 20 | _+----+_ / \ ----------- ----------------- | 3 | 10 | | *40 | 51 | 77 | ----------- ----------------- ------ | 20 | _+----+_ / \ ----------- ---------------------- | 3 | 10 | | *34 | 40 | 51 | 77 | ----------- ---------------------- ----------- | 20 | 40 | __+----+----+___ / | \ ----------- ------------ ----------- | 3 | 10 | | *28 | 34 | | 51 | 77 | ----------- ------------ ----------- ----------- | 20 | 40 | __+----+----+___ / | \ ----------- ------------ ----------------- | 3 | 10 | | 28 | 34 | | 51 | *61 | 77 | ----------- ------------ ----------------- ----------- | 20 | 40 | __+----+----+__ / | \ ----------- ----------- ---------------------- | 3 | 10 | | 28 | 34 | | 51 | 61 | 77 | *80 | ----------- ----------- ---------------------- ----------------- | 20 | 40 | *68 | _+----+----+-----+_ ______/ ___/ \__ \____ / / \ \ ----------- ----------- ----------- ----------- | 3 | 10 | | 28 | 34 | | 51 | 61 | | 77 | 80 | ----------- ----------- ----------- ----------- ---------------- | 20 | 40 | 68 | _+----+----+----+_ ______/ ___/ \__ \_____ / / \ \ ----------- ----------- ----------- ----------------- | 3 | 10 | | 28 | 34 | | 51 | 61 | | 77 | 80 | *93 | ----------- ----------- ----------- ----------------- ---------------- | 20 | 40 | 68 | ________+----+----+----+ ______/ __________/ _/ \____ / / / \ ----------- ----------- ----------- ---------------------- | 3 | 10 | | 28 | 34 | | 51 | 61 | | 77 | 80 | *90 | 93 | ----------- ----------- ----------- ---------------------- -------------------- | 20 | 40 | 68 | 90 | ________+----+----+----+----+____ ______/ __________/ _/ \____ \_________ / / / \ \ ----------- ----------- ----------- ----------- ------------ | 3 | 10 | | 28 | 34 | | 51 | 61 | | 77 | 80 | | 93 | *97 | ----------- ----------- ----------- ----------- ------------ --------------------- | 20 | 40 | 68 | 90 | _+----+----+----+----+_________ _______/ / \__ \__________ \_____________ / / \ \ \ ---------------- ----------- ----------- ----------- ----------- | 3 | 10 | *14 | | 28 | 34 | | 51 | 61 | | 77 | 80 | | 93 | 97 | ---------------- ----------- ----------- ----------- -----------
Insert the same sequence of keys into an initially empty B+tree of order 2.
Before performing the insertions, do you expect the result to be the same or different? Why?
Perform the sequence of insertions and draw the B+tree as it is formed.
------- | *51 | ------- ----------- | *3 | 51 | ----------- ----------------- | 3 | *10 | 51 | ----------------- ---------------------- | 3 | 10 | 51 | *77 | ---------------------- ------- | *20 | +-----+ / \ ----------- ----------------- | 3 | 10 |------| *20 | 51 | 77 | ----------- ----------------- ------ | 20 | +----+ / \ ----------- ---------------------- | 3 | 10 |------| 20 | *40 | 51 | 77 | ----------- ---------------------- ----------- | 20 | 40 | _+----+----+_______ / \ \ ----------- ------------ ---------------- | 3 | 10 |---| 20 | *34 |----| 40 | 51 | 77 | ----------- ------------ ---------------- ----------- | 20 | 40 | _____+----+----+________ / | \ ----------- ----------------- ---------------- | 3 | 10 |---| 20 | *28 | 34 |----| 40 | 51 | 77 | ----------- ----------------- ---------------- ----------- | 20 | 40 | _____+----+----+________ / | \ ----------- ---------------- ---------------------- | 3 | 10 |---| 20 | 28 | 34 |----| 40 | 51 | *61 | 77 | ----------- ---------------- ---------------------- ---------------- | 20 | 40 | 61 | ____________+----+----+----+_________ / | \ \ ----------- ---------------- ----------- ----------------- | 3 | 10 |---| 20 | 28 | 34 |----| 40 | 51 |---| 61 | 77 | *80 | ----------- ---------------- ----------- ----------------- ---------------- | 20 | 40 | 61 | ____________+----+----+----+_________ / | \ \ ----------- ---------------- ----------- ---------------------- | 3 | 10 |---| 20 | 28 | 34 |----| 40 | 51 |---| 61 | *68 | 77 | 80 | ----------- ---------------- ----------- ---------------------- --------------------- | 20 | 40 | 61 | 77 | ______+----+----+----+----+__________ ______/ | \ \_________ \_________ / / \ \ \ ----------- ---------------- ----------- ----------- ----------------- | 3 | 10 |---| 20 | 28 | 34 |----| 40 | 51 |---| 61 | 68 |---| 77 | 80 | *93 | ----------- ---------------- ----------- ----------- ----------------- --------------------- | 20 | 40 | 61 | 77 | ______+----+----+----+----+__________ ______/ | \ \_________ \_________ / / \ \ \ ----------- ---------------- ----------- ----------- ---------------------- | 3 | 10 |---| 20 | 28 | 34 |----| 40 | 51 |---| 61 | 68 |---| 77 | 80 | *90 | 93 | ----------- ---------------- ----------- ----------- ---------------------- ------ | 61 | _____+----+_____________ / \ ----------- ----------- | 20 | 40 | | 77 | 90 | ______+----+----+ +----+----+_____ ______/ | \ / \___ \______ / / \ / \ \ ----------- ---------------- ----------- ----------- ----------- ----------------- | 3 | 10 |---| 20 | 28 | 34 |----| 40 | 51 |---| 61 | 68 |---| 77 | 80 |---| 90 | 93 | *97 | ----------- ---------------- ----------- ----------- ----------- ----------------- ------ | 61 | _____+----+_____________ / \ ----------- ----------- | 20 | 40 | | 77 | 90 | ______+----+----+ +----+----+_____ ______/ | \ / \___ \______ / / \ / \ \ ----------------- ---------------- ----------- ----------- ----------- ---------------- | 3 | 10 | *14 |---| 20 | 28 | 34 |----| 40 | 51 |---| 61 | 68 |---| 77 | 80 |---| 90 | 93 | 97 | ----------------- ---------------- ----------- ----------- ----------- ----------------
How would you perform a range search on this tree for all keys in the range (25, 60)? Why is such a query easier in a B+tree than it is in a standard B-tree?
This type of range-search query is easier in a B+tree because we can take advantage of the links connecting the leaf nodes. In standard B-tree, we would need to go back up the tree in order to find the next leaf node.
|| \/ -OO--- O 61 | OOOOO+----+_____________ O \ -----OO---- ----------- | 20 O 40 | | 77 | 90 | ______+----O----+ +----+----+_____ ______/ O \ / \___ \______ / O \ / \ \ ----------------- -----============= ============= ----------- ----------- ---------------- | 3 | 10 | 14 |---| 20 | *28 | *34 |==| *40 | *51 |---| 61 | 68 |---| 77 | 80 |---| 90 | 93 | 97 | ----------------- -----============= ============= ----------- ----------- ----------------
Last updated on October 3, 2024.