We will begin by taking a few minutes to complete evaluations for the lab component of the course.
Your evaluations are anonymous, and we will not receive the results until after all final grades have been submitted.
Comments in the text fields are valued and encouraged. Please try to answer all questions, but if a question is not applicable or you do not wish to answer it, you can simply skip it.
Here is the URL that you should use: https://bu.bluera.com/bu/
Enter your BU login name and Kerberos password, and complete the evaluation for your CS 460 lab section (the one that is not A1). Please do not evaluate the lectures at this time.
When you are done with the survey, please close the separate browser tab.
Thanks in advance for taking the time to provide your feedback about the labs!
Based on the log shown below, answer the following questions.
LSN | transaction | action | item | old value | new value |
---|---|---|---|---|---|
00 | T1 | begin | |||
10 | T1 | update | D1 | 100 | 70 |
20 | T2 | begin | |||
30 | T1 | commit | |||
40 | T2 | update | D1 | 70 | 400 |
60 | T2 | update | D3 | “hello” | “howdy” |
70 | T3 | begin | |||
80 | T3 | update | D2 | 15 | 70 |
90 | T2 | commit | |||
100 | T3 | update | D1 | 400 | 55 |
crash |
Recall that we discussed three types of logging: undo-redo, undo-only and redo-only.
How can we can determine from looking at the log that the system is performing undo-redo logging?
The update log records include both the old and the new values of the data item being updated.
At the start of recovery, what are the possible on-disk values of each data item?
In undo-redo logging, there is no guarantee about when an updated database page will go to disk unless there is a checkpoint. As a result, any of the values in the log might be on disk at the start of recovery.
D1: 100 (its original value), 70 (from T1’s update), 400 (from T2’s update) or 55 (from T3’s update)
D2: 15 (its original value) or 70 (from T3’s update)
D3: “hello” (its original value) or “howdy” (from T2’s update)
If the system were using undo-only logging instead, what are the possible on-disk values of each data item at the start of recovery? You should assume that each data item is on a different page of the database file.
In undo-only logging, the system needs to ensure that no updates will need to be redone during recovery. As a result, when a transaction commits, any pages that it has modified must be forced to disk.
Before a transaction commits, its updates may or may not make it to disk, depending on whether the relevant pages are evicted from the cache.
If we build a table for in-memory and possible on-disk values like the one used in lecture, we start by filling in the original on-disk values:
item | in-memory value | possible on-disk values |
---|---|---|
D1 | 100 | |
D2 | 15 | |
D3 | “hello” |
Tracing through the actions in the log in the order in which they occurred, the first update is T1’s update of D1, which creates an in-memory version of D1 and a new possible on-disk value:
item | in-memory value | possible on-disk values |
---|---|---|
D1 | 70 | 100, 70 |
D2 | 15 | |
D3 | “hello” |
When T1 then commits, the updated in-memory value of D1 is forced to disk, which means that 100 is no longer possible on disk:
item | in-memory value | possible on-disk values |
---|---|---|
D1 | 70 | |
D2 | 15 | |
D3 | “hello” |
T2 then updates D1 and D3, and T3 updates D2:
item | in-memory value | possible on-disk values |
---|---|---|
D1 | ||
D2 | 70 | 15, 70 |
D3 | “howdy” | “hello”, “howdy” |
When T2 commits, the items it modified (D1 and D3) are forced to disk, which means those values are guaranteed to be on disk:
item | in-memory value | possible on-disk values |
---|---|---|
D1 | ||
D2 | 70 | 15, 70 |
D3 | “howdy” |
Finally, after T3’s update of D1, we have:
item | in-memory value | possible on-disk values |
---|---|---|
D1 | ||
D2 | 70 | 15, 70 |
D3 | “howdy” |
Thus, the possible on-disk values at the start of recovery are:
D1: 400 or 55
D2: 15 or 70
D3: “howdy”
If the system were using redo-only logging instead, what are the possible on-disk values of each data item at the start of recovery? You should assume that each data item is on a different page of the database file.
In redo-only logging, the system needs to ensure that no updates will need to be undone during recovery. As a result, it “pins” a modified database page in memory and doesn’t let it go to disk until the transaction that made the change commits.
We again start with the original on-disk values:
item | in-memory value | possible on-disk values |
---|---|---|
D1 | 100 | |
D2 | 15 | |
D3 | “hello” |
When T1 updates D1, we get an in-memory version of D1, but because it is pinned in memory, that value can’t be on disk at first:
item | in-memory value | possible on-disk values |
---|---|---|
D1 | 70 | 100 |
D2 | 15 | |
D3 | “hello” |
When T1 then commits, the updated in-memory value of D1 is unpinned, which means it can go to memory at any point after the commit. However, it is not forced to disk, so 100 is still possible:
item | in-memory value | possible on-disk values |
---|---|---|
D1 | 70 | 100, 70 |
D2 | 15 | |
D3 | “hello” |
After T2 updates D1 and D3, and T3 updates D2, we have new in-memory values, but they can’t be on disk yet because they are pinned until the transactions commit:
item | in-memory value | possible on-disk values |
---|---|---|
D1 | 100, 70 | |
D2 | 70 | 15 |
D3 | “howdy” | “hello” |
When T2 commits, the items it modified (D1 and D3) are unpinned and allowed to go to disk:
item | in-memory value | possible on-disk values |
---|---|---|
D1 | 100, 70, 400 | |
D2 | 70 | 15 |
D3 | “howdy” | “hello”, “howdy” |
After T3’s update of D1, we have a new in-memory value, but it can’t be on disk because T3 hasn’t committed:
item | in-memory value | possible on-disk values |
---|---|---|
D1 | 100, 70, 400 | |
D2 | 70 | 15 |
D3 | “howdy” | “hello”, “howdy” |
And thus the possible on-disk values at the start of recovery are:
D1: 100, 70 or 400
D2: 15
D3: “hello” or “howdy”
Consider again the log shown below:
LSN | transaction | action | item | old value | new value |
---|---|---|---|---|---|
00 | T1 | begin | |||
10 | T1 | update | D1 | 100 | 70 |
20 | T2 | begin | |||
30 | T1 | commit | |||
40 | T2 | update | D1 | 70 | 400 |
60 | T2 | update | D3 | “hello” | “howdy” |
70 | T3 | begin | |||
80 | T3 | update | D2 | 15 | 70 |
90 | T2 | commit | |||
100 | T3 | update | D1 | 400 | 55 |
crash |
During recovery, what steps would be taken during the first pass through the log if the system is using undo-redo logging?
The first pass is the backwards pass, which starts with the last log record and works backward.
Updates by uncommitted transactions are undone, and committed transactions are added to the commit list. Other log records are skipped.
Here is a summary of what happens:
LSN 100: T3 is not on the commit list. Undo the update, writing a value of 400 for D1.
LSN 90: Add T2 to the commit list.
LSN 80: T3 is not on the commit list. Undo the update, writing a value of 15 for D2.
LSN 70: skip
LSN 60: T2 is on the commit list; skip.
LSN 40: T2 is on the commit list; skip.
LSN 30: Add T1 to the commit list.
LSN 20: skip
LSN 10: T1 is on the commit list; skip.
LSN 00: skip
What steps would be taken during the second pass through the log?
The second pass is the forwards pass, which starts with the first log record and works forward.
Updates by committed transactions are redone, and other log records are skipped.
Here is a summary of what happens:
LSN 00: skip
LSN 10: T1 is on the commit list.
Redo the update, writing a value of 70 for D1.
LSN 20: skip
LSN 30: skip
LSN 40: T2 is on the commit list.
Redo the update, writing a value of 400 for D2.
LSN 60: T2 is on the commit list.
Redo the update, writing a value of “howdy” for D3.
LSN 70: skip
LSN 80: T3 is not on the commit list; skip.
LSN 90: skip
LSN 100: T3 is not on the commit list; skip.
How would the log and the recovery process be different under undo-only logging?
With undo-only logging, we only store the information needed to undo operations (the old values). Our log would not need the “new value” column. During recovery, we would only need to perform the backward pass.
How would the log and the recovery process be different under redo-only logging?
With redo-only logging, we only store the information needed to redo operations (the new values). Our log would not need the “old value” column. During recovery, the backward pass would only build the commit list, and then the forward pass would be performed as usual.
Now assume that the system is performing logical logging, which means:
each on-disk data item must have a corresponding datum LSN, which is the LSN of the log record that produced the value
each update log record must include the old log sequence number (the olsn), as shown below. The olsn is the LSN of the log record that produced the data item’s previous value.
LSN | transaction | action | item | old value | new value | olsn |
---|---|---|---|---|---|---|
00 | T1 | begin | ||||
10 | T1 | update | D1 | 100 | 70 | 0 |
20 | T2 | begin | ||||
30 | T1 | commit | ||||
40 | T2 | update | D1 | 70 | 400 | 10 |
60 | T2 | update | D3 | “hello” | “howdy” | 0 |
70 | T3 | begin | ||||
80 | T3 | update | D2 | 15 | 70 | 0 |
90 | T2 | commit | ||||
100 | T3 | update | D1 | 400 | 55 | 40 |
crash |
What other changes to the log would typically be made under logical logging?
Instead of storing the old and new values of the data items, we would store a logical description of how to get the new value from the old value (e.g., log record 10 could be stored as “decrement D1 by 30”).
What is the purpose of storing the olsn and datum LSN values?
It allows us to only redo and undo updates that are necessary based on the values that made it to disk before the crash.
This is especially important under true logical logging, because logical updates may not be idempotent – i.e., doing them multiple times may not lead to the same final value – and thus it is important to only undo changes that actually made it to disk and to only redo changes that didn’t make it to disk.
At the start of recovery, assume that we have the following on-disk datum LSNs:
Trace through the steps taken during the backward and forward passes for the log above.
100: T3 is not on the commit list
D1’s datum LSN = 10
LSN of log record = 100
Because these LSNs are not equal, we don’t need to
undo this change because it didn’t make it to disk.
90: add T2 to the commit list
80: T3 is not on the commit list
D2’s datum LSN = 80
LSN of log record = 80
These LSNs are equal, so we need to undo the change.
This gives D2 a value of 15 and a datum LSN of 0.
70: skip
60: T2 is on the commit list, so we skip this record for now
40: T2 is on the commit list, so we skip this record for now
30: add T1 to the commit list
20: skip
10: T1 is on the commit list, so we skip this record for now
00: skip
Forward pass
00: skip
10: T1 is on the commit list
D1’s datum LSN = 10
OLSN in log record = 0
datum LSN != OLSN, so no need to redo, because this change is
already on disk (or was overwritten by a later change)
20: skip
30: skip
40: T2 is on the commit list
D1’s datum LSN = 10
OLSN in log record = 10
datum LSN == OLSN, so we need to redo the change.
This gives D1 a value of 400 and a datum LSN of 40.
60: T2 is on the commit list
D3’s datum LSN = 60
OLSN in log record = 0
datum LSN != OLSN, so no need to redo, because this change is
already on disk (or was overwritten by a later change)
70: skip
80: T3 is not on the commit list; skip
90: skip
100: T3 is not on the commit list; skip
Now let’s assume that a dynamic checkpoint occurred between log records 40 and 60:
LSN | transaction | action | item | old value | new value |
---|---|---|---|---|---|
00 | T1 | begin | |||
10 | T1 | update | D1 | 100 | 70 |
20 | T2 | begin | |||
30 | T1 | commit | |||
40 | T2 | update | D1 | 70 | 400 |
50 | checkpoint | ||||
60 | T2 | update | D3 | “hello” | “howdy” |
70 | T3 | begin | |||
80 | T3 | update | D2 | 15 | 70 |
90 | T2 | commit | |||
100 | T3 | update | D1 | 400 | 55 |
crash |
What other information must be included in log record 50?
Given the checkpoint, do all of the log records shown need to be examined during the backward pass? During the forward pass? Why or why not?
Backward pass
For this log, the backward pass only needs to go back to log
record 50 — the one describing the checkpoint.
In general, we need to go back to the oldest begin record for any transaction that needs to be undone (i.e., a transaction that is listed as active in the checkpoint record but isn’t on the commit list). This allows us to undo any changes made by such transactions.
In this case, the only transaction listed in the checkpoint record is T2, and it’s on the commit list. Because any changes that T2 made before the checkpoint were forced to disk during the checkpoint, we don’t need to undo them, and thus there’s no need to go any further back.
Forward pass
Because a checkpoint forces all changed data items to disk, the
forward pass can always begin after the checkpoint. In this case,
this means that it can begin at log record 60.
Does the existence of the checkpoint change the possible on-disk values at the start of recovery under undo-redo logging?
Yes. Because the checkpoint forces all changed data items to disk, it is no longer possible for D1 to have a value of 100 or 70 at the start of recovery. Rather, it must either be 400 (the value forced to disk at the checkpoint) or 55 (the value written by T3 after the checkpoint).
Last updated on December 6, 2024.