Imagine that a bank is using a relational database for data about its customers and their accounts.
Customer(number CHAR(8) PRIMARY KEY, name VARCHAR(20), address VARCHAR(30)); Account(number CHAR(10) PRIMARY KEY, branch VARCHAR(20), balance FLOAT); Owns(account_number CHAR(10), customer_number CHAR(8), PRIMARY KEY (account_number, customer_number), FOREIGN KEY (account_number) REFERENCES Account, FOREIGN KEY (customer_number) REFERENCES Customer );
What is the purpose of the Owns
table? Would it be possible to eliminate
this table and include its data in one of the other two tables?
Owns
is to capture relationships between
customers and the accounts that they own.
If we assume that these “owns” relationships are many-to-many, it
would not be possible to eliminate the Owns
table. That’s
because the relational model doesn’t allow for multi-valued
attributes. As a result:
you can’t capture the “owns” relationships in Customer
, because
a customer can have more than one account.
you can’t capture these relationships in Account
, because
an account can be owned by more than one customer.
We could also represent the bank’s data in XML, which would make it easier to exchange data among applications and to integrate it with data from other sources. (There would also be some disadvantages, including the fact that storing data in XML is typically less efficient – both in the amount of storage used and in the speed with which queries can be executed.)
You can see a small instance of the XML version of this database here:
sample bank database in XML
Note that it includes elements for each type of entity in the database:
account
elements for the data found in the Account
table
customer
elements for the data found in the Customer
table.
It also uses nested child elements for some of the information
about accounts (e.g., balance
) and customers (e.g., name
and
address
).
How are the primary keys of Customer
and Account
captured?
ID
in the start tags of the elements:
account_num
in account
elements
customer_num
in customer
elements.
If we tell the XML DBMS that these attributes are of type ID
, it
will ensure that the values assigned to these attributes are
unique.
The XML version of the database does not include separate
elements for the relationships in the Owns
table. How are those
relationships captured?
Owns
are cpatured using attributes of
type IDREFS
in the start tags of the elements:
the owners
attribute in each account
element, which stores
the customer numbers of all of the account’s owners
the owns
attribute in each customer
element, which stores
the account numbers of all their accounts.
Recall that:
attributes of type IDREF
must have a single value that
comes from an ID
attribute elsewhere in the document
attributes of type IDREFS
must consist of a list of ID
values
that appear elsewhere in the document.
These types of attributes function like foreign keys in a relational database.
Note that XML databases are able to capture many-to-many relationships without needing a separate type of element for the relationships themselves!
For the rest of this lab, we’ll write queries for the XML version of the bank database discussed above.
Testing your queries
You can test the queries that you write by taking the following steps:
Download the following XML file to a known directory:
bank.xml
Note: Depending on your browser, you may need to right-click the link and click Save Link As... (or similar).
Open the file in BaseX. (For details about how to use BaseX, see the last problem of Problem Set 2. Remember that if you can’t get BaseX to work on your own machine, you can use it on the virtual desktop.)
Test your queries by entering them in the Editor window in BaseX.
Recall that XPath models an XML document as a tree in which the nodes represent elements and attributes, and that it allows us to access collections of nodes from the tree.
Write an XPath expression to obtain the account numbers of all accounts with a balance that is greater than 400.
/bank/account[balance > 400]/@account_num
or
//account[balance > 400]/@account_num
or
//account/@account_num[../balance > 400]
Notes:
An XPath expression that begins with /
starts at
the root element of the document, whereas an expression that
begins with //
will find elements wherever they occur in the
document.
We use an @
symbol before attribute names.
Predicates are surrounded by square brackets, and they are applied to the elements or attributes that are selected by the path expression that precedes them.
The ..
operator takes up one level in the document tree.
This allows us to access “siblings” of the components selected
by the preceding path expression. In this case, ../balance
gives us balance
elements that are children of the same
elements as the selected account_num
attributes.
Write an XQuery FLWOR expression to obtain the same account numbers as the XPath expression from the previous problem.
for $a in //account where $a/balance > 400 return $a/@account_num
or
for $a in //account[balance > 400] return $a/@account_num
Important notes:
The order of the clauses in each query/subquery must
follow the FLWOR acronym: a for
clause (F), followed
optionally by a let
clause (L), followed optionally by a
where
clause (W), followed optionally by an order by
clause (O), followed by a return
clause (R). You should
not put the clauses in a different order – e.g., for
,
followed by let
, followed by another for
, etc. BaseX may
allow you to do this, but it is never necessary to do so, and
such a query will often fail to run to completion in the
Autograder.
In your work for this class, the only place that you may use a
subquery (i.e., a nested FLWOR expression) is in the return
clause of an outer FLWOR expression. You should NOT have a
nested FLWOR expression in a for
clause or a let
clause,
because such a query will often fail to run to completion in the
Autograder.
Write an XQuery FLWOR expression to produce the join of the
account
and customer
elements. The result of the query should
consist of new elements of type cust_acct
, each of which contains a
nested customer
element followed by an associated nested
account
element. Order the results by customer number.
for $a in //account, $c in //customer where contains($a/@owners, $c/@customer_num) order by $c/@customer_num return <cust_acct>{ $c, $a }</cust_acct>
or
for $a in //account, $c in //customer where contains($c/@owns, $a/@account_num) order by $c/@customer_num return <cust_acct>{ $c, $a }</cust_acct>
or
for $a in //account, $c in //customer[contains(@owns, $a/@account_num) order by $c/@customer_num return <cust_acct>{ $c, $a }</cust_acct>
or
for $c in //customer, $a in //account[contains(@owners, $c/@customer_num)] order by $c/@customer_num return <cust_acct>{ $c, $a }</cust_acct>
Notes:
We use the contains
function to determine if the current
account $a
is owned by the current customer $c
. This will
the case:
if the value of $c
‘s customer_num
attribute is
contained in the list of values assigned to $a
‘s owners
attribute
or, equivalently,
if the value of $a
‘s account_num
attribute is
contained in the list of values assigned to $c
‘s owns
attribute.
Because we are creating new elements, we need to use curly braces around the contents of the new start and end tags. Doing so tells XQuery to evaluate what is inside of the curly braces, rather than treating it as literal text.
Write an XQuery FLWOR expression to create new elements
of type customer
that include:
name
child element for the name of the customeraccount
elements should have as
their value the location of the account’s branch, followed
by a hyphen, followed by the account’s balance.For example:
<customer> <name>Jose Delgado</name> <account>Burlington, MA - 7000</account> <account>Burlington, MA - 300</account> </customer>
for $c in //customer return <customer> { $c/name, for $a in //account where contains($a/@owners, $c/@customer_num) return <account> { string($a/branch), "-", string($a/balance) } </account> } </customer>
or
for $c in //customer let $c_accounts := //account[contains(@owners, $c/@customer_num)] return <customer> { $c/name, for $a in $c_accounts return <account> { string($a/branch), "-", string($a/balance) } </account> } </customer>
Notes:
We need a subquery so that we can group together all of a
customer’s accounts into a single new customer
element.
This is in contrast to the prior query, in which customers
with multiple accounts ended up with multiple <cust_acct>
elements.
We use the built-in string()
function to remove the tags for
the branch
and balance
elements. This allows us to use just
the values of those elements as the basis of the new account
elements. This type of transformation can be useful when
exchanging data or integrating data from heterogeneous sources.
We don’t to include any spaces around the hyphen, because XQuery will automatically add spaces between string values that are separated by a comma.
If a customer has no accounts, they will still be included in the results. Using a subquery as we’ve done above allows us to include “unmatched” entities – which is comparable to what a left outer join does in relational queries!
If you wanted to exclude customers with no accounts, you could modify the second version of the query as follows:
for $c in //customer let $c_accounts := //account[contains(@owners, $c/@customer_num)] where count($c_accounts) > 0 return <customer> { $c/name, for $a in $c_accounts return <account> { string($a/branch), "-", string($a/balance) } </account> } </customer>
Note that conditions that involve aggregate functions go in the
where
clause. There isn’t a separate having
clause for these
conditions like there is in SQL.
Write an XQuery FLWOR expression to create new branch
elements
for each branch of the bank. These branch
elements should have
nested child elements for the location of the branch and the total
balance of all of the accounts at that branch. For example:
<branch> <location>Burlington, MA</location> <total_balance>8300</total_balance> </branch>
Hint: There is an aggregate function called sum()
that can be
used to add up a set of numeric element or attribute values.
for $b in distinct-values(//branch) let $balances := //account[branch = $b]/balance return <branch> { <location>{ $b }</location>, <total_balance>{ sum($balances) }</total_balance> } </branch>
Notes:
We use distinct-values()
to avoid considering a given branch
more than once. Note that distinct-values()
gives us the
distinct values of each branch
element. This means that
$b
will contain strings throughout the FLWOR expression.
If we weren’t using distinct-values()
here, we’d need to use
string()
to properly extract the value inside the branch
elements so that we could use it as the value of the new
location elements.
We need a let
clause so that we can assign the entire set of
balance
elements for a given branch to the variable
$balances
. This is what allows us to then use the aggregate
function add
to compute the sum of those balances.
If we had instead tried to obtain the balances in the for
clause, we would have assigned only one balance at a time to
the variable, and this wouldn’t have allowed us to compute
the total balance.
It isn’t necessary to explicitly extract the values of the
balance
elements. The sum
function takes care of
extracting the values for us.
Write an XQuery FLWOR expression to create elements of type
customer_summary
that include, for each customer, child elements
for the name of the customer, the address of the customer, and the
number of accounts that the customer owns. For example:
<customer_summary> <name>Jose Delgado</name> <address>Zero Longhorn Ave., Belmont, MA</address> <num_accounts>2</num_accounts> </customer_summary>
for $c in //customer let $accounts := //account[contains(@owners, $c/@customer_num)] return <customer_summary> { $c/name, $c/address, <num_accounts>{ count($accounts) }</num_accounts> } </customer_summary>
If you have extra time, feel free to work on PS 2: Part II and ask the TAs for help.
Last updated on October 11, 2024.