Assignment 10: Databases and SQL
Objective and Overview
Objective: Develop experience using databases and the structured query language.
Preliminaries
In your work on this assignment, make sure to abide by the collaboration policies of the course.
For each problem in this problem set, we will be writing or evaluating some Python code. You are encouraged to use the Spyder IDE which will be discussed/presented in class, but you are welcome to use another IDE if you choose.
If you have questions while working on this assignment, please post them on Piazza! This is the best way to get a quick response from your classmates and the course staff.
Important Guidelines: Comments and Docstrings
-
Refer to the class Coding Standards for important style guidelines. The grader will be awarding/deducting points for writing code that comforms to these standards.
-
Every program file must begin with a descriptive header comment that includes your name, username/BU email, and a brief description of the work contained in the file.
-
Every function must include a descriptive docstring that explains what the function does and identifies/defines each of the parameters to the function.
Task 1: Writing SQL Queries
100 points; individual-only
In this task, you will write several progressively more challenging SQL queries to
obtain results from the sqlite3
database contained in the file
portfolio.db
.
You must save these as python string variables in the python file
a10task1.py
(see sample in question 0).
You must save each query with the specified variable name so that the autograder script
can find it. You will NOT submit your portfolio.db
file; rather, the autograder will
use a different version of this file that contains the same database tables, but
different records. If your queries are created corrected, they will work on any data
that follows the specified database schema.
To simplify running the queries, we will use the pandas
library, which contains a
helper function to execute an SQL query and return the results as a pd.DataFrame
object. An example is shown in the starter code file.
Begin by downloading the two files: portfolio.db
and
a10task1.py
. Do this entire task in the file a10task1.py
,
which contains some import statements and a helper function run_query
.
-
To begin, you must discover the names of the tables and the field names for each table. Here are four initial queries to try. You will notice these in the starter code file:
example_0a = '''SELECT name FROM sqlite_master''' example_0b = '''pragma table_info(clients)''' example_0c = '''pragma table_info(trades)''' example_0d = '''pragma table_info(price_history)''' ################################################################################ if __name__ == '__main__': # Ask Pandas to run a query and return the resulting data set as a pd.DataFrame object: print(run_query(example_0a))
Notice that the only statements in the global scope are variable declarations and comments. Do not print anything in the global scope, as it will cause the autograder to fail.
-
Write a
SELECT
query to retrieve all records from theprice_history
table for thedate
of2020-12-31
, in order bysecurity
.
Save the query as a string in the variable namedsql_01
. -
Write a
SELECT
query to retrieve all records from thetrades
table for the clientAngela Merkel
, in order bytrade_date
. Save the query as a string in the variable namedsql_02
.Hints:
- You may use a single table for this query, and specify the
client_id
in theWHERE
clause.
- You may use a single table for this query, and specify the
-
Write a
SELECT
query to retrieve all records from thetrades
table corresponding to trades made during the calendar year 2018. Sort the records bytrade_date
. Save the query as a string in the variable namedsql_03
.Hints:
-
You may use a single table for this query, and specify the
trade_date
in theWHERE
clause. -
Use the boolean
AND
to join multiple conditions together.
-
-
Write a
SELECT
query to produce a report of all securities and the number of trades for each, in order bysecurity
. Save the query as a string in the variable namedsql_04
.Hints:
- Use the
COUNT
aggregate function and an appropriateGROUP BY
clause.
- Use the
-
Write a
SELECT
query to produce a report showing all clients, and the number of trades for each client. Your results must include the client’s first and last names, as well as a count. Sort the results in descending orer by the number of trades. Save the query as a string in the variable namedsql_05
.Hints:
-
Use the
INNER JOIN
clause to join two tables together into one result set. Specify the column from each table on which to join in theON
clause. -
Use the
COUNT
aggregate function and an appropriateGROUP BY
clause.
-
-
Write a
SELECT
query to produce a report showing all trades of the securityCSCO
. Your result set must include the date of the trade, the quantity traded, and the client’s first and last names. Sort the results by the date of the trade. Save the query as a string in the variable namedsql_06
.Hints:
- Use the
INNER JOIN
clause to join two tables together into one result set. Specify the column from each table on which to join in theON
clause.
- Use the
-
Write a
SELECT
query to produce a report showing all trades by all clients for the calendar year of 2019. Your result set must include the date of the trade, the client’s first and last names, the security traded, and the quantity of the trade. Sort the results by the date of the trade. Save the query as a string in the variable namedsql_07
. -
Write a
SELECT
query to produce a report showing all trades (for all dates) by Angela Merkel, including the date of the trade, the security traded, the quantity, and the price (we assume a single price per date, i.e., the adjusted closing price).Your result set must include the date of the trade, the security traded, and the quantity of the trade. We do not need to include the client’s name since we will be specifying a single client.
Sort the results in ascending order by the date of the trade. Save the query as a string in the variable named
sql_08
.Hints:
-
You will need to join together 2 tables:
trades
andprice_history
. -
Use the
WHERE
clause to specify that results should be for a single client only.
For example, the report of all trades by Hermione Granger might look like this:
trade_date security quantity price 0 2002-13-02 AMZN 100.0 24.110001 1 2004-10-11 NKE 300.0 4.588508 2 2008-08-05 WMT 400.0 45.496872 3 2009-02-05 DIS 300.0 16.111172 4 2010-11-05 VZ 100.0 21.618628 5 2011-01-26 UNH 200.0 34.745735 6 2011-05-17 PFE 400.0 15.394896 7 2011-13-01 UTX 250.0 63.199425 8 2013-01-24 DIS 250.0 35.142239 9 2013-05-16 BA 250.0 81.244209 10 2013-05-20 AAPL 250.0 56.105534 11 2013-09-30 UTX 300.0 92.840584 13 2014-01-31 MCD 400.0 78.082794 13 2014-09-10 AXP 200.0 80.801811 14 2015-04-07 PFE 50.0 28.807238 15 2016-05-25 AXP 200.0 61.259422 16 2020-02-10 UTX 200.0 155.808517
* NOTE: The sample output was produced with a *different database, so you will not match these exact results. ****
-
-
Write a
SELECT
query to produce a report showing the value of all securities in
Angela Merkel‘s portfolio as of 2020-12-31.Your result set must include which security, the total quantity owned, the price per share, and the value of the position (quantity times price). We do not need to include the client’s name since we will be specifying a single client.
Sort the results in descending order by the value of the position. Save the query as a string in the variable named
sql_09
.Hints:
-
You will need to join together 2 tables:
trades
andprice_history
. -
Use the
WHERE
clause to specify that results should be for a single client only, and to specify date. -
Use the aggregate function
SUM
and an appropriateGROUP BY
clause to calculate the totalquantity
for eachsecurity
.
For example, a partial report of all positions held by Hermione Granger on 2020-03-31 might look like this:
security quantity price value 0 AMZN 100.0 1949.719971 194971.997070 1 UTX 750.0 94.330002 70747.501373 2 MCD 400.0 165.350006 66140.002441 3 AAPL 250.0 254.289993 63572.498322 4 DIS 550.0 96.599998 53139.999161 5 WMT 450.0 113.620003 51139.001336 6 UNH 200.0 249.380005 49876.000977 7 BA 250.0 149.139999 37284.999847 8 AXP 400.0 85.137077 34054.830933 9 NKE 300.0 82.739998 24821.999359 10 PFE 450.0 32.639999 14687.999725 11 VZ 100.0 53.158306 5315.830613
* NOTE: The sample output was produced with a different database, so you will not match these exact results. *
Your results will be different – for Angela Merkel, and for 2020-12-31 – so you will need to check them for correctness!
-
-
Write a
SELECT
query to produce a report showing entire portfolio, with cost basis and current value for all positions (i.e., individual purchases) held in
of Angela Merkel‘s portfolio as of2020-12-31
.The result set must include each trade, the security, the quantity bought in that trade, the price per share on the date of purchase, and the cost of the position (quantity times price), the current price per share (as of 2020-03-31), and the current value of the position (quantity time price).
We do not need to include the client’s name since we will be specifying a single client.
Sort the results in ascending order by the security’s name, and by the date on which the position was acquired. Save the query as a string in the variable named
sql_10
.Hints:
-
You will need to do 2
JOIN
s, to join together 2 tables:In the first
JOIN
, you will connect the each trade to theprice_history
, to find the historical price as of the trade date.In the second
JOIN
, you will connect the trade to theprice_history
as of2020-12-31
, to find the ‘current price’. -
Give each
JOIN
its own alias, i.e.,SELECT [table_a.field1, table_z.field2, ...] FROM table_a INNER JOIN table_b as "table_z" ON [.... ]
When you do this, you can refer to fields by the alias’ table name (i.e.,
table_z
in the example above). This will help you prevent ambiguity when refering to the prices. -
Use the
WHERE
clause to specify that results should be for a single client only, and to specify date.
For example, a partial report of the positions held by Hermione Granger on 2020-03-31 might look like this:
trade_date security quantity purch. price cost current price value 0 2005-05-23 AMZN 100.0 35.680000 3568.000031 1949.719971 194971.997070 1 2011-07-13 AXP 100.0 46.076759 4607.675934 85.137077 8513.707733 2 2008-02-08 BA 50.0 57.977184 2898.859215 149.139999 7456.999969 3 2006-02-27 CSCO 50.0 15.313878 765.693903 38.940796 1947.039795 4 2009-09-16 DIS 250.0 24.429390 6107.347488 96.599998 24149.999619 5 2016-06-21 DIS 200.0 93.231056 18646.211343 96.599998 19319.999695 6 2015-02-10 GOOG 200.0 535.469849 107093.969727 1162.810059 232562.011719 7 2005-10-17 KO 200.0 11.136735 2225.346947 44.250000 8850.000000 8 2005-13-06 KO 200.0 11.394858 2278.971672 44.250000 8850.000000 9 2009-08-10 KO 50.0 16.218536 810.926819 44.250000 2213.500000 10 2011-03-24 KO 50.0 23.625408 1181.270409 44.250000 2213.500000 11 2018-07-10 MCD 250.0 153.943283 38485.820770 165.350006 41337.501526 13 2004-11-08 MRK 250.0 14.770099 3692.524672 76.940002 19235.000610 13 2005-11-07 MRK 250.0 17.225214 4306.303501 76.940002 19235.000610 14 2011-02-14 PFE 300.0 13.739881 4131.964169 32.639999 9791.999817 15 2015-07-24 PFE 300.0 28.810135 8643.040466 32.639999 9791.999817 16 2003-01-13 PG 50.0 25.874138 1393.706417 109.271553 5463.577652 17 2009-06-24 SO 400.0 18.685911 7474.364471 54.139999 21655.999756 18 2009-11-03 VZ 400.0 16.533220 6613.288116 53.158306 21363.322449 19 2011-03-25 VZ 400.0 24.430988 9772.395325 53.158306 21363.322449 20 2017-13-07 WMT 300.0 92.251381 27675.414276 113.620003 34086.000824
* NOTE: The sample output was produced with a *different database, so you will not match these exact results. ****
Your results will be different – for Angela Merkel, and for 2020-12-31 – so you will need to check them for correctness!
-
Submitting Your Work
20 points; will be assignmed by code review
Use the link to GradeScope (left) to submit your work.
Be sure to name your files correctly!
Under the heading for Assignment 10, attach these 2 required file:
a10task1.py
and portfolio.db
.
When you upload the files, the autograder will test your program.
Notes:
- You may resubmit multiple times, but only the last submission will be graded.