MF 602

Assignment 10: Databases and SQL

Objective and Overview

Objective: Develop experience using databases and the structured query language.

Preliminaries

In your work on this assignment, make sure to abide by the collaboration policies of the course.

For each problem in this problem set, we will be writing or evaluating some Python code. You are encouraged to use the Spyder IDE which will be discussed/presented in class, but you are welcome to use another IDE if you choose.

If you have questions while working on this assignment, please post them on Piazza! This is the best way to get a quick response from your classmates and the course staff.

Important Guidelines: Comments and Docstrings

  • Refer to the class Coding Standards for important style guidelines. The grader will be awarding/deducting points for writing code that comforms to these standards.

  • Every program file must begin with a descriptive header comment that includes your name, username/BU email, and a brief description of the work contained in the file.

  • Every function must include a descriptive docstring that explains what the function does and identifies/defines each of the parameters to the function.

Task 1: Writing SQL Queries

100 points; individual-only

In this task, you will write several progressively more challenging SQL queries to obtain results from the sqlite3 database contained in the file portfolio.db.

You must save these as python string variables in the python file a10task1.py (see sample in question 0).

You must save each query with the specified variable name so that the autograder script can find it. You will NOT submit your portfolio.db file; rather, the autograder will use a different version of this file that contains the same database tables, but different records. If your queries are created corrected, they will work on any data that follows the specified database schema.

To simplify running the queries, we will use the pandas library, which contains a helper function to execute an SQL query and return the results as a pd.DataFrame object. An example is shown in the starter code file.

Begin by downloading the two files: portfolio.db and a10task1.py. Do this entire task in the file a10task1.py, which contains some import statements and a helper function run_query.

  1. To begin, you must discover the names of the tables and the field names for each table. Here are four initial queries to try. You will notice these in the starter code file:

        example_0a = '''SELECT name FROM sqlite_master'''
        example_0b = '''pragma table_info(clients)'''
        example_0c = '''pragma table_info(trades)'''
        example_0d = '''pragma table_info(price_history)'''
    
        ################################################################################
        if __name__ == '__main__':
    
            # Ask Pandas to run a query and return the resulting data set as a pd.DataFrame object:
            print(run_query(example_0a))
    

    Notice that the only statements in the global scope are variable declarations and comments. Do not print anything in the global scope, as it will cause the autograder to fail.

  2. Write a SELECT query to retrieve all records from the price_history table for the date of 2020-12-31, in order by security.
    Save the query as a string in the variable named sql_01.

  3. Write a SELECT query to retrieve all records from the trades table for the client Angela Merkel, in order by trade_date. Save the query as a string in the variable named sql_02.

    Hints:

    • You may use a single table for this query, and specify the client_id in the WHERE clause.
  4. Write a SELECT query to retrieve all records from the trades table corresponding to trades made during the calendar year 2018. Sort the records by trade_date. Save the query as a string in the variable named sql_03.

    Hints:

    • You may use a single table for this query, and specify the trade_date in the WHERE clause.

    • Use the boolean AND to join multiple conditions together.

  5. Write a SELECT query to produce a report of all securities and the number of trades for each, in order by security. Save the query as a string in the variable named sql_04.

    Hints:

    • Use the COUNT aggregate function and an appropriate GROUP BY clause.
  6. Write a SELECT query to produce a report showing all clients, and the number of trades for each client. Your results must include the client’s first and last names, as well as a count. Sort the results in descending orer by the number of trades. Save the query as a string in the variable named sql_05.

    Hints:

    • Use the INNER JOIN clause to join two tables together into one result set. Specify the column from each table on which to join in the ON clause.

    • Use the COUNT aggregate function and an appropriate GROUP BY clause.

  7. Write a SELECT query to produce a report showing all trades of the security CSCO. Your result set must include the date of the trade, the quantity traded, and the client’s first and last names. Sort the results by the date of the trade. Save the query as a string in the variable named sql_06.

    Hints:

    • Use the INNER JOIN clause to join two tables together into one result set. Specify the column from each table on which to join in the ON clause.
  8. Write a SELECT query to produce a report showing all trades by all clients for the calendar year of 2019. Your result set must include the date of the trade, the client’s first and last names, the security traded, and the quantity of the trade. Sort the results by the date of the trade. Save the query as a string in the variable named sql_07.

  9. Write a SELECT query to produce a report showing all trades (for all dates) by Angela Merkel, including the date of the trade, the security traded, the quantity, and the price (we assume a single price per date, i.e., the adjusted closing price).

    Your result set must include the date of the trade, the security traded, and the quantity of the trade. We do not need to include the client’s name since we will be specifying a single client.

    Sort the results in ascending order by the date of the trade. Save the query as a string in the variable named sql_08.

    Hints:

    • You will need to join together 2 tables: trades and price_history.

    • Use the WHERE clause to specify that results should be for a single client only.

    For example, the report of all trades by Hermione Granger might look like this:

        trade_date  security    quantity    price
    0   2002-13-02  AMZN        100.0       24.110001
    1   2004-10-11  NKE         300.0       4.588508
    2   2008-08-05  WMT         400.0       45.496872
    3   2009-02-05  DIS         300.0       16.111172
    4   2010-11-05  VZ          100.0       21.618628
    5   2011-01-26  UNH         200.0       34.745735
    6   2011-05-17  PFE         400.0       15.394896
    7   2011-13-01  UTX         250.0       63.199425
    8   2013-01-24  DIS         250.0       35.142239
    9   2013-05-16  BA          250.0       81.244209
    10  2013-05-20  AAPL        250.0       56.105534
    11  2013-09-30  UTX         300.0       92.840584
    13  2014-01-31  MCD         400.0       78.082794
    13  2014-09-10  AXP         200.0       80.801811
    14  2015-04-07  PFE         50.0        28.807238
    15  2016-05-25  AXP         200.0       61.259422
    16  2020-02-10  UTX         200.0       155.808517
    

    * NOTE: The sample output was produced with a *different database, so you will not match these exact results. ****

  10. Write a SELECT query to produce a report showing the value of all securities in
    Angela Merkel‘s portfolio as of 2020-12-31.

    Your result set must include which security, the total quantity owned, the price per share, and the value of the position (quantity times price). We do not need to include the client’s name since we will be specifying a single client.

    Sort the results in descending order by the value of the position. Save the query as a string in the variable named sql_09.

    Hints:

    • You will need to join together 2 tables: trades and price_history.

    • Use the WHERE clause to specify that results should be for a single client only, and to specify date.

    • Use the aggregate function SUM and an appropriate GROUP BY clause to calculate the total quantity for each security.

    For example, a partial report of all positions held by Hermione Granger on 2020-03-31 might look like this:

    security    quantity    price   value
    0   AMZN    100.0   1949.719971 194971.997070
    1   UTX     750.0   94.330002   70747.501373
    2   MCD     400.0   165.350006  66140.002441
    3   AAPL    250.0   254.289993  63572.498322
    4   DIS     550.0   96.599998   53139.999161
    5   WMT     450.0   113.620003  51139.001336
    6   UNH     200.0   249.380005  49876.000977
    7   BA      250.0   149.139999  37284.999847
    8   AXP     400.0   85.137077   34054.830933
    9   NKE     300.0   82.739998   24821.999359
    10  PFE     450.0   32.639999   14687.999725
    11  VZ      100.0   53.158306   5315.830613
    

    * NOTE: The sample output was produced with a different database, so you will not match these exact results. *

    Your results will be different – for Angela Merkel, and for 2020-12-31 – so you will need to check them for correctness!

  11. Write a SELECT query to produce a report showing entire portfolio, with cost basis and current value for all positions (i.e., individual purchases) held in
    of Angela Merkel‘s portfolio as of 2020-12-31.

    The result set must include each trade, the security, the quantity bought in that trade, the price per share on the date of purchase, and the cost of the position (quantity times price), the current price per share (as of 2020-03-31), and the current value of the position (quantity time price).

    We do not need to include the client’s name since we will be specifying a single client.

    Sort the results in ascending order by the security’s name, and by the date on which the position was acquired. Save the query as a string in the variable named sql_10.

    Hints:

    • You will need to do 2 JOINs, to join together 2 tables:

      In the first JOIN, you will connect the each trade to the price_history, to find the historical price as of the trade date.

      In the second JOIN, you will connect the trade to the price_history as of 2020-12-31, to find the ‘current price’.

    • Give each JOIN its own alias, i.e.,

          SELECT [table_a.field1, table_z.field2, ...]
          FROM table_a
          INNER JOIN table_b as "table_z"
          ON [.... ]
      

      When you do this, you can refer to fields by the alias’ table name (i.e., table_z in the example above). This will help you prevent ambiguity when refering to the prices.

    • Use the WHERE clause to specify that results should be for a single client only, and to specify date.

    For example, a partial report of the positions held by Hermione Granger on 2020-03-31 might look like this:

        trade_date  security    quantity    purch. price    cost            current price   value
    0   2005-05-23  AMZN        100.0       35.680000       3568.000031     1949.719971     194971.997070
    1   2011-07-13  AXP         100.0       46.076759       4607.675934     85.137077       8513.707733
    2   2008-02-08  BA          50.0        57.977184       2898.859215     149.139999      7456.999969
    3   2006-02-27  CSCO        50.0        15.313878       765.693903      38.940796       1947.039795
    4   2009-09-16  DIS         250.0       24.429390       6107.347488     96.599998       24149.999619
    5   2016-06-21  DIS         200.0       93.231056       18646.211343    96.599998       19319.999695
    6   2015-02-10  GOOG        200.0       535.469849      107093.969727   1162.810059     232562.011719
    7   2005-10-17  KO          200.0       11.136735       2225.346947     44.250000       8850.000000
    8   2005-13-06  KO          200.0       11.394858       2278.971672     44.250000       8850.000000
    9   2009-08-10  KO          50.0        16.218536       810.926819      44.250000       2213.500000
    10  2011-03-24  KO          50.0        23.625408       1181.270409     44.250000       2213.500000
    11  2018-07-10  MCD         250.0       153.943283      38485.820770    165.350006      41337.501526
    13  2004-11-08  MRK         250.0       14.770099       3692.524672     76.940002       19235.000610
    13  2005-11-07  MRK         250.0       17.225214       4306.303501     76.940002       19235.000610
    14  2011-02-14  PFE         300.0       13.739881       4131.964169     32.639999       9791.999817
    15  2015-07-24  PFE         300.0       28.810135       8643.040466     32.639999       9791.999817
    16  2003-01-13  PG          50.0        25.874138       1393.706417     109.271553      5463.577652
    17  2009-06-24  SO          400.0       18.685911       7474.364471     54.139999       21655.999756
    18  2009-11-03  VZ          400.0       16.533220       6613.288116     53.158306       21363.322449
    19  2011-03-25  VZ          400.0       24.430988       9772.395325     53.158306       21363.322449
    20  2017-13-07  WMT         300.0       92.251381       27675.414276    113.620003      34086.000824
    

    * NOTE: The sample output was produced with a *different database, so you will not match these exact results. ****

    Your results will be different – for Angela Merkel, and for 2020-12-31 – so you will need to check them for correctness!


Submitting Your Work

20 points; will be assignmed by code review

Use the link to GradeScope (left) to submit your work.

Be sure to name your files correctly!

Under the heading for Assignment 10, attach these 2 required file: a10task1.py and portfolio.db.

When you upload the files, the autograder will test your program.

Notes: