Project


Getting hands-on experience with database systems!

Project 0    Class Project

Comp115 Projects

Building and working with database systems has a very practical side. We use them in virtually all our activities. Part of the learning goal of this class is to know both how to use them and how to build them. To that end we have two projects: Project 0 focuses on how to use and benchmark database systems, and Project 1 focuses on understanding and building the internals of a database system.

Project 0

Due on Friday, February 17th, 11:59pm.

A common question in production is which system to use for a specific use-case. A good data engineer is able to provide such answers through experience, benchmarking, and intuition. The goal of this project is to start building these skills, starting with benchmarking.

Traditional DBMS architectures today follow two main approaches: a row-major and column-major approach. In this project we will create groups of 2 students, who will study these approaches. One of the students will work on the row-major system and the other one of the column-major system. We will use as a row-major system PostgreSQL and as a column-major system MonetDB. The goal of the project is to compare the performance of these two systems for a set of analytical queries taken from an industry-grade database systems benchmark.

To that end each group will prepare a document (about 5 pages) where the two systems will be compared for 8 different queries from the TPC-H benchmark. The students are free to select any 8 queries, and present their performance (query latency). The reported performance should be accompanied with experimental setup, any tuning done to the system, and information with regards to standard deviation. Finally, the report should discuss which of the two systems is preferable for what type of queries based on the observed results and the intuition developed throughout the experimentation with the systems.

Resources

Below we provide some resources including the default links for the database systems to use and the TPC-H benchmark. While the instructions show how to install the systems in a virtual machine (VM) you are free to use your own machine locally (in fact this is even better in terms of understanding the behavior of the systems). We offer the VM instructions to make sure that even if you cannot install the systems on your physical machine you can continue with Project 0.

Project0 Startup & Installation instructions (MonetDB and PostgreSQL)
TPC-H Benchmark
TPC-H Benchmark Spec File (go over Chapters 0, 1, and 2)
TPC-H Benchmark Data/Query Generator & DDL-Files for PostgreSQL and MonetDB

Class Project

Due at the end of the semester.

The goal of the project is to allow Comp115 students to learn more about the internals of database systems. In this assignment you will build a buffer layer that will offer the abstraction of a database file, and then on top of it you will build the index layer.

Project A: The Buffer Layer

The first part of the project implements the buffer layer that interacts with the file system to create and delete files, and subsequently open and close them and allocate new blocks when needed. The buffer layer is responsible for maintaining the most useful part of the files in memory. The files are read into the buffer one page at a time (when needed) and the pages are maintained in it and moved out only when the buffer is full.

Project B: The Heap File Layer

The second layer implements the abstraction of files. Each file consists of a number of blocks. The Heap File abstraction allows the application to access a file sequentially (or to ask for a specific block of the file) without knowing whether each (this) block resided in the buffer or on disk. Each file page stores records of information with fixed size. The heap file offers the abstraction of a sequential scan.

Class Project Documents

ProjectA

Project A (Buffer Management) Description & Startup
Project Header File/Tests/Dummy API Implementation
You can also copy the files to your local account by giving when logged in to the homework.eecs.tufts.edu machines:

cp /g/115/2017s/public_html/project/updated_code_0.1.zip .
and then:
unzip updated_code_0.1.zip

ProjectB

Project B (Heap File and File Scan) Description & Startup
Project Header Files (projectB.h and record.h) Version 2
Project Tests/Dummy API Implementation (needs the header files of projectA and projectB) Version 2