NOTE: This page contains all the examples presented during the lectures, as well as all the assigned projects. Click here to go back to the main page with the course information and schedule.
email@example.com, the unique identifier for their team will be
alice_bob, where, in cases with two members, the two usernames are in ascending alphabetical order and separated by an underscore. For a one-member team, it is simply the BU login name of the sole member (e.g.,
https://github.com/Data-Mechanics/course-2016-fal-proj-zero, adding a single folder named using the group identifier (
alice_bob) that contains a single ASCII text file
members.txt. Each team member should commit a line to that text file specifying a mapping from their GitHub username to their BU username. For example, if Alice and Bob have the GitHub usernames
bobgh, respectively, then the file should look as follows:
members.txt file you add above).
import urllib.request
import json

url = "https://data.cityofboston.gov/resource/awu8-dc52.json?$limit=10"
response = urllib.request.urlopen(url).read().decode("utf-8")
print(json.dumps(json.loads(response), sort_keys=True, indent=2))
alice_bob (in accordance with the requirements specified in Project #0).
https://github.com/Data-Mechanics/course-2016-fal-proj. Note that we may commit and push a few additional fixes or updates to this repository before the deadline, so check for updates and pull them into your fork on a regular basis. Set up a MongoDB and Python environment as necessary for the project (including the installation of all dependencies). You should be able to run the
setup.js script to prepare the repository, and then to start your MongoDB server with authentication enabled and run the
alice_bob within the top-level directory of the project. All the code constituting your submission, including all of your scripts and algorithm files (i.e., for retrieval of data and for transforming data already within the repository), should be placed within this folder. Do not place data files within this folder, or submit any data sets or data files via GitHub.
dml.Algorithm base class (such that the class name matches exactly the filename). Consult
alice_bob/example.py for a working example script and algorithm.
import dml
import prov.model

class example(dml.Algorithm):
    contributor = 'alice_bob'
    reads = []
    writes = ['alice_bob.lost', 'alice_bob.found']

    @staticmethod
    def execute(trial = False):
        ...

    @staticmethod
    def provenance(doc = prov.model.ProvDocument(), startTime = None, endTime = None):
        ...
README.md file within your directory (along with any documentation you may write in that file).
dml.Algorithm subclasses and defining their
writes fields, as well as their
execute methods). Any authentication credentials your scripts use should be included in the file
auth.json (such that the file conforms to the
auth.schema.json schema). All your scripts should retrieve the credential information they need from your copy of the file by using the existing
dml functionality for doing so.
auth.json file and do not include hard-coded authentication credentials in your code. We already added it to the
.gitignore file to ensure you do not accidentally submit the credentials file. The course staff will use their own authentication credentials when running your code. Your
README.md file should list any idiosyncratic details associated with the services and/or credentials needed to run your scripts.
README.md file how to obtain and run those tools).
provenance methods). Each algorithm should generate a single provenance document when its
provenance() method is invoked, and should insert that document into the repository using
alice_bob in accordance with Project #0. Teams consisting of up to four people are permitted for this project.
README.md file within your directory (you may only need to update your existing file).
README.md file how to obtain and run those tools). As in Project #1, your algorithms should be implemented within scripts that extend the
dml.Algorithm base class, should follow reasonable modularity and encapsulation practices, and should perform logically related operations.
trial parameter of the
execute() method is set to
True. In trial mode, the algorithm should complete its execution very quickly (in at most a few seconds) by operating on a very small portion of the input data set(s). However, it should still run through most (or, ideally, all) of the code paths in the algorithm definition when it does so. This makes it possible to test the algorithm easily without running it on the entire data set.
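The trial-mode behavior described above can be sketched as follows. This is only an illustration under stated assumptions: the function `transform`, the constant `TRIAL_LIMIT`, and the sample records are hypothetical, and a plain function stands in for the `execute()` method of a `dml.Algorithm` subclass so that the sketch runs without the course library or a MongoDB server.

```python
import itertools

# Hypothetical cap on the number of records examined in trial mode.
TRIAL_LIMIT = 10

def transform(records, trial=False):
    """Sum the 'value' field of each record.

    In trial mode only a small prefix of the input is read, but the
    same loop (and therefore the same code path) is still exercised.
    """
    if trial:
        records = itertools.islice(records, TRIAL_LIMIT)
    total = 0
    count = 0
    for record in records:
        total += record["value"]
        count += 1
    return {"count": count, "total": total}

# Hypothetical sample data standing in for a data set in the repository.
data = [{"value": i} for i in range(1000)]
print(transform(data, trial=True))
print(transform(data))
```

Limiting the input at the point where it is read, rather than branching around the processing logic, is one way to keep the trial run fast while still covering the code that the full run uses.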
README.md file within your repository, though HTML or PDF are also acceptable). It is expected that the report should come out to at least 3-5 pages (if printed in a 12-point font on 8.5 by 11 in. sheets), but there's no upper limit on length.
"World's population increasingly urban with more than half living in urban areas". https://www.un.org/development/desa/en/news/population/world-urbanization-prospects.html
Luís M. A. Bettencourt, José Lobo, Dirk Helbing, Christian Kühnert, and Geoffrey B. West. "Growth, innovation, scaling, and the pace of life in cities". Proceedings of the National Academy of Sciences of the United States of America 2007;104(17):7301-7306. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1852329/
Robert Albright, Alan Demers, Johannes Gehrke, Nitin Gupta, Hooyeon Lee, Rick Keilty, Gregory Sadowski, Ben Sowell, and Walker White. "SGL: A Scalable Language for Data-driven Games". Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data 2008. http://www.cs.cornell.edu/~sowell/2008-sigmod-games-demo.pdf
Walker White, Benjamin Sowell, Johannes Gehrke, and Alan Demers. "Declarative Processing for Computer Games". Proceedings of the 2008 ACM SIGGRAPH Symposium on Video Games 2008. http://www.cs.cornell.edu/~sowell/2008-sandbox-declarative.pdf
W3C Working Group. "PROV Model Primer". https://www.w3.org/TR/prov-primer/
Robert Ikeda and Jennifer Widom. "Data Lineage: A Survey". http://ilpubs.stanford.edu:8090/918/1/lin_final.pdf
Y. Cui and J. Widom. "Lineage Tracing for General Data Warehouse Transformations". The VLDB Journal 2003;12(1):41-58. http://ilpubs.stanford.edu:8090/525/1/2001-5.pdf
Gerd Gigerenzer. "Mindless statistics". The Journal of Socio-Economics 2004;33(5):587-606. http://www.unh.edu/halelab/BIOL933/papers/2004_Gigerenzer_JSE.pdf
Nihar B. Shah and Dengyong Zhou. "Double or Nothing: Multiplicative Incentive Mechanisms for Crowdsourcing". CoRR 2014. http://www.eecs.berkeley.edu/~nihar/publications/double_or_nothing.pdf
YYYY is the year,
SSSS is the semester (
two, and so on.
course-YYYY-SSS-proj-zero) will be a public repository so that those who are not members of the Data-Mechanics organization can see and fork it in the steps below, as that will be how the course staff obtain everyone's GitHub usernames.
BUlogin1_BUlogin2_BUlogin3 (depending on the number of members), where the login names
BUloginN are the official Boston University login names for the members, and they are ordered in ascending alphabetical order and separated by underscores. All changes constituting work on the project should be made within this subdirectory and nowhere else, unless otherwise specified in the posted project instructions.
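As a minimal sketch of the naming rule above, the following helper builds a group identifier from a list of BU login names. The function name `group_identifier` and the sample logins are assumptions for illustration and are not part of the course tooling. The sketch also assumes login names are lowercase, so that Python's default string ordering matches alphabetical order.

```python
def group_identifier(logins):
    """Join BU login names in ascending alphabetical order with underscores."""
    if not logins:
        raise ValueError("a group needs at least one member")
    return "_".join(sorted(logins))

# Hypothetical login names; the order given does not matter.
print(group_identifier(["bob", "alice"]))
print(group_identifier(["alice"]))
```

For a one-member group the result is simply that member's login name, which agrees with the rule stated for one-member teams.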
master branch will represent the group's submission. At some point before the project deadline, the group must submit a pull request to officially submit their work. Only the changes that were committed before the pull request is made will be accepted as submitted.
GEO2D index using PyMongo: http://api.mongodb.org/python/current/examples/geo.html, and
prov library for Python on Windows, you may have some trouble installing the
lxml package. Try obtaining the precompiled package here, instead, and installing it using
pip install *.whl;
optimize.linprog module for solving linear optimization problems, and the library is relatively straightforward to install: