email@example.com, the unique identifier for their team will be
alice_bob, where, in cases with two members, the two usernames are in ascending alphabetical order and separated by an underscore. For a one-member team, it is simply the BU login name of the sole member (e.g.,
https://github.com/Data-Mechanics/course-2016-spr-proj-zero, adding a single folder named using the group identifier (
alice_bob) that contains a single ASCII text file
members.txt. Each team member should commit a line to that text file specifying a mapping from their GitHub username to their BU username. For example, if Alice and Bob have the GitHub usernames
bobgh, respectively, then the file should look as follows:
members.txt file you add above).
alice_bob in accordance with Project #0.
https://github.com/Data-Mechanics/course-2016-spr-proj-one. Note that we may commit and push a few additional fixes or updates to this repository before the deadline, so check for updates and pull them into your fork on a regular basis. Set up a MongoDB and Python environment as necessary for the project. You should be able to run the
setup.js script to prepare the repository, and then to start your MongoDB server with authentication enabled and run the
alice_bob within the top-level directory of the project. All your scripts for retrieval of data and for transforming data already within the repository should be placed within this folder. Note that the file name for each script should be reasonably descriptive, as it will act as the identifier for the script. The scripts should also follow reasonable modularity and encapsulation practices, and should logically perform related operations. A larger number of smaller, simpler, and more reusable scripts that each perform a small task is preferable.
README.md file within your directory (along with any documentation you may write in that file).
auth.json file, and all your scripts should retrieve the credential information from that file when it is passed to them over the command line, as seen below.
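A minimal sketch of a script that reads its credentials from a file named on the command line is shown here. The function names `load_credentials` and `main`, and the script name in the usage message, are hypothetical; the internal structure of the credentials file is not assumed, so the sketch only reports how many top-level entries it found.

```python
import json
import sys


def load_credentials(path):
    """Read credentials from a JSON file such as auth.json (hypothetical helper)."""
    with open(path, "r", encoding="utf-8") as f:
        return json.load(f)


def main(argv):
    """Expect exactly one argument: the path to the credentials file."""
    if len(argv) != 2:
        # The script name below is an assumption made for illustration.
        print("usage: python retrieve_example.py auth.json", file=sys.stderr)
        return 1
    auth = load_credentials(argv[1])
    # Never print credential values; only report that the file was read.
    print("loaded %d top-level entries from %s" % (len(auth), argv[1]))
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Run this way, the same script works unchanged with the course staff's own credentials file, since nothing about the credentials is hard-coded.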
auth.json file and do not include hard-coded authentication credentials in your code. You can use a
.gitignore file to ensure you do not accidentally submit the credentials file. The course staff will use their own authentication credentials when running your code. Your
README.md file should list any idiosyncratic details associated with the credentials needed to run your scripts.
README.md file how to obtain and run those tools).
plan.json in the directory that encodes the overall plan describing what all the scripts do. It should not have explicit time stamps, but it should be a provenance document conforming to the PROV standard that is the union of all the PROV documents that the individual scripts generate. The information in this file, together with the scripts, should be sufficient to reproduce the data obtained and generated by the scripts.
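One way to picture such a plan is as a union of per-script documents with the time attributes removed. The sketch below is an illustration only: it manipulates plain dictionaries rather than using a provenance library, and it assumes each script's document is already available in the flat PROV-JSON layout (sections such as `entity` and `activity`, each mapping a record identifier to a dictionary of attributes). The helper names, identifiers, and sample records are all hypothetical.

```python
import json

# Time attributes used by PROV-JSON on activities and on relations.
TIME_KEYS = {"prov:time", "prov:startTime", "prov:endTime"}


def strip_timestamps(doc):
    """Return a copy of a flat PROV-JSON document without time attributes."""
    cleaned = {}
    for section, records in doc.items():
        if section == "prefix":
            cleaned[section] = dict(records)
            continue
        cleaned[section] = {
            rec_id: {k: v for k, v in attrs.items() if k not in TIME_KEYS}
            for rec_id, attrs in records.items()
        }
    return cleaned


def merge_plans(docs):
    """Union several documents; blank-node ids are renumbered to avoid clashes."""
    plan = {}
    blank = 0
    for doc in docs:
        for section, records in strip_timestamps(doc).items():
            target = plan.setdefault(section, {})
            for rec_id, attrs in records.items():
                if section == "prefix":
                    target[rec_id] = attrs
                elif rec_id.startswith("_:"):
                    blank += 1
                    target["_:id%d" % blank] = attrs
                else:
                    target.setdefault(rec_id, {}).update(attrs)
    return plan


# Hypothetical documents produced by two scripts.
doc_a = {
    "prefix": {"alg": "http://example.org/alg/", "dat": "http://example.org/dat/"},
    "entity": {"dat:raw": {}},
    "activity": {"alg:fetch": {"prov:startTime": "2016-02-01T10:00:00",
                               "prov:endTime": "2016-02-01T10:05:00",
                               "prov:type": "ont:Retrieval"}},
    "wasGeneratedBy": {"_:id1": {"prov:entity": "dat:raw",
                                 "prov:activity": "alg:fetch",
                                 "prov:time": "2016-02-01T10:05:00"}},
}
doc_b = {
    "prefix": {"alg": "http://example.org/alg/", "dat": "http://example.org/dat/"},
    "entity": {"dat:raw": {}, "dat:clean": {}},
    "activity": {"alg:clean": {}},
    "wasGeneratedBy": {"_:id1": {"prov:entity": "dat:clean",
                                 "prov:activity": "alg:clean"}},
}

plan = merge_plans([doc_a, doc_b])
print(json.dumps(plan, indent=2, sort_keys=True))
```

The sketch does not handle nested bundles or records stored as lists, and it does not validate the result against the PROV standard.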
alice_bob in accordance with Project #0. Teams consisting of up to four people are permitted for this project.
README.md file within your directory (you may only need to update your existing file).
README.md file how to obtain and run those tools). If you anticipate that an algorithm or technique you employ could be applied to much larger data sets, you should try to implement it as a transformation in the relational model or the MapReduce paradigm. In most cases, the results will consist of new data sets; these should be inserted into the repository along with all the others in the usual way (and should be considered derived data sets).
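As a rough illustration of what expressing a computation in the MapReduce paradigm can look like, the sketch below runs a map step and a reduce step in memory. The function `map_reduce`, the sample records, and the field name `zip` are all hypothetical and are not taken from the project repository; a real implementation over large data would run on a system designed for that purpose.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Apply mapper to each record, group emitted values by key, then reduce."""
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


# Hypothetical data set: one record per reported incident.
incidents = [
    {"zip": "02215", "kind": "noise"},
    {"zip": "02134", "kind": "parking"},
    {"zip": "02215", "kind": "parking"},
]


def emit_zip(record):
    # Map step: emit one (key, value) pair per record.
    yield record["zip"], 1


def total(key, values):
    # Reduce step: combine all values that share a key.
    return sum(values)


counts = map_reduce(incidents, emit_zip, total)
print(counts)
```

Because the mapper sees one record at a time and the reducer sees only the values for one key, the same two functions could in principle be applied to a much larger data set without modification.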
plan.json file in your project directory that encodes the overall plan describing what all the scripts do. It should not have explicit time stamps, but it should be a provenance document conforming to the PROV standard that is the union of all the PROV documents that the individual scripts generate. As before, the information in this file, together with the scripts, should be sufficient to reproduce the data obtained and generated by the scripts.
"World's population increasingly urban with more than half living in urban areas". https://www.un.org/development/desa/en/news/population/world-urbanization-prospects.html
Luís M. A. Bettencourt, José Lobo, Dirk Helbing, Christian Kühnert, and Geoffrey B. West. "Growth, innovation, scaling, and the pace of life in cities". Proceedings of the National Academy of Sciences of the United States of America 2007;104(17):7301-7306. http://www.ncbi.nlm.nih.gov/pmc/articles/PMC1852329/
Robert Albright, Alan Demers, Johannes Gehrke, Nitin Gupta, Hooyeon Lee, Rick Keilty, Gregory Sadowski, Ben Sowell, and Walker White. "SGL: A Scalable Language for Data-driven Games". Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data 2008. http://www.cs.cornell.edu/~sowell/2008-sigmod-games-demo.pdf
Walker White, Benjamin Sowell, Johannes Gehrke, and Alan Demers. "Declarative Processing for Computer Games". Proceedings of the 2008 ACM SIGGRAPH Symposium on Video Games 2008. http://www.cs.cornell.edu/~sowell/2008-sandbox-declarative.pdf
W3C Working Group. "PROV Model Primer". https://www.w3.org/TR/prov-primer/
Robert Ikeda and Jennifer Widom. "Data Lineage: A Survey". http://ilpubs.stanford.edu:8090/918/1/lin_final.pdf
Y. Cui and J. Widom. "Lineage Tracing for General Data Warehouse Transformations". The VLDB Journal 2003;12(1):41-58. http://ilpubs.stanford.edu:8090/525/1/2001-5.pdf
Gerd Gigerenzer. "Mindless statistics". The Journal of Socio-Economics 2004;33(5):587-606. http://www.unh.edu/halelab/BIOL933/papers/2004_Gigerenzer_JSE.pdf
Nihar B. Shah and Dengyong Zhou. "Double or Nothing: Multiplicative Incentive Mechanisms for Crowdsourcing". CoRR 2014. http://www.eecs.berkeley.edu/~nihar/publications/double_or_nothing.pdf
YYYY is the year,
SSSS is the semester (
two, and so on.
course-YYYY-SSS-proj-zero) will be a public repository so that those who are not members of the Data-Mechanics organization can see and fork it in the steps below, as that will be how the course staff obtain everyone's GitHub usernames.
BUlogin1_BUlogin2_BUlogin3 (depending on the number of members), where the login names
BUloginN are the official Boston University login names for the members, and they are ordered in ascending alphabetical order and separated by underscores. All changes constituting work on the project should be made within this subdirectory and nowhere else, unless otherwise specified in the posted project instructions.
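The naming rule can be sketched in a few lines. The helper name `team_identifier` is hypothetical, and the sketch assumes login names are lowercase, so that Python's default string ordering matches ascending alphabetical order.

```python
def team_identifier(bu_logins):
    """Build a team identifier from BU login names (hypothetical helper)."""
    if not bu_logins:
        raise ValueError("a team needs at least one member")
    # Sort ascending and join with underscores; one member yields the bare login.
    return "_".join(sorted(bu_logins))


print(team_identifier(["bob", "alice"]))
print(team_identifier(["alice"]))
```

The first call yields the identifier used as the example throughout these instructions, `alice_bob`, regardless of the order in which the members are listed.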
master branch will represent the group's submission. At some point before the project deadline, the group must submit a pull request to officially submit their work. Only the changes that were committed before the pull request is made will be accepted as submitted.
GEO2D index using PyMongo: http://api.mongodb.org/python/current/examples/geo.html, and
prov library for Python on Windows, you may have some trouble installing the
lxml package. Try obtaining the precompiled package here, instead, and installing it using
pip install *.whl;
optimize.linprog module for solving linear optimization problems, and the library is relatively straightforward to install: