Also, don’t forget that there are more questions and answers on Piazza.
In our answers for 1.1 and 1.3, are we supposed to use the specific values found in the relational tuples (e.g., ‘Alan Turing’), or something more general?
You should use the specific values.
However, as indicated in the second guideline, you need to take into account the entire sets of relationships that the database will need to capture.
For example, consider the following document from our movies collection for Part II (which we also discussed in lecture):
{
  _id: "0499549",
  name: "Avatar",
  year: 2009,
  rating: "PG-13",
  runtime: 162,
  genre: "AVYS",
  earnings_rank: 4,
  actors: [
    { id: "0000244", name: "Sigourney Weaver" },
    { id: "0002332", name: "Stephen Lang" },
    { id: "0735442", name: "Michelle Rodriguez" },
    { id: "0757855", name: "Zoe Saldana" },
    { id: "0941777", name: "Sam Worthington" }
  ],
  directors: [
    { id: "0000116", name: "James Cameron" }
  ]
}
If we only needed to capture information about Avatar, we could have used a field called director whose value was a single embedded subdocument. However, because some movies may have more than one director, we needed to use a field called directors whose value is an array of one or more embedded subdocuments.
In the relational database, there are four tuples for the information that we are trying to capture. Does that mean that we also need to include four documents in our answers for 1.1 and 1.3?
No, you may not need four documents. To see why, consider our movie database. The relational version required 5 tables: Movie, Person, Oscar, Actor and Director. The MongoDB version only requires 3 collections: movies, people, and oscars. The difference has to do with how the two logical models capture many-to-many relationships.
In the relational model, we need to use separate tables like Actor and Director to capture many-to-many relationships, because the relational model doesn’t allow for multi-valued attributes. But MongoDB allows for multi-valued attributes, so we can capture those relationships inside the documents that store information about the entities. For example, in the above movie document, the actors field captures the relationships between movies and the people who acted in them, so we don’t need separate “actor” documents.
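To make the difference concrete, here is a minimal sketch in plain JavaScript (no MongoDB connection needed) of how rows from relationship tables can be folded into arrays inside a movie document. The row layouts and column names (movie_id, actor_id, director_id) and the helper names are assumptions made for illustration; they are not taken from the actual relational schema, and only two of Avatar's actors are included.

```javascript
// Hypothetical relational rows; the column names are assumptions for illustration.
const movieRows = [{ id: "0499549", name: "Avatar", year: 2009 }];
const personRows = [
  { id: "0000244", name: "Sigourney Weaver" },
  { id: "0002332", name: "Stephen Lang" },
  { id: "0000116", name: "James Cameron" },
];
// One row per (movie, person) pair, as in the Actor and Director tables.
const actorRows = [
  { movie_id: "0499549", actor_id: "0000244" },
  { movie_id: "0499549", actor_id: "0002332" },
];
const directorRows = [{ movie_id: "0499549", director_id: "0000116" }];

// Look up a person row by id and keep only the fields the movie document embeds.
function personRef(personId) {
  const person = personRows.find((p) => p.id === personId);
  return { id: person.id, name: person.name };
}

// Fold the relationship rows for one movie into arrays inside its document.
function toMovieDocument(movie) {
  return {
    _id: movie.id,
    name: movie.name,
    year: movie.year,
    actors: actorRows
      .filter((row) => row.movie_id === movie.id)
      .map((row) => personRef(row.actor_id)),
    directors: directorRows
      .filter((row) => row.movie_id === movie.id)
      .map((row) => personRef(row.director_id)),
  };
}

const movieDocs = movieRows.map(toMovieDocument);
console.log(movieDocs.length); // 1
```

The one movie row and three relationship rows end up in a single document, which is why the number of documents can be smaller than the number of tuples.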
When creating the documents for 1.3, should we take an approach like the one used in the above movie document, in which a person’s name is grouped with their id?
No. Our movie database takes a hybrid approach that is mostly reference-based but also uses some embedding: whenever we use a reference, we also include the name of the person or movie.
In 1.3, you should use a purely reference-based approach with no embedding. For example, here is what a purely reference-based approach would look like for the movie document above:
{
  _id: "0499549",
  name: "Avatar",
  year: 2009,
  rating: "PG-13",
  runtime: 162,
  genre: "AVYS",
  earnings_rank: 4,
  actors: [ "0000244", "0002332", "0735442", "0757855", "0941777" ],
  directors: [ "0000116" ]
}
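If it helps to see the relationship between the two versions, the following plain-JavaScript sketch derives a reference-based document from a hybrid one. The helper toReferenceBased is hypothetical (it is not part of MongoDB or of the course materials), and the sample document is abbreviated to two actors and a few fields.

```javascript
// The hybrid movie document, abbreviated to two actors.
const hybridMovie = {
  _id: "0499549",
  name: "Avatar",
  actors: [
    { id: "0000244", name: "Sigourney Weaver" },
    { id: "0002332", name: "Stephen Lang" },
  ],
  directors: [{ id: "0000116", name: "James Cameron" }],
};

// toReferenceBased is a hypothetical helper: it keeps every other field
// and replaces each embedded { id, name } subdocument with its id alone.
function toReferenceBased(movie) {
  return {
    ...movie,
    actors: movie.actors.map((person) => person.id),
    directors: movie.directors.map((person) => person.id),
  };
}

const referenceMovie = toReferenceBased(hybridMovie);
console.log(referenceMovie.actors); // [ '0000244', '0002332' ]
```

The names are dropped because, in a purely reference-based design, they live only in the documents for the people themselves.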
When capturing relationships, should we include information about a relationship in the documents for both of the entities involved, or should we only include it in the document for one of the two entities? And if we only include it with one of the entities, how do we decide which one?
It depends. For example, in our MongoDB movie database, we only included information about the relationships between a movie and its actors in the document for the movie. We decided not to include it in the people documents of the actors, because the number of movies in which a person has acted could grow significantly over time and cause the document to become large enough that it would need to be moved on disk.
It’s worth noting that the possible growth of the document over time is more of a concern when using an embedded or hybrid approach, since an array of embedded subdocuments can take up significantly more room than just an array of references.
I understand that the _id field is supposed to function as the key of the document. This seems easy to implement when the primary key of the corresponding tuple is a single value. What should we do when the primary key is a combination of values?
You can let MongoDB assign the _id value, as we did in the documents from the oscars collection in the movie database.
When you show an example of a document for which MongoDB is assigning the _id value, you can use notation like the following:
_id: ObjectID1,
and specify that ObjectID1 is an ObjectID value generated by MongoDB.
Will the number of documents needed for 1.1 be the same as the number of documents needed for 1.3?
It depends on how much embedding you decide to do in 1.1.
For example, in our movie database, we could have decided to only have two collections: one for person documents and one for movie documents. In this approach, we could have embedded information about acting and directing Oscars in the corresponding person documents, and information about Best-Picture Oscars in the corresponding movie documents.
In yet another approach, we could have just used a single collection for movie documents – and embedded people and Oscar information within those documents.
I’m unsure about how to approach problems 3.1, 3.2 and 3.3. Do you have any suggestions?
These problems are similar to the practice problems from pages 314-316 and pages 326-327 in the coursepack, which we will do together in lecture during the week of December 2. Consult your coursepack or the lecture videos for a reminder of how we solved those problems.
We will also cover similar problems in Lab 10.
MongoDB Compass is telling me that it can’t connect to localhost, even though it did so in the past. What can I do?
Try restarting your laptop. If you are running the MongoDB server as a service in the background, it can sometimes stop running. Restarting your laptop should restart the MongoDB server and allow Compass to connect to it. On macOS, you may also need to explicitly restart the MongoDB server by entering the following command in the Terminal:
brew services start mongodb-community@8.0
In the results of our queries, do we need to worry about the order of the documents or the order of the fields within a given document?
Not unless the problem explicitly specifies that you should sort the results. The Autograder should give you full credit as long as you have all of the necessary documents, field names, and field values.
The values in my actual query results look the same as the ones in the expected results, but the Autograder is saying that the results are incorrect. Any suggestions?
Make sure that your field names are correct. For example, for Query 7, make sure that you use a field name of num_action_in_top_15.
For Query 3, the Autograder is telling me that countDocuments is not a function. Any suggestions?
countDocuments is a function name that is only available on newer versions of MongoDB. In the version that is available on the Autograder, the function is called simply count.
For Query 5, I’m trying to apply the conditions needed to focus on years from 2010-2019, but it doesn’t appear to be working. Any suggestions?
Don’t forget that when forming a selection document that uses an implicit logical AND, you can’t have two separate subconditions that both involve the same field. For example, if we wanted to find all movies with earnings ranks between 10 and 20, the following selection document would not work:
// does NOT work!
{ earnings_rank: { $gte: 10 }, earnings_rank: { $lte: 20 } }
This doesn’t work because a JSON document can’t have two fields with the same name.
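You can see what happens to the duplicated field without connecting to MongoDB at all. The sketch below assumes that the selection document is being evaluated as JavaScript (as it is in the MongoDB shell); other tools may handle the duplicate differently.

```javascript
// The selection document that does NOT work, written as a JavaScript object literal.
const selection = {
  earnings_rank: { $gte: 10 },
  earnings_rank: { $lte: 20 },
};

// Only one earnings_rank field survives, and it holds the second condition.
console.log(Object.keys(selection).length); // 1
console.log(JSON.stringify(selection)); // {"earnings_rank":{"$lte":20}}
```

In this setting there is no error message: the first condition is silently replaced by the second, so the selection document only checks the upper bound.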
As discussed in lecture, one way to get around this is to use an explicit $and operator:
{ $and: [ { earnings_rank: { $gte: 10 } }, { earnings_rank: { $lte: 20 } } ] }
Since the earnings_rank fields now belong to two separate subdocuments, they don’t violate the rule that you can’t have two fields with the same name.
Another option is to group the two inequality operators together using an implicit logical AND as follows:
{ earnings_rank: { $gte: 10, $lte: 20 } }
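To check that the two working versions select the same documents, here is a small plain-JavaScript sketch that evaluates selection documents against an in-memory array. It is a toy model written for this FAQ, not MongoDB's actual query engine: it supports only $and, $gte, and $lte, and the sample movies and the function names (satisfies, matches, namesFor) are made up for illustration.

```javascript
// Hypothetical sample data; only the earnings_rank field matters here.
const movies = [
  { name: "A", earnings_rank: 4 },
  { name: "B", earnings_rank: 10 },
  { name: "C", earnings_rank: 15 },
  { name: "D", earnings_rank: 25 },
];

// Check one value against an operator document such as { $gte: 10, $lte: 20 }.
// Every operator in the document must hold (the implicit logical AND).
function satisfies(value, operators) {
  return Object.entries(operators).every(([op, bound]) => {
    if (op === "$gte") return value >= bound;
    if (op === "$lte") return value <= bound;
    throw new Error("unsupported operator: " + op);
  });
}

// Check one document against a selection document. Supports only $and
// and fields whose condition is an operator document.
function matches(doc, selection) {
  return Object.entries(selection).every(([key, condition]) => {
    if (key === "$and") return condition.every((sub) => matches(doc, sub));
    return satisfies(doc[key], condition);
  });
}

const explicitAnd = {
  $and: [{ earnings_rank: { $gte: 10 } }, { earnings_rank: { $lte: 20 } }],
};
const groupedOperators = { earnings_rank: { $gte: 10, $lte: 20 } };

// Names of the sample movies that a selection document picks out.
const namesFor = (selection) =>
  movies.filter((m) => matches(m, selection)).map((m) => m.name);

console.log(namesFor(explicitAnd)); // [ 'B', 'C' ]
console.log(namesFor(groupedOperators)); // [ 'B', 'C' ]
```

Both forms keep the lower and upper bounds as separate keys, either in separate subdocuments or as separate operators within one subdocument, so neither bound is lost.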
For Query 6, I’m not sure how to compute someone’s age so as to find the youngest three people from England. Any ideas?
You don’t need to compute the ages. Strings in MongoDB can be compared using the same operators as integers, and because the dob values in the documents are strings of the form yyyy-mm-dd, the larger a dob string is, the later the person was born and the younger the person is.
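The same ordering can be seen with ordinary JavaScript string comparison. In the sketch below, the people and their dob values are made up for illustration, and the sorting is done on an in-memory array rather than by MongoDB. It works because every dob string has the same zero-padded layout, so a character-by-character comparison looks at the year first, then the month, then the day.

```javascript
// Hypothetical people; the names and dates are made up for illustration.
const people = [
  { name: "P1", dob: "1949-06-22" },
  { name: "P2", dob: "1990-04-15" },
  { name: "P3", dob: "1968-03-02" },
  { name: "P4", dob: "1990-11-01" },
];

// Plain string comparison: no date parsing and no age arithmetic.
console.log("1990-11-01" > "1990-04-15"); // true
console.log("1990-04-15" > "1968-03-02"); // true

// Sort a copy by dob in descending string order, then keep the first three.
const youngestThree = [...people]
  .sort((a, b) => (a.dob < b.dob ? 1 : a.dob > b.dob ? -1 : 0))
  .slice(0, 3)
  .map((p) => p.name);

console.log(youngestThree); // [ 'P4', 'P2', 'P3' ]
```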
Last updated on December 2, 2024.