Recall that the MongoDB version of our movie database consists of three collections:
movies
containing documents about moviespeople
containing documents about actors and directorsoscars
containing documents about Academy Awards.In addition, remember that:
Whenever we refer to a person or movie, we also embed the associated entity’s name.
Person documents include one or both of the boolean fields
hasActed
and hasDirected
. These fields are only included when
their values are true.
Here are some sample documents:
{ _id: "0499549", name: "Avatar", year: 2009, rating: "PG-13", runtime: 162, genre: "AVYS", earnings_rank: 4, actors: [ { id: "0000244", name: "Sigourney Weaver" }, { id: "0002332", name: "Stephen Lang" }, { id: "0735442", name: "Michelle Rodriguez" }, { id: "0757855", name: "Zoe Saldana" }, { id: "0941777", name: "Sam Worthington" } ], directors: [ { id: "0000116", name: "James Cameron" } ] } { _id: "0000059", name: "Laurence Olivier", dob: "1907-5-22", pob: "Dorking, Surrey, England, UK", hasActed: true, hasDirected: true } { _id: ObjectID("528bf38ce6d3df97b49a0569"), year: 2013, type: "BEST-ACTOR", person: { id: "0000358", name: "Daniel Day-Lewis" }, movie: { id: "0443272", name: "Lincoln" } }
If you have installed and configured everything according to our directions, feel free to try out your queries using MongoDB Compass. However, doing so is not required.
Notes:
You may only use aspects of the MongoDB query language that we discussed in lecture.
The results of the query should include only the requested
information, with no extraneous fields. In particular, you should
exclude the _id
field from the results of your queries unless
the problem indicates otherwise.
You do not need to worry about the order of the fields in the results.
Here are the problems:
Find the top ten highest grossing movies in the database. For each such movie, output a document with the movie’s name and the name(s) of the director(s).
How could we solve this problem using the find
method?
db.movies.find( { earnings_rank: { $lte: 10 }}, { name: 1, "directors.name": 1, _id: 0 } )
How could we solve this problem using an aggregation pipeline
(i.e., the aggegrate
method)? In addition to finding the
same results as before, rename the field for the names of the
directors, giving it the name directors
.
db.movies.aggregate( { $match: { earnings_rank: { $lte: 10 } } }, { $project: { name: 1, directors: "$directors.name", _id: 0 } } )
How many actors in the database were born in California?
How could we solve this problem using the count
method?
db.people.count({ hasActed: true, pob: /, California/ })
How could we solve this problem using an aggregation pipeline?
The result should be a single document with a single field
named num_actors
.
db.people.aggregate( { $match: { hasActed: true, pob: /, California/ } }, { $group: { _id: null, num_actors: { $sum: 1 } } }, { $project: { num_actors: 1, _id: 0 } } )
Note: Using null
as the basis of the $group
stage
creates a single group for all of the documents that satisfy
the $match
stage, since we just want a single count of all
of those documents.
Which people in the database have directed a movie in which Tom
Hanks has acted? Use the distinct
method to produce a list
containing the names of these directors, with each name appearing
only once.
db.movies.distinct("directors.name", { "actors.name": "Tom Hanks" })
Find all years before 1970 that have 6 or more movies in the
database. Each result document should include a field called
year
for the year and a field called count
for the number of
movies from that year.
db.movies.aggregate( { $match: { year: { $lt: 1970 } } }, { $group: { _id: "$year", count: { $sum: 1 } } }, { $match: { count: { $gte: 6 } } }, { $project: { year: "$_id", count: 1, _id: 0 } } )
Find the person in the database who has acted in the most
movies. The final result should be a single document with fields
named actor
and num_movies
. You may assume that actor names
are unique. If there are multiple actors who are tied for the
most movies, the result can be any one of them.
Hint: Begin by using an $unwind stage to “unwind” the arrays of actors in all of the movie documents. Doing so will allow you to create a separate group for each actor.
db.movies.aggregate( { $unwind: "$actors" }, { $group: { _id: "$actors.name", num_movies: { $sum: 1 } } }, { $sort: { num_movies : -1 } }, { $limit: 1 }, { $project: { _id: 0, actor: "$_id", num_movies: 1 } } )
For extra practice on your own: Who directs the longest movies?
Of all directors who have directed 5 or more movies, find the 10
directors whose movies have the longest average runtime. The final
result should consist of documents with fields named director
(for the director’s name), avg_runtime
, and num_movies
.
db.movies.aggregate( { $unwind: "$directors" }, { $group: { _id: "$directors.name", avg_runtime: { $avg: "$runtime" }, num_movies: { $sum: 1 } } }, { $match: { num_movies: { $gte: 5 } } }, { $sort: { avg_runtime : -1 } }, { $limit: 10 }, { $project: { _id: 0, director: "$_id", avg_runtime: 1, num_movies: 1 } } )
Last updated on November 21, 2024.