In addition, don’t forget that there are additional questions and answers on Piazza.
I can’t run my program on my own machine. What am I doing wrong?
Nothing! It isn’t possible to run these programs locally unless you install Hadoop, which is a complicated thing to do. Instead, you should follow the procedures we’ve given you to eliminate any syntax errors locally and then test your program on Gradescope.
I understand how to find and eliminate syntax errors, but how can I fully find and eliminate any logic errors? Just seeing how my results compare to the expected results on a single input file isn’t enough.
There are two things you can do:
Add temporary println statements to your code. If you do so, Gradescope will include the output of those statements as part of the results that it displays.
Use option 2b under the Testing and debugging section to substitute your own input file(s) for the default one. In such cases, we don’t tell you if your results are correct, but we do show you the results that your program produces, and you can see if those results are the ones that you would expect for the input file that you upload.
Can we assume that each user has a single record in the data set?
Yes. Each user has one record that contains all of their information.
I’m getting an error about types (either a ClassCastException or another message indicating incompatible types). Why is this happening?
Make sure that the types are correct in the headers of your mapper and reducer classes, and in the headers of the map() and reduce() methods. In particular:
Make sure that your mapper classes have an input-key type of either LongWritable or Object. Because the mappers obtain their data from text files, the key is an offset from the beginning of the text file, and the map-reduce framework uses the type LongWritable for those offsets. However, because we ignore these offsets, it’s fine to just use the Object type for the input-key type, since any type of object can be assigned to a parameter of type Object.
Make sure that your map() method has a header in which the types of the key and value parameters match the first two types in the header of the class.
Make sure that your reducer classes, and their reduce() methods, have input types that match the output types of the corresponding mapper class. For example, if the mapper has output types of Text and IntWritable, the corresponding reducer should have input types of Text and IntWritable, and its reduce() method should have a key parameter of type Text and a values parameter of type Iterable<IntWritable>.
In addition, make sure that your main() method specifies the correct types for the output keys and output values of your mapper(s) and reducer(s).
See the lecture notes for more details and examples.
All of my reducer values appear to be the same. Why is this?
There is a subtlety in the way that the Iterable<T> of values passed to the reduce method is handled. Since there could potentially be many items in the Iterable<T>, the framework reuses the same object in memory to serve each value to the reduce method one by one. This means that if you use a for-each loop over the Iterable<T> and try to save each value in memory (such as in an array), you will end up with an array whose elements all refer to the same object. See here for more information.
To avoid this, it’s worth noting that you shouldn’t really need to save the values of the Iterable<T> in memory anyway. You should process them one by one and use them to update variables that are local to your method.
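The effect described above can be reproduced without Hadoop. The following is only a minimal sketch in plain Java: MutableInt and reusing() are hypothetical stand-ins (not Hadoop classes) for a mutable value type and for an iterable that hands out the same object on every step. It contrasts saving references with keeping a running total in a local variable.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ReusedValueDemo {
    // Hypothetical stand-in for a mutable value type such as IntWritable.
    static class MutableInt {
        private int value;
        void set(int v) { value = v; }
        int get() { return value; }
    }

    // Hypothetical iterable that returns the SAME object on every step,
    // overwriting its contents each time (an assumption that mimics the reuse).
    static Iterable<MutableInt> reusing(int... data) {
        MutableInt shared = new MutableInt();
        return () -> new Iterator<MutableInt>() {
            private int index = 0;
            public boolean hasNext() { return index < data.length; }
            public MutableInt next() { shared.set(data[index++]); return shared; }
        };
    }

    // Problematic: stores references, so every saved entry is the same object.
    static List<Integer> saveReferences(Iterable<MutableInt> values) {
        List<MutableInt> saved = new ArrayList<>();
        for (MutableInt v : values) {
            saved.add(v);
        }
        List<Integer> result = new ArrayList<>();
        for (MutableInt v : saved) {
            result.add(v.get());
        }
        return result;
    }

    // Preferred: use each value as it arrives and update a local variable.
    static int sum(Iterable<MutableInt> values) {
        int total = 0;
        for (MutableInt v : values) {
            total += v.get();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(saveReferences(reusing(1, 2, 3))); // prints [3, 3, 3]
        System.out.println(sum(reusing(1, 2, 3)));            // prints 6
    }
}
```

If you ever do need to keep a value beyond the current loop iteration, copy its contents (for example, the underlying int or String) rather than storing the object itself.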
The results that the Autograder is displaying appear to be the output of my mapper, rather than the output of my reducer. Why would that be?
It’s possible that your reduce method is never getting called because of one or more issues with how you declared the types in either (1) the extends clause of your reducer class or (2) the header of your reduce method. If those types are not properly declared, the reduce method will not be called.
You can check if a method is being called by adding a temporary println statement to the start of the method. The Autograder will display anything that your code prints, so you can upload the program with the included println and see if anything is printed. If not, it means that the method is never being called.
One thing to check: When it comes to the extends clause of your reducer class, make sure that you are not including Iterable in that clause. Iterable should be in the header of the reduce method as part of the type of the collection of values that are passed in, but it should not be in the extends clause.
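To see why mismatched types can leave a reduce method uncalled, here is a minimal plain-Java sketch. BaseReducer and run() are hypothetical, simplified stand-ins for the framework (they are not Hadoop classes), and the default behavior of passing each pair through unchanged is an assumption chosen to resemble mapper output. A method whose parameter types differ from the inherited ones is an overload rather than an override, so code that calls through the base type never reaches it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OverrideDemo {
    // Hypothetical, simplified stand-in for a framework reducer base class.
    static class BaseReducer<K, V> {
        // Assumed default: pass every (key, value) pair through unchanged.
        List<String> reduce(K key, Iterable<V> values) {
            List<String> out = new ArrayList<>();
            for (V v : values) {
                out.add(key + "\t" + v);
            }
            return out;
        }
    }

    // Mismatch: the extends clause says Integer, the method says Long.
    // This compiles as an overload; adding @Override would be a compile error.
    static class BrokenReducer extends BaseReducer<String, Integer> {
        List<String> reduce(String key, Iterable<Long> values) {
            long total = 0;
            for (long v : values) {
                total += v;
            }
            return Arrays.asList(key + "\t" + total);
        }
    }

    // Types agree with the extends clause, so this really overrides.
    static class FixedReducer extends BaseReducer<String, Integer> {
        @Override
        List<String> reduce(String key, Iterable<Integer> values) {
            int total = 0;
            for (int v : values) {
                total += v;
            }
            return Arrays.asList(key + "\t" + total);
        }
    }

    // Hypothetical stand-in for the framework, which only knows the base type.
    static List<String> run(BaseReducer<String, Integer> reducer) {
        return reducer.reduce("gmail.com", Arrays.asList(1, 1, 1));
    }

    public static void main(String[] args) {
        System.out.println(run(new BrokenReducer()).size()); // prints 3 (pairs passed through)
        System.out.println(run(new FixedReducer()).size());  // prints 1 (one summed result)
    }
}
```

Putting @Override on your own map and reduce methods is a cheap safeguard: if the header does not match the inherited method, the compiler reports it instead of silently treating the method as a new overload.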
Is it okay if the numbers aren’t aligned in the output?
Yes. The map-reduce system puts a tab character between the keys and the values in the output files, and because different keys (in this case, different email domains) have different lengths, this can cause the values to not be aligned.
I’m getting a type mismatch error for the second mapper in my chain of map-reduce jobs. What am I doing wrong?
Because the second mapper is still reading from text files (the ones produced by the first reducer), its input types should be the same as for any other mapper: Object for the input keys, and Text for the input values. Those input values will include both the keys and values produced by the first reducer, separated by a tab character. See the notes accompanying Problem 4 itself for more detail.
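As a rough illustration of the tab-separated format, the sketch below splits one line of text into its key and value in plain Java. The helper name splitKeyValue and the sample line are hypothetical; in an actual mapper you would first convert the Text value to a String with toString().

```java
public class TabLineDemo {
    // Splits one line of first-reducer output into { key, value }.
    // The limit of 2 keeps any later tab characters inside the value part.
    static String[] splitKeyValue(String line) {
        return line.split("\t", 2);
    }

    public static void main(String[] args) {
        // Hypothetical sample line: a key, a tab, then a count.
        String[] parts = splitKeyValue("gmail.com\t42");
        String key = parts[0];
        int count = Integer.parseInt(parts[1]);
        System.out.println(key + " -> " + count); // prints gmail.com -> 42
    }
}
```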
I’m not getting any results at all for one of these problems. Any suggestions?
In addition to the issue mentioned in the previous question – which may also cause there to be no results, but without an explicit error message – here are two other things to check:
More generally, make sure that you have declared the correct types in your extends clauses and your method headers. See question 3 in the General Questions section above for more on this.
Make sure that your map and reduce methods actually override the inherited methods. To do so, they must have the same name and the same number and types of parameters. Among other things, this means that the map method in both mappers must be called map (not map1, map2, etc.) and the reduce method in both reducers must be called reduce (not reduce1, reduce2, etc.).
Last updated on November 13, 2024.