
Commit 17ed262

Slightly Improved Hadoop Word Count Example and Documentation

1 parent d9b2052 commit 17ed262

3 files changed: +12 −22 lines changed

‎hadoop/README.md

Lines changed: 3 additions & 1 deletion

    @@ -20,9 +20,11 @@ If a word occurs multiple times in a line, one token is emitted for each occurrence
     
         <Text (the word), Iterable<WriteableInteger> (number of occurrences)>
     
    -The reducer also acts as combiner, meaning that a reduction step is also performed locally before the results of each mapper are sent to the central reduction step. It will add up all the occurrences to a single number and thus emit tuples of the form
    +Hadoop has put all the `WriteableInteger` values generated by the mapping step which belong to the same `Text (the word)` key into an `Iterable` list for us. Thus, for each word that the mapper has discovered, we get a list of numbers. All we have to do is add them up and emit tuples of the form:
     
         <Text (the word), WriteableInteger (total number of occurrences)>
    +
    +The reducer here also acts as a combiner, meaning that a reduction step is also performed locally before the results of each mapper are sent to the central reduction step. This way we can already add up some word counts locally, and the amount of data that needs to be sent to the central reducer decreases, as two tuples for the same word are already merged. This is possible in this simple form because the output of the reducer has the same type as the output of the mapper, except that the `WriteableInteger` part will not necessarily have the value `1` afterwards.
     
     After the reduction step, we therefore know how often each word occurred in the text. Furthermore, since the tuples are sorted automatically before reduction, the word/occurrences list is also nicely sorted alphabetically.
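The reduce and combine logic described in the README is simple enough to sketch without Hadoop. The following self-contained Java snippet (the class `WordCountSketch` and its method names are illustrative, not part of this repository) sums one count per token, and a per-key merge step mirrors how the combiner pre-aggregates partial counts before they reach the central reducer:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A Hadoop-free sketch of the reduce/combine logic described above.
// Class and method names are illustrative and not part of this repository.
public class WordCountSketch {

    // The mapper conceptually emits one count of 1 per token; the reducer
    // sums all counts sharing the same word key. Here both collapse into a
    // single pass over the tokens.
    public static Map<String, Integer> sumCounts(final List<String> tokens) {
        final Map<String, Integer> totals = new HashMap<>();
        for (final String token : tokens) {
            totals.merge(token, 1, Integer::sum); // add this occurrence's 1
        }
        return totals;
    }

    // Mirrors the combiner: partial counts from different mappers can be
    // merged by summing per key, because the reducer's output has the same
    // shape as the mapper's output.
    public static Map<String, Integer> merge(final Map<String, Integer> a,
                                             final Map<String, Integer> b) {
        final Map<String, Integer> out = new HashMap<>(a);
        b.forEach((word, count) -> out.merge(word, count, Integer::sum));
        return out;
    }

    public static void main(final String[] args) {
        final Map<String, Integer> local1 = sumCounts(List.of("the", "cat"));
        final Map<String, Integer> local2 = sumCounts(List.of("sat", "the"));
        // merging the two locally combined results yields the same totals
        // as counting everything in one global pass
        System.out.println(merge(local1, local2));
        System.out.println(sumCounts(List.of("the", "cat", "sat", "the")));
    }
}
```

Because merging partial maps this way gives the same totals as one global `sumCounts`, the reducer can safely double as combiner, which is exactly why the commit registers the same class for both roles.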

‎hadoop/wordCount/pom.xml

Lines changed: 1 addition & 1 deletion

    @@ -36,7 +36,7 @@
     <project.build.sourceEncoding>${encoding}</project.build.sourceEncoding>
     <project.reporting.outputEncoding>${encoding}</project.reporting.outputEncoding>
     <jdk.version>1.7</jdk.version>
    -<project.mainClass>wordCount.WordCountDriver</project.mainClass>
    +<project.mainClass>wordCount.WordCountDriver</project.mainClass>
     </properties>
     
     <licenses>

‎hadoop/wordCount/src/main/java/wordCount/WordCountDriver.java

Lines changed: 8 additions & 20 deletions

    @@ -16,21 +16,17 @@
     public class WordCountDriver extends Configured implements Tool {
     
       public static void main(final String[] args) throws Exception {
    -    try {
    -      final int res = ToolRunner.run(new Configuration(),
    -          new WordCountDriver(), args);
    -      System.exit(res);
    -    } catch (final Exception e) {
    -      e.printStackTrace();
    -      System.exit(255);
    -    }
    +    System.exit(ToolRunner.run(new Configuration(), //
    +        new WordCountDriver(), args));
       }
     
       @Override
       public int run(final String[] args) throws Exception {
    +    final Configuration conf;
    +    final Job job;
     
    -    final Configuration conf = new Configuration();
    -    final Job job = Job.getInstance(conf, "Your job name");
    +    conf = new Configuration();
    +    job = Job.getInstance(conf, "Word Count Map-Reduce");
     
         job.setJarByClass(WordCountDriver.class);
    
    @@ -39,25 +35,17 @@ public int run(final String[] args) throws Exception {
         }
     
         job.setMapperClass(WordCountMapper.class);
    -
    -    // job.setMapOutputKeyClass(Text.class);
    -    // job.setMapOutputValueClass(IntWritable.class);
    -
         job.setReducerClass(WordCountReducer.class);
         job.setCombinerClass(WordCountReducer.class);
     
         job.setOutputKeyClass(Text.class);
         job.setOutputValueClass(IntWritable.class);
     
         job.setInputFormatClass(TextInputFormat.class);
    -
         job.setOutputFormatClass(TextOutputFormat.class);
     
    -    final Path filePath = new Path(args[0]);
    -    FileInputFormat.setInputPaths(job, filePath);
    -
    -    final Path outputPath = new Path(args[1]);
    -    FileOutputFormat.setOutputPath(job, outputPath);
    +    FileInputFormat.setInputPaths(job, new Path(args[0]));
    +    FileOutputFormat.setOutputPath(job, new Path(args[1]));
     
         job.waitForCompletion(true);
         return 0;

0 commit comments