Commit e6b4f4b

Improved Documentation for Hadoop
1 parent 2e59951 commit e6b4f4b

File tree: 2 files changed (+44, -17 lines)

hadoop/README.md

Lines changed: 37 additions & 7 deletions
@@ -215,6 +215,36 @@ In order to run Hadoop in a pseudo-distributed fashion, we need to enable passwo
 <ol>
 <li>In the terminal, execute <code>ssh localhost</code> to test if you can open a <a href="https://en.wikipedia.org/wiki/Secure&#95;Shell">secure shell</a> connection to your current, local computer <a href="http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#Setup&#95;passphraseless&#95;ssh">without needing a password</a>.
 </li>
+<li>It may say something like:
+<pre>ssh: connect to host localhost port 22: Connection refused</pre>
+If it does say this, then do
+<pre>sudo apt-get install ssh</pre>
+and it may say something like
+<pre>
+Reading package lists... Done
+Building dependency tree
+Reading state information... Done
+The following extra packages will be installed:
+  libck-connector0 ncurses-term openssh-server openssh-sftp-server
+  ssh-import-id
+Suggested packages:
+  rssh molly-guard monkeysphere
+The following NEW packages will be installed:
+  libck-connector0 ncurses-term openssh-server openssh-sftp-server ssh
+  ssh-import-id
+0 upgraded, 6 newly installed, 0 to remove and 0 not upgraded.
+Need to get 661 kB of archives.
+After this operation, 3,528 kB of additional disk space will be used.
+Do you want to continue? [Y/n] y
+...
+Setting up ssh-import-id (4.5-0ubuntu1) ...
+Processing triggers for ufw (0.34-2) ...
+Setting up ssh (1:6.9p1-2ubuntu0.2) ...
+Processing triggers for libc-bin (2.21-0ubuntu4.1) ...
+Processing triggers for systemd (225-1ubuntu9.1) ...
+Processing triggers for ureadahead (0.100.0-19) ...
+</pre>
+OK, now you've got SSH installed. Do <code>ssh localhost</code> again.</li>
 <li>It may ask you something like
 <pre>
 The authenticity of host 'localhost (127.0.0.1)' can't be established.
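As background to the SSH steps added above: if <code>ssh localhost</code> still prompts for a password after the server is installed, the usual remedy (the one the linked Hadoop single-cluster guide describes) is to create a passphrase-less key pair and authorize it. A minimal sketch, assuming a standard <code>~/.ssh</code> setup:

  ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa         # generate a key with an empty passphrase
  cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys  # allow logins with that key
  chmod 0600 ~/.ssh/authorized_keys                # SSH ignores the file if its permissions are too open

After this, <code>ssh localhost</code> should connect without asking for a password.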
@@ -245,10 +275,10 @@ Are you sure you want to continue connecting (yes/no)?
 </pre>
 which you would answer with <code>yes</code> followed by hitting the enter key. If, after that, you get a message like <code>0.0.0.0: packet&#95;write&#95;wait: Connection to 127.0.0.1: Broken pipe</code>, enter <code>sbin/stop-dfs.sh</code>, hit return, and do <code>sbin/start-dfs.sh</code> again.</li>
 <li>In your web browser, open <code>http://localhost:50070/</code>. It should display a web page giving an overview of the Hadoop system now running on your local computer.</li>
-<li>Now we can setup the required stuff for the example jobs (making HDFS directories and copying the input files). Make sure to replace <code><userName></code> with your user/login name on your current machine.
+<li>Now we can set up what the example jobs require (making HDFS directories and copying the input files). Make sure to replace <code>&lt;userName&gt;</code> with your user/login name on your current machine.
 <pre>
 bin/hdfs dfs -mkdir /user
-bin/hdfs dfs -mkdir /user/<userName>
+bin/hdfs dfs -mkdir /user/&lt;userName&gt;
 bin/hdfs dfs -put etc/hadoop input
 </pre></li>
 <li>We can now run the job via
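If you prefer the terminal over the web page at <code>http://localhost:50070/</code> for checking that HDFS came up and that the <code>-put</code> above worked, the standard tools are <code>jps</code> and the HDFS command line; treat the exact listing as illustrative:

  jps                        # should show NameNode, DataNode and SecondaryNameNode processes
  bin/hdfs dfsadmin -report  # summary of live datanodes and capacity
  bin/hdfs dfs -ls input     # lists the files just copied into HDFS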
@@ -269,18 +299,18 @@ cat output/*
 We now want to run one of the provided examples. Let us assume we want to run the <code>wordCount</code> example. For other examples, just replace <code>wordCount</code> with their names in the following text. I assume that the <code>distributedComputingExamples</code> repository is located in a folder <code>Y</code> on your machine.
 <ol>
 <li>Open a terminal and enter your Hadoop installation folder. I assume you installed Hadoop version <code>2.7.2</code> into a folder named <code>X</code>, so you would <code>cd</code> into <code>X/hadoop-2.7.2/</code>.</li>
-<li>We want to start with a "clean" file system, so let us repeat some of the setup steps. Don't forget to replace <code><userName></code> with your local login/user name.
+<li>We want to start with a "clean" file system, so let us repeat some of the setup steps. Don't forget to replace <code>&lt;userName&gt;</code> with your local login/user name.
 <pre>
 bin/hdfs namenode -format
 </pre>
 (answer with <code>Y</code> when asked whether to re-format the file system)
 <pre>
 sbin/start-dfs.sh
 bin/hdfs dfs -mkdir /user
-bin/hdfs dfs -mkdir /user/<userName>
+bin/hdfs dfs -mkdir /user/&lt;userName&gt;
 </pre>
 If you properly cleaned up the file system after running your last examples (see the second-to-last step here), you just need to do <code>sbin/start-dfs.sh</code> and do not need to format the HDFS.</li>
-<li>Copy the input data of the example into HDFS. You find this data in the example folder <code>Y/distributedComputingExamples/wordCount/input</code>. So you will perform <code>bin/hdfs dfs -put Y/distributedComputingExamples/hadoop/wordCount/input input</code>. Make sure to replace <code>Y</code> with the proper path. If copying fails, go to "2.6. Troubleshooting".</li>
+<li>Copy the input data of the example into HDFS. You will find this data in the example folder <code>Y/distributedComputingExamples/hadoop/wordCount/input</code>. So you will perform <code>bin/hdfs dfs -put Y/distributedComputingExamples/hadoop/wordCount/input input</code>. Make sure to replace <code>Y</code> with the proper path. If copying fails, go to "2.6. Troubleshooting".</li>
 <li>Do <code>bin/hdfs dfs -ls input</code> to check if the files have been copied properly.</li>
 <li>You can now do <code>bin/hadoop jar Y/distributedComputingExamples/hadoop/wordCount/target/wordCount-full.jar input output</code>. This command will start the main class of the example, which resides in the fat jar <code>wordCount-full.jar</code>, with the parameters <code>input</code> and <code>output</code>. <code>input</code> here is the input folder, which we previously copied to the Hadoop file system. <code>output</code> is the output folder to be created. If you execute this command, you will see lots of logging information.</li>
 <li>Do <code>bin/hdfs dfs -ls output</code>. You will see output like
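Once the job has run, the result sits in the <code>output</code> folder inside HDFS. Mirroring the <code>cat output/*</code> step shown earlier in this README, you can either print it directly from HDFS or copy it back to the local disk first; a short sketch:

  bin/hdfs dfs -cat output/*       # print the result files straight from HDFS
  bin/hdfs dfs -get output output  # or copy the whole output folder to the local file system
  cat output/*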
@@ -332,13 +362,13 @@ Sometimes, you may try to copy some file or folder to HDFS and get an error that

 <ol>
 <li>Execute <code>sbin/stop-dfs.sh</code></li>
-<li>Delete the folder <code>/tmp/hadoop-<userName></code>, where <code><userName></code> is to replaced with your local login/user name.</li>
+<li>Delete the folder <code>/tmp/hadoop-&lt;userName&gt;</code>, where <code>&lt;userName&gt;</code> is to be replaced with your local login/user name.</li>
 <li>Now perform
 <pre>
 bin/hdfs namenode -format
 sbin/start-dfs.sh
 bin/hdfs dfs -mkdir /user
-bin/hdfs dfs -mkdir /user/<userName>
+bin/hdfs dfs -mkdir /user/&lt;userName&gt;
 </pre>
 </li><li>
 If you now repeat the operation that failed before, it should succeed.
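Putting the troubleshooting steps above together, and using the hypothetical login name <code>jane</code> in place of <code>&lt;userName&gt;</code>, the whole reset would look roughly like this:

  sbin/stop-dfs.sh            # stop HDFS first
  rm -rf /tmp/hadoop-jane     # remove the old storage folder ("jane" is only an example name)
  bin/hdfs namenode -format   # answer Y when asked whether to re-format
  sbin/start-dfs.sh
  bin/hdfs dfs -mkdir /user
  bin/hdfs dfs -mkdir /user/jane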

hadoop/webFinder/src/main/java/webFinder/WebFinderDriver.java

Lines changed: 7 additions & 10 deletions
@@ -29,18 +29,18 @@ public static void main(final String[] args) throws Exception {

   @Override
   public int run(final String[] args) throws Exception {
+    final Configuration conf;
+    final Job job;

-    final Configuration conf = new Configuration();
-    final Job job = Job.getInstance(conf, "Your job name");
+    conf = new Configuration();
+    job = Job.getInstance(conf, "Your job name");

     job.setJarByClass(WebFinderDriver.class);

     if (args.length < 2) {
       return 1;
     }
-
-    if (args.length > 2) {// set max depth
-      // pass parameter to mapper
+    if (args.length > 2) { // set max depth and pass parameter to mapper
       conf.setInt("maxDepth", Integer.parseInt(args[2]));
     }


@@ -56,11 +56,8 @@ public int run(final String[] args) throws Exception {
     job.setInputFormatClass(TextInputFormat.class);
     job.setOutputFormatClass(TextOutputFormat.class);

-    final Path filePath = new Path(args[0]);
-    FileInputFormat.setInputPaths(job, filePath);
-
-    final Path outputPath = new Path(args[1]);
-    FileOutputFormat.setOutputPath(job, outputPath);
+    FileInputFormat.setInputPaths(job, new Path(args[0]));
+    FileOutputFormat.setOutputPath(job, new Path(args[1]));

     job.waitForCompletion(true);
     return 0;
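For context, the driver above expects an input path and an output path, plus an optional third argument that it stores in the job configuration as <code>maxDepth</code> for the mappers to read. Assuming the webFinder example is packaged into a fat jar in the same way as the wordCount example (the jar path below is an assumption, not something stated in this commit), an invocation could look like:

  bin/hadoop jar Y/distributedComputingExamples/hadoop/webFinder/target/webFinder-full.jar input output 3
  # input  -> HDFS folder holding the input data (args[0])
  # output -> HDFS folder to be created for the results (args[1])
  # 3      -> optional maximum depth, passed to the mappers via conf.setInt("maxDepth", ...) (args[2])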
