In order to run Hadoop in a pseudo-distributed fashion, we need to enable password-less SSH access to the local machine.

<ol>
<li>In the terminal, execute <code>ssh localhost</code> to test if you can open a <a href="https://en.wikipedia.org/wiki/Secure_Shell">secure shell</a> connection to your current, local computer <a href="http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#Setup_passphraseless_ssh">without needing a password</a>.
</li>
<li>It may say something like
<pre>ssh: connect to host localhost port 22: Connection refused</pre>
In this case, no SSH server is installed yet. Under Ubuntu, you can install one via <code>sudo apt-get install ssh</code>; the installation will print something like
<pre>
...
0 upgraded, 6 newly installed, 0 to remove and 0 not upgraded.
Need to get 661 kB of archives.
After this operation, 3,528 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
...
Setting up ssh-import-id (4.5-0ubuntu1) ...
Processing triggers for ufw (0.34-2) ...
Setting up ssh (1:6.9p1-2ubuntu0.2) ...
Processing triggers for libc-bin (2.21-0ubuntu4.1) ...
Processing triggers for systemd (225-1ubuntu9.1) ...
Processing triggers for ureadahead (0.100.0-19) ...
</pre>
OK, now you've got SSH installed. Do <code>ssh localhost</code> again.</li>
<li>It may ask you something like
<pre>
The authenticity of host 'localhost (127.0.0.1)' can't be established.
...
Are you sure you want to continue connecting (yes/no)?
</pre>
which you would answer with <code>yes</code> followed by hitting enter. If, after that, you get a message like <code>0.0.0.0: packet_write_wait: Connection to 127.0.0.1: Broken pipe</code>, enter <code>sbin/stop-dfs.sh</code>, hit return, and then do <code>sbin/start-dfs.sh</code> again. If <code>ssh localhost</code> still asks for a password, see the key setup sketch below.</li>
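If you still get asked for a password at this point, you can enable passphraseless login with the usual key-pair setup, as described in the Hadoop single-node guide linked above. A minimal sketch:
<pre>
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
</pre>
After this, <code>ssh localhost</code> should connect without prompting.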
<li>In your web browser, open <code>http://localhost:50070/</code>. It should display a web page giving an overview of the Hadoop system now running on your local computer.</li>
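If you are working on a machine without a browser, a quick terminal check works too; a sketch assuming <code>curl</code> is available:
<pre>
curl -sI http://localhost:50070/ | head -n 1
</pre>
A response line like <code>HTTP/1.1 200 OK</code> indicates that the web interface is up.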
<li>Now we can set up what the example jobs require (making HDFS directories and copying the input files). Make sure to replace <code><userName></code> with your user/login name on your current machine.
<pre>
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/<userName>
bin/hdfs dfs -put etc/hadoop input
</pre></li>
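To verify that everything went through, you can list the new directories; for example:
<pre>
bin/hdfs dfs -ls /user/<userName>
bin/hdfs dfs -ls input
</pre>
The second command should show the configuration files just copied from <code>etc/hadoop</code>.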
<li>We can now run the job via
<pre>
...
cat output/*
</pre></li>
</ol>

We now want to run one of the provided examples. Let us assume we want to run the <code>wordCount</code> example. For other examples, just replace <code>wordCount</code> with their names in the following text. I assume that the <code>distributedComputingExamples</code> repository is located in a folder <code>Y</code> on your machine.

<ol>
<li>Open a terminal and enter your Hadoop installation folder. I assume you installed Hadoop version <code>2.7.2</code> into a folder named <code>X</code>, so you would <code>cd</code> into <code>X/hadoop-2.7.2/</code>.</li>
<li>We want to start with a "clean" file system, so let us repeat some of the setup steps. Don't forget to replace <code><userName></code> with your local login/user name.
<pre>
bin/hdfs namenode -format
</pre>
(answer with <code>Y</code> when asked whether to re-format the file system)
<pre>
sbin/start-dfs.sh
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/<userName>
</pre>
If you properly cleaned up the file system after running your last examples (see the second-to-last step here), you just need to do <code>sbin/start-dfs.sh</code> and do not need to format the HDFS.</li>
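For reference, the cleanup mentioned above boils down to removing the example folders from HDFS again; a sketch, assuming the folder names used in this walkthrough:
<pre>
bin/hdfs dfs -rm -r input output
</pre>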
<li>Copy the input data of the example into HDFS. You find this data in the example folder <code>Y/distributedComputingExamples/hadoop/wordCount/input</code>. So you will perform <code>bin/hdfs dfs -put Y/distributedComputingExamples/hadoop/wordCount/input input</code>. Make sure to replace <code>Y</code> with the proper path. If copying fails, go to "2.6. Troubleshooting".</li>
<li>Do <code>bin/hdfs dfs -ls input</code> to check whether the files have been copied properly.</li>
<li>You can now do <code>bin/hadoop jar Y/distributedComputingExamples/hadoop/wordCount/target/wordCount-full.jar input output</code>. This command will start the main class of the example, which resides in the fat jar <code>wordCount-full.jar</code>, with the parameters <code>input</code> and <code>output</code>. <code>input</code> here is the input folder, which we previously copied to the Hadoop file system. <code>output</code> is the output folder to be created. If you execute this command, you will see lots of logging information.</li>
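Note that Hadoop refuses to write into an output folder that already exists: re-running the job without cleaning up first fails with a <code>FileAlreadyExistsException</code>. In that case, delete the old results before the next run:
<pre>
bin/hdfs dfs -rm -r output
</pre>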
<li>Do <code>bin/hdfs dfs -ls output</code>. You will see output like
<pre>
...
</pre></li>
...
</ol>
Sometimes, you may try to copy some file or folder to HDFS and get an error that ...

<ol>
<li>Execute <code>sbin/stop-dfs.sh</code>.</li>
<li>Delete the folder <code>/tmp/hadoop-<userName></code>, where <code><userName></code> is to be replaced with your local login/user name, as sketched below.</li>
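In a terminal, this deletion can be done for example like this, where <code>$(whoami)</code> expands to your login name:
<pre>
rm -rf /tmp/hadoop-$(whoami)
</pre>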
<li>Now perform
<pre>
bin/hdfs namenode -format
sbin/start-dfs.sh
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/<userName>
</pre>
</li>
<li>If you now repeat the operation that failed before, it should succeed.</li>
</ol>
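If you need this reset more often, the steps can be collected into a small shell script. This is only a sketch under the assumptions of this walkthrough: it must be run from the Hadoop installation folder, it assumes the default data location under <code>/tmp</code>, and it destroys all data stored in HDFS.
<pre>
#!/bin/bash
# reset the local pseudo-distributed HDFS (destroys all HDFS data!)
sbin/stop-dfs.sh                        # stop the running HDFS daemons
rm -rf /tmp/hadoop-$(whoami)            # wipe the local data directory
bin/hdfs namenode -format -force        # re-format without asking for confirmation
sbin/start-dfs.sh                       # start the daemons again
bin/hdfs dfs -mkdir -p /user/$(whoami)  # re-create the user directory
</pre>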