In order to run Hadoop in a pseudo-distributed fashion, we need to enable password-less SSH access to the local machine.

<ol>
<li>In the terminal, execute <code>ssh localhost</code> to test if you can open a <a href="https://en.wikipedia.org/wiki/Secure_Shell">secure shell</a> connection to your current, local computer <a href="http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/SingleCluster.html#Setup_passphraseless_ssh">without needing a password</a>.
</li>
<li>It may say something like
<pre>ssh: connect to host localhost port 22: Connection refused</pre>
In this case, no SSH server is installed yet. Under Ubuntu, you can install one via <code>sudo apt-get install ssh</code>; the installation will print something like
<pre>
...
0 upgraded, 6 newly installed, 0 to remove and 0 not upgraded.
Need to get 661 kB of archives.
After this operation, 3,528 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
...
Setting up ssh-import-id (4.5-0ubuntu1) ...
Processing triggers for ufw (0.34-2) ...
Setting up ssh (1:6.9p1-2ubuntu0.2) ...
Processing triggers for libc-bin (2.21-0ubuntu4.1) ...
Processing triggers for systemd (225-1ubuntu9.1) ...
Processing triggers for ureadahead (0.100.0-19) ...
</pre>
OK, now you've got SSH installed. Do <code>ssh localhost</code> again.</li>
<li>It may ask you something like
<pre>
The authenticity of host 'localhost (127.0.0.1)' can't be established.
...
Are you sure you want to continue connecting (yes/no)?
</pre>
which you would answer with <code>yes</code> followed by hitting enter. If, after that, you get a message like <code>0.0.0.0: packet_write_wait: Connection to 127.0.0.1: Broken pipe</code>, enter <code>sbin/stop-dfs.sh</code>, hit return, and then do <code>sbin/start-dfs.sh</code> again. If <code>ssh localhost</code> still asks for a password, see the key setup sketch below.</li>
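If you still get asked for a password at this point, you can enable passphraseless login with the usual key-pair setup, as described in the Hadoop single-node guide linked above. A minimal sketch:
<pre>
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
</pre>
After this, <code>ssh localhost</code> should connect without prompting.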
<li>In your web browser, open <code>http://localhost:50070/</code>. It should display a web page giving an overview of the Hadoop system now running on your local computer.</li>
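If you are working on a machine without a browser, a quick terminal check works too; a sketch assuming <code>curl</code> is available:
<pre>
curl -sI http://localhost:50070/ | head -n 1
</pre>
A response line like <code>HTTP/1.1 200 OK</code> indicates that the web interface is up.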
<li>Now we can set up what the example jobs require (making HDFS directories and copying the input files). Make sure to replace <code><userName></code> with your user/login name on your current machine.
<pre>
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/<userName>
bin/hdfs dfs -put etc/hadoop input
</pre></li>
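To verify that everything went through, you can list the new directories; for example:
<pre>
bin/hdfs dfs -ls /user/<userName>
bin/hdfs dfs -ls input
</pre>
The second command should show the configuration files just copied from <code>etc/hadoop</code>.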
<li>We can now run the job via
<pre>
...
cat output/*
</pre></li>
</ol>

We now want to run one of the provided examples. Let us assume we want to run the <code>wordCount</code> example. For other examples, just replace <code>wordCount</code> with their names in the following text. I assume that the <code>distributedComputingExamples</code> repository is located in a folder <code>Y</code> on your machine.

<ol>
<li>Open a terminal and enter your Hadoop installation folder. I assume you installed Hadoop version <code>2.7.2</code> into a folder named <code>X</code>, so you would <code>cd</code> into <code>X/hadoop-2.7.2/</code>.</li>
<li>We want to start with a "clean" file system, so let us repeat some of the setup steps. Don't forget to replace <code><userName></code> with your local login/user name.
<pre>
bin/hdfs namenode -format
</pre>
(answer with <code>Y</code> when asked whether to re-format the file system)
<pre>
sbin/start-dfs.sh
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/<userName>
</pre>
If you properly cleaned up the file system after running your last examples (see the second-to-last step here), you just need to do <code>sbin/start-dfs.sh</code> and do not need to format the HDFS.</li>
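For reference, the cleanup mentioned above boils down to removing the example folders from HDFS again; a sketch, assuming the folder names used in this walkthrough:
<pre>
bin/hdfs dfs -rm -r input output
</pre>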
<li>Copy the input data of the example into HDFS. You find this data in the example folder <code>Y/distributedComputingExamples/hadoop/wordCount/input</code>. So you will perform <code>bin/hdfs dfs -put Y/distributedComputingExamples/hadoop/wordCount/input input</code>. Make sure to replace <code>Y</code> with the proper path. If copying fails, go to "2.6. Troubleshooting".</li>
<li>Do <code>bin/hdfs dfs -ls input</code> to check whether the files have been copied properly.</li>
<li>You can now do <code>bin/hadoop jar Y/distributedComputingExamples/hadoop/wordCount/target/wordCount-full.jar input output</code>. This command will start the main class of the example, which resides in the fat jar <code>wordCount-full.jar</code>, with the parameters <code>input</code> and <code>output</code>. <code>input</code> here is the input folder, which we previously copied to the Hadoop file system. <code>output</code> is the output folder to be created. If you execute this command, you will see lots of logging information.</li>
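Note that Hadoop refuses to write into an output folder that already exists: re-running the job without cleaning up first fails with a <code>FileAlreadyExistsException</code>. In that case, delete the old results before the next run:
<pre>
bin/hdfs dfs -rm -r output
</pre>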
<li>Do <code>bin/hdfs dfs -ls output</code>. You will see output like
<pre>
...
</pre></li>
...
</ol>
Sometimes, you may try to copy some file or folder to HDFS and get an error that ...

<ol>
<li>Execute <code>sbin/stop-dfs.sh</code>.</li>
<li>Delete the folder <code>/tmp/hadoop-<userName></code>, where <code><userName></code> is to be replaced with your local login/user name, as sketched below.</li>
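In a terminal, this deletion can be done for example like this, where <code>$(whoami)</code> expands to your login name:
<pre>
rm -rf /tmp/hadoop-$(whoami)
</pre>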
<li>Now perform
<pre>
bin/hdfs namenode -format
sbin/start-dfs.sh
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/<userName>
</pre>
</li>
<li>If you now repeat the operation that failed before, it should succeed.</li>
</ol>
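If you need this reset more often, the steps can be collected into a small shell script. This is only a sketch under the assumptions of this walkthrough: it must be run from the Hadoop installation folder, it assumes the default data location under <code>/tmp</code>, and it destroys all data stored in HDFS.
<pre>
#!/bin/bash
# reset the local pseudo-distributed HDFS (destroys all HDFS data!)
sbin/stop-dfs.sh                        # stop the running HDFS daemons
rm -rf /tmp/hadoop-$(whoami)            # wipe the local data directory
bin/hdfs namenode -format -force        # re-format without asking for confirmation
sbin/start-dfs.sh                       # start the daemons again
bin/hdfs dfs -mkdir -p /user/$(whoami)  # re-create the user directory
</pre>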