Hadoop Streaming using Python

Asked 10 years, 5 months ago

Viewed 941 times

I am trying to run a MapReduce job with Hadoop Streaming, using the command below:

hadoop jar /usr/lib/Hadoop/Hadoop-streaming-0.20.2-cdh3u2.jar –file mapper.py –mapper mapper.py –file reducer.py – reducer reducer.py –input /user/training/samplypy.txt –ouput /user/training/pythonMR/output

I get the following exception:

Exception in thread "main" java.lang.ClassNotFoundException: –file
 at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
 at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
 at java.lang.Class.forName0(Native Method)
 at java.lang.Class.forName(Class.java:264)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:149)

I am using Hadoop 1.0.3. I have tried several versions of the hadoop-streaming jar, including:

hadoop-streaming-0.20.2-cdh3u2.jar 
hadoop-streaming-1.2.0.jar 
hadoop-streaming.jar

edited Aug 2, 2015 at 19:35

johnnyRose

asked Aug 2, 2015 at 18:47

Vishal Mishra

Where is the document that tells you to run these commands?
– suztomo, commented Aug 2, 2015 at 19:41

Have you got your $HADOOP_HOME env variable set?
– user2046117, commented Aug 4, 2015 at 18:42

Refer to this: stackoverflow.com/questions/16701979/….
– srikanth, commented Aug 11, 2015 at 11:18

1 Answer

One thing I can see is that you did not use the full path for the '-file' options:

-file /mapper/location/mapper.py (use the full path, including the file name, here)

-mapper mapper.py (correct, mapper file name only)

-file /reducer/location/reducer.py (use the full path, including the file name, here)

-reducer reducer.py (correct, reducer file name only)

Also make sure your -input and -output point to HDFS paths, not local paths.
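
Separately from the path advice, the command as pasted in the question shows an en dash (–) in front of every option, a space inside "– reducer", and "–ouput" where "-output" was presumably meant. Whether those characters were in the command that was actually run, or were introduced when it was pasted, is not clear from the question, but the exception does name "–file" with an en dash. A minimal Python sketch for spotting such characters in a command string follows; find_dash_lookalikes is a hypothetical helper name, not part of Hadoop:

```python
import unicodedata


def find_dash_lookalikes(command):
    """Return (index, character, Unicode name) for every dash-like
    character in `command` that is not the ASCII hyphen-minus."""
    hits = []
    for index, ch in enumerate(command):
        # Category "Pd" covers dash punctuation (en dash, em dash, ...).
        # U+2212 MINUS SIGN is a math symbol, so it is checked explicitly.
        is_dash = unicodedata.category(ch) == "Pd" or ch == "\u2212"
        if is_dash and ch != "-":
            hits.append((index, ch, unicodedata.name(ch, "UNKNOWN")))
    return hits


if __name__ == "__main__":
    # Shortened, hypothetical command line; \u2013 is the en dash.
    pasted = "hadoop jar hadoop-streaming.jar \u2013file mapper.py -mapper mapper.py"
    for index, ch, name in find_dash_lookalikes(pasted):
        print(f"column {index}: {name}")
```

If the helper reports anything, retyping the options with a plain "-" is a cheap thing to rule out before looking further.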

Here is the command I used:

hadoop jar /opt/cloudera/parcels/hadoop-streaming.jar \
-D mapred.reduce.tasks=15 -D stream.map.input.field.separator=',' -D stream.map.output.field.separator=',' \
-D mapred.textoutputformat.separator=',' \
-input /user/temp/in/ \
-output /user/temp/out \
-file /app/qa/python/mapper.py \
-mapper mapper.py \
-file /app/qa/python/reducer.py \
-reducer reducer.py
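
Neither the question nor this answer shows mapper.py or reducer.py. For context, a streaming mapper and reducer read lines from standard input and write key/value lines to standard output, and the reducer receives its input sorted by key. The sketch below is a hypothetical word count, not the scripts used here; the function names and sample data are assumptions, and the map, sort, and reduce steps run in one process so the logic can be checked locally before a job is submitted:

```python
from itertools import groupby

SEP = "\t"  # Hadoop Streaming's default key/value separator


def map_lines(lines, sep=SEP):
    """Mapper logic: emit one 'word<sep>1' line per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}{sep}1"


def reduce_lines(sorted_lines, sep=SEP):
    """Reducer logic: sum the counts for each run of identical keys.
    Relies on the input already being sorted by key."""
    pairs = (line.rstrip("\n").split(sep, 1) for line in sorted_lines)
    for key, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{key}{sep}{total}"


def run_locally(lines, sep=SEP):
    """Simulate map -> sort -> reduce without a cluster."""
    return list(reduce_lines(sorted(map_lines(lines, sep)), sep))


if __name__ == "__main__":
    sample = ["hadoop streaming", "streaming python"]  # made-up input
    for out in run_locally(sample):
        print(out)
```

In a real job the two halves would live in separate scripts: mapper.py would print each line from map_lines(sys.stdin), and reducer.py each line from reduce_lines(sys.stdin). SEP is the tab default; the -D separator options in the command above change how Hadoop splits keys from values, so the scripts would need to agree with whatever is configured.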

answered Nov 17, 2015 at 16:29

kennyut
