How to improve performance of this Map Reduce function, Python mrjob

Asked 12 years, 5 months ago

Viewed 952 times

\$\begingroup\$

I'm trying to get the most out of this code so that I understand what to look for in the future. The code below works fine; I just want to make it more efficient.

Any suggestions?

from mrjob.job import MRJob
import operator
import re

# append result from each reducer
output_words = []

class MRSudo(MRJob):

    def init_mapper(self):
        # move list of tuples across mapper
        self.words = []

    def mapper(self, _, line):
        command = line.split()[-1]
        self.words.append((command, 1))

    def final_mapper(self):
        for word_pair in self.words:
            yield word_pair

    def reducer(self, command, count):
        # append tuples to the list
        output_words.append((command, sum(count)))

    def final_reducer(self):
        # Sort tuples in the list by occurrence
        map(operator.itemgetter(1), output_words)
        sorted_words = sorted(output_words, key=operator.itemgetter(1), reverse=True)
        for result in sorted_words:
            yield result

    def steps(self):
        return [self.mr(mapper_init=self.init_mapper,
                        mapper=self.mapper,
                        mapper_final=self.final_mapper,
                        reducer=self.reducer,
                        reducer_final=self.final_reducer)]

if __name__ == '__main__':
    MRSudo.run()

python

asked Apr 5, 2013 at 20:40

Vor

\$\endgroup\$


1 Answer


\$\begingroup\$

Since the reduce function in this case (a sum) is commutative and associative, you can use a combiner to pre-aggregate values on each mapper before they are sent to the reducers.

def combiner_count_words(self, word, counts):
    # sum the words we've seen so far
    yield (word, sum(counts))

def steps(self):
    return [self.mr(mapper_init=self.init_mapper,
                    mapper=self.mapper,
                    mapper_final=self.final_mapper,
                    combiner=self.combiner_count_words,
                    reducer=self.reducer,
                    reducer_final=self.final_reducer)]
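To see why the combiner is safe here and what it saves, the following is a rough, framework-free sketch that simulates the map, combine and reduce phases in plain Python. It does not use mrjob at all; the helper names (`map_line`, `sum_by_key`, `run_job`) and the sample input lines are made up for illustration, and the "shuffle" is just a list whose length stands in for the data sent to the reducers.

```python
from collections import defaultdict


def map_line(line):
    """Emit (command, 1) for the last token of the line, like the question's mapper."""
    yield line.split()[-1], 1


def sum_by_key(pairs):
    """Group (key, count) pairs by key and sum the counts.

    Addition is commutative and associative, so this one function can act
    as both the combiner and the reducer.
    """
    totals = defaultdict(int)
    for key, count in pairs:
        totals[key] += count
    return list(totals.items())


def run_job(splits, use_combiner):
    """Simulate one map/combine/reduce pass; return (counts, pairs_shuffled)."""
    shuffled = []
    for split in splits:
        mapped = [pair for line in split for pair in map_line(line)]
        # The combiner pre-aggregates each mapper's output before the shuffle.
        shuffled.extend(sum_by_key(mapped) if use_combiner else mapped)
    return dict(sum_by_key(shuffled)), len(shuffled)


# Hypothetical input: two splits of lines that end in a command name.
splits = [
    ["alice sudo ls", "bob sudo ls", "alice sudo cat"],
    ["bob sudo ls", "carol sudo cat"],
]

plain_counts, plain_shuffled = run_job(splits, use_combiner=False)
combined_counts, combined_shuffled = run_job(splits, use_combiner=True)

print(plain_counts, plain_shuffled)
print(combined_counts, combined_shuffled)
```

Both runs produce the same totals, but the run with the combiner passes fewer pairs through the simulated shuffle. On real input with many repeated commands per mapper the reduction would be correspondingly larger.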

answered Aug 19, 2015 at 9:31

Ashraf Abdul

\$\endgroup\$
