\$\begingroup\$

I'm trying to get the most out of this code so that I understand what to look for in the future. The code below works fine; I just want to make it more efficient.

Any suggestions?

from mrjob.job import MRJob
import operator

# append result from each reducer
output_words = []


class MRSudo(MRJob):

    def init_mapper(self):
        # move list of tuples across mapper
        self.words = []

    def mapper(self, _, line):
        # the command is the last whitespace-separated token on the line
        command = line.split()[-1]
        self.words.append((command, 1))

    def final_mapper(self):
        for word_pair in self.words:
            yield word_pair

    def reducer(self, command, count):
        # append tuples to the list
        output_words.append((command, sum(count)))

    def final_reducer(self):
        # Sort tuples in the list by occurrence
        sorted_words = sorted(output_words, key=operator.itemgetter(1), reverse=True)
        for result in sorted_words:
            yield result

    def steps(self):
        return [self.mr(mapper_init=self.init_mapper,
                        mapper=self.mapper,
                        mapper_final=self.final_mapper,
                        reducer=self.reducer,
                        reducer_final=self.final_reducer)]


if __name__ == '__main__':
    MRSudo.run()
asked Apr 5, 2013 at 20:40
\$\endgroup\$

1 Answer

\$\begingroup\$

Since the reduce function in this case (a sum) is commutative and associative, you can use a combiner to pre-aggregate values on the mapper side before they are sent to the reducer.

def combiner_count_words(self, word, counts):
    # sum the words we've seen so far
    yield (word, sum(counts))

def steps(self):
    return [self.mr(mapper_init=self.init_mapper,
                    mapper=self.mapper,
                    mapper_final=self.final_mapper,
                    combiner=self.combiner_count_words,
                    reducer=self.reducer,
                    reducer_final=self.final_reducer)]
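
To see why a combiner is safe here, the following is a minimal plain-Python sketch that does not use mrjob at all. It simulates two mappers, a combiner, and a reducer on made-up input, and checks that summing partial sums gives the same totals as summing the raw pairs. The helper names (`map_commands`, `combine`, `reduce_counts`) and the sample lines are assumptions for illustration only, not part of the original job.

```python
from collections import Counter
from operator import itemgetter


def map_commands(lines):
    """Emit (command, 1) for the last whitespace-separated token of each line."""
    for line in lines:
        tokens = line.split()
        if tokens:  # skip blank lines
            yield tokens[-1], 1


def combine(pairs):
    """Pre-aggregate counts locally, as a combiner would on one mapper's output."""
    partial = Counter()
    for command, count in pairs:
        partial[command] += count
    return list(partial.items())


def reduce_counts(pairs):
    """Sum the counts per command and sort by count, highest first."""
    totals = Counter()
    for command, count in pairs:
        totals[command] += count
    return sorted(totals.items(), key=itemgetter(1), reverse=True)


# Hypothetical input, split into two chunks as if handled by two mappers.
chunk_a = ["user1 sudo ls", "user2 sudo vim", "user1 sudo ls"]
chunk_b = ["user3 sudo ls", "user2 sudo vim"]

# Without a combiner: every (command, 1) pair reaches the reducer.
raw_pairs = list(map_commands(chunk_a)) + list(map_commands(chunk_b))
without = reduce_counts(raw_pairs)

# With a combiner: each chunk is summed locally first, so fewer pairs are shuffled.
combined = combine(map_commands(chunk_a)) + combine(map_commands(chunk_b))
with_combiner = reduce_counts(combined)

print(len(raw_pairs), len(combined))
print(with_combiner)
```

In this toy run the reducer receives four pairs instead of five and produces the same totals. On real input with many repeated commands per mapper the reduction in shuffled data would typically be larger, though how much depends on the data.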
answered Aug 19, 2015 at 9:31
\$\endgroup\$
