Merge sort and MapReduce both basically have two parts:
Part 1: Divide your data into many smaller parts, each small enough to work on individually.
Part 2: Merge the results from all the small parts into one final result.
The only difference is that merge sort is done on one computer while MapReduce is done on a distributed system. Thoughts?
2 Answers
They're similar from that perspective, but the big difference is that mergesort is a sorting algorithm, while MapReduce does arbitrary processing work on the data.
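To make the "arbitrary processing" point concrete, here is a minimal single-process sketch of the map/shuffle/reduce pattern doing a word count, which has nothing to do with sorting. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for illustration and are not any real framework's API; a real MapReduce system would run the map and reduce steps on different machines.

```python
from collections import defaultdict


def map_phase(chunk):
    # Map: turn one chunk of text into (key, value) pairs.
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    # Shuffle: group all values that share the same key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    # Reduce: collapse each key's values into a single result.
    return {key: sum(values) for key, values in groups.items()}


def word_count(chunks):
    # In a real cluster each chunk would be mapped on a separate worker;
    # here everything runs in one process (an assumption of this sketch).
    mapped = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(mapped))


if __name__ == "__main__":
    print(word_count(["the quick brown fox", "the lazy dog", "The end"]))
```

Swap out `map_phase` and `reduce_phase` and the same skeleton computes something else entirely (sums, maxima, inverted indexes), which is where it differs from mergesort, whose merge step is fixed.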
They both belong to a larger class of algorithms known as divide and conquer. The idea is that you break a large problem down into a bunch of smaller pieces and farm them out to be worked on separately and then combine the results. The above article says:
This divide and conquer technique is the basis of efficient algorithms for all kinds of problems, such as sorting (e.g., quicksort, merge sort), multiplying large numbers (e.g. the Karatsuba algorithm), finding the closest pair of points, syntactic analysis (e.g., top-down parsers), and computing the discrete Fourier transform (FFTs).
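For comparison, here is a rough sketch of merge sort written to show the divide and conquer shape described above: split the input, solve each half separately, then combine. It is a plain textbook-style version for illustration, not a tuned implementation.

```python
def merge(left, right):
    # Combine step: merge two already-sorted lists into one sorted list.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of these still has elements left over.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(items):
    # Base case: a list of 0 or 1 elements is already sorted.
    if len(items) <= 1:
        return list(items)
    # Divide step: split in half and sort each half independently.
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))


if __name__ == "__main__":
    print(merge_sort([5, 2, 4, 7, 1, 3, 2, 6]))
```

The two recursive calls are independent of each other, which is why the same structure can in principle be farmed out to separate workers.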