Clojure MapReduce Reducer

Asked 13 years, 10 months ago

This program is the reducer of a Hadoop MapReduce job. It reads tab-delimited data from stdin, for example:

foo 1
foo 1
bar 1

and outputs

foo 2
bar 1

Any suggestions for improvements?

(use '[clojure.string :only [split]])
(def reducer (atom {}))
(defn update-map [map key]
  (merge-with + map {key 1}))
(doseq [line (line-seq (java.io.BufferedReader. *in*))]
  (let [k (first (split line #"\t"))]
    (swap! reducer update-map k)))
(doseq [kv @reducer]
  (println (format "%s\t%s" (first kv) (second kv))))

clojure

edited Feb 23, 2012 at 8:52

asked Feb 21, 2012 at 10:55

MattyW

3 Answers

Probably a bit too late to help the OP, but in case anyone else stumbles upon this question, here's a succinct way of doing it using the frequencies function:

(doseq [[word freq] (frequencies
                      (map
                        #(re-find #"^[^\t]+" %) ;; just get the first non-tab characters
                        (line-seq (java.io.BufferedReader. *in*))))]
  (println (str word "\t" freq)))
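To try the counting step without piping anything into stdin, the same idea can be sketched as a pure function over a sequence of lines. The helper name count-keys is made up for this illustration and is not part of the answer above; it also uses split rather than re-find to pull out the key.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper (name chosen for this sketch): counts how often
;; the first tab-separated field occurs across the given lines.
(defn count-keys [lines]
  (frequencies (map #(first (str/split % #"\t")) lines)))

;; Sample data taken from the question.
(count-keys ["foo\t1" "foo\t1" "bar\t1"])
;; => {"foo" 2, "bar" 1}
```

Keeping the counting separate from the I/O means the stdin-reading doseq only has to print what this function returns.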

answered Aug 18, 2012 at 17:11

David Sheldrick

Why don't you use reduce instead of the first doseq? Something along these lines (untested, entered directly here):

;; Relies on split and update-map as defined in the question.
(def response
  (reduce (fn [map line]
            (let [k (first (split line #"\t"))]
              (update-map map k)))
          {}
          (line-seq (java.io.BufferedReader. *in*))))
(doseq [kv response]
  (println (format "%s\t%s" (first kv) (second kv))))

Then you won't need the atom either.
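For anyone who wants to check the reduce approach at the REPL, here is a minimal sketch that works on an in-memory sequence of lines instead of stdin. The names count-line and reduce-lines are invented for this example, and it assumes Clojure 1.7 or later, since it uses update (which did not exist when this was first posted) in place of the question's update-map.

```clojure
(require '[clojure.string :as str])

;; Hypothetical reducing function: bumps the count for the line's key.
;; (fnil inc 0) treats a missing key as 0 before incrementing.
(defn count-line [counts line]
  (let [k (first (str/split line #"\t"))]
    (update counts k (fnil inc 0))))

;; Folds all lines into a single map, starting from an empty one.
(defn reduce-lines [lines]
  (reduce count-line {} lines))

;; Sample data taken from the question.
(reduce-lines ["foo\t1" "foo\t1" "bar\t1"])
;; => {"foo" 2, "bar" 1}
```

Because the accumulator is threaded through reduce, no mutable state is involved at any point.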

answered Apr 26, 2012 at 7:28

ivant

Can the input contain numbers other than 1? For example:

foo 1
foo 3
bar 10

If so, then:

(use '[clojure.string :only [split]])
(def parsed-input
  (for [line (line-seq (java.io.BufferedReader. *in*))
        :let [[k v] (split line #"\t")]]
    {k (Double/parseDouble v)}))
(def table (apply (partial merge-with + {}) parsed-input))
(doseq [[k v] table]
  (println (str k "\t" v)))

Outputs:

bar 10.0
foo 4.0

If it's just 1s, frequencies will do, as suggested in the other answer.
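If the values are always whole-number counts (an assumption, the question does not say), parsing them with Long/parseLong instead of Double/parseDouble keeps the totals as integers, so the result is foo 4 rather than foo 4.0. A rough sketch over in-memory lines, with sum-values being a name made up for this example:

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: sums the integer value in the second
;; tab-separated field, grouped by the key in the first field.
(defn sum-values [lines]
  (reduce (fn [totals line]
            (let [[k v] (str/split line #"\t")]
              (merge-with + totals {k (Long/parseLong v)})))
          {}
          lines))

;; Sample data taken from the example input above.
(sum-values ["foo\t1" "foo\t3" "bar\t10"])
;; => {"foo" 4, "bar" 10}
```

Note that Long/parseLong throws a NumberFormatException on anything that is not an integer, so this only fits if the mapper never emits fractional values.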

answered Jun 10, 2013 at 12:09

Art
