
This program forms the reducer of a Hadoop MapReduce job. It reads tab-delimited data from stdin, such as

foo 1
foo 1
bar 1

and outputs

foo 2
bar 1

Any suggestions for improvements?

(use '[clojure.string :only [split]])
(def reducer (atom {}))
(defn update-map [map key]
  (merge-with + map {key 1}))
(doseq [line (line-seq (java.io.BufferedReader. *in*))]
  (let [k (first (split line #"\t"))]
    (swap! reducer update-map k)))
(doseq [kv @reducer]
  (println (format "%s\t%s" (first kv) (second kv))))
asked Feb 21, 2012 at 10:55

3 Answers


Probably a bit too late to help the OP, but in case anyone else stumbles upon this question, here's a nice succinct way of doing it using the frequencies function:

(doseq [[word freq] (frequencies
                      (map
                        #(re-find #"^[^\t]+" %) ;; just get the first non-tab characters
                        (line-seq (java.io.BufferedReader. *in*))))]
  (println (str word "\t" freq)))
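A possible refinement, sketched as a pure function so it can be tested without stdin (the name count-keys is hypothetical): re-find returns nil for a line with no leading non-tab characters, such as a blank line, so map passes those nils on and frequencies counts nil as a key. Using keep instead of map drops them. This assumes blank lines should simply be ignored.

```clojure
(defn count-keys
  "Counts how often each key (the text before the first tab) occurs.
  Lines with no leading non-tab characters (e.g. blank lines) make
  re-find return nil, and keep discards those nils."
  [lines]
  (frequencies (keep #(re-find #"^[^\t]+" %) lines)))

;; Sample tab-delimited lines, including a trailing blank line.
(count-keys ["foo\t1" "foo\t1" "bar\t1" ""])
;; => {"foo" 2, "bar" 1}
```

To use it as the reducer, pass (line-seq (java.io.BufferedReader. *in*)) as lines and print each entry as in the answer.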
answered Aug 18, 2012 at 17:11

Why don't you use reduce instead of the first doseq? Something along these lines (untested, entered directly here; it reuses split and update-map from the question):

(def response
  (reduce (fn [m line]
            (let [k (first (split line #"\t"))]
              (update-map m k)))
          {} (line-seq (java.io.BufferedReader. *in*))))
(doseq [kv response]
  (println (format "%s\t%s" (first kv) (second kv))))

Then you won't need the atom either.
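On Clojure 1.7 or later, update combined with (fnil inc 0) could replace the update-map helper as well, so the reduce only needs clojure.string/split. A sketch under that assumption; count-lines is a hypothetical name, and it takes any sequence of lines so it can be tested without stdin:

```clojure
(require '[clojure.string :as str])

(defn count-lines
  "Reduces a sequence of tab-delimited lines into a map of key -> count.
  (fnil inc 0) treats a key that is not yet in the map as 0."
  [lines]
  (reduce (fn [counts line]
            (update counts (first (str/split line #"\t")) (fnil inc 0)))
          {}
          lines))

(count-lines ["foo\t1" "foo\t1" "bar\t1"])
;; => {"foo" 2, "bar" 1}

;; Wired to stdin, as in the question:
;; (doseq [[k n] (count-lines (line-seq (java.io.BufferedReader. *in*)))]
;;   (println (str k "\t" n)))
```

fnil plays the same role here that merge-with played in update-map: it supplies a starting value the first time a key is seen.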

answered Apr 26, 2012 at 7:28

Can the mapper output contain numbers other than 1? For example:

foo 1
foo 3
bar 10

If so, then:

(use '[clojure.string :only [split]])
(def parsed-input
  (for [line (line-seq (java.io.BufferedReader. *in*))
        :let [[k v] (split line #"\t")]]
    {k (Double/parseDouble v)}))
(def table (apply (partial merge-with + {}) parsed-input))
(doseq [[k v] table]
  (println (str k "\t" v)))

Outputs:

bar 10.0
foo 4.0

If the values are always 1, frequencies will do, as suggested in another answer.
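If the values are known to be integers, parsing them with Long/parseLong would print foo 4 rather than foo 4.0, and folding the lines with a single reduce avoids building a one-entry map for every line. A sketch, assuming each line is exactly a key and an integer separated by a tab; sum-values is a hypothetical name:

```clojure
(require '[clojure.string :as str])

(defn sum-values
  "Sums the integer values for each key in a sequence of
  tab-delimited key/value lines."
  [lines]
  (reduce (fn [totals line]
            (let [[k v] (str/split line #"\t")]
              ;; (fnil + 0) substitutes 0 when k is not yet in totals
              (update totals k (fnil + 0) (Long/parseLong v))))
          {}
          lines))

(sum-values ["foo\t1" "foo\t3" "bar\t10"])
;; => {"foo" 4, "bar" 10}
```

Like Double/parseDouble, Long/parseLong throws a NumberFormatException on a malformed value, so this sketch assumes clean input.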

answered Jun 10, 2013 at 12:09