I wanted to do the following:
- Count the frequencies of words in a text (words of five or more letters)
- Invert the map of words to frequencies, but group together words that have the same frequency in the inversion.
- Sort the inverted map by key in descending order and take the top 25.
Here is the code I came up with. Did I re-invent the wheel with map-invert-preserve-dups? Is there a more concise way to do anything I did? Am I doing anything unnecessarily (e.g. `(~k))?
(defn map-invert-preserve-dups
  [m]
  (reduce
    (fn [m [k v]]
      (if (contains? m v)
        (assoc m v (cons k (get m v)))
        (assoc m v `(~k))))
    {}
    m))

(->> "http://www.weeklyscript.com/Pulp%20Fiction.txt"
     (slurp)
     (re-seq #"\w{5,}")
     (frequencies)
     (map-invert-preserve-dups)
     (sort)
     (reverse)
     (take 25))
1 Answer
Well, the most obvious thing to fix is indeed map-invert-preserving-dups. The whole thing could be written more simply as:
(defn map-invert-preserving-dups [m]
  (apply merge-with into
         (for [[k v] m]
           {v [k]})))
The for expression yields a sequence of maps like [{a [1]} {b [2]} {a [5]}]. apply then calls merge-with into on all of those maps. If you look up the definition of merge-with, you can see that this basically means: "Merge all of these maps together, and if the same key exists twice, with values x and y, then make its value (into x y)".
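To make the two steps concrete, here is a small sketch that runs them separately on a tiny hand-written frequency map. The names sample-freqs, singletons and inverted, and the data itself, are made up for illustration and are not taken from the script.

```clojure
;; Hypothetical sample data: word -> frequency (illustrative only).
(def sample-freqs {"marsellus" 3, "vincent" 7, "jules" 7})

;; Step 1: the `for` expression produces one single-entry map per word,
;; keyed by frequency, e.g. {3 ["marsellus"]}.
(def singletons
  (for [[k v] sample-freqs]
    {v [k]}))

;; Step 2: merge-with calls (into x y) whenever a key shows up again,
;; so words that share a frequency end up in the same vector.
(def inverted
  (apply merge-with into singletons))

;; Both words with frequency 7 are now grouped under the key 7.
;; Their order inside the vector follows the input map's iteration order.
(println (get inverted 7))

;; Edge case: an empty input yields no maps to merge, and
;; (merge-with into) with no maps returns nil rather than {}.
(println (apply merge-with into (for [[k v] {}] {v [k]})))
```

One difference from the reduce version is worth keeping in mind: for an empty input map this returns nil instead of {}. The rest of the pipeline (sort, reverse, take) treats nil as an empty sequence, so it only matters if the result is used as a map elsewhere.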