There have been already many similar questions to this one (e.g. this and this, just to point two), but I did not find any for clojure.
The goal is to compress a string in a RLE-like way: i.e. each character "run" is replaced by the character and the count of the run. Some examples:
aaaabbccc --> a4b2c3
b --> b1
aaaaaaaaaaaa --> a9a3
Some remarks:
- For now, I do not care about the possible optimisation that runs with the length of 1 can omit the count. (E.g. the output could be
b
instead ofb1
.) - If there are more than
n > 9
characters in a run, then it is split up into two or more runs, like9 + 9 + 9 + ... + k = n
, wherek <= 9
. - For now, I do not care if the "compressed" string will be longer than the original one (this is the case e.g. for an input containing only single characters, like:
abcde --> a1b1c1d1e1
).
That said, please see my implementation below:
Source
(defn countrepetitions [s]
(if (empty? s) 0
(loop [acts (vec s) ch (first acts) cnt 1]
(if (= ch (first(rest acts)))
(recur (rest acts) ch (inc cnt))
cnt))))
(defn mycompress [s]
(loop [acts s encoded "" cnt1 (countrepetitions acts)]
(if (= 0 cnt1)
encoded
(let [cnt2 (if (< cnt1 10) cnt1 9)]
(recur (subs acts cnt2) (str encoded (first acts) cnt2) (countrepetitions (subs acts cnt2)))))))
Tests
(deftest a-test
(testing "compressor"
(is (= "" (mycompress "")))
(is (= "a2" (mycompress "aa")))
(is (= "a1b8a1c7a3" (mycompress "abbbbbbbbacccccccaaa")))
(is (= "b9b3" (mycompress "bbbbbbbbbbbb")))))
Questions
In particular, I would be interested in the following, but any other remark/suggestion is also welcome:
- Is there a more efficient way (either in terms of readability or speed) to implement
countrepetitions
, or maybe leave it out completely? - Are there any other tests worth adding? Or can you find a "counter-example" (i.e. a test case, for which the implementation does not work correctly)?
1 Answer 1
Whenever you're splitting a list up into a number of chunks, it's worth looking into the partition
function (and related functions).
A string is just a list of characters, and you just want to split it when the character changes, so (partition-by identity characters)
should be much simpler than what you have above. You can pair that with another partition to make sure you have no chunks of> 9 characters:
(defn rle-chunks [string]
(->> string
(partition-by identity)
(mapcat #(partition-all 9 %))))
Actually performing the compression is then just a case of calling first
& count
on each of these chunks, then converting back to a string:
(defn rle [input]
(->> input
rle-chunks
(map (juxt first count))
(apply concat)
(apply str)))
-
\$\begingroup\$ Thank you, I think this much more readable than the solution I came up with based on the suggestion of @Shlomi: it is definitely more readable. Also, I was not aware of
mapcat
and did not remember the threading macro->>
. Just one question: I think(apply concat)
can be replaced with(flatten)
in this case, to make it even shorter. What do you think? \$\endgroup\$Attilio– Attilio2016年01月19日 19:12:51 +00:00Commented Jan 19, 2016 at 19:12 -
\$\begingroup\$ @Attilio yeah, you're probably right. \$\endgroup\$obmarg– obmarg2016年01月19日 19:59:53 +00:00Commented Jan 19, 2016 at 19:59
(map (juxt first count) (partition-by identity "abbbbbbbbacccccccaaa"))
\$\endgroup\$(apply str (flatten (map (juxt first count) (apply concat (map #(partition-all 9 %) (partition-by identity s)))))))
. Definitely shorter then my original attempt :) \$\endgroup\$