RLE-like string compression in clojure

Question 1

There have been already many similar questions to this one (e.g. this and this, just to point two), but I did not find any for clojure.

The goal is to compress a string in a RLE-like way: i.e. each character "run" is replaced by the character and the count of the run. Some examples:

aaaabbccc --> a4b2c3
b --> b1
aaaaaaaaaaaa --> a9a3

Some remarks:

For now, I do not care about the possible optimisation that runs with the length of 1 can omit the count. (E.g. the output could be b instead of b1.)
If there are more than n > 9 characters in a run, then it is split up into two or more runs, like 9 + 9 + 9 + ... + k = n, where k <= 9.
For now, I do not care if the "compressed" string will be longer than the original one (this is the case e.g. for an input containing only single characters, like: abcde --> a1b1c1d1e1).

That said, please see my implementation below:

Source

(defn countrepetitions [s]
 (if (empty? s) 0
 (loop [acts (vec s) ch (first acts) cnt 1]
 (if (= ch (first(rest acts)))
 (recur (rest acts) ch (inc cnt))
 cnt))))
(defn mycompress [s]
 (loop [acts s encoded "" cnt1 (countrepetitions acts)]
 (if (= 0 cnt1)
 encoded
 (let [cnt2 (if (< cnt1 10) cnt1 9)]
 (recur (subs acts cnt2) (str encoded (first acts) cnt2) (countrepetitions (subs acts cnt2)))))))

Tests

(deftest a-test
 (testing "compressor"
 (is (= "" (mycompress "")))
 (is (= "a2" (mycompress "aa")))
 (is (= "a1b8a1c7a3" (mycompress "abbbbbbbbacccccccaaa")))
 (is (= "b9b3" (mycompress "bbbbbbbbbbbb")))))

Questions

In particular, I would be interested in the following, but any other remark/suggestion is also welcome:

Is there a more efficient way (either in terms of readability or speed) to implement countrepetitions, or maybe leave it out completely?
Are there any other tests worth adding? Or can you find a "counter-example" (i.e. a test case, for which the implementation does not work correctly)?

Question 2

You could use partition-by to count consecutive occurrences, instead of the implicit recursion, for example: (map (juxt first count) (partition-by identity "abbbbbbbbacccccccaaa"))

Question 3

@Shlomi thank you, after some experimentation I came up with the following based on your suggestion: (apply str (flatten (map (juxt first count) (apply concat (map #(partition-all 9 %) (partition-by identity s))))))). Definitely shorter then my original attempt :)

Question 4

Whenever you're splitting a list up into a number of chunks, it's worth looking into the partition function (and related functions).

A string is just a list of characters, and you just want to split it when the character changes, so (partition-by identity characters) should be much simpler than what you have above. You can pair that with another partition to make sure you have no chunks of> 9 characters:

(defn rle-chunks [string]
 (->> string
 (partition-by identity)
 (mapcat #(partition-all 9 %))))

Actually performing the compression is then just a case of calling first & count on each of these chunks, then converting back to a string:

(defn rle [input]
 (->> input
 rle-chunks
 (map (juxt first count))
 (apply concat)
 (apply str)))

Question 5

Thank you, I think this much more readable than the solution I came up with based on the suggestion of @Shlomi: it is definitely more readable. Also, I was not aware of mapcat and did not remember the threading macro ->>. Just one question: I think (apply concat) can be replaced with (flatten) in this case, to make it even shorter. What do you think?

Question 6

@Attilio yeah, you're probably right.

obmarg obmarg 6983 silver badges11 bronze badges · Accepted Answer · 2016-01-17 20:48:03Z

Whenever you're splitting a list up into a number of chunks, it's worth looking into the partition function (and related functions).

A string is just a list of characters, and you just want to split it when the character changes, so (partition-by identity characters) should be much simpler than what you have above. You can pair that with another partition to make sure you have no chunks of> 9 characters:

(defn rle-chunks [string]
 (->> string
 (partition-by identity)
 (mapcat #(partition-all 9 %))))

Actually performing the compression is then just a case of calling first & count on each of these chunks, then converting back to a string:

(defn rle [input]
 (->> input
 rle-chunks
 (map (juxt first count))
 (apply concat)
 (apply str)))

Thank you, I think this much more readable than the solution I came up with based on the suggestion of @Shlomi: it is definitely more readable. Also, I was not aware of mapcat and did not remember the threading macro ->>. Just one question: I think (apply concat) can be replaced with (flatten) in this case, to make it even shorter. What do you think?

Stack Exchange Network

RLE-like string compression in clojure

Source

Tests

Questions

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

RLE-like string compression in clojure

Source

Tests

Questions

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions