HackerRank, Haskell simple "compression" algorithm

Question 1

The question asked is very straight-forward and is simple enough to solve. What I am looking for is that hopefully I can get some understanding for using the constructs and built-ins of the Haskell language itself.

That said, the question is as follows:

Joseph and Jane are making a contest for apes. During the process, they have to communicate frequently with each other. Since they are not completely human, they cannot speak properly. They have to transfer messages using postcards of small sizes.

To save space on the small postcards, they devise a string compression algorithm:

If a character, \$ch\$, occurs \$n(> 1)\$ times in a row, then it will be represented by \$\{ch\}\{n\}\$. where \$\{x\}\$ is the value of \$x\$. For example, if the substring is a sequence of \4ドル\$ 'a' ("aaaa"). it will be represented as "a4".

If a character, \$ch\$, occurs exactly one time in a row, then it will be simply represented as \$\{ch\}\$. For example, if the substring is "a". then it will be represented as "a".

Help Joseph to compress a message. msg.

Input

The only line of input contains a string. msg.

Output

Print the string msg as a compressed message.

Create a messaging system where each word (token) will be compressed based on any consecutive characters.

Single sequenced characters remain the same but multiple identical consecutive characters will be replaced by the character followed by an integer representing the number of times it repeats consecutively.

Example input and output:

Sample Input #00:
> abcaaabbb
Sample Output #00:
> abca3b3
Sample Input #01:
> abcd
Sample Output #01:
> abcd

My code:

-- String representation as a simple Compression Algorithm
--
module Main where
import Text.Printf
compression :: String -> String -> String -> Int -> String
compression input output prevChar count
 | input == "" && output == "" = output
 | prevChar == "" = compression (tail input) output ([head input]) 1
 | input == "" && count > 1 = output ++ prevChar ++ (show count)
 | input == "" && count < 2 = output ++ prevChar
 | prevChar == ([head input]) = compression (tail input) output prevChar (count+1)
 | prevChar /= ([head input]) && count < 2 = compression (tail input) (output ++ prevChar) ([head input]) 1
 | prevChar /= ([head input]) && count > 1 = compression (tail input) (output ++ prevChar ++ (show count)) ([head input]) 1
main :: IO ()
main =
 do
 let s1 = "aaabaaaaccaaaaba"
 printf "Decompressed: %s and Compressed: %s" s1 (compression s1 "" "" 0)

I feel as though some of my Haskell code can be clunky and could use some more features of the language itself as I mentioned earlier.

Now this isn't a full-blown application or even a part of a module for an actual application but just an exercise to improve my working knowledge and skill for writing Haskell.

Question 2

@J_H the text of the description from the image has been added

Question 3

I agree with some of the criticism mentioned in the other review, notably:

You're trying to write a function from String to String, so you should have such a function.

In general, be more eager to break up your work into smaller named pieces.

I'd like to add another point, which is heavy use of existing functions (take, drop, takeWhile, dropWhile, splitAt, etc.) instead of iterating yourself. You can search for functions on hoogle.haskell.org by entering the desired type signature and it spits out all matching functions. In many cases, you will find an existing function that fits you needs. If not, write it yourself, but keep it as a separate helper function.

Also, use pattern matching in you function signature. If you write input == "" && output == "" in the pattern and then compression (tail input) output ([head input]) in the function body, you probably should have deconstructed the list during pattern matching. In my code below, I use input@(a : _) to extract the head and the full list (including the head), but I could also extract the tail if I wanted to (by replacing _ with a variable name). This line automatically rejects empty lists, so execution will fall through to until it finds a matching pattern.

I strongly disagree with the code provided in the other review, which is equally clunky than your original code, if not more. An idiomatic solution to your problem looks more like this:

compression :: String -> String
compression "" = ""
compression input@(a : _)
 | count > 1 = a : show count ++ compression remainder
 | otherwise = a : compression remainder
 where count = length $ takeWhile (== a) input
 remainder = dropWhile (== a) input

If you want to avoid iterating twice, you can use span, which returns a tuple and is equivalent to (takeWhile, dropWhile). The updated code is:

compression :: String -> String
compression "" = ""
compression input@(a : _)
 | count > 1 = a : show count ++ compression remainder
 | otherwise = a : compression remainder
 where count = length prefix
 (prefix, remainder) = span (== a) input

For absolute best performance, you can create a helper function that replaces dropWhile but also reports the number of dropped items. This avoids creating the prefix list and then counting its length.

compression :: String -> String
compression "" = ""
compression input@(a : _)
 | count > 1 = a : show count ++ compression remainder
 | otherwise = a : compression remainder
 where (remainder, count) = dropWhileAndCount (== a) input
dropWhileAndCount :: (a -> Bool) -> [a] -> ([a], Int)
dropWhileAndCount f = go 0
 where go n (x : xs) | f x = go (n + 1) xs
 go n xs = (xs, n)

Question 4

Agreed that this is a much nicer solution to the original coding challenge.

Question 5

take(drop)While was my exact thought process. I was on Hoogle search for them both right before I had originally posted this. I haven't leveraged where nearly enough and this shows it. I'll start to include this instead of creating more recursive cases. I think this is pretty self-evident with my code provided. Thanks again!

Question 6

The first two sections apply to every problem and every programming language.

Use complementary conditions

I recommend against the > 1 and < 2 conditions. When > 1 is one case, the other should be <= 1. How easy is it to spot the mistake in

f x | x < 1 = "foo"
 | x > 2 = "bar"

You would have never written the equivalent

f x | x < 1 = "foo"
 | x >= 3 = "bar"

because it is so obviously wrong. Of course the haskell way of writing it is

f x | x > 1 = "bar"
 | otherwise = "foo"

but I like my code to closely mirror the problem statement, see next section.

Follow the problem statement as closely as possible

This is very pedantic but it helps in larger problems. Your order of conditions is swapped. The question is

Single sequenced characters remain the same

multiple identical consecutive characters will...

but you handle the multiple case first. Ideally, at some place on code, you can write the code the sequence as the problem statement.

encode count char
 | count == 1 = char -- Single sequenced characters remain the same but 
 | count > 1 = -- multiple identical consecutive characters will be replaced
 char -- by the character 
 ++ -- followed by an
 show count -- integer representing the number of times it repeats consecutively

The comments here are clearly overdoing it, but in the real world you know which part is the most important to the customer or where the requirements are most likely to be changed (slightly).

use standard functions

Learn the standard functions. For the compression algorithm, group jumps to my mind:

group :: Eq a => [a] -> [[a]]
> group "abcaaabbb" 
["a","b","c","aaa","bbb"]
compression = concatMap encode' . group where
 encode' x = encode (length x) x
 encode -- see above

Question 7

I enjoy reviewing from past times. I re-read your comment and I absolutely agree, and strongly for that matter, with your post.

Question 8

There are a lot of tools Haskell provides for doing things like this elegantly; I'll try to limit my feedback to boring solutions that will be helpful.

You're trying to write a function from String to String, so you should have such a function. Rename what you've got now as compression', and provide a wrapper.
By convention, try to have "context like" arguments first and "data like" arguments last; this will assist with partial-application of your functions.
Having prevChar :: String isn't good; what if it happened to be "asdf"? Maybe Char would be better.
In general, be more eager to break up your work into smaller named pieces.
The quantity of arguments you're handling, and the quantity of cases you're handling in parallel, should feel like a red flag: you need a better abstraction. You've probably already noticed that Functor won't work for this. With a little experience, you'll start to notice that your current recursion scheme basically is the State monad. I'm using the mtl library for this but we don't need to use the Transformer version; just State will serve.

By the time I got this working, it didn't have much in common with what you wrote, sorry. Hopefully it contains the right number of new things for you to learn!

module Main where
import Control.Monad (forM_, when)
import Control.Monad.State (execState, get, gets, modify, put, State)
import Text.Printf
data CompressorState = CompressorState { output :: String
 , prevChar :: Maybe Char
 , count :: Int
 , input :: String
 } deriving (Show)
type Compressor = State CompressorState
compression' :: Compressor ()
compression' = do
 mChar <- getNext
 case mChar of
 Nothing -> flushCount
 Just char -> do
 previous <- getPrevious
 when (Just char /= previous) $
 do flushCount
 emit char
 setPrevious char
 increment
 compression'
 where getNext = do cs@CompressorState{input = i} <- get
 case i of [] -> return Nothing
 c:i' -> do put cs{input = i'} -- Using record update syntax
 return (Just c)
 getPrevious = gets prevChar
 increment = modify (\cs@CompressorState{count = c} -> cs{count = c + 1})
 flushCount = do c <- gets count
 when (1 < c) $ forM_ (show c) emit
 modify (\cs -> cs{count = 0})
 emit :: Char -> Compressor ()
 emit c = modify (\cs@CompressorState{output = o} -> cs{output = c : o})
 setPrevious :: Char -> Compressor ()
 setPrevious c = modify (\cs -> cs{prevChar = Just c})
compression :: String -> String
compression i = reverse . output $
 execState compression' CompressorState{ output = ""
 , prevChar = Nothing
 , count = 0
 , input = i
 }
examples :: [(String, String)]
examples = [ ("abcaaabbb", "abca3b3")
 , ("abcd", "abcd")
 , ("aaabaaaaccaaaaba", "a3ba4c2a4ba")
 ]
main :: IO ()
main =
 do
 let results = do -- This is in the List Monad!
 (input, reference) <- examples
 let output = compression input
 return (output == reference, input, output, reference)
 forM_ results print

Question 9

I have to be honest, I'm not sure I would have used this exact form but I think that is only natural when it comes to writing in a functional form. That said, I was able to learn a few things just by having gone through what you did here and breaking each part down.

Question 10

I've seen this Monad scheme many places now where case catches Maybe and Nothing/Just

Question 11

All when, put, & get are new to me. I'll stop adding every minor comment but I want to say truly thank you.

Question 12

"I've seen this Monad scheme many places now where case catches Maybe" Just note that case expressions are just expressions, nothing about them is tied to monads. (One extra thing I considered using in my response was the LambdaCase extension, which often makes cases more fun to use.)

Question 13

Of course, I've just seen the do notation (where) is used heavily with case where Monads are being used. I'm not at all familiar enough to speak much more on it or LambdaCase. I'm still picking apart several aspects of your example. It's short but packed with nuance.

Rainer P. 3,11012 silver badges18 bronze badges · Accepted Answer · 2024-08-27 10:44:51Z

I agree with some of the criticism mentioned in the other review, notably:

You're trying to write a function from String to String, so you should have such a function.

In general, be more eager to break up your work into smaller named pieces.

I'd like to add another point, which is heavy use of existing functions (take, drop, takeWhile, dropWhile, splitAt, etc.) instead of iterating yourself. You can search for functions on hoogle.haskell.org by entering the desired type signature and it spits out all matching functions. In many cases, you will find an existing function that fits you needs. If not, write it yourself, but keep it as a separate helper function.

Also, use pattern matching in you function signature. If you write input == "" && output == "" in the pattern and then compression (tail input) output ([head input]) in the function body, you probably should have deconstructed the list during pattern matching. In my code below, I use input@(a : _) to extract the head and the full list (including the head), but I could also extract the tail if I wanted to (by replacing _ with a variable name). This line automatically rejects empty lists, so execution will fall through to until it finds a matching pattern.

I strongly disagree with the code provided in the other review, which is equally clunky than your original code, if not more. An idiomatic solution to your problem looks more like this:

compression :: String -> String
compression "" = ""
compression input@(a : _)
 | count > 1 = a : show count ++ compression remainder
 | otherwise = a : compression remainder
 where count = length $ takeWhile (== a) input
 remainder = dropWhile (== a) input

If you want to avoid iterating twice, you can use span, which returns a tuple and is equivalent to (takeWhile, dropWhile). The updated code is:

compression :: String -> String
compression "" = ""
compression input@(a : _)
 | count > 1 = a : show count ++ compression remainder
 | otherwise = a : compression remainder
 where count = length prefix
 (prefix, remainder) = span (== a) input

For absolute best performance, you can create a helper function that replaces dropWhile but also reports the number of dropped items. This avoids creating the prefix list and then counting its length.

compression :: String -> String
compression "" = ""
compression input@(a : _)
 | count > 1 = a : show count ++ compression remainder
 | otherwise = a : compression remainder
 where (remainder, count) = dropWhileAndCount (== a) input
dropWhileAndCount :: (a -> Bool) -> [a] -> ([a], Int)
dropWhileAndCount f = go 0
 where go n (x : xs) | f x = go (n + 1) xs
 go n xs = (xs, n)

Agreed that this is a much nicer solution to the original coding challenge.
take(drop)While was my exact thought process. I was on Hoogle search for them both right before I had originally posted this. I haven't leveraged where nearly enough and this shows it. I'll start to include this instead of creating more recursive cases. I think this is pretty self-evident with my code provided. Thanks again!

Stack Exchange Network

HackerRank, Haskell simple "compression" algorithm

Input

Output

3 Answers 3

Use complementary conditions

Follow the problem statement as closely as possible

use standard functions

You must log in to answer this question.

Hot Network Questions

HackerRank, Haskell simple "compression" algorithm

Input

Output

3 Answers 3

Use complementary conditions

Follow the problem statement as closely as possible

use standard functions

You must log in to answer this question.

Related

Hot Network Questions