The question asked is very straight-forward and is simple enough to solve. What I am looking for is that hopefully I can get some understanding for using the constructs and built-ins of the Haskell language itself.
That said, the question is as follows:
Joseph and Jane are making a contest for apes. During the process, they have to communicate frequently with each other. Since they are not completely human, they cannot speak properly. They have to transfer messages using postcards of small sizes.
To save space on the small postcards, they devise a string compression algorithm:
If a character, \$ch\$, occurs \$n(> 1)\$ times in a row, then it will be represented by \$\{ch\}\{n\}\$. where \$\{x\}\$ is the value of \$x\$. For example, if the substring is a sequence of \4ドル\$ 'a' ("aaaa"). it will be represented as "a4".
If a character, \$ch\$, occurs exactly one time in a row, then it will be simply represented as \$\{ch\}\$. For example, if the substring is "a". then it will be represented as "a".
Help Joseph to compress a message. msg.
Input
The only line of input contains a string. msg.
Output
Print the string msg as a compressed message.
Create a messaging system where each word (token) will be compressed based on any consecutive characters.
Single sequenced characters remain the same but multiple identical consecutive characters will be replaced by the character followed by an integer representing the number of times it repeats consecutively.
Example input and output:
Sample Input #00:
> abcaaabbb
Sample Output #00:
> abca3b3
Sample Input #01:
> abcd
Sample Output #01:
> abcd
My code:
-- String representation as a simple Compression Algorithm
--
module Main where
import Text.Printf
compression :: String -> String -> String -> Int -> String
compression input output prevChar count
| input == "" && output == "" = output
| prevChar == "" = compression (tail input) output ([head input]) 1
| input == "" && count > 1 = output ++ prevChar ++ (show count)
| input == "" && count < 2 = output ++ prevChar
| prevChar == ([head input]) = compression (tail input) output prevChar (count+1)
| prevChar /= ([head input]) && count < 2 = compression (tail input) (output ++ prevChar) ([head input]) 1
| prevChar /= ([head input]) && count > 1 = compression (tail input) (output ++ prevChar ++ (show count)) ([head input]) 1
main :: IO ()
main =
do
let s1 = "aaabaaaaccaaaaba"
printf "Decompressed: %s and Compressed: %s" s1 (compression s1 "" "" 0)
I feel as though some of my Haskell code can be clunky and could use some more features of the language itself as I mentioned earlier.
Now this isn't a full-blown application or even a part of a module for an actual application but just an exercise to improve my working knowledge and skill for writing Haskell.
-
1\$\begingroup\$ @J_H the text of the description from the image has been added \$\endgroup\$Sᴀᴍ Onᴇᴌᴀ– Sᴀᴍ Onᴇᴌᴀ ♦2024年09月13日 15:11:54 +00:00Commented Sep 13, 2024 at 15:11
3 Answers 3
I agree with some of the criticism mentioned in the other review, notably:
- You're trying to write a function from String to String, so you should have such a function.
- In general, be more eager to break up your work into smaller named pieces.
I'd like to add another point, which is heavy use of existing functions (take, drop, takeWhile, dropWhile, splitAt, etc.) instead of iterating yourself. You can search for functions on hoogle.haskell.org by entering the desired type signature and it spits out all matching functions. In many cases, you will find an existing function that fits you needs. If not, write it yourself, but keep it as a separate helper function.
Also, use pattern matching in you function signature. If you write input == "" && output == ""
in the pattern and then compression (tail input) output ([head input])
in the function body, you probably should have deconstructed the list during pattern matching. In my code below, I use input@(a : _)
to extract the head and the full list (including the head), but I could also extract the tail if I wanted to (by replacing _
with a variable name). This line automatically rejects empty lists, so execution will fall through to until it finds a matching pattern.
I strongly disagree with the code provided in the other review, which is equally clunky than your original code, if not more. An idiomatic solution to your problem looks more like this:
compression :: String -> String
compression "" = ""
compression input@(a : _)
| count > 1 = a : show count ++ compression remainder
| otherwise = a : compression remainder
where count = length $ takeWhile (== a) input
remainder = dropWhile (== a) input
If you want to avoid iterating twice, you can use span
, which returns a tuple and is equivalent to (takeWhile, dropWhile)
. The updated code is:
compression :: String -> String
compression "" = ""
compression input@(a : _)
| count > 1 = a : show count ++ compression remainder
| otherwise = a : compression remainder
where count = length prefix
(prefix, remainder) = span (== a) input
For absolute best performance, you can create a helper function that replaces dropWhile
but also reports the number of dropped items. This avoids creating the prefix list and then counting its length.
compression :: String -> String
compression "" = ""
compression input@(a : _)
| count > 1 = a : show count ++ compression remainder
| otherwise = a : compression remainder
where (remainder, count) = dropWhileAndCount (== a) input
dropWhileAndCount :: (a -> Bool) -> [a] -> ([a], Int)
dropWhileAndCount f = go 0
where go n (x : xs) | f x = go (n + 1) xs
go n xs = (xs, n)
-
\$\begingroup\$ Agreed that this is a much nicer solution to the original coding challenge. \$\endgroup\$ShapeOfMatter– ShapeOfMatter2024年08月28日 16:09:33 +00:00Commented Aug 28, 2024 at 16:09
-
\$\begingroup\$
take(drop)While
was my exact thought process. I was on Hoogle search for them both right before I had originally posted this. I haven't leveragedwhere
nearly enough and this shows it. I'll start to include this instead of creating more recursive cases. I think this is pretty self-evident with my code provided. Thanks again! \$\endgroup\$tijko– tijko2024年08月31日 17:42:01 +00:00Commented Aug 31, 2024 at 17:42
The first two sections apply to every problem and every programming language.
Use complementary conditions
I recommend against the > 1
and < 2
conditions. When > 1
is one case, the other should be <= 1
.
How easy is it to spot the mistake in
f x | x < 1 = "foo"
| x > 2 = "bar"
You would have never written the equivalent
f x | x < 1 = "foo"
| x >= 3 = "bar"
because it is so obviously wrong. Of course the haskell way of writing it is
f x | x > 1 = "bar"
| otherwise = "foo"
but I like my code to closely mirror the problem statement, see next section.
Follow the problem statement as closely as possible
This is very pedantic but it helps in larger problems. Your order of conditions is swapped. The question is
- Single sequenced characters remain the same
- multiple identical consecutive characters will...
but you handle the multiple case first. Ideally, at some place on code, you can write the code the sequence as the problem statement.
encode count char
| count == 1 = char -- Single sequenced characters remain the same but
| count > 1 = -- multiple identical consecutive characters will be replaced
char -- by the character
++ -- followed by an
show count -- integer representing the number of times it repeats consecutively
The comments here are clearly overdoing it, but in the real world you know which part is the most important to the customer or where the requirements are most likely to be changed (slightly).
use standard functions
Learn the standard functions. For the compression algorithm, group
jumps to my mind:
group :: Eq a => [a] -> [[a]]
> group "abcaaabbb"
["a","b","c","aaa","bbb"]
compression = concatMap encode' . group where
encode' x = encode (length x) x
encode -- see above
-
\$\begingroup\$ I enjoy reviewing from past times. I re-read your comment and I absolutely agree, and strongly for that matter, with your post. \$\endgroup\$tijko– tijko2025年08月19日 17:29:39 +00:00Commented Aug 19 at 17:29
There are a lot of tools Haskell provides for doing things like this elegantly; I'll try to limit my feedback to boring solutions that will be helpful.
- You're trying to write a function from
String
toString
, so you should have such a function. Rename what you've got now ascompression'
, and provide a wrapper. - By convention, try to have "context like" arguments first and "data like" arguments last; this will assist with partial-application of your functions.
- Having
prevChar :: String
isn't good; what if it happened to be"asdf"
?Maybe Char
would be better. - In general, be more eager to break up your work into smaller named pieces.
- The quantity of arguments you're handling, and the quantity of cases you're handling in parallel, should feel like a red flag: you need a better abstraction. You've probably already noticed that
Functor
won't work for this. With a little experience, you'll start to notice that your current recursion scheme basically is theState
monad. I'm using the mtl library for this but we don't need to use theTransformer
version; justState
will serve.
By the time I got this working, it didn't have much in common with what you wrote, sorry. Hopefully it contains the right number of new things for you to learn!
module Main where
import Control.Monad (forM_, when)
import Control.Monad.State (execState, get, gets, modify, put, State)
import Text.Printf
data CompressorState = CompressorState { output :: String
, prevChar :: Maybe Char
, count :: Int
, input :: String
} deriving (Show)
type Compressor = State CompressorState
compression' :: Compressor ()
compression' = do
mChar <- getNext
case mChar of
Nothing -> flushCount
Just char -> do
previous <- getPrevious
when (Just char /= previous) $
do flushCount
emit char
setPrevious char
increment
compression'
where getNext = do cs@CompressorState{input = i} <- get
case i of [] -> return Nothing
c:i' -> do put cs{input = i'} -- Using record update syntax
return (Just c)
getPrevious = gets prevChar
increment = modify (\cs@CompressorState{count = c} -> cs{count = c + 1})
flushCount = do c <- gets count
when (1 < c) $ forM_ (show c) emit
modify (\cs -> cs{count = 0})
emit :: Char -> Compressor ()
emit c = modify (\cs@CompressorState{output = o} -> cs{output = c : o})
setPrevious :: Char -> Compressor ()
setPrevious c = modify (\cs -> cs{prevChar = Just c})
compression :: String -> String
compression i = reverse . output $
execState compression' CompressorState{ output = ""
, prevChar = Nothing
, count = 0
, input = i
}
examples :: [(String, String)]
examples = [ ("abcaaabbb", "abca3b3")
, ("abcd", "abcd")
, ("aaabaaaaccaaaaba", "a3ba4c2a4ba")
]
main :: IO ()
main =
do
let results = do -- This is in the List Monad!
(input, reference) <- examples
let output = compression input
return (output == reference, input, output, reference)
forM_ results print
-
\$\begingroup\$ I have to be honest, I'm not sure I would have used this exact form but I think that is only natural when it comes to writing in a functional form. That said, I was able to learn a few things just by having gone through what you did here and breaking each part down. \$\endgroup\$tijko– tijko2024年08月22日 13:56:38 +00:00Commented Aug 22, 2024 at 13:56
-
\$\begingroup\$ I've seen this Monad scheme many places now where
case
catchesMaybe
andNothing
/Just
\$\endgroup\$tijko– tijko2024年08月23日 00:11:58 +00:00Commented Aug 23, 2024 at 0:11 -
\$\begingroup\$ All
when
,put
, &get
are new to me. I'll stop adding every minor comment but I want to say truly thank you. \$\endgroup\$tijko– tijko2024年08月23日 01:57:56 +00:00Commented Aug 23, 2024 at 1:57 -
1\$\begingroup\$ "I've seen this Monad scheme many places now where case catches Maybe" Just note that
case
expressions are just expressions, nothing about them is tied to monads. (One extra thing I considered using in my response was the LambdaCase extension, which often makescases
more fun to use.) \$\endgroup\$ShapeOfMatter– ShapeOfMatter2024年08月24日 00:34:56 +00:00Commented Aug 24, 2024 at 0:34 -
\$\begingroup\$ Of course, I've just seen the
do
notation (where
) is used heavily withcase
where Monads are being used. I'm not at all familiar enough to speak much more on it orLambdaCase
. I'm still picking apart several aspects of your example. It's short but packed with nuance. \$\endgroup\$tijko– tijko2024年08月24日 01:14:03 +00:00Commented Aug 24, 2024 at 1:14
Explore related questions
See similar questions with these tags.