Filter Duplicate Elements in Haskell

Question 1

I'm working on HackerRank to try to improve my Haskell skills along side with reading Haskell Programming from first principles. I wrote a program that works, but it seems to time out on large input sets. The purpose of the program is

Given a list of n integers a = [a1, a2, ..., an], you have to find those integers which are repeated at least k times. In case no such element exists you have to print -1.

If there are multiple elements in a which are repeated at least k times, then print these elements ordered by their first occurrence in the list.

So I wrote a few different functions to help with this.

`count` which counts the number of occurrences of an element in a list

count :: Eq a => Integral b => a -> [a] -> b
count e [] = 0
count e (a:xs) = (count e xs +) $ if a == e then 1 else 0

`uniq` which removes duplicates from a list

uniq :: Eq a => [a] -> [a] -> [a]
uniq x [] = x 
uniq [] (a:xs) = uniq [a] xs
uniq x (a:xs) = if a `elem` x then uniq x xs else uniq (a:x) xs

`filt` which filters through a list and removes elements that don't occur at least `k` times.

filt :: Show a => Num a => Read a => Eq a => Integral b => [a] -> b -> [a]
filt a b = reverse $ uniq [] [i | i <- a, count i a >= b]

`printList` which prints a list as a space separated list or prints `-1` if the list is empty.

printList :: Show a => [a] -> IO ()
printList [] = putStrLn "-1"
printList a = putStrLn $ unwords [show i | i <- a]

`readNumbers` which takes a space separated string and returns a `Num` list from that string.

readNumbers :: Show a => Eq a => Num a => Read a => String -> [a]
readNumbers = map read . words

`run` which throws all of this together and runs this `n` times.

run :: (Show a, Eq a, Num a, Read a) => a -> IO ()
run 0 = putStr ""
run n = do
 a <- getLine
 b <- getLine
 printList $ filt (readNumbers b) (readNumbers a !! 1)
 run $ n - 1

`main` the main function. It gets a number `n` and then calls `run n`.

main :: IO ()
main = do
 a <- getLine
 run $ read a

This code works, for example, with the input

3
9 2
4 5 2 5 4 3 1 3 4
9 4
4 5 2 5 4 3 1 3 4
10 2
5 4 3 2 1 1 2 3 4 5

and gives the desired output of

4 5 3
-1
5 4 3 2 1

However, with larger datasets this code is incredibly slow. I'm guessing it's because the recursion is less than optimal, but I can't really pinpoint what is taking so long. My best guess is that uniq or count is the limiting factor, but I can't figure out how to optimize them.

Question 2

The primary problem is uniq, count, and filt are all O(n). Take a look at hackage.haskell.org/package/containers and see if there is a data structure that would work better.

Question 3

If you write uniq as a right fold, you don't need to pass an accumulator through, and the list comes out in the right order:

uniq :: Eq a => [a] -> [a]
uniq [] = []
uniq (x:xs) = (if x `elem` xs then id else (x:)) $ uniq xs
filt :: Show a => Num a => Read a => Eq a => Integral b => [a] -> b -> [a]
filt k is = uniq [i | i <- is, count i is >= k]

(Edit: Actually that one throws out the first of each two equal elements, not the last. Here`s one without that problem:

uniq :: Eq a => [a] -> [a]
uniq [] = []
uniq (x:xs) = x : uniq (filter (/=x) xs)

)

You've commendably already brought count into a form that allows it to be written in terms of library combinators:

count :: Eq a => Integral b => a -> [a] -> b
count e = sum . map (\a -> if a == e then 1 else 0)

That's a bit ugly due to lambdas though, here's a nicer version:

count e = length . filter (== e)

For separation of monadic and pure code (and generally for factoring out common code from across cases), here's a showList to replace printList:

showList :: Show a => [a] -> String
showList [] = "-1"
showList a = unwords [show i | i <- a]

Calling a monadic action a given number of times doesn't need manual recursion, and thus also doesn't need to give the repeated action a name:

main :: IO () 
main = do
 a <- readLn
 replicateM_ a $ do
 [_n, k] <- map read . words <$> getLine
 numbers <- map read . words <$> getLine
 putStrLn $ showList $ filt k numbers

(I think readNumbers doesn't deserve a name.)

In case the order in which the output is given isn't important, here's a version that doesn't require quadratic time because each element is compared to every other:

filt k = map head . filter ((>=k) . length) . group . sort

which relies on Data.Lists sort being faster than quadratic time.

Question 4

You are some type of Haskell wizard here. This is an absolutely beautiful piece of code.

Gurkenglas Gurkenglas 3,79413 silver badges17 bronze badges · Accepted Answer · 2016-12-22 01:35:55Z

If you write uniq as a right fold, you don't need to pass an accumulator through, and the list comes out in the right order:

uniq :: Eq a => [a] -> [a]
uniq [] = []
uniq (x:xs) = (if x `elem` xs then id else (x:)) $ uniq xs
filt :: Show a => Num a => Read a => Eq a => Integral b => [a] -> b -> [a]
filt k is = uniq [i | i <- is, count i is >= k]

(Edit: Actually that one throws out the first of each two equal elements, not the last. Here`s one without that problem:

uniq :: Eq a => [a] -> [a]
uniq [] = []
uniq (x:xs) = x : uniq (filter (/=x) xs)

)

You've commendably already brought count into a form that allows it to be written in terms of library combinators:

count :: Eq a => Integral b => a -> [a] -> b
count e = sum . map (\a -> if a == e then 1 else 0)

That's a bit ugly due to lambdas though, here's a nicer version:

count e = length . filter (== e)

For separation of monadic and pure code (and generally for factoring out common code from across cases), here's a showList to replace printList:

showList :: Show a => [a] -> String
showList [] = "-1"
showList a = unwords [show i | i <- a]

Calling a monadic action a given number of times doesn't need manual recursion, and thus also doesn't need to give the repeated action a name:

main :: IO () 
main = do
 a <- readLn
 replicateM_ a $ do
 [_n, k] <- map read . words <$> getLine
 numbers <- map read . words <$> getLine
 putStrLn $ showList $ filt k numbers

(I think readNumbers doesn't deserve a name.)

In case the order in which the output is given isn't important, here's a version that doesn't require quadratic time because each element is compared to every other:

filt k = map head . filter ((>=k) . length) . group . sort

which relies on Data.Lists sort being faster than quadratic time.

You are some type of Haskell wizard here. This is an absolutely beautiful piece of code.

Stack Exchange Network

Filter Duplicate Elements in Haskell

`count` which counts the number of occurrences of an element in a list

`uniq` which removes duplicates from a list

`filt` which filters through a list and removes elements that don't occur at least `k` times.

`printList` which prints a list as a space separated list or prints `-1` if the list is empty.

`readNumbers` which takes a space separated string and returns a `Num` list from that string.

`run` which throws all of this together and runs this `n` times.

`main` the main function. It gets a number `n` and then calls `run n`.

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Filter Duplicate Elements in Haskell

count which counts the number of occurrences of an element in a list

uniq which removes duplicates from a list

filt which filters through a list and removes elements that don't occur at least k times.

printList which prints a list as a space separated list or prints -1 if the list is empty.

readNumbers which takes a space separated string and returns a Num list from that string.

run which throws all of this together and runs this n times.

main the main function. It gets a number n and then calls run n.

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions

`count` which counts the number of occurrences of an element in a list

`uniq` which removes duplicates from a list

`filt` which filters through a list and removes elements that don't occur at least `k` times.

`printList` which prints a list as a space separated list or prints `-1` if the list is empty.

`readNumbers` which takes a space separated string and returns a `Num` list from that string.

`run` which throws all of this together and runs this `n` times.

`main` the main function. It gets a number `n` and then calls `run n`.