Learn You a Haskell shows the union
function:
union also acts like a function on sets. It returns the union of two lists. It pretty much goes over every element in the second list and appends it to the first one if it isn't already in yet. Watch out though, duplicates are removed from the second list!
Example:
ghci> [1..7] `union` [5..10]
[1,2,3,4,5,6,7,8,9,10]
Please critique my implementation.
import Data.List (nub)
union' :: (Eq a) => [a] -> [a] -> [a]
union' xs ys = nub (xs ++ ys)
This 1-line implementation seems easy to understand, however I'm concerned at its bad/mediocre performance. Consider that I'm appending xs
to ys
, and then performing nub
on the whole list.
1 Answer 1
The formatting of your code is broken, so I've reproduced it here for clarity.
union' :: (Eq a) => [a] -> [a] -> [a]
union' xs ys = xs ++ union'' ys xs
where union'' [] _ = []
union'' (a:as) first = if (a `elem` first) then union'' as first else a : union'' as (a:first)
Your code is correct, but can be rewritten to exhibit better style.
module MyUnion where
import qualified Data.List as List (union) -- For testing with QuickCheck
union :: (Eq a) => [a] -> [a] -> [a]
union xs ys = xs ++ union' ys xs
where union' [] skips = []
union' (y':ys') skips | y' `elem` skips = union' ys' skips
| otherwise = y' : union' ys' (y':skips)
-- ^ Guards instead of ifs
-- ^ Aligned for readability
prop_equal xs ys = union xs ys == List.union xs ys -- QuickCheck-able property
Good Haskell style also involves using higher-level functions whenever other considerations don't require otherwise (e.g., performance, comprehensibility). Here's my version of union
, note that it looks much more like your version from before the edits.
union xs ys = xs ++ filter (`notElem` xs) (nub ys)
For reference here is the definition of unionBy
from base
(union
is defined as unionBy (==)
).
unionBy :: (a -> a -> Bool) -> [a] -> [a] -> [a]
unionBy eq xs ys = xs ++ foldl (flip (deleteBy eq)) (nubBy eq ys) xs
union
's behavior is weird. The key is in that last sentence of the quote, what it elides is that duplicates are not removed from the first list. Unless you change data structures or impose anOrd
constraint, the performance won't get any better either. \$\endgroup\$