Implementation of Set data type in Haskell

Question 1

I'm an advanced beginner Haskell programmer. I implemented a set data type in Haskell and I wanted to find out if I made correct choices when writing it. Would there be a way to improve some of the algorithms like intersection etc.

data SET a = E | Add(a, SET a) deriving (Eq, Ord, Read, Show)
--test data
setx = listToSet[1,2,3,4]
setx2 = listToSet[3,4,2,1]
sety = listToSet[3,4,5,6]
setys = listToSet[3,4]
listToSet :: [a] -> SET a
listToSet [] = E
listToSet (s:x) = Add(s, listToSet x)
inS :: (Eq a) => a -> SET a -> Bool
inS a E = False
inS a (Add(value, set)) = if (value == a)
 then True
 else (inS a set)
nullSET :: (Eq a) => SET a -> Bool
nullSET s = s == E
headSET :: SET a -> a
headSET (Add(v, s)) = v
tailSET :: SET a -> SET a
tailSET (Add(v, s)) = s
addToSet :: (Eq a) => a -> SET a -> SET a
addToSet a (Add(value, set)) = if (inS a (Add(value, set)) == False)
 then Add(a, Add(value, set))
 else (Add(value, set))
setFilter :: (Eq a) => (a -> Bool) -> SET a -> SET a
setFilter f s
 | nullSET s = E
 | f h == False = setFilter f t
 | otherwise = Add(h, setFilter f t)
 where
 h = headSET s
 t = tailSET s
intersection :: (Eq a) => SET a -> SET a -> SET a
intersection s1 s2 = setFilter (\x -> inS x s1) s2
union :: (Eq a) => SET a -> SET a -> SET a 
union set E = set
union set (Add(value2, set2)) = if (inS value2 set == False)
 then Add(value2, (union set set2))
 else union set set2
subSet :: (Eq a) => SET a -> SET a -> Bool
subSet set1 set2
 | set1 == E = True
 | inS h set2 = subSet t set2
 | otherwise = False
 where
 h = headSET set1
 t = tailSET set1
setEq :: (Eq a) => SET a -> SET a -> Bool
setEq s1 s2 = (subSet s1 s2) == (subSet s2 s1)

Question 2

module Set where

Self-contained reusable chunks of code should be organized into modules.

data Set a = Empty | Set a (Set a)
 deriving (Read, Show)

All-caps types are not a thing, just Set is fine. Single letter constructors are uncommon too, so instead of E use a meaningful name like Empty. Constructors can take multiple arguments, so don't use tuples where you don't have to. I'll also point out that this data type you've defined is isomorphic to [a]. Besides the magic square brackets, Haskell lists are a data type like any other, i.e. data [a] = [] | (:) a [a]. Notice the similarity? Is this a situation where you need a new datatype or might it make sense to use a newtype?

-- Test data
setx, setx2, sety, setys :: Set Int
setx = listToSet [1,2,3,4]

All top-level bindings should be given a type signature, get into the habit now and you'll save yourself the frustration of a lot of cryptic misplaced error messages later. These names are all over the place too, they should be more consistent, e.g. exampleSetA, exampleSetB, exampleSetC. Don't crowd together functions and their arguments, leave some space and let it breathe.

fromList :: [a] -> Set a
fromList [] = Empty
fromList (x:xs) = Set x (fromList xs)

fromList is a common name for converting lists to arbitrary datatypes. When it conflicts with other names from modules you have imported, you disambiguate with qualified imports, not a menagerie of globally unique names. (x:xs) is a common naming pattern for lists, that is, some variable x, y, e, or whatever, and then the plural form (xs, ys, es... it's a pun, get it?). Again, space out function application, tuples are a datatype in Haskell, not a method of passing arguments.

member :: (Eq a) => a -> Set a -> Bool
member _ Empty = False
member a (Set x xs) | a == x = True
 | otherwise = member a xs

A common function name for testing set membership is member. When you don't use an argument name in the body of a function, replace it with _. If you compile with -Wall (as you should), GHC will warn you about unused bindings. That is, names that you defined in a function body but didn't use. Often this can help you to notice errors before you ever get to running your code! Use guards (e.g., | a == x) instead of if statements where you can. Guards are like a multiway if, when you have multiple cases it becomes much clearer than chaining ifs, get into the habit now, there's no reason not to.

null :: Set a -> Bool
null Empty = True
null _ = False

Use pattern matching where you can, it's more explicit than equality testing, often faster, and helps you avoid unnecessary constraints. Notice how my version lacks the (Eq a) constraint?

-- Mourn headSet and tailSet here

These functions are meaningless on Sets. Being unordered collections, Sets do not have meaningful heads or tails, so you should not leak implementation details by exposing them.

insert :: (Eq a) => a -> Set a -> Set a
filter :: (a -> Bool) -> Set a -> Set a
intersection :: (Eq a) => Set a -> Set a -> Set a
union :: (Eq a) => Set a -> Set a -> Set a
isSubset :: (Eq a) => Set a -> Set a -> Bool

Around now you're starting to feel the pain of your choice of Set representation. Implementations are slow and unwieldy. For instance, your version of intersection is an \$O(n^2)\$ operation. You can improve running times by utilizing any of a number of standard algorithms, try using balanced binary trees to represent your sets instead of linked lists.

instance (Eq a) => Eq (Set a) where

Instead of making up another name, write your own instances for standard functions like equality. If the derived instance doesn't do what you want, don't use it, don't keep it around (because you'll be unable to keep it from being exported to users of your library), and don't let it clutter up good namespace.

R B R B 1,0337 silver badges9 bronze badges · Accepted Answer · 2015-07-24 08:07:28Z

module Set where

Self-contained reusable chunks of code should be organized into modules.

data Set a = Empty | Set a (Set a)
 deriving (Read, Show)

All-caps types are not a thing, just Set is fine. Single letter constructors are uncommon too, so instead of E use a meaningful name like Empty. Constructors can take multiple arguments, so don't use tuples where you don't have to. I'll also point out that this data type you've defined is isomorphic to [a]. Besides the magic square brackets, Haskell lists are a data type like any other, i.e. data [a] = [] | (:) a [a]. Notice the similarity? Is this a situation where you need a new datatype or might it make sense to use a newtype?

-- Test data
setx, setx2, sety, setys :: Set Int
setx = listToSet [1,2,3,4]

All top-level bindings should be given a type signature, get into the habit now and you'll save yourself the frustration of a lot of cryptic misplaced error messages later. These names are all over the place too, they should be more consistent, e.g. exampleSetA, exampleSetB, exampleSetC. Don't crowd together functions and their arguments, leave some space and let it breathe.

fromList :: [a] -> Set a
fromList [] = Empty
fromList (x:xs) = Set x (fromList xs)

fromList is a common name for converting lists to arbitrary datatypes. When it conflicts with other names from modules you have imported, you disambiguate with qualified imports, not a menagerie of globally unique names. (x:xs) is a common naming pattern for lists, that is, some variable x, y, e, or whatever, and then the plural form (xs, ys, es... it's a pun, get it?). Again, space out function application, tuples are a datatype in Haskell, not a method of passing arguments.

member :: (Eq a) => a -> Set a -> Bool
member _ Empty = False
member a (Set x xs) | a == x = True
 | otherwise = member a xs

A common function name for testing set membership is member. When you don't use an argument name in the body of a function, replace it with _. If you compile with -Wall (as you should), GHC will warn you about unused bindings. That is, names that you defined in a function body but didn't use. Often this can help you to notice errors before you ever get to running your code! Use guards (e.g., | a == x) instead of if statements where you can. Guards are like a multiway if, when you have multiple cases it becomes much clearer than chaining ifs, get into the habit now, there's no reason not to.

null :: Set a -> Bool
null Empty = True
null _ = False

Use pattern matching where you can, it's more explicit than equality testing, often faster, and helps you avoid unnecessary constraints. Notice how my version lacks the (Eq a) constraint?

-- Mourn headSet and tailSet here

These functions are meaningless on Sets. Being unordered collections, Sets do not have meaningful heads or tails, so you should not leak implementation details by exposing them.

insert :: (Eq a) => a -> Set a -> Set a
filter :: (a -> Bool) -> Set a -> Set a
intersection :: (Eq a) => Set a -> Set a -> Set a
union :: (Eq a) => Set a -> Set a -> Set a
isSubset :: (Eq a) => Set a -> Set a -> Bool

Around now you're starting to feel the pain of your choice of Set representation. Implementations are slow and unwieldy. For instance, your version of intersection is an \$O(n^2)\$ operation. You can improve running times by utilizing any of a number of standard algorithms, try using balanced binary trees to represent your sets instead of linked lists.

instance (Eq a) => Eq (Set a) where

Instead of making up another name, write your own instances for standard functions like equality. If the derived instance doesn't do what you want, don't use it, don't keep it around (because you'll be unable to keep it from being exported to users of your library), and don't let it clutter up good namespace.

Stack Exchange Network

Implementation of Set data type in Haskell

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Implementation of Set data type in Haskell

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions