This is my first attempt at learning binary trees. I'm looking for general feedback on how it could be made cleaner, more idiomatic, or more efficient.
data Tree a = ETree | Node (Tree a) (Tree a) a deriving (Show)
tInsert :: Ord o => o -> Tree o -> Tree o
tInsert x ETree = Node ETree ETree x
tInsert x (Node lT rT a)
| x < a = Node (tInsert x lT) rT a
| x > a = Node lT (tInsert x rT) a
| otherwise = Node lT rT x
-- Folds a list into a tree
tFromListL, tFromListR :: Ord o => [o] -> Tree o
tFromListL = foldl (flip tInsert) ETree
tFromListR = foldr tInsert ETree
-- Turns a tree into a list
tToList :: Ord o => Tree o -> [o]
tToList ETree = []
tToList (Node lT rT a) = (tToList lT) ++ [a] ++ (tToList rT)
-- Splits a list roughly in half (as part of balancing)
splitInHalf :: [a] -> ([a],[a])
splitInHalf xs = splitAt (round $ (fromIntegral $ length xs) / 2.0) xs
-- Returns how unbalanced a node is
tUnbalancedBy :: Tree a -> Int
tUnbalancedBy ETree = 0
tUnbalancedBy (Node lT rT _) = absDiff (tDepth lT) (tDepth rT)
-- Arranges a list in such a way that it forms a more balanced tree
balanceList :: [a] -> [a]
balanceList xs = let (fH,sH) = splitInHalf xs in (reverse fH) ++ sH
-- "Inefficient balance"
tIneffBalance :: Ord o => Tree o -> Tree o
tIneffBalance = tFromListL . balanceList . tToList
-- Finds the min/max values of a tree
tMin, tMax :: Ord o => Tree o -> o
tMin ETree = error "tMin called on an Empty Tree"
tMin (Node lT _ a) = case lT of
ETree -> a
(Node lT' _ _) -> tMin lT'
tMax ETree = error "tMax called on an Empty Tree"
tMax (Node _ rT a) = case rT of
ETree -> a
(Node _ rT' _) -> tMax rT'
-- Find the max depth of a tree
tDepth :: Tree a -> Int
tDepth ETree = 0
tDepth (Node lT rT _) = 1 + max (tDepth lT) (tDepth rT)
-- Finds how many nodes a tree contains
tSize :: Tree a -> Int
tSize ETree = 0
tSize (Node lT rT _) = 1 + (tSize lT) + (tSize rT)
absDiff :: Int -> Int -> Int
absDiff x y = abs $ x - y
-- Checks if a node is balanced
tIsBalanced :: Tree a -> Bool
tIsBalanced ETree = True
tIsBalanced n
| tUnbalancedBy n > 2 = False
| otherwise = True
-- Checks if a value is an element of the tree
tElem :: Ord o => o -> Tree o -> Bool
tElem x ETree = False
tElem x (Node lT rT a)
| x < a = tElem x lT
| x > a = tElem x rT
| otherwise = True
tDelete :: Ord o => o -> Tree o -> Tree o
tDelete _ ETree = ETree
tDelete _ n@(Node ETree ETree _) = n -- Or give "Not found" error?
tDelete tD n@(Node lT rT a)
| tD < a = Node (tDelete tD lT) rT a
| tD > a = Node lT (tDelete tD rT) a
| otherwise = case (lT,rT) of
(ETree,t) -> t
(t,ETree) -> t
(t,t') -> let fMin = tMin t' in Node t (tDelete (tMin t') t') fMin
My main concerns are:
- The balancing algorithm. I thought it would work beautifully at first, then I realized that it basically just turns the tree into a "V". It's more efficient then a linked list, but still not very balanced
- The 2 "tToList*" functions. Should I use foldl or foldr? They seem to act fairly equivalently; although foldl is easier to use because you typically read a list from the left
- The "tDelete" function. It works (from my testing), but it looks very ugly. I basically just wrote it case-by-case, which made it very long.
1 Answer 1
Function Visibility
In the code there's some functions that seem to be intended for internal use only. To hide these from external modules you should have a "proper" module declaration that exports those functions you want to make "public".
module MyTree ( Tree(ETree, Node), tInsert, tFromListL, tFromListR, ... ) where
this allows you to hide (and change) implementation details of your exposed function.
Balancing Your Tree
Your algorithm to balance that tree is something along the lines of \$\mathcal{O}(3 * n)\$. That's bad compared to an almost constant time-effort when balancing in tree form. Then again why are you even exposing this functionality?
If you allow a tree to be unbalanced, you shouldn't expose an operation to balance it and vice versa. You should strongly consider extracting balanced behaviour into either a separate type and using instance
to expose a common "interface" or extract this into a completely separate module.
Something like this:
class (Ord a) => Tree a where
tInsert, tDelete :: a -> Tree a -> Tree a
tElem :: a -> Tree a -> Bool
tIsBalanced :: Tree a -> Bool
tSize, tDepth :: Tree a -> Int
tMin, tMax :: Tree a -> a
tToList :: Tree a -> [a]
data BalancedTree a = Empty | Node (BalancedTree a) (BalancedTree a) a deriving (Show)
instance NormalTree BalancedTree where
-- | because we are an instance of NormalTree we can access tInsert'
tInsert = balance . tInsert'
-- and so on and so forth
data NormalTree a = Empty | Node (Tree a) (Tree a) a deriving (Show)
instance Tree NormalTree where
tInsert = tInsert'
-- | default implementation for insert without balancing
tInsert' :: a -> Tree a -> Tree a
Note that I haven't even used this myself and I'm not sure whether this works. The code should give you a gist as to how this could / should work and isn't even intended to be syntactically correct.
This defines the "interface" of Tree in a class
and then uses data
types to implement said "interface".
Errors In Expected Circumstances
When you call tMax
or tMin
respectively on an ETree
(which would be better named as Empty
) an error
is raised. That's an IMO really ugly and declarative way of solving this. It'd be preferrable to have a pure function. This requires a small change to the type-signature:
tMin, tMax :: Ord o => Tree o -> Maybe o
Which in turn allows you to do the following:
tMin Empty = Nothing
tMin (Node lT _ a) = case lT of
Empty -> a
(Node lT' _ _) -> tMin lT'
This function is total and pure. As such it's much easier and cleaner to use.