I wrote a function, which takes a list of Integers, and then makes a balanced binary tree.
Please critique it. I'd like to know know how to make it more concise and idiomatic.
-- Note that the Integer represents the height of the Tree
data Tree a = Leaf
| Node Integer (Tree a) a (Tree a)
deriving (Show)-- where Integer is the height (bottom = 0)
-- make balanced, binary tree (height diff is <= 1)
-- Note - I would've liked to have kept `foldTree` point-free,
-- but I'm not sure how to do that since I need `xs` for `treeHeight`
foldTree :: [a] -> Tree a
foldTree xs = (foldingFn . zip [0..]) xs
where foldingFn = foldr (\(i, elem) acc -> if (odd i) then insertFreeOrLeft treeHeight elem acc
else insertFreeOrRight treeHeight elem acc) Leaf
treeHeight = getBinTreeHt xs
-- get Binary Tree Height (used to start making the Tree)
getBinTreeHt :: [a] -> Integer
getBinTreeHt = floor . (logBase 2) . fromIntegral . length
-- insert where there's a Leaf, otherwise choose Left
insertFreeOrLeft :: Integer -> a -> Tree a -> Tree a
insertFreeOrLeft index x Leaf = Node index Leaf x Leaf
insertFreeOrLeft _ x (Node level Leaf val right) = Node level (Node (level-1) Leaf x Leaf) val right
insertFreeOrLeft _ x (Node level left val Leaf) = Node level left val (Node (level-1) Leaf x Leaf)
insertFreeOrLeft _ x (Node level left val right) = Node level (insertFreeOrLeft (level-1) x left) val right
-- insert where there's a Leaf, otherwise choose Right
insertFreeOrRight :: Integer -> a -> Tree a -> Tree a
insertFreeOrRight index x Leaf = Node index Leaf x Leaf
insertFreeOrRight _ x (Node level left val Leaf) = Node level left val (Node (level-1) Leaf x Leaf)
insertFreeOrRight _ x (Node level Leaf val right) = Node level (Node (level-1) Leaf x Leaf) val right
insertFreeOrRight _ x (Node level left val right) = Node level left val (insertFreeOrRight (level-1) x right)
Testing
*Main> foldTree "ABC"
Node 1 (Node 0 Leaf 'B' Leaf) 'C' (Node 0 Leaf 'A' Leaf)
*Main> foldTree "ABCDE"
Node 2 (Node 1 (Node 0 Leaf 'B' Leaf) 'D' Leaf) 'E' (Node 1 Leaf 'C' (Node 0 Leaf 'A' Leaf))
1 Answer 1
From the design perspective, I see these possible conceptual problems:
- The data structure doesn't help in determining if the required invariant holds or not.
Tree
holds information that is outside of its scope - its level in the tree. This means, among other things, that a node can't be shared between multiple trees, which is something you want when manipulating trees (a new tree is just a slightly modified version of an old one, sharing most of its nodes).- Your design focuses only on creating a tree from a list. If you don't need any other operations, that's fine, but if you want to do more operations later, like insertion/deletion, merging trees, etc., it's going to be problematic. In particular, for some operations, when the total height of the tree must change, you'll have to update the whole tree. Most likely, it'd be possible to find an arbitrarily long sequence of operations, each taking O(n) time (where n is the number of nodes in the tree).
All these problems relate to having the node's level in Tree
, which is currently sort-of unused. If instead you kept there the height of the subtree rooted at a node, you would be able to:
- Easily check that a function keeps the required invariant when creating/manipulating nodes.
- Each
Tree
value would be self-contained, depending only on its own subtree, thus can be shared within multiple trees. - Following from 2., you can implement the operations on trees in such a way that each takes just O(log n).
Also I'd suggest you to have a look at various balanced tree implementations (unless you want to explore it yourself, which is a good exercise).
On the other hand, if your aim is only to have a balanced tree, without the need of modifying it later, you could somewhat relax your requirement, and just require that the height of the tree is at most 1+log{2} n. Then you'd be able to create the tree incrementally, without knowing the total length of the input list: at a particular point, the algorithm tries to fill a node of height h. If it's full and there are more incoming data, it starts to build another node of height h+1. So the left subtree will be always a full binary tree.
From the syntactical point of view, it's very good that you annotate all functions with their types, the only thing I'd improve is to have a fixed line length limit to increase readability.
foldTree
returns linear depth. Just runfoldTree [0 .. 15]
, you will see negative heights. \$\endgroup\$foldTree [0 .. 15]
. Output contains these nodes :(Node 1 Leaf 6 Leaf)
(Node (-1) Leaf 0 Leaf)
clearly height difference between these leaves is (1 - (-1)) = 2. (>1, hence unbalanced). \$\endgroup\$