As an answer to excersise 2.3 in Chris Okasaki's book, Purely Functional Data Structures,
Inserting an existing element into a binary search tree copies the entire search path even though the copied nodes are indistinguishable from the originals. Rewrite insert using exceptions to avoid this copying. Establish one handler per insertion rather than one handler per iteration.
I've written the following haskell implementations of insertion in a binary heap:
import Data.Maybe
data Tree e = Leaf | T (Tree e) e (Tree e) deriving Show
empty = Leaf
insert x Leaf = T Leaf x Leaf
insert x tree@(T l e r)
| x == e = tree
| x < e = T (insert x l) e r
| otherwise = T l e (insert x r)
ins x t =
t >>= \t -> case t of
Leaf -> Just $ T Leaf x Leaf
tree@(T l e r)
| x == e -> Nothing
| x < e -> (ins x (Just l)) >>= \nt -> Just $ T nt e r
| ot herwise -> (ins x (Just r)) >>= \nt -> Just $ T l e nt
fastInsert x t =
fromMaybe t $ ins x $ Just t
The fastInsert
function (using ins
as a 'helper') and the insert
function will return the same tree given the same input, but fastInsert
wont copy sub-trees if no insertion is needed (if the tree already contains the node to be inserted).
The question is: how would this look if written more idiomatically, with regard to monadic operations/syntax in ins
?
Another question: I'm using monads as a substitute for exceptions as suggested in the book. The point is to avoid rebuilding (reference copying) the tree when returning in the recursive calls. How should I have gone about this?
-
1\$\begingroup\$ I would use less indentation in the case statement (personal choice). I've found that aligning everything as close to the left as you can (with sufficient indentation) is neater. Including the type signatures would probably also help cleanliness. And I just made a near full Map implementation based on a binary tree, and I didn't use Monad at all. If you're using them just to play around, then OK, but they're probably unnecessary here in the first place. \$\endgroup\$Carcigenicate– Carcigenicate2014年10月05日 14:05:21 +00:00Commented Oct 5, 2014 at 14:05
-
1\$\begingroup\$ Noted about type annotations and indentation. Thanks. Monads here actually serve a function performance wise. \$\endgroup\$Bladt– Bladt2014年10月11日 09:46:14 +00:00Commented Oct 11, 2014 at 9:46
2 Answers 2
I agree w/ Petr's answer for all essential parts.
I will show a more step-by-step refactoring.
In your ins
:
ins x t =
t >>= \t -> case t of
-- omitted
tree@(T l e r)
t
stands for two different variables of two different types. You do not need to name the second matched case, if you are not going to use that name. I re-scan an expression when I see an unused name, to see if I missed anything, which impedes readability.- Even if you needed to refer to
T l e r
, it already had a namet
, giving it a second name is also confusing.
Now the reason there seems to be to much indentation is that there is too much nesting. That huge anonymous lambda is the second argument to the >>=
. That function deserves it's own name:
ins x t = t >>= ins'
where
ins' Leaf = Just $ T Leaf x Leaf
ins' (T l e r)
| x == e = Nothing
| x < e = (ins x (Just l)) >>= \nt -> Just $ T nt e r
| otherwise = (ins x (Just r)) >>= \nt -> Just $ T l e nt
Indentation is already easier on the eyes. Also vertically aligning symmetric cases helps understanding.
It is customary to rename variables x'
if it stands for a modified version of a variable x
. Let's rename variables and remove unnecessary parens:
ins' (T l e r)
| x == e = Nothing
| x < e = ins x (Just l) >>= \l' -> Just $ T l' e r
| otherwise = ins x (Just r) >>= \r' -> Just $ T l e r'
Now it is easier to make ins'
recursive, instead of ins
and ins'
co-recursive, as ins
isn't doing much (ins x (Just l)
⊢ (Just l) >>= ins'
⊢ ins' l
):
ins' (T l e r)
| x == e = Nothing
| x < e = ins' l >>= \l' -> Just $ T l' e r
| otherwise = ins' r >>= \r' -> Just $ T l e r'
The question is: how would this look if written more idiomatically, with regard to monadic operations/syntax in ins?
The above function isn't terribly complicated, and can be left alone.
If there are more intermediate results, do
notation may be indicated:
ins' (T l e r)
| x == e = Nothing
| x < e = do { l' <- ins' l; return (T l' e r) }
| otherwise = do { r' <- ins' r; return (T l e r') }
Verbosity may be further reduced, using monad comprehension:
{-# LANGUAGE MonadComprehensions #-}
....
ins' (T l e r)
| x == e = Nothing
| x < e = [ T l' e r | l' <- ins' l]
| otherwise = [ T l e r' | r' <- ins' r]
Now doing further refactorings, s.a. those suggested by Petr, are easier to see and make.
Lambda lift x
in ins'
:
ins x t = t >>= (ins' x)
where
ins' x Leaf = ....
Move ins'
top-level, inline ins
in fastInsert
, (ins x $ Just t
⊢ (Just l) >>= (ins' x)
⊢ ins' x l
):
fastInsert x t = fromMaybe t $ ins' x t
After inlining ins
, it can be deleted; and ins'
can be renamed to ins
.
Side note
If we now look at ins
it doesn't depend on Maybe
specifically. We can re-write it for any MonadPlus
instead.
import Control.Monad
ins :: (Ord e, MonadPlus m) => e -> Tree e -> m (Tree e)
ins x Leaf = return $ T Leaf x Leaf
ins x (T l e r) = case compare x e of -- let's do case as Petr said this time
EQ -> mzero
LT -> [ T l' e r | l' <- ins x l]
GT -> [ T l e r' | r' <- ins x r]
Both of these, now, work:
foldM (flip ins) Leaf [1 .. 10] :: [Tree Integer]
foldM (flip ins) Leaf [1 .. 10] :: Maybe (Tree Integer)
This can also be written like this:
ins x (T l e r) = mplus
[ T l' e r | x < e, l' <- ins x l]
[ T l e r' | x > e, r' <- ins x r]
EDIT
Another question: I'm using monads as a substitute for exceptions as suggested in the book.
Maybe
is what to use to implement an operation that may or may not succeed.
The point is to avoid rebuilding (reference copying) the tree when returning in the recursive calls.
It won't allocate a new path of nodes as asked in the exercise.
How should I have gone about this?
My knowledge of Haskell is elementary, so I cannot say that there is no better way; but I do not know of any.
-
\$\begingroup\$ Awesome step-by-step of the refactorization, Thanks! Is there anything you can say about the "error handling" currently done by the
Maybe
monad? - avoiding to copy the tree when inserting an existing element? \$\endgroup\$Bladt– Bladt2014年10月13日 18:23:49 +00:00Commented Oct 13, 2014 at 18:23 -
\$\begingroup\$ @Bladt I edited in a few words. \$\endgroup\$abuzittin gillifirca– abuzittin gillifirca2014年10月14日 07:12:04 +00:00Commented Oct 14, 2014 at 7:12
There are couple of points that I'd suggest:
- Be sure to always include types for top-level functions. Without them, the code becomes very quickly unreadable. For example, would you know what
ins
does and what is its type, if you didn't write it yourself? In many cases you can immediately judge what a function does just looking at its name and its type. If you do
| x < e = ... | x == y = ... | otherwise = ...
you're performing the comparison twice. In most cases this shouldn't be a problem, but if writing
case compare x e of LT -> ... EQ -> ... GT -> ...
will compare only once, and also allows the compiler to do more optimizations.
- This isn't a problem, just a note: Be aware that
fastInsert
trades speed for lazyiness. If you examine the root of the tree returned by afastInsert
call, the tree must be examined to check if it contains the inserted element or not. Under most circumstances you wouldn't mind that. - There is no need for
ins
to takeMaybe (Tree e)
as the argument, only to apply>>=
on it. Most monadic functions take pure arguments and return a monadic one. - It's often convenient to use functions from
Control.Applicative
to lift pure functions/values to monadic computations. In particularpure
,<$>
and<*>
. See also Functor, Applicative and Monad.
I wasn't sure what error handling you have in mind, as these operations should never fail.
The updated program:
import Control.Applicative
import Data.Maybe
data Tree e = Leaf | T (Tree e) e (Tree e) deriving Show
empty = Leaf
insert :: (Ord e) => e -> Tree e -> Tree e
insert x Leaf = T Leaf x Leaf
insert x tree@(T l e r) = case compare x e of
EQ -> tree
LT -> T (insert x l) e r
GT -> T l e (insert x r)
ins :: (Ord e) => e -> Tree e -> Maybe (Tree e)
ins x Leaf = return $ T Leaf x Leaf
ins x tree@(T l e r) = case compare x e of
EQ -> Nothing
LT -> (\l' -> T l' e r) <$> ins x l
-- you could also write a point-free version, if you like,
-- which better preserves the order of arguments
-- LT -> T <$> ins x l <*> pure e <*> pure r
GT -> T l e <$> ins x r
fastInsert :: (Ord e) => e -> Tree e -> Tree e
fastInsert x t =
fromMaybe t $ ins x t
-
\$\begingroup\$ Thank you for your answer. I should have explained what I meant by "error handling" better, sorry. I've updated my question. I'm afraid I don't understand point 3. What do you mean I have to examine the returned tree to see the node has been inserted? \$\endgroup\$Bladt– Bladt2014年10月11日 09:52:13 +00:00Commented Oct 11, 2014 at 9:52