= Outcoding UNIX geniuses =

Static types prevent disasters by catching bugs at compile time. Yet many languages have self-defeating types. Type annotation can be so laborious that weary programmers give up and switch to unsafe languages. Adding insult to injury, some of these prolix type systems are simultaneously inexpressive. Some lack https://en.wikipedia.org/wiki/Parametric_polymorphism[parametric polymorphism], forcing programmers to duplicate code or subvert static typing with casts. A purist might escape this dilemma by writing code to generate code, but this leads to another set of problems.

Type theory shows how to avoid these pitfalls, but many programmers seem unaware:

 * Popular authors Bruce Eckel and Robert C. Martin mistakenly believe https://docs.google.com/document/d/1aXs1tpwzPjW9MdsG5dI7clNFyYayFBkcXwRDo-qvbIk/preview[strong typing implies verbosity], and worse still, https://blog.cleancoder.com/uncle-bob/2017/01/11/TheDarkPath.html[testing conquers all]. Tests are undoubtedly invaluable, but at best they https://en.wikipedia.org/wiki/Proof_by_example["prove" by example]. Testing can be used https://www.cs.utexas.edu/~EWD/transcriptions/EWD03xx/EWD303.html["...to show the presence of bugs, but never to show their absence"]. One could even argue a test-heavy approach helps attackers find exploits, as untested cases may hint at overlooked bugs. Instead of testing, we need strong static types so we can prove correctness of programs.

 * The designers of the Go language, including famed former Bell Labs researchers, have been stumped by polymorphism for years. To be fair, parametric polymorphism is only half the story. The other half is ad hoc polymorphism, which Haskell researchers only neatly solved in the late 1980s with type classes. Practical Haskell compilers also need more type trickery for unboxing.

Why is this so? Perhaps people think the theory is arcane, dry, and impractical?
By working through some programming interview questions, we'll find the relevant theory is surprisingly accessible. We quickly arrive at a simple _type inference_ or _type reconstruction_ algorithm that seems too good to be true: it powers strongly typed languages that support parametric polymorphism without requiring any type declarations.

We'll write code that figures out that the function:

----------------------------------------------------------------
\x y z -> x z (y z)
----------------------------------------------------------------

has type:

----------------------------------------------------------------
(a -> b -> c) -> (a -> b) -> a -> c
----------------------------------------------------------------

And given that `+++(+)+++` has type `+Int -> Int -> Int+`, our code determines that the function:

----------------------------------------------------------------
\x -> (+) (x 42)
----------------------------------------------------------------

has type:

----------------------------------------------------------------
(Int -> Int) -> Int -> Int
----------------------------------------------------------------

== 1. Identifying twins ==

Determine if two binary trees storing integers in their leaf nodes are equal.

*Solution:* Two words: `deriving Eq`.

------------------------------------------------------------------------------
data Tree a = Leaf a | Branch (Tree a) (Tree a) deriving Eq
------------------------------------------------------------------------------

https://www.haskell.org/onlinereport/derived.html[Haskell's derived instance feature] automatically works on any algebraic data type built on any type for which equality makes sense.
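For instance, the derived instance can be exercised directly. A quick standalone sketch (redefining `Tree` with `Show` added so results can be printed; the `main` wrapper is only for demonstration):

```haskell
-- Same Tree as above, with Show added so we can print comparisons.
data Tree a = Leaf a | Branch (Tree a) (Tree a) deriving (Eq, Show)

main :: IO ()
main = do
  -- Structural equality comes for free from `deriving Eq`.
  print $ Branch (Leaf 1) (Leaf 2) == Branch (Leaf 1) (Leaf 2)  -- True
  print $ Branch (Leaf 1) (Leaf 2) == Leaf (1 :: Int)           -- False
```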
It even works for mutually recursive data types (https://hackage.haskell.org/package/containers/docs/Data-Tree.html[see `Data.Tree`]):

------------------------------------------------------------------------------
data Tree a = Node a (Forest a) deriving Eq
data Forest a = Forest [Tree a] deriving Eq
------------------------------------------------------------------------------

Perhaps my interviewer would ask me to explain what `deriving Eq` does. Roughly speaking, it generates code like the following, saving the programmer from stating the obvious:

------------------------------------------------------------------------------
data Tree a = Leaf a | Branch (Tree a) (Tree a)

eq (Leaf x) (Leaf y) = x == y
eq (Branch xl xr) (Branch yl yr) = eq xl yl && eq xr yr
eq _ _ = False
------------------------------------------------------------------------------

== 2. On assignment ==

This time, the first tree may contain variables in place of integers. Can we _unify_ the trees, that is, assign integers to all variables so the trees are equal? The same variable may appear more than once.

*Solution:* We extend our data structure to hold variables:

\begin{code}
data Tree a = Var String | Leaf a | Tree a :. Tree a deriving Show
\end{code}

As before, we traverse both trees and look for nodes that differ in value or type. If the first is a variable, then we record a constraint, that is, a variable assignment that is required for the trees to be equal, such as `a = 4` or `b = 2`. If there are conflicting values for the same variable, then we indicate failure by returning `Nothing`. Otherwise we return `Just` the list of assignments found.

\begin{code}
solve1 = go [] where
  go as = \cases
    (xl :. xr) (yl :.
      yr) -> do
        as0 <- go as xl yl
        go as0 xr yr
    (Leaf x) (Leaf y)
      | x == y -> Just as
      | otherwise -> Nothing
    (Var v) (Leaf x) -> case lookup v as of
      Nothing -> Just $ (v, x):as
      Just x'
        | x == x' -> Just as
        | otherwise -> Nothing
    _ _ -> Nothing

solve1 (Leaf 1) (Leaf 2)
solve1 (Leaf 3) (Leaf 3)
solve1 (Leaf 4) (Var "x")
solve1 (Var "x" :. Var "y") (Leaf 5 :. Leaf 6)
solve1 (Var "x" :. Var "x") (Leaf 5 :. Leaf 6)
\end{code}

== 3. Both Sides, Now ==

Now suppose leaf nodes in both trees can hold integer variables. Can two trees be made equal by assigning certain integers to the variables? If so, find the most general solution.

*Solution:* We proceed as before, but now we may encounter constraints such as `a = b`, which equate two variables. To handle such a constraint, we pick one of the variables, such as `a`, and replace all occurrences of `a` with the other side, which is `b` in our example. This eliminates `a` from all constraints. Eventually, all our constraints have an integer on at least one side, which we check for consistency.

We write a function that substitutes a given tree for a given variable. This is more general than necessary, since we only need to substitute variables for variables, but we'll soon need its full power.

\begin{code}
sub x t = \case
  Var x' | x == x' -> t
  l :. r -> sub x t l :. sub x t r
  a -> a

sub "x" (Leaf 1 :. Leaf 2) ((Var "x" :. (Leaf 3 :. Var "y")) :. Var "x")
\end{code}

We discard redundant constraints where the same variable appears on both sides, such as `a = a`. Thus a variable may wind up with no integer assigned to it, which means if a solution exists, it can take any value.

Our code is inefficient. Rather than repeat the same chain of substitutions to reach a representative variable, we should use a more efficient https://en.wikipedia.org/wiki/Disjoint-set_data_structure[disjoint-set data structure].
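For the curious, here is a minimal sketch of that idea, assuming a `Data.Map`-backed parent table (the `find`/`union` names and the representation are our own invention; real implementations add union by rank and path compression):

```haskell
import qualified Data.Map as M

-- Each variable optionally points at a parent; following parent
-- links reaches the representative of its equivalence class.
type UF = M.Map String String

-- Walk parent links to the representative of a variable.
find :: UF -> String -> String
find uf v = maybe v (find uf) (M.lookup v uf)

-- Merge two classes by linking one representative under the other.
union :: String -> String -> UF -> UF
union a b uf
  | ra == rb = uf
  | otherwise = M.insert ra rb uf
  where
  ra = find uf a
  rb = find uf b

main :: IO ()
main = do
  -- Constraints a = b and b = c put all three in one class.
  let uf = union "a" "b" (union "b" "c" M.empty)
  print $ find uf "a" == find uf "c"  -- True
```

With this, a chain of constraints such as `a = b`, `b = c` resolves `a` and `c` to the same representative without re-substituting through every recorded constraint.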
\begin{code}
foldSub = foldr $ uncurry sub

solve2 = go [] where
  go as = \cases
    (Leaf x) (Leaf y)
      | x == y -> Just as
      | otherwise -> Nothing
    (xl :. xr) (yl :. yr) -> go as xl yl >>= \as0 ->
      go as0 (foldSub xr as0) (foldSub yr as0)
    (Var _) (_ :. _) -> Nothing
    (Var v) (Var w) | v == w -> Just as
    (Var v) t -> Just $ (v, t):as
    x y@(Var _) -> go as y x
    _ _ -> Nothing

solve2 ((Var "x" :. Leaf 1) :. Var "z") ((Var "y" :. Var "y") :. Leaf 2)
solve2 ((Var "x" :. Leaf 1) :. Var "z") ((Var "y" :. Var "z") :. Leaf 2)
solve2 ((Var "x" :. Leaf 1) :. Var "x") ((Var "y" :. Var "z") :. Leaf 2)
\end{code}

== 4. Once more, with subtrees ==

What if variables can represent subtrees?

*Solution:* Although we've significantly generalized the problem, our answer remains almost the same. When unifying a variable `v` against a tree `t`, to avoid infinite recursion, we perform an _occurs check_: the variable `v` must not occur in `t`, except in the trivial case when `v == t`. If we pass this check, then we record that in future, we should replace `v` with the tree resulting from applying all substitutions found so far to `t`.

\begin{code}
treeUnify :: (Show a, Eq a) => Tree a -> Tree a -> Maybe [(String, Tree a)]
treeUnify = go [] where
  go as = \cases
    (Leaf x) (Leaf y)
      | x == y -> Just as
      | otherwise -> Nothing
    (xl :. xr) (yl :. yr) -> do
      as0 <- go as xl yl
      go as0 (foldSub xr as0) (foldSub yr as0)
    (Var v) (Var w) | v == w -> Just as
    (Var v) t
      | occurs t -> Nothing
      | otherwise -> Just $ (v, foldSub t as):as
      where
      occurs = \case
        l :. r -> occurs l || occurs r
        Var w | v == w -> True
        _ -> False
    x y@(Var _) -> go as y x
    _ _ -> Nothing

treeUnify (Var "x") (Leaf 5)
treeUnify (Var "x" :. Leaf 5) (Leaf 4 :. Var "y")
treeUnify ((Var "x" :. Var "x") :. Var "w")
  (((Leaf 3 :. Leaf 5) :. (Var "y" :. Var "z")) :. Var "w")
\end{code}

== 5. Type inference! ==

Design a language based on lambda calculus where integers and strings are primitive types, and where we can deduce whether a given expression is _typable_, that is, whether types can be assigned to the untyped bindings so that the expression is well-typed. If so, find the most general type.

For example, the expression `+\f -> f 2+`, which takes its first argument `f` and applies it to the integer 2, must have type `+(Int -> u) -> u+`. Here, `u` is a _type variable_, that is, we can substitute `u` with any type. This is known as _parametric polymorphism_.

More precisely, the inferred type is most general, or _principal_, if:

1. Substituting types such as `Int` or `+(Int -> Int) -> Int+` (sometimes called _type constants_ for clarity) for all the type variables results in a well-typed closed term.
2. There are no other ways of typing the given expression.

*Solution:* We define a tree to represent types. Like our previous trees, leaf nodes can be variables (`TV` for _type variable_) or constants (`TC` for _type constant_). The `+(:->)+` constructor plays the same role as `(:.)` above, but this time it also represents functions.

\begin{code}
infixr 5 :->
data Type = TC String | Type :-> Type | TV String

instance Show Type where
  showsPrec d = \case
    TC s -> (s++)
    TV s -> (s++)
    a :-> b -> showParen (d > 0) $ showsPrec 1 a . (" -> "++) .
      shows b
\end{code}

We write a function `tsub` that substitutes a given tree for a given variable, analogous to `sub` above:

\begin{code}
tsub (x, t) = \case
  TV x' | x == x' -> t
  a :-> b -> tsub (x, t) a :-> tsub (x, t) b
  y -> y
\end{code}

Tree unification is almost the same, though this time we switch from `Maybe` to `Either` so we can add error messages:

\begin{code}
unify :: Type -> Type -> Either String [(String, Type)]
unify = go [] where
  go as = \cases
    (TC x) (TC y) | x == y -> Right as
    (xl :-> xr) (yl :-> yr) -> do
      as0 <- go as xl yl
      go as0 (foldr tsub xr as0) (foldr tsub yr as0)
    (TV v) (TV w) | v == w -> Right as
    (TV v) t
      | occurs t -> Left "occurs check"
      | otherwise -> Right $ (v, foldr tsub t as):as
      where
      occurs = \case
        l :-> r -> occurs l || occurs r
        TV w | v == w -> True
        _ -> False
    x y@(TV _) -> go as y x
    x y -> Left $ show x ++ " versus " ++ show y
\end{code}

Next, we define a data structure to hold an expression in our language: applications, lambda abstractions, variables, integers, and strings.

\begin{code}
infixl 5 :@
data Expr = Expr :@ Expr | Lam String Expr | V String | I Int | S String

instance Show Expr where
  showsPrec d = \case
    a :@ b -> showParen (d > 0) $ shows a . (' ':) . showsPrec 1 b
    Lam v t -> ('\\':) . (v++) . (" -> "++) .
      shows t
    V s -> showParen (s == "+") $ (s++)
    I i -> shows i
    S s -> shows s
\end{code}

Unfortunately my base library lacks the state monad transformer, so we swipe some code from GHC:

\begin{code}
data StateT s m a = StateT { runStateT :: s -> m (a, s) }

instance (Monad m) => Monad (StateT s m) where
  return a = StateT $ \s -> return (a, s)
  m >>= k = StateT $ \s -> do
    (a, s') <- runStateT m s
    runStateT (k a) s'

instance (Functor m) => Functor (StateT s m) where
  fmap f m = StateT $ \s ->
    fmap (\(a, s') -> (f a, s')) $ runStateT m s

instance (Functor m, Monad m) => Applicative (StateT s m) where
  pure a = StateT $ \s -> return (a, s)
  StateT mf <*> StateT mx = StateT $ \s -> do
    (f, s') <- mf s
    (x, s'') <- mx s'
    return (f x, s'')
  m *> k = m >>= \_ -> k

state f = StateT (return . f)
getT = state $ \s -> (s, s)
putT s = state $ \_ -> ((), s)
lift m = StateT $ \s -> do
  a <- m
  return (a, s)
\end{code}

To finish off, we traverse the expression and unify types along the way to enforce the rules:

* The type of a primitive value is its corresponding type; for example, 5 has type `TC "Int"` and "Hello, World" has type `TC "String"`.
* For an application `f x`, we recursively determine the type `tf` of `f` and `tx` of `x`, possibly requiring new assignments. We return a new type variable `tfx`, and unify `tf` with `+tx :-> tfx+` after applying any new assignments.
* For a lambda abstraction `\x.t`, we generate a new type variable `tx` to represent the type of `x`. Then we recursively find the type `tt` of `t`, during which we assign the type `tx` to any free occurrence of `x`, then return the type `+tx :-> tt+`.

Let's walk through an example. The expression `+\f -> f 2+` would be represented as the `Expr` tree `Lam "f" (V "f" :@ I 2)`. To type this expression:

1. At the top level we have a lambda, so begin by generating a new type variable `t0`.
2. Recursively type the lambda body, with the additional context that the symbol `f` has type `t0`.
.. Generate a new type variable `t1`.
.. Recursively type the left child of the `(:@)` node:
... We have the symbol `f`, which has type `t0` from the context.
.. Recursively type the right child of the `(:@)` node:
... We have the integer constant 2, so we return the type `TC "Int"`.
.. Unify `t0` with `+TC "Int" -> t1+`, which assigns the latter to the former.
.. Return the type `t1`.
3. Return the type `+t0 -> t1+`.

Thus `+\f -> f 2+` has type `+t0 -> t1+`, which expands to `+(Int -> t1) -> t1+` if we apply all the assignments we found.

We use `StateT` to help with bookkeeping. To guarantee a unique name for each type variable, we maintain a counter which we increment for each new name; it is stored along with the type variable assignments collected so far. The environment `gamma` holds the types of any predefined functions, and also records the types of variables in lambda abstractions. We ultimately return the type of an expression, or an error message explaining why it is ill-typed.

We name the function `gather` because it accumulates more and more type variable assignments over time, but also because "to gather" can mean "to infer".

\begin{code}
gather :: [(String, Type)] -> Expr -> StateT ([(String, Type)], Int) (Either String) Type
gather gamma = \case
  I _ -> pure $ TC "Int"
  S _ -> pure $ TC "String"
  f :@ x -> do
    tf <- gather gamma f
    tx <- gather gamma x
    tfx <- newTV
    (cs, i) <- getT
    let rewrite t = foldr tsub t cs
    cs' <- lift $ unify (rewrite tf) (rewrite $ tx :-> tfx)
    putT ((second rewrite <$> cs') ++ cs, i)
    pure tfx
  V s -> lift $ maybe (Left $ "missing " ++ s) Right $ lookup s gamma
  Lam x t -> do
    tx <- newTV
    (tx :->) <$> gather ((x, tx):gamma) t
 where
  newTV = do
    (cs, i) <- getT
    putT (cs, i + 1)
    pure $ TV $ 't':show i
\end{code}

This algorithm is the heart of the link:pcf.html[_Hindley-Milner type system_], or HM for short. See https://web.cecs.pdx.edu/~mpj/thih/thih.pdf[Mark P. Jones, _Typing Haskell in Haskell_].
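The key step of the walkthrough, applying the assignment found for `t0` to the lambda's type `t0 -> t1`, can be replayed in isolation. A stripped-down sketch (its own simplified `Type` and single-substitution `tsub`, not the definitions above):

```haskell
infixr 5 :->
-- A simplified mirror of the article's Type, with derived instances.
data Type = TC String | TV String | Type :-> Type deriving (Eq, Show)

-- Substitute a type for one type variable, as `tsub` does above.
tsub :: (String, Type) -> Type -> Type
tsub (x, t) ty = case ty of
  TV x' | x == x' -> t
  a :-> b -> tsub (x, t) a :-> tsub (x, t) b
  _ -> ty

main :: IO ()
main = do
  -- Unifying t0 with Int -> t1 yields this assignment; applying it
  -- to t0 -> t1 expands the lambda's type to (Int -> t1) -> t1.
  let assignment = ("t0", TC "Int" :-> TV "t1")
  print $ tsub assignment (TV "t0" :-> TV "t1")
```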
The `infer` wrapper shows the type returned by `gather` with all assignments applied, or returns an error message:

\begin{code}
infer gamma x = case runStateT (gather gamma x) ([], 0) of
  Right (ty, (cs, _)) -> show $ foldr tsub ty cs
  Left e -> "FAIL: " ++ e

infer [] (Lam "x" (V "x" :@ I 2))
\end{code}

We build a demo with a couple of predefined functions with certain types:

\begin{code}
prelude :: [(String, Type)]
prelude =
  [ ("+", TC "Int" :-> TC "Int" :-> TC "Int")
  , ("length", TC "String" :-> TC "Int")]

demo expr = do
  putStrLn $ "type of `" ++ show expr ++ "`:"
  putStrLn $ " " ++ infer prelude expr ++ "\n"

demo $ V "length"
demo $ V "length" :@ S "hello"
demo $ V "length" :@ V "length"
demo $ V "+" :@ I 1 :@ I 1
demo $ V "+" :@ I 1 :@ S "hello"
demo $ Lam "x" (V "x" :@ I 2)
demo $ Lam "x" (V "x" :@ V "x")
demo $ Lam "x" ((V "+" :@ V "x") :@ I 42)
demo $ Lam "x" (V "+" :@ (V "x" :@ I 42))
demo $ Lam "x" (V "x")
demo $ Lam "x" (Lam "y" (V "x"))
demo $ Lam "x" (Lam "y" (Lam "z" ((V "x" :@ V "z") :@ (V "y" :@ V "z"))))
\end{code}

== Theorems for free ==

Parametric polymorphism not only aids correctness, but also allows easier manipulation of code. Suppose we infer that a function `f` has type:

----------------------------------------------------------------
f :: [a] -> [a]
----------------------------------------------------------------

Then link:../compiler/para.html[we can mechanically deduce] that:

----------------------------------------------------------------
f (map r xs) = map r (f xs)
----------------------------------------------------------------

for any function `r`. This rewrite rule could be applied automatically to help optimize code.
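The free theorem is easy to spot-check on a concrete instance (a sanity check, not a proof); here with `f = reverse` and `r = (* 2)`:

```haskell
main :: IO ()
main = do
  -- reverse :: [a] -> [a], so the theorem promises
  -- reverse (map r xs) == map r (reverse xs) for any r.
  let r = (* 2) :: Int -> Int
      xs = [1, 2, 3]
  print $ reverse (map r xs) == map r (reverse xs)  -- True
```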