A Neighborhood of Infinity

Saturday, February 11, 2012

Using Lawvere theories to combine effects

> {-# LANGUAGE MultiParamTypeClasses, ExplicitForAll, RankNTypes, FlexibleInstances, FlexibleContexts, TypeSynonymInstances #-}
> import Data.Monoid
> import Data.Functor.Identity
> import Control.Monad.Writer

In an earlier post I talked about how monads arise from free algebras. Let me recap a bit.

In Part 1 I described algebras. They're sets with operations on them satisfying some laws. We can build new elements of an algebra from old ones by using its operations. Eg. if x and y are in an algebra then x `mappend` y must be in it too. Starting with a bunch of symbols, thought of as leaves, we can consider the set of all expressions trees we can build from them. If we consider pairs of trees to be equivalent if the laws say the corresponding expressions are equal, then the set of trees itself forms an algebra known as a free algebra (for the given theory).

Let's start with some code. This type class says that the type b has leaves of type a:

> class Free a b where
> leaf :: a -> b

Effects from monoids
Now we can make the type of all trees built from Monoid operations and including all leaves of type a:

> data FreeMonoid a = FreeMonoid (forall b. (Monoid b, Free a b) => b)

And we have:

> instance Monoid (FreeMonoid a) where
> mempty = FreeMonoid mempty
> FreeMonoid a `mappend` FreeMonoid b = FreeMonoid (a `mappend` b)

Unfortunately elements like e1 and e2 two ought to be equal but Haskell doesn't know this:

> e1, e2 :: FreeMonoid Char
> e1 = FreeMonoid (leaf 'a' `mappend` (leaf 'b' `mappend` leaf 'c'))
> e2 = FreeMonoid ((leaf 'a' `mappend` leaf 'b') `mappend` leaf 'c')

Instead we can manually construct a type that does respect equality in monoids. Elements of FreeMonoid are binary trees with a `mappend` at each node. Associativity means that we can always replace a tree with an equivalent one where the left branch is a leaf. We can also use the laws to eliminate any occurrence of mempty. So every element of FreeMonoid a is equivalent to one of the form:

Leaf x1 `mappend` (Leaf x2 `mappend` (... mempty))

In other words, free monoids are lists. We can make this explicit. The standard prelude already makes [] an instance of Monoid so we just need:

> instance Free a [a] where
> leaf x = [x]

Here's the isomorphism (modulo tree equivalence):

> iso1 :: FreeMonoid a -> [a]
> iso1 (FreeMonoid x) = x
> iso1' :: [a] -> FreeMonoid a
> iso1' [] = FreeMonoid mempty
> iso1' (a : as) = let FreeMonoid r = iso1' as
> in FreeMonoid (leaf a `mappend` r)

As I talked about in that earlier article, free algebras give monads and the trees representing expressions in the algebra can be thought of as abstract syntax trees for domain specific languages. In this case it's the usual list monad. So the Monoid type class gives us a language for talking about non-determinism. The operation mappend gives us a way to "fork" a process and mempty gives as a way to "kill a thread". Here's an example using non-determinism to search for some Pythagorean triples:

> test1 :: [(Int, Int, Int)]
> test1 = do
> a <- return 3 `mappend` return 4
> b <- return 4 `mappend` return 5
> c <- return 5 `mappend` return 6
> if a*a+b*b==c*c then return (a, b, c) else mempty

Effects form M-sets
We can do exactly the same for -sets.

> class Monoid m => MSet m s where
> act :: m -> s -> s
> data FreeMSet w a = FreeMSet (forall b. (MSet w b, Free a b) => b)
> instance Monoid w => MSet w (FreeMSet w a) where
> m `act` FreeMSet b = FreeMSet (m `act` b)

Again we have the problem that FreeMSet doesn't automatically make equivalent elements equal. But it's not hard to see that every element of FreeMSet is equivalent to one of the form:

m `act` (leaf x)

So the free -set on the set of variables is simply the set of pairs . This is the basis of Haskell's writer monad:

> instance Monoid w => MSet w (Writer w a) where
> act w1 m = let (a, w2) = runWriter m in WriterT (Identity (a, w1 `mappend` w2))
> instance Monoid w => Free a (Writer w a) where
> leaf x = return x

Here's the isomorphism (again treating equivalent elements of FreeMSet as equal):

> iso2 :: Monoid w => FreeMSet w a -> Writer w a
> iso2 (FreeMSet x) = x
> iso2' :: Writer w a -> FreeMSet w a
> iso2' m = let (a, w) = runWriter m in FreeMSet (act w (leaf a))

And now the -set operation gives us an interface to an effect. This time the side effect of accumulating in a monoid:

> test2 :: Writer String Int
> test2 = do
> act "foo" (return ())
> a <- return 2
> act "bar" (return ())
> b <- return (10*a)
> return b

Combining effects

And now we can finally combine the two effects of non-determinism and accumulation. We make the free algebra that is both a monoid and an -set:

> data FreeMMonoid w a = FreeMMonoid (forall b. (Monoid b, MSet w b, Free a b) => b)
> instance Monoid w => Monoid (FreeMMonoid w a) where
> mempty = FreeMMonoid mempty
> FreeMMonoid a `mappend` FreeMMonoid b = FreeMMonoid (a `mappend` b)
> instance Monoid w => MSet w (FreeMMonoid w a) where
> m `act` FreeMMonoid b = FreeMMonoid (m `act` b)

Again we have the problem that equivalent elements aren't recognised as equal so we have to manually find a suitable type. For this we need to use the compatibility notion I introduced in Part 1. We can take 2 variables and and write them in a 1 by 2 array:

Apply mappend horizontally and act vertically to get:

m `act` (x `mappend` y)

Now apply act vertically and then mappend horizontally to get:

(m `act` x) `mappend` (m `act` y)

The law we want is:

m `act` (x `mappend` y) == (m `act` x) `mappend` (m `act` y)

Given an arbitrary tree in FreeMMonoid we can use this law to "push" all occurrences of act inwards. Ultimately every element can be written uniquely in the form:

act m1 (leaf x1) `mappend` (act m2 (leaf x2) `mappend` (... mempty)

We can then use the same argument as above to show that we end up with a list of pairs of elements of . This is exactly what we get if we apply the WriterT monad transformer to []. Here are the relevant instances:

> instance Monoid w => Monoid (WriterT w [] a) where
> mempty = WriterT []
> WriterT xs `mappend` WriterT ys = WriterT (xs ++ ys)
> instance Monoid w => MSet w (WriterT w [] a) where
> m `act` WriterT xs = WriterT $ map (\(x, w) -> (x, m `mappend` w)) xs
> instance Monoid w => Free a (WriterT w [] a) where
> leaf x = return x

Here's the isomorphism though we won't use it:

> iso3 :: Monoid w => FreeMMonoid w a -> WriterT w [] a
> iso3 (FreeMMonoid x) = x
> iso3' :: Monoid w => WriterT w [] a -> FreeMMonoid w a
> iso3' m = let xws = runWriterT m in FreeMMonoid $
> foldr mappend mempty $ map (\(x, w) -> act w (leaf x)) xws

The monad WriterT (Product Float) [] is in fact the probability monad. Here's an example of its use:

> coin :: (Monoid a, MSet (Product Float) a, Free Bool a) => a
> coin = act (Product 0.5 :: Product Float) (leaf False)
> `mappend`
> act (Product 0.5 :: Product Float) (leaf True)

Compute unnormalised conditional probability distribution on a pair of coin tosses given that first coin can't be True unless second one is:

> test3 :: WriterT (Product Float) [] (Bool, Bool)
> test3 = do
> coin1 <- coin
> coin2 <- coin
> if coin1>coin2 then mempty else return (coin1, coin2)

(Compare with Eric Kidd's article that also 'refactors' probability theory.)

What just happened?
Something miraculous just happened though it may have been lost in the details. We combined the list monad and the writer monad to get a new monad. We did it without using monad transformers and without specifying an order for the two monads. It just so happens in this case that the result was the same as using a monad transformer.

M-set with M-set
We can try other products of theories. It's tricky to deal with a theory combined with itself because repeating a type class in a context doesn't do anything. We need to make another type class that looks exactly like MSet but with different names. The result is that the product of the theory of -sets and the theory of -sets is the theory of -sets. This agrees with what we'd get from using monad transformers. It also agrees with intuition. -sets correspond to the effect of accumulating data in a monoid. The product theory corresponds to using two accumulators simultaneously.

(This makes me think type classes should take as arguments the name of the operations within them. That way a type can be an instance of the same type class in multiple ways. Compare with Agda modules.)

Monoid with monoid
This example illustrates why we can't expect a programming language to use the above method to combine theories. If an algebra has two multiplication operators with identities on it, and the two operators are compatible, then something surprising happens. The multiplications turn out to be the same operation. What's more, the operation is commutative. So the product of the theory of monoids with itself is the theory of commutative monoids. A free commutative monoid is a multiset. Multisets require a very different implementation to lists and I doubt any automatic algebra combiner in the near future could discover one. (The Eckmann-Hilton argument also appears here.)

The compatibility condition
To form the product of two theories we add in extra laws to ensure commutativity. If we don't add in such laws we get the sum of two theories. For the example theories I used here these theories can lead to quite complex types. For example the sum of the theory of -sets and -sets is, I think, the theory of -sets where is the "free product" of monoids. I this is a bit of a messy object from the perspective of types. Other effects, however, may behave nicely with respect to . I haven't yet investigated.

Conclusion
If you don't mind computing the relevant types by hand there are perfectly good alternative to monad transformers for combining effects. But it seems very difficult to automatically combine theories. In fact, I expect finding canonical forms for the elements of free algebras for a product theory isn't even computable. So this approach isn't going to replace monad transformers any time soon.

Exercise
Make a multiplication table showing the result of forming the product of algebras for lots of useful effects.

Labels: category theory, haskell, lawvere theories

posted by sigfpe at Saturday, February 11, 2012 8 comments

Saturday, June 14, 2008

Categories of polynomials and comonadic plumbing

Suppose you have a Haskell program and you want to introduce a new global constant into your program. There are at least two approaches you could take:

Simply introduce a new global constant. You could name it x and write something like x=1.23456 and refer to x throughout your code. This has the advantage of being easy to implement.
Write all of your code in monadic style and make use of the reader monad. This is intrusive in the sense that you may have to make many changes to your code to support it. But it has the advantage that all of your functions now explicitly become functions of your global constant.

Now I’m going to roughly sketch a more categorical view of both of these approaches. So let’s restrict ourselves to the subset of Haskell that corresponds to typed lambda calculus without general recursion so that we know all of our functions will be total and correspond to the mathematical notion of a function. Then all of our functions become arrows in the category that we’ll call Hask.

Firstly consider approach (1). Suppose we want to introduce a new constant, x, of type A. Category theory talks about arrows rather than elements of objects, so instead of introducing x of type A, introduce the function x:1->A where 1 is the terminal object in Hask, normally called (). An element of A is the same thing as an element of 1->A, but in the latter case we have an arrow in the category Hask.

Before continuing, let me digress to talk about polynomials. Suppose we have a ring (with an identity) R. We define R[x], where x is an indeterminate, to be the ring of polynomials in x. Another way to describe that is to say that R[x] is the smallest ring containing R and an indeterminate x, that makes no assumptions about x other than those required to make R[x] a ring. For example we know that (1+x)(1-x)=1-x², because that must hold in any ring. Given a polynomial p in R[x] we can think of it as a function f_p from R to R. f_p(a) is the value we get when substituting the value of a for x in p. So a polynomial in R[x] is the same as a function from R to R that can be written in terms of elements of R, multiplication and addition.

We can do the same with category theory. Given a category A we can ask for the smallest category extending A and containing an indeterminate arrow x:1 -> A. Just as with polynomials we have to allow all possible arrows that can be made by composing arrows of A with x. The resulting expressions for arrows will contain x as a free variable, just like the way x appears in polynomials. In fact, by analogy we can call the resulting category, A[x], the category of polynomials in x:1->A. In the special case A=Hask, you can see that Hask[x] is the category of Haskell functions extended by a new constant of type x:1->A but assuming no equations other than those necessary to make Hask[x] a category. Just as an arrow in Hask is a Haskell function, an arrow in Hask[x] is a Haskell function making use of an as yet undefined constant x.

(I've glossed over some subtleties. Just as we need a suitable equivalence relation to ensure that (1+x)(1-x)=1-x² in R[x], we need suitable equivalence relations in our category. I'll be showing you where to find the missing details later.)

Here's the implementation of a function, h, making use of a constant x:

(Note that I'll be using Edward Kmett's category-extras shortly so I need some imports)



> import Control.Monad.Reader
> import Control.Comonad
> import Control.Comonad.Reader

> x = 1.23456

> f a = 2*a+x
> g a = x*a
> h a = f (g a)

> test1 = h 2

Now consider the second approach. The easiest thing is to just give an implementation of the above using the reader monad:



> f' a = do
> x <- ask
> return $ 2*a+x

> g' a = do
> x <- ask
> return $ x*a

> h' a = return a >>= g' >>= f'

> test2 = runReader (h' 2) 1.23456

Note how, as is typical in monadic code, I have to plumb f' and g' together using >>= so that 1.23456 is passed through f' and g'. Previously I've described another way to think about the composition of monadic functions. Using >>= we can compose functions of type a->m b and b->m c to make a function of type a->m c. The result is that given a monad we can form the Kleisli category of the monad. The objects are the same as in Hask, but an arrow from a->b in the Kleisli category is an arrow of type a->m b in Hask. It's not hard to show this satisfies all of the axioms of a category. When we program in the reader monad it's a bit like we've stopped using Hask and switched to the Kleisli category of the reader monad. It's not quite like that because we used functions like +. But in theory we could use lifted versions of those functions too, and then we'd be programming by composing things in the Kleisli category. If we call the reader monad R then we can call the corresponding Kleisli category Hask^R. (Strictly speaking that R needs a subscript telling is the type of the value we intend to ask for.)

So here's the important point: Hask[x] is the same category as Hask^R. In both cases the arrows are things, which when supplied a value of the right type (like 1.23456), give arrows in Hask from their head object to their tail object.

But there's another way to do this. We can use the reader comonad:



> f'' a = 2*extract a+askC a
> g'' a = extract a*askC a
> h'' a = a =>> g'' =>> f''

> test3 = runCoreader (h'' (Coreader 1.23456 2))

In a similar way, we're dealing with arrows of the form wa -> b and we can compose them using =>>. These arrows form the coKleisli category of the reader comonad, S, which we can write Hask_S. So we must have

Hask[x]≅Hask^R≅Hask_S.

Now some back story. Over 20 years ago I was intrigued by the idea that logic might form a category with logical ‘and’ and ‘or’ forming a product and coproduct. I came across the book Introduction to Higher Order Categorical Logic by Lambek and Scott for ₤30.00. That’s ₤60.00 at today's prices, or about 120ドル.00. On a student grant? What was I thinking? And as it bore no relation to anything I was studying at the time, I barely understood a word of it. I was probably fairly applied at that point doing courses in stuff like solid state physics and electromagnetism as well as a bit of topology and algebra. I doubt I'd heard of lambda calculus though I could program in BASIC and APL. So there it sat on my bookshelf for 22 years. Periodically I’d look at it, realise that I still didn’t understand enough of the prerequisites, and put it back on the shelf. And then a month or so ago I picked it up again and realised that the first third or so of it could be interpreted as being about almost trivial Haskell programs. For example, on page 62 was

Proposition 7.1
The category A[x] of all polynomials in the indeterminate x:1->A over the cartesian or cartesian closed category A is isomorphic to the Kleisli category A_A=A_{S_A} of the cotriple (S_A,&epsilon_A,δ_A).

The language is a little different. Lambek and Scott used the term cotriple instead of comonad and Kleisli category where I’d say coKleisli category. δ and ε are cojoin and coreturn. And Lambek and Scott's theorem applies to any cartesian closed category. But after staring at this claim for a while it dawned on me that all it was really saying was this: here are two ways to introduce new constants into a category. But there’s no way I would have seen that without having practical experience of programming with monads. Learning Haskell has finally paid off. It’s given me enough intuition about category theory for me to get some return on my ₤30.00 investment paid to Heffers all those years ago. I expected to take this book to my deathbed, never having read it.

Anyway, for the details I left out above, especially the correct equivalence relation on Hask[x], you'll just have to read the book yourself.

Also, note the similarity to the deduction theorem. This theorem says that if we can prove B, assuming A, then we can deduce A implies B without making any assumptions. It unifies two way to introduce a proposition A, either as a hypothesis, or as an antecedent in an implication. In fact, the above theorem is just a categorical version of the deduction theorem.

Also note the connection with writing pointfree code. In fact, the pointfree lambdabot plugin makes use good use of the reader monad to eliminate named parameters from functions.

I’m amazed by seeing a book from 1986 that describes how to use a comonad to plumb a value through some code. As far as I know, this predates the explicit use of the reader monad in a program, Wadler and Moggi’s papers on monads, and certainly Haskell. Of course monads and comonads existed in category theory well before this date, but not, as far as I know, for plumbing computer programs. I’d love to hear from anyone who knows more about the history these ideas.

Labels: comonads, haskell, mathematics, monad

posted by sigfpe at Saturday, June 14, 2008 23 comments

Sunday, April 27, 2008

Infinitesimal rotations and Lie algebras

A little while back I tried to give a rough idea of what Lie algebras were about. What I want to do now is show how with a bit of Haskell code we can directly get our hands on one. I'm going to assume you know how to make matrices for rotations, and multiply them, and some knowledge from previous posts, for which I'll give links.

Firstly a bit of Haskell administration:


> {-# OPTIONS -fno-warn-missing-methods #-}

Now we need some quick and dirty matrix code:


> data Matrix a = M [[a]] deriving (Eq,Show)

> instance Functor Matrix where
> fmap f (M a) = M $ map (map f) a

> instance Num a => Num (Matrix a) where
> M a * M b = M $ mult a b

A Lie Group

What I'm going to do is start by constructing elements of the group of 3D rotations, otherwise known as SO(3), and show how there's another algebraic structure hidden inside it. So let's make some rotation matrices:


> rx theta = M $ [[1,0,0],
> [0,cos theta,-sin theta],
> [0,sin theta,cos theta]]

> ry theta = M $ [[cos theta,0,sin theta],
> [0,1,0],
> [-sin theta,0,cos theta]]

> rz theta = M $ [[cos theta,-sin theta,0],
> [sin theta,cos theta,0],
> [0,0,1]]

These are the three rotations around the x-, y- and z-axes. It's traditional to build arbitrary rotations through the use of Euler angles:


> euler [a,b,c] = rx a*ry b*rz c

The 3D rotations form an example of a Lie group. (A Lie group is essentially a group where the operations like multiplication are differentiable.)

Any 3D rotation can be constructed from a single application of euler. But notice how there's a bit of ugliness in this function. I've made an arbitrary choice of which order to apply the rotations in. I could have defined:


> euler' [a,b,c] = rz c*ry b*rx a

And it's easy to show that euler≠euler'. This is because rotations don't commute. In other words, for rotations, a*b≠b*a.

We can measure the non-commutativity. Remember that any rotation has an inverse. For example rx theta*rx (-theta) gives us the identity matrix because one of these two rotations 'undoes' the other. Given any two rotations we can construct what is known as their commutator:


> commutator a b = inverse a*inverse b*a*b

The idea is that we first perform b, then a, then undo b and then undo a. If a and b commute then this expression can be rearranged to inverse a*inverse b*b*a and then the inverses cancel leaving us with the identity matrix. If they don't commute then we end up with a non-identity matrix. So the comuutator measures the extent to which matrices don't commute.

As I'm feeling lazy, I don't feel like writing inverse. Instead, as I'm only going to work with rotations, I'll use the fact that the inverse of a rotation matrix is the transpose and define:


> inverse = transpose

Try playing with expressions like commutator (rx 1) (ry 2). Note how the numbers quickly get messy. Try to write down closed form expressions for applications of euler and you'll see how complex things can get.

A Lie Algebra

But there's another way to approach the rotation group - through 'infinitesimal' rotations. In my earlier article I just talked about infinitesimal group operations in a hand-wavey way. Now I'm going to make the notion more rigorous. We just need to introduce an infinitesimal number, d, whose square is zero. I've talked about this a lot before so I'm borrowing my earlier code and defining:


> d :: Num a => Dual a
> d = D 0 1

If you try it you'll see that d*d is zero.

Now we can try making infinitesimal rotations:


> rot1 = euler [d,2*d,3*d]

Note how when we evaluate this we get 'nice' numbers. No need to worry about those trig functions any more. And if you look closely at rot1 you'll see that it's essentially the identity matrix plus an infinitesimal part. We can pull the infinitesimal part out using fmap im rot1. You may be able to guess how to build it from the arguments to euler. But first, try evaluating rot2:


> rot2 = euler' [d,2*d,3*d]

It's the same! Try other infiitesimal rotations. When dealing with infinitesimal rotations it doesn't matter what order you apply them in, you get the same result. Working with infinitesimal rotations is looking much easier than working with full=size rotations. In fact, it gets better. Try defining


> rot3 = euler [5*d,-d,2*d]

Now look at fmap im (rot1*rot3) and compare with fmap im rot1 and fmap im rot3. We can multiply infinitesimal rotations simply by adding their infinitesimal parts. In fact, we can define


> star [a,b,c] = M $ [[0, -c, b],
> [c,0,-a],
> [-b,a,0]]

So we have:

fmap im (euler [a*d,b*d,c*d]) == star [a,b,c]

and

fmap im (euler [a*d,b*d,c*d]*euler [u*d,v*d,w*d]) == fmap im (euler [(a+u)*d,(b+v)*d,(c+w)*d])

Not a single trig expression anywhere!

So we have a simplified way of viewing rotations by looking at infinitesimal rotations. A triple [a,b,c] can be thought of as representing an infinitesimal rotation through star and instead of multiplying matrices we just add the triples elementwise. These triples, together with the binary operation of addition form an example of a Lie algebra. But there's a piece missing. We have lost all information about the non-commutativity of the rotations. It's one thing to simplify, but it's another to lose an important feature of what you're looking at.

The problem is that d is 'too small'. We need an infinitesimal that doesn't go to zero the moment you square it, but is still, in some sense, infinitesimally small. We could rewrite the Dual type. But there's a trick. Define:


> e :: Num a => Dual (Dual a)
> e = D (D 0 1) (D 1 0)

(If you think of Dual as forming a tensor product as I described here then e=1⊗d+d⊗1.)

You can check that e^2 is non-zero but e^3 is zero.

Now when we compute a commutator we get something a little different:


> comm1 = commutator (euler [e,0,0]) (euler [0,e,0])

fmap re comm1 is essentially the identity as before. But if we look at fmap im comm2 there's a non-zero infinitesimal piece which is different from what we had when we worked with d. This infinitesimal piece is in fact proportional to e^2. As (im . im) (e^2) is a half, we can extract the coefficient of e^2 from comm1 using


> im2 x = im (im x)/2

In fact, we have fmap im2 comm1 == star [0,0,1]. So by choosing infinitesimals that aren't too small, we haven't lost information about non-commutativity. In fact, a bit of experimentation may convince you that with:


> shrink u = fmap (e*) u

we have:

fmap im2 (commutator (euler (shrink u)) (euler (shrink v))) == star (u `cross` v)

So let's step back a bit and recapitulate all this in something approaching English:

Given a tiny rotation we can represent it as three Euler angles a, b, c, all of which are tiny. We can think of a, b and c as forming a vector [a,b,c]. When we do this, apart from an even smaller error, multiplication of rotations becomes ordinary addition of vectors and the order of rotations isn't significant. But if we choose not to ignore this small error we see that a rotation represented by u and a rotation represented by v don't quite commute and the order does matter. The size of this error is measured by the cross product of u and v. This is intuitively plausible, we'd expect that rotations defined by vectors in a similar direction would be closer to commuting, and this is reflected in the fact that the cross product is zero for parallel vectors.

So now I can fill in the missing piece from the description of the Lie algebra I gave above. The Lie algebra so(3) consists of the 3d vectors (representing infinitesimal rotations), addition of vectors (representing multiplcation of infinitesimal rotations) and the cross product (measuring the amount by which small rotations fail to commute). This picture doesn't just apply to the rotations, a similar one applies for any Lie group. We get the same pattern of a simplified form of multiplication and a binary operation that measures non-commutativity.

But you might still ask if there's something missing. What if we defined f so that f^4 is zero but f^3 isn't? Would we extract more information about the non-commutativity of rotations? Well the interesting fact, which I won't prove here, is that it doesn't. Almost everything you need to know about a Lie group can be extracted from its Lie algebra. In fact, from a group's Lie algebra you can almost recover the original group. (In fact, what you recover is its universal cover.) But notice how there are no trig formulae involved when talking about Lie algebras. Lie algebras give a really nice way to study Lie groups without getting your hands too dirty. But this isn't the only reason to study Lie algebras, many physical properties arising from symmetries are more naturally studied through the Lie algebra because Lie algebras arise from Lie groups as soon as you start differentiating things. In particular, Lie algebras play a major role in Noether's theorem, one of the cornerstones of modern theoretical physics.

In summary then, I hope I've given some flavour of Lie algebras. Using infinitesimals you can get your hands on them directly without the use of a large amount of mathematical machinery. There has been some difficult stuff here, but I'm hoping that the freedom you now have at the Haskell prompt to play with the things I've been talking about will make up for the inadequacies of my explanations.

One last word: in principle we could do the same with E8. But we'd need 248 by 248 matrices, and the equivalent of euler would need 248 parameters. (That's a coincidence, in general the number of parameters needed to define an element of a Lie group isn't equal to the dimension of the matrix, but it is for SO(3) and E8).

Appendix (the bits of code I left out above)

Defining the infinitesimals:


> data Dual a = D a a deriving (Show,Eq)

Extract the 'full-size' and 'infinitesimal' parts of a number:


> re (D a _) = a
> im (D _ b) = b

> instance Num a => Num (Dual a) where
> fromInteger i = D (fromInteger i) 0
> (D a a')+(D b b') = D (a+b) (a'+b')
> (D a a')-(D b b') = D (a-b) (a'-b')
> (D a a')*(D b b') = D (a*b) (a*b'+a'*b) 

> instance Floating a => Floating (Dual a) where
> cos (D a a') = D (cos a) (-sin a*a')
> sin (D a a') = D (sin a) (cos a*a')

> instance Fractional a => Fractional (Dual a)

Some useful matrix and vector operations:


> mult a ([]:_) = map (const []) a
> mult a b = zipWith (:) (map (dot (map head b)) a) (mult a (map tail b))

> transpose (M a) = M $ transpose' a where
> transpose' [] = repeat [] 
> transpose' (xs : xss) = zipWith (:) xs (transpose' xss)

Some simple vector operations


> dot a b = foldr (+) 0 $ zipWith (*) a b
> cross [a,b,c] [d,e,f] = [b*f-c*e,c*d-a*f,a*e-b*d]

Labels: haskell, mathematics, physics

posted by sigfpe at Sunday, April 27, 2008 0 comments

Tuesday, February 05, 2008

A Third Order Quine in Three Languages

Suppose F_nxy is a program, written in language x, that takes as input n string arguments as input, G₁,...,G_n and outputs a program in language y that is the application of the function whose source code is G₂ to the strings G₂,G₃,...,G_n,G₁. Then F_3xy(F_3xy,F_3yz,F_3zx) will be a program in language x that that outputs a program in y that computes F_3yz(F_3yz,F_3zx,F_3xy).

We can put this into practice:


q a b c=putStrLn $ b ++ [toEnum 10,'q','('] ++ show b ++ [','] ++ show c ++ [','] ++ show a ++ [')']
main=q "q a b c=putStrLn $ b ++ [toEnum 10,'q','('] ++ show b ++ [','] ++ show c ++ [','] ++ show a ++ [')']" "def q(a,b,c):print b+chr(10)+'q('+repr(b)+','+repr(c)+','+repr(a)+')'" "def e(x) return 34.chr+x+34.chr end;def q(a,b,c) print b+10.chr+'main=q '+e(b)+' '+e(c)+' '+e(a)+' '+10.chr end"

This is a Haskell program that outputs a Python program that outputs a Ruby program that outputs the original Haskell program.

Apologies for the lack of line breaks. If it's not readable, it should be possible to copy and paste that source. Also, you may need to tweak it if your OS doesn't treat character 10 as a new line.

Labels: haskell, programming, self-reference

posted by sigfpe at Tuesday, February 05, 2008 15 comments

Saturday, February 02, 2008

Purely functional recursive types in Haskell and Python

"""
This post is simultaneously Python and literate Haskell.

There is a certain truth to Greenspun's tenth law of programming. A Python project I was developing at work has slowly mutated into a compiler for a programming language without me planning it that way. Usually (I assume) compilers parse their input and construct an AST which is passed to the compiler proper. My code didn't have an AST, just a bunch of lambdas. I realised that I'd actually come across a real world example of what Wadler was talking about in Recursive Types for Free!.

In Haskell, the foldr function reduces a list using a binary function and some initial value. Suppose the function is called a and the initial value is b. Take a list, for example [1,2,3]. Now write it without using list notation, directly in terms of its constructors. Ie. 1:(2:(3:[])). foldr replaces (:) by a and [] by b. So this becomes a(1,a(2,a(3,b))). The best known example is a=(+) and b = 0 so we get 1+2+3+0 and hence the sum of the values in the list. Here is how we'd use foldr in Haskell:



> x = foldr (+) 0 [1,2,3]

The interesting thing is that anything you might want to know about a (finite) list can be extracted using foldr. There is a sense in which it the universal function on lists and all other functions can be factored through it. For example, we can implement head and tail as follows



> head = foldr const undefined
> tail x = let Just (_,t) = foldr tailHelper Nothing x in t where
> tailHelper x Nothing = Just (x,[])
> tailHelper x (Just (y,z)) = Just (x,y:z)

So if x is a list, \a b -> foldr a b x tells you everything you could want to know about the list. In other words, you can completely replace the list itself with functions like this. In fact, we can replace the list constructors with functions that build such functions:



> nil a b = b
> cons h t a b = a h (t a b)

We can use nil and cons just like [] and (:). In fact, given an element defined by



> y = cons 1 (cons 2 (cons 3 nil))

We can convert it to a conventional list via



> z = y (:) []

So foldr embeds a list as a function.

We can write the same thing in Python. (Note that Python already has a variation of foldr, called reduce.)


"""

def foldr(a,b,l):
 if l==[]:
 return b
 else:
 return a(l[0],foldr(a,b,l[1:]))

print foldr(lambda x,y:x+y,0,[1,2,3])

"""

It's surprisingly easy to implement cons and nil in Python too:


"""

def nil(a,b):
 return b

def cons(h,t):
 def _cons(a,b):
 return a(h,t(a,b))
 return _cons

l = cons(1,cons(2,cons(3,nil)))

print l(lambda x,y:x+y,0)

print l(lambda x,y:[x]+y,[])

"""

Folds can be generalised to any recursive type, not just lists. (Stricly speaking I mean recursive rather than corecursive types. Folds aren't appropriate for infinite structures.) Note how for lists, foldr takes two arguments besides the list: a two argument function and a zero argument function. Applying a fold simply replaces the list constructors (:) and [] with these functions. Generalised folds do something similar: each constructor gives rise to an argument to the fold and when the fold is evaluated, each constructor is replaced with the appropriate function. Here's an example:

Now consider a simple expression type in Haskell:



> data Expr = X | Const Int | Binop (Int -> Int -> Int) Expr Expr

This is a recursive type so it has a generalised fold associated with it. This fold will take three arguments, one for each of X, Const and Binop, and each one will take the same number of arguments as the constructor. Here it is:



> efold :: a -> (Int -> a) -> ((Int -> Int -> Int) -> a -> a -> a) -> Expr -> a
> efold x _ _ X = x
> efold _ c _ (Const a) = c a
> efold x c b (Binop f lt rt) = b f (efold x c b lt) (efold x c b rt)

efold simply replaces each constructor with an application of the matching function recursively through the entire Expr.

Anything you might want to do to an Expr can be done using efold, and many things you might naturally want to do with an Expr are particularly easy to write using it. Here the functions to (1) evaluate the expression for X equal to some Int, and (2) to determine whether or not an expression is free of references to X:



> eval x e = efold x id id e
> freeX e = efold False (const True) (const (&&)) e
> identity e = efold X Const Binop e

Now we can do the same thing we did above, replace the Expr structure with its corresponding fold. And again, I'm implementing it in Python rather than Haskell:


"""

def X():
 def _X(x,c,b):
 return x
 return _X

def Const(a):
 def _Const(x,c,b):
 return c(a)
 return _Const

def Binop(f,l,r):
 def _Binop(x,c,b):
 return b(f,l(x,c,b),r(x,c,b))
 return _Binop

def eval(x,e):
 return e(x,lambda x:x,lambda f,l,r:f(l,r))

def freeX(e):
 return e(False,lambda x:True,lambda f,l,r:l and r)

"""

So we have translated the Haskell algebraic type Expr into functional expressions in Python. Here are some examples of their use:

Evaluating X, 2 and X+2 at X=2:


"""

print eval(3,X())
print eval(3,Const(2))
print eval(3,Binop(lambda x,y:x+y,X(),Const(2)))

"""

Testing whether 10-2 and X()+2 are free of references to X():



"""
print freeX(Binop(lambda x,y:x-y,Const(10),Const(2)))
print freeX(Binop(lambda x,y:x+y,X(),Const(2)))
"""

You can even implement a version in a blend of functional and OO style:



"""
class X:
 def __call__(self,x,c,b):
 return x

class Const:
 def __init__(self,a):
 self.a = a
 def __call__(self,x,c,b):
 return c(self.a)

class Binop:
 def __init__(self,f,l,r):
 self.f = f
 self.l = l
 self.r = r
 def __call__(self,x,c,b):
 return b(self.f,self.l(x,c,b),self.r(x,c,b))

"""

Some final comments:

This can sometimes be an inefficient style of programming, especially so in a strict language. Look again at tail for the cons/nil lists. But many uses are quite efficient, and folds capture a very common design pattern.

When I wrote this post a while back I left out mention of what the main point of the paper was. This post fixes that.

Wadler's paper also describes a dual version of this for codata such as streams. But as far as I understand it's not very interesting.

It's interesting that theory about static types has something to say about programming in a dynamically typed programming language.

Just so you know, my work project doesn't look anything like the code above.

Oh...and I guess you could say this was a form of the visitor pattern. Ugh. It's hideously complicated in C++.
"""

Labels: haskell, programming

posted by sigfpe at Saturday, February 02, 2008 13 comments

Saturday, September 08, 2007

Tries and their Derivatives

This was going to be part two of three, but I'm now squashing two parts down to this one article. So this is the conclusion to what I started in my previous installment.

Previously I talked about certain things you could do with types built up recursively using addition and multiplication of types - the so-called regular tree types. But I didn't talk about types that could be built using ->, ie. function types. These types seem somehow different. The type Either a b contains data pertaining to an a or a b, and (a,b) contains data about an a and a b. But a -> b is something completely different, a function rather than a datastructure. What I want to show is that this distinction isn't so clear, and that in fact functions can often be replaced by an isomorphic datastructure. There's an obvious application for this: memoisation, and most of what I say can be found in a paper by Ralf Hinze on that subject. However, I have an ulterior motive - computing generalised antidiagonals, and my notation will be biased accordingly.

Firstly, I'll be using conventional mathematical exponentiation notation for function types, so I'll use A^B to mean the same thing as B -> A, ie. functions mapping from B to A. This will help to make some of the algebra seem more natural.

So let's start with a type like A -> (). We could also write this as 1^A. From ordinary algebra we might expect this to equal 1. Sure enough, there is (up to pointwise equality) only one function of type A -> (), and it's called const ().

Now consider A^B+C. Algebraically this is just A^BA^C. In fact, the isomorphism is given by the standard Haskell either function. So far we can take any type B, built from singletons and addition, and reduce A^B to an isomorphic type with no exponentiation. Our goal will be to generalise this - for any type B (well...as many as possible anyway) to find a type constructor T such that T[A] = A^B, and in such a way that T is defined without exponentiation. Such a T is called a (generalised) trie. Like we did for the antidiagonal we can express this through a multiparameter type class:



> class Trie a t | a -> t where
> tabulate :: (a -> b) -> t b
> apply :: t b -> (a -> b)

tabulate converts a function into a datastructure and apply performs the opposite.

Again we're going to have the same issues with the antidiagonal. Haskell can't build tries for us automatically, but various generic forms of Haskell can. I'll be using plain Haskell so we'll be doing a bit of unnecessary labour. (I did try Hinze's Generics for the Masses approach but couldn't get it to work in this context. I suspect it can be made to work with more effort.)

So let's use the same example as in my previous installment, Bool.



> data BoolTrie a = BoolTrie a a deriving Show

> instance Trie Bool BoolTrie where
> tabulate f = BoolTrie (f False) (f True)
> apply (BoolTrie f _) False = f
> apply (BoolTrie _ t) True = t

It's an easy exercise to show that apply and tabulate are mutual inverses. The BoolTrie stores the two possible values that a function Bool -> a could take. But here's the cool thing: given an expensive to compute function, f, of type Bool -> X, tabulate f is a datastructure that tells us everything we need to know about f. apply (tabulate f) gives us back our function again, but note how it never needs to call f more than once for each argument. In other words, we can define



> memo :: Trie a t => (a -> b) -> (a -> b)
> memo = apply . tabulate

which automatically converts functions into a memoised form.

Again, mimicking antidiagonals, let's implement the trie of Either a b:



> data EitherTrie u v a = EitherTrie (u a) (v a)

> instance (Trie a u,Trie b v) => Trie (Either a b) (EitherTrie u v) where
> apply (EitherTrie f _) (Left x) = apply f x
> apply (EitherTrie _ g) (Right y) = apply g y
> tabulate f = EitherTrie (tabulate (f . Left)) (tabulate (f . Right))

Next consider products. This is slightly subtler, but only slightly. We use A^BC=(A^C)^B. If we define U[X]=X^B and V[X]=X^C then A^BC=U[V[A]].



> data PairTrie u v a = PairTrie (u (v a))

> instance (Trie a u,Trie b v) => Trie (a,b) (PairTrie u v) where
> apply (PairTrie f) (b,c) = apply (apply f b) c
> tabulate f = PairTrie $ tabulate (\a -> tabulate (\b -> f (a,b)))

This all applies recursively. So let's try tackling boolean lists again. We have L = 1+2L. Define T_L[X]=X^L. So T_L[X]=X¹X^2L=X(X^L)²=X(T_L[X])². This gives a nice bona fide definition of T_L. Just as with antidiagonals, it's annoying to have to write this by hand. A good generic programming language should be able to build a BoolTrie from a [Bool] automatically:



> data BoolListTrie a = BoolListTrie a (BoolListTrie a) (BoolListTrie a)

> instance Trie [Bool] BoolListTrie where
> apply (BoolListTrie n _ _) [] = n
> apply (BoolListTrie _ f _) (False:bs) = apply f bs
> apply (BoolListTrie _ _ t) (True:bs) = apply t bs
> tabulate f = BoolListTrie (f []) (tabulate (f . (False:))) (tabulate (f. (True:)))

Here's a silly example putting this in action on a binary version of the Ackermann function:



> inc (False : bs) = True : bs
> inc (True : bs) = False : inc bs
> inc [] = [True]

> dec [True] = []
> dec (True : bs) = False : bs
> dec (False : bs) = True : dec bs

> ack [] n = inc n
> ack m [] | not (null m) = ack (dec m) [True]
> ack m n = ack (dec m) (ack m (dec n))

> ack' = curry (memo (uncurry ack))

Note how once you've used ack' on some arguments, it runs instantly when you reuse it on the same or smaller arguments.

Part 2

Tries are containers. So suppose T is the trie corresponding to the type X with T[A] isomorphic to A^X. What is T[1]? Well it's just 1^X which from basic algebra equals 1. Or do look at it another way, it's the type of all functions taking values in the type 1. There's only one such function and so clearly there is a unique object of type T[1] and T[1] must be isomorphic to 1. An element of '1' can be thought of as a type with no information in it. So if we have a container with elements of 1 in it, it's as if the slots for those elements have simply been closed off. So another way to look at T[1] is that it is a container with all of its slots closed. So for tries, T[1] must be equivalent to the trivial type 1.

At this point, derivatives of containers have become mainstream. So I don't need to talk about this and can direct you to the Haskell wikibook if you need to brush up.

So what do we get if we differentiate the trie T? T is a container that holds one value for each element of X. The derivative, T', is a trie with a 'hole' in it. In other words, it's a container that contains a value for all elements of X but one. So what's T'[1]? All of the slots in the container have been stuffed with 1 and so are effectively 'closed'. But the hole is still there and the whole hasn't been 'closed'. So a T'[1] is a trie where one slot has been singled out as a hole. But the slots are in one-to-one correspondence with elements of X, and so T'[1]=X. There's another way to see this. Define the function on the reals t(y)=y^x. Then t'(y)=xy^x-1 so t'(1)=x. So it should come as no surprise that T'[1]=X. See the Container Types blog for more dicussion of this. They call F'[1] by the name log F. Their Y can be seen as the operator that maps a type to its trie.

Now, let's go back to the antidiagonal again. I showed how to compute X², but you can see that it's tricky to use the same approach to extend this to Xⁿ, for arbitrary naturals n. Instead, consider differentiating T' to make another hole in the trie. T''[X] must be a trie of X's with two holes, but the second hole obviously can't be where the first hole was. So T''[1] is a datastructure that consists of nothing but two distinct holes in a T[1]. As if by magic, T''[1] must be X². There's another way to see this. If t(y)=y^x, then t''(y)=x(x-1)y^x-2 so t''(1)=x². And more generally, T⁽ⁿ⁾[1]=Xⁿ. And that's the solution!

So a programming language powerful enough to differentiate types automatically, and powerful enough to build tries automatically (and Type-Indexed Data Types describes implementations of both of these in Generic Haskell) allows us to automatically construct the type of n-tuples where all of the elements are distinct. I'll leave the code to you as an exercise. :-)

Let me at least do the algebra for simple binary trees.

B = 1+B²

Let T be the trie of B so T[X] is isomorphic to X^B.

X^B = X(X^B)^B

T[X] = XT[T[B]]

(Exercise: try to draw a picture of this kind of tree. Because of the nesting of T's it's not a 'regular' type and you'll find yourself rapidly needing a lot of paper!).

So now we get

T'[X] = T[T[X]]+XT'[T[X]]T'[X]
T'[1] = T[1]+XT'[1]T'[1] = 1+X(T'[1])²

Ie. T'[1]=B

T''[X] = T'[T[X]]T'[X]+T'[T[X]]T'[X]+XT''[T[X]]T'[X]²
+XT'[T[X]]T''[X]
T''[1] = 2B²+B²T''[1]+BT''[1]

And that's a perfectly respectable recursive definition for T''[1]. I'll leave you to write the code. (It does work - this is actually the point which I started from...)

Some random thoughts

You can think of differentiation as annihilating an element of a container, leaving only a hole as the 'trace' that it was there before. So it's pretty weird that annihilating a 1 from a T[1] actually creates an X. It's almost as if T[1] is some kind of vacuum and X is an anti-particle. Weirder still, when we do this twice we get two distinct X's. So it's as if T[1] is some kind of vacuum for fermionic X's. This isn't completely insane. In quantum field theory we use differential operators to create and annihilate particles.

I haven't written the code, but it ought to be possible to go further and define Xⁿ/n! for instances of Ord. This is the type of n-tuples where the elements are distinct and in order. I expect you can construct this using a similar recursive technique to the way I built up the antidiagonal.

I don't know how to get an overline with HTML so I can't use Knuth's notation for this. But I expect that for instances of Ord there's a similar method that can be used to define X(X+1)...(X+n-1)/n!. This is the type of non-decreasing n-tuples, or, equivalently, n-element multisets. I'm not sure, but I think this may conflict with what Abbott et al. say in their paper on Derivatives of Containers where they want to identify exp (X) with multisets. I think it should be the sum of X(X+1)...(X+n-1)/n!, not the sum of Xⁿ/n!. But as that's just a sketch in the final paragraph of the paper, maybe I shouldn't get too worried about it. (If you're still reading Conor...)

Also, many identities satisfied by falling factorial should correspond to isomorphisms implementable in Haskell. And I'm sure there must be an Umbral Calculus connection.

And an important final note: X -> X² is not a functor. So don't even think about differentiating it.

Appendix

Changed my mind. Here's some code for implementing X²/2!.

Call the set of pairs (a,a') with a<a' the subdiagonal. An element of (a,a) either has a lower first element, a lower second element, or the two elements are equal. This is expressed in the type Pair below.

I'll sketch some algebra. Write s(X) for the subdiagonal. So s(X) is a little like X(X-1)/2. It's not hard to see that

s(1) = 0
s(a+b) = s(a)+ab+s(b)

s(ab) = s(a)b²+as(b)

If L[X] is a list of X's, then

L[X]=1+X L[X]

s(L[X]) = XL[X]+s(X)L[X]+Xs(L[X])



> import Test.QuickCheck

> data Pair a u = Lower u | Diagonal a | Upper u deriving Show

> class Ord a => SubDiagonal a u | a -> u where
> twine :: (a,a) -> Pair a u
> untwine :: Pair a u -> (a,a)
> twine' :: (a,a) -> u
> untwine' :: u -> (a,a)
> twine (a,b) = case compare a b of
> LT -> Lower (twine' (a,b))
> EQ -> Diagonal a
> GT -> Upper (twine' (b,a))
> untwine (Lower u) = untwine' u
> untwine (Diagonal a) = (a,a)
> untwine (Upper u) = uncurry (flip (,)) $ untwine' u

> data SubEither a b u v = BothLeft u | Diff a b | BothRight v deriving Show

> instance (SubDiagonal a u,SubDiagonal b v) => SubDiagonal (Either a b) (SubEither a b u v) where
> twine' (Left a,Left a') = BothLeft (twine' (a,a'))
> twine' (Left a,Right b) = Diff a b
> twine' (Right b,Right b') = BothRight (twine' (b,b'))
> untwine' (BothLeft u) = let (a,a') = untwine' u in (Left a,Left a')
> untwine' (Diff a b) = (Left a,Right b)
> untwine' (BothRight u) = let (b,b') = untwine' u in (Right b,Right b')

> data SubPair a b u v = LeftDiffers u b b | LeftSame a v deriving Show

> instance (SubDiagonal a u,SubDiagonal b v) => SubDiagonal (a,b) (SubPair a b u v) where
> twine' ((a,b),(a',b')) | a/=a' = LeftDiffers (twine' (a,a')) b b'
> | otherwise = LeftSame a (twine' (b,b'))
> untwine' (LeftDiffers u b b') = let (a,a') = untwine' u in ((a,b),(a',b'))
> untwine' (LeftSame a v) = let (b,b') = untwine' v in ((a,b),(a,b'))

> instance SubDiagonal Bool () where
> twine' (False,True) = ()
> untwine' () = (False,True)

> data SubList a u = FirstDiffers u (Pair [a] (SubList a u))| FirstSame a (SubList a u) | LeftNil a [a] deriving Show

> instance (SubDiagonal a u) => SubDiagonal [a] (SubList a u) where
> twine' ([],(a:as)) = LeftNil a as
> twine' ((a:as),(b:bs)) | a/=b = FirstDiffers (twine' (a,b)) (twine (as,bs))
> | otherwise = FirstSame a (twine' (as,bs))
> untwine' (FirstDiffers u v) = let (a,a') = untwine' u in let (as,bs) = untwine v in (a:as,a':bs)
> untwine' (FirstSame a u) = let (bs,bs') = untwine' u in (a:bs,a:bs')
> untwine' (LeftNil b bs) = ([],b:bs)

> type Natural = Integer

> instance SubDiagonal Natural (Natural,Natural) where
> twine' (a,b) = (a,b-a)
> untwine' (a,b) = (a,a+b)

> main = do
> quickCheck (\(x,y) -> untwine (twine (x,y))==(x::Bool,y))
> quickCheck (\(x,y) -> untwine (twine (x,y))==(x::[Bool],y))
> quickCheck (\(x,y) -> untwine (twine (x,y))==(x::(Bool,[Bool]),y))
> quickCheck (\(x,y) -> untwine (twine (x,y))==(x::([Bool],Natural),y))
> quickCheck (\(x,y) -> untwine (twine (x,y))==(x::[[(Bool,Natural)]],y))

It's interesting to play with this a little and see what actually gets constructed. For example



*Main> twine ([1..10],[1,2,3,6,5,6,6,8,9,10,11::Integer])
Lower (FirstSame 1 (FirstSame 2 (FirstSame 3 (FirstDiffers (4,2) (Upper
(FirstSame 5 (FirstSame 6 (FirstDiffers (6,1) (Upper (FirstSame 8
(FirstSame 9 (FirstSame 10 (LeftNil 11 [])))))))))))))

Spans of values that are the same in both lists are represented with just one set of values. Individual differences are flagged as such. And of course the whole thing works recursively if you have lists of lists as in the last quickCheck example above.

A Closing Puzzle

Can you define the type of unordered binary trees? An unordered binary tree is either the empty tree, or an unordered pair of unordered binary trees. I'd like to say T=1+T(T+1)/2 = 1+T+T², but those can't be turned directly into a valid definition is Haskell.

I don't know if there's a solution.

Labels: haskell, types

posted by sigfpe at Saturday, September 08, 2007 7 comments

Sunday, September 02, 2007

The Antidiagonal

In a programming language with constrained types we can construct a type like "the type of pairs of X's where the two X's are distinct". But can we make such a type in Haskell? Answering this question will take us on a long journey which I think I'm going to split over three parts. And as usual, just put this blog post into a file called main.lhs and you can run it with ghci -fglasgow-exts -fallow-undecidable-instances.

We can write our question a little more formally. Given a type X, can we form a type U with the property that

X² = U+X

The idea is that the = sign is an isomorphism with the property that the diagonal in X², ie. elements of the form (x,x), get mapped to the right component of U+X. When we are able to do this, we'll call U the antidiagonal of X, and say that X is splittable.

We can express the relationship between U and X through a multiparameter type class



> diagonal x = (x,x)

> class Eq x => AntiDiagonal u x | x -> u where
> twine :: (x,x) -> Either x u
> untwine :: Either x u -> (x,x)

> twine' :: (x,x) -> u
> untwine' :: u -> (x,x)

> twine (x,y) | x==y = Left x
> | otherwise = Right $ twine' (x,y)

> untwine (Left x) = (x,x)
> untwine (Right y) = untwine' y

The isomorphism between X² and X+U is given by twine and untwine. But to save writing similar code over and over again, and to ensure we really are mapping the diagonal of X² correctly, we define these in terms of twine' and untwine'. (Note that twine' is partial, it's only guaranteed to take a value off the diagonal.)

Just to get into the swing of things, here's a simple example:



> instance AntiDiagonal Bool Bool where
> twine' (a,b) = a
> untwine' a = (a,not a)

It's not hard to check that twine and untwine are mutual inverses and that twine . diagonal=Left. You can view this as a special case of 2²=2+2 as 2 is essentially a synonym for Bool.

It looks a lot like what we're trying to do is subtract types by forming X²-X or X(X-1). Much as I've enjoyed trying to find interpretations of conventional algebra and calculus in the context of the algebra of types, subtraction of types, in general, really doesn't make much sense. Consider what we might mean by X-1. If X=Bool then we could simply define



data Bool' = False'

This certainly has some of the properties you might expect of Bool-1, such as having only one instance. But it's not very natural. How should we embed this type back in Bool? There are two obvious ways of doing it and neither stands out as better than the other, and neither is a natural choice. But what we've shown above is that there is a natural way to subtract X from X² because a copy of X appears naturally in X² as the diagonal. So the question is, can we extend this notion to types beyond Bool?

How about a theorem. (I'll give a more intuitive explanation below.)

Theorem

In any commutative semiring fix a and b. If the equation a²=u+a has a solution, and the equation b²=v+b has a solution, then the equations (a+b)²=x+(a+b) and (ab)²=y+ab also have solutions.

Proof

Simply define

x = u+v+2ab

and

y = au+bv+uv.

QED

So if some types are splittable, so are their sums and products. As types form a commutative semiring we see that this theorem, twined with 1²=1+0 allows us to form "X(X-1)" for any type X built purely using non-recursive Haskell data declarations. In fact, we can use the above theorem to define "X(X-1)" for types, and use the notation X² for this. (I hope your browser shows that the exponent here is underlined.) There's a reason I use this notation which I'll get to later. So (a+b)²=a²+b²+2ab and (ab)²=ab²+ba²+a²b².

Here's a more intuitive explanation of that theorem above. Suppose u and v are both in Either a b. Then there are several ways they could be different to each other. For example, they could both be of the form Left _ in which case u = Left u' and v = Left v' and u' and v' must be distinct. They could be of the form Left u' and Right v' in which case it doesn't matter what u' and v' are. A little thought shows there are four distinct cases in total. These can be written in algebraic notation as u+v+ab+ba=u+v+2ab. To make this clearer, here's an implementation:



> data Either' a b u v = BothLeft u | BothRight v | LeftAndRight Bool a b

> instance (AntiDiagonal u a,AntiDiagonal v b) => AntiDiagonal (Either' a b u v) (Either a b) where
> twine' (Left x,Left y) = BothLeft (twine' (x,y))
> twine' (Right x,Right y) = BothRight (twine' (x,y))
> twine' (Left x,Right y) = LeftAndRight False x y
> twine' (Right x,Left y) = LeftAndRight True y x

> untwine' (BothLeft u) = let (a,b) = untwine' u in (Left a,Left b)
> untwine' (BothRight v) = let (a,b) = untwine' v in (Right a,Right b)
> untwine' (LeftAndRight False x y) = (Left x,Right y)
> untwine' (LeftAndRight True x y) = (Right y,Left x)

A similar argument can be carried through for (a,b) leading to:



> data Pair' a b u v = LeftSame a v | RightSame u b | BothDiffer u v deriving (Eq,Show)

> instance (AntiDiagonal u a,AntiDiagonal v b) => AntiDiagonal (Pair' a b u v) (a,b) where
> twine' ((a,b),(a',b')) | a==a' = LeftSame a (twine' (b,b'))
> | b==b' = RightSame (twine' (a,a')) b
> | otherwise = BothDiffer (twine' (a,a')) (twine' (b,b'))

> untwine' (LeftSame a v) = let (b,b') = untwine' v in ((a,b),(a,b'))
> untwine' (RightSame u b) = let (a,a') = untwine' u in ((a,b),(a',b))
> untwine' (BothDiffer u v) = let (a,a') = untwine' u
> (b,b') = untwine' v
> in ((a,b),(a,b'))

This all works very well, but at this point it becomes clear that Haskell has some weaknesses. The type Either () () is isomorphic to Bool so we should be able to use the above code to construct the antidiagonal of Bool automatically. But Haskell doesn't give us access to that information. We can't ask, at runtime, if Bool is the sum of more primitive types. There are a number of solutions to this problem - we can use various more generic types of Haskell, or use Template Haskell. But I'm just going to stick with Haskell and manually construct the antidiagonal. (I wonder what Coq has to offer here.)

So I've solved the problem for 'finite' types built from singles, addition and multiplication. But what about recursive types. Before doing that, let's consider an approach to forming the antidiagonal of the naturals. Haskell has no natural type but let's pretend anyway



> type Natural = Integer

There's an obvious packing of a distinct pair of naturals as a pair of naturals:



> instance AntiDiagonal (Bool,Natural,Natural) Natural where
> twine' (a,b) | compare a b == GT = (False,a-b,b)
> | compare a b == LT = (True,b-a,a)
> untwine' (False,d,b) = (b+d,b)
> untwine' (True,d,a) = (a,a+d)

(I had to use compare to work around a blogger.com HTML bug!) It'd be cool if the code above could have been derived automatically, but alas it's not to be. But we can get something formally similar. Define the natural numbers like this:



> data N = Zero | S N deriving (Eq,Show)

A natural is zero or the successor of a natural. Algebraically this is just N=1+N. Now we wish to find N². Using 1²=0 and the earlier theorem we get M=1²+N²+2N, ie. M=M+2N. Let's code this up:



> data M = Loop M | Finish Bool N deriving Show

> instance AntiDiagonal M N where
> twine' (Zero,S x) = Finish False x
> twine' (S x,Zero) = Finish True x
> twine' (S x,S y) = Loop (twine' (x,y))
> untwine' (Finish False x) = (Zero,S x)
> untwine' (Finish True x) = (S x,Zero)
> untwine' (Loop m) = let (a,b) = untwine' m in (S a,S b)

Note that I've more or less just coded up the same thing as what I did for Either above. Can you see that that this is a disguised version of the code for the antidiagonal of Natural above? Think about the type N=1+N. We can view this as the set of paths through this finite state diagram, starting at N and ending at 1:

Essentially there's just one path for each natural number, and this number counts how many times you loop around. Now consider the same sort of thing with the type M=M+2N, starting at M and ending at 1:

We can describe such paths by the number of loops taken at M, the number of loops taken at N, and a Bool specifying whether we took path 0 or 1 from state 2 to state N. In other words, M is much the same thing as (Bool,Natural,Natural) above! M is a kind of 'compressed' version of a pair of N's. Suppose we want to twine S (S (S Zero)) and S (S (S (S Zero))). Both of these share a S (S (S _)) part. What the type M does is allow you to factor out this part (that's the part that goes into Loop) and the remainder is stored in the Finish part, with a boolean specifying whether it was the first or second natural that terminated first.

Let's step back or a second. Earlier I showed how for any type X, built from addition, multiplication and 1, we could form X². Now we've gone better, we can now form X² even for recursive types. (At least for data, not codata.) We haven't defined subtraction in general, but we have shown how to form X(X-1) in a meaningful way.

Let's try lists of booleans next, [Bool]. We can write this type as L=1+2L. Let's do the algebra above to find P = L²:

P = L²
= (1+2L)²
= 1²+(2L)²+4L
= (2L²+2²L+2²L²)+4L
= 2P+2P+2L+4L
= 4P+6L

In other words P = 4P+6L. The easiest thing is to code this up. Remember that each '=' sign in the above derivation corresponds to an isomorphism defined by twine and untwine so after lots of unpacking, and rewriting 4P+6L as 2L+2L+2P+2L+2P we get



> data SharedList = LeftNil Bool [Bool] | RightNil Bool [Bool]
> | HeadSame Bool SharedList | TailSame Bool [Bool]
> | Diff Bool SharedList deriving Show

> instance AntiDiagonal SharedList [Bool] where
> twine' ([],b:t) = LeftNil b t
> twine' (b:t,[]) = RightNil b t
> twine' (a:b,a':b') | a==a' = HeadSame a (twine' (b,b'))
> | b==b' = TailSame a b
> | otherwise = Diff a (twine' (b,b'))

> untwine' (LeftNil b t) = ([],b:t)
> untwine' (RightNil b t) = (b:t,[])
> untwine' (HeadSame a b) = let (t1,t2) = untwine' b in (a:t1,a:t2)
> untwine' (TailSame a b) = (a:b,not a:b)
> untwine' (Diff a b) = let (t1,t2) = untwine' b in (a:t1,not a:t2)

This looks pretty hairy, but it's really just a slight extension of the M=M+2N example. What's happening is that if two lists have the same prefix, then SharedList makes that sharing explicit. In other words this type implements a form of compression by factoring out shared prefixes. Unfortunately it took a bit of work to code that up. However, if we were programming in generic Haskell, the above would come absolutely for free once we'd defined how to handle addition and multiplication of types. What's more, it doesn't stop with lists. If you try it with trees you automatically get factoring of common subtrees and it works with any other datatype you can build from a Haskell data declaration (that doesn't use builtin primitive types like Int or Double).

So now I can say why I used the underlined superscript notation. It's the falling factorial. More generally Xⁿ=X(X-1)...(X-n+1). You may be able to guess what this actually means - it's an n-tuple with n distinct elements. Unfortunately, you can probably see that generalising the above theorem to n from 2 gets a bit messy. But (and this is what I really want to talk about) there's an amazing bit of calculus that allows you to define a generating function (more like generating functor) that gives you all of the Xⁿ in one go. But before I can talk about that I need to write a blog about generalised tries...

Here are some exercises.

(1) Can you code up the antidiagonal of binary boolean trees, T = 2+T²:



data BoolTree = Leaf Bool | Fork BoolTree BoolTree

(2) There are more efficient ways to define the naturals than through the successor function. Can you come up with a more efficient binary scheme and then code up its antidiagonal?

(3) The antidiagonal of the integers can be approximated by (Integer,Integer). This seems a bit useless - after all, the whole point of what I've written above is to split this up. But we can use this approximation to construct approximations of other types where you do get a payoff. Implement an approximation to [Integer]² this way so that you still get the benefit of prefix sharing. This looks a lot like traditional tries.

Labels: haskell, types

posted by sigfpe at Sunday, September 02, 2007 5 comments

Saturday, July 14, 2007

Data and Codata

In Gödel, Escher, Bach, Hofstadter introduces the programming language Bloop. Bloop is a little like BASIC except that it forces programmers to specify in advance how many times each loop will iterate. As a result, Bloop programs have the handy property that they are guaranteed to terminate. Unfortunately this property also makes it impossible to write something like an operating system in Bloop. To write an OS we need open-ended loops that keep running until the user explicitly chooses to shut the OS down. One solution to this is to write code in Floop. Floop allows us to write unbounded loops, the equivalent of C's while (1) { ... }. The problem with that, however, is that we can write runaway infinite loops that never terminate and never give us any output. Is there some language that lies between Bloop and Floop that can give us unbounded looping when we need it, but which never allows us to hoist ourselves by our petards by writing runaway loops?

Wishing no disrepsect to OS writers, at first blush it might seem that the distinction between a runaway loop and an idle OS is too fine - if we can write an infinite loop that does something useful, then surely we can write a useless one too. But it turns out that there is a very elegant and well-principled way to distinguish between these kinds of loops, and this allows us to write open-ended interactive software in a programming language that nonetheless always produces a well-defined output, no matter what the input. In order to do this we need to distinguish between two kinds of data: data and codata. By ensuring that a function expecting codata never receives data, and vice versa, we can ensure that even programs with open-ended loops always produce a well defined output.

The concepts I want to talk about are very general and can apply to whatever programming language you use. I'm going to use some simple Haskell examples but most of these will translate to other languages. So consider something like


sum [] = 0
sum (a:as) = a + sum as

This sums the elements of a list. Note how it's well behaved as long as we give it a finite list as input. On the other hand, consider


sum' [] = 0
sum' a = sum' (1:a) - 1

This isn't well behaved at all. Except when you input empty lists, it never gives a result. From a mathematical perspective it's not a good definition either, there are many functions that satisfy these two properties. Is there some general principle at work here that allows us to tell immediately that one of these terminates and the other doesn't? We know from Turing that there is no procedure that guarantees we can always tell such programs apart, but in this case there is something that we can easily point to. In the first program, the right hand side of the second line uses the sum function recursively but we only apply it to a strict subpart of the input, its tail in fact. In the second example we apply sum' to something that contains the argument. The former function is using what is known as structural recursion, and it's not hard to convince yourself that structural recursion, applied to finite datastructures, always terminates.

So if we limit ourselves to structural recursion we can guarantee our programs will always terminate. But what about a definition like:


fact 0 = 1
fact n = n * fact (n-1)

That doesn't appear to use structural recursion. But we can view it as such like this. Define the natural numbers as follows:


data Nat = Zero | S Nat

0 is represented as Zero, 1 is represented as S Zero and so on. We can represent every natural this way. Here's the important thing: if n>0 then n-1 is simply a subpart of n. So we can view this kind of recursion as a type of structural recursion.

(In fact, by a curious quirk of the Haskell 98 standard we can rewrite our definition to look more like a structural recursion:


fact' 0 = 1
fact' (n+1) = (n+1) * fact' n

I'm guessing this feature is in Haskell precisely so that people can 'pretend' they are using structural recursion with the + in n+1 serving a role as a kind of pseudo-constructor.)

So we have a nice rule for ensuring our code terminates. But sum fails when applied to infinite lists. Should we simply rule out infinite datastructures? That seems a bit drastic. The example that convinced me to look into Haskell was


fib = 1 : 1 : zipWith (+) fib (tail fib)

We really don't want to rule out such a succinct definition of the Fibonacci numbers. But how can we allow such structures when we have functions like sum sitting around. Applying sum to fibs will obviously cause non-termination.

Let's consider another example:


sumSoFar x [] = [x]
sumSoFar x (y:ys) = x : sumSoFar (x+y) ys

Like sum, this fails to terminate for an infinite input. But unlike sum, it's possible to make sense of it. If the inputs were 0 and the infinite list [1,1,1,1,...] then the result would be [0,1,2,3,...]. The program might not terminate, but from a mathematical perspective this is a completely well defined function. What's more, suppose the input list represented a stream of data being input at a keyboard, and that the output was displayed on a screen one element at a time, then we'd have a simple cash register. This program might not terminate, but it's completely well behaved. Note that this could only work in a lazy language. A strict language would want to evaluate the entire list before outputting anything. But in a lazy language we can start outputting the beginning of the list before the rest of it is computed.

From the point of view of using infinite lists, it's sum that's badly behaved, and sumSoFar that's well behaved. Again, can we point to a simple distinction between these two programs that explains this? Turns out we can, and in some sense it's dual to what we said above. sumSoFar is well behaved because when we recursively call sumSoFar on the right hand side we do so from inside a list constructor. (Remember that : is the list constructor.) This is known as guarded recursion and it guarantees that even though our programs don't terminate, they still define a unique mathematical function and result in a well behaved program. In the case of sumSoFar, each time we look at another element of the result, we trigger another lazy evaluation that terminates. But the right hand side won't just run off and compute endlessly until we do that triggering because the recursion is 'guarded' within a constructor. (That, by the way, was the main point of this article, so you can probably relax now.)

Note the duality: in structural recursion we 'deconstruct' the argument and then we're allowed to recurse. In guarded recursion we recurse first, and then we're allowed to use the constructor.

So we've almost achieved our goal of describing rules to allow open-ended loops because we've managed to give a rule for writing functions that are guaranteed to be well-behaved even though they manipulate infinite data structures. But we're not quite home yet - we still can't have functions like sum coexist with infinite lists. How can we ensure that an infinite list is never handed to sum?

Consider the definition of Haskell lists. It's something like this:


data [a] = [] | a : [a]

I.e. a list of a's is either the empty list [] or it's made from an a and a list of a's.
You can think of this as an equation in [a]. In Haskell we take this as uniquely defining what [a] is, but in reality there is more than one solution to this equation. Consider the type consisting of only finite lists. That satisfies this equation. A finite list is either an empty list, or an element followed by a finite list. Similarly a possibly infinite list is either an empty list, or an element followed by a possibly infinite list. There is an ambiguity. Finite lists form, what is in some sense, the smallest possible solution to this equation. The possibly infinite lists form the largest possible solution. Haskell takes the largest possible solution.

But suppose we were to distinguish between these two different solutions. We could use the keyword data to mean the smallest solution and codata to mean the largest solution. The former represents data, and it's always finite. The latter represents what we call codata, and it's possibly infinite. And now we can refine our rules for well-behavedness. Consider data and codata to be distinct types. In a strongly typed language this immediately allows us to restrict sum to data, not codata. The rule is: you're only allowed to use structural recursion with data and guarded recursion with codata. With that rule, we're guaranteed that our recursions will always be safe, and yet that we can still have open-ended loops in our code. Sometimes these are called recursion and corecursion respectively.

And now we can go a little crazy with language. When we want to prove that a structurally recursive program terminates we use induction. This doesn't work straightforwardly for corecursion, so instead we use a principle called coinduction. Recursive programs typically terminate. Corecursive programs don't necessarily terminate, but they're still well-behaved as long as they keep on going whenever we give them input. We can call this cotermination. And so on... I'm not going to say what coinduction is because I'd have to talk about bisimulation, and this post would get way too long.

So now a mathematical aside. The reason for all the "co"s is that data and codata have categorical descriptions and they turn out to be dual to each other. But you don't hear mathematicians talking about corecursion and coinduction much. Why not? Well one of the axioms of set theory is the Axiom of Foundation. One way of interpreting this is the statement that there is no infinite sequence

...a₃∈a₂∈a₁∈a₀.

So even though we can construct infinite lists in mathematics, we can't construct 'infinitely deep' lists. This means that in mathematics we can use a form of structural recursion. In fact, the familiar principle of induction follows from this. So for many of the things that mathematicians do, induction works fine. But if we decide to use a non-standard variation of set theory where the axiom of foundation doesn't hold we can no longer use recursion. For example the Axiom of Extension says that two sets are equal if their elements are equal. This is a recursive definition, but it's useless in the presence of a set a such that a∈a. At this point mathematicians need a principle of coinduction. And for more on that, I refer you to Vicious Circles.

Oh...time for a quick rant. Over the years I've seen a few people argue that there's something fundamentally wrong with the notion of the algorithm because it doesn't apply to the kind of open-ended loop we see in operating systems and interactive applications. Some have even gone further to suggest that somehow mathematics and computer science are fundamentally different because mathematics can't seek to describe these kinds of open-ended phenomena. As I've tried to show above, not only are there nice ways to look at open-ended computations, but from a mathematical perspective they are precisely dual, in the category theoretical sense, to terminating computations. It may be true that mathematicians sometimes spend more time with things, and computer scientists with cothings. But this really isn't such a big difference and the same langiage can be used to talk about both.

I learnt all this from a variety of sources including Turner's paper on Total Functional Programming. My original motivation for trying to understand this stuff came from this post on Ars Mathematica. Curiously, that post had a significant effect on me. It made me realise there was this entire co-universe out there that I knew nothing about, and I was forced to make a real effort with learning Haskell because Haskell was the programming language closest to the mathematical notation used to talk about this co-universe. (Thanks Walt, but sorry to everyone who wants to see less Haskell here!)

I didn't really get why anyone would want to distinguish between data and codata until I saw some clues in some slides by Altenkirch and some comments by Conor McBride. I'm still not sure anyone says it quite as clearly as this: distinguishing between data and codata means we can allow the coexistence of infinite lists, structural recursion and open-ended loops, without risk of causing bad behaviour.

Labels: haskell, types

posted by sigfpe at Saturday, July 14, 2007 26 comments

Saturday, June 30, 2007

Monads from Algebra and the the Gray Code from Groups

There is a close association between algebraic structures and monads. I've mentioned this in passing before but I think it's interesting to work out some details. Among other things, it gives us a way of converting your favourite algebraic structures into domain specific languages embedded in Haskell.

First an aside. Haskell monads are in some sense only an approximation to mathematical monads. In fact, Haskell is only an approximation to mathematics. It's easy to define a Haskell function f, say, such that x == y but f x /= f y. Once you do such a thing you can no longer reason about Haskell functions with the assumption that == represents equality in the mathematical sense. (For an example, see Saizan's comment here.) So in the following I'm going to assume that we've limited ourselves to functions such that x == y implies that f x == f y, x < y and y < z imply x < z and so on. I'd love to see some way, in Haskell, to make explicit 'promises' about the properties of functions. (I guess that's what they call programming by contract.) But for the following we'll just assume it holds.

The first abstract algebraic structure that people study is usually the group, but I'm going to start with something simpler, the monoid. A monoid is a set M (called the underlying set of the monoid) with a binary operator, ·:M×M→M and identity element e such that e·x = x·e = x and x ·(y·z) = (x·y)·z for all x,y,z in M. By abuse of notation we'll often use M for the name of the monoid as well as the underlying set. Monoids are easy to come by and there are lots of examples. Any time you have a sequence of operations that can be applied, or not, in any order, to some system, you have a monoid. Sequences of operations form the underlying set of the monad and the binary operator means "do what's on the right" followed by "do what's on the left". For example the set of uniform scaling operations we can perform on a circle form a monoid. Write a scaling by a factor of x as s(x). If we double the size of a circle, and then triple it we have s(2) followed by s(3), written s(3)·s(2). Note that we have s(m)·s(n) = s(mn) and that s(1)·s(m)=s(m)·s(1)=s(m) so s(1) plays the role of the identity, e. Also note that for all m apart from 0 we have s(m)·s(1/m)=e so every scaling apart from s(0) has in inverse. Note however that this monoid is special because for all x and y in it, x·y=y·x. This doesn't always hold in monoids and when it does hold our monoid is said to be commutative.

If we have a type of algebraic structure then we can often form "free" versions of that structure. For monoids we get free monoids. There are many different ways of looking at "freeness" and I'm going to go through some of them informally:

Given any monoid there are equations satisfied by its elements. From the above example we have that x·y=y·x and also that x·e·x=x·x. But notice how there is a big difference between these equations. The former doesn't hold in all monads, but the latter does. A monoid is said to be free when the only equations that hold follow from equations that hold in every monoid. You can think of a free monoid as being a generic monoid. It has no special properties above and beyond being a monoid. Given a set S, the free monoid generated by S is the smallest free monoid containing the set S and with no equations relating the elements of S. Write this monoid as FS. For example, suppose S = {x,y}. Then we know that e, x and y are all in FS. We also know that x·x, x·y, y·x and y·y are all in S. Importantly we know that all 4 of these elements are distinct because we know there can be no equations connecting them beyond those that define a monoid. In fact, it's not hard to see that the elements of FS are essentially just the (possibly zero length) strings of x's and y's.
Given a set S, consider its elements to be 'labels' for unknown variables. In fact, just use the elements of S as variables. Then we can also consider the free monoid generated by S to be the set of "monoid expressions" in these unknowns. By "monoid expression" we just mean functions of these variables that can be written using the language of monoids, ie. the symbols e and ·. If S = {x,y} then examples of such expressions are e, x, y, x·y and so on. I hope it's not hard to see that this is simply another description of the same thing as in the previous paragraph.
Another way to think of free monoids requires a tiny bit more algebra. Define a monoid homomorphism from one monoid, M, to another, N, to be a function f:M→N on the underlying sets such that f(e)=e and f(x·y)=f(x)·f(y). A bijective monoid homomorphism is called an isomorphism. If there is an isomorphism between two monoids then in some sense they are the same monoid. Note that e and · are being overloaded here - we have two different monoids and these symbols have different meanings in each one. Now we can define the free monoid generated by S to be the monoid, M, such that (1) there is a function i:S→M (2) given any monoid N, and any function f:S→N, then f can be factored as f' o i where f' is a monoid homomorphism.
Very vaguely, the set S lives in the category of sets. But FS lives in the category of monoids. FS is the closest approximation to S in the category of monoids. The previous property gives a sense of what this means: anything that can be said about a function from S to a monoid N can be said in the language of monoids using a homomorphism from FS to N. Don't worry if this is too vague as I won't be using it below - but it may click in some people's minds.

The important thing here is that the operator F, ie. the "free monoid generated by" operator, forms a monad. It's tempting to think that monads got their name from monoids for this reason, but I don't think it is because just about any "free X generated by" operator forms a monad. Haskell monads give rise to DSLs via do-notation. So this means that algebraic structures give rise to monads, which may give rise to Haskell monads, and hence DSLs.

So now to explain why F forms a monad. Note that just about everything I say here works with other kinds of algebraic structures ranging from monoids through groups to vector spaces. Consider our set S={x,y} above. For convenience, let's drop the · and write the binary operator as multiplication in the usual way. Given an element of S we can easily get an element of FS. In fact, we have an embedding i:S→FS with i(x) = x. This is an abuse of notation. The element on the LHS is an element of S and the x on the RHS is a variable labeled with the symbol x, but we also write this as x because v_x, or something like that, would be a pain in the ass to keep writing.

Now think about the elements of F(FS). These are strings of elements of FS, ie. strings of monoid expressions in x and y. We could write a typical member as (xyx)(e)(xxy) where the xyx, e and xxy are each elements of FS and I'm using parentheses to make clear which bits they are. It's tempting to say that this is simply xyxxxy, but that would be wrong. The former is an element of F(FS) and the latter is an element of FS. It would be clearer that these weren't equal if we used the v notation to write v_xyxv_ev_xxy. But in this case the oscurity is to our advantage. Even though (xyx)(e)(xxy) doesn't equal xyxxxy, the notation strongly suggests that we define a function that maps the first to the second. In fact, we can define a function m:F(FS)→FS that essentially just erases parentheses.

So we have functions i:S→FS and m:F(FS)→FS. Looks suspiciously like a monad. In fact, i and m satisfy the monad laws (exercise!) and make F into a monad.

So what monad is this? It's probably not hard to guess: elements of FS are finite strings of elements of S. So it's essentially the List monad. Unfortunately, Haskell allows you to form infinite lists and so the correspondence isn't 100% precise. Nonetheless, it's good enough that if you hadn't already invented the List monad (as in Haskell monad), you probably would if you had considered the free monoid monad (as in mathematical monad). i is just the embedding \x -> [x] and m is concat, which essentially just erases brackets. So you can think of [x,y,z] has the Haskell way to write the monoid element xyz.

In fact, if you repeat the above discussion with other algebraic structures that I and others have discussed you'll get other familiar monads (modulo details like being able to create infinite lists). Here's a table (with apologies for bad formatting caused by blogger.com):

Algebraic Structure DSL

Monoid Combinatorial search

M-Set "Write-only" side effects

Vector Space Probability theory/Quantum Mechanics

Semirings Two player game tree search

Modules over tropical semiring^* Min-sum algorithm

Group ???

(Note that the above table is approximate in the sense that sometimes you may need to restrict to instances of Eq and tweak the code slightly to make the Haskell monad behave like the mathematical one.)

Notice the gap in the "group" row. A group is a monoid in which every element has a left and right inverse. The monad is the "free group generated by" monad and I'll call it G. It doesn't correspond to any of the standard Haskell monads. So firstly - if free monoids give lists, what does the underlying datatype for free groups look like? Go back to our set S = {x,y}. GS contains e, x and y as well as all of the products of x and y that appear in FS. But additionally GS contains the inverses of x and y, x^-1 and y^-1. And of course you need all strings of x, y and their inverses. But do you need more? What about (xy)^-1y? Well we can expand out the inverse using (xy)^-1=y^-1x^-1. The net effect is that the free group contains precisely all (possibly empty) strings of x, y, x^-1 and y^-1, where substrings like xx^-1 and x^-1x are removed. We can model this in Haskell using the following type:



> import Control.Monad
> data Group a = G [Either a a] deriving Show

We're using Left x to represent x and Right x to represent x^-1. We implement >>= so that it uses the inverse of product formula above. It also ought to cancel out terms like xx^-1 but to do that requires that we restrict the monad to instances of Eq and if I do that I risk castigation from Saizan. So I'll leave out the cancellation to get the following Haskell monad:



> instance Monad Group where
> return x = G [Left x]
> G x >>= f = G $ concatMap g x where
> g (Left y) = let G u = f y in u
> g (Right z) = let G u = f z in reverse (map (either Right Left) u)

But what purpose might this serve? Try these:



> test1 = sequence $ replicate 4 (G [Left 0,Left 1])
> test2 = sequence $ replicate 4 (G [Left 0,Right 1])

The first does something almost identical to what you might expect from the ordinary list monad. But note the ordering in the second. This is the Gray code monad! By using Left and Right we can control the direction in which future combinatorial searches are carried out. Admittedly not the most useful monad, but it's curious that we do get something recognisable from this abstract nonsense. And maybe you spotted the "beautiful" Haskell implementation of the power set function on reddit recently. Here's a cute twist on that:



> powerset1 = filterM (const $ G [Left False,Left True])
> powerset2 = filterM (const $ G [Left False,Right True])

The first gives a result similar to the original, but the second lists the subsets in such a way that each element differs from the previous one by one element.

I wonder what other algebarically motivated monads are waiting to be discovered.

And I hope to write something about deriving comonads from coalgebras, as soon as I've read this. It looks really intriguing. Problem is, I'm having trouble making sense of it.

And sorry, this is a bit incoherent today. I only get a short period of time to write this stuff up and I blew most of mine this week in traffic jams. As usual, if the text makes no sense, you can be sure the code above works as I just tested it. But ask questions and complain about errors...

^* The tropical semiring is the proper name for the (R,min,+) semiring I talked about earlier.

Update: Saizan's doing a great job of keeping me honest. He noticed that I'd omitted a step. I was originally finding the inverse of a free group element by simply reversing the elements in the product. But I was failing to flip the polarities of each of the elements (so to speak) despite having spelled out in the text exactly what I needed to do. The code is now fixed.

Labels: haskell, monad

posted by sigfpe at Saturday, June 30, 2007 14 comments

A Neighborhood of Infinity

Saturday, February 11, 2012

Using Lawvere theories to combine effects

Saturday, June 14, 2008

Categories of polynomials and comonadic plumbing

Sunday, April 27, 2008

Infinitesimal rotations and Lie algebras

A Lie Group

A Lie Algebra

Appendix (the bits of code I left out above)

Tuesday, February 05, 2008

A Third Order Quine in Three Languages

Saturday, February 02, 2008

Purely functional recursive types in Haskell and Python

Saturday, September 08, 2007

Tries and their Derivatives

Part 2

Some random thoughts

Appendix

A Closing Puzzle

Sunday, September 02, 2007

The Antidiagonal

Theorem

Proof

Saturday, July 14, 2007

Data and Codata

Saturday, June 30, 2007

Monads from Algebra and the the Gray Code from Groups

About Me

Previous Posts

Archives