Saturday, June 14, 2008
Categories of polynomials and comonadic plumbing
Suppose you want to introduce a new constant, say 1.23456, and make it available throughout a Haskell program. Here are two ways you might do it:
- Simply introduce a new global constant. You could name it x and write something like x=1.23456 and refer to x throughout your code. This has the advantage of being easy to implement.
- Write all of your code in monadic style and make use of the reader monad. This is intrusive in the sense that you may have to make many changes to your code to support it. But it has the advantage that all of your functions now explicitly become functions of your global constant.
Now I’m going to roughly sketch a more categorical view of both of these approaches. So let’s restrict ourselves to the subset of Haskell that corresponds to typed lambda calculus without general recursion so that we know all of our functions will be total and correspond to the mathematical notion of a function. Then all of our functions become arrows in the category that we’ll call Hask.
Firstly consider approach (1). Suppose we want to introduce a new constant, x, of type A. Category theory talks about arrows rather than elements of objects, so instead of introducing x of type A, introduce the function x:1->A where 1 is the terminal object in Hask, normally called (). An element of A is the same thing as an element of 1->A, but in the latter case we have an arrow in the category Hask.
Before continuing, let me digress to talk about polynomials. Suppose we have a ring (with an identity) R. We define R[x], where x is an indeterminate, to be the ring of polynomials in x. Another way to describe that is to say that R[x] is the smallest ring containing R and an indeterminate x, that makes no assumptions about x other than those required to make R[x] a ring. For example we know that (1+x)(1-x)=1-x², because that must hold in any ring. Given a polynomial p in R[x] we can think of it as a function f_p from R to R. f_p(a) is the value we get when substituting the value of a for x in p. So a polynomial in R[x] is the same as a function from R to R that can be written in terms of elements of R, multiplication and addition.
We can do the same with category theory. Given a category A we can ask for the smallest category extending A and containing an indeterminate arrow x:1 -> A. Just as with polynomials we have to allow all possible arrows that can be made by composing arrows of A with x. The resulting expressions for arrows will contain x as a free variable, just like the way x appears in polynomials. In fact, by analogy we can call the resulting category, A[x], the category of polynomials in x:1->A. In the special case A=Hask, you can see that Hask[x] is the category of Haskell functions extended by a new constant of type x:1->A but assuming no equations other than those necessary to make Hask[x] a category. Just as an arrow in Hask is a Haskell function, an arrow in Hask[x] is a Haskell function making use of an as yet undefined constant x.
(I've glossed over some subtleties. Just as we need a suitable equivalence relation to ensure that (1+x)(1-x)=1-x² in R[x], we need suitable equivalence relations in our category. I'll be showing you where to find the missing details later.)
Here's the implementation of a function, h, making use of a constant x:
(Note that I'll be using Edward Kmett's category-extras shortly so I need some imports)
> import Control.Monad.Reader
> import Control.Comonad
> import Control.Comonad.Reader
> x = 1.23456
> f a = 2*a+x
> g a = x*a
> h a = f (g a)
> test1 = h 2
Now consider the second approach. The easiest thing is to just give an implementation of the above using the reader monad:
> f' a = do
> x <- ask
> return $ 2*a+x
> g' a = do
> x <- ask
> return $ x*a
> h' a = return a >>= g' >>= f'
> test2 = runReader (h' 2) 1.23456
Note how, as is typical in monadic code, I have to plumb f' and g' together using >>= so that 1.23456 is passed through f' and g'. Previously I've described another way to think about the composition of monadic functions. Using >>= we can compose functions of type a->m b and b->m c to make a function of type a->m c. The result is that given a monad we can form the Kleisli category of the monad. The objects are the same as in Hask, but an arrow from a to b in the Kleisli category is an arrow of type a->m b in Hask. It's not hard to show this satisfies all of the axioms of a category. When we program in the reader monad it's a bit like we've stopped using Hask and switched to the Kleisli category of the reader monad. It's not quite like that because we used functions like +. But in theory we could use lifted versions of those functions too, and then we'd be programming by composing things in the Kleisli category. If we call the reader monad R then we can call the corresponding Kleisli category HaskR. (Strictly speaking that R needs a subscript telling us the type of the value we intend to ask for.)
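For instance, the plumbing in h' is really just composition in this Kleisli category. Here's a quick sketch (the names h2 and test2' are mine, and this isn't part of the literate code above) using the Kleisli composition operator >=> from Control.Monad:

h2 = g' >=> f'                      -- compose g' and f' as Kleisli arrows
test2' = runReader (h2 2) 1.23456   -- 1.23456 is threaded through for us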
So here's the important point: Hask[x] is the same category as HaskR. In both cases the arrows are things which, when supplied with a value of the right type (like 1.23456), give arrows in Hask from their source object to their target object.
But there's another way to do this. We can use the reader comonad:
> f'' a = 2*extract a+askC a
> g'' a = extract a*askC a
> h'' a = a =>> g'' =>> f''
> test3 = runCoreader (h'' (Coreader 1.23456 2))
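In case you don't have category-extras to hand, here's a rough sketch of the reader comonad machinery being assumed above. These definitions are mine, modelled on the names used in the code rather than taken from the library's source, and they aren't part of the literate program:

data Coreader r a = Coreader r a

runCoreader :: Coreader r a -> a
runCoreader (Coreader _ a) = a

askC :: Coreader r a -> r
askC (Coreader r _) = r

extract :: Coreader r a -> a
extract = runCoreader

-- cobind/extend: apply a whole-context function at the focused value,
-- keeping the environment r around
(=>>) :: Coreader r a -> (Coreader r a -> b) -> Coreader r b
w@(Coreader r _) =>> f = Coreader r (f w)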
In a similar way, we're dealing with arrows of the form w a -> b and we can compose them using =>>. These arrows form the coKleisli category of the reader comonad, S, which we can write HaskS. So we must have
Hask[x]≅HaskR≅HaskS.
Now some back story. Over 20 years ago I was intrigued by the idea that logic might form a category with logical ‘and’ and ‘or’ forming a product and coproduct. I came across the book Introduction to Higher Order Categorical Logic by Lambek and Scott for £30.00. That’s £60.00 at today's prices, or about 120ドル.00. On a student grant? What was I thinking? And as it bore no relation to anything I was studying at the time, I barely understood a word of it. I was probably fairly applied at that point doing courses in stuff like solid state physics and electromagnetism as well as a bit of topology and algebra. I doubt I'd heard of lambda calculus though I could program in BASIC and APL. So there it sat on my bookshelf for 22 years. Periodically I’d look at it, realise that I still didn’t understand enough of the prerequisites, and put it back on the shelf. And then a month or so ago I picked it up again and realised that the first third or so of it could be interpreted as being about almost trivial Haskell programs. For example, on page 62 was
Proposition 7.1
The category A[x] of all polynomials in the indeterminate x:1->A over the cartesian or cartesian closed category A is isomorphic to the Kleisli category A_A = A_(S_A) of the cotriple (S_A, ε_A, δ_A).
The language is a little different. Lambek and Scott used the term cotriple instead of comonad and Kleisli category where I’d say coKleisli category. δ and ε are cojoin and coreturn. And Lambek and Scott's theorem applies to any cartesian closed category. But after staring at this claim for a while it dawned on me that all it was really saying was this: here are two ways to introduce new constants into a category. But there’s no way I would have seen that without having practical experience of programming with monads. Learning Haskell has finally paid off. It’s given me enough intuition about category theory for me to get some return on my £30.00 investment paid to Heffers all those years ago. I expected to take this book to my deathbed, never having read it.
Anyway, for the details I left out above, especially the correct equivalence relation on Hask[x], you'll just have to read the book yourself.
Also, note the similarity to the deduction theorem. This theorem says that if we can prove B, assuming A, then we can deduce A implies B without making any assumptions. It unifies two ways to introduce a proposition A, either as a hypothesis, or as an antecedent in an implication. In fact, the above theorem is just a categorical version of the deduction theorem.
Also note the connection with writing pointfree code. In fact, the pointfree lambdabot plugin makes good use of the reader monad to eliminate named parameters from functions.
I’m amazed by seeing a book from 1986 that describes how to use a comonad to plumb a value through some code. As far as I know, this predates the explicit use of the reader monad in a program, Wadler and Moggi’s papers on monads, and certainly Haskell. Of course monads and comonads existed in category theory well before this date, but not, as far as I know, for plumbing computer programs. I’d love to hear from anyone who knows more about the history of these ideas.
Labels: comonads, haskell, mathematics, monad
posted by sigfpe at Saturday, June 14, 2008 23 comments
Sunday, April 27, 2008
Infinitesimal rotations and Lie algebras
Firstly a bit of Haskell administration:
> {-# OPTIONS -fno-warn-missing-methods #-}
Now we need some quick and dirty matrix code:
> data Matrix a = M [[a]] deriving (Eq,Show)
> instance Functor Matrix where
> fmap f (M a) = M $ map (map f) a
> instance Num a => Num (Matrix a) where
> M a * M b = M $ mult a b
A Lie Group
What I'm going to do is start by constructing elements of the group of 3D rotations, otherwise known as SO(3), and show how there's another algebraic structure hidden inside it. So let's make some rotation matrices:
> rx theta = M $ [[1,0,0],
> [0,cos theta,-sin theta],
> [0,sin theta,cos theta]]
> ry theta = M $ [[cos theta,0,sin theta],
> [0,1,0],
> [-sin theta,0,cos theta]]
> rz theta = M $ [[cos theta,-sin theta,0],
> [sin theta,cos theta,0],
> [0,0,1]]
These are the three rotations around the x-, y- and z-axes. It's traditional to build arbitrary rotations through the use of Euler angles:
> euler [a,b,c] = rx a*ry b*rz c
The 3D rotations form an example of a Lie group. (A Lie group is essentially a group where the operations like multiplication are differentiable.)
Any 3D rotation can be constructed from a single application of euler. But notice how there's a bit of ugliness in this function. I've made an arbitrary choice of which order to apply the rotations in. I could have defined:
> euler' [a,b,c] = rz c*ry b*rx a
And it's easy to show that euler≠euler'. This is because rotations don't commute. In other words, for rotations, a*b≠b*a.
We can measure the non-commutativity. Remember that any rotation has an inverse. For example rx theta*rx (-theta) gives us the identity matrix because one of these two rotations 'undoes' the other. Given any two rotations we can construct what is known as their commutator:
> commutator a b = inverse a*inverse b*a*b
The idea is that we first perform b, then a, then undo b and then undo a. If a and b commute then this expression can be rearranged to inverse a*inverse b*b*a and then the inverses cancel leaving us with the identity matrix. If they don't commute then we end up with a non-identity matrix. So the commutator measures the extent to which matrices don't commute.
As I'm feeling lazy, I don't feel like writing inverse. Instead, as I'm only going to work with rotations, I'll use the fact that the inverse of a rotation matrix is the transpose and define:
> inverse = transpose
Try playing with expressions like commutator (rx 1) (ry 2). Note how the numbers quickly get messy. Try to write down closed form expressions for applications of euler and you'll see how complex things can get.
A Lie Algebra
But there's another way to approach the rotation group - through 'infinitesimal' rotations. In my earlier article I just talked about infinitesimal group operations in a hand-wavey way. Now I'm going to make the notion more rigorous. We just need to introduce an infinitesimal number, d, whose square is zero. I've talked about this a lot before so I'm borrowing my earlier code and defining:
> d :: Num a => Dual a
> d = D 0 1
If you try it you'll see that d*d is zero.
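Concretely, the multiplication rule in the appendix gives d*d = D (0*0) (0*1 + 1*0) = D 0 0.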
Now we can try making infinitesimal rotations:
> rot1 = euler [d,2*d,3*d]
Note how when we evaluate this we get 'nice' numbers. No need to worry about those trig functions any more. And if you look closely at rot1 you'll see that it's essentially the identity matrix plus an infinitesimal part. We can pull the infinitesimal part out using fmap im rot1. You may be able to guess how to build it from the arguments to euler. But first, try evaluating rot2:
> rot2 = euler' [d,2*d,3*d]
It's the same! Try other infinitesimal rotations. When dealing with infinitesimal rotations it doesn't matter what order you apply them in, you get the same result. Working with infinitesimal rotations is looking much easier than working with full-size rotations. In fact, it gets better. Try defining
> rot3 = euler [5*d,-d,2*d]
Now look at fmap im (rot1*rot3) and compare with fmap im rot1 and fmap im rot3. We can multiply infinitesimal rotations simply by adding their infinitesimal parts. In fact, we can define
> star [a,b,c] = M $ [[0, -c, b],
> [c,0,-a],
> [-b,a,0]]
So we have:
fmap im (euler [a*d,b*d,c*d]) == star [a,b,c]
and
fmap im (euler [a*d,b*d,c*d]*euler [u*d,v*d,w*d]) == fmap im (euler [(a+u)*d,(b+v)*d,(c+w)*d])
Not a single trig expression anywhere!
So we have a simplified way of viewing rotations by looking at infinitesimal rotations. A triple [a,b,c] can be thought of as representing an infinitesimal rotation through star and instead of multiplying matrices we just add the triples elementwise. These triples, together with the binary operation of addition form an example of a Lie algebra. But there's a piece missing. We have lost all information about the non-commutativity of the rotations. It's one thing to simplify, but it's another to lose an important feature of what you're looking at.
The problem is that d is 'too small'. We need an infinitesimal that doesn't go to zero the moment you square it, but is still, in some sense, infinitesimally small. We could rewrite the Dual type. But there's a trick. Define:
> e :: Num a => Dual (Dual a)
> e = D (D 0 1) (D 1 0)
(If you think of Dual as forming a tensor product as I described here then e=1⊗d+d⊗1.)
You can check that e^2 is non-zero but e^3 is zero.
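Concretely, with the Dual arithmetic from the appendix you should find:

e^2 = D (D 0 0) (D 0 2)   -- non-zero; note that im (im (e^2)) is 2
e^3 = D (D 0 0) (D 0 0)   -- zero

That factor of 2 is why im2 below divides by two.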
Now when we compute a commutator we get something a little different:
> comm1 = commutator (euler [e,0,0]) (euler [0,e,0])
fmap re comm1 is essentially the identity as before. But if we look at fmap im comm1 there's a non-zero infinitesimal piece which is different from what we had when we worked with d. This infinitesimal piece is in fact proportional to e^2. As (im . im) (e^2) is two, we can extract the coefficient of e^2 from comm1 using
> im2 x = im (im x)/2
In fact, we have fmap im2 comm1 == star [0,0,1]. So by choosing infinitesimals that aren't too small, we haven't lost information about non-commutativity. In fact, a bit of experimentation may convince you that with:
> shrink u = fmap (e*) u
we have:
fmap im2 (commutator (euler (shrink u)) (euler (shrink v))) == star (u `cross` v)
So let's step back a bit and recapitulate all this in something approaching English:
Given a tiny rotation we can represent it as three Euler angles a, b, c, all of which are tiny. We can think of a, b and c as forming a vector [a,b,c]. When we do this, apart from an even smaller error, multiplication of rotations becomes ordinary addition of vectors and the order of rotations isn't significant. But if we choose not to ignore this small error we see that a rotation represented by u and a rotation represented by v don't quite commute and the order does matter. The size of this error is measured by the cross product of u and v. This is intuitively plausible, we'd expect that rotations defined by vectors in a similar direction would be closer to commuting, and this is reflected in the fact that the cross product is zero for parallel vectors.
So now I can fill in the missing piece from the description of the Lie algebra I gave above. The Lie algebra so(3) consists of the 3d vectors (representing infinitesimal rotations), addition of vectors (representing multiplication of infinitesimal rotations) and the cross product (measuring the amount by which small rotations fail to commute). This picture doesn't just apply to the rotations; a similar one applies for any Lie group. We get the same pattern of a simplified form of multiplication and a binary operation that measures non-commutativity.
But you might still ask if there's something missing. What if we defined f so that f^4 is zero but f^3 isn't? Would we extract more information about the non-commutativity of rotations? Well the interesting fact, which I won't prove here, is that we wouldn't. Almost everything you need to know about a Lie group can be extracted from its Lie algebra. In fact, from a group's Lie algebra you can almost recover the original group. (What you recover is its universal cover.) But notice how there are no trig formulae involved when talking about Lie algebras. Lie algebras give a really nice way to study Lie groups without getting your hands too dirty. But this isn't the only reason to study Lie algebras: many physical properties arising from symmetries are more naturally studied through the Lie algebra because Lie algebras arise from Lie groups as soon as you start differentiating things. In particular, Lie algebras play a major role in Noether's theorem, one of the cornerstones of modern theoretical physics.
In summary then, I hope I've given some flavour of Lie algebras. Using infinitesimals you can get your hands on them directly without the use of a large amount of mathematical machinery. There has been some difficult stuff here, but I'm hoping that the freedom you now have at the Haskell prompt to play with the things I've been talking about will make up for the inadequacies of my explanations.
One last word: in principle we could do the same with E8. But we'd need 248 by 248 matrices, and the equivalent of euler would need 248 parameters. (That's a coincidence, in general the number of parameters needed to define an element of a Lie group isn't equal to the dimension of the matrix, but it is for SO(3) and E8).
Appendix (the bits of code I left out above)
Defining the infinitesimals:
> data Dual a = D a a deriving (Show,Eq)
Extract the 'full-size' and 'infinitesimal' parts of a number:
> re (D a _) = a
> im (D _ b) = b
> instance Num a => Num (Dual a) where
> fromInteger i = D (fromInteger i) 0
> (D a a')+(D b b') = D (a+b) (a'+b')
> (D a a')-(D b b') = D (a-b) (a'-b')
> (D a a')*(D b b') = D (a*b) (a*b'+a'*b)
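> -- The multiplication rule encodes d^2 = 0: (a+a'd)(b+b'd) = ab + (ab'+a'b)d.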
> instance Floating a => Floating (Dual a) where
> cos (D a a') = D (cos a) (-sin a*a')
> sin (D a a') = D (sin a) (cos a*a')
> instance Fractional a => Fractional (Dual a)
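> -- This empty instance only exists because Fractional is a superclass of
> -- Floating; the OPTIONS pragma at the top silences the missing-method warnings.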
Some useful matrix and vector operations:
> mult a ([]:_) = map (const []) a
> mult a b = zipWith (:) (map (dot (map head b)) a) (mult a (map tail b))
> transpose (M a) = M $ transpose' a where
> transpose' [] = repeat []
> transpose' (xs : xss) = zipWith (:) xs (transpose' xss)
Some simple vector operations
> dot a b = foldr (+) 0 $ zipWith (*) a b
> cross [a,b,c] [d,e,f] = [b*f-c*e,c*d-a*f,a*e-b*d]
Labels: haskell, mathematics, physics
posted by sigfpe at Sunday, April 27, 2008 0 comments
Saturday, January 26, 2008
The Type that Should Not Be
> import Prelude hiding ((^))
I've been pondering a type that's so mind-bogglingly weird that I can no longer think about it without frying my neurons. So instead I'll just dump the contents of my brain here so you can fry your neurons too.
Here's an interesting type to warm up with. In Haskell:
> data X = Tree [X]
At first look it seems not to be well-founded in the sense that a value of type X is a list of X's and we haven't yet defined what X's are. But we don't need an X to make an empty list of X's. So we can build elements inductively starting with Tree []. If you think of Tree [] as being a leaf, you can see why I called this type Tree (though they are trees whose limbs can have infinite sequences of children).
Even without the empty set forming a 'foundation' we can still build circular elements like
> x = Tree [x,x]
Now lists are ordered and can have repeated elements. What if we eliminate these two properties? The way to do that would be with a set type. So how do we represent sets in Haskell? Well, certainly not with Haskell's Data.Set, because that only represents finite sets. One approach to supporting infinite sets is to identify a set with a predicate for telling whether an element is a member of the set. In other words, the type X→Bool makes a tolerably good set of X's.
So now we're ready for an interesting type:
> data U = U { f :: U -> Bool }
(BTW U stands for universe).
The first thing to think about is what this might mean mathematically. Well it looks like a solution to the equation X=2^X, in other words, a set that equals its own power set. We know that in set theory this gives rise to Russell's paradox. One way to embed recursive types into set theory is to tweak set theory, for example by using non-well-founded set theory. But Russell's paradox is so bad that it breaks just about any attempt at devising a set theory where a set can equal its power set.
And yet we can build this type in Haskell. How come? The explanation is simple, Russell's paradox tells us that there must be some elements whose evaluation doesn't terminate. In fact, we can replay Russell's paradox directly in Haskell. But before that we need some definitions.
Firstly, are we sure we can even construct an element of U? In order to define an element of U we need to define a boolean predicate on U but we can't even compare two elements of U for equality. But we can define the 'empty' element phi, along with a membership test:
> phi = U $ const False
> x `e` y = f y x
Unlike conventional set theory, we can define negation of elements. We can also define union and intersection:
> neg a = U $ \x -> not (x `e` a)
> a \/ b = U $ \x -> x `e` a || x `e` b
> a /\ b = U $ \x -> x `e` a && x `e` b
We're ready for the Russell paradox. Define the 'set' (I give in, let's informally call these things sets) of all sets that don't contain themselves.
> nonSelfContaining = U $ \a -> not (a `e` a)
Now try to compute nonSelfContaining `e` nonSelfContaining. It doesn't terminate. It's not just a matter of having implemented it badly, it can't possibly terminate as there is no answer that could make sense. So if you're designing a language that's intended to be total, you'd better rule out types like U.
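Unfolding the definitions shows why:

nonSelfContaining `e` nonSelfContaining
  = f nonSelfContaining nonSelfContaining          -- definition of e
  = not (nonSelfContaining `e` nonSelfContaining)  -- definition of nonSelfContaining

The value would have to equal its own negation, so evaluation can never finish.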
So we now have two elements of U, phi and neg phi. Can we make any more? Well here's one way:
> c a = U $ \x -> a `e` x
Intuitively, c a is the set of all sets that contain a. But there are two potential problems that come to mind with this. One is, we've only made two sets so far, so "the set of all sets that contain a" might not be interesting. And secondly, can we be sure c a is distinct from a?
The first issue is easily dismissed. neg phi is obviously in c phi, but phi isn't. So we can distinguish phi and c phi. c phi is also different from neg phi (check to see if phi is an element of each). In fact, define functional power by
> f ^ 0 = \x -> x
> f ^ n = \x -> (f^(n-1)) (f x)
and it can be shown that c^m phi is distinct from c^n phi for m≠n. (Hint: Try checking to see if c^m (neg phi) is an element of c^n phi for various m and n.) So we have constructed an infinite number of elements of U.
But, we know that we can write code for elements of U that doesn't terminate. Why not restrict ourselves to just the computable elements of U? But there's a catch. How do we define a computable element of U when elements of U are functions? One approach is this: define x to be computable if y `e` x is computable for all y. But that won't do because it won't terminate if y isn't computable. So how about defining x to be computable if y `e` x is computable for all computable y? But that's circular, and not in a nice inductive way. U is so perverse that just attempting to define computability for it gives a circular definition. But that's not necessarily a problem. Define a 'universe' to be a set, C, of computations of type U such that C = the set of x such that y `e` x terminates for all y in C. The question now is this: is there a unique set C satisfying this 'equation'? We can rule out the empty set as phi must be in any such C. There's also another thing we can prove. If we have two such sets, C and C', we can prove that the intersection of C and C' is a universe. So we can take the intersection of all universes to form the smallest universe. So we can define something in U to be computable if it is in this smallest set.
So here's my first open question: Are there larger 'universes' than the smallest one, or is it unique? It's not hard to see that a universe is closed under (non-recursive applications of) negation, finite union, finite intersection and the function c above. My second open question: What other computable elements are there?
Time to stop this and start cooking dinner.
Labels: mathematics, programming, self-reference
posted by sigfpe at Saturday, January 26, 2008 14 comments
Friday, July 20, 2007
I'll have a Buchburger with fries
> import Data.Map as M
> import Data.List as L
Everyone else is at it, so why not me? Here's my approach to solving the problem of picking menu items whose prices add up exactly to a given total. Let's warm up with a simpler menu. Consider:
Fries 3ドル
Burger 5ドル
What can I get that costs exactly 13ドル?
My approach is going to revolve around rewrite rules. I'm going to find a set of rewrite rules that converts money into fries and burgers, and does so in such a way that it always eliminates the money, if it is possible to do so.
Let's use D to represent a dollar, F to represent fries and B to represent a burger. Here's our first attempt at some rewrite rules:
DDD->F
DDDDD->B
Now we can take the string DDDDDDDDDDDDD and try to rewrite it. We could start with the first rule:
DDDDDDDDDDDDD
->DDDDDDDDDDF
->DDDDDDDFF
->DDDDFFF
->DFFFF
But now we're stuck with a D and no way to get rid of it.
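If we'd favoured the second rule instead, this particular example would have worked out, because 13=5+5+3:

DDDDDDDDDDDDD
->DDDDDDDDB
->DDDBB
->FBB

So the problem isn't that 13ドル can't be spent exactly; it's that a fixed strategy for picking rules can get stuck.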
Before we go much further, however, it's time to simplify our notation. Firstly, unlike normal rewrite rules, we don't care about the order of our symbols. So let's assume that we can reorder symbols any time we want in our strings and rules. So FDD=DFD=DDF and so on. Let's also use X^n to mean n copies of X in a row. Now we can write our strings as monomials in D, F and B. That makes things much easier. We can now write our rules as
D^3->F
D^5->B
We've seen that just picking the first rule each time can get us stuck. It seems that the solution should be to come up with an algorithm to generate the correct order. But there's another strategy - it's to modify the rules so that the order doesn't matter. A set of rules with this property is said to be confluent. Amazingly you can turn any reasonable set of rewriting rules into a confluent one merely by adding in elements.
So let's define a monomial type:
> newtype Monomial a i = Monomial (Map a i) deriving (Eq,Show)
> unM (Monomial x) = x
> divides (Monomial a) (Monomial b) = isSubmapOfBy (<=) a b
> lcm (Monomial a) (Monomial b) = Monomial $ unionWith max a b
> instance (Show a, Show i, Num i,Ord a) => Num (Monomial a i) where
> Monomial a*Monomial b = Monomial $ unionWith (+) a b
> instance (Show a, Show i, Num i,Ord a) => Fractional (Monomial a i) where
> Monomial a/Monomial b = Monomial $ M.filter (/=0) $ unionWith (+) a (M.map negate b)
Rewrite rules are simply pairs of monomials:
> data Rule a i = Monomial a i :-> Monomial a i deriving (Eq,Show,Ord)
> lm (a :-> _) = a
Now we can write code to apply these rules to a string. The first function simply applies the rule to a term:
> apply (p :-> q) r = r/p*q
applicable searches for the first applicable rule in a list of rules:
> applicable r rules = find (\(p :-> _) -> divides p r) rules
And reduceStep applies it. Note that it returns the result twice in a pair. This is simply so that it works with the next function. In the event that no rule is applicable it returns Nothing.
> reduceStep rules r = applicable r rules >>= return . dup . flip apply r
> where dup x = (x,x)
reduce repeatedly applies reduceStep until it can be applied no more. Repeatedly doing something until Nothing is returned is best handled by unfoldr:
> reduce rules r = last $ (r:) $ unfoldr (reduceStep rules) r
Now consider our example rewrite rules. Let's find a simple example of where confluence fails to hold. Consider two different ways to rewrite D^5. We could use the first rule and get D^5->D^2F or use the second and get D^5->B. Either way we terminate - and get different results. How can we get out of this impasse? There's a really obvious solution: add in a new rule D^2F->B. Now if we start with D^5 and apply the first rule then we can get out of it by applying this new third rule. Either way, we get B.
Before we generate new rules, we need to say something about how we can ensure termination. If we have one rule that says X->Y and another that says Y->X then we're going to get stuck. So what we can do is insist on an ordering on our monomials. We'll restrict ourselves to rules where the left hand side is larger than the right hand side. So here's an ordering known to algebraists as lexicographic ordering. The idea is that the symbols in the monomial have an ordering on them. One monomial, a, is considered bigger than another, b, if the power of the least symbol (in the symbol ordering) in a/b is negative. (BTW See postscript for a comment on this.) What this means is that the rewrite rules will try to eliminate symbols earlier in the ordering and replace them with symbols that come later. By making our currency symbol the first in the ordering the rewrite rules will tend to act so as to eliminate the currency. And that's what the original problem asked us to do. Note that there are other orderings we could use so for generality Monomial ought to be a typeclass.
> instance (Show a,Ord a,Ord i,Num i) => Ord (Monomial a i) where
> compare x y = let
> Monomial u = x/y
> in
> if M.null u
> then EQ
> else if snd (M.findMin u)>0
> then LT
> else GT
We also use a function to ensure that any rule is in canonical form with the greater term on the right:
> order (a :-> b) = if a>b then b :-> a else a :-> b
If we find a complete confluent set of rules that respect the ordering in this way, how do we know the rules terminate? Suppose the complete ordered list of symbols in our rules is (a1,a2,...,an); then any string a1^k1 a2^k2 ... an^kn can be written as a vector (k1,k2,...,kn). Each application of a rewrite rule replaces the corresponding vector with one that is lexicographically earlier. It's pretty clear that such a process must terminate. (Aside: You can think of this as using the fact that ω^n is well-ordered. More general kinds of rewrite rules make use of more complex ordinals to prove termination. This goes to show that ordinal arithmetic is actually a useful tool.)
So back to what we were doing: finding how we can add new rules to ensure confluence. Suppose we have two rules whose LHS 'overlap'. If we think in terms of monomials then we mean two monomials that have a non-trivial common factor. If we form the lowest common multiple of these monomials then we now have a 'minimal' term to which both rules can be applied. In fact, that's how I got the example above. One rule has D^3 on the LHS and one has D^5. The lowest common multiple of these is D^5 and I investigated how the two rules apply to this one term. When we did this, we ended up with two different terms which we simply made into a new rule. Here it is in Haskell:
> newRule pq@(p :-> _) pq'@(p' :-> _) = let
> j = Main.lcm p p'
> in order $ apply pq j :-> apply pq' j
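For the fries-and-burger rules this reproduces the rule we added by hand: with D^3->F and D^5->B we get j = D^5, applying the first rule to j gives D^2F and applying the second gives B, so (after order) newRule returns D^2F :-> B.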
Given a rule, we also simplify the left and right hand sides with normal:
> normal rules (p :-> q) = let
> p' = reduce rules p
> q' = reduce rules q
> in if p'/=q'
> then Just $ order $ reduce rules p :-> reduce rules q
> else Nothing
The problem is that when we generate a new rule there now appear a whole bunch of new pairs that could be considered because you need to consider pairs made up of the new rule and all of the old rules. Our strategy for dealing with this will be simple: take the list of all pairs of rules and generate new rules from those. Every time we generate a new rule we need to add in the new rule pairs that are generated. We maintain a list of pairs found so far, chewing pairs for newRule from the front, and adding newly generated pairs to the back.
The first line is easy:
> complete incomplete = let
We make all possible pairs. We don't want to consider pairs twice so we use the inherited ordering on pairs:
> pairs = [(x,y) | x <- incomplete, y <- incomplete, y>x]
Here's a little heuristic. Things go faster if we use monomials that come lower in the ordering before higher ranking monomials. So the completed set of rules is going to be sorted:
> rules = sort incomplete
Now here's a weird bit to explain. This isn't essential to the mathematics but it is essential to this piece of code. The local function iterate is going to return a pair that consists of the remaining list of rule pairs and the completed rules. But iterate also consumes rule pairs. So this function is actually going to consume its own output. There's an important issue with this: eventually the rate of new rule creation will peter out so that it will consume its way to the end. At this point iterate will try to consume the very element it's expected to generate and spin in an infinite loop. (The dangers of unguarded recursion!) So we also maintain another list whose length is the same as the distance between the front and back of our pair list. When this first list is empty, we know we can return [] for the second list. It doesn't matter what this list is, as long as it has the correct length. This 'phantom' list is the first argument to iterate. But for now I suggest ignoring it and coming back to it if you ever get to a second reading.
Here's the base case I mentioned:
> iterate [] pairs rules = ([],rules)
And here is the guts of the loop:
> iterate (_:n) (p:pairs) rules =
p is the first pair chewed off the front. We generate a new rule from it and normalise it:
> case normal rules $ uncurry newRule p of
If we get a new rule we now need to add it to our list of rules, and add in all the new pairs it generates:
> Just pq -> let u = [(x,pq) | x <- rules, x /= pq]
Note the use of insert to maintain the order.
> (pairs', rules') = iterate (u ++ n) pairs $ L.insert pq rules
> in (u ++ pairs', rules')
Otherwise we don't need to generate anything:
> Nothing -> iterate n pairs rules
Here's where we tie the knot by feeding our output back in as input:
> (pairs', rules') = iterate pairs (pairs ++ pairs') rules
> in rules'
It's not obvious at this point that the algorithm terminates. It does, but you'll have to trust me.
Anyway, that's not all. We also do a bit of tidying up by eliminating redundant rules. In a confluent system, if there are two rules we can apply, it doesn't matter which one we choose. Now consider a pair of rules looking something like: AB^2C -> ... and AB^3C^2 -> ... . If we can apply the second rule, then we could also apply the first rule. But in a confluent system, it doesn't matter which rule we choose. So we can just throw out the second rule here. So we tidy up our rules by eliminating rules made redundant by another. Turns out that at this stage we have a canonical form. For any given set of rules, any completion with the redundant elements deleted is the same as any other (modulo reordering). So our completion process is itself confluent! This isn't hard to prove but I'm not going to do it now.
> nonredundant rules r = L.null [s | s <- rules, s/=r, lm s `divides` lm r]
> tidy rules = L.filter (nonredundant rules) rules
And now canonical puts our rules into a complete canonical form:
> canonical rules = tidy $ complete rules
Back to the original problem:
> data Menu = P
> | MixedFruit
> | FrenchFries
> | SideSalad
> | HotWings
> | MozzarellaSticks
> | SamplerPlate deriving (Eq,Ord,Show)
Note the deriving Ord. This is so that the lexicographic ordering above can work.
> makeRule a b = (Monomial $ M.fromList a) :-> (Monomial $ M.fromList b)
> rules = [
> makeRule [(P,215)] [(MixedFruit,1)],
> makeRule [(P,275)] [(FrenchFries,1)],
> makeRule [(P,335)] [(SideSalad,1)],
> makeRule [(P,355)] [(HotWings,1)],
> makeRule [(P,420)] [(MozzarellaSticks,1)],
> makeRule [(P,580)] [(SamplerPlate,1)]
> ]
> crules = canonical rules
> ex n = Monomial $ fromList [(P,n)]
> example0 = reduce crules (ex 1505)
> main = print (reduce crules (ex 1505))
Now, some caveats. I don't claim that this is a fast method, just a different approach. Completing a set of rules is an expensive operation and if you start with N rules, the worst case running time is something horrendous like O(exp(exp(N))). On the other hand, once you've found a set of canonical rules, reduction with them is very fast. This makes it ideal for problems involving repeated use of the same menu. Or for when you have a lot of spare cash:
> example = print (reduce crules (ex 10000115))
And now I have a confession to make. I've really just described the Buchberger algorithm for Gröbner basis computation. It's not quite the Buchberger algorithm because I'm only considering binomials, not polynomials - but you'll notice my code is remarkably similar to the code at the other end of that link. The rule p :-> q represents the binomial p-q. Unfortunately, the Buchberger algorithm is usually presented in its more general form - but understanding that requires a background in commutative algebra and algebraic geometry. I hope I've managed to present a form of it in a way that's digestible by people without abstract algebra experience. The binomial Buchberger algorithm has many applications, including a variety of applications in integer programming.
Note that this is also very close to the Knuth-Bendix algorithm. However, the latter algorithm is for the noncommutative case, and it isn't guaranteed to terminate.
BTW I recommend playing with the completed rules by hand - at least for simple examples. It's interesting to see what steps are taken to reach the solution.
PS My monomial ordering is the opposite of the usual lexicographic monomial ordering. This is because I want my rules sorted with what is usually considered the largest monomial at the front.
PPS Can I get rid of that weird phantom list thing? You can replace the list with an integer for its length, but it's not clear to me that this is an actual improvement.
Update: I forgot to mention this. Check out David Amos's implementation of the full Buchberger algorithm which you can use to solve many more problems than my simplistic monomial version.
Labels: mathematics, optimisation
posted by sigfpe at Friday, July 20, 2007 6 comments
Friday, July 06, 2007
Death>Dishonour =/=> ~Dishonour>~Death
This example from the book Subjective Probability: The Real Thing by Richard Jeffrey was pointed out to me by a friend of mine yesterday. Let's draw a little diagram of possible outcomes and their utility:
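One assignment of utilities consistent with the expectations below is:

              B=honour   B=dishonour
  A=death         2           1
  A=life          3           *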
The '*' indicates an outcome with probability zero and the other outcomes are considered to be equally likely. A is the variable describing your state of liveness and B is the variable representing your honour.
We have:
E(utility|A=death) = 1.5
E(utility|B=dishonour) = 1
E(utility|B=honour) = 2.5
E(utility|A=life) = 3
With this set of utilities one does indeed rate death before dishonour but not honour before life.
This gets mixed reactions. Some people think that it's a nice counterintuitive result. Some people think it's so trivial they don't see why anyone would even mention it. If you're in the latter group, sorry for wasting your time. :-)
(I leave the Haskell program using a probability monad as an exercise...)
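For the curious, here's a minimal sketch of what such a program might look like. The Dist type, the function names and the exact utility values (chosen to match the table above) are all my own, not Jeffrey's:

newtype Dist a = Dist { runDist :: [(a, Double)] }

instance Functor Dist where
  fmap f (Dist xs) = Dist [(f x, p) | (x, p) <- xs]

instance Applicative Dist where
  pure x = Dist [(x, 1)]
  Dist fs <*> Dist xs = Dist [(f x, p * q) | (f, p) <- fs, (x, q) <- xs]

instance Monad Dist where
  Dist xs >>= k = Dist [(y, p * q) | (x, p) <- xs, (y, q) <- runDist (k x)]

data Liveness = Death | Life deriving (Eq, Show)
data Honour = Dishonour | Honour deriving (Eq, Show)

-- The three equally likely outcomes; (Life, Dishonour) is the '*' with probability zero.
outcomes :: Dist (Liveness, Honour)
outcomes = Dist [ ((Death, Honour),    1/3)
                , ((Death, Dishonour), 1/3)
                , ((Life,  Honour),    1/3) ]

utility :: (Liveness, Honour) -> Double
utility (Death, Honour)    = 2
utility (Death, Dishonour) = 1
utility (Life,  Honour)    = 3
utility (Life,  Dishonour) = 0   -- never happens

-- Conditional expectation E(utility | cond)
expectedUtility :: ((Liveness, Honour) -> Bool) -> Double
expectedUtility cond = total / mass
  where
    kept  = [(x, p) | (x, p) <- runDist outcomes, cond x]
    total = sum [utility x * p | (x, p) <- kept]
    mass  = sum [p | (_, p) <- kept]

Evaluating expectedUtility for conditions like ((== Death) . fst) or ((== Honour) . snd) reproduces the four expectations above (up to floating-point rounding).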
Labels: mathematics, probability
posted by sigfpe at Friday, July 06, 2007 4 comments
Tuesday, April 10, 2007
The curious rotational memory of the Electron, Part 2
But it's a little more complex than this. We also have quantum mechanics to contend with. The spin of an electron is a vector. But we find that when we measure one of the components of this vector this value is quantised and can only take values +hbar/2 and -hbar/2, where hbar is Planck's constant divided by 2π. We choose units where hbar is 1 so the z-component of the spin is always measured to be +1/2 or -1/2. If we write these two states as |+> and |-> then because we are dealing with quantum mechanics, the z-component of the spin can be represented by the linear combination a|+>+b|->. This corresponds to a state in which there is a probability |a|² of measuring +1/2 and a probability |b|² of measuring -1/2. This is what might have been written as a.*return (1/2)+b.*return (-1/2) in my earlier Haskell code. But that's just one component of the spin. What about the x- and y-components? Amazingly the state a|+>+b|-> tells us everything we can possibly know about the spin of an electron and we'll call it a spin state.
Suppose we have an electron in the state ψ = a|+>+b|->. What happens if we measure the y-component of its spin? One way to answer that question is to rotate the electron through π/2 so that its y-axis is rotated to align with the z-axis and then measure the z-component of its spin. In order to do that we need to know how to rotate spin states. The rule for rotation through θ about the x-axis is this (in a suitable coordinate frame):
|+> → cos(θ/2)|+>-sin(θ/2)|->
|-> → sin(θ/2)|+>+cos(θ/2)|->
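In Haskell terms, representing the state a|+>+b|-> as the pair (a,b), the rule looks like this (a sketch of mine; the amplitudes are really complex numbers, but real ones are enough to see the signs):

rotX :: Double -> (Double, Double) -> (Double, Double)
rotX theta (a, b) = ( cos (theta/2) * a + sin (theta/2) * b
                    , negate (sin (theta/2)) * a + cos (theta/2) * b )

-- rotX 0 is the identity; rotX (2*pi) sends (a,b) to (-a,-b)
-- (up to floating-point rounding), negating the whole state.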
Note how choosing θ=0 gives the identity, as expected. Note also that θ=π maps a|+>+b|-> to b|+>-a|-> so that the probabilities of measuring +1/2 and -1/2 are simply swapped, exactly what you'd expect for turning a state upside down. But there's something else that you should notice - there's an ambiguity. A rotation through 2π should give the same as a rotation through 0 and yet setting θ=2π in that transformation maps a state ψ to -ψ. Now |a|² = |-a|² so the probability of observing spin up or spin down is unaffected. But as I've been showing over previous posts, flipping a sign in a state can make a big difference as soon as you start performing interference experiments. The same goes for any angle: if I rotate through π should I use θ=π or θ = 3π? So can the transformation I've given make sense?
The transformation does make sense if you consider that in any physical process that rotates an electron the transformation will evolve continuously over time. Electrons don't just instantly rotate. In other words, if a rotation is applied to an electron then it will follow a path in SO(3), not just be an instantaneous application of an element of SO(3). And that allows us to resolve the ambiguity: the rotations of electrons are described by the double cover of SO(3) known as SU(2). So a rotation through 360 degrees doesn't return you to the identity although a 720 degree rotation does. The transformation I gave above is completely unambiguous if you continuously rotate an electron around the x-axis tracking a continuous value of θ; after all, the double cover is basically just the set of continuous paths from the identity in SO(3) (with homotopic paths considered equivalent).
And that's the bizarre fact: electron rotations aren't described by SO(3), they're described by SU(2). In particular, rotating an electron through 360 degrees does not return it to its original state, but a rotation through 720 degrees does! In a sense, like Dirac's belt, electrons can remember something about the path they took to get where they are, in particular they remember how many twists there were in the path.
What does this mean experimentally? The first thing to note is that this is true not just for electrons but any spin-1/2 fermion. This includes protons and neutrons. The stuff I've been talking about manifests itself in a number of ways. In particular, the spin of a particle affects how a magnetic field acts on it. For example, spin-up and spin-down particles can be separated into distinct beams using Stern-Gerlach apparatus. Also, the spin of particles precesses in a magnetic field and this is used on a regular basis in NMR. These two facts allow us to easily manipulate and measure the spin of fermions. In other words, the fact that fermions remember how many twists there are in their rotations isn't just some esoteric nonsense, it's now engineering and the theory is tested repeatedly all over the world.
Every familiar object is invariant under rotations through 360 degrees. So the fact that electrons need to be rotated through 720 degrees to return them to their original state seems like one of the most bizarre facts about the universe I know of. And yet many books that introduce spin just slip in this fact in a routine way as if it were no different to any other.
The fact that the biggest connected cover of SO(3) is the double cover puts a big constraint on the kinds of weird effects like this that can happen. We can have a 360 degree rotation multiply by -1, but not by i, because a 720 degree rotation absolutely has to return us to where we started from. But suppose the universe were 2-dimensional. If you remember what I said about SO(2) you may notice that no such constraints apply because SO(2) has an infinite cover. There is a group in which all of the rotations through 360n degrees are distinct for distinct n. This means that a physical system could have its state multiplied by any factor (of modulus 1) when rotated through 360 degrees. Particles that behave this way are called anyons. But we live in a 3D universe so we don't expect any fundamental particles to have this property. However, in quantum mechanics any kind of 'excitation' of a physical system is quantised and can be thought of as a type of particle. These are known as quasiparticles. For example, just as light is made of photons, sound is also quantised as phonons. In the right kind of solid state medium, especially those that arise from some kind of 2D lattice, it seems quite plausible that anyons might arise. This gives rise to the so called fractional quantum Hall effect. Anyons might one day play an important role in quantum computing via topological quantum computation.
Labels: mathematics, physics
posted by sigfpe at Tuesday, April 10, 2007 5 comments
Saturday, March 31, 2007
The Curious Rotational Memory of the Electron, Part 1
This post will be in two parts. This part will mainly be about geometry and topology, and in the second part I'll talk about the physics.
In order to explain the idea I want to introduce the concept of a universal cover. If you look at the definition of this notion anywhere it can seem pretty daunting to non-mathematicians, but in actual fact the key idea is so simple that even a child could understand it. I can say that without fear of exaggeration because universal covers used to play a role in a childhood fantasy of mine. (OK, I was a weird kid, but not that weird.)
My crazy idea as a kid was this: if there were two ways to get from A to B, maybe the B you reached by one of the two routes was actually slightly different from the B you reached by the other, even though they looked the same. I was actually more concerned with journeys from A to A. Suppose the kitchen has a door at each end and that you start in the bedroom, go down to the kitchen at one end, come out the other end, and return by a different path to the bedroom. How do you know this is the same bedroom? (I also admit I read a lot of science fiction as a kid.)
Suppose instead you took a simpler path: you left the bedroom and went to another room and then returned by the path you went. If you had taken a piece of string and tied one end to the bedroom and walked this path playing out the string behind you as you went then when you returned to your bedroom you would have both ends of the string visible. You could then reel in the string from your journey, without letting go of the ends, and see that indeed, both ends were in fact the ends of the same string and so you had returned to the start of your journey. But if you'd taken the round trip through the double-doored kitchen you wouldn't be able to pull the string tight and so you could no longer be sure.
In what follows we consider two paths from A to B equivalent in this way: mark the two paths with strings going from A to B. If you can pull one string to line up exactly with the other (you're allowed to stretch) without breaking the string or moving the endpoints then they are considered equivalent. (The correct technical term is homotopic.) So if a path that appears to go from A to A is homotopic to the trivial zero length path, then you know that the path really is from A back to A. But if you can't shrink the path down to zero length you're faced with the possibility that the string is extended from A to a similar looking place, A'. (Of course this implies a considerable conspiracy with someone tying a distinct identical string to an identical bedroom, but if you can't see both ends, alien beings from a higher dimension could do anything...)
The best case scenario is that all of the bedrooms are the same - a conventional house. Call this house X. But what's the worst case scenario? How many different copies of the bedroom might there be? Well let's call the starting bedroom 0. If we walk anti-clockwise round the house, going through the kitchen, we end up back at a bedroom. In the worst case scenario this will be another bedroom. Let's call it 1. Walk round again and we get to another copy of the bedroom, call it 2 and so on. So we have an infinite sequence of bedrooms, (0,1,2,…). But we could have walked clockwise. In that case we'd get a sequence (-1,-2,…). So now think about the set of all points in this infinite house. Each point can be described as a pair (x,n) where x is a point in the house X and n labels which of the infinite number of paths you could have taken to get there. (Remember, if the two paths are homotopic we consider them to be the same path.)
If using a house as an example seems too far-fetched, think of a multi-storey car park. It's easy to walk what you think is a closed loop but find that you've in fact walked to an almost identical looking, but different, floor.
(Incidentally, this kind of infinite house is the kind of thing that can be created in a Portal engine such as in the game Portal.)
Let's think about this more generally. Given a topological space X we can form another space X' from it by (1) picking a point p in it and (2) forming the set of all pairs (x,n) where x is a point in X and n is a path (up to homotopy) from p to x. X' is known as the universal cover of X.
Enough with houses, let's try a more purely geometric example. Consider the space of rotations in 2D, otherwise known as SO(2). A rotation can simply be described by an angle with the proviso that a rotation through an angle θ is the same as a rotation through θ+2nπ where n is an integer. The space of rotations is essentially the same thing as the circle (by circle I mean the 1-dimensional boundary of a 2-dimensional disk). Looping once round the circle brings you back to where you started. So how can we visualise the universal cover of SO(2)? There's a nice trick for this. Imagine a disk that's free to rotate around one axis. SO(2) describes the rotations of this disk. But now imagine a belt, fixed at one end, connected at the other end to the centre of this disk like this:
The rule is that we're allowed to twist the belt as much as we like as long as the centre line of the belt is along the axis. Notice how if we rotate the disk through an angle of 2nπ the disk returns to its starting point, but the belt 'remembers' how many twists we took to get there. Notice another thing: if we travel along the belt from one end to the other, the orientation of the belt at each point gives a 2D rotation. In other words, the belt itself represents a path in SO(2) with one end representing a fixed point in SO(2) and the other end matching the current orientation of the disk. So if the orientations of the disk represent elements of SO(2), the possible twistings of the belt represent elements in the universal cover of SO(2). And what the twists do is allow us to differentiate between rotations through θ and rotations through θ+2nπ. In other words, the universal cover is just the real line, but not folded down so that θ is the same as θ+2nπ.
The universal cover is the worst case scenario where all non-homotopic paths are considered to represent distinct points. But it's also possible to construct an in-between scenario where some paths that aren't homotopic to each other are still considered to lead to the same place. For example, take the real line again. If we identify 0 and 2π we get SO(2). If we consider 0 and 2nπ to be distinct for n≠0 we get the universal cover. But if we pick an N≠1 and consider 2Nπ to be the same as 0, while keeping 2nπ distinct from 0 for 0<n<N, what we end up with is an N-fold cover of SO(2). Unfortunately I don't know a way to construct a physical example of a general N-fold cover using belts and discs. (At least for N>2. For N=2 you should easily be able to come up with a mechanism based on what's below.)
With the examples so far, the universal cover of a space has always had an infinite number of copies of the original space. That isn't always the case and now we'll meet an example where the universal cover is a double cover: the real projective plane, RP². The easiest way to visualise this is to imagine the game of Asteroids played on a circular TV screen. In the usual Asteroids, when you go off one side of the screen you reappear on the opposite side. In this version it's much the same, points on the edge that are diametrically opposite are considered to be the same points. Such a universe is a real projective plane. Let P be the point at the centre of the screen. Suppose we attach one end of our string to P and fly off towards the edge, reappearing on the other side. Eventually we get back to P again. Can we shrink this path down to a point? Unfortunately, any path that crosses the edge of our screen will always cross the edge no matter how much we try to deform it. If we move the point at which it crosses one edge of the screen, the point at the other side always remains diametrically opposite.
This means you can never bring these points together in order to collapse down a loop. On the other hand, suppose we loop twice round the edge. The following diagram shows how this path can be shrunk down to a point:
So given two distinct points in RP² there are two inequivalent paths between them, and so the universal cover of RP² is made from two copies of the space and is in fact a double cover.
Now we'll try a 3D space - the space of 3D rotations, otherwise known as SO(3). This space is 3-dimensional because it takes 3 numbers to define a rotation, for example we can specify a rotation using a vector whose direction is the axis of rotation and whose length is the angle of rotation. Now all possible rotations are rotations through an angle 0≤θ≤π for some axis. So every rotation can be represented as a point in the ball of radius π. But a rotation around the axis a through an angle of π is also the same as a rotation around the axis -a through π. In other words, SO(3) is the closed 3-dimensional ball with diametrically opposite points identified. This is very similar to the situation with RP². In fact, almost exactly the same argument shows that the universal cover of SO(3) is the double cover. (For an alternative wording, see this posting.) There is also another way of seeing this that is analogous to the disk with belt example above. This time we have a sphere instead of a disk. The belt is fixed at one end and the other end is attached to the sphere. We are free to rotate the sphere however we want, and we're no longer confined to having the belt along one axis. It is an amazing but easily demonstrable fact that if you rotate the sphere through 2π, you cannot untwist the belt, but if you rotate it through 4π you can! This is known as the Dirac Belt Trick, and if you don't believe it, watch the Java applet written by science fiction author Greg Egan.
If you've done much computer graphics you'll have already met the double cover of SO(3). Computer graphics people sometimes like to represent rotations using unit quaternions. The unit quaternions form a 3-dimensional sphere but there are two such quaternions for each rotation. So the unit quaternions actually represent this double cover, and SO(3) is a 3-dimensional sphere with antipodean points identified.
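Here's a sketch (mine, not from any graphics library) of the usual conversion from a unit quaternion w+xi+yj+zk to a rotation matrix. Every entry is quadratic in the components, so q and -q give exactly the same matrix - that's the double cover in action:

type Quaternion = (Double, Double, Double, Double)   -- (w, x, y, z), assumed to have unit length

toMatrix :: Quaternion -> [[Double]]
toMatrix (w, x, y, z) =
  [ [1 - 2*(y*y + z*z),     2*(x*y - w*z),     2*(x*z + w*y)]
  , [    2*(x*y + w*z), 1 - 2*(x*x + z*z),     2*(y*z - w*x)]
  , [    2*(x*z - w*y),     2*(y*z + w*x), 1 - 2*(x*x + y*y)] ]

-- toMatrix (w,x,y,z) == toMatrix (-w,-x,-y,-z)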
And that's the geometry dealt with. In the next installment I'll say what this has to do with electrons and my quantum computing code.
Footnote
Balls and Spheres. The closed n-dimensional ball is the set of points (x1,...,xn) such that x1²+...+xn²≤1. The (n-1)-dimensional sphere is the set of points (x1,...,xn) such that x1²+...+xn²=1.
Labels: mathematics, physics
posted by sigfpe at Saturday, March 31, 2007 5 comments
Friday, January 26, 2007
The Monads Hidden Behind Every Zipper
Firstly, what does "wrong category" mean? In the category of vector spaces, every object is equipped with a commutative binary + operator. We can make no such assumption in Haskell. What's more, Haskell doesn't easily allow us to restrict type constructors to instances of Num, say. So we can't easily implement monads in the category of vector spaces as instances of Haskell's Monad type class.
As I've mentioned on #haskell, convolution is comonadic. A convolution (more specifically, a discrete convolution) is a function that maps a (doubly infinite) sequence of real numbers to another such sequence of real numbers in such a way that each number in the new sequence is a weighted sum of numbers in the old sequence and the weights depend only on the relative positions of the numbers. For example: replacing each number in the sequence with the average of it and its neighbours is a convolution.
Here's a simple example. Consider the sequence ...0,0,1,2,3,0,0,... We'll replace each number with the sum of its neighbours and twice itself. So this maps to ...,0,0+2*0+1,0+2*1+2,1+2*2+3,2+2*3+0,3+2*0+0,0,... The structure of this computation is very similar to the structure of cellular automaton evaluation and we can reuse that code here:
> data Zipper a = Zipper [a] [a] deriving (Eq,Show)
> left (Zipper (a:as) bs) = Zipper as (a:bs)
> right (Zipper as (b:bs)) = Zipper (b:as) bs
> class Comonad w where
>     coreturn :: w a -> a
>     cobind :: (w a -> b) -> w a -> w b
> iterate1 f x = tail $ iterate f x
> instance Comonad Zipper where
>     cobind f a = fmap f $ Zipper (iterate1 left a) (iterate right a)
>     coreturn (Zipper _ (b:_)) = b
> a = Zipper (repeat 0) ([0,1,2,3]++repeat 0)
> f (Zipper (a:_) (b:c:_)) = a+2*b+c
> test = let Zipper u v = cobind f a in take 5 v
When we run test we get [1,4,8,8,3].
Now think about the structure of this computation. We write a function f that gathers up the value at a cell and its neighbours and computes the weighted sum. cobind then applies this function all the way through our sequence to get our result. The important thing is that f pulls in the data it needs, distilling it down to one value, and that's why it has type Zipper a -> a.
There's another approach to convolution. We could push or scatter data. At each point in the sequence we could take the value and push one copy left, one copy right, and leave double the original value in place. Each cell will end up with three values left in it and all we have to do is sum the values deposited in a cell. What we want therefore is to take a value in a cell, of type a, and generate an entire zipper full of values pushed from it, ie. a Zipper a. We'd then like bind for the monad to sum over the returned zippers, staggering them right and left as appropriate. Here's a table illustrating what I mean:
0 -> 0 0 0 0 0 0
1 -> 0 1 2 1 0 0
2 -> 0 0 2 4 2 0
3 -> 0 0 0 3 6 3
0 -> 0 0 0 0 0 0
----------------
0 1 4 8 8 3
Unfortunately, as I pointed out above, bind for a monad can't perform summations, so I have to define bind' and return' instead.
> plus (Zipper a b) (Zipper a' b') = Zipper (plus' a a') (plus' b b') where
>     plus' (a:as) (b:bs) = (a+b) : plus' as bs
>     plus' a [] = a
>     plus' [] a = a
Note that I'm going to assume that elements past the end of a list (so to speak) actually represent zero and define new versions of left and right that produce the required zeros on demand:
> left' (Zipper (a:as) bs) = Zipper as (a:bs)
> left' (Zipper [] bs) = Zipper [] (0:bs)
> right' (Zipper as (b:bs)) = Zipper (b:as) bs
> right' (Zipper as []) = Zipper (0:as) []
> tail' [] = []
> tail' a = tail a
> stagger f [] = []
> stagger f (x:xs) = x : map f (stagger f xs)
> stagger1 f x = tail' (stagger f x)
> instance Functor Zipper where
>     fmap f (Zipper a b) = Zipper (map f a) (map f b)
Here's our fake monad:
> return' a = Zipper [] [a]
> bind' f x = let Zipper a b = fmap f x
>             in foldl1 plus (stagger left' b ++ stagger1 right' a)
> a' = Zipper [] [0,1,2,3]
> f' x = Zipper [x] [2*x,x]
> test' = let Zipper u v = bind' f' a' in take 5 v
Run test' and we get the same result as before by a quite different method.
You should be able to see that in some sense comonads describe lazy computations and monads represent strict ones. In strict languages you evaluate your arguments first, and then push them into a function. In lazy languages you call the function first, and it pulls in the values of its arguments as needed. In fact, in Haskell we have a strictness monad and in OCaml you can implement a lazy comonad. By and large, it's easier to write pull algorithms in Haskell. Push algorithms are much easier to implement in a language with side effects. In fact, one of the things that Haskell beginners need to learn is how to rewrite push algorithms as pull algorithms and eliminate those side effects.
What I think is interesting here is that this dichotomy between push and pull, scatter and gather, is pervasive all the way through computing, and it's quite pretty that this deep notion can be captured nicely by a(n almost) monad and a comonad. Just to give an illustration of how pervasive it is, consider 3D rendering. We typically want to take a list of primitives and turn it into an array of pixels. If we rasterise we push the primitives, asking, for each primitive, which pixels it covers. If we ray-trace we pull on the pixels, asking, for each one, which primitives cover it. Many optimisations to both of these approaches consist of making judicious decisions about which part of the pipeline should be switched from push to pull or vice versa.
BTW I have an algorithm for automatically converting some types of pull algorithm into push algorithms and vice versa. It's another one of those papers that I keep meaning to finish and submit for publication one day...
Labels: haskell, mathematics
posted by sigfpe at Friday, January 26, 2007 9 comments
Saturday, January 20, 2007
Fox's Ubiquitous Free Derivative pt. 2
First define a regular expression type:
> import Prelude hiding ((<*>))  -- our <*> below is regexp concatenation, so keep Prelude's out of scope
> import Data.List
> data RE a = Symbol a | Star (RE a) | Sum [RE a] | Prod [RE a]
>     deriving (Eq,Show,Ord)
I use lists instead of binary operators to represent products and sums because addition and multiplication of regular expressions are associative.
> a <*> b = Prod [a,b]
> a <+> b = Sum [a,b]
> zero = Sum []
> one = Prod []
Here's how to make a simple regular expression from a string:
> re x = Prod $ map Symbol x
Exercise: Write a proper parser for these things so we can use conventional regexp notation!
Here's a test to see whether a regular expression matches the empty string:
> acceptsEmpty (Symbol _) = False
> acceptsEmpty (Star _) = True
> acceptsEmpty (Sum l) = or $ map acceptsEmpty l
> acceptsEmpty (Prod l) = and $ map acceptsEmpty l
I've decided to name these derivatives after people. So first I define the McBride derivative which is a generalisation of the dissection operator introduced in Clowns to the left.... The key idea is that if d = mcbride c j a then d (x <*> y) = (d x <*> j y) <+> (c x <*> d y). This needs to respect the associativity of <*> and one way to achieve this is by choosing c and j that are multiplicative homomorphisms so that, for example, c(x <*> y) = c x <*> c y.
> mcbride _ _ a (Symbol a') = if a==a' then one else zero
> mcbride c j a (Sum p) = Sum $ map (mcbride c j a) p
> mcbride _ _ _ (Prod []) = zero
> mcbride c j a (Prod x) = Sum $ map f (splits x) where
>     f (x,y:ys) = Prod $ map c x ++ [mcbride c j a y] ++ map j ys
> mcbride c j a (Star p) = c (Star p) <*> mcbride c j a p <*> j (Star p)
And now I can define all the Leibniz, Fox and Brzozowski derivatives as special cases of the mcbride derivative. The Fox and Brzozowski derivatives each come in left- and right-handed versions.
> delta p = if acceptsEmpty p then one else zero
> leibniz,fox_l,fox_r,brzozowski_l,brzozowski_r :: (Ord a,Eq a) => a -> RE a -> RE a
> leibniz a p = simplify $ mcbride id id a p
> fox_l a p = simplify $ mcbride id (const (one)) a p
> fox_r a p = simplify $ mcbride (const (one)) id a p
> brzozowski_l a p = simplify $ mcbride delta id a p
> brzozowski_r a p = simplify $ mcbride id delta a p
brzozowski_l is the usual Brzozowski derivative. If r matches some language, brzozowski_l a r matches the sublanguage of strings beginning with a and with that a removed. We can define regular expression matching in the usual way in terms of it:
> matches re s = acceptsEmpty (matches' re s) where
>     matches' re [] = re
>     matches' re (a:as) = matches' (brzozowski_l a re) as
Now we can start matching.
ex1 would normally be written as "abc|b*c".
> ex1 = (re "abc") <+> (Star (re "b") <*> Symbol 'c')
> test1 = map (matches ex1) ["abc","bbc","bbbbc","abbbbc"]
brzozowski_r chews off a symbol from the right:
> test2 = map (matches (brzozowski_r 'c' ex1)) ["abc","bbc","ab","bbbbb"]
leibniz a allows you to leave out one a from anywhere in the regexp.
ex3 matches "abracadabra" as it might be shouted by a wizard who lengthens the second last 'a' to make sure the spirits can hear. ex4 matches such strings with precisely one missing 'r'.
ex3 would normally be written as "abracada*bra".
> ex3 = re "abracad" <*> Star (re "a") <*> re "bra"
> test3 = map (matches ex3) ["abracadabra","abracadaaaaabra"]
> ex4 = leibniz 'r' ex3
> test4 = map (matches ex4) ["abracadabra","abacadabra","abracaaaaadaba","abacadaba"]
fox_l a makes regular expressions accept any prefix ending just before an a. Combining fox_l and fox_r allows you to search for all elements of a language just between a chosen pair of symbols.
In ex5 we build a regexp for certain types of statement in a simple language and then pull out from it the part corresponding to substrings within parentheses. The undifferentiated part of ex5 would normally be written as "x = ([0-9]+[0-9]); *".
> ex5 = simplify $ fox_r '(' $ fox_l ')' $
>     let digits = Star (Sum (map Symbol "0123456789")) in
>     re "x = (" <*> digits <*> re "+" <*> digits <*> re ");" <+> Star (re " ")
> test5 = map (matches ex5) ["1+2","7-4"]
Open Question: There are more derivatives that can be constructed from the parts I've given. What do they do? What other multiplicative homomorphisms can you define? What derivatives do they give? (Hint: consider the property "R matches only a finite language").
Oh...I almost forgot the whole point of this. It's now possible to see that McBride's l-operator is a version of the Brzozowski derivative. In fact, the theorems he proves about l are pretty similar to the theorems that get proved about the Brzozowski derivative such as Theorem 4.4 here.
Some helper functions:
splits x returns all the ways of splitting x into pairs u and v such that u ++ v == x and v is non-empty.
> splits [] = []
> splits (a:as) = ([],a:as) : map f (splits as) where
>     f (x,y) = (a:x,y)
Some (incomplete) simplification rules for regular expressions:
> simplify a = let b = simplify' a in if a==b then b else simplify b where
>     simplify' (Prod [a]) = a
>     simplify' (Prod a) | zero `elem` a = zero
>     simplify' (Prod (a:b)) = case a of
>         Prod x -> Prod $ x ++ map simplify b
>         a      -> Prod $ filter (/= one) $ map simplify (a:b)
>
>     simplify' (Sum [a]) = a
>     simplify' (Sum (a:b)) = case a of
>         Sum x -> Sum $ x ++ map simplify b
>         a     -> Sum $ nub $ sort $ filter (/= zero) $ map simplify (a:b)
>
>     simplify' (Star a) = Star (simplify a)
>     simplify' a = a
Labels: haskell, mathematics
posted by sigfpe at Saturday, January 20, 2007 0 comments
Tuesday, January 16, 2007
Fox's Ubiquitous Free Derivative
1. Regular Expressions
Every regular expression defines a language - the set of strings exactly matching the expression. Given a language L made from the symbols in S, define ∂_s L, where s is a symbol in S, to be the language made by taking all of the prefixes of strings in L that end just before an s (the s itself being dropped). For example, if the language were L={MADAM} then ∂_M L={ε,MADA}, ∂_A L={M,MAD} and ∂_D L={MA}. We can write the regular expressions for these languages as 1+MADA, M+MAD and MA. So in this case, applying ∂_s trivially gives another regular language. So here's a problem: is the set of prefixes of a regular language L, ending just before an s, also regular?
Let's use ∂_s to define an operation on regular expressions that computes the regular expression for the prefixes.
Consider a regular expression of the form ab, where a and b are regular expressions. A prefix for the language, ending just before an s, either ends in the 'a' part, corresponding to something matching ∂_s a, or it ends in the 'b' part, in which case it matches a ∂_s b.
But so far I've just mentioned finite languages. What about the * operator? Well, a prefix of a string in a* must consist of a sequence of complete strings matching a, followed by a prefix of one more string matching a that ends just before an s. I.e. we need ∂_s(a*) = a* ∂_s a.
So it's not hard to see (I hope) that
∂_s s = 1
∂_s s' = 0 for any symbol s' ≠ s
∂_s(a+b) = ∂_s a + ∂_s b
∂_s(ab) = ∂_s a + a ∂_s b
∂_s(a*) = a* ∂_s a
works nicely and proves that the answer to the above question is yes.
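As a quick sanity check (this worked example is mine, not part of the original argument), apply these rules to the MADAM language from above:
∂_A(MADAM) = ∂_A M + M ∂_A(ADAM)
           = M (∂_A A + A ∂_A(DAM))          (since ∂_A M = 0)
           = M (1 + A (∂_A D + D ∂_A(AM)))
           = M (1 + A D (∂_A A + A ∂_A M))
           = M (1 + A D)
           = M + MAD
which is exactly the regular expression M+MAD we read off by inspection earlier.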
2. Dissecting types
Rather than regurgitate someone else's words, take a peek at Dissecting Data Structures. Define the operator, acting on types, ∂, by ∂ F X = Δ F X 1, where Δ is my poor man's version of del with a vertical line through it. From the definition of Δ you can see it satisfies:
∂(K A) X = 0
∂Id X = 1
∂(F+G) X = ∂F X + ∂G X
∂(F×G) X = ∂F X × G 1 + F X × ∂G X
If you read McBride's paper, you'll see that the ∂ operator slices a type in half, throwing away the elements in the right half, keeping just the 'shape' of the right half.
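For instance (a worked example of my own), take the pair functor F = Id×Id. Then ∂(Id×Id) X = ∂Id X × Id 1 + Id X × ∂Id X = 1×1 + X×1 = 1+X: a sliced pair either has its hole in the first slot, with only the trivial shape of the second element remembered, or it keeps the first element and has its hole in the second slot.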
3. Fox's Free Derivatives
Check out the definition of the Fox derivative in section 4 of Free Differential Calculus. The Fox derivative is used to compute the Alexander polynomial of a knot.
But I hope you can see that we have yet another definition that looks very similar. The common theme is differentiation with a twist. When you differentiate a product fg, one term is the usual one, fD(g), but the other term isn't D(f)g: instead the g is replaced by a drastically simplified version (either 1, or g(1)). Both parts 1 and 2 have a notion of chopping something in half at some point and keeping the left half. (Actually, you could tweak the definition in part 2 so that g is replaced by 1 too; then you'd no longer preserve the shape of the right half of the type, you'd really have a slicing operator, and the analogies would become clearer.)
I just need to relate the * operator in part 1 to the derivative of the inverse in the free derivative. Let me do this in a hand-waving manner: Conceptually, a*=1+a+a²+… which might be considered the same as 1/(1-a). ∂(1/(1-a)) = (using 4(ii)) 1/(1-a)∂a = a*∂a.
So what does it all mean? I've no idea. I'm just pointing out that the Fox derivative, or something like it, has what I think are hitherto unnoticed applications to computer science. Anyway, some of this will become clearer when I post some Haskell code implementing no less than five different types of derivatives of regular expressions (some time in the next week when I have a spare moment), and that should partially answer my question that I posed on λtU.
On further reflection, I can say something about what this means. The way to view derivatives is as an operation that chops things in 'half', destroying something in the middle. The Leibniz rule corresponds to the notion that when you chop a product you must either chop in the left side of the product or the right side. In the usual Leibniz rule you reassemble the pieces on either side - leaving a hole. With the Fox derivative you throw away what's on the right. With the ∂ operator in part 2 you keep only the shape on the right. And McBride's Δ operator gives you a choice of what to do with the left and right hand sides.
Labels: mathematics
posted by sigfpe at Tuesday, January 16, 2007 3 comments
Saturday, January 06, 2007
Oriented Fish and Hairy Balls
From time to time, anyone who works in computer graphics is likely to want to implement a function with this specification:
Implement a function f: R³ → R³ such that for all non-zero x, f(x) is also non-zero and is perpendicular to x.
This kind of function is ubiquitous in graphics and most math libraries geared towards graphics that I have seen have some kind of implementation of this. A typical application is as follows: suppose you're animating shoals of fish. Using a variety of methods it's easy to generate nice paths that guide the motion of thousands of fish. For example you could assume that the motion of the fish is approximately described by a divergence-free vector field (so that the fish don't 'pile up' anywhere) and then integrate this field to give streamlines that the fish can follow. Unfortunately there is a slight catch with this. If you have some 3D geometry representing the fish's shape, knowing the position of the fish isn't enough to specify how it looks. You also need to give its orientation. An obvious choice is to orient the fish so that the line from tail to nose points along the flow. But you still need to be able to describe its orientation around this axis. To do this, we need to complete this orientation vector along the fish to an orthonormal basis, i.e. find a set of mutually orthogonal unit vectors, e1, e2, and e3 such that e1 points from tail to nose. We can easily define e3 to be e1×e2, so if we had the function f above we could write e2=f(e1).
The function f appears in a multitude of other contexts too. It's useful when orienting 'virtual' cameras. For example the user might want to specify that the camera points at a target and have the graphics system select an orientation with this property.
However, anyone who's met these problems immediately hits an issue. Try writing code to do this for yourself if you haven't already. Chances are you'll end up with a conditional branch in your code, and when you flip from one branch to another you'll see it clearly causing ugly flips in the animation. So the above spec needs to be revised to:
Implement a continuous function f: R³ → R³ such that for all non-zero x, f(x) is also non-zero and is perpendicular to x.
And here's the rub: the Hairy Ball Theorem says there is no such function. So if you're having trouble writing one, it's not because you're not being ingenious enough. It just can't be done! (Though there are lots of other solutions to the animation problems I mention above.) See, topology is useful after all: it can save you from wasting your time on impossible tasks.
(Funnily enough though, the best way to see that this is true is using a suitable functor...)
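To make the branch concrete, here's a minimal sketch of the usual attempt (my own throwaway code; cross, perp and frame are made-up names, not from any particular library): cross x with whichever coordinate axis it's least aligned with, then complete the frame as described above. The guards are exactly the conditional branch where continuity, and hence the animation, breaks.

> -- Cross product, and a 'perpendicular vector' function with the tell-tale branch:
> -- cross v with the coordinate axis v is least aligned with. Each branch is fine
> -- on its own, but the result jumps when v wanders across a branch boundary.
> cross (a1,a2,a3) (b1,b2,b3) = (a2*b3 - a3*b2, a3*b1 - a1*b3, a1*b2 - a2*b1)
> perp v@(x,y,z)
>     | abs x <= abs y && abs x <= abs z = cross v (1,0,0)
>     | abs y <= abs z                   = cross v (0,1,0)
>     | otherwise                        = cross v (0,0,1)
> -- Completing a frame for the fish: e1 along the flow, e2 perpendicular to it,
> -- e3 = e1 x e2 (normalisation omitted for brevity).
> frame e1 = (e1, perp e1, cross e1 (perp e1))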
Actually, I think anyone who works in graphics ought to read up on the Hairy Ball Theorem, the Ham Sandwich Theorem and the Borsuk-Ulam theorem as well as learning about the double cover of SO(3) (which I'll write about here in the near future).
That was a public service announcement.
Labels: mathematics
posted by sigfpe at Saturday, January 06, 2007 6 comments
Wednesday, December 13, 2006
Why isn't ListT [] a monad?
So consider the free semiring R generated by some set S. In other words, R consists of finite sums and products of 0, 1 and elements of S, where the only simplification rules allowed are
a+0=a, a+b=b+a, a+(b+c)=(a+b)+c
a1=a, a(bc) = (ab)c
a(b+c) = ab+ac and (a+b)c = ac+bc.
For example, consider a term like ab+c(ef+gh). We can use distributivity to multiply this out to ab+cef+cgh. There's not much else we can do to simplify this. Now notice that if you multiply out all of the brackets wherever you can, ie. use distributivity until you can no longer, you end up with an expression that is a sum of terms and each term is a product of generators from S. (The sum of terms may be empty, in which case it's zero, and each of the terms may be a product of zero generators, in which case it is 1). This allows us to write elements of R in a canonical form: as a set of terms where each term is an ordered list of generators. For example, we could write ab+cef+cgh as {[a,b],[c,e,f],[c,g,h]}. Notice how the commutativity of addition is represented by the fact that we're using a set of terms, but each term is a list of generators because we're making no assumption about the commutativity of multiplication.
In Haskell it's more convenient to work with lists. So we'll represent our running example as [['a','b'],['c','e','f'],['c','g','h']]. So if S is the Haskell type corresponding to our set of generators, then [[S]] can be thought of as the free semiring generated by elements of S, with the proviso that we consider two elements equal if one can be reordered to the other. Note that []=0 and [[]] = 1.
Now suppose that S is itself a free semiring generated by T. Then R = [[ [[T]] ]], modulo ordering. If you think about it, there's a nice way to use the algebraic structure to 'flatten' an element of [[ [[T]] ]] down to an element of [[T]]. R is freely generated by elements of S, in other words it consists of sums of products of elements of S. But the elements of S are themselves sums and products. So we have sums of products of sums of products. Note the 'products of sums' in the middle of that phrase: R contains subparts that are products of sums. If we multiply these out in the usual way, we get sums of sums of products of products. Group the sums and products together, and we get back to sums of products. Here's an example: any element of T trivially gives an element of S. So if a,b,c,e,f,g and h are all in T, then a,b,c and ef+gh are all in S and hence are generators of R, so ab+c(ef+gh) is in R. Multiply out and we obviously have the element ab+cef+cgh of S. It's not hard to see how this generalises to map any element of R back down to an element of S. So we have maps
T -> [[T]] (by trivial inclusion of generators)
[[ [[T]] ]] -> [[T]] (by above argument)
Look like a monad to you? Of course it does! :-) But what monad is it?
First, let's implement the map [[ [[T]] ]] -> [[T]]. After tinkering I came up with
> import Control.Monad.List
> flatten x = concatMap (map concat . sequence) x
We can test it out
> x = [[ [['a']],[['b']] ], [ [['c']],[['e','f'],['g','h']] ]]
> test1 = flatten x
It multiplies out exactly the way we want.
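For completeness, here's the unit of this would-be monad too: the trivial inclusion T -> [[T]] mentioned above. (This bit is my own addition, and unit and test1a are made-up names.)

> unit a = [[a]]
> test1a = flatten (unit x) == x   -- the left unit law, join . return = id; this one does hold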
Now compare with running this:
> x' = ListT [[ ListT [['a']],ListT [['b']] ],
>             [ ListT [['c']],ListT [['e','f'],['g','h']] ]]
> test2 = runListT $ join x'
In other words, apart from the ListT fluff, join for ListT [] is flatten. So if we consider lists as a mechanism to represent sets, ListT [] is revealed as the monad of semirings. I'm not sure, but I think that historically this is where monads originally came from. Certainly there are many papers on the relationship between monads and algebraic structures like semirings.
And now I can answer my original question. In a semiring, addition is commutative, so the order of terms doesn't matter. But in ListT [], we're using lists, and order does matter in a list. So if we do take order into account, then really ListT [] is the monad of semirings where both addition and multiplication are non-commutative. And here's the problem: in general, there is no such thing as a freely generated semiring with non-commutative addition.
Here's why: consider the expression ((a+b)+(a+b))*((a+b)+(a+b)). Multiply out the inner parentheses first and we get
(a+b+a+b)*(a+b+a+b)
= a*a+a*b+a*a+a*b+...
Now multiply out the outer parentheses first and we get
(a+b)*(a+b)+(a+b)*(a+b)+(a+b)*(a+b)+(a+b)*(a+b)
= a*a+a*b+b*a+b*b+...
The terms are coming out in a different order. Essentially distributivity doesn't work in a semiring with non-commutative addition.
This translates directly into failure of one of the monad laws. First write our expression as an object of type ListT []:
u = a+b
> u = ListT [["a"],["b"]]
v = (a+b)+(a+b)
> v = ListT [[u],[u]]
w = ((a+b)+(a+b))*((a+b)+(a+b))
> w = ListT [[v,v]]
join multiplies out parentheses. So working from the outer parentheses inwards we can use:
> expanded1 = join $ join w
> go1 = runListT expanded1
Working outwards from the inner parentheses we get:
> expanded2 = join $ fmap join w
> go2 = runListT expanded2
(Note how I use fmap to 'duck down' through the outer layer of parentheses so as to multiply out each of the subexpressions first.)
Evaluate go1 and go2 and you'll see that they correspond to the two ways of multiplying out that I gave above. And more importantly, the values of expanded1 and expanded2 aren't equal, meaning that join . join = join . fmap join isn't satisfied. You may recognise this: it's one of the monad laws. (At least it's one of the monad laws written the way category theorists write them.) So we ain't got no monad.
I think this is now a fairly complete analysis of what ListT [] is all about. So one obvious remaining question is: where do games come into all this? The answer is that games form a semiring in a way that I haven't seen documented anywhere (though is surely common knowledge). I'd explain but I've run out of time...
Note that the reason ListT [] isn't a monad is that [] isn't commutative, in some sense. This has been observed many times in the past. Two papers mentioning this are this and this.
I actually figured out all of this stuff before this. I realised that the trees I was scribbling on the backs of my envelopes to represent elements of semirings could actually be generalised to just about any kind of tree, so I wrote about the general case first.
I prefer my example of ListT [] failing to be a monad to the examples given here. The latter make use of the IO monad so they aren't quite as 'pure'.
Labels: haskell, mathematics
posted by sigfpe at Wednesday, December 13, 2006 11 comments