Showing posts with label derive. Show all posts

Monday, July 25, 2016

Maintainer required for Derive

Summary: I'm looking for a maintainer to take over Derive. Interested?

The Derive tool is a library and set of definitions for generating fragments of code in a formulaic way from a data type. It has a mechanism for guessing the pattern from a single example, plus a more manual way of writing out generators. It supports 35 generators out of the box, and is depended upon by 75 libraries.

The tool hasn't seen much love for some time, and I no longer use it myself. It requires fairly regular maintenance to upgrade to new versions of GHC and haskell-src-exts. There are lots of directions the tool could be taken: more derivations, integration with the GHC Generic deriving system, etc. There are a few generic pieces that could be broken off (translation between Template Haskell and haskell-src-exts, the guessing mechanism).

Anyone who is interested should comment on the GitHub ticket. In the absence of any volunteers I may continue to do the regular upgrade work, or may instead have it taken out of Stackage and stop upgrading it.

Sunday, July 29, 2012

Adding the Lens derivation to Derive

Derive is a tool for generating code from Haskell data types - you specify the data type, and it automatically derives code, often instance declarations. I was asked how to add derivations for the data-lens library, and this post is the answer. I have also released derive-2.5.10 which supports Lens derivations. The Derive tool should contain derivations for as many patterns as possible, and I welcome new contributions.
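For readers who haven't met data-lens, the code we want to generate is just a getter paired with a setter. A minimal self-contained sketch, using a local Lens type as a stand-in for data-lens's Data.Lens.Common (the Lens, lens, getL and setL definitions here are illustrative, not the library's actual ones):

```haskell
-- a stand-in for data-lens's Lens: a getter paired with a setter
data Lens a b = Lens { getL :: a -> b, setL :: b -> a -> a }

lens :: (a -> b) -> (b -> a -> a) -> Lens a b
lens = Lens

data Computer = Computer { speed :: Int, memory :: Int } deriving Show

-- the shape of code the Lens derivation produces for a record field
lensSpeed :: Lens Computer Int
lensSpeed = lens speed (\x v -> v{speed = x})

main :: IO ()
main = do
    let c = Computer{speed = 2, memory = 4}
    print (getL lensSpeed c)            -- 2
    print (speed (setL lensSpeed 9 c))  -- 9
```

The derivation's job is to produce one such getter/setter pair for every field of the data type.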

Step 1: Checkout and run Derive

The first step is to fetch the code with darcs:

$ darcs get http://community.haskell.org/~ndm/darcs/derive

Then follow the instructions in the README.txt file to regenerate the derivations locally and run the test suite:

$ cd derive
$ ghci
> :main --generate
> :reload
> :main --test

Like all my projects, Derive contains a .ghci file which sets up include paths and other useful utilities for working with each package. We first run :main --generate which autogenerates all the boilerplate in the Derive tool. We then run :reload to reload the freshly generated code, and finally :main --test which runs the test suite. In order to run the test suite it is likely that you will need to install some packages, see derive.html for a list of the likely packages.

Step 2: Add the Lens derivation

The first step when adding a derivation is to find a similar derivation (there are 35 existing derivations, so there is probably something reasonably close) and copy it. For the Lens derivation, it turns out that Ref is quite similar, so we copy Data/Derive/Ref.hs to Lens.hs and start modifying. There are several sections we need to tweak.

Firstly, we update the Haddock comments at the top of the file. These comments are purely for the users of our library, and should explain how to use the derivation.

Secondly, we modify the test section. This section is in comments just below the module header. We add import "data-lens" Data.Lens.Common then, working from the data types listed in Data/Derive/Internal/Test.hs, we modify the test to match what we expect as the output. This test section is tested with :main --test and contributes to the documentation (mainly the import line).

Thirdly, we modify the top-level names in this module, so the module name is Data.Derive.Lens and the main function is makeLens.

Step 3: Test and fix

When developing new derivations I follow test driven development. The tests are written, but the code has not been changed from the Ref derivation, so we expect the tests to fail. We follow the three standard steps in GHCi and :main --test complains with:

*** Exception: Results don't match!
Expected:
lensSpeed :: Lens Computer Int
lensSpeed = lens speed (\ x v -> v{speed = x})
Got:
refSpeed :: Ref Computer
refSpeed = Ref{select = speed, update = \ f v -> v{speed = f (speed v)}}

As expected, the test fails, showing us that the copied Ref code does not do what we want the Lens code to do. Modifying the code, it takes a few minutes to arrive at:

lensSpeed :: Lens Computer
lensSpeed = lens speed (\ x v -> v{speed = x})

The result is not quite right - the Int argument is missing from Lens, but so far we have been renaming and juggling existing code. Now we need to find the type of the field at hand, given the name of the field and the DataDecl of the type (DataDecl is from the haskell-src-exts library). Hunting around some of the Derive utility modules, particularly Language.Haskell, we can come up with:

typ = tyApps (tyCon "Lens") [dataDeclType d, fromBangType t]
Just t = lookup field $ concatMap ctorDeclFields $ dataDeclCtors d

We rerun :main --test and the test passes. This command checks that the examples match, then checks that the result of the derivation type-checks for a larger range of examples. We have now added Lens derivations to Derive.

(If you are lucky, and your derivation can be added by example, then you might not have to write any code at all - simply modifying the test case automatically generates the right code. See the Eq type class for such an example.)

Step 4: Final test

While we have satisfied the test suite, it is always reassuring to run some tests by hand. Using the Example.hs file in the root of the repo we can try:

> :main Example.hs --derive=Lens

This command prints out the expected result:

lensMemory :: Lens Computer Int
lensMemory = lens memory (\ x v -> v{memory = x})
lensSpeed :: Lens Computer Int
lensSpeed = lens speed (\ x v -> v{speed = x})
lensWeight :: Lens Computer Int
lensWeight = lens weight (\ x v -> v{weight = x})

Step 5: Send upstream

Before sending a patch, update CHANGES.txt with a one-line summary of what you have done, then you should see:

$ darcs add Data\Derive\Lens.hs
$ darcs w -s
M ./CHANGES.txt +1
M ./Data/Derive/All.hs -1 +2
A ./Data/Derive/Lens.hs
M ./derive.cabal +1
M ./derive.htm +1

Create a patch (using the standard darcs workflow) and send it to me. The more derivations the merrier.

Saturday, January 23, 2010

Optimising HLint

HLint is a tool for suggesting improvements to Haskell code. Recently I've put some effort into optimisation, and HLint is now over 20 times faster. The standard benchmark (running HLint over the source code of HLint) has gone from 30 seconds to just over 1 second. This blog post is the story of that optimisation, the dead ends I encountered, and the steps I took. I've deliberately included reasonable chunks of code in this post, so interested readers can see the whole story - less technical readers should feel free to skip them. The results of the optimisation are all available on Hackage, as new versions of hlint, uniplate and derive.

Before I start, I'd like to share my five guiding principles of optimisation:


  • Tests - make sure you have tests, so you don't break anything while optimising.

  • Profile - if you don't profile first, you are almost certainly optimising the wrong thing.

  • Necessity - only start the optimisation process if something is running too slowly.

  • Benchmark - use a sensible and representative test case to benchmark, to make sure you optimise the right thing.

  • Optimising - to make a function faster, either call it less, or write it better.



Below are the steps in the optimisation, along with their speed improvement.


Special support for Rational in Uniplate.Data, 30s to 10s

HLint uses Uniplate extensively. HLint works over large abstract syntax trees, from the library haskell-src-exts (HSE), so a generics library is essential. There are two main variants of Uniplate - Data builds on the Scrap Your Boilerplate (SYB) instances, and Direct requires special instances. HLint uses the Data variant, as it requires no instances to be written.

One of the advantages of Uniplate is that it generally outperforms most generics libraries. In particular, the variant written on top of Data instances is often many times faster than using SYB directly. The reason for outperforming SYB is documented in my PhD thesis. The essential idea is that Uniplate builds a "hit table", a mapping noting which types can be contained within which other types - e.g. that there is potentially an Int inside Either String [Int], but there isn't an Int inside String. By consulting this mapping while traversing a value, Uniplate is able to skip large portions, leading to an efficiency improvement.
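To make the hit table idea concrete, here is a toy sketch. Uniplate derives the real table from Data instances; this one is written out by hand, and the names direct, reachable and hits are my own, not Uniplate's:

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- immediate containment for a toy universe of types
direct :: Map.Map String [String]
direct = Map.fromList
    [ ("Either String [Int]", ["String", "[Int]"])
    , ("String", ["Char"])
    , ("[Int]", ["Int"])
    , ("Char", [])
    , ("Int", []) ]

-- every type reachable inside a given type (including itself)
reachable :: String -> Set.Set String
reachable t = go Set.empty [t]
    where
        go seen [] = seen
        go seen (x:xs)
            | x `Set.member` seen = go seen xs
            | otherwise = go (Set.insert x seen)
                             (Map.findWithDefault [] x direct ++ xs)

-- the "hit" question: can `want` occur inside `within`?
hits :: String -> String -> Bool
hits want within = want `Set.member` reachable within

main :: IO ()
main = do
    print (hits "Int" "Either String [Int]")  -- True
    print (hits "Int" "String")               -- False
```

A traversal looking for Int values can consult hits at each node, and skip any subtree whose type cannot contain an Int.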

When computing the hit table it is necessary for Uniplate to create dummy values of each type, which it then traverses using SYB functions. To create dummy values Uniplate uses undefined, unfortunately given the definition data Foo = Foo !Int the value Foo undefined will be forced due to the strictness annotation, and the code will raise an error - as described in bug 243. Uniplate 1.2 had a special case for Rational, which is the only type with strict components contained within HSE. Uniplate 1.3 fixed this problem more generally by catching the exception and turning off the hit table facility on such types. Unfortunately this caused Uniplate 1.3 to turn off the hit table for HSE, causing HLint to run three times slower.
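The failure mode is easy to reproduce in a few lines (this is a sketch of the problem, not Uniplate's actual dummy-value code; Foo, Bar and forces are names of my choosing):

```haskell
import Control.Exception (SomeException, evaluate, try)

-- a strict field, like the components of Rational
data Foo = Foo !Int
-- the same shape without the strictness annotation
data Bar = Bar Int

-- does evaluating the value to weak head normal form raise an exception?
forces :: a -> IO Bool
forces x = do
    r <- try (evaluate x >> return ()) :: IO (Either SomeException ())
    return (either (const True) (const False) r)

main :: IO ()
main = do
    forces (Bar undefined) >>= print  -- False: the lazy field is never touched
    forces (Foo undefined) >>= print  -- True: the bang forces undefined
```

Evaluating Bar undefined to WHNF succeeds because the field stays unevaluated, but Foo undefined fails as soon as the constructor is forced.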

The fix was simple, and pragmatic, but not general. In Uniplate 1.4 I reinstated the special case for Rational, so now HLint makes use of the hit table, and goes three times faster. A more general solution would be to manufacture dummy values for certain types (it's usually an Int or Integer that is a strict component), or to create concrete dummy values using SYB. It's interesting to observe that if HLint used SYB as its generics library, it would not be using the hit table trick, and would run three times slower.


Use Uniplate.Direct, 10s to 5s, reverted

Uniplate also provides a Direct version, which performs better, but requires instances to be written. In order to further improve the speed of HLint I decided to try the Direct version. The biggest hurdle to using the Direct version is that many instances need to be generated, in the case of HLint it required 677. The first step was to modify the Derive tool to generate these instances (which Derive 2.1 now does), and to write a small script to decide which instances were necessary. With these instances in place, the time dropped to 5 seconds.

Unfortunately, the downside was that compilation time skyrocketed, and the instances are very specific to a particular version of HSE. While these problems are not insurmountable, I did not consider the benefit to be worthwhile, so reverted the changes. It's worth pointing out that most users of Uniplate won't require so many instances to use the Direct versions, and that a program can be switched between Direct and Data versions without any code changes (just a simple import). I also considered the possibility of discovering which Uniplate instances dominated and using the Direct method only for those - but I did not investigate further.


Add and optimise eqExpShell, 10s to 8s

The next step was to profile. I compiled and ran HLint with the following options:


ghc --make -O1 -prof -auto-all -o hlint
hlint src +RTS -p


Looking at the profiling output, I saw that the function unify took up over 60% of the execution time:


unify :: NameMatch -> Exp -> Exp -> Maybe [(String,Exp)]
unify nm (Do xs) (Do ys) | length xs == length ys = concatZipWithM (unifyStmt nm) xs ys
unify nm (Lambda _ xs x) (Lambda _ ys y) | length xs == length ys = liftM2 (++) (unify nm x y) (concatZipWithM unifyPat xs ys)
unify nm x y | isParen x || isParen y = unify nm (fromParen x) (fromParen y)
unify nm (Var (fromNamed -> v)) y | isUnifyVar v = Just [(v,y)]
unify nm (Var x) (Var y) | nm x y = Just []
unify nm x y | ((==) `on` descend (const $ toNamed "_")) x y = concatZipWithM (unify nm) (children x) (children y)
unify nm _ _ = Nothing


The function unify is an essential part of the rule matching in HLint, and attempts to compare a rule to a target expression, and if successful returns the correct substitution. Each rule is compared to every expression in all files, which means that unify is called millions of times in a normal HLint run. Looking closely, my first suspicion was the second line from the bottom in the guard - the call to descend and (==). This line compares the outer shell of two expressions, ignoring any inner expressions. It first uses the Uniplate descend function to insert a dummy value as each subexpression, then compares for equality. To test my hypothesis that this method was indeed the culprit I extracted it to a separate function, and modified unify to call it:


eqExpShell :: Exp_ -> Exp_ -> Bool
eqExpShell = (==) `on` descend (const $ toNamed "_")


I reran the profiling, and now all the time was being spent in eqExpShell. My first thought was to expand out the function to not use Uniplate. I quickly rejected that idea - there are expressions within statements and patterns, and to follow all the intricacies of HSE would be fragile and verbose.

The first optimisation I tried was to replace toNamed "_", the dummy expression, with something simpler. The toNamed call expands to many constructors, so instead I used Do an [] (an is a dummy source location), which is the simplest expression HSE provides. This change had a noticeable speed improvement.


eqExpShell :: Exp_ -> Exp_ -> Bool
eqExpShell = (==) `on` descend (const $ Do an [])


My second thought was to add a quick test, so that if the outer constructors were not equal then the expensive test was not tried. Determining the outer constructor of a value can be done by calling show then only looking at the first word (assuming a sensible Show instance, which HSE has).


eqExpShell :: Exp_ -> Exp_ -> Bool
eqExpShell x y =
    ((==) `on` constr) x y &&
    ((==) `on` descend (const $ Do an [])) x y
    where constr = takeWhile (not . isSpace) . show


This change had a bigger speed improvement. I found that of the 1.5 million times eqExpShell was called, the quick equality test rejected 1 million cases.

My next thought was to try replacing constr with the SYB function toConstr. There was no noticeable performance impact, but the code is neater, and doesn't rely on the Show instance, so I stuck with it. After all these changes HLint was 2 seconds faster, but eqExpShell was still the biggest culprit on the profiling report.
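Both variants are easy to experiment with in isolation; Data.Data ships with base, so this sketch is self-contained (constrShow and sameConstr are names of my choosing, not HLint's):

```haskell
import Data.Char (isSpace)
import Data.Data (Data, toConstr)

-- constructor name via the Show instance, as first tried
constrShow :: Show a => a -> String
constrShow = takeWhile (not . isSpace) . show

-- constructor comparison via SYB, as finally used
sameConstr :: Data a => a -> a -> Bool
sameConstr x y = toConstr x == toConstr y

main :: IO ()
main = do
    print (constrShow (Just 42))                        -- "Just"
    print (sameConstr (Just 1) (Just (2 :: Int)))       -- True
    print (sameConstr (Just 1) (Nothing :: Maybe Int))  -- False
```

The toConstr version only needs a Data instance, and never looks past the outermost constructor.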

Write eqExpShell entirely in SYB, 8s to 8s, reverted

My next thought was to rewrite eqExpShell entirely in SYB functions, not using the Eq instance at all. The advantages of this approach would be that I can simply disregard all subexpressions, I only walk the expression once, and I can skip source position annotations entirely. Starting from the geq function in SYB, I came up with:


data Box = forall a . Data a => Box a

eqExpShellSYB :: Exp_ -> Exp_ -> Bool
eqExpShellSYB = f
    where
        f :: (Data a, Data b) => a -> b -> Bool
        f x y = toConstr x == toConstr y &&
                and (zipWith g (gmapQ Box x) (gmapQ Box y))

        g (Box x) (Box y) = tx == typeAnn || tx == typeExp || f x y
            where tx = typeOf x

        typeAnn = typeOf an
        typeExp = typeOf ex


Unfortunately, this code takes exactly the same time as the previous version, despite being significantly more complex. My guess is that toConstr is not as fast as the Eq instance, and that this additional overhead negates all the other savings. I decided to revert back to the simpler version.

Call eqExpShell less, 8s to 4s

Having failed to optimise eqExpShell further, I then thought about how to call it less. I added a trace and found that of the 1.5 million calls, in 1.3 million cases at least one of the constructors was an App. Application is very common in Haskell programs, so this is not particularly surprising. By looking back at the code for unify I found several other constructors were already handled, so I added a special case for App.


unify nm (App _ x1 x2) (App _ y1 y2) = liftM2 (++) (unify nm x1 y1) (unify nm x2 y2)


It is easy to show that if an App (or any specially handled constructor) is passed as either argument to eqExpShell then the result will be False, as if both shells had been equal a previous case would have matched. Taking advantage of this observation, I rewrote the line with the generic match as:


unify nm x y | isOther x && isOther y && eqExpShell x y = concatZipWithM (unify nm) (children x) (children y)
    where
        isOther Do{} = False
        isOther Lambda{} = False
        isOther Var{} = False
        isOther App{} = False
        isOther _ = True


With this change the eqExpShell function was called substantially less often; it disappeared from the profile, and the speed improved to 4 seconds.

Fix Uniplate bug, 4s to 1.3s

The next step was to rerun the profiling. However, the results were very confusing - almost 70% of the execution time was attributed to three CAFs, while I could see no obvious culprit. I reran the profiler with the -caf-all flag to more precisely report the location of the CAFs, and was again confused - the optimiser had rearranged the functions to make -caf-all useless. I then reran the profiler with optimisation turned off using -O0 and looked again. This time the profiling clearly showed Uniplate as the source of the CAFs. The hit table that Uniplate creates is stored inside a CAF, so was an obvious candidate.

Turning back to Uniplate, I attempted to reproduce the bug outside HLint. I enhanced the benchmarking suite by adding a method to find all the String's inside very small HSE values. The standard Uniplate benchmarks are for testing the performance of running code, and I had neglected to check the creation of the hit table, assuming it to be negligible.


testN "Module" $ Module ssi Nothing [] [] []
testN "Decl" $ FunBind ssi []
testN "Exp" $ Do ssi []
testN "Pat" $ PWildCard ssi

testN :: Biplate a String => String -> a -> IO ()
testN msg x = do
    t <- timer $ evaluate $ length (universeBi x :: [String])
    putStrLn $ "HSE for " ++ msg ++ " takes " ++ dp2 t ++ "s"


My initial worry was that at 2 decimal places I was likely to see 0.00 for all values. However, that turned out not to be a problem! What I saw was:


HSE for Module takes 0.54s
HSE for Decl takes 0.86s
HSE for Exp takes 2.54s
HSE for Pat takes 0.32s


These results surprised me. In particular, the hit table from Exp to String is a subset of the Module one, so should always be faster. The computation of the hit table is reasonably complex, and I was unable to isolate the problem. The tricky part of the hit table is that it is necessary to take the fixed point of the transitive closure of reachability - I had tried to keep track of recursive types and reach a fixed point with the minimal number of recomputations. I clearly failed, and probably had an exponential aspect to the algorithm that under certain circumstances caused ridiculously bad behaviour.

Rather than try to diagnose the bug, I decided instead to rethink the approach, and simplify the design. In particular, the fixed point of the transitive closure is now written as:


fixEq trans (populate box)


Where populate finds the immediate children of a constructor, trans takes the transitive closure based on the current state, and fixEq takes the fixed point. Using this new simpler design I was also able to compute which types were recursively contained within which other types, and cache that information, meaning that computing the hit table for Module not only computes the hit table for Exp, but does so in a way that lets the result be reused when asking about Exp. After rewriting the code I reran the benchmark:


HSE for Module takes 0.03s
HSE for Decl takes 0.00s
HSE for Exp takes 0.00s
HSE for Pat takes 0.00s


I had hoped that the rewrite would fix the performance problems, and it did. I have not diagnosed the original performance bug, but at 0.03 seconds I was satisfied. I have now released Uniplate 1.5 with the revised hit table code. With this change, the time for HLint drops to 1.3 seconds and all the CAFs disappear from the profile.
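To give a flavour of the fixed-point formulation, here is a toy reconstruction; fixEq, trans and populate are my own sketches over an association list, not Uniplate's actual definitions:

```haskell
import Data.List (nub, sort)

-- iterate a function until the result stops changing
fixEq :: Eq a => (a -> a) -> a -> a
fixEq f x = let x' = f x in if x' == x then x else fixEq f x'

-- one step of transitive closure: extend each type's children with
-- its children's children
trans :: [(String, [String])] -> [(String, [String])]
trans g = [ (t, sort $ nub $ cs ++ concat [ cs' | c <- cs, Just cs' <- [lookup c g] ])
          | (t, cs) <- g ]

-- immediate containment for a toy slice of HSE
populate :: [(String, [String])]
populate = [ ("Module", ["Decl"]), ("Decl", ["Exp"])
           , ("Exp", ["String"]), ("String", []) ]

main :: IO ()
main = print (lookup "Module" (fixEq trans populate))
    -- Just ["Decl","Exp","String"]
```

Because each iteration only grows the children sets, the iteration is guaranteed to reach a fixed point, and computing the closure for Module also yields the closure for every type it contains.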


Sharing computation, 1.3s to 1.2s

At this point, I was happy to finish, but decided to profile just one last time. The top function in the list was matchIdea:


matchIdea :: NameMatch -> Setting -> Exp_ -> Maybe Exp_
matchIdea nm MatchExp{lhs=lhs,rhs=rhs,side=side} x = do
    u <- unify nm (fmap (const an) lhs) (fmap (const an) x)
    u <- check u
    guard $ checkSide side u
    let rhs2 = subst u rhs
    guard $ checkDot lhs rhs2
    return $ unqualify nm $ dotContract $ performEval rhs2


This function first strips both the rule (lhs) and the expression (x) of their source position information to ensure equality works correctly. However, both the rules and expressions are reused multiple times, so I moved the fmap calls backwards so each rule/expression is only ever stripped of source position information once. With this change the runtime was reduced to 1.2 seconds.
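The trick generalises to any annotation carried by a Functor: strip once, compare many times. A toy illustration with a made-up Exp type (not HSE's), where the annotation plays the role of a source position:

```haskell
-- a toy annotated expression type
data Exp a = Var a String | App a (Exp a) (Exp a) deriving (Eq, Show)

instance Functor Exp where
    fmap f (Var a s)   = Var (f a) s
    fmap f (App a x y) = App (f a) (fmap f x) (fmap f y)

-- strip the annotations, so equality ignores source positions
strip :: Exp a -> Exp ()
strip = fmap (const ())

main :: IO ()
main = print (strip (Var 1 "x") == strip (Var 99 "x"))  -- True
```

Moving strip to the point where each rule and expression is constructed means the work is done once per value, instead of once per comparison.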


Final Results

After all that effort, I reran the profile results and got:


COST CENTRE     MODULE           %time  %alloc
parseString     HSE.All           16.2    17.7
matchIdea       Hint.Match        14.4     1.6
eqExpShellSlow  HSE.Eq             9.2    11.4
listDecl        Hint.List          6.1     4.1
lambdaHint      Hint.Lambda        5.1     5.6
bracketHint     Hint.Bracket       4.1     6.4
hasS            Hint.Extensions    4.1     5.0
monadHint       Hint.Monad         4.1     2.9
~=              HSE.Match          4.1     2.5
isParen         HSE.Util           3.1     0.0


Now the largest contributor to the HLint runtime is the parsing of Haskell files. There are no obvious candidates for easy optimisations, and the code runs sufficiently fast for my purposes.

Conclusions

There is still some scope for optimisation of HLint, but I leave that for future work. One possible avenue for exploration would be turning on selected packages of hints, to see which one takes the most time - profiling on a different measure.

In optimising HLint I've found two issues in Uniplate, the first of which I was aware of, and the second of which came as a total surprise. These optimisations to Uniplate will benefit everyone who uses it. I have achieved the goal of optimising HLint, simply by following the profile reports, and as a result HLint is now substantially faster than I had ever expected.

Footnote: I didn't actually profile first, as I knew that a performance regression was caused by the upgrade to Uniplate 1.3, so knew where to start looking. Generally, I would start with a profile.

Tuesday, June 16, 2009

Draft paper on Derive, comments wanted

It's been a long time since I last blogged (about 3 months). Since then I've had a paper on Firstify accepted into the Haskell Symposium (I'll post the final version to my website shortly). I've also been writing a paper on Derive to go with my invited talk at Approaches and Applications of Inductive Programming (co-located with ICFP this year). I have to submit a final version by the 22nd of June (6 days' time), but any comments on this draft would be gratefully received - either add them as comments to this post or send an email to ndmitchell AT gmail DOT com.

Download link: http://community.haskell.org/~ndm/temp/derive_draft.pdf

Title: Deriving a DSL from One Example

Abstract: Given an appropriate domain specific language (DSL), it is possible to describe the relationship between Haskell data types and many generic functions, typically type class instances. While describing the relationship is possible, it is not always an easy task. There is an alternative -- simply give one example output for a carefully chosen input, and have the relationship derived.

When deriving a relationship from only one example, it is important that the derived relationship is the intended one. We identify general restrictions on the DSL, and on the provided example, to ensure a level of predictability. We then apply these restrictions in practice, to derive the relationship between Haskell data types and generic functions. We have used our scheme in the Derive tool, where over 60% of type classes are derived from a single example.

Home page: http://community.haskell.org/~ndm/derive/

Darcs repo: http://community.haskell.org/~ndm/darcs/derive

The work presented in this paper will become the basis of Derive 2.0. Many thanks for any comments!

Monday, June 09, 2008

GSoC Hoogle: Week 2

This week I submitted my PhD thesis, emptied my entire rented house of furniture, spent £96 on petrol, drove (or was driven) 400 miles, travelled a similar distance by train, have been to the north of Scotland and am currently working on a borrowed Mac in London. Needless to say, it's been rather busy - but now all the excitement is over and I should be able to focus properly on Hoogle.

In the last week I've been focusing on the database, the store of all the function names and type signatures, so a very critical piece of information. I want to support fast searching, which doesn't slow down as the number of known functions increases - a nasty property of the current version. For text searching, the trie data structure has this nice property, and can deal with searching for substrings. For fuzzy type searching, things are a lot more complex. However, I think I have an algorithm which is fast (few operations), accurate (gives better matches), scalable (independent of the number of functions in the database) and lazy (returns the best results first). The idea is to have a graph of function results, and then navigate this graph to find the best match.
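To illustrate the trie property, here is a minimal sketch (not Hoogle's actual database code): lookup cost depends on the query and the number of matches, not on the total number of stored functions.

```haskell
import qualified Data.Map as Map

-- a trie: values stored at this node, plus labelled children
data Trie v = Trie [v] (Map.Map Char (Trie v))

emptyT :: Trie v
emptyT = Trie [] Map.empty

insertT :: String -> v -> Trie v -> Trie v
insertT []     v (Trie vs m) = Trie (v:vs) m
insertT (c:cs) v (Trie vs m) =
    Trie vs (Map.insert c (insertT cs v (Map.findWithDefault emptyT c m)) m)

-- everything whose key has the given prefix
lookupT :: String -> Trie v -> [v]
lookupT []     t          = collect t
lookupT (c:cs) (Trie _ m) = maybe [] (lookupT cs) (Map.lookup c m)

collect :: Trie v -> [v]
collect (Trie vs m) = vs ++ concatMap collect (Map.elems m)

main :: IO ()
main = do
    let t = foldr (\k -> insertT k k) emptyT ["map", "mapM", "filter"]
    print (lookupT "map" t)  -- ["map","mapM"]
```

Substring search can be layered on top by inserting every suffix of each name into the trie.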

Most of the database work has been theoretical, but I have done some coding. In particular, I have started on the database creation code, and polished the flag argument interaction code some more. Part of the development required the Derive tool, and in doing this work I noticed a few deficiencies. In particular, if you run Windows and run derive over a UNIX line-ending file, the tool will generate a Windows line-ending file. This problem, and a few others, are now fixed.

Next week: Database creation and searching. I want text searches to work by the end of the week.

User visible changes: The --help flag prints out information on the arguments.

PS. I was looking forward to seeing some blog posts from the other Haskell summer of code students on the Haskell Planet. If any Haskell GSoC student does have a blog, please ask for it to be included!