
Monday, March 26, 2012

Finding Two unordered-containers Bugs

Summary: Over the past few weeks I found two bugs in unordered-containers. I strongly recommend everyone upgrades to unordered-containers-0.2.1.0 or later, which fixes these bugs.

Both Shake and Uniplate use the unordered-containers library, which provides fast Map and Set types. Both my libraries had bugs reported against them which turned out to be introduced by unordered-containers-0.2.0.0. As soon as these bugs were identified Johan released a new version (0.2.1.0) fixing the bugs, and both my packages now work as expected.

Bug 1: Shake breaks because delete fails

Just over two weeks ago Evan Laforge reported he'd started seeing "thread blocked indefinitely in an MVar operation" if there was a compile error. Shake goes to a lot of effort to clean up nicely, so something was going wrong. Since Evan's project compiles on Linux, and I only have access to Windows machines, I tried replicating it on one of my projects. I perturbed the project in several ways and did manage to replicate the error once, but the same perturbation next time made it succeed - something was clearly non-deterministic (which is the norm in a highly concurrent program such as Shake).

I'd been wanting to write a random build system generator and tester for a while, to thoroughly test Shake, and used this bug as motivation. I wrote the tester, and usually within 5 random runs (about 60 seconds of churning) the error was reproduced on my desktop. However, the error wouldn't reproduce on my laptop even after an hour (because I was on an older version of unordered-containers, I now know), and as I was off travelling I had to put the bug away to look into on my return.

On my return I investigated further. Figuring out the command line to run the random test took about an hour (note to self - write these things down). Once I had that, it still reproduced reliably in about 60 seconds. I cut out the random parts of the test that I didn't think were contributing to the bug (tests for building successfully, minimal rebuilding etc) and cut the reproduction time down to about 5 seconds. I then started augmenting the code with print statements. The cleanup code for exceptions is almost exclusively within the thread pool implementation (see Development.Shake.Pool if you are reading the source). In particular, the code that cleans up after an exception is:


t <- myThreadId
mapM_ killThread $ Set.toList $ Set.delete t $ threads s
signalBarrier done $ Just e


This code finds all threads in the thread pool, with the exception of this thread (the Set.delete t), and kills them. After that has finished, it signals to the thread that first called the thread pool that everything has finished with an exception.

Adding trace statements at every step showed that the exception started being cleaned up, a handful of threads were killed, but not all the threads were killed and the barrier was never signaled. An initial guess was that my thread was being killed, and thus that Set.delete had failed to remove t. Since I had debugged a separate unordered-containers bug a week before (bug 2 started after and finished before this one), I went to the unordered-containers commit list and immediately found a bug fixed against delete. I upgraded, and ran my random tester for 3 hours without failure.
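
In hindsight, the invariant that was violated is easy to state as a QuickCheck property. The following is my own sketch (it is not the test that found the bug, and random Ints would not necessarily hit the hash collisions that triggered it), but it captures what delete must guarantee:

import qualified Data.HashSet as Set
import Test.QuickCheck

prop_delete :: Int -> [Int] -> Bool
prop_delete x xs = not $ Set.member x $ Set.delete x $ Set.fromList (x:xs)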

Bug 2: Uniplate breaks because filter fails

The Uniplate bug was reported just over a week ago, while I was travelling, by Aleksey Khudyakov. This bug report included a relatively small test case demonstrating a clear bug, along with the vital piece of information that the test case worked with unordered-containers-0.1, but not 0.2. With that information the question became whether Uniplate was using some non-guaranteed behaviour (such as assuming the result from Set.toList is ordered), or whether unordered-containers had a bug. The supplied test case was very sensitive - it involved three modules, and simply changing any module name caused the bug to disappear. The code was also exercising a very sensitive and complex part of Uniplate. After 12 hours of cooperative debugging between myself, Aleksey and Johan, I came up with the reduced test case:


let broken = Set.filter (`elem` good) $ Set.fromList $ good ++ bad
    working = Set.fromList $ Set.toList broken
print $ Set.member (head good) broken
print $ Set.member (head good) working


This printed False/True, when it should print True/True. Of course, the choice of the good and bad lists is critical, and was based on the Hashable instance for TypeRep, which uses the fully qualified type name, and thus changes if you rename the modules.
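
The same shape of check can be written as a QuickCheck property over arbitrary lists. This is my illustration rather than anything from the bug report, and arbitrary Ints are unlikely to reproduce the original failure (which depended on specific hash collisions in the TypeRep instance), but it states the invariant that filter must preserve:

import qualified Data.HashSet as Set
import Test.QuickCheck

prop_filter :: [Int] -> [Int] -> Property
prop_filter good bad = not (null good) ==>
        Set.member (head good) broken == Set.member (head good) working
    where
        broken  = Set.filter (`elem` good) $ Set.fromList $ good ++ bad
        working = Set.fromList $ Set.toList broken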

Once I had provided Johan with the reduced test case, it was fixed the same day.

Sunday, March 04, 2012

NSIS: Windows installers through an EDSL

Summary: I've just released the NSIS library on Hackage. It's useful for building Windows installers, but is also a call-by-name embedded two-stage programming language.

I've just released a new library on Hackage, named NSIS. It's a library for building Windows installers (see an example at the top of the documentation), based on NSIS (the Nullsoft Scriptable Install System). The basic idea of NSIS is that you write a program which defines your installer - it can search for files, create registry keys, create groups of files etc. But, at its heart, it is a full programming language, merely optimised to the job of writing installers. The original NSIS system has an excellent backend, producing small but featureful installers which run well on a variety of Windows platforms. Unfortunately, the front-end programming language is probably best described as assembly code with gentle influences from scripting languages. My NSIS library defines an embedded Haskell language, in the style promoted by Lennart Augustsson, which can produce scripts to be fed into the original NSIS compiler.

Why might you want to use the original NSIS backend?

I've tried several installer generators over the past few years, and even written my own from scratch. Of all these, I prefer NSIS mainly for two reasons:


  • It's a full programming language. The installer is capable of expressing any program, which means that as you need to do custom things in the installer, you can. For example, you can prohibit installing into the Windows directory. Equally, you can calculate the 1000th prime number. You are never artificially limited by the installer.

  • The installers are slick and robust. As the NSIS documentation says, "Being a user's first experience with your product, a stable and reliable installer is an important component of successful software". NSIS consistently delivers a smooth end-user experience, provided you select the MUI2 (Modern User Interface 2) settings.



Why might you want to use my frontend?

There are many advantages to my frontend. If your script is simple it is likely to look relatively similar in either system. If your script is complex it could end up 100 lines shorter and far clearer in my system. However, there are three primary advantages.

MUI2 by default

If you are writing a simple NSIS installer, the biggest difference is that my system will use the MUI2 settings by default. As NSIS has evolved there are now three separate ways to define your user interface, using the Classic builtin commands, using MUI1 or using MUI2. Of these, MUI2 is by far the nicest and should always be used, but the Classic interface is built in by default, and MUI2 is defined on top using macros. To specify that you want to insert a particular page into the installer you can write either of:


page Directory # classic NSIS
!insertmacro MUI_PAGE_DIRECTORY # MUI2


However, with my front end, you simply write:


page Directory


Flow control

While NSIS installer scripts are very powerful, they aren't easy for script authors. Taking the example from earlier, let's warn before installing into the Windows directory or the System directory:


StrCmp $WINDIR $INSTDIR bad 0
StrCmp $SYSDIR $INSTDIR bad 0
Goto skip
bad:
MessageBox MB_OK|MB_ICONEXCLAMATION "Warning: installing into the Windows directory"
skip:


Shockingly, in 1995 someone wrote a scripting language with only goto, and hasn't added any flow control since then. In my frontend, you can write:


iff_ ("$INSTDIR" %== "$WINDIR" %|| "$INSTDIR" %== "$SYSDIR") $
    alert "Warning: installing into the Windows directory"


All control flow can be accomplished with structured programming.

Variables

The original NSIS system has global mutable variables, 16 register variables and a stack - it directly mirrors assembly programming. In my system, variables are properly scoped and named. The difference for complicated functions is significant. As an example, let us define substring replacement in the original NSIS script:


Push "Hello World"
Push "Hello"
Push "Goodbye"
Call StrRep
Pop $R0
MessageBox MB_OK $R0

; Taken from http://nsis.sourceforge.net/Replace_Sub_String_(macro)
Function StrRep
;Written by dirtydingus 2003-02-20 04:30:09
Exch $R4 ; $R4 = Replacement String
Exch
Exch $R3 ; $R3 = String to replace (needle)
Exch 2
Exch $R1 ; $R1 = String to do replacement in (haystack)
Push $R2 ; Replaced haystack
Push $R5 ; Len (needle)
Push $R6 ; len (haystack)
Push $R7 ; Scratch reg
StrCpy $R2 ""
StrLen $R5 $R3
StrLen $R6 $R1
loop:
StrCpy $R7 $R1 $R5
StrCmp $R7 $R3 found
StrCpy $R7 $R1 1 ; - optimization can be removed if U know len needle=1
StrCpy $R2 "$R2$R7"
StrCpy $R1 $R1 $R6 1
StrCmp $R1 "" done loop
found:
StrCpy $R2 "$R2$R4"
StrCpy $R1 $R1 $R6 $R5
StrCmp $R1 "" done loop
done:
StrCpy $R3 $R2
Pop $R7
Pop $R6
Pop $R5
Pop $R2
Pop $R1
Pop $R4
Exch $R3
FunctionEnd


Alternatively, with my frontend, you can write:


alert $ strReplace "Hello" "Goodbye" "Hello World"

strReplace :: Exp String -> Exp String -> Exp String -> Exp String
strReplace from to str = do
    from <- constant_ from
    to <- constant_ to
    rest <- mutable_ str
    res <- mutable_ ""
    while (rest %/= "") $ do
        iff (from `strIsPrefixOf` rest)
            (do
                res @= res & to
                rest @= strDrop (strLength from) rest)
            (do
                res @= res & strTake 1 rest
                rest @= strDrop 1 rest)
    res


The difference is immense - the first is hard to follow without plenty of paper. The second is a fairly obvious imperative algorithm for string replacement (and is included in the package for reuse). Note that even though we're using a functional host language, our embedded language is very much imperative.

What technologies went into the frontend?

I really enjoyed writing this installer frontend. I got to dust off many techniques from compiler design that I haven't used for a while. Below is a flavour of some of the techniques I used:


  • Phantom types - all expressions have a phantom type, indicating what type the expression returns. In truth, all NSIS types are strings, but the phantom types let us make conversions explicit and catch user errors (a small sketch follows this list).

  • Call-by-name - the programming language I've implemented is call-by-name, which seems to be a particularly convenient choice for embedded languages.

  • Optimisation - since the target is a programming language, I wrote eight optimisation passes, which I iterate over. The result is that the string replacement function above becomes 21 lines of compiled code, while the original hand-coded version is 32 lines. There's no real need for the optimisation pass (the code is rarely a bottleneck or a significant size cost), but writing optimisers is fun.

  • Generics - the front end and optimisation passes are heavily based on generic programming, in particular using Uniplate.

  • Two-stage programming - the first program runs and generates a second program that is then rerun. It is possible to do computation at either level, and most errors are only caught at one or other of those levels, not both.

  • Goto programming - while I provide concepts such as iff and while, the target language is exclusively goto based, so I translate the structured constructs down to gotos.

  • Overloaded literals - I use overloaded strings and numbers extensively. Haskell doesn't permit overloaded booleans, but if it did I'd use those too.


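To illustrate the first of these, here is a minimal sketch of the phantom type idea. The names and representation below are invented for the example and are not the NSIS library's actual internals - the point is only that the runtime representation is untyped (everything is a string at the NSIS level), while the phantom parameter lets the type checker keep Int and String expressions apart and forces conversions to be explicit:

newtype Exp a = Exp String          -- hypothetical representation: an NSIS fragment

strE :: String -> Exp String        -- literals are tagged with their intended type
strE = Exp

intE :: Int -> Exp Int
intE = Exp . show

intToStr :: Exp Int -> Exp String   -- conversions change only the phantom type
intToStr (Exp x) = Exp x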

If any of these techniques are particularly interesting, please comment below, and I'll expand on that area in a future post.

Tuesday, February 28, 2012

Four Interesting Build Tools

Summary: This post describes four interesting build systems, namely Redo, Ninja, Tup and Fabricate.

While developing Shake I investigated many existing build tools. Build tools can be divided into two categories -- those which target single-language projects with fixed rules (e.g. ocamlbuild, ghc --make, Visual Studio projects), and those which allow user specified rules (e.g. make and Shake). Focusing on the second category, the de facto standard is make, but there are many make competitors (notably Ant, CMake, Jam, Scons and Waf). Most of these tools read a list of rules, generate a dependency graph, then execute commands while traversing that graph.

Since the number of build tools is vast, I will restrict my discussion to four build tools which take different approaches (Redo, Ninja, Tup and Fabricate). Interestingly, one thing all four systems have in common is that they require a database of build data, in addition to the rules and the file system. Unlike Shake, all these build systems are limited to files.

Redo

The Redo build system has a similar dependency theory to Shake. Rules are run starting at the target. A rule may call redo-ifchange (similar to need) to ensure that this rule is repeated if any of the file arguments change. A rule can build either a specific named file, or a set of files ending with a particular extension.

While Redo has similarities to Shake, the practical implementation is significantly different. Instead of a single rule store, Redo stores each rule in a separate file, and the script language is simply shell script. The advantage of separate files is that Redo is able to depend on the actual rule used to build a result, meaning that build system changes are properly tracked. However, separating build rules makes it harder to reason about the build system, and eliminates many potential uses of abstraction. Redo does not work on Windows and has no support for multiple outputs.

Ninja

The Ninja build system is designed as a two-stage build system -- users specify their build rules in a high-level manner, which is then translated to a set of Ninja build rules. As a result, the Ninja build system is not designed to be general purpose and configuration choices are expected to be resolved by the first level. The Ninja target language supports three dependency features beyond make. Firstly, a rule can depend on the list of files contained in another file, allowing additional dependencies at build time. Secondly, the command line for each rule is tracked, resulting in a rebuild if the rule itself changes. Thirdly, a rule can generate multiple outputs, which are properly tracked.

Tup

The Tup build system is designed as an incremental build system. Tup has a similar dependency structure to make, but a significantly different implementation. Instead of scanning all dependencies, it expects the operating system to supply a list of changed files, avoiding the overhead of checking which files have changed. For large build systems the result can be a significant speed improvement if there are only a few files to rebuild. We believe a similar implementation strategy could be applied to Shake.

Another difference from make is the treatment of dead build results. If a rule to build foo is deleted from the rule list, then Tup automatically deletes the file foo. The problem of dead build results is serious, resulting in builds succeeding that should have failed, and that will fail as soon as a clean build is performed (to reduce this risk, we suggest an overnight build which starts from scratch). However, it is often useful to have build modes which generate skeleton files which are then modified by the user -- deleting these files would be most unwelcome. It would be easy to add support for deleting dead build results to Shake, but we choose not to.

Fabricate

The key innovation in the Fabricate build system is that dependencies do not need to be stated explicitly. A build system is a Python program, which primarily executes system commands in order. While executing the commands, Fabricate uses system tracing (strace on Linux) to record which files are accessed. In future runs, if the same system command is reached but none of the files it used have changed, the command is skipped. The resulting build systems are simple, and avoid the difficulties of correctly specifying dependencies.

There are two inherent difficulties for build systems without explicit dependencies. Firstly, the system tracing mechanisms on different platforms are varied, and on Windows are somewhat fragile. Secondly, parallelism cannot be inferred automatically -- Fabricate requires explicit grouping annotations to use parallelism.

Saturday, February 11, 2012

Shake: A Better Make

Summary: I have just released Shake, a library that allows you to write build systems in Haskell (think Make, but much better).

At the Haskell Implementors Workshop 2010 I described a Haskell build system named Shake (video here), hoping an implementation would be available "soon". Better late than never, I am delighted to announce that Shake is now available on Hackage. This version is a from scratch reimplementation, based on an improved underlying theory, all completed in my spare time. Several users have already experimented with this version, so it is reasonably robust.

A Simple Example

As a simple example of a Shake build system, let us build the file result.tar from the files listed by result.txt:


import Development.Shake
import Development.Shake.FilePath

main = shake shakeOptions $ do
    want ["result.tar"]
    "*.tar" *> \out -> do
        contents <- readFileLines $ replaceExtension out "txt"
        need contents
        system' "tar" $ ["-cf",out] ++ contents


We start by importing the modules defining both Shake and routines for manipulating FilePath values. We define main to call shake with the default shakeOptions. As the second argument to shake, we provide a set of rules. There are two common forms of rules, want to specify target files, and *> to define a rule which builds a file pattern. We use want to require that after the build completes the file result.tar should be ready.

The *.tar rule describes how to build files with the extension .tar, including result.tar. We readFileLines on result.txt, after changing the .tar extension to .txt. We read each line into the variable contents -- being a list of the files that should go into result.tar. Next, we depend (need) all the files in contents. If any of these files change, the rule will be repeated. Finally we call the tar program. If either result.txt changes, or any of the files listed by result.txt change, then result.tar will be rebuilt.
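
To make that concrete, here is an illustration of my own (not from the package documentation). Suppose result.txt contains:

file1.txt
dir/file2.txt

Then the rule runs tar -cf result.tar file1.txt dir/file2.txt, and touching result.txt, file1.txt or dir/file2.txt will cause result.tar to be rebuilt on the next run.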

If you are interested in writing a build system using Shake, I suggest you read the documentation. One thing noted in the documentation is that if ghc --make or cabal is capable of building your project, use that instead. Custom build systems are necessary for many complex projects, but many projects are not complex.

As an example of how to do this task in Make, see this StackOverflow question. I'll cover the differences and similarities with other build tools in a forthcoming blog post.

Sunday, January 08, 2012

Pascal's Triangle in Haskell

Summary: I'm continually amazed how concise and powerful Haskell is, compared to mainstream languages. This post describes how to write Pascal's Triangle, and gives some of the advantages of using Haskell.

Often, when programming in Haskell, I feel like I'm cheating. As an example, I recently came across this article by William Shields, which suggests that prospective interview candidates be given simple programming tasks like generating Pascal's Triangle. William gives examples in Python, including some of the answers a typical candidate might give.

Pascal's Triangle in Haskell

William describes Pascal's Triangle as:

"The root element is 1. Every other element is the sum of the one or two above it (diagonally left and diagonally right)."

As a Haskell programmer, the obvious technique to use is induction. The first row of the triangle is [1], and each row can be computed from the previous row by adding the row shifted left, and the row shifted right:


next xs = zipWith (+) ([0] ++ xs) (xs ++ [0])
pascal = iterate next [1]


Here, we define next to take one row and produce the next row. We then use the iterate function to repeatedly apply next starting at the root element. The solution is short, and follows from the definition.
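
As a quick sanity check (my worked example, not part of the original text), here is next applied to the third row:

next [1,2,1]
    == zipWith (+) ([0] ++ [1,2,1]) ([1,2,1] ++ [0])
    == zipWith (+) [0,1,2,1] [1,2,1,0]
    == [1,3,3,1]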

Laziness for Modularity

William originally posed three questions:


  • Print out the triangle to a specific row: print $ take 100 pascal

  • Return a given row of the triangle: pascal !! 50

  • Return a given element (by row and index) of the triangle: pascal !! 10 !! 5



Thanks to laziness, we can concisely answer all these questions in terms of the original pascal definition. In contrast, using a language such as Python, the best solution (Dynamic Programming from the original article) can only perform the first task.

Interview problems

The original article was not about the choice of programming language, but about choosing suitable questions for interviewing programmers. I agree with William that Pascal's Triangle is a suitable problem - it isn't a trick puzzle, it isn't an API knowledge quiz - it's about understanding how to program. Given how much easier the problem is to solve in Haskell, I wonder if using Haskell in a job interview should be considered cheating? ;-)

Hiding Win32 Windows

Summary: This post describes how to hide windows on the Windows operating system, by using the Win32 API from Haskell.

Imagine that whenever your computer restarts, it pops up a message box:



If you interact with this window, your computer will be destroyed. You can't kill the process, or dismiss the window harmlessly. (This scenario isn't hypothetical...) The solution is to hide the window, so it still exists but is out of the way of misplaced clicks. To hide the window, we first find its OS handle, then we call some Win32 API functions.

Find the Window

To find the handle of the window, we use Spy++. Spy++ comes with Visual Studio, and is bundled in the Express edition (the free version) from 2010 onwards. Start Spy++, go to Search, Find Window, then use the finder tool to select the window in question. Check that the caption of the window matches what Spy++ reports:



The important information is the handle: 0004061E.

Hide the Window

To hide the window you need a programming language capable of making Win32 API calls. In the past I have used Word VBA as the host language, but Haskell is probably easier. Start GHCi, and type:


$ import System.Win32
$ import Graphics.Win32
$ showWindow (castUINTToPtr 0x0004061E) sW_HIDE


Replace 0x0004061E on the final line with 0x followed by your handle. The final line should cause the window to be hidden, saving your computer from destruction.

Thanks: Thanks to Roman Leshchinskiy for reminding me that there were better solutions than just trying not to click the window. Thanks to the Win32 Haskell developers - the Win32 binding was a lot of work, which not many people ever see.

Wednesday, December 14, 2011

Enabling Reply To All in Outlook - version 2

I previously described how to enable Reply To All in Outlook. Since then I've modified the VBA script so it works on both Office 2003 and 2007, gives better error messages, automatically replaces the disabled Reply To All buttons, allows editing messages (even if that feature has been disabled) and strips messages of inline images. Installation instructions and links to the code are in the user manual:

http://community.haskell.org/~ndm/darcs/office/ReplyToAll.htm

I don't want to duplicate the installation instructions here, as I will likely revise them over time. However, at the bottom of the manual is a plea to Outlook administrators not to disable Reply To All in the first place. I don't think arguments against disabling essential email functionality can be made often enough, so here is my plea:

Plea to Outlook Administrators



Please do not disable essential email functionality. With the workarounds linked to above, the attempt is futile, but remains deeply inconvenient. Consider the situation where Alice sends an email to Bob, Charlie and Dave asking for some financial details of Mega Corp. Bob has the details on a post-it note on his desk and quickly replies to everyone with the information. But in a world without Reply To All...


  • Bob replies just to Alice. Charlie, being the helpful soul he is, decides to search through the filing cabinet for the information. 15 minutes later Charlie finds the details and emails Alice, only to be told "Thanks, Bob answered this 15 minutes ago". Charlie realises he just wasted 15 minutes of his life, and goes to get a cookie to make himself feel better.

  • Bob replies just to Alice. At the end of the week Dave is reviewing his emails and realises that Alice still hasn't got a reply, but before he potentially wastes 15 minutes, he drops Alice an email - "Do you still want those details?". Alice replies "No". Dave concludes that he's a clever bunny for not going to the filing cabinet, and decides a one week latency when replying to Alice is just common sense.

  • Bob replies, but realising the potential cost of replying just to Alice, also replies to Charlie and Dave. Bob spent a minute retyping the recipient list, and wonders "Why can't Outlook have a button that does this for me?".

  • Bob replies, also including Charles and Dave. Woops! That should have been Charlie, not Charles. As a best case scenario, Charles gets annoyed with a useless email, but at worst Charles brings down Mega Corp with the sensitive information gleaned from the email. Charlie still ends up going to get a cookie.



Removing Reply To All increases the volume of email required, and increases the risk of email accidents. I've heard only two arguments against Reply To All, both of which are wrong:


  • When Bob replies to everyone, the financial information about Mega Corp gets sent to Charlie, but what if Charlie shouldn't have access to that information? Of course, if Charlie shouldn't have access to that information then Alice was at fault for sending the request to Charlie in the first place. Bob should also check when sending sensitive information, but most emails are not sensitive (it is a poor default), checking the recipients should not require retyping them (it is a poor user interface), and by default mailing lists are not fully expanded (so he won't be able to tell the full list of recipients anyway).

  • When Zorg, the owner of Mega Corp, sends a Christmas email to all his employees, what if one of them hits Reply To All? That wastes a lot of company time, and should be prevented - but not on the client side. It is possible to restrict the number of recipients to an email, Zorg could send out the email with everyone in the BCC field, or IT could set up a mailing list which does not permit replies.

Saturday, November 05, 2011

Abstract Generics with Uniplate

Summary: The new version of Uniplate has several wrappers for working with abstract data types, such as Map from the containers package. These wrappers let you transform abstract data types without breaking their invariants.

Abstract data types, such as Map from the containers package, hide their internal structure so they can maintain invariants necessary for their correct operation. Generic programming, such as the Data class used by SYB, allows programs to generically traverse data types, without detailed knowledge of their structure. The two are somewhat incompatible - if Map allows manipulating its internal structure, the invariants could be broken causing Map to perform incorrectly.

Using the Data class, a Map String Int claims it has no constructors, but if you operate on it with gfoldl it behaves as though it were [(String,Int)]. When Uniplate traverses the Map, such as when searching for an Int, it first analyses the available constructors to see if a Map String Int can possibly contain an Int. Since Map has no constructors, it concludes that Map cannot contain an Int, and Uniplate fails to operate on the contained Ints.

For people who use the Data-style Uniplate module, using the new version of Uniplate you can now correctly operate over Map String Int, provided you use the newtype Map wrapper from Data.Generics.Uniplate.Data.Instances. When you transform over Bool (which does not touch the Map) it will ignore the Map and take O(1). When you transform over Int it will reconstruct the Map in O(n), and if you transform String or Char it will reconstruct the Map in O(n log n). Regardless of what operations you do, it will work efficiently and correctly. As an example:


$ import qualified Data.Map as Map
$ import Data.Char
$ import Data.Generics.Uniplate.Data
$ import Data.Generics.Uniplate.Data.Instances
$ fromMap $ transformBi toUpper $ toMap $ Map.fromList [("haskell",12),("test",18)]
fromList [("HASKELL",12),("TEST",18)]


There are two approaches for dealing with the Map problem in Uniplate. For users of the Direct-style Uniplate module, there is a function plateProject, which has been available for some time. For users of the Data-style Uniplate module, or for users of the SYB package, there is a new module Data.Generics.Uniplate.Data.Instances in uniplate-1.6.4 (released today) which provides three types with special Data instances (Hide, Trigger, Invariant). Using these three types we can construct wrappers providing Data instances for abstract types, and Uniplate includes wrappers for several of the types in the containers package (Map, Set, IntMap, IntSet).

plateProject

The plateProject function helps you define Direct Uniplate instances for abstract types. Instead of defining how to examine the data type, you instead define how to transform the data type into one you can examine, and how to transform it back. As an example:


instance Biplate (Map.Map [Char] Int) Char where
    biplate = plateProject Map.toList Map.fromList


If the types ensure that any operations will not change the keys we can optimise and use the fromDistinctAscList function to reconstruct the Map:


instance Biplate (Map.Map [Char] Int) Int where
    biplate = plateProject Map.toAscList Map.fromDistinctAscList


Hide

The Hide data type is useful for wrapping values that you want to ignore with Uniplate. The type is defined as:


data Hide a = Hide {fromHide :: a}


But has a Data instance that pretends it is defined using the extension EmptyDataDecls as:


data Hide a


As an example of avoiding particular values, you can write:


transformBi (+1) (1, 2, Hide 3, Just 4) == (2, 3, Hide 3, Just 5)


As a result of having no constructors, any calls to the methods toConstr or gunfold will raise an error.

Trigger

The Trigger data type is useful for determining when a value was constructed with the Data methods. It is defined as:


data Trigger a = Trigger {trigger :: Bool, fromTrigger :: a}


But the Data instance pretends that it is defined as:


data Trigger a = Trigger a


However, whenever gfoldl or gunfold constructs a new value, it will have the trigger field set to True. The trigger information is useful to indicate whether any invariants have been broken, and thus need fixing. As an example:


data SortedList a = SortedList (Trigger [a]) deriving (Data,Typeable)
toSortedList xs = SortedList $ Trigger False $ sort xs
fromSortedList (SortedList (Trigger t xs)) = if t then sort xs else xs


This data type represents a sorted list. When constructed the items are initially sorted, but operations such as gmapT could break that invariant. The Trigger type is used to detect when the Data operations have been performed, and resort the list.
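
As an illustration of my own (relying on the SortedList definitions above, with DeriveDataTypeable enabled and Data.Generics.Uniplate.Data imported), consider applying a Uniplate transformation to a sorted list:

-- transformBi rebuilds the Trigger via gfoldl, setting trigger to True,
-- so fromSortedList resorts the list and the invariant is restored;
-- without the trigger the result would be the unsorted [-1,-2,-3]
fromSortedList (transformBi (negate :: Int -> Int) (toSortedList [1,2,3])) == [-3,-2,-1]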

The Trigger type is often used in conjunction with Invariant, which fixes the invariants.

Invariant

The Invariant data type is useful for ensuring that an invariant is always applied to a data type. It is defined as:


data Invariant a = Invariant {invariant :: a -> a, fromInvariant :: a}


But the Data instance pretends that it is defined as:


data Invariant a = Invariant a


Whenever gfoldl constructs a new value, it will have the function in the invariant field applied to it. As an example:


data SortedList a = SortedList (Invariant [a]) deriving (Data,Typeable)
toSortedList xs = SortedList $ Invariant sort (sort xs)
fromSortedList (SortedList (Invariant _ xs)) = xs


Any time an operation such as gmapT is applied to the data type, the invariant function is applied to the result. The fromSortedList function can then rely on this invariant.

The gunfold method is partially implemented - all constructed values will have an undefined value for all fields, regardless of which function is passed to fromConstrB. If you only use fromConstr (as Uniplate does) then the gunfold method is sufficient.

Map

Using the Hide, Trigger and Invariant types, we can define a wrapper for the containers Map type as:


newtype Map k v = Map (Invariant (Trigger [k], Trigger [v], Hide (Map.Map k v)))
    deriving (Data, Typeable)


The Map type is defined as an Invariant of three components - the keys, the values, and the underlying Map. We use Invariant to ensure that the keys/values/map always remain in sync. We use Trigger on the keys and values to ensure that whenever the keys or values change we rebuild the Map, but if they don't, we reuse the previous value. The function to extract a containers Map from the wrapper requires only simple pattern matching:


fromMap (Map (Invariant _ (_,_,Hide x))) = x


The function to wrap a containers Map is slightly harder, as we need to come up with an invariant restoring function:


toMap :: Ord k => Map.Map k v -> Map k v
toMap x = Map $ Invariant inv $ create x
    where
        create x = (Trigger False ks, Trigger False vs, Hide x)
            where (ks,vs) = unzip $ Map.toAscList x

        inv (ks,vs,x)
            | trigger ks = create $ Map.fromList $ zip (fromTrigger ks) (fromTrigger vs)
            | trigger vs = create $ Map.fromDistinctAscList $ zip (fromTrigger ks) (fromTrigger vs)
            | otherwise = (ks,vs,x)


The create function creates a value from a Map, getting the correct keys and values. The inv function looks at the triggers on the keys/values. If the keys trigger has been tripped, then we reconstruct the Map using fromList. If the values trigger has been tripped, but the keys trigger has not, we can use fromDistinctAscList, reducing the complexity of constructing the Map. If nothing has changed we can reuse the previous value.

The end result is that all Uniplate (or SYB) traversals over Map result in a valid value, which has had all appropriate transformations applied.

Tuesday, November 01, 2011

Haskell Implementors Workshop 2011

The videos from the Haskell Implementors Workshop 2011 are now online at YouTube. I'm currently uploading them to Vimeo, but that will take another week (I ran out of free quota).

Many thanks to Kento EMOTO who did all the editing and conversion, and to all the speakers for the great talks.

Sunday, October 30, 2011

Calling Haskell from R

Summary: This post describes how to write a function in Haskell, and then call it from R.

R is a very popular language for statistics, particularly with biologists (and computational paleobiologists). For writing high performance code, the R developers recommend the use of C or Fortran - not languages that are particularly easy for beginners. However, you can instead write a Haskell function that can be called directly from R. The basic idea is to create a C compatible library using Haskell (as described in the GHC users manual) and then call that library from R (as described in this document). As a simple example, let's write a function that adds up the square roots of a list of numbers.

Create an R-compatible Haskell library

In normal Haskell, we would define the function to add up the square roots of a list as:

sumRoots :: [Double] -> Double
sumRoots xs = sum (map sqrt xs)


However, to make a function that is compatible with R, we have to follow two rules:

  • Every argument must be a Ptr to a C compatible type, typically Int, Double or CString. (To be pedantic, we should probably use CInt or CDouble, but using GHC on Windows these types are equivalent - keeping things simpler.)
  • The result must be IO ()


Obeying these restrictions, we need to use the type:

sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()
sumRootsR n xs result = ...


Instead of passing in the list xs, we now pass in:

  • n, the length of the list xs
  • xs, the elements of the list
  • result, a space to put the result


We can implement sumRootsR by using the functions available in the Foreign module:

sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()
sumRootsR n xs result = do
    n <- peek n
    xs <- peekArray n xs
    poke result $ sumRoots xs


This function first gets the value for n, then for each element in 0..n-1 gets the element out of the pointer array xs and puts it in a nice list. We then call the original sumRoots, and store the value in the space provided by result. As a general rule, you should put all the logic in one function (sumRoots), and the wrapping in another (sumRootsR). We can then export this function with the definition:

foreign export ccall sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()


Putting everything together, we end up with the Haskell file:

-- SumRoots.hs
{-# LANGUAGE ForeignFunctionInterface #-}
module SumRoots where

import Foreign

foreign export ccall sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()

sumRootsR :: Ptr Int -> Ptr Double -> Ptr Double -> IO ()
sumRootsR n xs result = do
    n <- peek n
    xs <- peekArray n xs
    poke result $ sumRoots xs

sumRoots :: [Double] -> Double
sumRoots xs = sum (map sqrt xs)


We also need a C stub file. The one described in the GHC users guide works well:

// StartEnd.c
#include <Rts.h>

void HsStart()
{
    int argc = 1;
    char* argv[] = {"ghcDll", NULL}; // argv must end with NULL

    // Initialize Haskell runtime
    char** args = argv;
    hs_init(&argc, &args);
}

void HsEnd()
{
    hs_exit();
}


We can now compile our library with the commands:

ghc -c SumRoots.hs
ghc -c StartEnd.c
ghc -shared -o SumRoots.dll SumRoots.o StartEnd.o


This creates the library SumRoots.dll.

Calling Haskell from R

At the R command prompt, we can load the library with:

dyn.load("C:/SumRoots.dll") # use the full path to the SumRoots library
.C("HsStart")


We can now invoke our function:

input = c(9,3.5,5.58,64.1,12.54)
.C("sumRootsR", n=as.integer(length(input)), xs=as.double(input), result=as.double(0))$result


This prints out the answer 18.78046.

We can make this function easier to use on the R side by writing a wrapper, for example:

sumRoots <- function(input) {
    return(.C("sumRootsR", n=as.integer(length(input)), xs=as.double(input), result=as.double(0))$result)
}


Now we can write:


sumRoots(c(12,444.34))


And get back the answer 24.54348. With a small amount of glue code, it's easy to call Haskell libraries from R programs.

Update: See the comments below from Alex Davis for how to do such things on newer versions of Mac OS.

Saturday, October 01, 2011

Haskell Implementors Workshop 2011 slides

I've just uploaded the slides for all the talks given at the Haskell Implementors Workshop 2011 - just click on "Slides" next to any of the talks. Many thanks to all the speakers for very interesting talks, and for sending me their slides.

There will be video, hosted on either YouTube or Vimeo, at some point in the future. I've got all the copyright permission forms, but I don't yet have the video footage. I'm currently working to get a copy of the video sent from Japan.

Template Haskell fights with Generic Programming

Summary: The InfixE construction in Template Haskell fits poorly with generic programming, because its type does not capture all its restrictions. This mismatch can result in confusing bugs, but there is a simple workaround.

I have often said that anyone manipulating abstract syntax trees, without using some form of generic programming, is doing it wrong. Recently I have been manipulating Template Haskell syntax trees using Uniplate, my preferred generic programming library. Consider the problem of replacing all instances of delete with deleteBy (==) - this task can be done with Template Haskell:


{-# LANGUAGE TemplateHaskell #-}
module IHateDelete where

import Data.List
import Language.Haskell.TH
import Data.Generics.Uniplate.Data

iHateDelete :: Q [Dec] -> Q [Dec]
iHateDelete = fmap (transformBi f)
    where
        f :: Exp -> Exp
        f (VarE x) | x == 'delete = VarE 'deleteBy `AppE` VarE '(==)
        f x = x


We can test this function with:


{-# LANGUAGE TemplateHaskell #-}
import IHateDelete
import Data.List

$(iHateDelete
    [d|
        mapDelete x = map (delete x)
        myElem x xs = length (delete x xs) /= length xs
    |])


To see the result of running iHateDelete pass the flag -ddump-splices. As far as we can tell, our iHateDelete function works perfectly. But wait - let's try another example:


$(iHateDelete
    [d|
        myDelete x xs = x `delete` xs
    |])


In GHC 6.12, we get a GHC panic. In GHC 7.2 we get the error message:


Operator application with a non-variable operator: deleteBy (==)
(Probably resulting from a Template Haskell splice)


(I would find this message far clearer if it said "Infix application..." rather than "Operator application...")

The body of myDelete starts out as:


InfixE (Just (VarE 'x)) (VarE 'delete) (Just (VarE 'xs))


After our transformation, this becomes:


InfixE (Just (VarE 'x)) (AppE (VarE 'deleteBy) (VarE '(==))) (Just (VarE 'xs))


Or, written in Haskell syntax:


x `deleteBy (==)` xs


This expression is not valid Haskell, and causes an error when spliced back in (when inserted back into the Haskell code).

The Problem

The underlying problem is called out in the Template Haskell Exp documentation:


InfixE (Maybe Exp) Exp (Maybe Exp)
-- ^ It's a bit gruesome to use an Exp as the operator, but how else can we distinguish constructors from non-constructors?
-- Maybe there should be a var-or-con type? Or maybe we should leave it to the String itself?


The operator in InfixE has a type which permits any expression, but has the restriction that when spliced back in the expression must only be a VarE or ConE. This restriction poses a problem for generic programming, where values are treated in a uniform manner. Sadly, both of the suggested fixes would also cause problems for generic programming. Perhaps the true fix is to let Haskell have arbitrary expressions for infix operators? Or perhaps Template Haskell should translate InfixE to AppE if the operator is incompatible with Haskell?

The Workaround

As a workaround, you can translate away all InfixE expressions that have a complex middle expression. I use the following function:


fixupInfix :: [Dec] -> [Dec]
fixupInfix = transformBi f
    where
        bad VarE{} = False
        bad ConE{} = False
        bad _ = True

        f (InfixE a b c) | bad b = case (a,c) of
            (Nothing, Nothing) -> b
            (Just a , Nothing) -> b `AppE` a
            (Nothing, Just c ) -> VarE 'flip `AppE` b `AppE` c
            (Just a , Just c ) -> b `AppE` a `AppE` c
        f x = x


The original iHateDelete can then be modified to become:


iHateDelete = fmap (fixupInfix . transformBi f)

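For the troublesome myDelete example, the workaround rewrites the infix application into ordinary prefix applications. Here is my worked trace of the (Just a, Just c) case:

-- before fixupInfix:
InfixE (Just (VarE 'x)) (AppE (VarE 'deleteBy) (VarE '(==))) (Just (VarE 'xs))
-- after fixupInfix:
AppE (AppE (AppE (VarE 'deleteBy) (VarE '(==))) (VarE 'x)) (VarE 'xs)
-- which corresponds to the valid Haskell:  deleteBy (==) x xs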


Tuesday, September 06, 2011

Faster WPF ComboBoxes

If you create a WPF ComboBox with 2000 items it will take about 2 seconds to drop down, and about 1 second to retract back. But you can make all operations complete instantly if the items are simple (such as strings). If you are writing XAML, add:


<ComboBox>
    <ComboBox.ItemsPanel>
        <ItemsPanelTemplate>
            <VirtualizingStackPanel />
        </ItemsPanelTemplate>
    </ComboBox.ItemsPanel>

    ...
</ComboBox>


If you are writing C#, add:


comboBox.ItemsPanel = new ItemsPanelTemplate(new FrameworkElementFactory(typeof(VirtualizingStackPanel)));


Thanks to Henry Hahn for the XAML code I started from. Reading the C# version, I am reminded of the Hammer Factory joke.

Sunday, September 04, 2011

Sharing in Haskell

Summary: The let and lambda constructs give a precise way to control sharing, but their exact use can be tricky. This post gives some worked examples and general guidance.

An easy way to improve performance is to call something fewer times, which requires understanding how many times something gets called. One topic I find myself regularly explaining is how lambda expressions under let expressions affect sharing. Consider the two following examples:

Example 1


f x y = sqrt x + y
result = f 1 2 + f 1 4


Example 2


f x = let sqrt_x = sqrt x in \y -> sqrt_x + y
result = let f_1 = f 1 in f_1 2 + f_1 4


Question

In each example, how many times is sqrt executed to compute result? (Assume no advanced optimisations - these often break down on larger examples.)

Answer

In Example 1 we execute sqrt twice, while in Example 2 we execute sqrt once. To go from Example 1 to Example 2 we need to make two changes:


  • Step 1: Rewrite f to compute sqrt after one argument instead of two.

  • Step 2: Rewrite result to share the result of f with one argument.



Performing either rewrite alone will still result in sqrt being executed twice.

Step 1: Rewriting f

Let's take a look at the original definition of f:


f x y = sqrt x + y


Rewriting this function in English, we can describe it as:


given x and y, compute sqrt x + y


But the computation of sqrt x does not depend on y. If the computation of sqrt x is expensive, and if we know the function will often be called with the same x for many different values of y, it is better to describe it as:


given x, compute sqrt x, then given y, add that value to y


The Haskell syntax for this description is:


f = \x -> let sqrt_x = sqrt x in \y -> sqrt_x + y


Which would usually be written in the equivalent declaration form as:


f x = let sqrt_x = sqrt x in \y -> sqrt_x + y


Step 2: Using the rewritten f

If we look at the definition of result:


result = f 1 2 + f 1 4


We see that the subexpression f 1 occurs twice. We can perform common subexpression elimination (CSE) and write:


result = let f_1 = f 1 in f_1 2 + f_1 4


With the original definition of f, commoning up f 1 would have had no performance benefit - after f was applied to 1 argument it did nothing but wait for the second argument. However, with the revised definition of f, the value f_1 will create the computation of sqrt 1, which will be performed only once when executed by f_1 2 and f_1 4.
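
One way to see the difference for yourself (my illustration, not from the original text) is to wrap the sqrt in Debug.Trace.trace and count how many times the message appears - printing result emits "sqrt evaluated" once, while with the Example 1 definition of f it is emitted twice:

import Debug.Trace

f :: Double -> Double -> Double
f x = let sqrt_x = trace "sqrt evaluated" (sqrt x) in \y -> sqrt_x + y

result :: Double
result = let f_1 = f 1 in f_1 2 + f_1 4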

The Optimisation

This optimisation technique can be described as:


  • Step 1: Rewrite the function to perform some computation before all arguments are supplied.

  • Step 2: Share the partially applied function.



Crucially the function in Step 1 must take its arguments in an order that allows computation to be performed incrementally.
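
As a small extra example of my own: the same sharing shows up when the partially applied function is used in a map, since the expression f 1 is evaluated only once and its result reused for every element:

sum (map (f 1) [1..10])   -- with the Example 2 definition of f, sqrt 1 runs once, not ten times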

A Practical Example

In previous versions of Hoogle, the function I wrote to resolve type synonyms (e.g. type String = [Char]) was:


resolveSynonyms :: [Synonym] -> TypeSig -> TypeSig


Given a list of type synonyms, and a type signature, return the type signature with all synonyms expanded out. However, searching through a list of type synonyms is expensive - it is more efficient to compute a table allowing fast lookup by synonym name. Therefore, I used the optimisation technique above to write:


resolveSynonyms synonyms = let info = buildSynonymTable synonyms in \x -> transformSynonyms info x


This technique worked well, especially given that the list of synonyms was usually constant. However, from simply looking at the type signatures, someone else is unlikely to guess that resolveSynonyms should be partially applied where possible. An alternative is to make the sharing more explicit in the types, and provide:


data SynonymTable
buildSynonymTable :: [Synonym] -> SynonymTable
resolveSynonyms :: SynonymTable -> TypeSig -> TypeSig


The disadvantage is the increase in the size of the API - we have gone from one function to two functions and a data type. Something that used to take one function call now takes two.

Conclusion

I think all Haskell programmers benefit from understanding how the interaction of lambda and let affects sharing. Pushing lambda under let is often a useful optimisation technique, particularly when the resulting function is used in a map. However, I wouldn't usually recommend exporting public APIs that rely on partial application to get acceptable performance - it's too hard to discover.

Friday, August 19, 2011

IFL 2011 submission

Summary: Submit to IFL 2011. It's a great breeding ground for ideas.

The IFL 2011 submission deadline has been extended until 31st August, and, in a move I thoroughly agree with, the notification of acceptance is now within 24 hours of submission!

IFL (and TFP) do not work in the same way as ICFP/Haskell Symposium. You submit a paper, describing the work you've done (something functional programming related) and then present the paper. After the conference, you resubmit the paper, and the result is reviewed in greater depth. There is a great opportunity to learn from the experience, and produce a second paper that is far better.

I submitted my original supercompilation work to IFL 2007. You can compare the paper I submitted before the conference with the paper I submitted afterwards. Note that the original paper doesn't even mention the word supercompilation! At the conference I talked to many people, learnt a lot about supercompilation, program optimisation and termination orderings (and lots of other interesting topics). Using this experience, and building on the enthusiasm people had expressed, I was able to produce a much better second paper. My work on supercompilation went into my thesis, and lead to a subsequent ICFP paper.

I think that IFL/TFP are great venues to present your work, and in presenting and discussing it, to understand more about the work - making it better in the process. You have just under 2 weeks to make the IFL 2011 deadline of 31st August.

Sunday, July 17, 2011

Submit to the Haskell Implementors Workshop

Are you going to ICFP 2011 in Japan? Do you work with Haskell? You should consider offering a talk for the Haskell Implementors Workshop! You have until this coming Friday (22nd July 2011) to offer a talk, by emailing an abstract (less than 200 words) to benl -AT- cse.unsw.edu.au - for full details see the Call for Talks. Talks are 20 minutes long, plus 10 minutes for questions.

What topics are suitable?

If you have been doing or thinking stuff that would interest a bunch of Haskell implementors in a pub, you probably have something suitable for a talk. The aim of the workshop is to provide an "opportunity to bat around ideas, share experiences, and ask for feedback from fellow experts". Unlike ICFP or the Haskell Symposium, there are no published proceedings. If you have some work that isn't quite ready for a more formal setting, this workshop is a useful venue to seek feedback. If you have an idea which seems a bit crazy, it's probably also very interesting!

What if I have less to say?

In addition to the 30 minute talks, the Haskell Implementors Workshop will also have 2-10 minute lightning talks, organised on the day. Attend the workshop, and sign up on the day to give a brief talk.

Saturday, June 18, 2011

Changing the Windows titlebar gradient colors

Summary: This post describes how to change the gradient colors of the titlebar of a single window on Windows XP, using C#.

On Windows XP the titlebars of most windows have a gradient coloring:



For various reasons, I wanted to override the titlebar gradient colors on a single window. The SetSysColors function can change the colors, but that applies to all windows and immediately forces all windows to repaint, which is slow and very flickery. You can paint the titlebar yourself (like Chrome does), but then you need to draw and handle min/max buttons etc. After some experimentation, using the unofficial SetSysColorsTemp function, I was able to change the gradient for a single window. As an example, here are some screenshots, overriding the left color with green, then overriding both the colors with green/blue and orange/blue:



Disclaimer: This code uses an undocumented function, and while it seems to work robustly on my copy of Windows XP, it is unlikely to work on future versions of Windows. Generally, it is a bad idea to override user preferences - for example the titlebar text may not be visible with overridden gradient colors.

All the necessary code is available at the bottom of this post, and can be used by anyone, for any purpose. The crucial function is SetSysColorsTemp, which takes an array of colors and an array of brushes created from those colors. If you call SetSysColorsTemp passing a new array of colors/brushes they will override the global system colors until a reboot, without forcing a repaint. Passing null for both arrays will restore the colors to how they were before. To make the color change only for the current window we hook into WndProc, and detect when the colors are about to be used in a paint operation. We set the system colors before, call base.WndProc which paints the titlebar (using our modified system colors), then put the colors back.

There are two known issues:

1) If you change the colors after the titlebar has drawn, it will not use the colors until the titlebar next repaints. I suspect this problem is solvable.
2) As you can see in the demo screenshots, the area near the min/max buttons does not usually get recolored. If you only change the left gradient color this doesn't matter.



public class GradientForm : Form
{
    // Win32 API calls, often based on those from pinvoke.net
    [DllImport("gdi32.dll")] static extern bool DeleteObject(int hObject);
    [DllImport("user32.dll")] static extern int SetSysColorsTemp(int[] lpaElements, int[] lpaRgbValues, int cElements);
    [DllImport("gdi32.dll")] static extern int CreateSolidBrush(int crColor);
    [DllImport("user32.dll")] static extern int GetSysColorBrush(int nIndex);
    [DllImport("user32.dll")] static extern int GetSysColor(int nIndex);
    [DllImport("user32.dll")] private static extern IntPtr GetForegroundWindow();

    // Magic constants
    const int SlotLeft = 2;
    const int SlotRight = 27;
    const int SlotCount = 28; // Math.Max(SlotLeft, SlotRight) + 1;

    // The colors/brushes to use
    int[] Colors = new int[SlotCount];
    int[] Brushes = new int[SlotCount];

    // The colors the user wants to use
    Color titleBarLeft, titleBarRight;
    public Color TitleBarLeft { get { return titleBarLeft; } set { titleBarLeft = value; UpdateBrush(SlotLeft, value); } }
    public Color TitleBarRight { get { return titleBarRight; } set { titleBarRight = value; UpdateBrush(SlotRight, value); } }

    void CreateBrushes()
    {
        for (int i = 0; i < SlotCount; i++)
        {
            Colors[i] = GetSysColor(i);
            Brushes[i] = GetSysColorBrush(i);
        }
        titleBarLeft = ColorTranslator.FromWin32(Colors[SlotLeft]);
        titleBarRight = ColorTranslator.FromWin32(Colors[SlotRight]);
    }

    void UpdateBrush(int Slot, Color c)
    {
        DeleteObject(Brushes[Slot]);
        Colors[Slot] = ColorTranslator.ToWin32(c);
        Brushes[Slot] = CreateSolidBrush(Colors[Slot]);
    }

    void DestroyBrushes()
    {
        DeleteObject(Brushes[SlotLeft]);
        DeleteObject(Brushes[SlotRight]);
    }

    // Hook up to the Window

    public GradientForm()
    {
        CreateBrushes();
    }

    protected override void Dispose(bool disposing)
    {
        if (disposing) DestroyBrushes();
        base.Dispose(disposing);
    }

    protected override void WndProc(ref System.Windows.Forms.Message m)
    {
        const int WM_NCPAINT = 0x85;
        const int WM_NCACTIVATE = 0x86;

        if ((m.Msg == WM_NCACTIVATE && m.WParam.ToInt32() != 0) ||
            (m.Msg == WM_NCPAINT && GetForegroundWindow() == this.Handle))
        {
            int k = SetSysColorsTemp(Colors, Brushes, Colors.Length);
            base.WndProc(ref m);
            SetSysColorsTemp(null, null, k);
        }
        else
            base.WndProc(ref m);
    }
}

Tuesday, June 07, 2011

INLINE pragmas in the Safe library

Summary: The Safe library has lots of small functions, none of which have INLINE pragmas on them. The lack of INLINE pragmas probably has no effect on the optimisation.

I was recently asked a question about the Safe library:


I was wondering why you didn't use any INLINE pragmas in the Safe library. I'm not a Haskell expert yet, but I've noticed that in many other libraries most one-liners are annotated by INLINE pragmas, so is there any reason you didn't add them to Safe as well?


When compiling with optimisations, GHC tries to inline functions/values that are either "small enough" or have an INLINE pragma. These functions will be inlined within their module, and will also be placed in their module's interface file, for inlining into other modules. The advantage of inlining is avoiding the function call overhead, and possibly exposing further optimisations. The disadvantage is that the code size may grow, which may result in poor utilisation of the processor's instruction cache.
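
For reference, an INLINE pragma is just an annotation placed alongside a definition; the function below is an invented example, not something from Safe:

square :: Int -> Int
square x = x * x
{-# INLINE square #-}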

The Safe library contains wrappers for lots of partial Prelude and Data.List functions (e.g. head), with versions that don't fail, or that fail with more debugging information. Using the tail function as an example, the library supplies four additional variants (their behaviour on the empty list is sketched just after the list):


  • tail :: [a] -> [a], crashes on tail []

  • tailNote :: String -> [a] -> [a], takes an extra argument which supplements the error message

  • tailDef :: [a] -> [a] -> [a], takes a default to return instead of errors

  • tailMay :: [a] -> Maybe [a], wraps the result in a Maybe

  • tailSafe :: [a] -> [a], returns some sensible default if possible, [] in the case of tail
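
For concreteness, here is roughly how the variants behave on the empty list (a GHCi-style sketch - the exact error wording is only illustrative):


> tail ([] :: [Int])
*** Exception: Prelude.tail: empty list
> tailNote "extra context" ([] :: [Int])
*** Exception: ... (the same error, with "extra context" appended)
> tailDef [0] ([] :: [Int])
[0]
> tailMay ([] :: [Int])
Nothing
> tailSafe ([] :: [Int])
[]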



As the questioner correctly noted, there are no INLINE pragmas in the library. Taking the example of tailSafe, it is defined as:


tailSafe :: [a] -> [a]
tailSafe = liftSafe tail null

liftSafe :: (a -> a) -> (a -> Bool) -> (a -> a)
liftSafe func test val = if test val then val else func val


Given this indirection, and the lack of an INLINE pragma, is calling tailSafe as cheap as you might hope? We can answer this question by looking at the interface file at the standard optimisation level, by running ghc -ddump-hi Safe.hs -O -c. Looking at the definition attached to tailSafe, we see:


tailSafe = \val -> case val of
    [] -> []
    ds1:ds2 -> ds2


The tailSafe function has been optimised as much as it can be, and will be inlined into other modules. Adding an INLINE pragma would have no effect at standard optimisation. When compiling without optimisation, even with an INLINE pragma, the function is not included in the module's interface. Adding the INLINE pragma to tailSafe is of no benefit.

Let us now look at all functions in the Safe module. When compiled without optimisation (-O0) no functions are placed in the interface file. At all higher levels of optimisation (-O1 and -O2) the interface files are identical. Looking through the functions, only 6 functions are not included in the interface files:

abort

The abort function is defined as:


abort = error


Neither adding an INLINE pragma nor eta expanding (adding an explicit argument) causes the function to be included in the interface file. I suspect that GHC decides no further optimisation is possible, so doesn't ever inline abort.
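
To make the eta expansion concrete, here is a sketch of the two forms (abort takes the same type as error, String -> a):


abort :: String -> a
abort = error

abortEta :: String -> a
abortEta x = error x


abortEta is the eta-expanded form; as noted above, neither form ends up in the interface file.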

atMay and $watMay

The function atMay is recursive, and thus cannot be inlined even when its definition is available, so GHC sensibly omits it from the interface file. The function $watMay is the worker generated by GHC's worker/wrapper transformation of atMay, and is also recursive.

readNote and readMay

The functions readNote and readMay both follow exactly the same pattern, so I describe only readNote. The function readNote is quite big, and GHC has split it into two top-level functions - producing both readNote and readNote1. The function readNote is included in the interface file, but readNote1 is not. The result is that GHC will be able to partially inline the original definition of readNote - however, the read functions tend to be rather complex, so GHC is unlikely to benefit from the additional inlining it has managed to achieve.

atNote

The atNote function is bigger than the inlining threshold, so does not appear in the interface file unless an INLINE pragma is given. Since the inner loop of atNote is recursive, it is unlikely that inlining would give any noticeable benefit, so GHC's default behaviour is appropriate.

Conclusion

Having analysed the interface files, I do not think it is worth adding INLINE pragmas to the Safe library - GHC does an excellent job without my intervention. My advice is to only add INLINE pragmas after either profiling, or analysing the resulting Core program.

Sunday, May 22, 2011

Hoogle talk from TFP 2011 [PDF]

Last week I went to TFP 2011, and gave a talk on Hoogle entitled "Finding Functions from Types". The slides are now available online. These slides give some information about how the type searching works in Hoogle, and I intend to write further details in the future.

Sunday, May 08, 2011

CmdArgs is not dangerous

Summary: CmdArgs can be used purely, and even if you choose to use it in an impure manner, you don't need to worry.

As a result of my blog post yesterday, a few people have said they have been put off using CmdArgs. There are three reasons why you shouldn't be put off, even though CmdArgs has some impure code within it.

1: You can use CmdArgs entirely purely

You do not need to use the impure code at all. The module System.Console.CmdArgs.Implicit provides two ways to write annotated records. The first way is impure, and uses cmdArgs and &=. The second way is pure, and uses cmdArgs_ and +=. Both ways have exactly the same power. For example, you can write either of:


sample = cmdArgs $
    Sample{hello = def &= help "World argument" &= opt "world"}
    &= summary "Sample v1"

sample = cmdArgs_ $
    record Sample{} [hello := def += help "World argument" += opt "world"]
    += summary "Sample v1"


The first definition is impure. The second is pure. Both are equivalent. I prefer the syntax of the first version, but the second version is not much longer or uglier. If the impurities scare you, just switch. The Implicit module documentation describes four simple rules for converting between the two methods.
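
For completeness, both snippets assume an annotated record along the lines of the standard CmdArgs example - a minimal sketch only, as the real definition lives in the previous post:


{-# LANGUAGE DeriveDataTypeable #-}
import Data.Data

data Sample = Sample {hello :: String}
    deriving (Show, Data, Typeable)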

2: You do not need to use annotated records at all

Notice that the above module is called Implicit; CmdArgs also features an Explicit module, in which you construct a value describing your command line parser directly (and which is entirely pure). For example:


arguments :: Mode [(String,String)]
arguments = mode "explicit" [] "Explicit sample program" (flagArg (upd "file") "FILE")
    [flagOpt "world" ["hello","h"] (upd "world") "WHO" "World argument"
    ,flagReq ["greeting","g"] (upd "greeting") "MSG" "Greeting to give"
    ,flagHelpSimple (("help",""):)]
    where upd msg x v = Right $ (msg,x):v


Here you construct modes and flags explicitly, and pass functions which describe how to update the command line state. Everything in the Implicit module maps down to the Explicit module. In addition, you can use the GetOpt compatibility layer, which also maps down to the Explicit parser.
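
To actually run such a parser, here is a minimal sketch using the processArgs entry point from System.Console.CmdArgs.Explicit (help and error formatting omitted):


import System.Console.CmdArgs.Explicit

main :: IO ()
main = do
    -- parse the program's command line according to the mode defined above
    xs <- processArgs arguments
    print xs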

Having written command line parsers with annotated records, I'd never go back to writing them out in full. However, if you want to map your own command line parser description down to the Explicit version you can get help messages and parsing for free.

3: I fight the optimiser so you don't have to

Even if you choose to use the impure implicit version of CmdArgs, you don't need to take any special precautions, even at high optimisation levels. I have an extensive test suite, and will continue to ensure CmdArgs programs work properly - I rely on it for several of my programs. While I find the impure implicit version nicer to work with, I am still working on making a pure version with the same syntax, developing alternative methods of describing the annotations, developing quasi-quoters etc.