The Mercury team have written quite a few papers about the Mercury programming language, its implementation techniques, design, theoretical basis and other topics. In addition, we have written several papers on related topics. Almost all are available here as PostScript files, compressed using gzip, or as PDF.
Below the papers, notes from a number of presentations on Mercury, given at a variety of levels, are also available.
Multicore computing is ubiquitous, so programmers need to write parallel programs to take advantage of the full power of modern computer systems. However, the most popular parallel programming methods are difficult and extremely error-prone. Most such errors are intermittent, which means they may go unnoticed until after a product has been shipped; they are also often very difficult to fix. This problem has been addressed by pure declarative languages that support explicit parallelism. However, this does nothing about another problem: it is often difficult for developers to find tasks that are worth parallelising. Even when such tasks can be found, it is all too easy to create too much parallelism, such that the overheads of parallel execution overwhelm the benefits gained from it. Also, when parallel tasks depend on other parallel tasks, the dependencies may restrict the amount of parallelism available, which makes it even harder for programmers to estimate the benefit of parallel execution.
In this dissertation we describe our profile feedback directed automatic parallelisation system, which aims to solve this problem. We implemented this system for Mercury, a pure declarative logic programming language. We use information gathered from a profile of a sequential execution of a program to inform the compiler about how that program can be parallelised. Ours is, as far as we know, the first automatic parallelisation system that can estimate the parallelism available among any number of parallel tasks with any number of (non-cyclic) dependencies. This novel estimation algorithm is supplemented by an efficient exploration of the program's call graph, an analysis that calculates the cost of recursive calls (as this is not provided by the profiler), and an efficient search for the best parallelisation of N computations from among the 2^(N-1) candidates.
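The size of that search space comes from the fact that each of the N-1 boundaries between adjacent conjuncts can independently be made sequential or parallel. A minimal Python sketch of the enumeration (illustrative only; the real system searches Mercury conjunctions, and does so efficiently rather than exhaustively):

```python
from itertools import product

def candidate_parallelisations(conjuncts):
    """Enumerate every way of placing a sequential or parallel
    operator at each of the N-1 boundaries between conjuncts."""
    n = len(conjuncts)
    candidates = []
    for ops in product(['seq', 'par'], repeat=n - 1):
        # Group maximal runs joined by 'par' into parallel conjunctions.
        groups = [[conjuncts[0]]]
        for op, c in zip(ops, conjuncts[1:]):
            if op == 'par':
                groups[-1].append(c)
            else:
                groups.append([c])
        candidates.append(groups)
    return candidates

cands = candidate_parallelisations(['g1', 'g2', 'g3'])
print(len(cands))  # 2^(3-1) = 4 candidates
```

For even moderate N this space is too large to evaluate naively, which is why the search itself has to be efficient.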
We found that in some cases where our system parallelised a loop, spawning off virtually all of its iterations, the resulting programs exhibited excessive memory usage and poor performance. We therefore designed and implemented a novel program transformation that fixes this problem. Our transformation allows programs to gain large improvements in performance and in several cases, almost perfect linear speedups. The transformation also allows recursive calls within the parallelised code to take advantage of tail recursion.
Also presented in this dissertation are many changes that improve the performance of Mercury’s parallel runtime system, as well as a proposal and partial implementation of a visualisation tool that assists developers with parallelising their programs, and helps researchers develop automatic parallelisation tools and improve the performance of the runtime system.
Overall, we have attacked and solved a number of issues that are critical to making automatic parallelism a realistic option for developers.
@PhdThesis{bone12:autoparallelism_thesis,
author = {Paul Bone},
title = {Automatic Parallelisation for Mercury},
school = {Department of Computing and Information Systems,
The University of Melbourne},
year = {2012},
address = {Australia},
month = {December}
}
Recently we built a system that uses profiling data to automatically parallelize Mercury programs by finding conjunctions with expensive conjuncts that can run in parallel with minimal synchronization delays. This worked very well in many cases, but in cases of tail recursion, we got much lower speedups than we expected, due to excessive memory usage. In this paper, we present a novel program transformation that eliminates this problem, and also allows recursive calls inside parallel conjunctions to take advantage of tail recursion optimization. Our benchmark results show that our new transformation greatly increases the speedups we can get from parallel Mercury programs; in one case, it changes no speedup into almost perfect speedup on four cores.
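The effect of the transformation can be approximated with a counting semaphore that caps the number of loop iterations in flight, so memory usage stays bounded no matter how many iterations the loop has. This Python sketch is only an analogy: the names and threading machinery are illustrative, and the real transformation works on Mercury's recursive parallel conjunctions, not on threads.

```python
import threading

def process(item):
    return item * item  # stand-in for one loop iteration's work

def parallel_map_bounded(items, max_contexts=4):
    """Spawn loop iterations in parallel, but never have more than
    max_contexts of them in flight at once (a stand-in for bounding
    the number of live execution contexts)."""
    slots = threading.Semaphore(max_contexts)  # caps live iterations
    results = [None] * len(items)
    threads = []

    def run(i, item):
        try:
            results[i] = process(item)
        finally:
            slots.release()

    for i, item in enumerate(items):
        slots.acquire()  # block here instead of spawning unboundedly
        t = threading.Thread(target=run, args=(i, item))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return results

print(parallel_map_bounded([1, 2, 3, 4, 5]))  # [1, 4, 9, 16, 25]
```

Without the cap, a loop that spawns off virtually all of its iterations keeps every iteration's state alive simultaneously, which is the excessive memory usage the paper addresses.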
@InProceedings{bone12:_contr_loops_paral_mercur_code,
author = {Paul Bone and Zoltan Somogyi and Peter Schachte},
title = {Controlling Loops in Parallel Mercury Code},
booktitle = {Declarative Aspects and Applications of Multicore Programming},
year = 2012,
address = {Philadelphia, PA, USA},
month = {January}
}
The behavior of parallel programs is even harder to understand than the behavior of sequential programs. Parallel programs may suffer from any of the performance problems affecting sequential programs, as well as from several problems unique to parallel systems. Many of these problems are quite hard (or even practically impossible) to diagnose without help from specialized tools. We present a proposal for a tool for profiling the parallel execution of Mercury programs, a proposal whose implementation we have already started. This tool is an adaptation and extension of the ThreadScope profiler that was first built to help programmers visualize the execution of parallel Haskell programs.
@inproceedings{bone-somogyi_2011_threadscope,
author = {Paul Bone and Zoltan Somogyi},
title = {Profiling parallel {M}ercury programs with {T}hread{S}cope},
booktitle = {21st Workshop on Logic-based methods in Programming Environments (WLPE 2011)},
year = {2011},
month = {July},
editor = {Peter Schneider-Kamp},
pages = {32--46},
}
Researchers working on the automatic parallelization of programs have long known that too much parallelism can be even worse for performance than too little, because spawning a task to be run on another CPU incurs overheads. Autoparallelizing compilers have therefore long tried to use granularity analysis to ensure that they only spawn off computations whose cost will probably exceed the spawn-off cost by a comfortable margin. However, this is not enough to yield good results, because data dependencies may also limit the usefulness of running computations in parallel. If one computation blocks almost immediately and can resume only after another has completed its work, then the cost of parallelization again exceeds the benefit.
We present a set of algorithms for recognizing places in a program where it is worthwhile to execute two or more computations in parallel that pay attention to the second of these issues as well as the first. Our system uses profiling information to compute the times at which a procedure call consumes the values of its input arguments and the times at which it produces the values of its output arguments. Given two calls that may be executed in parallel, our system uses the times of production and consumption of the variables they share to determine how much their executions would overlap if they were run in parallel, and therefore whether executing them in parallel is a good idea or not.
We have implemented this technique for Mercury in the form of a tool that uses profiling data to generate recommendations about what to parallelize, for the Mercury compiler to apply on the next compilation of the program. We present preliminary results that show that this technique can yield useful parallelization speedups, while requiring nothing more from the programmer than representative input data for the profiling run.
@article{bone-somogyi-schachte_2011_overlap,
author = {Paul Bone and Zoltan Somogyi and Peter Schachte},
title = {Estimating the overlap between dependent computations
for automatic parallelization},
journal = {Theory and Practice of Logic Programming, 27th Int’l.
Conference on Logic Programming (ICLP’11) Special Issue},
year = {2011},
month = {July},
publisher = {Cambridge University Press},
volume = {11},
number = {4--5},
pages = {575--587}
}
Parallel implementations of programming languages need to control synchronization overheads. Synchronization is essential for ensuring the correctness of parallel code, yet it adds overheads that aren't present in sequential programs. This is an important problem for parallel logic programming systems, because almost every action in such programs requires accessing variables, and the traditional approach of adding synchronization code to all such accesses is so prohibitively expensive that a parallel version of the program may run more slowly on four processors than a sequential version would run on one processor. We present a program transformation for implementing dependent AND-parallelism in logic programming languages that uses mode information to add synchronization code only to the variable accesses that actually need it.
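The effect of the transformation can be approximated in Python with a one-shot future per shared variable: only the variable that crosses the parallel conjunction's boundary pays for synchronisation. This is an illustrative analogy, not the paper's actual transformation, which works at the level of Mercury's mode system:

```python
import threading

class Future:
    """One-shot synchronised variable: the producer signals it, the
    consumer waits for it. Only variables shared between parallel
    conjuncts need one; every other variable access stays as cheap
    as in sequential code."""
    def __init__(self):
        self._event = threading.Event()
        self._value = None
    def signal(self, value):
        self._value = value
        self._event.set()
    def wait(self):
        self._event.wait()
        return self._value

# Imagine p(X::out) and q(X::in, Y::out) run as a dependent parallel
# conjunction: mode information says only X crosses the boundary.
def demo():
    fx = Future()
    def p():
        local = 20 + 1        # unshared: no synchronisation added
        fx.signal(local * 2)  # shared X: signal on production
    def q(results):
        x = fx.wait()         # shared X: wait on consumption
        results.append(x + 1)
    results = []
    tp = threading.Thread(target=p)
    tq = threading.Thread(target=q, args=(results,))
    tq.start(); tp.start()
    tp.join(); tq.join()
    return results[0]

print(demo())  # 43
```

The point of the paper is precisely that the signal/wait pair is inserted only where mode analysis proves it is needed, rather than on every variable access.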
@inproceedings{wang-somogyi_2011_dep-par-conj,
author = {Peter Wang and Zoltan Somogyi},
title = {Minimizing the overheads of dependent {AND}-parallelism},
booktitle = {Technical Communications of the 27th Int’l. Conference on
Logic Programming (ICLP’11)},
pages = {128--138},
volume = {11},
year = {2011},
month = {July},
publisher = {Schloss Dagstuhl--Leibniz-Zentrum fuer Informatik},
series = {Leibniz International Proceedings in Informatics (LIPIcs)},
}
Parallel programming is becoming more and more important as commonly available machines begin to grow in their capability for parallel execution. Parallel streams of execution that operate on shared data to achieve a common task need to be carefully synchronised. The standard way of achieving this synchronisation is to use locks, but lock-based programming is difficult and error-prone, making parallel programs costly and unreliable.
Another method of providing synchronised access to shared data is the transactional memory model. Software Transactional Memory (STM), implementing this model purely in software and thus compatible with today's hardware, has been an area of much research in the last decade.
This report describes the implementation of a sophisticated STM system that combines recently developed techniques with novel contributions of our own. The implementation is for the logic programming language Mercury, and builds on an earlier, simpler STM implementation.
Region-based memory management (RBMM) is a form of compile-time memory management, well-known from the functional programming world. In this thesis we describe our work on investigating and developing RBMM for the logic programming language Mercury. Mercury is designed with strong type, mode, and determinism systems. These systems not only provide Mercury programmers with several direct software engineering benefits, such as self-documenting code and clear program logic, but also give the language developers useful information for optimizations.
The first challenge in realizing RBMM for a programming language is to divide program data into regions such that the regions can be reclaimed as soon as possible. In this thesis we have developed program analyses that determine the distribution of data over regions as well as the regions' lifetimes in a Mercury program. The program is then transformed into a region-annotated program containing the necessary region constructs. We provide correctness proofs of the related program analyses and transformations, which guarantee the safety of memory accesses in annotated programs.
We have implemented runtime support that tackles the special challenge posed by backtracking. Backtracking can require regions removed during forward execution to be ``resurrected'', and any memory allocated during a computation that has been backtracked over must be recovered promptly, without waiting for the regions involved to come to the end of their life.
We study the effects of RBMM for a selection of benchmark programs, including well-known difficult cases for RBMM. Our RBMM-enabled Mercury system achieves clearly faster runtimes on two thirds of the benchmarks compared to the Mercury system using the Boehm runtime garbage collector, with an average speedup of 20%. In terms of memory usage, the region system achieves optimal memory consumption for some programs. Our in-depth case study reveals the impact of sharing on memory reuse in region-based systems.
Finally, we propose a region reuse extension to our current region framework. Our RBMM system augmented with region reuse can automatically improve memory reuse for a popular class of programs that would otherwise often require non-intuitive manual rewriting before an RBMM system could achieve good memory reuse.
Dividing the heap memory of programs into regions is the starting point of region-based memory management. In our existing work on enabling region-based memory management for Mercury, a program analysis was used to distribute data over the regions. An important goal of the analysis is to decide which program variables should end up in the same region. For a popular class of programs, it greedily puts program variables into the same region, while more memory could have been reused had they been kept in separate ones. In this paper we define a new, refined region analysis that strives to keep program variables in separate regions by taking into account the different execution paths of a procedure. With this more precise, path-sensitive analysis we can reduce the memory footprint of several programs.
Applying region-based memory management (RBMM) to logic programming languages poses a special challenge: backtracking can require regions removed during forward execution to be ``resurrected'', and any memory allocated during a computation that has been backtracked over must be recovered promptly, without waiting for the regions involved to come to the end of their life. In this paper, we describe how we implemented runtime support for RBMM in the logic programming language Mercury, whose specialized implementation of the language constructs involved in backtracking required equally specialized support. Our benchmark Mercury programs run about 25% faster on average with RBMM than with the usual Boehm garbage collector, and for some programs, RBMM achieves optimal memory consumption.
Region-based memory management is a form of compile-time memory management, well-known from the functional programming world. This paper describes a static region analysis for the logic programming language Mercury. We use region points-to graphs to model the partitioning of the memory used by a program into separate regions. The algorithm starts with a region points-to analysis that determines the different regions in the program. We then compute the liveness of the regions by using an extended live variable analysis. Finally, a program transformation adds region annotations to the program for region support. These annotations generate data for a region simulator that generates reports on the memory behaviour of region-annotated programs. Our approach obtains good memory consumption for several benchmark programs; for some of them it achieves optimal memory management.
The rate at which computers are becoming faster at sequential execution has dropped significantly. Instead, parallel processing power is increasing, and multicore computers are becoming more common, which makes automatically parallelising programs much more desirable. Parallelising programs written in imperative programming languages is difficult and often leads to unreliable software. Parallelising programs in declarative languages is easy, to the extent that compilers are able to do it automatically. However, this often leads to cases where the overheads of parallel execution outweigh the speedup that might have been gained by parallelising the program.
This thesis describes a new implicit parallelism implementation that calculates the speedup due to parallelism in dependent conjunctions for Mercury --- a purely declarative logic programming language. This is done by analysing profiling data and a representation of the program in order to determine when during the execution of a parallel conjunct variables are likely to be produced and consumed. This enables us to calculate how long a conjunct may have to wait for a variable to be produced, and how much parallelism is actually available in a parallel conjunction. This information should enable the compiler to parallelise a program only in places where doing so is profitable.
We expect that two of the components we implemented for our implicit parallelism analysis, coverage profiling and a generic feedback framework, will also be quite useful in other applications. For example, coverage profiling can help the compiler select the best calls to inline.
Packrat parsing is a newly popular technique for efficiently implementing recursive descent parsers. Packrat parsing avoids the potential exponential costs of recursive descent parsing with backtracking by ensuring that each production rule in the grammar is tested at most once against each position in the input stream. This paper argues that (a) packrat parsers can be trivially implemented using a combination of definite clause grammar rules and memoing, and that (b) packrat parsing may actually be significantly less efficient than plain recursive descent with backtracking, but (c) memoing the recognisers of just one or two nonterminals, selected in accordance with Amdahl's law, can sometimes yield speedups.
We present experimental evidence to support these claims.
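The combination of recursive descent and memoing can be sketched in a few lines of Python. The grammar and names here are invented for illustration; memoising on the input position is what gives the packrat guarantee that each rule is tried at most once per position:

```python
from functools import lru_cache

# Tiny packrat-style recogniser: a memoised recursive descent parser
# for the hypothetical grammar  S ::= 'a' S 'a' | 'a'  (which accepts
# odd-length strings of 'a's). Without the memo table, backtracking
# can re-parse the same suffix exponentially many times.

def make_parser(text):
    @lru_cache(maxsize=None)
    def parse_S(pos):
        """Return the set of positions at which an S starting at pos
        can end."""
        ends = set()
        if pos < len(text) and text[pos] == 'a':
            ends.add(pos + 1)                    # S ::= 'a'
            for mid in parse_S(pos + 1):         # S ::= 'a' S 'a'
                if mid < len(text) and text[mid] == 'a':
                    ends.add(mid + 1)
        return frozenset(ends)
    return parse_S

parse_S = make_parser('a' * 7)
print(7 in parse_S(0))  # the whole 7-character string is an S: True
```

The memo table is exactly a DCG plus memoing, which is the paper's point: the "packrat" machinery falls out of two standard logic programming facilities.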
The benchmark data on which the performance evaluation section of this paper is based is available here (2.6M).
Concurrent programming is now becoming more important than ever. But for concurrent programs to work deterministically, sections of the code must be synchronised. The most common method of synchronising code is to protect the code with locks. However, code which uses locks is difficult to write and even more difficult to debug. Locking also makes it difficult to compose large programs from smaller ones. A relatively new method of synchronisation, known as Software Transactional Memory, is promising to be a much easier method of synchronisation. This thesis describes the design and implementation of a Software Transactional Memory system in Mercury.
Logic programming systems often need to deal with large but otherwise regular predicates, e.g. wide ground facts. Such predicates can be treated as any other predicate by the compiler, but there are good reasons to treat them specially, the most important being that separating the code from the data really pays off. We call the technique exo-compilation: it reduces the memory needed for the code to about one third of the normal WAM compilation schema without undue slowdown. As a bonus, queries with lots of void variables get a significantly better treatment.
We first introduce the idea of exo-compilation by an example and present its implementation in hProlog. We show how other optimisations can be built on top of it, and evaluate how it performs in practice. We then show that the same ideas can also be applied (albeit in a slightly different form) to Mercury, whose implementation is based on very different principles.
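The separation of code from data can be illustrated in miniature: rather than compiling one clause body per fact, the facts are kept in a single table consulted by one generic lookup routine. This Python sketch (with an invented `edge/2` predicate) only conveys the flavour; the real scheme operates at the WAM and Mercury implementation levels:

```python
# A large fact predicate such as
#   edge(1, 2).  edge(1, 3).  edge(2, 4).  ... (thousands more)
# is stored once as plain data, instead of being compiled into
# per-clause code.

EDGE_FACTS = [(1, 2), (1, 3), (2, 4)]

# One shared index replaces thousands of compiled clause bodies.
EDGE_INDEX = {}
for a, b in EDGE_FACTS:
    EDGE_INDEX.setdefault(a, []).append(b)

def edge(a):
    """Solve edge(a, B): return all bindings for B given the first
    argument."""
    return EDGE_INDEX.get(a, [])

print(edge(1))  # [2, 3]
```

Since the data table is far more compact than compiled code for the same clauses, this is where the roughly threefold reduction in code memory comes from.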
Mercury is a purely declarative programming language that is highly suited to parallel execution. We present two main contributions towards parallel execution of Mercury programs. The first is a source-to-source transformation that allows goals with dependencies to execute in parallel, with no negative impact on code which doesn't make use of it. The second is the design and implementation of a new parallel execution mechanism that allows parallel code to execute far more efficiently than in the existing Mercury implementation.
The logic programming language Mercury is designed to support programming in the large. Programmer declarations in conjunction with powerful compile-time analysis and optimization allow Mercury programs to be very efficient. The original design of Mercury did not support constraint logic programming (CLP). This paper describes the extensions we added to Mercury to support CLP. Unlike similarly motivated extensions to Prolog systems, our objectives included preserving the purity of Mercury programs as much as possible, as well as avoiding any impact on the efficiency of non-CLP predicates and functions.
While the idea of declarative debugging has been around for a quarter of a century, the technology still hasn't been adopted by working programmers, even by those working in declarative languages. The reason is that making declarative debuggers practical requires solutions to a whole host of problems. In this paper we address one of these problems, which is that retaining a complete record of every step of the execution of a program is infeasible unless the program's runtime is very short, yet this record forms the space searched by the declarative debugger. Most parts of this search space therefore have to be stored in an implicit form. Each time the search algorithm visits a previously unexplored region of the search space, it must decide how big a part of the search space to rematerialize (which it does by reexecuting a call in the program). If it materializes too much, the machine may start to thrash or even run out of memory and swap space. If it materializes too little, then materializing all the parts of the search space required by a debugging session will require too many reexecutions of (parts of) the program, which will take too long. We present a simple algorithm, the ideal depth strategy, for steering the ideal middle course: minimizing reexecutions while limiting memory consumption to what is feasible. We show that this algorithm performs well even when used on quite long running programs.
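The depth-bounded materialisation at the heart of this approach can be sketched as follows. This Python toy (with a hypothetical call graph) only shows the mechanism of leaving deep subtrees implicit; the ideal depth strategy additionally chooses the depth bound so that the materialised portion fits in memory while minimising later reexecutions:

```python
def call_tree(proc, depth_left):
    """Materialise the execution subtree rooted at proc, but only to
    a fixed depth; deeper calls are recorded as stubs, to be rebuilt
    on demand by re-executing them."""
    children = CALLS.get(proc, [])
    if depth_left == 0 and children:
        return (proc, 'implicit')   # rematerialise later if visited
    return (proc, [call_tree(c, depth_left - 1) for c in children])

# A hypothetical call graph of the buggy program.
CALLS = {'main': ['p', 'q'], 'p': ['r'], 'q': [], 'r': []}

print(call_tree('main', 1))
# ('main', [('p', 'implicit'), ('q', [])])
```

Each time the search visits an `'implicit'` stub, the debugger reexecutes that call to materialise its subtree, again only to the chosen depth.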
For any LP system, tabling can be quite handy in a variety of tasks, especially if it is efficiently implemented and fully integrated in the language. Implementing tabling in Mercury poses special challenges for several reasons. First, Mercury is both semantically and culturally quite different from Prolog. While decreeing that tabled predicates must not include cuts is acceptable in a Prolog system, it is not acceptable in Mercury, since if-then-elses and existential quantification have sound semantics for stratified programs and are used very frequently both by programmers and by the compiler. The Mercury implementation thus has no option but to handle interactions of tabling with Mercury's language features safely. Second, the Mercury implementation is vastly different from the WAM, and many of the differences (e.g. the absence of a trail) have significant impact on the implementation of tabling. In this paper, we describe how we adapted the copying approach to tabling to implement tabling in Mercury.
Debugging is the most unpredictable and potentially expensive phase of the software development life-cycle. Declarative debuggers ask the user questions about the correctness of subcomputations in their program. Based on the user's answers, subcomputations that cannot be the cause of the buggy behaviour are eliminated. Eventually one subcomputation is left which must be the cause of the buggy behaviour. Declarative debuggers thus keep track of which parts of the computation are still suspect, relieving the user of the burden of having to do so. They also direct the bug search, something that many users (especially novices) would find difficult to do manually. Even expert users often find it hard to explore large search spaces systematically, a limitation that does not apply to software systems. Declarative debuggers thus have the potential to make the debugging process easier and much more predictable.
Despite these expected benefits, declarative debugging is not yet widely used in practice to find real bugs. There are three main reasons for this:
The declarative nature of Mercury makes it relatively easy to implement a declarative debugger that can handle the full language. The version of the Mercury declarative debugger that was the starting point for this thesis already handled almost all of Mercury. By extending it to handle exceptions we made it handle the full language.
One problem posed by large search spaces is that they cannot be stored in memory all at once. Instead, only portions of the search space can be stored in memory at any one time, with missing pieces materialized when they are needed by reexecuting the program. We present the first algorithm for controlling this rematerialization process that is practical in the presence of multiple search strategies, minimising reexecutions while keeping memory consumption within acceptable limits.
Another problem with large search spaces is that previous search strategies either asked far too many questions, demanded too much in the way of CPU and/or memory resources, or were too inflexible to coexist with other search strategies. For example, the divide-and-query search strategy is query-optimal in the worst case; however, previous implementations of it often required more memory than is typically available. We have implemented heuristics that enable divide-and-query to be used on partially materialized search spaces, making it practical.
We also address the third problem, namely the problem of reducing the complexity of the debugger's queries. The new declarative debugger allows users to specify exactly which part of an atom is wrong. The subterm dependency tracking strategy exploits this extra information to jump directly to the part of the program that computed the wrong subterm. In many cases, only a few such jumps are required to arrive at the bug. Subterm dependency tracking can converge on the bug even more quickly than divide-and-query, and it tends to yield questions and question sequences that are easier for users to answer.
We also support a variety of other methods of making questions easier to answer. By trusting some predicates, the user can automate answers to all questions about those predicates (implementing this capability, especially in the presence of higher order code, is trickier than it seems). We also support a novel technique that allows custom visualisations of terms to be easily created. If a call fails a precondition, then neither `yes' nor `no' is an appropriate answer to a question from the debugger about the validity of an answer computed for that call. Our debugger therefore allows users to answer `inadmissible' to such questions. If all else fails, users can also skip hard questions.
We give evidence that the new declarative debugger can be used on complex, real world programs by presenting several case studies of real bugs found in real programs with the aid of the debugger.
For any LP system, tabling can be quite handy in a variety of tasks, especially if it is efficiently implemented and fully integrated in the language. Implementing tabling in Mercury poses special challenges for several reasons. First, Mercury is both semantically and culturally quite different from Prolog. While decreeing that tabled predicates must not include cuts is acceptable in a Prolog system, it is not acceptable in Mercury, since if-then-elses and existential quantification have sound semantics for stratified programs and are used very frequently both by programmers and by the compiler. The Mercury implementation thus has no option but to handle interactions of tabling with Mercury's language features safely. Second, the Mercury implementation is vastly different from the WAM, and many of the differences (e.g. the absence of a trail) have significant impact on the implementation of tabling. In this paper, we describe how we adapted the copying approach to tabling to implement minimal model tabling in Mercury.
We have implemented a declarative debugger for Mercury that is capable of finding bugs in large, long-running programs. This debugger implements several search strategies. We discuss the implementation of two of these strategies and the conditions under which each strategy is useful.
The divide and query strategy tries to minimize the number of questions asked of the user. While divide and query can reduce the number of questions to roughly logarithmic in the size of the computation, implementing it presents practical difficulties for computations whose representations do not fit into memory. We discuss how we get around this problem, making divide and query practical.
Our declarative debugger allows users to specify exactly which part of an atom is wrong. The subterm dependency tracking strategy exploits this extra information to jump directly to the part of the program that computed the wrong subterm. In many cases, only a few such jumps are required to arrive at the bug. Subterm dependency tracking can converge on the bug even more quickly than divide and query, and it tends to yield question sequences that are easier for users to answer.
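Divide and query's node selection can be sketched directly: pick the non-root node whose subtree weight is closest to half of the whole tree, so that either answer to the question about it eliminates roughly half the remaining suspects. A Python toy with an invented evaluation tree:

```python
def subtree_size(tree):
    _, children = tree
    return 1 + sum(subtree_size(c) for c in children)

def divide_and_query(tree):
    """Pick the node whose subtree is closest to half of the whole
    search space; questioning it halves the suspect set either way,
    giving roughly logarithmically many questions overall."""
    total = subtree_size(tree)
    best, best_dist = None, None
    def visit(node):
        nonlocal best, best_dist
        size = subtree_size(node)
        dist = abs(total / 2 - size)
        if node is not tree and (best is None or dist < best_dist):
            best, best_dist = node, dist
        for c in node[1]:
            visit(c)
    visit(tree)
    return best[0]

# A hypothetical evaluation tree: each node is (atom, children).
tree = ('main', [('p', [('r', []), ('s', [])]),
                 ('q', [])])
print(divide_and_query(tree))  # 'p': its subtree of 3 is closest to 5/2
```

The practical difficulty the paper addresses is that computing these subtree weights requires knowing the shape of a tree that may not fit in memory.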
© ACM, 2005. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the Sixth International Symposium on Automated Analysis-Driven Debugging.
Many programmers find debugging a frustrating and unproductive activity. Declarative debugging promises to alleviate this problem by automating some of the reasoning used in the debugging process. We have implemented a declarative debugger for Mercury. In the process, we found a problem not addressed in the existing literature on declarative debuggers, which considers programs to consist of clauses (conjunctions of literals): how to handle if-then-elses. The problem is that the condition of an if-then-else should be treated as a negated goal if and only if the condition fails. Negated contexts may cause a declarative debugger to switch from wrong answer analysis to missing answer analysis and vice versa. Since missing answer analysis explores a search space that is subtly different from the space explored for wrong answer analysis, the two kinds of analysis need different kinds of trees to represent the program execution. For the conditions of if-then-elses, the debugger does not know which kind of tree to build until the condition has either succeeded or failed, by which time it is too late. To solve this problem, we have designed a novel data structure, the annotated event trace, which is flexible enough to support both wrong and missing answer analysis. The advantages of this data structure are that it is quite compact, requiring little more space than an ordinary stored event trace, and that the algorithms to build this data structure and to derive from it the information required for two kinds of diagnosis are all simple as well as efficient.
One of the key advantages of modern programming languages is that they free the programmer from the burden of explicit memory management. Usually, this means that memory management is delegated to the runtime system through the use of a run-time garbage collector (RTGC). Basically, an RTGC is a dedicated process that runs in parallel with the user program. Whenever the user program needs to store some data, the RTGC provides the desired memory space. At regular intervals, the RTGC reviews the uses of the allocated memory space, and recovers those memory cells that have become garbage, i.e. that cannot be accessed any more by the user program.
A complementary form of automatic memory management is compile-time garbage collection (CTGC), where the decisions for memory management are taken at compile time instead of at run time. The compiler determines the lifetime of the variables that are created during the execution of the program, and thus also the memory that will be associated with these variables. Whenever the compiler can guarantee that a variable, or more precisely, parts of the memory resources that this variable points to at run-time, will never be accessed beyond a certain program instruction, the compiler can add instructions to deallocate these resources at that particular instruction without compromising the correctness of the code. If the program instruction is followed by a series of instructions that require the allocation of new memory cells, then the compiler can replace the sequence of deallocation and allocation instructions by instructions that update the garbage cells in place, thus reusing these cells.
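The reuse step described above can be illustrated in miniature. This Python sketch is only an analogy (Python itself has no compile-time memory management): it contrasts code that allocates fresh cells with the compiler-transformed version that overwrites cells the analysis has proven dead:

```python
def naive_increment_all(cells):
    """Without CTGC: the input cells die after the call, and a fresh
    list of cells is allocated for the result."""
    return [x + 1 for x in cells]

def ctgc_increment_all(cells):
    """What the compiler-transformed code does when analysis proves
    the input is dead after the call: overwrite the dead cells in
    place, fusing each deallocation with the following allocation."""
    for i in range(len(cells)):
        cells[i] = cells[i] + 1   # reuse the garbage cell directly
    return cells

old = [1, 2, 3]
new = ctgc_increment_all(old)
print(new, new is old)  # [2, 3, 4] True: no new cells were allocated
```

The crucial point is that the update-in-place version is generated automatically, and only when the analysis can prove the old cells are unreachable, so the program's declarative semantics are preserved.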
We study the technique of compile-time garbage collection in the context of Mercury, a pure declarative language. A key element of declarative languages is that they disallow explicit memory updates (which are common operations in most other programming paradigms) and rely instead on term construction and deconstruction to manipulate the program data. This places a high demand on memory management and makes declarative languages a primary target for compile-time garbage collection. Moreover, the clear mathematical foundations of Mercury, being a pure declarative language, make the development of the program analyses that are necessary for CTGC feasible.
In this thesis we define a number of semantics for the logic programming language Mercury and formally establish the equivalence between them; we use these semantics to formalise the different program analysis steps that are needed to implement a basic CTGC system for Mercury and prove their safety. We extend this basic CTGC system so that it correctly deals with programs organised into modules, and we implement a complete CTGC system within the Melbourne Mercury Compiler. To the best of our knowledge, this is the first and only complete CTGC system that has ever been built for a programming language.
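The payoff of structure reuse can be simulated in a few lines. The Heap class, its counters, and the two map variants below are ours, purely for illustration: when the analysis can prove a list is dead after a call, its cells can be overwritten in place instead of reallocated.

```python
# Toy simulation of compile-time structure reuse.

class Heap:
    def __init__(self):
        self.allocations = 0
        self.reuses = 0
    def new_cell(self, head, tail):
        self.allocations += 1
        return [head, tail]
    def reuse_cell(self, cell, head):
        self.reuses += 1                  # overwrite the dead cell in place
        cell[0] = head
        return cell

def from_py(heap, items):
    out = None
    for x in reversed(items):
        out = heap.new_cell(x, out)
    return out

def to_py(cell):
    out = []
    while cell is not None:
        out.append(cell[0])
        cell = cell[1]
    return out

def map_fresh(heap, f, cell):
    # Without CTGC: every output cons cell is a fresh allocation.
    if cell is None:
        return None
    return heap.new_cell(f(cell[0]), map_fresh(heap, f, cell[1]))

def map_reuse(heap, f, cell):
    # With CTGC: the analysis proved the input is dead after the call,
    # so each input cell is updated in place instead of reallocated.
    node = cell
    while node is not None:
        heap.reuse_cell(node, f(node[0]))
        node = node[1]
    return cell

heap = Heap()
xs = from_py(heap, [1, 2, 3])               # 3 allocations
ys = map_fresh(heap, lambda n: n * n, xs)   # 3 more allocations
zs = map_reuse(heap, lambda n: n + 1, xs)   # 0 allocations, 3 reuses
```

The real system performs this replacement as a source-level transformation guided by liveness analysis; the counters here only make the saving visible.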
In this thesis we look at mode analysis of logic programs. Being based on the mathematical formalism of predicate logic, logic programs have no a priori notion of data flow—a single logic program may run in multiple modes where each mode describes, or prescribes, a pattern of data flow.
A mode system provides an abstract domain for describing the flow of data in logic programs, and an algorithm for analysing programs to infer the modes of a program or to check the correctness of mode declarations given by the programmer. Such an analysis can provide much useful information to the compiler for optimising the program. In a prescriptive mode system, mode analysis is also an important part of the semantic analysis phase of compilation (much like type analysis) and can inform the programmer of many errors or potential errors in the program at compile time. We therefore believe it is an essential component of any industrial strength logic programming system.
Our aim is to develop a strong and prescriptive mode system that is as precise and as expressive as possible. We believe this requires a strongly typed and purely declarative language, and so we focus on the language Mercury.
The first contribution of our work is to give a detailed description of Mercury's existing mode system, which is based on abstract interpretation. Although most of this system has been around for several years, this is the first time it has been described in this level of detail. This is also the first time the relationship of the mode system to the formalism of abstract interpretation has been made clear.
Following that, we look at ways of extending the mode system to provide further precision and expressiveness, and to overcome some of the limitations of the current system.
The first of these extensions is to support a form of constrained parametric polymorphism for modes. This is analogous to constrained parametric polymorphic type systems such as type classes, and adds a somewhat similar degree of expressiveness to the mode system.
Next we look at a method for increasing the precision of the mode analysis by keeping track of aliases between variables. The increased precision we gain from this allows an increase in expressiveness by allowing the use of partially instantiated data structures and more complex uniqueness annotations on modes.
The final area we look at is an alternative approach to mode analysis using Boolean constraints. This allows us to design a mode system that can capture complex mode constraints between variables and more clearly separates the various tasks required for mode analysis. We believe that this constraint-based system provides a good platform for further extension of the Mercury mode system.
The work we describe has all been implemented in the Melbourne Mercury compiler, although only constrained parametric polymorphism has so far become part of an official compiler release.
Debuggers for logic programming languages have traditionally had a capability most other debuggers did not: the ability to jump back to a previous state of the program, effectively travelling back in time in the history of the computation. This "retry" capability is very useful, allowing programmers to examine in detail a part of the computation that they previously stepped over. Unfortunately, it also creates a problem: while the debugger may be able to restore the previous values of variables, it cannot restore the part of the program's state that is affected by I/O operations. If the part of the computation being jumped back over performs I/O, then the program will perform these I/O operations twice, which will result in unwanted effects ranging from the benign (e.g. output appearing twice) to the fatal (e.g. trying to close an already closed file).
We present a simple mechanism for ensuring that every I/O action called for by the program is executed at most once, even if the programmer asks the debugger to travel back in time from after the action to before the action. The overhead of this mechanism is low enough and can be controlled well enough to make it practical to use it to debug computations that do significant amounts of I/O.
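The core idea can be sketched as a table of I/O action results indexed by an action counter. The class and names below are invented; Mercury's actual mechanism lives in the runtime system and trace infrastructure.

```python
# Sketch: tabling I/O actions so that re-execution after a debugger
# "retry" replays recorded results instead of redoing the side effects.

class IOTable:
    def __init__(self):
        self.table = {}      # action sequence number -> recorded result
        self.counter = 0
        self.performed = 0   # how many real side effects happened

    def perform(self, action):
        seq = self.counter
        self.counter += 1
        if seq not in self.table:        # first execution: do it for real
            self.performed += 1
            self.table[seq] = action()
        return self.table[seq]           # re-execution: replay the result

    def retry(self):
        self.counter = 0                 # travel back in time

log = []
io = IOTable()
io.perform(lambda: log.append("hello") or "ok")
io.retry()                               # jump back to before the write
result = io.perform(lambda: log.append("hello") or "ok")
```

Even though the action is executed twice by the program, the side effect happens at most once, which is exactly the guarantee the paper's mechanism provides.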
This report presents a new termination analysis for Mercury that approximates interargument size relationships using convex constraints. These relationships are derived using an analysis based upon abstract interpretation. Although this analysis is more expensive than that of the existing termination analyser, it is able to prove the termination of a larger class of predicates. We consider how this analysis interacts with aspects of Mercury such as the module system, declarative I/O, higher-order code and the foreign language interface.
This thesis presents the foundations for extending the implementation of the declarative logic programming language Mercury to support the parallel execution of programs.
The new material in this thesis is in three parts. The first part presents an extension to the existing (sequential) execution model for Mercury that allows programs to be executed in parallel. Programmers can exploit this extension simply by replacing some sequential conjunction operators connecting independent goals with a new parallel conjunction operator. Such changes do not change the declarative semantics of the program, but can improve performance.
The second part of the new material presents a model for explicit threading in Mercury, which is useful when the programmer's goal is not efficiency but the explicit representation of concurrent tasks and control of their interactions. We show how our extended execution model supports the usual operations on threads. We also present a new interpretation for Mercury's system for input and output that provides an intuitive understanding of the interactions between the I/O operations executed by the different threads of a program or by different programs.
The final part of the new material presented in this thesis presents a new technique for obtaining a detailed and accurate picture of the performance of a program. The basis of our technique is associating a complete context with each measurement, rather than approximating the context as conventional profilers do. In order to make our new profiling system feasible we have had to develop sophisticated techniques for reducing the cost of recording complete contexts; in order to make its output tractable, we also had to develop techniques for dealing with interactions between higher order constructs and recursion. We have also developed a tool for helping programmers (and eventually compilers) to digest the large volume of data generated by our profiling technique.
All the ideas presented in this thesis have been implemented in the Melbourne Mercury compiler.
The aim of this thesis is the design of a type system for an industrial strength logic programming language. The type system we describe has been implemented for the Mercury programming language, in the Melbourne Mercury compiler.
We begin by presenting a simple higher order extension of the Mycroft-O'Keefe type system. We then use this type system as a basis for two significant extensions.
The first extension is the adoption of a type class system similar to that found in some modern functional languages in the context of higher order logic programming. We give a set of typing rules which both provide a formal definition of type correctness and define the source-to-source transformation we have used to implement type classes. This transformation is simple and effective, and can be easily shown to preserve Mercury's mode, determinism and uniqueness correctness properties.
The second extension is to allow existentially quantified type variables in the types of function symbols and of predicates. This represents the most significant contribution of this thesis. We then formally define the type system that results from the combination of both type classes and existential quantification of type variables. The two type system extensions are quite orthogonal. As a result, the definition of type correctness in the combined system is a fairly simple combination of the definitions for the individual systems. However, the mechanisms contributed by the two systems combine synergistically; the resulting type system is extremely expressive.
We then show how the type system we have designed allows the programmer to take advantage of many object oriented design techniques. We give an in depth analysis of object oriented design and isolate the mechanisms which are likely to result in reusable and maintainable software systems. We show that our type system allows the programmer to directly express these kinds of designs and discourages the use of the kinds of object oriented designs which reduce maintainability by introducing unnecessary implementation dependencies.
We show that these principles apply in a direct and simple manner to the modelling of component interfaces as supported by modern component frameworks such as CORBA. We present MCORBA, an implementation of a CORBA binding for Mercury. This interface is bi-directional, allowing the programmer to write CORBA components in Mercury and to use existing CORBA components from within Mercury programs.
Recent logic programming languages, such as Mercury and HAL, require type, mode and determinism declarations for predicates. This information allows the generation of efficient target code and the detection of many errors at compile-time. Unfortunately, mode checking in such languages is difficult. One of the main reasons is that, for each predicate mode declaration, the compiler is required to decide which parts of the procedure bind which variables, and how conjuncts in the predicate definition should be re-ordered to enforce this behaviour. Current mode checking systems limit the possible modes that may be used because they do not keep track of aliasing information, and have only a limited ability to infer modes, since inference does not perform reordering. In this paper we develop a mode inference system for Mercury based on mapping each predicate to a system of Boolean constraints that describe where its variables can be produced. This allows us to handle programs that are not supported by the existing system.
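A drastically simplified version of the idea can be brute-forced in a few lines. The clause, the producer/consumer tables, and the "caller" pseudo-producer below are our own toy example; the real system generates and solves proper Boolean constraint systems.

```python
from itertools import product

# Infer which conjunct produces each variable in the clause
#   p(X, Z) :- q(X, Y), r(Y, Z)
# given that q and r each produce their second argument from their first.

conjuncts = {
    "q": {"consumes": {"X"}, "produces": {"Y"}},
    "r": {"consumes": {"Y"}, "produces": {"Z"}},
}
head_inputs = {"X"}          # bound by the caller in this mode of p
variables = ["X", "Y", "Z"]

def producers(var):
    # Possible producers of var: the caller (for input arguments),
    # or any conjunct whose mode lets it bind var.
    opts = ["caller"] if var in head_inputs else []
    opts += [c for c, info in conjuncts.items() if var in info["produces"]]
    return opts

# Each variable must have exactly one producer: enumerate the assignments
# satisfying that constraint (the "exactly-one" clauses of the Boolean
# system, solved here by exhaustive search).
solutions = [dict(zip(variables, combo))
             for combo in product(*(producers(v) for v in variables))]
```

From such a solution, a conjunct ordering can then be chosen in which every variable is produced before it is consumed.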
The value of a variable is often given by a field of a heap cell, and frequently the program will pick up the values of several variables from different fields of the same heap cell. By keeping some of these variables out of the stack frame, and accessing them in their original locations on the heap instead, we can reduce the number of loads from and stores to the stack at the cost of introducing a smaller number of loads from the heap. We present an algorithm that finds the optimal set of variables to access via a heap cell instead of a stack slot, and transforms the code of the program accordingly. We have implemented this optimization in the Mercury compiler, and our measurements show that it can reduce program runtimes by up to 12% while at the same time reducing program size. The optimization is straightforward to apply to Mercury and to other languages with immutable data structures; its adaptation to languages with destructive assignment would require the compiler to perform mutability analysis.
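The trade-off the optimization resolves can be shown with a toy cost model. The numbers and the model itself are invented for illustration, and exhaustive search stands in for the paper's optimal algorithm.

```python
from itertools import combinations

# Choose the subset of variables to access via their heap cell rather
# than a stack slot.
uses = {"a": 1, "b": 2, "c": 1}   # uses of each variable after calls

def cost(via_heap):
    total = 0
    for v, n in uses.items():
        # stack slot: one store into the slot, then one load per use;
        # heap access: just one load from the heap cell per use
        total += n if v in via_heap else 1 + n
    if via_heap:
        total += 2                # store and reload the cell pointer itself
    return total

def best_set():
    # Brute-force search over all subsets of the variables.
    return min((cost(set(s)), s)
               for k in range(len(uses) + 1)
               for s in combinations(sorted(uses), k))

best_cost, best_vars = best_set()
```

In this toy instance, accessing all three variables via the heap cell wins: the cost of keeping the cell pointer alive is paid once, while the per-variable stack stores are all avoided.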
Previous attempts at garbage collection in uncooperative environments have generally used conservative or mostly-conservative approaches. We describe a technique for doing fully type-accurate garbage collection in an uncooperative environment, using a "shadow stack" to link structs of pointer-containing variables, together with the data or code needed to trace them. We have implemented this in the Mercury compiler, which generates C code, and present preliminary performance data on the overheads of this technique. We also show how this technique can be extended to handle multithreaded applications.
The .NET Common Language Runtime (CLR) offers a new opportunity to experiment with multi-language interoperation, and provides a relatively rare chance to explore deep interoperation of a wide range of programming language paradigms. This article describes how Mercury is compiled to the CLR. We describe the problems we have encountered with generating code for the CLR, give some preliminary benchmark results, and suggest some possible improvements to the CLR regarding separate compilation, verifiability, tail calls, and efficiency.
Many logic programming implementations compile to C, but they compile to very low-level C, and thus discard many of the advantages of compiling to a high-level language. We describe an alternative approach to compiling logic programs to C, based on continuation passing, that we have used in a new back-end for the Mercury compiler. The new approach compiles to much higher-level C code, which means the compiler back-end and run-time system can be considerably simpler.
We present a formal schema for the transformation, and give benchmark results which show that this approach delivers performance that is more than competitive with the fastest previous implementation, with greater simplicity and better portability and interoperability.
The approach we describe can also be used for compiling to other target languages, such as IL (the Microsoft .NET intermediate language).
The benchmark data on which the performance evaluation section of this paper is based is available here (9.7M).
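The continuation-passing scheme can be rendered in Python rather than C for brevity. The function names and the example predicate are ours: a nondeterministic goal becomes a function that calls a success continuation once per solution.

```python
# Continuation-passing compilation of a nondeterministic conjunction.

def member(x_box, xs, succeed):
    # member(X, Xs): bind X to each element in turn, then continue
    # with the rest of the conjunction.
    for item in xs:
        x_box[0] = item
        succeed()

def solutions_p():
    # Compiled form of:  p(X) :- member(X, [1,2,3,4]), X mod 2 = 0,
    # collecting all solutions for X.
    results = []
    x = [None]                            # a box for the variable X
    def rest_of_conjunction():
        if x[0] % 2 == 0:                 # the test goal
            results.append(x[0])          # end of conjunction: a solution
    member(x, [1, 2, 3, 4], rest_of_conjunction)
    return results
```

Backtracking falls out for free: when the continuation returns, the loop in `member` simply tries the next binding, which is what makes the generated C code so much higher-level than traditional schemes.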
Declarative programs differ from imperative programs in several respects, the main ones being their heavy use of recursion, of various forms of polymorphism, and of higher order. Existing profilers tend not to produce very useful information in the presence of these constructs. We present a new profiling technique we call deep profiling that yields detailed and useful information about programs even in the presence of these constructs, information that is significantly richer than the output of other profilers.
The heart of deep profiling is a source-to-source transformation. We have implemented this transformation and its associated infrastructure in the compiler for Mercury, a purely declarative logic programming language. While our measurements of this implementation show that deep profiling has slightly greater overhead than some other profiling techniques, the wealth of information it provides makes this extra overhead worthwhile. The deep profiling algorithms themselves are applicable to most other language styles, including imperative, object-oriented, and functional languages.
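What "complete context" means can be illustrated with a toy profiler that keys its counts on the whole chain of ancestors rather than on the current procedure alone. This decorator-based sketch is ours; the real system is a compiler transformation, not a wrapper.

```python
import functools

context = []    # the chain of currently active profiled procedures
counts = {}     # full ancestor chain -> number of calls

def profiled(fn):
    @functools.wraps(fn)
    def wrapper(*args):
        context.append(fn.__name__)
        key = tuple(context)              # the complete calling context
        counts[key] = counts.get(key, 0) + 1
        try:
            return fn(*args)
        finally:
            context.pop()
    return wrapper

@profiled
def leaf(n):
    return n + 1

@profiled
def via_a(n):
    return leaf(n)

@profiled
def via_b(n):
    return leaf(leaf(n))

via_a(0)
via_b(0)
```

A conventional profiler would report three calls to `leaf`; the context-keyed counts instead separate the one call made under `via_a` from the two made under `via_b`, which is the kind of distinction deep profiling preserves.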
Compile-time garbage collection (CTGC) is still a very uncommon feature within compilers. In previous work we developed a compile-time structure reuse system for Mercury, a logic programming language, which indicates which data structures can safely be reused at run time. As preliminary experiments were promising, we have continued this work and now have a working, well-performing, near-to-ship CTGC system built into the Melbourne Mercury Compiler (MMC).
In this paper we present the multiple design decisions leading to this system, we report the results of using CTGC for a set of benchmarks, including a real-world program, and finally we discuss further possible improvements. Benchmarks show substantial memory savings and a noticeable reduction in execution time.
In this paper we present a binding-time analysis for the logic programming language Mercury. Binding-time analysis is a key analysis needed to perform off-line program specialisation. Our analysis deals with the higher-order aspects of Mercury, and is formulated by means of constraint normalisation. This allows (at least part of) the analysis to be performed on a modular basis.
In previous work Bruynooghe, Janssens and Kågedal developed a live-structure analysis for Mercury which detects memory cells available for reuse. Separate compilation of modules is an essential ingredient of a language such as Mercury which supports programming in the large. Hence, to be practical, a live-structure analysis also has to be module based. This paper develops a modular live-structure analysis and extends it with a modular reuse analysis. It also describes preliminary results obtained with a first prototype of the module based analysis.
We present two optimizations for making Mercury programs tail recursive. Both operate by taking computations that occur after a recursive call and moving them before the recursive call, modifying them as necessary. The first optimization moves calls to associative predicates; it is a pure source to source transformation. The second optimization moves construction unifications; it required extensions to the mode system (to record aliases) and to the parameter passing convention (to allow arguments to be returned in memory). The two optimizations are designed to work together, and can make a large class of programs tail recursive.
The raw data on which the evaluation is based is available as a 5 Kb tar file.
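A hand-applied Python rendering of the first transformation (the example is ours): because addition is associative, work done after the recursive call can be threaded through an accumulator and done before it.

```python
# Accumulator introduction for an associative operation.

def length(xs):
    # Original: the addition happens after the recursive call returns,
    # so the recursive call is not a tail call.
    if not xs:
        return 0
    return 1 + length(xs[1:])

def length_acc(xs, acc=0):
    # Transformed: the addition now happens before the call, and the
    # recursive call is in tail position, so it can run in constant
    # stack space in a language with tail call optimization.
    if not xs:
        return acc
    return length_acc(xs[1:], acc + 1)
```

The transformation is valid precisely because `(a + b) + c = a + (b + c)`; for non-associative operations the reordering would change the result.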
Recursive predicates frequently generate some state which is updated after the recursive call. We present a source to source transformation which can move the state update before the recursive call, thus helping to make the predicate tail recursive, and report on its implementation in the Mercury compiler.
The logic/functional language Mercury allows the programmer to annotate predicates and functions to mark impure predicates, allowing impure code to be safely integrated into a declarative language.
By using purity declarations with the foreign language interface, programmers can take advantage of many of the features of a high level programming language while writing imperative code to interface with existing imperative libraries.
This paper outlines the purity system in Mercury and how it affects operational semantics, compares this purity system with other approaches to declaring impurity in a pure language, and gives an extended example of how impurity and foreign language interfaces can work together to simplify the chore of writing declarative interfaces to libraries.
While Mercury allows destructive input/unique output modes which direct the compiler to reuse memory, use of these modes can be cumbersome for the programmer. In most situations, it would be nicer if the programmer didn't have to worry about the details of memory management.
The paper briefly reports on some experiments with a prototype analyser which aims at detecting memory available for reuse. The prototype is based on the live-structure analysis developed by us for logic programs extended with declarations.
Yet the major contribution of this paper consists of the development of the principles of a module based analysis which are essential for the analysis of large Mercury programs with code distributed over many modules.
In this paper, we describe a binding-time analysis (BTA) for a statically typed and strongly moded pure logic programming language, in casu Mercury. Binding-time analysis is the key concept in achieving off-line program specialisation: the analysis starts from a description of the program's input available for specialisation, and propagates this information throughout the program, deriving directives for when and how to perform specialisation. Exploiting the possibilities offered by Mercury's strong type and mode system, we present a completely automatic BTA dealing with partially static binding-times.
Every programming language needs a debugger. Mercury now has three debuggers: a simple procedural debugger similar to the tracing systems of Prolog implementations, a prototype declarative debugger, and a debugger based on the idea of automatic trace analysis. In this paper, we present the shared infrastructure that underlies the three debuggers, and describe the implementation of the procedural debugger. We give our reasons for each of our main design decisions, and show how several of these decisions are rooted in our experience with the debugging of large programs working with large data structures.
This paper has been superseded by the LNCS version.
The logic/functional language Mercury uses a strong, mostly static type system based on polymorphic many-sorted logic. For efficiency, the Mercury compiler uses type specific representations of terms, and implements polymorphic operations such as unifications via generic code invoked with descriptions of the actual types of the operands. These descriptions, which consist of automatically generated data and code, are the main components of the Mercury runtime type information (RTTI) system. We have used this system to implement several extensions of the Mercury system, including an escape mechanism from static type checking, generic input and output facilities, a debugger, and automatic memoization, and we are in the process of using it for an accurate, native garbage collector. We give detailed information on the implementation and uses of the Mercury RTTI system as well as measurements of the space costs of the system.
The raw data on which the evaluation is based is available as a 70 Kb tar file.
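A rough Python analogue of an RTTI-driven generic operation (the descriptor layout is invented for illustration): a single generic routine walks two terms, guided by a description of their type, much as Mercury's generic unification is driven by type_info structures.

```python
# Type descriptors and a generic unification routine driven by them.

INT = {"name": "int", "args": []}

def list_of(elem_type):
    # Descriptor for a polymorphic type, carrying its argument type.
    return {"name": "list", "args": [elem_type]}

def generic_unify(type_info, a, b):
    if type_info["name"] == "int":
        return a == b
    if type_info["name"] == "list":
        elem = type_info["args"][0]       # argument type from the descriptor
        return len(a) == len(b) and all(
            generic_unify(elem, x, y) for x, y in zip(a, b))
    raise TypeError("no descriptor for " + type_info["name"])
```

Because the descriptors are first-class values, the same data can drive printing, comparison, debugging, or tracing for garbage collection, which is why so many of the listed extensions could be built on one RTTI system.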
This paper describes the implementation of several of the high-level optimization passes of the Mercury compiler, including deforestation, type specialization, constraint propagation and structure reuse.
MCORBA is a binding to the CORBA distributed object framework for the purely declarative logic/functional language Mercury. The binding preserves the referential transparency of the language, and has several advantages over similar bindings for other strongly typed declarative languages. As far as we know, it is the first such binding to be bidirectional; it allows a Mercury program both to operate upon CORBA components and to provide services to other CORBA components. Whereas the Haskell binding for COM maps COM interfaces onto Haskell types, MCORBA maps CORBA interfaces onto Mercury type classes. Our approach simplifies the mapping, makes the implementation of CORBA's interface inheritance straightforward, and makes it trivial for programmers to provide several different implementations of the same interface. It uses existential types to model the operation of asking CORBA for an object that satisfies a given interface but whose representation is unknown.
In this paper, we explain how we have extended Mercury's type system to include support for type classes. We give a formal semantics for this extension to our type system, adapting the typing rules used in functional languages to the differing demands of logic programming languages. We show that type classes integrate very nicely with Mercury's mode, uniqueness and determinism systems, and describe how our implementation works.
This paper presents the algorithms of the Mercury termination analyser, discusses how real-world aspects of the language such as modules, higher-order features, foreign language code, and declarative input/output can be handled, and evaluates the performance of the analyser both on a set of standard test programs and on the Mercury compiler itself.
The raw data on which the evaluation is based is available as a 5.2 Mb tar file.
This paper contains a brief overview of the Mercury language, and a reasonably detailed overview of the implementation technology used in the Mercury compiler. It describes the abstract machine that the compiler generates code for. (Our other papers listed below go into more detail on exactly how the code is generated, and on how the abstract machine instructions are implemented as C or GNU C code.)
The raw data on which the evaluation is based is available on our benchmarking page.
This paper discusses Mercury's determinism system in detail, including the algorithms for switch detection, deep indexing, determinism inference, and determinism checking.
This paper describes the structure of the Mercury compiler, its calling conventions, and the algorithms it uses in generating code. These algorithms include lazy code generation and a novel way of handling nested disjunctions.
This paper discusses the merits of using C, and in particular GNU C, as an intermediate target language for the compilation of logic programs, and describes the approach we have taken in the implementation of Mercury.
An overview paper.
The first paper on Mercury. It is superseded by the paper in the Journal of Logic Programming.
This is the first paper on the code generator. Warning: several aspects of the code generator have changed since this paper was written. Some of these are documented in the version in the ILPS 95 proceedings.
This paper covers a case study where Mercury was used as an implementation language.
In this paper we introduce ODASE (Ontology Driven Architecture for Software Engineering). We describe how we used ODASE to build a 250 person-month e-insurance system for a multinational insurance firm, where only 35% of the requirements were known at kick-off. The project required one third of the time of the next closest quote, and a similar project built conventionally at another insurance firm also required around three times the resources. [The ODASE platform uses Mercury as its implementation language.]
The mode system and the uniqueness system of Mercury are based on the following papers:
The same kind of approach was also once used in the Mercury interface to the Aditi deductive database system, though both the interface and Aditi itself no longer exist.
All these papers and presentations are for A4 paper. Unfortunately, we cannot generate US paper size versions.