Vector addition systems are a model of computation. For this model, coverability is one of the easiest decision problems. This post presents an old (1978) upper bound, with accompanying code.
29 July 2015
31 October 2014
Dynamic Dispatch
In which I explain how to solve puzzle questions involving dynamic dispatch.
28 October 2014
Why Do We Fail?
What is there to do if a problem refuses to be knocked down? There's a generic problem solving strategy for that.
26 August 2014
Watching Horn
How to use watched literals (or, rather, vertices) to do breadth-first search in Horn hyperdigraphs.
24 July 2014
Depth First Search in Python
Everybody thinks they understand depth-first search. The algorithm is indeed simple. But — I think — it is deceptively simple. In this post I play with some code.
11 December 2013
Edit Distance, Benchmarks
You could compute the edit distance of two strings by memoization or by dynamic programming. The latter should be more efficient, at the expense of clarity. But, how much more efficient? This post shows some (micro) benchmarks.
05 April 2012
Fixed-Point Magic: Successive Recursive Calls
A brief summary of the Y-combinator trick, and an unusual use.
18 August 2011
Putting Formulas in Normal Form
What would be a beautiful description for conversion to DNF?
The problem of converting a formula to DNF is easy to explain: Given a boolean formula, write it as a disjunction of conjunctions. For example, $(x \lor y) \land z$ may be rewritten, by distributivity, as $(x \land z) \lor (y \land z)$. The mechanism is also easy to describe: Push $\lnot$ down by de Morgan, then pull $\lor$ above $\land$ by distributivity.
How to turn this description into an algorithm?
I know a nice algorithm for the first half. Expressed in OCaml, it goes
like this: We want a function with the type f1->f2, where
type f1 = [`Var of string | `Not of f1 | `And of f1 list | `Or of f1 list]
type f2 = [`Pos of string | `Neg of string | `And of f2 list | `Or of f2 list]
The trick is to keep around a parity.
let rec pnd (p : bool) : f1 -> f2 = function
  | `Var x -> if p then `Pos x else `Neg x
  | `Not f -> pnd (not p) f
  | `And fs -> let fs = List.map (pnd p) fs in if p then `And fs else `Or fs
  | `Or fs -> let fs = List.map (pnd p) fs in if p then `Or fs else `And fs
This is the algorithm I use when I do boolean manipulations on paper, so I like it. The code is nice, rather than beautiful, because it has some repetition.
What would be a similarly nice solution for the second half—applying
distributivity? Expressed in OCaml, we want a function with the type
f2->dnf, where
type dnf = [`Pos of string | `Neg of string] list list
Of course, I'm interested in solutions in any language.
16 August 2011
Loops and Side-effects
TeX has a better looping construct.
I designed a small Java-like language for some experiments. I think it is a good idea to be able to say "this piece of code has no side-effects." (If you saw how I write C/++, you'd probably be surprised by what I just said.) There are various ways of isolating side-effects. My requirements were that (1) it must look much like Java and (2) there should be some way to contain side-effects. So I simply banned side-effects from expressions.
Then I proceeded to translate the following Java code.
while (i.hasNext()) {
Object o = i.next();
// bla
}
Method calls, however, may have side-effects, so I could not use them in expressions. A method call is a statement in my toy language.
while (true) {
boolean c = i.hasNext();
if (!c) break;
Object o = i.next();
// bla
}
This loop reminded me of \loop ... \if ... \repeat in $\TeX$. The keywords loop and repeat, however, are rather un-Java-like-esque. So I decided to say do {...} while (...) {...}. This way, the normal do-loop and the while-loop are special cases obtained by omitting one of the bodies.
do boolean c = i.hasNext()
while (c) {
Object o = i.next();
// bla
}
Note that in Java (and C/++) one has the illusion that loops test their condition at the beginning (while) or at the end (do) but never in the middle. This is not true, because expressions may have side-effects.
If languages had do-while-loops, then I'd probably break far less often.
I decided to write this post after seeing another post about how to write a similar kind of loop in OCaml.
01 April 2011
Some Random Code Metrics
How compressible is source code? What is the average size of a file? The answers to these and other useless questions are born from too much caffeine.
The projects:
- Ant: A build system used by many Java projects, which is itself written in Java.
- CPython: The reference Python implementation, written in C.
- Frama-C: Analyzers for C programs, themselves written mostly in OCaml.
- FreeBoogie: A project of mine written in Java.
- GHC: The most used Haskell compiler, written in Haskell.
- jEdit: A nice source code editor, written in Java.
- Linux: An operating system macro-kernel, written in C.
- OCaml: Compiler, standard library, and related tools for the language OCaml, written in OCaml.
- SGB: A library for handling graphs and a few example algorithms written by Knuth in CWEB.
Methodology. Get the repo, spend $\le 1$ min choosing a subset of files that look like "the source", run a few quick commands in the shell, don't check the results. That's so you know how much you can trust what follows. Nevertheless, it's likely I got the orders of magnitude right.
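The quick commands were along these lines (a reconstruction, not the exact ones I ran; the demo directory below is a stand-in for a real checkout, so the snippet runs anywhere):

```shell
# make a tiny stand-in "source tree" so the commands below run anywhere
mkdir -p /tmp/loc-demo
printf 'int main() { return 0; }\n' > /tmp/loc-demo/a.c
# lines of code: count '\n' characters in the chosen files
find /tmp/loc-demo -name '*.c' -print0 | xargs -0 cat | wc -l
# compressed size: a rough proxy for information content
find /tmp/loc-demo -name '*.c' -print0 | xargs -0 cat | bzip2 -9 | wc -c
```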
Project size contest. The most common measure for a project's size is the number of lines of code. This comes with plenty of caveats. In any case, if we define "lines of code" to be "number of '\n' characters in the files that Radu happened to choose as the 'source'" then here are the results:
- Linux: 9.1 million
- GHC: 460 thousand
- CPython: 450 thousand
- OCaml: 260 thousand
- Ant: 200 thousand
- jEdit: 160 thousand
- Frama-C: 126 thousand
- SGB: 19 thousand
- FreeBoogie: 8 thousand
There's an alternative measure that I like much better and that is not used much: the compressed size of the source code. Why do I like it? Because it should make a good proxy for the information content of the source code. For example, it doesn't matter much whether coders use space indentation or tab indentation, long lines or short lines, etc. There are plenty of caveats here too. For example, it is likely that the true information content (say, Kolmogorov complexity) is much lower, and that would be apparent if compressors exploited the structure of the language in which the code is written.
Anyway, here is the bzip2 size of the projects.
- Linux: 45 MB
- GHC: 4.3 MB
- CPython: 2.0 MB
- OCaml: 1.2 MB
- Ant: 930 kB
- Frama-C: 735 kB
- jEdit: 690 kB
- SGB: 190 kB
- FreeBoogie: 50 kB
File size contest. All these projects are broken up into files, which roughly correspond to modules or abstraction boundaries. The idea is that you should be able to focus on the internals of one file at a time without needing to know too much about the other files. And that is true of the compiler too, not only of you. Or, at least, that's one way to look at it.
So, lines per file contest:
- CPython: 860
- Linux: 680
- SGB: 580
- OCaml: 330
- jEdit: 330
- Frama-C: 300
- GHC: 280
- Ant: 260
- FreeBoogie: 140
And compressed bytes per file contest:
- SGB: 5900 B
- CPython: 3700 B
- Linux: 3300 B
- GHC: 2600 B
- Frama-C: 1700 B
- OCaml: 1500 B
- jEdit: 1400 B
- Ant: 1200 B
- FreeBoogie: 780 B
Information density. So, which project should I read if I want to get most per byte? And which one can be read on the bus without missing much? Well, here's information per character (where information is "measured" as earlier: compressed size, so these are basically inverses of compression ratios).
- SGB: 0.25
- GHC: 0.21
- FreeBoogie: 0.19
- Linux: 0.18
- Frama-C: 0.16
- jEdit: 0.16
- CPython: 0.15
- OCaml: 0.14
- Ant: 0.14
Looks like GHC is almost as incompressible as the code Knuth writes.
03 December 2010
More OCaml Pretty-Printing
How to print expressions with binary operators using as few parentheses as possible. In this post, we avoid parentheses that are rendered unnecessary by precedence rules.
Consider the following type.
type op = Plus | Times
type expr = Op of op * expr list | I of int
And here is a value:
let v = Op (Plus, [I 3; Op (Times, [I 5; I 7; Op (Plus, [I 4; I 2])]); I 2])
If we don't care about how many parentheses are printed, we'd write
open Format
let op2s = function Plus -> '+' | Times -> '*'
let rec pp_list sep pp ppf = function
  | [] -> ()
  | [x] -> fprintf ppf "%a" pp x
  | x::xs -> fprintf ppf "%a%c%a" pp x sep (pp_list sep pp) xs
let rec pp ppf = function
  | Op (o, es) -> fprintf ppf "@[(%a)@]@," (pp_list (op2s o) pp) es
  | I x -> fprintf ppf "%d@," x
let _ = pp std_formatter v
The rule for precedence is simple: It's safe to strip parentheses in
$(A \circ B) \bullet C$ if and only if $\circ$ has higher precedence than
$\bullet$. It would be nice to isolate the code that decides whether
parentheses are needed. We begin by removing grouping concerns from
pp.
let pp r ppf = function
  | Op (o, es) -> pp_list (op2s o) r ppf es
  | I x -> fprintf ppf "%d" x
We can recover parentheses everywhere with a simple Y-like combinator.
let rec y1 f ppf x = fprintf ppf "@[(%a)@]@," (f (y1 f)) x
let pp1 = y1 pp
let _ = pp1 std_formatter v
Now pp1 does (almost) the
same as the initial pp,
the big difference being that we isolated the code that handles grouping.
Because of this, implementing a policy based on operator precedences is
simply a matter of defining another combinator.
let precedence = function
| Op (Plus, _::_::_) -> 1
| Op (Times, _::_::_) -> 2
| _ -> 0
let rec y2 up f ppf x =
let down = precedence x in
let lp, rp =
if down <= up && up <> 0 && down <> 0 then ("(", ")") else ("","") in
fprintf ppf "@[%s%a%s@]@," lp (f (y2 down f)) x rp
let pp2 = y2 0 pp
let _ = pp2 std_formatter v
15 September 2010
A Certain Type of Pretty-Printing in OCaml
You want to print
Node(Leaf, [{ra=1;rb=["a"]};{ra=2;rb=["b"]}], Node(Leaf, [{ra=3;rb=["foo";"bar"]}], Leaf))
as "1 * a * 2 * b * 3 * foo * bar". How do you do it?
Let us begin with the simpler problem of printing a star-separated list of integers.
open Format
let rec simple_print = function
  | [] -> ()
  | [x] -> printf "%d" x
  | x :: xs -> printf "%d@ * " x; simple_print xs
Since there is one less star than there are integers, the last integer in the list is treated in a special way. Of course, we may want to also print lists of strings in this way; or we may want to use some other separator than a star; or we may want to use a different formatter.
let rec better_print separator pp ppf = function
  | [] -> ()
  | [x] -> fprintf ppf "%a" pp x
  | x :: xs -> fprintf ppf "%a@ %s " pp x separator; better_print separator pp ppf xs
let pp_string ppf x = fprintf ppf "%s" x
let pp_int ppf x = fprintf ppf "%d" x
let _ = better_print "||" pp_string std_formatter ["foo"; "bar"]
Now comes the interesting part. Suppose that what we really want to print
are the primitive values that hang in a complicated tree of nested data
structures like lists, variants, records, sets, and so on. For example, the
value mentioned in the summary has type t.
type r = { ra : int; rb : string list }
type t = Leaf | Node of t * r list * t
let v =
Node
(Leaf,
[{ra=1;rb=["a"]};{ra=2;rb=["b"]}],
Node
(Leaf,
[{ra=3;rb=["foo";"bar"]}],
Leaf))
We'll probably have a pretty-printing function pp_r for type r
and a pretty-printing function pp_t for type
t. Let's think about the latter. For Node(left,data,right) we'd be tempted to say
"print left, print star, print data, print star, print right." Of course,
left might be a leaf, in which case we
should omit the first "print star." Or perhaps, the list data is empty, in which case we should omit
one of the two "print star"s. In a more complicated scenario data might be a list of sets and, although the
list is not empty, all of its sets may be. In order to figure whether this is
the case we'd need to traverse the whole list. Or perhaps we have a record
with $n$ fields. After printing the first $k$ fields we should print a star
when one of the first $k$ fields is nonempty and one of the last $n-k$ fields
is nonempty. So we'd better ask each field once, at the beginning, whether it is empty: otherwise we may end up asking each field $\sim n$ times, and we know that checking emptiness may take linear time. (It would be
pretty stupid for pretty-printing of a big structure with few leafs to take
quadratic time.) Or perhaps, …
OK. That's a nightmare, so let's try to take a step back. We want:
- a way to traverse the data structure,
- a way to print each possible leaf, and
- a way to know when we get to a leaf whether it is the first (or last).
That's easy! It sounds like what we want is a fold. Well, almost, because when we hit a string leaf we need to call one function and when we hit an integer leaf we need to call another. So it is a specialized fold but, nevertheless, our implementation may look very similar to what we'd do for a fold.
let pp_string' ppf first x =
if not first then fprintf ppf "@ * "; pp_string ppf x; false
let pp_int' ppf first x =
if not first then fprintf ppf "@ * "; pp_int ppf x; false
let pp_r ppf first {ra=ra; rb=rb} =
let first = pp_int' ppf first ra in
List.fold_left pp_string' first rb
let pp_t ppf first = function
| Leaf -> first
| Node (left, data, right) ->
let first = pp_t ppf first left in
let first = List.fold_left (pp_r ppf) first data in
pp_t ppf first right
let pp_r' ppf = pp_r ppf true
let pp_t' ppf = pp_t ppf true
This works and is already much nicer than the previous option. However, there are still problems.
- The functions pp_string' and pp_int' are obtained from their unprimed counterparts in a very systematic way, so we shouldn't write the same code once for each leaf type.
- Similarly for pp_r' and pp_t'.
- Note that for lists we used List.fold_left, so for the type t we should be able to do something similar: implement a generic fold, and then use it.
- The star is hardcoded. What if we want a different separator?
So let's see how we can encode the recipe for obtaining pp_string' and pp_int'. We may be tempted to try the following.
let pp_sep separator pp = fun ppf first x ->
  if not first then fprintf ppf "@ %s " separator; pp ppf x; false
Yay: We also solved the problem with the star being hard-coded! What
we can do is to add a (first) parameter pp to
pp_r and
pp_t. Then we redefine the primed
versions.
let pp_r' separator ppf = pp_r (pp_sep separator) ppf true
let pp_t' separator ppf = pp_t (pp_sep separator) ppf true
We then replace calls to pp_int'
by pp pp_int. Everything's sweet, right?
Well, … no, this doesn't typecheck: the pp parameter would need to be polymorphic (it is used both at int and at string), and OCaml does not infer such higher-rank types. The reason, and the workaround, are in OCaml's FAQ. You may want to try these modifications to convince yourself that they indeed do not work. So here's how to do it.
open Format
(* example data structures *)
type r = { ra : int; rb : string list }
type t = Leaf | Node of t * r list * t
(* a value to test with *)
let v =
Node
(Leaf,
[{ra=1;rb=["a"]};{ra=2;rb=["b"]}],
Node
(Leaf,
[{ra=3;rb=["foo";"bar"]}],
Leaf))
(* how to print 'leafs' *)
let pp_string ppf x = fprintf ppf "%s" x
let pp_int ppf x = fprintf ppf "%d" x
(* a recipe for making leaf printing foldable *)
type sep_wrapper = {
primitive : 'a. (formatter->'a->unit)->formatter->bool->'a->bool
}
let pp_sep separator = {
primitive = fun pp ppf first x ->
if not first then fprintf ppf "@ %s " separator; pp ppf x; false
}
(* folds for our homogeneous data structures *)
let rec fold_t f s = function
| Leaf -> s
| Node (l, xs, r) -> fold_t f (List.fold_left f (fold_t f s l) xs) r
(* pretty printing workers, used mostly internally *)
let pp_r pp ppf first {ra=ra; rb=rb} =
let first = pp.primitive pp_int ppf first ra in
List.fold_left (pp.primitive pp_string ppf) first rb
let pp_t pp ppf =
fold_t (pp_r pp ppf)
(* pretty printing for the typical user *)
let pp_whole pp =
fun separator ppf x -> ignore (pp (pp_sep separator) ppf true x)
let pp_r' = pp_whole pp_r
let pp_t' = pp_whole pp_t
(* a small test *)
let _ =
printf "@[%a@." (pp_t' "*") v
Note that the strategy of implementing a general fold and then using it does not work for type r because it holds two kinds of leafs (integers and strings). Also note that what I called leafs (or leaves, whatever) do not need to be OCaml primitive types.
03 June 2010
Are these BDDs the same?
A BDD (binary decision diagram) is a data structure that encodes boolean functions. Each node has a variable and two children, low and high. To make it concrete, let's pick a C-language structure.
struct dd_node {
int V; // variable, at least 1 and at most N
int L, H; // indexes in nodes
};
struct dd_node nodes[nodes_size];
A valid BDD representation obeys a few constraints.
- nodes[k].L!=nodes[k].H for k≥2
- nodes[k].L==nodes[k].H==k for k<2
- nodes[nodes[k].X].V>nodes[k].V for k≥2 and X either L or H
- nodes[k]==nodes[kk] implies k==kk
- For all 2≤k<nodes_size there is a kk such that nodes[kk].L==k || nodes[kk].H==k.
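The first three constraints can be checked mechanically. Here is a hypothetical sketch (my own, not from the original post) that checks only those three; the uniqueness and reachability constraints are omitted, and terminal nodes are assumed to carry a variable index larger than any internal node's, so that constraint 3 holds literally:

```c
#include <assert.h>
#include <stdbool.h>

struct dd_node {
  int V;    /* variable */
  int L, H; /* indexes in nodes */
};

/* Check constraints 1-3 on nodes[0..n); constraints 4 and 5
   (uniqueness and reachability) are not checked here. */
bool dd_ok(const struct dd_node *nodes, int n) {
  for (int k = 0; k < 2 && k < n; ++k)        /* terminals point to themselves */
    if (nodes[k].L != k || nodes[k].H != k) return false;
  for (int k = 2; k < n; ++k) {
    if (nodes[k].L == nodes[k].H) return false;        /* children differ */
    if (nodes[nodes[k].L].V <= nodes[k].V) return false; /* variables increase */
    if (nodes[nodes[k].H].V <= nodes[k].V) return false;
  }
  return true;
}
```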
Each such node basically encodes a (!v?l:h) expression. For example, if value[V] is the value of variable V then the represented boolean function is evaluated by eval(value, nodes_size-1).
int eval(int* value, int k) {
if (k < 2) return k;
return !value[nodes[k].V]? eval(value, nodes[k].L) : eval(value, nodes[k].H);
}
Here comes the problem. Suppose you are given two BDDs, nodes1 of size s1 and nodes2 of size s2. Design an algorithm to decide if they represent the same boolean function. And, by the way, do not use a stack, not even the call stack.
19 August 2009
Java is fast, they say on ##java
Which class from the Java runtime environment you should use when you need a stack.
The package java.util
contains a few classes suitable for pushing, popping, and
peeking in stack style.
|   | push | pop | peek |
|---|---|---|---|
| Stack | s.push(e) | s.pop() | s.peek() |
| ArrayList | s.add(e) | s.remove(s.size()-1) | s.get(s.size()-1) |
| LinkedList | s.addFirst(e) | s.removeFirst() | s.peekFirst() |
| ArrayDeque | s.addFirst(e) | s.removeFirst() | s.peekFirst() |
All but ArrayDeque implement
the widely used and known interface List.
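For concreteness, here is the ArrayDeque row of the table exercised in code (a small sketch of my own, not from the original post):

```java
import java.util.ArrayDeque;

public class StackDemo {
    public static void main(String[] args) {
        // ArrayDeque used in stack style: addFirst / removeFirst / peekFirst
        ArrayDeque<Integer> s = new ArrayDeque<Integer>();
        s.addFirst(1);                        // push
        s.addFirst(2);                        // push
        System.out.println(s.peekFirst());    // peek: 2
        System.out.println(s.removeFirst());  // pop: 2
        System.out.println(s.peekFirst());    // peek: 1
    }
}
```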
Let's see how efficient the implementations in JRE6 are, in relative units.
|   | time | space |
|---|---|---|
| Stack | x2.09 | x1.6 |
| ArrayList | x1.15 | x1.3 |
| LinkedList | x1.27 | x3.02 |
| ArrayDeque | 1 | 1 |
Yep, ArrayDeque is best
both in terms of time and space. If you need a subtype of
List then go with
ArrayList. If you want
to eat memory like crazy for no good reason then try
LinkedList. And if you
have a thing for names like Vector
(which Stack extends) then,
well, go for it. Just don't say I told you to use it.
Finally, if you are a speed junkie and you know in advance the
maximum possible size of the stack, then you can implement
your own for a x1.2 speed-up (compared to ArrayDeque).
Just for the record, a basic stack operation for ArrayDeque takes about 15ns on my
old laptop.
Good bye, and stay tuned for the next useless microbenchmark results.
03 July 2009
Type Inference in C++
A short note to let you know about a long-missed C++ feature you can now use.
Since gcc 4.4 (21 April 2009) you can now skip giving the types of variables that you initialize. Do you remember the old macro
#define foreach(i, c) for (typeof((c).begin()) i = (c).begin(); i != (c).end(); ++i)
? Well, you can now write
#define foreach(i, c) for (auto i = (c).begin(); i != (c).end(); ++i)
In fact, since the definition is shorter now, you might even consider ditching the macro altogether. After all, macros cause bugs that take much time to track down.
Edit: Oh, yeah. I forgot to mention that auto will be a standard C++ feature, while typeof is a GNU extension.
01 July 2009
Type Inference in Java, for Generics
An unexpected failure of the Java compiler. Or is it a failure of the Java type system?
Java programmers write repetitive code like
HashMap<String, Integer> foo = new HashMap<String, Integer>();
all the time. Those who seek beauty in their code may choose to use the Google collections library and write instead
HashMap<String, Integer> foo = Maps.newHashMap();
Constructors and static methods interact in very different ways
with generics. If you say new Foo()
you instantiate a raw type, which is quite different from what
you get by saying new Foo<Bar>().
On the other hand, static methods are the beneficiaries of a little
bit of type inference, as specified by sections
15.12.2.7 and
15.12.2.8
of the Java Language Specification (Third Edition). The actual
rules are complicated (and I haven't read them yet) but the idea
seems to be to use two sources of information:
- the types of the actual arguments
- if the result is "assignment convertible" to some type T then this type T is used too.
The reason why the Google collections library works is the
second bullet: The compiler uses the type of foo
to figure out how to instantiate the type arguments of
Maps.newHashMap.
OK, that was the background. Now let's see some examples.
Would you expect the following code to compile?
interface A {
static enum Foo { BAR; }
static enum Bar { FOO; }
}
class B<T extends Enum<T>> {
private B() {}
public static <T extends Enum<T>> B<T> get() {
return new B<T>();
}
}
public class C {
public static void main(String[] args) {
B<A.Foo> b = B.get();
}
}
Yes, it works. The Sun Java compiler 1.6.0_13 has no problem in handling the sort-of recursive bound put on the type.
Now, would you expect the following to work?
interface A {
static enum Foo { BAR; }
static enum Bar { FOO; }
}
class B<T, S> {
private B() {}
public static <T, S> B<T, S> get() {
return new B<T, S>();
}
}
public class C {
public static void main(String[] args) {
B<A.Foo, A.Bar> b = B.get();
}
}
Again, it works flawlessly. After all, why would multiple type arguments confuse the compiler?
So, you'd expect the following to work too, wouldn't you?
interface A {
static enum Foo { BAR; }
static enum Bar { FOO; }
}
class B<T extends Enum<T>, S extends Enum<S>> {
private B() {}
public static <T extends Enum<T>, S extends Enum<S>> B<T, S> get() {
return new B<T, S>();
}
}
public class C {
public static void main(String[] args) {
B<A.Foo, A.Bar> b = B.get();
}
}
Tough luck, though: It doesn't work. Apparently
"no instance(s) of type variable(s) T,S exist so
that B<T,S> conforms to B<A.Foo,A.Bar>".
I would have thought that T=A.Foo
and S=A.Bar would do the job.
As I said, I haven't yet read the relevant sections in the Java specification to see if this is a problem with the language or a problem with the compiler. The thing is that the two sections take 10 and a half screens on my monitor, which signals that I should put aside maybe one whole day for it, and I don't have one whole day to spend. My hope is that it is a bug in the compiler, because otherwise I'd lose some confidence in the language itself. After all, there's something fishy about a 10-page algorithm that doesn't end up trying the obvious solution.
25 May 2009
Puzzle Answer: Postfix Expressions
A solution to a two-month old puzzle.
A while back I posted a puzzle asking for a recursive and stackless postfix expression evaluator. A bit over a week ago I received a solution from Ashutosh Mehra. (Go on, have a look at his blog. It's quite nice.)
At first sight this seems like a lowly coding exercise. And it is, but with a catch: It is designed to illustrate two fundamental ideas that should be part of every programmer's ethos.
- Recursion is equivalent to loops.
- Recursion is implemented with stacks.
If one of these two fairly obvious statements is not built into
you, then you'll have trouble finding a solution. The first statement
implies that any program can be written without using loops. How?
First, write all the loops using only while.
Second, replace each loop while (C) B;
with a call to a new function loop
defined as follows:
void loop(ARGS) {
  if (!C) return;
  B; loop(ARGS);
}
The condition C and the body B will probably refer to some local
variables. In order to have them accessible for reading and for writing
from loop we send as arguments ARGS
pointers to them, and modify C and B accordingly.
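As a concrete illustration (my own toy example, not from the original post), here is a digit-sum loop rewritten in this style, with the locals passed by pointer so the body can modify them:

```c
#include <assert.h>

/* Original loop: while (n > 0) { s += n % 10; n /= 10; } */
void loop(int *n, int *s) {
  if (!(*n > 0)) return;   /* if (!C) return; */
  *s += *n % 10;           /* B, with the locals accessed through pointers */
  *n /= 10;
  loop(n, s);              /* tail call replaces the jump back to the test */
}

int digit_sum(int n) {
  int s = 0;
  loop(&n, &s);
  return s;
}
```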
Exercise: What to do if B contains
break? (Note: The solution to this is supposed to be easy if the
previous two paragraphs were understood.)
What about the stack? We also want to get rid of the stack.
The solution is again easy if you remember that local variables (and function arguments) come from a "stack". So the key idea is to use the call stack as the operand stack. This said, enjoy the solution.
#include <stdio.h>
#include <ctype.h>
int add(int x, int y) { return x + y; }
int sub(int x, int y) { return x - y; }
int mul(int x, int y) { return x * y; }
typedef int (*op_t)(int, int);
op_t op[256];
int c; /* last read character */
int eval(int x, int y); /* forward declaration: eval and read_digits are mutually recursive */
void read_digits(int* y) {
if (!isdigit(c = getchar())) return;
*y = eval(*y, c - '0');
read_digits(y);
}
int eval(int x, int y) {
read_digits(&y);
return op[c](x, y);
}
int main() {
op['+'] = add, op['-'] = sub, op['*'] = mul, op['|'] = add;
printf("%d\n", eval(0, getchar() - '0'));
}
When the function eval(x, y)
is called, the top value in the operand stack is
y and the value just before
is x. While the next token
is a digit it calls itself recursively
eval(y, new_digit). The "while" part is implemented
by the recursive function read_digits.
When the token that says what operation should be performed between
x and y
is read, the operation is performed and the result returned.
This variant is somewhat wasteful of memory.
03 April 2009
WHY Java
Y in Java.
Y is a cute concept from lambda calculus. It's easy to define in Haskell.
y f = f (y f)
fact r 0 = 1
fact r n = n * r (n-1)
f = y fact
An execution of f 3 computes 3!:
f 3 = y fact 3
    = fact (y fact) 3
    = 3 * y fact 2
    = 3 * fact (y fact) 2
    = 3 * 2 * y fact 1
    = 3 * 2 * fact (y fact) 1
    = 3 * 2 * 1 * y fact 0
    = 3 * 2 * 1 * 1
While showing this to Claudia I realized that I never coded Y in Java. It's interesting to see how long it is.
interface Fun { int go(int n); }
interface UntiedFun { int go(Fun f, int n); }
public class Rec {
public static void main(String[] args) {
System.out.println(f(10));
}
public static int fact(Fun f, int n) {
if (n == 0) return 1;
return n * f.go(n-1);
}
public static int Y(final UntiedFun f, int n) {
return f.go(new Fun() {
public int go(int n) { return Y(f, n); }
}, n);
}
public static int f(int n) {
return Y(new UntiedFun() {
public int go(Fun f, int n) {return fact(f, n);}
}, n);
}
}
Great stuff you can do with recursion. Which reminds me: Did you figure out the last puzzle?
30 March 2009
Postfix Expressions
Puzzle: recursive postfix evaluation.
On Saturday the Computer Science & Informatics School of University College Dublin is holding a local programming competition. That, and a problem from Mihai, reminded me that I haven't posted puzzles here in a long time. This one is an implementation puzzle.
A typical way to evaluate a postfix expression is to use a stack.
#include <stdio.h>
int stack[1<<20]; /* operand stack */
int sz; /* stack[0..sz) is used */
int main() {
int c; /* last read character */
while ((c = getchar()) != '|') {
switch (c) {
case '+': --sz; stack[sz-1] += stack[sz]; break;
case '-': --sz; stack[sz-1] -= stack[sz]; break;
case '*': --sz; stack[sz-1] *= stack[sz]; break;
default:
if ('0' <= c && c <= '9') stack[sz++] = c - '0';
}
}
printf("%d\n", *stack);
}
The code above assumes that operands are digits, that the only allowed operators are +-*, that the input is terminated by |, and that there are no other characters.
The problem is: Can you implement this without:
- an explicit stack
- loops
Both should be replaced by recursion. As usual, you can email me solutions to radugrigore at gmail.com and I'll post the best solution plus a list of those who solved the puzzle.
28 March 2009
Command Line Options
A short introduction to CLOPS, a Java library for parsing the command line.
Did you ever write
public static void main(String[] args)?
If so, did you ever use args later?
If yes and yes then you need
CLOPS.
I can hear you saying that "handling
args is very easy and doesn't
justify learning a new technology." You are right that the
benefits of a technology have to be weighed against its costs,
and one of the most important costs that a technology incurs is
the time spent learning it. Obviously, this poses a problem: How
can you balance benefits and costs before learning? To solve
this dilemma, it seems to me, most people rely on the assessment
of others. So hear my word: Learning CLOPS is much easier
than you think! Alas, I'm biased because I contributed to
the design. I understand if you don't believe me: I started
to contribute late precisely because in the beginning I was a
skeptic myself. I had to see something working before changing
my mind.
Most technologies have a complexity threshold: Products more complex than the threshold benefit from using the technology; products less complex than the threshold are hurt. HTML is a technology. Is it worth learning HTML for writing a blog? Not really. But is it worth learning if you want to develop a serious web-site? Probably. ViM is another technology. Is it worth learning for writing email? Definitely not. Is it worth learning for writing C programs? Maybe.
So what's the complexity threshold of CLOPS? A simple way to assess the complexity threshold of a technology is to pick a task and complete it with and without the technology. Let's write a program that prints "Hello world!" when run with no arguments, prints "Goodbye cruel world!" when run with the argument -bye, and prints a usage message in all other cases.
public class Main {
public static void main(String[] args) {
if (args.length > 1 || (args.length == 1 && !args[0].equals("-bye"))) {
System.out.println("Usage: hello [-bye]");
return;
}
if (args.length == 1)
System.out.println("Goodbye cruel world!");
else
System.out.println("Hello world!");
}
}
The CLOPS version is almost the same length:
public class Main {
public static void main(String[] args) {
HelloParser parser = new HelloParser();
if (!parser.parse(args)) {
System.out.println("Usage: hello [-bye]");
return;
}
if (parser.getOptionStore().isByeSet())
System.out.println("Goodbye cruel world!");
else
System.out.println("Hello world!");
}
}
That's not too bad. But we do have to write one more file, let's call it hello.clo, that describes the command line.
NAME:: Hello
ARGS:: Bye: {"-bye"}
FORMAT:: Bye?;
To compile we must say
clops hello.clo javac -cp clops-runtime.jar:. *.java
and to run we say
java -cp clops-runtime.jar:. Main
This is definitely more complicated than not using CLOPS. It indicates that for such a simple example it is not worth using CLOPS. But, by the same argument, for this example it is not worth using the build system Ant. Why write a build.xml file when you can get away without one? Yet, for some strange reason some people keep using Ant even for such small projects. (Not me.) I believe the reason they are using it is that they get into a routine: copy the build.xml file of an old project, modify a little inside, and then operate in a familiar environment. It makes sense. Even if later the build process becomes more complicated than just invoking the Java compiler, it still is the case that a build can be achieved by one command: ant. The price, copying and slightly editing a file in the beginning, is something most developers are willing to pay.
It's the same with CLOPS. As soon as the command line starts to get a little more complicated you begin to feel the benefits. Are there two boolean options that shouldn't be given at the same time? You add one line to hello.clo: Your program and your documentation are updated. Does an option represent a file name that must exist? You add one keyword in hello.clo: Your program and your documentation are updated. Can an option appear only after another certain option? You guessed: You modify hello.clo slightly and you are done.
OK, now that you are convinced that it's worth using CLOPS when you write a command line tool, it is time to address another question: Why would you write a command line tool in Java as opposed to, say, C? (If you are still not convinced that you should try CLOPS, then please pretend that you are for the next 5 minutes.) Well, why not? There are many programs that contain command line tools and are written in Java, such as web service testers, 3D graphics tools, bug finders, messaging and communication tools, testing frameworks, backup tools, and data reporting tools. It is true that at the moment most command line tools are written in C, but this will change. Of course, we'd be happy to see a C port of CLOPS.
We'd also be happy to hear what you think.