Vector addition systems are a model of computation. For this model, coverability is one of the easiest decision problems. This post presents an old (1978) upper bound, with accompanying code.
29 July 2015
31 October 2014
Dynamic Dispatch
In which I explain how to solve puzzle questions involving dynamic dispatch.
28 October 2014
Why Do We Fail?
What is there to do if a problem refuses to be knocked down? There's a generic problem solving strategy for that.
26 August 2014
Watching Horn
How to use watched literals (or, rather, vertices) to do breadth-first search in Horn hyperdigraphs.
24 July 2014
Depth First Search in Python
Everybody thinks they understand depth-first search. The algorithm is indeed simple. But — I think — it is deceptively simple. In this post I play with some code.
11 December 2013
Edit Distance, Benchmarks
You could compute the edit distance of two strings by memoization or by dynamic programming. The latter should be more efficient, at the expense of clarity. But, how much more efficient? This post shows some (micro) benchmarks.
05 April 2012
Fixed-Point Magic: Successive Recursive Calls
A brief summary of the Y-combinator trick, and an unusual use.
18 August 2011
Putting Formulas in Normal Form
What would be a beautiful description for conversion to DNF?
The problem of converting a formula to DNF is easy to explain: Given a boolean formula, write it as a disjunction of conjunctions. For example, $(x \lor y) \land z$ may be rewritten, by distributivity, as $(x \land z) \lor (y \land z)$. The mechanism is also easy to describe: Push $\lnot$ down by de Morgan, then pull $\lor$ above $\land$ by distributivity.
How to turn this description into an algorithm?
I know a nice algorithm for the first half. Expressed in OCaml, it goes
like this: We want a function with the type f1->f2, where
type f1 = [`Var of string | `Not of f1 | `And of f1 list | `Or of f1 list]
type f2 = [`Pos of string | `Neg of string | `And of f2 list | `Or of f2 list]
The trick is to keep around a parity.
let rec pnd (p : bool) : f1 -> f2 = function
  | `Var x -> if p then `Pos x else `Neg x
  | `Not f -> pnd (not p) f
  | `And fs -> let fs = List.map (pnd p) fs in if p then `And fs else `Or fs
  | `Or fs -> let fs = List.map (pnd p) fs in if p then `Or fs else `And fs
This is the algorithm I use when I do boolean manipulations on paper, so I like it. The code is nice, rather than beautiful, because it has some repetition.
What would be a similarly nice solution for the second half—applying
distributivity? Expressed in OCaml, we want a function with the type
f2->dnf, where
type dnf = [`Pos of string | `Neg of string] list list
Of course, I'm interested in solutions in any language.
16 August 2011
Loops and Side-effects
TeX has a better looping construct.
I designed a small Java-like language for some experiments. I think it is a good idea to be able to say "this piece of code has no side-effects." (If you saw how I write C/++, you'd probably be surprised by what I just said.) There are various ways of isolating side-effects. My requirements were that (1) it must look much like Java and (2) there should be some way to contain side-effects. So I simply banned side-effects from expressions.
Then I proceeded to translate the following Java code.
while (i.hasNext()) {
Object o = i.next();
// bla
}
Method calls, however, may have side-effects, so I could not use them in expressions. A method call is a statement in my toy language.
while (true) {
boolean c = i.hasNext();
if (!c) break;
Object o = i.next();
// bla
}
This loop reminded me of \loop ... \if ... \repeat in $\TeX$. The keywords loop and repeat, however, are rather un-Java-like-esque. So I decided to say do {...} while (...) {...}. This way, the normal do-loop and the while-loop are special cases obtained by omitting one of the bodies.
do boolean c = i.hasNext()
while (c) {
Object o = i.next();
// bla
}
Note that in Java (and C/++) one has the illusion that loops test their condition at the beginning (while) or at the end (do) but never in the middle. This is not true, because expressions may have side-effects.
If languages had do-while-loops, then I'd probably break far less often.
I decided to write this post after seeing another post about how to write a similar kind of loop in OCaml.
01 April 2011
Some Random Code Metrics
How compressible is source code? What is the average size of a file? The answers to these and other useless questions are born from too much caffeine.
The projects:
- Ant: A build system used by many Java projects, which is itself written in Java.
- CPython: The reference Python implementation, written in C.
- Frama-C: Analyzers for C programs, themselves written mostly in OCaml.
- FreeBoogie: A project of mine written in Java.
- GHC: The most used Haskell compiler, written in Haskell.
- jEdit: A nice source code editor, written in Java.
- Linux: An operating system macro-kernel, written in C.
- OCaml: Compiler, standard library, and related tools for the language OCaml, written in OCaml.
- SGB: A library for handling graphs and a few example algorithms written by Knuth in CWEB.
Methodology. Get the repo, spend $\le 1$ min choosing a subset of files that look like "the source", run a few quick commands in the shell, don't check the results. That's so you know how much you can trust what follows. Nevertheless, it's likely I got the orders of magnitude right.
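The quick commands were along these lines (a reconstruction, not the exact ones I ran; the demo directory below is a stand-in for a real checkout, so the snippet runs anywhere):

```shell
# make a tiny stand-in "source tree" so the commands below run anywhere
mkdir -p /tmp/loc-demo
printf 'int main() { return 0; }\n' > /tmp/loc-demo/a.c
# lines of code: count '\n' characters in the chosen files
find /tmp/loc-demo -name '*.c' -print0 | xargs -0 cat | wc -l
# compressed size: a rough proxy for information content
find /tmp/loc-demo -name '*.c' -print0 | xargs -0 cat | bzip2 -9 | wc -c
```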
Project size contest. The most common measure for a project's size is the number of lines of code. This comes with plenty of caveats. In any case, if we define "lines of code" to be "number of '\n' characters in the files that Radu happened to choose as the 'source'" then here are the results:
- Linux: 9.1 million
- GHC: 460 thousand
- CPython: 450 thousand
- OCaml: 260 thousand
- Ant: 200 thousand
- jEdit: 160 thousand
- Frama-C: 126 thousand
- SGB: 19 thousand
- FreeBoogie: 8 thousand
There's an alternative measure that I like much better and that is not used much: the compressed size of the source code. Why do I like it? Because it should make a good proxy for the information content of the source code. For example, it doesn't matter much whether coders use space indentation or tab indentation, long lines or short lines, etc. There are plenty of caveats here too. For example, it is likely that the true information content (say, Kolmogorov complexity) is much lower, and that would be apparent if compressors exploited the structure of the language in which the code is written.
Anyway, here is the bzip2 size of the projects.
- Linux: 45 MB
- GHC: 4.3 MB
- CPython: 2.0 MB
- OCaml: 1.2 MB
- Ant: 930 kB
- Frama-C: 735 kB
- jEdit: 690 kB
- SGB: 190 kB
- FreeBoogie: 50 kB
File size contest. All these projects are broken up into files, which roughly correspond to modules or abstraction boundaries. The idea is that you should be able to focus on the internals of one file at a time without needing to know too much about the other files. And that is true of the compiler too, not only of you. Or, at least, that's one way to look at it.
So, lines per file contest:
- CPython: 860
- Linux: 680
- SGB: 580
- OCaml: 330
- jEdit: 330
- Frama-C: 300
- GHC: 280
- Ant: 260
- FreeBoogie: 140
And compressed bytes per file contest:
- SGB: 5900 B
- CPython: 3700 B
- Linux: 3300 B
- GHC: 2600 B
- Frama-C: 1700 B
- OCaml: 1500 B
- jEdit: 1400 B
- Ant: 1200 B
- FreeBoogie: 780 B
Information density. So, which project should I read if I want to get most per byte? And which one can be read on the bus without missing much? Well, here's information per character (where information is "measured" as earlier: compressed size, so these are basically inverses of compression ratios).
- SGB: 0.25
- GHC: 0.21
- FreeBoogie: 0.19
- Linux: 0.18
- Frama-C: 0.16
- jEdit: 0.16
- CPython: 0.15
- OCaml: 0.14
- Ant: 0.14
Looks like GHC is almost as incompressible as the code Knuth writes.
03 December 2010
More OCaml Pretty-Printing
How to print expressions with binary operators using as few parentheses as possible. In this post, we avoid parentheses that are rendered unnecessary by precedence rules.
Consider the following type.
type op = Plus | Times
type expr = Op of op * expr list | I of int
And here is a value:
let v = Op (Plus, [I 3; Op (Times, [I 5; I 7; Op (Plus, [I 4; I 2])]); I 2])
If we don't care about how many parentheses are printed, we'd write
open Format
let op2s = function Plus -> '+' | Times -> '*'
let rec pp_list sep pp ppf = function
  | [] -> ()
  | [x] -> fprintf ppf "%a" pp x
  | x::xs -> fprintf ppf "%a%c%a" pp x sep (pp_list sep pp) xs
let rec pp ppf = function
  | Op (o, es) -> fprintf ppf "@[(%a)@]@," (pp_list (op2s o) pp) es
  | I x -> fprintf ppf "%d@," x
let _ = pp std_formatter v
The rule for precedence is simple: It's safe to strip parentheses in
$(A \circ B) \bullet C$ if and only if $\circ$ has higher precedence than
$\bullet$. It would be nice to isolate the code that decides whether
parentheses are needed. We begin by removing grouping concerns from
pp.
let pp r ppf = function
  | Op (o, es) -> pp_list (op2s o) r ppf es
  | I x -> fprintf ppf "%d" x
We can recover parentheses everywhere with a simple Y-like combinator.
let rec y1 f ppf x = fprintf ppf "@[(%a)@]@," (f (y1 f)) x
let pp1 = y1 pp
let _ = pp1 std_formatter v
Now pp1 does (almost) the
same as the initial pp,
the big difference being that we isolated the code that handles grouping.
Because of this, implementing a policy based on operator precedences is
simply a matter of defining another combinator.
let precedence = function
| Op (Plus, _::_::_) -> 1
| Op (Times, _::_::_) -> 2
| _ -> 0
let rec y2 up f ppf x =
let down = precedence x in
let lp, rp =
if down <= up && up <> 0 && down <> 0 then ("(", ")") else ("","") in
fprintf ppf "@[%s%a%s@]@," lp (f (y2 down f)) x rp
let pp2 = y2 0 pp
let _ = pp2 std_formatter v
15 September 2010
A Certain Type of Pretty-Printing in OCaml
You want to print
Node(Leaf, [{ra=1;rb=["a"]};{ra=2;rb=["b"]}], Node(Leaf, [{ra=3;rb=["foo";"bar"]}], Leaf))
as "1 * a * 2 * b * 3 * foo * bar". How do you do it?
Let us begin with the simpler problem of printing a star-separated list of integers.
open Format
let rec simple_print = function
  | [] -> ()
  | [x] -> printf "%d" x
  | x :: xs -> printf "%d@ * " x; simple_print xs
Since there is one less star than there are integers, the last integer in the list is treated in a special way. Of course, we may want to also print lists of strings in this way; or we may want to use some other separator than a star; or we may want to use a different formatter.
let rec better_print separator pp ppf = function
  | [] -> ()
  | [x] -> fprintf ppf "%a" pp x
  | x :: xs -> fprintf ppf "%a@ %s " pp x separator; better_print separator pp ppf xs
let pp_string ppf x = fprintf ppf "%s" x
let pp_int ppf x = fprintf ppf "%d" x
let _ = better_print "||" pp_string std_formatter ["foo"; "bar"]
Now comes the interesting part. Suppose that what we really want to print
are the primitive values that hang in a complicated tree of nested data
structures like lists, variants, records, sets, and so on. For example, the
value mentioned in the summary has type t.
type r = { ra : int; rb : string list }
type t = Leaf | Node of t * r list * t
let v =
Node
(Leaf,
[{ra=1;rb=["a"]};{ra=2;rb=["b"]}],
Node
(Leaf,
[{ra=3;rb=["foo";"bar"]}],
Leaf))
We'll probably have a pretty-printing function pp_r for type r
and a pretty-printing function pp_t for type
t. Let's think about the latter. For Node(left,data,right) we'd be tempted to say
"print left, print star, print data, print star, print right." Of course,
left might be a leaf, in which case we
should omit the first "print star." Or perhaps, the list data is empty, in which case we should omit
one of the two "print star"s. In a more complicated scenario data might be a list of sets and, although the
list is not empty, all of its sets may be. In order to figure whether this is
the case we'd need to traverse the whole list. Or perhaps we have a record
with $n$ fields. After printing the first $k$ fields we should print a star
when one of the first $k$ fields is nonempty and one of the last $n-k$ fields
is nonempty. So we'd better ask each field once, at the beginning, whether it is empty: otherwise we may end up asking each field $\sim n$ times, and we know that checking emptiness may take linear time. (It would be
pretty stupid for pretty-printing of a big structure with few leafs to take
quadratic time.) Or perhaps, …
OK. That's a nightmare, so let's try to take a step back. We want:
- a way to traverse the data structure,
- a way to print each possible leaf, and
- a way to know when we get to a leaf whether it is the first (or last).
That's easy! It sounds like what we want is a fold. Well, almost, because when we hit a string leaf we need to call one function and when we hit an integer leaf we need to call another. So it is a specialized fold but, nevertheless, our implementation may look very similar to what we'd do for a fold.
let pp_string' ppf first x =
if not first then fprintf ppf "@ * "; pp_string ppf x; false
let pp_int' ppf first x =
if not first then fprintf ppf "@ * "; pp_int ppf x; false
let pp_r ppf first {ra=ra; rb=rb} =
let first = pp_int' ppf first ra in
List.fold_left pp_string' first rb
let pp_t ppf first = function
| Leaf -> first
| Node (left, data, right) ->
let first = pp_t ppf first left in
let first = List.fold_left (pp_r ppf) first data in
pp_t ppf first right
let pp_r' ppf = pp_r ppf true
let pp_t' ppf = pp_t ppf true
This works and is already much nicer than the previous option. However, there are still problems.
- The functions pp_string' and pp_int' are obtained from their unprimed counterparts in a very systematic way, so we shouldn't write the same code once for each leaf type.
- Similarly for pp_r' and pp_t'.
- Note that for lists we used List.fold_left, so for the type t we should be able to do something similar: implement a generic fold, and then use it.
- The star is hardcoded. What if we want a different separator?
So let's see how we can encode the recipe for obtaining pp_string' and pp_int'. We may be tempted to try the following.
let pp_sep separator pp = fun ppf first x ->
  if not first then fprintf ppf "@ %s " separator; pp ppf x; false
Yay: We also solved the problem with the star being hard-coded! What
we can do is to add a (first) parameter pp to
pp_r and
pp_t. Then we redefine the primed
versions.
let pp_r' separator ppf = pp_r (pp_sep separator) ppf true
let pp_t' separator ppf = pp_t (pp_sep separator) ppf true
We then replace calls to pp_int'
by pp pp_int. Everything's sweet, right?
Well, … no, this doesn't typecheck: the pp parameter would need to be polymorphic (it is used both at int and at string), and OCaml does not infer such higher-rank types. The reason, and the workaround, are in OCaml's FAQ. You may want to try these modifications to convince yourself that they indeed do not work. So here's how to do it.
open Format
(* example data structures *)
type r = { ra : int; rb : string list }
type t = Leaf | Node of t * r list * t
(* a value to test with *)
let v =
Node
(Leaf,
[{ra=1;rb=["a"]};{ra=2;rb=["b"]}],
Node
(Leaf,
[{ra=3;rb=["foo";"bar"]}],
Leaf))
(* how to print 'leafs' *)
let pp_string ppf x = fprintf ppf "%s" x
let pp_int ppf x = fprintf ppf "%d" x
(* a recipe for making leaf printing foldable *)
type sep_wrapper = {
primitive : 'a. (formatter->'a->unit)->formatter->bool->'a->bool
}
let pp_sep separator = {
primitive = fun pp ppf first x ->
if not first then fprintf ppf "@ %s " separator; pp ppf x; false
}
(* folds for our homogeneous data structures *)
let rec fold_t f s = function
| Leaf -> s
| Node (l, xs, r) -> fold_t f (List.fold_left f (fold_t f s l) xs) r
(* pretty printing workers, used mostly internally *)
let pp_r pp ppf first {ra=ra; rb=rb} =
let first = pp.primitive pp_int ppf first ra in
List.fold_left (pp.primitive pp_string ppf) first rb
let pp_t pp ppf =
fold_t (pp_r pp ppf)
(* pretty printing for the typical user *)
let pp_whole pp =
fun separator ppf x -> ignore (pp (pp_sep separator) ppf true x)
let pp_r' = pp_whole pp_r
let pp_t' = pp_whole pp_t
(* a small test *)
let _ =
printf "@[%a@." (pp_t' "*") v
Note that the strategy of implementing a general fold and then using it does not work for type r because it holds two kinds of leafs (integers and strings). Also note that what I called leafs (or leaves, whatever) do not need to be OCaml primitive types.
03 June 2010
Are these BDDs the same?
A BDD (binary decision diagram) is a data structure that encodes boolean functions. Each node has a variable and two children, low and high. To make it concrete, let's pick a C-language structure.
struct dd_node {
int V; // variable, at least 1 and at most N
int L, H; // indexes in nodes
};
struct dd_node nodes[nodes_size];
A valid BDD representation obeys a few constraints.
- nodes[k].L!=nodes[k].H for k≥2
- nodes[k].L==nodes[k].H==k for k<2
- nodes[nodes[k].X].V>nodes[k].V for k≥2 and X either L or H
- nodes[k]==nodes[kk] implies k==kk
- For all 2≤k<nodes_size there is a kk such that nodes[kk].L==k || nodes[kk].H==k.
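The first three constraints can be checked mechanically. Here is a hypothetical sketch (my own, not from the original post) that checks only those three; the uniqueness and reachability constraints are omitted, and terminal nodes are assumed to carry a variable index larger than any internal node's, so that constraint 3 holds literally:

```c
#include <assert.h>
#include <stdbool.h>

struct dd_node {
  int V;    /* variable */
  int L, H; /* indexes in nodes */
};

/* Check constraints 1-3 on nodes[0..n); constraints 4 and 5
   (uniqueness and reachability) are not checked here. */
bool dd_ok(const struct dd_node *nodes, int n) {
  for (int k = 0; k < 2 && k < n; ++k)        /* terminals point to themselves */
    if (nodes[k].L != k || nodes[k].H != k) return false;
  for (int k = 2; k < n; ++k) {
    if (nodes[k].L == nodes[k].H) return false;        /* children differ */
    if (nodes[nodes[k].L].V <= nodes[k].V) return false; /* variables increase */
    if (nodes[nodes[k].H].V <= nodes[k].V) return false;
  }
  return true;
}
```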
Each such node basically encodes a (!v?l:h) expression. For example, if value[V] is the value of variable V then the represented boolean function is evaluated by eval(value, nodes_size-1).
int eval(int* value, int k) {
if (k < 2) return k;
return !value[nodes[k].V]? eval(value, nodes[k].L) : eval(value, nodes[k].H);
}
Here comes the problem. Suppose you are given two BDDs, nodes1 of size s1 and nodes2 of size s2. Design an algorithm to decide if they represent the same boolean function. And, by the way, do not use a stack, not even the call stack.
19 August 2009
Java is fast, they say on ##java
Which class from the Java runtime environment you should use when you need a stack.
The package java.util
contains a few classes suitable for pushing, popping, and
peeking in stack style.
|   | push | pop | peek |
|---|---|---|---|
| Stack | s.push(e) | s.pop() | s.peek() |
| ArrayList | s.add(e) | s.remove(s.size()-1) | s.get(s.size()-1) |
| LinkedList | s.addFirst(e) | s.removeFirst() | s.peekFirst() |
| ArrayDeque | s.addFirst(e) | s.removeFirst() | s.peekFirst() |
All but ArrayDeque implement
the widely used and known interface List.
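For concreteness, here is the ArrayDeque row of the table exercised in code (a small sketch of my own, not from the original post):

```java
import java.util.ArrayDeque;

public class StackDemo {
    public static void main(String[] args) {
        // ArrayDeque used in stack style: addFirst / removeFirst / peekFirst
        ArrayDeque<Integer> s = new ArrayDeque<Integer>();
        s.addFirst(1);                        // push
        s.addFirst(2);                        // push
        System.out.println(s.peekFirst());    // peek: 2
        System.out.println(s.removeFirst());  // pop: 2
        System.out.println(s.peekFirst());    // peek: 1
    }
}
```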
Let's see how efficient the implementations in JRE6 are, in relative units.
|   | time | space |
|---|---|---|
| Stack | x2.09 | x1.6 |
| ArrayList | x1.15 | x1.3 |
| LinkedList | x1.27 | x3.02 |
| ArrayDeque | 1 | 1 |
Yep, ArrayDeque is best
both in terms of time and space. If you need a subtype of
List then go with
ArrayList. If you want
to eat memory like crazy for no good reason then try
LinkedList. And if you
have a thing for names like Vector
(which Stack extends) then,
well, go for it. Just don't say I told you to use it.
Finally, if you are a speed junkie and you know in advance the
maximum possible size of the stack, then you can implement
your own for a x1.2 speed-up (compared to ArrayDeque).
Just for the record, a basic stack operation for ArrayDeque takes about 15ns on my
old laptop.
Good bye, and stay tuned for the next useless microbenchmark results.
03 July 2009
Type Inference in C++
A short note to let you know about a long-missed C++ feature you can now use.
Since gcc 4.4 (21 April 2009) you can now skip giving the types of variables that you initialize. Do you remember the old macro
#define foreach(i, c) for (typeof((c).begin()) i = (c).begin(); i != (c).end(); ++i)
? Well, you can now write
#define foreach(i, c) for (auto i = (c).begin(); i != (c).end(); ++i)
In fact, since the definition is shorter now, you might even consider ditching the macro altogether. After all, macros cause bugs that take much time to track down.
Edit: Oh, yeah. I forgot to mention that auto will be a standard C++ feature, while typeof is a GNU extension.
01 July 2009
Type Inference in Java, for Generics
An unexpected failure of the Java compiler. Or is it a failure of the Java type system?
Java programmers write repetitive code like
HashMap<String, Integer> foo = new HashMap<String, Integer>();
all the time. Those who seek beauty in their code may choose to use the Google collections library and write instead
HashMap<String, Integer> foo = Maps.newHashMap();
Constructors and static methods interact in very different ways
with generics. If you say new Foo()
you instantiate a raw type, which is quite different from what
you get by saying new Foo<Bar>().
On the other hand, static methods are the beneficiaries of a little
bit of type inference, as specified by sections
15.12.2.7 and
15.12.2.8
of the Java Language Specification (Third Edition). The actual
rules are complicated (and I haven't read them yet) but the idea
seems to be to use two sources of information:
- the types of the actual arguments
- if the result is "assignment convertible" to some type T then this type T is used too.
The reason why the Google collections library works is the
second bullet: The compiler uses the type of foo
to figure out how to instantiate the type arguments of
Maps.newHashMap.
OK, that was the background. Now let's see some examples.
Would you expect the following code to compile?
interface A {
static enum Foo { BAR; }
static enum Bar { FOO; }
}
class B<T extends Enum<T>> {
private B() {}
public static <T extends Enum<T>> B<T> get() {
return new B<T>();
}
}
public class C {
public static void main(String[] args) {
B<A.Foo> b = B.get();
}
}
Yes, it works. The Sun Java compiler 1.6.0_13 has no problem in handling the sort-of recursive bound put on the type.
Now, would you expect the following to work?
interface A {
static enum Foo { BAR; }
static enum Bar { FOO; }
}
class B<T, S> {
private B() {}
public static <T, S> B<T, S> get() {
return new B<T, S>();
}
}
public class C {
public static void main(String[] args) {
B<A.Foo, A.Bar> b = B.get();
}
}
Again, it works flawlessly. After all, why would multiple type arguments confuse the compiler?
So, you'd expect the following to work too, wouldn't you?
interface A {
static enum Foo { BAR; }
static enum Bar { FOO; }
}
class B<T extends Enum<T>, S extends Enum<S>> {
private B() {}
public static <T extends Enum<T>, S extends Enum<S>> B<T, S> get() {
return new B<T, S>();
}
}
public class C {
public static void main(String[] args) {
B<A.Foo, A.Bar> b = B.get();
}
}
Tough luck, though: It doesn't work. Apparently
"no instance(s) of type variable(s) T,S exist so
that B<T,S> conforms to B<A.Foo,A.Bar>".
I would have thought that T=A.Foo
and S=A.Bar would do the job.
As I said, I haven't yet read the relevant sections in the Java specification to see if this is a problem with the language or a problem with the compiler. The thing is that the two sections take 10 and a half screens on my monitor, which signals that I should put aside maybe one whole day for it, and I don't have one whole day to spend. My hope is that it is a bug in the compiler, because otherwise I'd lose some confidence in the language itself. After all, there's something fishy about a 10-page algorithm that doesn't end up trying the obvious solution.
25 May 2009
Puzzle Answer: Postfix Expressions
A solution to a two-month old puzzle.
A while back I posted a puzzle asking for a recursive and stackless postfix expression evaluator. A bit over a week ago I received a solution from Ashutosh Mehra. (Go on, have a look at his blog. It's quite nice.)
At first sight this seems like a lowly coding exercise. And it is, but with a catch: It is designed to illustrate two fundamental ideas that should be part of every programmer's ethos.
- Recursion is equivalent to loops.
- Recursion is implemented with stacks.
If one of these two fairly obvious statements is not built into
you, then you'll have trouble finding a solution. The first statement
implies that any program can be written without using loops. How?
First, write all the loops using only while.
Second, replace each loop while (C) B;
with a call to a new function loop
defined as follows:
void loop(ARGS) {
  if (!C) return;
  B; loop(ARGS);
}
The condition C and the body B will probably refer to some local
variables. In order to have them accessible for reading and for writing
from loop we send as arguments ARGS
pointers to them, and modify C and B accordingly.
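As a concrete illustration (my own toy example, not from the original post), here is a digit-sum loop rewritten in this style, with the locals passed by pointer so the body can modify them:

```c
#include <assert.h>

/* Original loop: while (n > 0) { s += n % 10; n /= 10; } */
void loop(int *n, int *s) {
  if (!(*n > 0)) return;   /* if (!C) return; */
  *s += *n % 10;           /* B, with the locals accessed through pointers */
  *n /= 10;
  loop(n, s);              /* tail call replaces the jump back to the test */
}

int digit_sum(int n) {
  int s = 0;
  loop(&n, &s);
  return s;
}
```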
Exercise: What to do if B contains
break? (Note: The solution to this is supposed to be easy if the
previous two paragraphs were understood.)
What about the stack? We also want to get rid of the stack.
The solution is again easy if you remember that local variables (and function arguments) come from a "stack". So the key idea is to use the call stack as the operand stack. This said, enjoy the solution.
#include <stdio.h>
#include <ctype.h>
int add(int x, int y) { return x + y; }
int sub(int x, int y) { return x - y; }
int mul(int x, int y) { return x * y; }
typedef int (*op_t)(int, int);
op_t op[256];
int c; /* last read character */
int eval(int x, int y); /* forward declaration: eval and read_digits are mutually recursive */
void read_digits(int* y) {
if (!isdigit(c = getchar())) return;
*y = eval(*y, c - '0');
read_digits(y);
}
int eval(int x, int y) {
read_digits(&y);
return op[c](x, y);
}
int main() {
op['+'] = add, op['-'] = sub, op['*'] = mul, op['|'] = add;
printf("%d\n", eval(0, getchar() - '0'));
}
When the function eval(x, y)
is called, the top value in the operand stack is
y and the value just before
is x. While the next token
is a digit it calls itself recursively
eval(y, new_digit). The "while" part is implemented
by the recursive function read_digits.
When the token that says what operation should be performed between
x and y
is read, the operation is performed and the result returned.
This variant is somewhat wasteful of memory.
03 April 2009
WHY Java
Y in Java.
Y is a cute concept from lambda calculus. It's easy to define in Haskell.
y f = f (y f)
fact r 0 = 1
fact r n = n * r (n-1)
f = y fact
An execution of f 3 computes 3!:
f 3 = y fact 3
    = fact (y fact) 3
    = 3 * y fact 2
    = 3 * fact (y fact) 2
    = 3 * 2 * y fact 1
    = 3 * 2 * fact (y fact) 1
    = 3 * 2 * 1 * y fact 0
    = 3 * 2 * 1 * 1
While showing this to Claudia I realized that I never coded Y in Java. It's interesting to see how long it is.
interface Fun { int go(int n); }
interface UntiedFun { int go(Fun f, int n); }
public class Rec {
public static void main(String[] args) {
System.out.println(f(10));
}
public static int fact(Fun f, int n) {
if (n == 0) return 1;
return n * f.go(n-1);
}
public static int Y(final UntiedFun f, int n) {
return f.go(new Fun() {
public int go(int n) { return Y(f, n); }
}, n);
}
public static int f(int n) {
return Y(new UntiedFun() {
public int go(Fun f, int n) {return fact(f, n);}
}, n);
}
}
Great stuff you can do with recursion. Which reminds me: Did you figure out the last puzzle?
30 March 2009
Postfix Expressions
Puzzle: recursive postfix evaluation.
On Saturday the Computer Science & Informatics School of University College Dublin is holding a local programming competition. That, and a problem from Mihai, reminded me that I haven't posted puzzles here in a long time. This one is an implementation puzzle.
A typical way to evaluate a postfix expression is to use a stack.
#include <stdio.h>
int stack[1<<20]; /* operand stack */
int sz; /* stack[0..sz) is used */
int main() {
int c; /* last read character */
while ((c = getchar()) != '|') {
switch (c) {
case '+': --sz; stack[sz-1] += stack[sz]; break;
case '-': --sz; stack[sz-1] -= stack[sz]; break;
case '*': --sz; stack[sz-1] *= stack[sz]; break;
default:
if ('0' <= c && c <= '9') stack[sz++] = c - '0';
}
}
printf("%d\n", *stack);
}
The code above assumes that operands are digits, that the only allowed operators are +-*, that the input is terminated by |, and that there are no other characters.
The problem is: Can you implement this without:
- an explicit stack
- loops
Both should be replaced by recursion. As usual, you can email me solutions to radugrigore at gmail.com and I'll post the best solution plus a list of those who solved the puzzle.
28 March 2009
Command Line Options
A short introduction to CLOPS, a Java library for parsing the command line.
Did you ever write
public static void main(String[] args)?
If so, did you ever use args later?
If yes and yes then you need
CLOPS.
I can hear you saying that "handling
args is very easy and doesn't
justify learning a new technology." You are right that the
benefits of a technology have to be weighed against its costs,
and one of the most important costs that a technology incurs is
the time spent learning it. Obviously, this poses a problem: How
can you balance benefits and costs before learning? To solve
this dilemma, it seems to me, most people rely on the assessment
of others. So hear my word: Learning CLOPS is much easier
than you think! Alas, I'm biased because I contributed to
the design. I understand if you don't believe me: I started
to contribute late precisely because in the beginning I was a
skeptic myself. I had to see something working before changing
my mind.
Most technologies have a complexity threshold: Products more complex than the threshold benefit from using the technology; products less complex than the threshold are hurt. HTML is a technology. Is it worth learning HTML for writing a blog? Not really. But is it worth learning if you want to develop a serious web-site? Probably. ViM is another technology. Is it worth learning for writing email? Definitely not. Is it worth learning for writing C programs? Maybe.
So what's the complexity threshold of CLOPS? A simple way to assess the complexity threshold of a technology is to pick a task and complete it with and without the technology. Let's write a program that prints "Hello world!" when run with no arguments, prints "Goodbye cruel world!" when run with the argument -bye, and prints a usage message in all other cases.
public class Main {
public static void main(String[] args) {
if (args.length > 1 || (args.length == 1 && !args[0].equals("-bye"))) {
System.out.println("Usage: hello [-bye]");
return;
}
if (args.length == 1)
System.out.println("Goodbye cruel world!");
else
System.out.println("Hello world!");
}
}
The CLOPS version is almost the same length:
public class Main {
public static void main(String[] args) {
HelloParser parser = new HelloParser();
if (!parser.parse(args)) {
System.out.println("Usage: hello [-bye]");
return;
}
if (parser.getOptionStore().isByeSet())
System.out.println("Goodbye cruel world!");
else
System.out.println("Hello world!");
}
}
That's not too bad. But we do have to write one more file, let's call it hello.clo, that describes the command line.
NAME:: Hello
ARGS:: Bye: {"-bye"}
FORMAT:: Bye?;
To compile we must say
clops hello.clo javac -cp clops-runtime.jar:. *.java
and to run we say
java -cp clops-runtime.jar:. Main
This is definitely more complicated than not using CLOPS. It indicates that for such a simple example it is not worth using CLOPS. But, by the same argument, for this example it is not worth using the build system Ant. Why write a build.xml file when you can get away without one? Yet, for some strange reason some people keep using Ant even for such small projects. (Not me.) I believe the reason they are using it is that they get into a routine: copy the build.xml file of an old project, modify a little inside, and then operate in a familiar environment. It makes sense. Even if later the build process becomes more complicated than just invoking the Java compiler, it still is the case that a build can be achieved by one command: ant. The price, copying and slightly editing a file in the beginning, is something most developers are willing to pay.
It's the same with CLOPS. As soon as the command line starts to get a little more complicated you begin to feel the benefits. Are there two boolean options that shouldn't be given at the same time? You add one line to hello.clo: Your program and your documentation are updated. Does an option represent a file name that must exist? You add one keyword in hello.clo: Your program and your documentation are updated. Can an option appear only after another certain option? You guessed: You modify hello.clo slightly and you are done.
OK, now that you are convinced that it's worth using CLOPS when you write a command line tool, it is time to address another question: Why would you write a command line tool in Java as opposed to, say, C? (If you are still not convinced that you should try CLOPS, then please pretend that you are for the next 5 minutes.) Well, why not? There are many programs that contain command line tools and are written in Java, such as web service testers, 3D graphics tools, bug finders, messaging and communication tools, testing frameworks, backup tools, and data reporting tools. It is true that at the moment most command line tools are written in C, but this will change. Of course, we'd be happy to see a C port of CLOPS.
We'd also be happy to hear what you think.