That said, one possible overhead is that, although arc compiler tries to use ar-funcall2 to avoid consing the arguments, they are consed after all since _+ and _< are defined to take a list of arguments.
The original (time (fib 34)) took 79.8s on my machine.
Changing arc< in ac.scm with this:
(define arc<
(case-lambda
[(a b)
(cond [(and (number? a) (number? b)) (< a b)]
[(and (string? a) (string? b)) (string<? a b)]
[(and (symbol? a) (symbol? b)) (string<? (symbol->string a)
(symbol->string b))]
[(and (char? a) (char? b)) (char<? a b)]
[else (< a b)])]
[args
(cond ((all number? args) (apply < args))
((all string? args) (pairwise string<? args #f))
((all symbol? args) (pairwise (lambda (x y)
(string<? (symbol->string x)
(symbol->string y)))
args
#f))
((all char? args) (pairwise char<? args #f))
(#t (apply < args)))]))
made (time (fib 34)) 72.8s, and further changing _+ with this: (xdef '+
(case-lambda
[() 0]
[(a b)
(cond [(and (string? a) (string? b)) (string-append a b)]
[(and (arc-list? a) (arc-list? b))
(ac-niltree (append (ar-nil-terminate a)
(ar-nil-terminate b)))]
[else (+ a b)])]
[args
(cond [(all string? args)
(apply string-append args)]
[(all arc-list? args)
(ac-niltree (apply append (map ar-nil-terminate args)))]
[else (apply + args)])]))
made (time (fib 34)) 49.5s.But generally, these kind of tuning for microbenchmarks only have small effect in real applications, for microbenchmarks magnifies inefficiency of very specific parts.
(Afterthought: It won't be too difficult to write a Scheme macro that expands variable-arity type-dispatching functions like _+ or _+ into case-lambda form. Then this kind of optimization can be applied without cluttering the source too much.)