I wrote a simple Boot script for converting to and from Roman numerals using instaparse. Here's my EBNF grammar (roman.bnf
):
number = thousands hundreds tens ones
I = 'I'
V = 'V'
X = 'X'
L = 'L'
C = 'C'
D = 'D'
M = 'M'
IV = 'IV'
IX = 'IX'
XL = 'XL'
XC = 'XC'
CD = 'CD'
CM = 'CM'
ones = V? I? I? I? | IV | IX
tens = L? X? X? X? | XL | XC
hundreds = D? C? C? C? | CD | CM
thousands = M? M? M?
And here's the Boot script itself (roman.clj
):
#!/usr/bin/env boot
(set-env! :dependencies '[[instaparse "1.4.5"]])
(require '[boot.cli :as cli]
'[instaparse.core :as insta])
(defn digits
"Returns a lazy sequence of the digits of n in the specified base, starting
with the least significant digit."
[base n]
(->> (iterate #(quot % base) n)
(take-while pos?)
(map #(mod % base))))
(defn int->roman
"Returns the Roman numeral representation of n."
[n]
(let [[ones tens hundreds thousands] (concat (digits 10 n) (repeat 0))]
(->> (letfn [(digit [one five ten n]
(case n
4 [one five]
9 [one ten]
(cons (when (<= 5 n) five)
(repeat (mod n 5) one))))]
[(repeat thousands "M")
(digit "C" "D" "M" hundreds)
(digit "X" "L" "C" tens)
(digit "I" "V" "X" ones)])
(apply concat)
(apply str))))
(def parser (insta/parser (slurp "roman.bnf")))
(def symbols
{:I 1 :V 5 :X 10 :L 50 :C 100 :D 500 :M 1000
:IV 4 :IX 9 :XL 40 :XC 90 :CD 400 :CM 900})
(defn roman->int
"Converts the Roman numeral string s to an integer."
[s]
(let [parsed (parser s)]
(if-some [failure (insta/get-failure parsed)]
failure
(->> (rest parsed)
(mapcat rest)
(map first)
(map symbols)
(apply +)))))
(cli/defclifn -main
"Converts to and from Roman numerals."
[t to ARABIC int "The number to convert to a Roman numeral."
f from ROMAN str "The number to convert to an Arabic numeral."]
(cond
(and to from) (println "Please choose only one conversion.")
to (println (if (< 0 to 4000) (int->roman to) "Out of range."))
from
(let [result (roman->int from)]
((if (insta/failure? result) print println) result))))
Here are some examples (test.sh
):
./roman.clj -t 1
./roman.clj -t 52
./roman.clj -t 294
./roman.clj -t 2780
./roman.clj -f IX
./roman.clj -f LXIX
./roman.clj -f CCXCV
./roman.clj -f MMDCLXXXII
$ chmod +x roman.clj
$ chmod +x test.sh
$ ./test.sh
I
LII
CCXCIV
MMDCCLXXX
9
69
295
2682
Obviously this is completely impractical due to Clojure's startup time, but meh.
Are there any improvements I can make? I'm especially interested in ways to make the grammar more elegant; for instance, I can't seem to find a good way to remove the repetition in I? I? I?
etc., but if there is a way to genericize the maximum number of occurrences, that would be cool. Any improvements on the Clojure side of things would also be most welcome.
1 Answer 1
Looking at int->roman
...
- It simply ignores digits that are too big.
10000
maps to""
! Surely better to repeat the biggest signifier as often as necessary. - The logic for the
one five ten
units fordigit
is in your head. We can derive them from a list of thesymbols
in increasing order. And it's easier to wrap them as a single argument todigit
. - I've done the string concatenation within
digit
. - I've dumped the
base
argument and done the digit-ising inline.
For what it's worth ...
(def symbols "IVXLCDM")
(defn digit [[one five ten] n]
(let [chars (case n
4 [one five]
9 [one ten]
(cons (when (<= 5 n) five)
(repeat (mod n 5) one)))]
(apply str chars)))
(defn int->roman
"Returns the Roman numeral representation of n."
[n]
{:pre [(integer? n) (pos? n) (odd? (count symbols))]}
(let [triples (partition 3 2 symbols)
heads (->> n (iterate #(quot % 10)))
digits (map (fn [t i] (digit t (mod i 10))) triples heads)
left-over (nth heads (count triples))
left-overs (->> (last symbols)
(repeat left-over)
(apply str))]
left-over
(->> (concat digits left-overs)
reverse
(apply str))))
For example,
(int->roman 11976)
;"MMMMMMMMMMMCMLXXVI"
It could be made faster as a big fat reduce
, but I prefer it slow and fairly clear. It gave me enough trouble!
-
\$\begingroup\$ Thanks! Your changes to
digit
are particularly helpful. I was (and still am) hesitant to convert the higher digits toM
because I have seen specs for the Roman numeral system that forbid repeating any symbol more than three times in a row, and also because it can lead to memory overflow if not used very carefully. \$\endgroup\$Sam Estep– Sam Estep2017年04月10日 18:02:56 +00:00Commented Apr 10, 2017 at 18:02