I'm working on my first Clojure project, which is a parser. Since it's my first project I'm unsure of many aspects, from my general approach to solving this problem to the small implementation details. I also don't know anyone proficient in Clojure so any guidance is welcome.
In higher terms, my implementation will take a string (which is one message to be parsed) and break it into fields. Each field it will apply some function to (often a dictionary lookup), and finally return a dictionary of results.
(use 'clojure.string)
;; This defines the breakdown of each line in the file. The line contains an entry and start with something
;; like "1AXYZ ". In this case the first two characters ("1A") are the "Record Type" and the next four
;; ("XYZ ") are the "Entering Firm". This data structure holds that information.
(def parser
{:shared '( ; These are the entries shared amongst all types of message
;; Field-Name num-chars
Record-Type 2
Entering-Firm 4
Clearing-Number 4
Symbol 11
Lot-Indicator 1
;;...
)
:1A '( ; this is one type of message, indicated with record-type=1A
;; Field-Name num-chars
Filler 9
Poss-Dupe 1
Booth-Booked-Indicator
Order-Time 6
Order-Date 8
;; ...
)
}
)
;; if no implementation is found for a field, just return a map to its contents.
(defn default [header x]
{(str header) (trim (join x))})
;; most of the fields that dont use the default function will just be a dictionary lookup
(defn Poss-Dupe [[x]] (default 'Poss-Dupe ({1円 "Odd Lot" \D "Rule 902" \E "Poss Dupe Rule 902"
\F "Auto Execution" \G "Poss Dupe of Auto-Execution"
\H "Non-auto Execution" \I "Poss Dupe of Non-auto Execution"
0円 "" \space ""}
x (str "ERROR: " x))))
(defn Booth-Booked-Indicator [[x]] (default 'Booth-Booked-Indicator ({0円 "Post" 1円 "Booth" 2円 "Booked"} x (str "ERROR: " x))))
(defn Order-Time [x] (default 'Order-Time (->> x (partition 2) (map join) (join ":") ))) ; "HHMMSS" -> "HH:MM:SS"
(defn Order-Date [x] (default 'Order-Date
(join "-" (map join (list (take 4 x) (take 2 (drop 4 x)) (drop 6 x)))))) ; "YYYYMMDD"->"YYYY-MM-DD"
(defn Filler [x] nil)
(defn try-call [f subseq]
(try
((eval f) subseq)
(catch Exception e ; f not defined -> call default
(default f subseq))))
;; Where the recursion happens
(defn grab-call [[f n & rest-instructions] line]
(let [[field rest-line] (split-at n line)]
(if (or (empty? line) (not f))
nil
(merge (try-call f field) (grab-call rest-instructions rest-line)))))
;;; Entry point
(defn parse-line [line]
(grab-call (concat (:shared parser)
((keyword (subs line 0 2)) parser))
line ))
If you care to know the particulars of what is being parsed, that can be found at NYSE MRO specs.
2 Answers 2
There is a lot of weird stuff here.
- It's strange to have no
ns
form for defining your namespace and requiring the libraries you need. Putting a bareuse
at the top is gross. - Don't
use
an entire namespace. Eitherrequire
with an alias, like(:require [clojure.string :as s]) ... (s/join ...)
, or elserequire
andrefer
the specific stuff you use: `(:require [clojure.string :refer [join trim]]). - The uppercase function names should be lowercase:
poss-dupe
, or evenpossible-duplicate
, notPoss-Dupe
- There should be a newline after the parameter definitions for most functions:
(defn foo [x] \newline (...))
- The
eval
is very suspicious, but I don't want to read your whole program and figure out what you should be doing instead. Probablyns-resolve
or something. (if (or (empty? x) f) nil y)
is a bad way to write(when (and (seq x) f) y)
- Usually calling
keyword
to look stuff up in a map suggests that you shouldn't be using keywords as keys, but in your case it looks fine-ish. However, instead of((keyword whatever) m)
, it would be more readable to write(get m (keyword whatever))
.
-
\$\begingroup\$ Thanks for the tips!
ns-resolve
does exactly what I was usingeval
for--why is it preferred in general? And are you still (as you were in IRC) of the opinion that I should have an explicit map rather than just a function lookup? I was going for minimal boilerplate, but I don't want to sacrifice readability for that... \$\endgroup\$ari– ari2015年06月09日 20:46:51 +00:00Commented Jun 9, 2015 at 20:46 -
\$\begingroup\$ Yes. Otherwise you have to give your functions crappy names like Poss-Dupe, and you run the risk of executing arbitrary code if someone puts an
eval
into their CSV file or whatever it is you're parsing. \$\endgroup\$amalloy– amalloy2015年06月09日 20:49:37 +00:00Commented Jun 9, 2015 at 20:49 -
\$\begingroup\$ Haha, the field is called "Poss Dupe" in the original NYSE spec. \$\endgroup\$ari– ari2015年06月09日 21:33:06 +00:00Commented Jun 9, 2015 at 21:33
-
\$\begingroup\$ Yes, exactly. Your scheme involving eval and ns-resolve means you have to give your functions names as crappy as those in the nyse spec, instead of functions with good names and a mapping from crappy names to good ones. \$\endgroup\$amalloy– amalloy2015年06月09日 21:33:59 +00:00Commented Jun 9, 2015 at 21:33
-
\$\begingroup\$ Ahh, that's a good point \$\endgroup\$ari– ari2015年06月09日 21:35:28 +00:00Commented Jun 9, 2015 at 21:35
Some points on idiomatic Clojure:
- Do not return
nil
from if statements:(if condition? nil false-branch)
->(if-not condition? false-branch)
defn
bodies in the next line unless they are truly one liners (not like inBooth-Booked-Indicator
.- Don't use
eval
but resolve the function and call it directly as(f)
.
Conceptually:
- If you want to define your own types use
defrecord
anddeftype
(defrecord
in this case).Poss-Dupe
,Order-Time Order-Date
, etc. should be records, and it might be cleaner if they use the samefn
bindings, instead of[[x]]
and[x]
, only one of the two. - Instead of defining of passing around a function, evaluating it, checking for a
NullPointerException
, and then going to the default implementation, consider using a protocol. For example,IParse (parse [f])
. You can then implementparse
for your different types and leave a default implementation. Thentry-call
will be just reduced to-parse
.
-
\$\begingroup\$
(eval f)
->(f)
is a totally wrong suggestion. Getting rid of the eval is great, but you completely change what the function does. Records and protocols don't make a lot of sense to me here, but I guess I don't understand the problem enough to say for sure that it is wrong. \$\endgroup\$amalloy– amalloy2015年06月09日 18:06:33 +00:00Commented Jun 9, 2015 at 18:06 -
\$\begingroup\$ I don't know what you mean by "call it directly as (f)"--the following dummy example gives an error:
(def foo 'bar) (defn bar [] (println "bar")) (foo)
\$\endgroup\$ari– ari2015年06月09日 20:48:13 +00:00Commented Jun 9, 2015 at 20:48