
The code below is meant to tokenize a Lisp expression, doing as much as possible in 'one go'. How does it look, and what can be improved?

```python
import re

# goal: capture the next token then get the rest of the line
# to be used in a while-loop/yield
tokenizer = re.compile(r"""
    \s*                      # any amount of whitespace...

    # 1. capture group one: token
    (
      ,@                     # special token ,@ ...
    | [(),`']                # or ) ( , ' ` ...
    | "(?:[^\\"]*(?:\\.)*)*" # or match on string (unrolling the loop)...
    | ;.*                    # or comment-anything...
    | [^\s('"`,;)]*          # or non-special...
    )
    # 2. capture group two: rest-of-line
    (.*)
""", re.VERBOSE)
```

Example run (python):

```python
line = '(define (square x) (* x x))'
while line:
    token, line = tokenizer.match(line).groups()
    print(token)
```
asked Apr 4, 2021 at 3:45
  • Typically, a lexer will complain if given invalid inputs. Yours will tokenize anything. Perhaps it would help for you to clarify the goal(s) of this code. Commented Apr 4, 2021 at 22:09
  • Your (unrolling the loop) part is wrong and should be: `"[^\\"]*(?:\\.[^\\"]*)*"` Commented Apr 18, 2021 at 12:25
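The replacement suggested in the second comment has the standard "unrolling the loop" shape, `normal* (special normal*)*`, where `normal` is any character except a backslash or a quote and `special` is a backslash plus the escaped character. In the question's version, `(?:[^\\"]*(?:\\.)*)*`, both halves of the group can match the empty string, and a run of ordinary characters can be divided among the outer iterations in many ways; that is the usual setup for catastrophic backtracking when the closing quote is missing. In the unrolled form every repeat must start with an escape, so there is only one way to match. A small check of the suggested pattern on its own (`STRING` and `is_string_literal` are hypothetical names):

```python
import re

# Unrolled loop, as suggested in the comment:  " normal* (special normal*)* "
#   normal  = [^\\"]  any character except a backslash or a double quote
#   special = \\.     a backslash followed by any character (an escape)
STRING = re.compile(r'"[^\\"]*(?:\\.[^\\"]*)*"')


def is_string_literal(s):
    """Return True if s is exactly one complete double-quoted string (hypothetical helper)."""
    return STRING.fullmatch(s) is not None


# Well-formed strings, then an unterminated one and one whose last quote is escaped.
for s in [r'"plain"', r'"say \"hi\""', r'"ends in backslash \\"',
          r'"unterminated', r'"escaped quote\"']:
    print(f"{s:26} {is_string_literal(s)}")
```

The first three samples match and the last two do not: in `"escaped quote\"` the final quote is consumed as part of the `\"` escape, so no closing quote remains.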

1 Answer


It's difficult for me to say whether this is well written, since there are no tests, and that's fundamentally your biggest problem. A complex regex like this is an ideal candidate for unit testing: it's already well isolated, it's important, it's somewhat internally complex, and it would benefit from spelling out exactly which inputs and outputs you expect. Include as many edge cases as you can think of, covering both the "good" inputs (ones you expect to parse successfully) and the "bad" ones (ones you expect to fail in specific, predictable ways).
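As a starting point, here is a minimal `unittest` sketch. It reuses the question's pattern (comments trimmed) and wraps the question's loop in a hypothetical `tokenize` helper. The helper adds one thing the question does not have: a check that the remaining text got shorter, so a bad input fails the test instead of hanging. The expected token lists are what the current pattern produces for those inputs. The trailing-whitespace case is marked `expectedFailure` because the question's loop currently emits an extra empty token there.

```python
import re
import unittest

# The question's pattern, with its comments trimmed.
tokenizer = re.compile(r"""
    \s*                       # leading whitespace
    (                         # 1. token
      ,@
    | [(),`']
    | "(?:[^\\"]*(?:\\.)*)*"
    | ;.*
    | [^\s('"`,;)]*
    )
    (.*)                      # 2. rest of the line
""", re.VERBOSE)


def tokenize(line):
    """Hypothetical helper: run the question's loop and collect the tokens.

    Raises ValueError if a step consumes nothing, so a test fails instead of looping forever.
    """
    tokens = []
    while line:
        token, rest = tokenizer.match(line).groups()
        if len(rest) >= len(line):
            raise ValueError(f"no progress at {line!r}")
        tokens.append(token)
        line = rest
    return tokens


class TokenizerTests(unittest.TestCase):
    # "Good" inputs
    def test_define(self):
        self.assertEqual(
            tokenize('(define (square x) (* x x))'),
            ['(', 'define', '(', 'square', 'x', ')', '(', '*', 'x', 'x', ')', ')'])

    def test_quasiquote_and_splice(self):
        self.assertEqual(tokenize('`(a ,b ,@c)'),
                         ['`', '(', 'a', ',', 'b', ',@', 'c', ')'])

    def test_string_with_escaped_quotes(self):
        self.assertEqual(tokenize(r'(display "say \"hi\"")'),
                         ['(', 'display', r'"say \"hi\""', ')'])

    def test_comment_runs_to_end_of_line(self):
        self.assertEqual(tokenize('(+ 1 2) ; sum'),
                         ['(', '+', '1', '2', ')', '; sum'])

    # "Bad" inputs and edge cases
    def test_unterminated_string_is_rejected(self):
        with self.assertRaises(ValueError):
            tokenize('(display "oops)')

    @unittest.expectedFailure  # the current pattern yields a trailing '' token
    def test_trailing_whitespace(self):
        self.assertEqual(tokenize('x '), ['x'])
```

Save it as, for example, `test_tokenizer.py` (a hypothetical file name) and run `python -m unittest test_tokenizer`. Each newly discovered edge case becomes one more small test method.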

The regex itself doesn't seem crazy. I find the lack of `^` and `$` suspicious: if you accidentally run this through `search` instead of `match`, you leave yourself open to false positives.
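One way to act on the anchoring point, and on the comment about rejecting invalid input, is to scan with an explicit position instead of slicing off the rest of the line. This is a sketch of an alternative, not the question's design: `SPACE`, `TOKEN` and `tokenize` are hypothetical names, the string alternative uses the unrolled form from the comments, and the bare-atom alternative uses `+` instead of `*`. That last change matters. With `*`, the atom alternative can match the empty string, so on an unterminated string such as `"abc` the question's loop gets an empty token, `line` never shrinks, and the `while` loop never ends. `Pattern.match(text, pos)` only matches starting exactly at `pos` (without `re.MULTILINE`, a leading `^` only matches at the real start of the string), which provides the anchoring.

```python
import re

# Hypothetical names (SPACE, TOKEN, tokenize); not from the question.
SPACE = re.compile(r"\s*")

# Same alternatives as the question, except that:
#   - strings use the unrolled form suggested in the comments
#   - bare atoms need at least one character (+ instead of *),
#     so no alternative can match the empty string
TOKEN = re.compile(r"""
      ,@                          # splice-unquote
    | [(),`']                     # single-character specials
    | "[^\\"]*(?:\\.[^\\"]*)*"    # string literal (unrolled loop)
    | ;.*                         # comment, up to the end of the line
    | [^\s('"`,;)]+               # bare atom
""", re.VERBOSE)


def tokenize(text):
    """Yield tokens from text; raise ValueError on input no alternative accepts."""
    pos = 0
    while True:
        pos = SPACE.match(text, pos).end()  # skip whitespace between tokens
        if pos == len(text):
            return
        m = TOKEN.match(text, pos)  # anchored at pos, unlike search()
        if m is None:
            raise ValueError(f"cannot tokenize {text[pos:pos + 20]!r} at position {pos}")
        yield m.group()
        pos = m.end()
```

For example, `list(tokenize('(a ; note\n b)'))` gives `['(', 'a', '; note', 'b', ')']`, and `list(tokenize('(display "oops)'))` raises `ValueError` instead of hanging. Advancing `pos` also avoids building a new copy of the remaining text for every token.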

answered Apr 4, 2021 at 23:29