Skip to main content
Code Review

Return to Answer

Bounty Awarded with 50 reputation awarded by Community Bot
added 254 characters in body
Source Link
MaLiN2223
  • 1.1k
  • 5
  • 11

Instead of returning None I'd raise StopIterationError . It is because it(削除) Instead of returning None I'd raise StopIterationError . It is because it is the usual thing to do when you want to stop a generator. (削除ここまで)

As @Maarten Fabré pointed out, this is the usual thing to do when you want to stop a generatorno longer valid. Instead, just use return (see Explanation of generators, iterators, and StopIteration )

Instead of returning None I'd raise StopIterationError . It is because it is the usual thing to do when you want to stop a generator.

(削除) Instead of returning None I'd raise StopIterationError . It is because it is the usual thing to do when you want to stop a generator. (削除ここまで)

As @Maarten Fabré pointed out, this is no longer valid. Instead, just use return (see Explanation of generators, iterators, and StopIteration )

Source Link
MaLiN2223
  • 1.1k
  • 5
  • 11

Note: I'm assuming that this code works and you seek only 'code quality' improvements. Also, you seem like experienced programmer so I will try not to explain trivial things but please, do ask if you think I didn't state my opinion clearly enough.

Code notes:

General

No docs on any method or class. While methods are descriptive one might have trouble with for example, what is being updated in Location.update, it might be a name update as well.

In my opinion both readchar and readtoken should be wrapped into some class - a Reader maybe? Advantages of that are (at least) two: you do not clutter global namespace with functions and you could store both inputchannel, state and location as a class variables.

Imports

import collections
import io
import sys

I wouldn't import whole packages. Try to using from X import Y pattern, it is both more descriptive (reader can see why are you using those packages) and also you don't have to repeat yourself (as you do with sys.stdout). However, this is just a matter of preference.

Global (file) variables

I tend to move them into separate file and import them. It is mainly because of python's zen:

Namespaces are one honking great idea -- let's do more of those!

Moving your constant's away from the usage helps in the future if you would need to make them configurable - it separates 'implementation' from the usage.

Also, (if you happen to use python 3.4+) it might be beneficial to use enums instead. In addition to creating new 'namespace' it also provides a very important feature- your globals will be constant.

Location's init

I must say that I love a fact that you used name or "<input>", people tend to check for None with if too much.

Location's update

The c variable is not very descriptive, what you probably meant was 'character'/'char' ?

Location's repr and str

I personally don't like when people use str.format method it has no advantage over "Location({}, {}, {}, {})".format but is longer. However, if you target your classes for python 3.6+ why not using something even more readable: string interpolation.

Location's readchar

Instead of returning None I'd raise StopIterationError. It is because it is the usual thing to do when you want to stop a generator.

readtoken

  • In my opinion, this method is too long, I'd split it based on state, after each state is ... you'd call different function.
  • Depending on your needs it might be useful to use linesp instead of "\n".
  • Do not raise generic Exception, more on that later.
  • As previously, do not return none - raise an exception instead.

Lexer's init

  • In this case a documentation would be lovely, particularly to understand what inputchannel is and what API must it comply to.

  • In Location("<input>", 0, 1, 0) it would be good to name the parameters that you pass so reader wouldn't need to search for signature: Location(name="<input>", pos=0, line=1, col=0).

  • What does for token in Lexer(sys.stdin): mean? Will the reader understand what does it mean to 'iterate over lexer' ? My preference would be to rearrange your class to something like for token in lexer.Tokenize(sys.stdin).

Questions

Q1

You are right, recursion is generally discouraged in python. In general, what I saw in similar use cases people do tend to scan char by char as you did. I don't have much experience in writing lexers so I am not sure if my advice will be at all helpful but let's try:
  • Small functions are your friend - the more logic you abstract the more complexity you can put in. For example, instead of doing x == '[' try isLeftBracket(x), in the future you might introduce isBrachet which will call isLeftBracet or isRightBracket. This is just a silly example but I hope you get the point.
  • Instead of recursion how about using a stack of states? It is a general practice to change recursive code to iterative one.
  • You could abstract your class to a state machine (basically wrapping everything as I mentioned at the beginning), this will help in managing the state and include many events as functions.
  • First consume and then validate multiple characters given a state. For example, if you know that if current state is A and you encounter character [, next few characters are going to be some specific data

you could do this:

if state == A and current_character == '[':
 consume_dataX()
def consume_dataX():
 a = ''
 while condition(current_character):
 a += readchar()
 current_character = readchar()
 validate(a) 

Real life example would be parsing a function (foo(int x, int y)):

if state == ParsingFunction and current_character == "(":
 argument_vector = consume_until(')')

Q2

It is completely fine to think of characters as a length 1 strings. You can think of it like this: for a given string `a` operator `[]` is a slicing operator which means that `a[1]` slices the string and is short for `a[1:2]`. Also, keep in mind that in reality `char` type is just a `byte` (or something similar) therefore, there is no need for a special treatment.

Q3

I cannot stress this enough, in my opinion: **DO NOT RAISE GENERIC EXCEPTIONS** unless you have completely no idea what happened.

Person using your class will have to do the following:

try:
 # some code
except Exception as ex:
 # handle

which is inconvenient for many reason. If you want to know more see this SO question.

Q4

Already answered above.
lang-py

AltStyle によって変換されたページ (->オリジナル) /