Return to Answer

Bounty Awarded with 50 reputation awarded by Community Bot

occurred Sep 22, 2018 at 18:00

added 254 characters in body

Source Link

edited Sep 18, 2018 at 10:17

MaLiN2223

edited Sep 18, 2018 at 10:17

MaLiN2223

1.1k
5
11

Instead of returning None I'd raise StopIterationError . It is because it~~(削除) Instead of returning None I'd raise StopIterationError . It is because it is the usual thing to do when you want to stop a generator. (削除ここまで)~~

As @Maarten Fabré pointed out, this is the usual thing to do when you want to stop a generatorno longer valid. Instead, just use return (see Explanation of generators, iterators, and StopIteration )

Instead of returning None I'd raise StopIterationError . It is because it is the usual thing to do when you want to stop a generator.

~~(削除) Instead of returning None I'd raise StopIterationError . It is because it is the usual thing to do when you want to stop a generator. (削除ここまで)~~

As @Maarten Fabré pointed out, this is no longer valid. Instead, just use return (see Explanation of generators, iterators, and StopIteration )

Source Link

answered Sep 14, 2018 at 22:29

MaLiN2223

answered Sep 14, 2018 at 22:29

MaLiN2223

1.1k
5
11

Note: I'm assuming that this code works and you seek only 'code quality' improvements. Also, you seem like experienced programmer so I will try not to explain trivial things but please, do ask if you think I didn't state my opinion clearly enough.

Code notes:

General

No docs on any method or class. While methods are descriptive one might have trouble with for example, what is being updated in Location.update, it might be a name update as well.

In my opinion both readchar and readtoken should be wrapped into some class - a Reader maybe? Advantages of that are (at least) two: you do not clutter global namespace with functions and you could store both inputchannel, state and location as a class variables.

Imports

import collections
import io
import sys

I wouldn't import whole packages. Try to using from X import Y pattern, it is both more descriptive (reader can see why are you using those packages) and also you don't have to repeat yourself (as you do with sys.stdout). However, this is just a matter of preference.

Global (file) variables

I tend to move them into separate file and import them. It is mainly because of python's zen:

Namespaces are one honking great idea -- let's do more of those!

Moving your constant's away from the usage helps in the future if you would need to make them configurable - it separates 'implementation' from the usage.

Also, (if you happen to use python 3.4+) it might be beneficial to use enums instead. In addition to creating new 'namespace' it also provides a very important feature- your globals will be constant.

Location's init

I must say that I love a fact that you used name or "<input>", people tend to check for None with if too much.

Location's update

The c variable is not very descriptive, what you probably meant was 'character'/'char' ?

Location's repr and str

I personally don't like when people use str.format method it has no advantage over "Location({}, {}, {}, {})".format but is longer. However, if you target your classes for python 3.6+ why not using something even more readable: string interpolation.

Location's readchar

Instead of returning None I'd raise StopIterationError. It is because it is the usual thing to do when you want to stop a generator.

readtoken

In my opinion, this method is too long, I'd split it based on state, after each state is ... you'd call different function.
Depending on your needs it might be useful to use linesp instead of "\n".
Do not raise generic Exception, more on that later.
As previously, do not return none - raise an exception instead.

Lexer's init

In this case a documentation would be lovely, particularly to understand what inputchannel is and what API must it comply to.
In Location("<input>", 0, 1, 0) it would be good to name the parameters that you pass so reader wouldn't need to search for signature: Location(name="<input>", pos=0, line=1, col=0).
What does for token in Lexer(sys.stdin): mean? Will the reader understand what does it mean to 'iterate over lexer' ? My preference would be to rearrange your class to something like for token in lexer.Tokenize(sys.stdin).

Questions

Q1

You are right, recursion is generally discouraged in python. In general, what I saw in similar use cases people do tend to scan char by char as you did. I don't have much experience in writing lexers so I am not sure if my advice will be at all helpful but let's try:

Small functions are your friend - the more logic you abstract the more complexity you can put in. For example, instead of doing x == '[' try isLeftBracket(x), in the future you might introduce isBrachet which will call isLeftBracet or isRightBracket. This is just a silly example but I hope you get the point.
Instead of recursion how about using a stack of states? It is a general practice to change recursive code to iterative one.
You could abstract your class to a state machine (basically wrapping everything as I mentioned at the beginning), this will help in managing the state and include many events as functions.
First consume and then validate multiple characters given a state. For example, if you know that if current state is A and you encounter character [, next few characters are going to be some specific data

you could do this:

if state == A and current_character == '[':
 consume_dataX()
def consume_dataX():
 a = ''
 while condition(current_character):
 a += readchar()
 current_character = readchar()
 validate(a)

Real life example would be parsing a function (foo(int x, int y)):

if state == ParsingFunction and current_character == "(":
 argument_vector = consume_until(')')

Q2

It is completely fine to think of characters as a length 1 strings. You can think of it like this: for a given string `a` operator `[]` is a slicing operator which means that `a[1]` slices the string and is short for `a[1:2]`. Also, keep in mind that in reality `char` type is just a `byte` (or something similar) therefore, there is no need for a special treatment.

Q3

I cannot stress this enough, in my opinion: **DO NOT RAISE GENERIC EXCEPTIONS** unless you have completely no idea what happened.

Person using your class will have to do the following:

try:
 # some code
except Exception as ex:
 # handle

which is inconvenient for many reason. If you want to know more see this SO question.

Q4

Already answered above.

lang-py