This is one of my first useful scripts. I run it on a highlighted text selection within Notepad++, but I've provided a sample formula and commented out the Scintilla-specific references.
I'm a rank amateur: Am I using good idiom? Is my approach to a state machine sensible?
# input_text = editor.getSelText()
input_text = '=HLOOKUP(TablePeriods[[#Headers],[YearMonth]],TablePeriods[[#All],[YearMonth]],MATCH([@Date],INDIRECT("TablePeriods[[#All],["&[@Account]&"]]"),-1))'
# initialize the state table
EVENTS = 11
STATES = 7
state_table = [[[0,0] for x in range(EVENTS)] for x in range(STATES)]
# event list and corresponding state table
event_list = [ "\"", "&", "(", ")", ",", "\n\r\t ", "[", "]", "{", "}" ] # last event is anything that doesn't match one of the other actions
state_table[0] = [[1,0], [0,1], [5,0], [0,3], [0,4], [0,5], [2,0], [0,0], [6,0], [0,0], [0,0]] # normal state
state_table[1] = [[0,0], [1,0], [1,0], [1,0], [1,0], [1,0], [1,0], [1,0], [1,0], [1,0], [1,0]] # double-quote comment
state_table[2] = [[2,0], [2,0], [2,0], [2,0], [2,0], [2,0], [3,0], [0,0], [2,0], [2,0], [2,0]] # inside bracketed table reference
state_table[3] = [[3,0], [3,0], [3,0], [3,0], [3,0], [3,0], [4,0], [2,0], [3,0], [3,0], [3,0]] # inside double-bracketed table reference
state_table[4] = [[4,0], [4,0], [4,0], [4,0], [4,0], [4,0], [-1,0], [3,0], [4,0], [4,0], [4,0]] # inside triple-bracketed table reference (I don't think this exists)
state_table[5] = [[1,2], [0,2], [5,2], [0,0], [0,2], [0,5], [2,2], [0,2], [0,2], [0,2], [0,2]] # found left-paren; only wrap and insert if not empty, like =row()
state_table[6] = [[6,0], [6,0], [6,0], [6,0], [6,0], [6,0], [6,0], [6,0], [6,0], [0,0], [6,0]] # inside curly-braced array
# initialize the state, parenthesis depth, and output text
current_state = 0
paren_depth = 0
output_text = ""
# define the tab and new line characters
TAB_CHAR = "\t"
NEW_LINE = "\r\n"
for z in input_text:
for a in range(len(event_list)):
if z in event_list[a]:
takeaction = state_table[current_state][a][1]
current_state = state_table[current_state][a][0]
break
else:
takeaction = state_table[current_state][-1][1] # this sets takeaction to the value when the test character does not match any event
current_state = state_table[current_state][-1][0] # this sets current_state to the value when the test character does not match any event
if current_state == -1:
current_state = 5 # -1 is an error and should be handled with some sort of error operation
if takeaction == 0: # just pass the characters along
output_text += z
elif takeaction == 1: # place character at same depth as only one on line
output_text += NEW_LINE + TAB_CHAR*paren_depth + z + NEW_LINE + TAB_CHAR*paren_depth
elif takeaction == 2: # previous character was a left paren and this one is not a right-paren
paren_depth = paren_depth + 1
output_text += NEW_LINE + TAB_CHAR*paren_depth + z
elif takeaction == 3: # character is a right paren not immediately following a left paren
paren_depth = paren_depth - 1
output_text += NEW_LINE + TAB_CHAR*paren_depth + z
elif takeaction == 4: # character is a comma, so start a new line at the paren depth
output_text += z + NEW_LINE + TAB_CHAR*paren_depth
elif takeaction == 5: # strip whitespace from the input outside of quotes
pass
else:
pass # should raise error, since the state table includes more actions than are defined
# editor.replaceSel(output_text)
print(output_text)
My code, without sample, is on Gisthub.
-
\$\begingroup\$ I just found this question that seems related and that I intend to look through: stackoverflow.com/questions/5492980/… \$\endgroup\$Dane– Dane2014年03月19日 21:06:39 +00:00Commented Mar 19, 2014 at 21:06
2 Answers 2
Josay's suggestions are good; I only mention where I have a different idea:
The way you create
state_table
is a bit of an anti-idiom, but in this case I see a benefit too: seeing the number instate_table[2] = ...
improves the readability of the state machine as a whole. However, initializing the variable to the nested list comprehension is redundant. I suggest changingstate_table
to a dictionary. The rest of the code works as is if you usestate_table = {}
Use
zip
to iterate over two lists in parallel. Instead of thisfor a in range(len(event_list)): if z in event_list[a]: takeaction = state_table[current_state][a][1] current_state = state_table[current_state][a][0]
do this (using also sequence unpacking)
for event, next_state in zip(event_list, state_table[current_state]): if z in event: current_state, takeaction = next_state
...except in this case don't. Instead, avoid the inner loop altogether by first creating a mapping
char_to_event = {char: i for i, event_chars in enumerate(event_list) for char in event_chars}
which allows you to replace the loop (including the else clause) with this:
event = char_to_event.get(z, -1) current_state, takeaction = state_table[current_state][event]
You can avoid the lengthy
elif
chains with dictionary lookup and string formatting. Given these dictionariesaction_table = { 0: "{z}", 1: "{newline}{indent}{z}{newline}{indent}", 2: "{newline}{indent}{z}", 3: "{newline}{indent}{z}", 4: "{z}{newline}{indent}" } depth_table = { 2: +1, 3: -1 }
the
if takeaction ... elif
statement can be replaced with this:paren_depth += depth_table.get(takeaction, 0) output_text += action_table.get(takeaction, "").format( z=z, indent=TAB_CHAR*paren_depth, newline=NEW_LINE)
-
\$\begingroup\$ Nice trick! I didn't see that coming. \$\endgroup\$SylvainD– SylvainD2014年03月19日 18:07:51 +00:00Commented Mar 19, 2014 at 18:07
-
\$\begingroup\$ Sweet. Now, why shouldn't I go a step further and eliminate the
event = ...
line:current_state, takeaction = state_table[current_state][char_to_event.get(z, -1)]
\$\endgroup\$Dane– Dane2014年03月19日 19:01:33 +00:00Commented Mar 19, 2014 at 19:01 -
\$\begingroup\$ @Dane You can do that too. It's a choice between an extra line+variable and a longer line. \$\endgroup\$Janne Karila– Janne Karila2014年03月19日 19:15:52 +00:00Commented Mar 19, 2014 at 19:15
-
\$\begingroup\$ @Dane Added a way to skip the long
elif
chain. \$\endgroup\$Janne Karila– Janne Karila2014年03月21日 17:18:29 +00:00Commented Mar 21, 2014 at 17:18
You can use list unpacking to transform :
takeaction = state_table[current_state][a][1]
current_state = state_table[current_state][a][0]
into the more concise :
current_state, takeaction = state_table[current_state][a]
. Also, it shows that the type you should be using to convey current_state
and takeaction
should probably be something more like a tuple than like a list.
The pythonic way to loop is not to use range
and len
but just to go through your iterable with for i in my_iterable:
. If you do need the index, enumerate is what you need.
for a in range(len(event_list)):
if z in event_list[a]:
takeaction = state_table[current_state][a][1]
current_state = state_table[current_state][a][0]
becomes :
for i,event in enumerate(event_list):
if z in event:
current_state, takeaction = state_table[current_state][i]
break
Remove whatever code is not needed.
elif takeaction == 5: # strip whitespace from the input outside of quotes
pass
else:
pass # should raise error, since the state table includes more actions than are define
is roughly the same as nothing. If you want to ensure the value is correct, you can use assert.
From PEP 8 :
For example, do not rely on CPython's efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn't present at all in implementations that don't use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.
Remove duplicated logic :
takeaction = state_table[current_state][a][1]
current_state = state_table[current_state][a][0]
and
takeaction = state_table[current_state][-1][1] # this sets takeaction to the value when the test character does not match any event
current_state = state_table[current_state][-1][0] # this sets current_state to the value when the test character does not match any event
are quite similar and probably can be factorised out.
Do not repeat yourself (bis) : you do not need to define the state table twice. Just do it once and for all and everything should be ok :
state_table = [
[(1,0), (0,1), (5,0), (0,3), (0,4), (0,5), ( 2,0), (0,0), (6,0), (0,0), (0,0)], # normal state
[(0,0), (1,0), (1,0), (1,0), (1,0), (1,0), ( 1,0), (1,0), (1,0), (1,0), (1,0)], # double-quote comment
[(2,0), (2,0), (2,0), (2,0), (2,0), (2,0), ( 3,0), (0,0), (2,0), (2,0), (2,0)], # inside bracketed table reference
[(3,0), (3,0), (3,0), (3,0), (3,0), (3,0), ( 4,0), (2,0), (3,0), (3,0), (3,0)], # inside double-bracketed table reference
[(4,0), (4,0), (4,0), (4,0), (4,0), (4,0), (-1,0), (3,0), (4,0), (4,0), (4,0)], # inside triple-bracketed table reference (I don't think this exists)
[(1,2), (0,2), (5,2), (0,0), (0,2), (0,5), ( 2,2), (0,2), (0,2), (0,2), (0,2)], # found left-paren; only wrap and insert if not empty, like =row()
[(6,0), (6,0), (6,0), (6,0), (6,0), (6,0), ( 6,0), (6,0), (6,0), (0,0), (6,0)], # inside curly-braced array
]
Final result :
This is what my code is like at the end. I am too lazy to change the string concatenations for a better solution. Also, I'm still not quite happy with the multiple if
statements but I have nothing better to suggest at the moment.
#!/usr/bin/python
# input_text = editor.getSelText()
input_text = '=HLOOKUP(TablePeriods[[#Headers],[YearMonth]],TablePeriods[[#All],[YearMonth]],MATCH([@Date],INDIRECT("TablePeriods[[#All],["&[@Account]&"]]"),-1))'
# initialize the state table
# event list and corresponding state table
event_list = [ "\"", "&", "(", ")", ",", "\n\r\t ", "[", "]", "{", "}" ] # last event is anything that doesn't match one of the other actions
state_table = [
[(1,0), (0,1), (5,0), (0,3), (0,4), (0,5), ( 2,0), (0,0), (6,0), (0,0), (0,0)], # normal state
[(0,0), (1,0), (1,0), (1,0), (1,0), (1,0), ( 1,0), (1,0), (1,0), (1,0), (1,0)], # double-quote comment
[(2,0), (2,0), (2,0), (2,0), (2,0), (2,0), ( 3,0), (0,0), (2,0), (2,0), (2,0)], # inside bracketed table reference
[(3,0), (3,0), (3,0), (3,0), (3,0), (3,0), ( 4,0), (2,0), (3,0), (3,0), (3,0)], # inside double-bracketed table reference
[(4,0), (4,0), (4,0), (4,0), (4,0), (4,0), (-1,0), (3,0), (4,0), (4,0), (4,0)], # inside triple-bracketed table reference (I don't think this exists)
[(1,2), (0,2), (5,2), (0,0), (0,2), (0,5), ( 2,2), (0,2), (0,2), (0,2), (0,2)], # found left-paren; only wrap and insert if not empty, like =row()
[(6,0), (6,0), (6,0), (6,0), (6,0), (6,0), ( 6,0), (6,0), (6,0), (0,0), (6,0)], # inside curly-braced array
]
# initialize the state, parenthesis depth, and output text
current_state = 0
paren_depth = 0
output_text = ""
# define the tab and new line characters
TAB_CHAR = "\t"
NEW_LINE = "\r\n"
for z in input_text:
for i,event in enumerate(event_list):
if z in event:
break
else:
i = -1
current_state, takeaction = state_table[current_state][i]
if current_state == -1:
current_state = 5 # -1 is an error and should be handled with some sort of error operation
if takeaction == 0: # just pass the characters along
output_text += z
elif takeaction == 1: # place character at same depth as only one on line
output_text += NEW_LINE + TAB_CHAR*paren_depth + z + NEW_LINE + TAB_CHAR*paren_depth
elif takeaction == 2: # previous character was a left paren and this one is not a right-paren
paren_depth = paren_depth + 1
output_text += NEW_LINE + TAB_CHAR*paren_depth + z
elif takeaction == 3: # character is a right paren not immediately following a left paren
paren_depth = paren_depth - 1
output_text += NEW_LINE + TAB_CHAR*paren_depth + z
elif takeaction == 4: # character is a comma, so start a new line at the paren depth
output_text += z + NEW_LINE + TAB_CHAR*paren_depth
else:
assert(takeaction == 5) # strip whitespace from the input outside of quotes
pass # should raise error, since the state table includes more actions than are defined
# editor.replaceSel(output_text)
print(output_text)
-
\$\begingroup\$ This is terrific. Thank you!
enumerate()
was definitely something I've been looking for, but didn't know how to find it (or even that it existed). Also, I should probably think of tuples more. On that note: shouldn't the whole state_table become nested tuples? Also the event_list? \$\endgroup\$Dane– Dane2014年03月19日 13:41:42 +00:00Commented Mar 19, 2014 at 13:41 -
\$\begingroup\$ Reusing Raymond Hettinger, I usually try to go for "Loopy lists and structy tuples". In your case, state_table and event_list are things you are iterating on so list makes perfect sense to me. \$\endgroup\$SylvainD– SylvainD2014年03月19日 13:49:18 +00:00Commented Mar 19, 2014 at 13:49
-
\$\begingroup\$ Hmm, perhaps, but on the other hand, my state_table should not be altered, so the immutable quality of tuples is perhaps more appropriate for the whole thing. Either way, I've learned several valuable lessons here. \$\endgroup\$Dane– Dane2014年03月19日 14:25:54 +00:00Commented Mar 19, 2014 at 14:25
-
\$\begingroup\$ As per stackoverflow.com/questions/1708510/… different people will give you different answers, some based on mutability, some based on the way you want to use your container (position has relevance). \$\endgroup\$SylvainD– SylvainD2014年03月19日 14:58:13 +00:00Commented Mar 19, 2014 at 14:58