10
\$\begingroup\$

This is one of my first useful scripts. I run it on a highlighted text selection within Notepad++, but I've provided a sample formula and commented out the Scintilla-specific references.

I'm a rank amateur: Am I using good idiom? Is my approach to a state machine sensible?

# input_text = editor.getSelText()
input_text = '=HLOOKUP(TablePeriods[[#Headers],[YearMonth]],TablePeriods[[#All],[YearMonth]],MATCH([@Date],INDIRECT("TablePeriods[[#All],["&[@Account]&"]]"),-1))'
# initialize the state table
EVENTS = 11
STATES = 7
state_table = [[[0,0] for x in range(EVENTS)] for x in range(STATES)]
# event list and corresponding state table
event_list = [ "\"", "&", "(", ")", ",", "\n\r\t ", "[", "]", "{", "}" ] # last event is anything that doesn't match one of the other actions
state_table[0] = [[1,0], [0,1], [5,0], [0,3], [0,4], [0,5], [2,0], [0,0], [6,0], [0,0], [0,0]] # normal state
state_table[1] = [[0,0], [1,0], [1,0], [1,0], [1,0], [1,0], [1,0], [1,0], [1,0], [1,0], [1,0]] # double-quote comment
state_table[2] = [[2,0], [2,0], [2,0], [2,0], [2,0], [2,0], [3,0], [0,0], [2,0], [2,0], [2,0]] # inside bracketed table reference
state_table[3] = [[3,0], [3,0], [3,0], [3,0], [3,0], [3,0], [4,0], [2,0], [3,0], [3,0], [3,0]] # inside double-bracketed table reference
state_table[4] = [[4,0], [4,0], [4,0], [4,0], [4,0], [4,0], [-1,0], [3,0], [4,0], [4,0], [4,0]] # inside triple-bracketed table reference (I don't think this exists)
state_table[5] = [[1,2], [0,2], [5,2], [0,0], [0,2], [0,5], [2,2], [0,2], [0,2], [0,2], [0,2]] # found left-paren; only wrap and insert if not empty, like =row()
state_table[6] = [[6,0], [6,0], [6,0], [6,0], [6,0], [6,0], [6,0], [6,0], [6,0], [0,0], [6,0]] # inside curly-braced array
# initialize the state, parenthesis depth, and output text
current_state = 0
paren_depth = 0
output_text = ""
# define the tab and new line characters
TAB_CHAR = "\t"
NEW_LINE = "\r\n"
for z in input_text:
 for a in range(len(event_list)):
 if z in event_list[a]:
 takeaction = state_table[current_state][a][1]
 current_state = state_table[current_state][a][0]
 break
 else:
 takeaction = state_table[current_state][-1][1] # this sets takeaction to the value when the test character does not match any event
 current_state = state_table[current_state][-1][0] # this sets current_state to the value when the test character does not match any event
 if current_state == -1:
 current_state = 5 # -1 is an error and should be handled with some sort of error operation
 if takeaction == 0: # just pass the characters along
 output_text += z
 elif takeaction == 1: # place character at same depth as only one on line
 output_text += NEW_LINE + TAB_CHAR*paren_depth + z + NEW_LINE + TAB_CHAR*paren_depth
 elif takeaction == 2: # previous character was a left paren and this one is not a right-paren
 paren_depth = paren_depth + 1
 output_text += NEW_LINE + TAB_CHAR*paren_depth + z
 elif takeaction == 3: # character is a right paren not immediately following a left paren
 paren_depth = paren_depth - 1
 output_text += NEW_LINE + TAB_CHAR*paren_depth + z
 elif takeaction == 4: # character is a comma, so start a new line at the paren depth
 output_text += z + NEW_LINE + TAB_CHAR*paren_depth
 elif takeaction == 5: # strip whitespace from the input outside of quotes
 pass
 else:
 pass # should raise error, since the state table includes more actions than are defined
# editor.replaceSel(output_text)
print(output_text)

My code, without sample, is on Gisthub.

Jamal
35.2k13 gold badges134 silver badges238 bronze badges
asked Mar 18, 2014 at 20:25
\$\endgroup\$
1
  • \$\begingroup\$ I just found this question that seems related and that I intend to look through: stackoverflow.com/questions/5492980/… \$\endgroup\$ Commented Mar 19, 2014 at 21:06

2 Answers 2

5
\$\begingroup\$

Josay's suggestions are good; I only mention where I have a different idea:

  • The way you create state_table is a bit of an anti-idiom, but in this case I see a benefit too: seeing the number in state_table[2] = ... improves the readability of the state machine as a whole. However, initializing the variable to the nested list comprehension is redundant. I suggest changing state_table to a dictionary. The rest of the code works as is if you use

    state_table = {}
    
  • Use zip to iterate over two lists in parallel. Instead of this

    for a in range(len(event_list)):
     if z in event_list[a]:
     takeaction = state_table[current_state][a][1]
     current_state = state_table[current_state][a][0]
    

    do this (using also sequence unpacking)

    for event, next_state in zip(event_list, state_table[current_state]):
     if z in event:
     current_state, takeaction = next_state
    
  • ...except in this case don't. Instead, avoid the inner loop altogether by first creating a mapping

    char_to_event = {char: i for i, event_chars in enumerate(event_list) 
     for char in event_chars}
    

    which allows you to replace the loop (including the else clause) with this:

    event = char_to_event.get(z, -1)
    current_state, takeaction = state_table[current_state][event]
    
  • You can avoid the lengthy elif chains with dictionary lookup and string formatting. Given these dictionaries

    action_table = {
     0: "{z}",
     1: "{newline}{indent}{z}{newline}{indent}",
     2: "{newline}{indent}{z}",
     3: "{newline}{indent}{z}",
     4: "{z}{newline}{indent}"
    }
    depth_table = {
     2: +1,
     3: -1
    }
    

    the if takeaction ... elif statement can be replaced with this:

    paren_depth += depth_table.get(takeaction, 0)
    output_text += action_table.get(takeaction, "").format(
     z=z, 
     indent=TAB_CHAR*paren_depth,
     newline=NEW_LINE)
    
answered Mar 19, 2014 at 18:02
\$\endgroup\$
4
  • \$\begingroup\$ Nice trick! I didn't see that coming. \$\endgroup\$ Commented Mar 19, 2014 at 18:07
  • \$\begingroup\$ Sweet. Now, why shouldn't I go a step further and eliminate the event = ... line: current_state, takeaction = state_table[current_state][char_to_event.get(z, -1)] \$\endgroup\$ Commented Mar 19, 2014 at 19:01
  • \$\begingroup\$ @Dane You can do that too. It's a choice between an extra line+variable and a longer line. \$\endgroup\$ Commented Mar 19, 2014 at 19:15
  • \$\begingroup\$ @Dane Added a way to skip the long elif chain. \$\endgroup\$ Commented Mar 21, 2014 at 17:18
7
\$\begingroup\$

You can use list unpacking to transform :

 takeaction = state_table[current_state][a][1]
 current_state = state_table[current_state][a][0]

into the more concise :

 current_state, takeaction = state_table[current_state][a]

. Also, it shows that the type you should be using to convey current_state and takeaction should probably be something more like a tuple than like a list.


The pythonic way to loop is not to use range and len but just to go through your iterable with for i in my_iterable:. If you do need the index, enumerate is what you need.

 for a in range(len(event_list)):
 if z in event_list[a]:
 takeaction = state_table[current_state][a][1]
 current_state = state_table[current_state][a][0]

becomes :

for i,event in enumerate(event_list):
 if z in event:
 current_state, takeaction = state_table[current_state][i]
 break

Remove whatever code is not needed.

elif takeaction == 5: # strip whitespace from the input outside of quotes
 pass
 else:
 pass # should raise error, since the state table includes more actions than are define

is roughly the same as nothing. If you want to ensure the value is correct, you can use assert.


From PEP 8 :

For example, do not rely on CPython's efficient implementation of in-place string concatenation for statements in the form a += b or a = a + b. This optimization is fragile even in CPython (it only works for some types) and isn't present at all in implementations that don't use refcounting. In performance sensitive parts of the library, the ''.join() form should be used instead. This will ensure that concatenation occurs in linear time across various implementations.


Remove duplicated logic :

 takeaction = state_table[current_state][a][1]
 current_state = state_table[current_state][a][0]

and

 takeaction = state_table[current_state][-1][1] # this sets takeaction to the value when the test character does not match any event
 current_state = state_table[current_state][-1][0] # this sets current_state to the value when the test character does not match any event

are quite similar and probably can be factorised out.


Do not repeat yourself (bis) : you do not need to define the state table twice. Just do it once and for all and everything should be ok :

state_table = [
 [(1,0), (0,1), (5,0), (0,3), (0,4), (0,5), ( 2,0), (0,0), (6,0), (0,0), (0,0)], # normal state
 [(0,0), (1,0), (1,0), (1,0), (1,0), (1,0), ( 1,0), (1,0), (1,0), (1,0), (1,0)], # double-quote comment
 [(2,0), (2,0), (2,0), (2,0), (2,0), (2,0), ( 3,0), (0,0), (2,0), (2,0), (2,0)], # inside bracketed table reference
 [(3,0), (3,0), (3,0), (3,0), (3,0), (3,0), ( 4,0), (2,0), (3,0), (3,0), (3,0)], # inside double-bracketed table reference
 [(4,0), (4,0), (4,0), (4,0), (4,0), (4,0), (-1,0), (3,0), (4,0), (4,0), (4,0)], # inside triple-bracketed table reference (I don't think this exists)
 [(1,2), (0,2), (5,2), (0,0), (0,2), (0,5), ( 2,2), (0,2), (0,2), (0,2), (0,2)], # found left-paren; only wrap and insert if not empty, like =row()
 [(6,0), (6,0), (6,0), (6,0), (6,0), (6,0), ( 6,0), (6,0), (6,0), (0,0), (6,0)], # inside curly-braced array
]

Final result :

This is what my code is like at the end. I am too lazy to change the string concatenations for a better solution. Also, I'm still not quite happy with the multiple if statements but I have nothing better to suggest at the moment.

#!/usr/bin/python
# input_text = editor.getSelText()
input_text = '=HLOOKUP(TablePeriods[[#Headers],[YearMonth]],TablePeriods[[#All],[YearMonth]],MATCH([@Date],INDIRECT("TablePeriods[[#All],["&[@Account]&"]]"),-1))'
# initialize the state table
# event list and corresponding state table
event_list = [ "\"", "&", "(", ")", ",", "\n\r\t ", "[", "]", "{", "}" ] # last event is anything that doesn't match one of the other actions
state_table = [
 [(1,0), (0,1), (5,0), (0,3), (0,4), (0,5), ( 2,0), (0,0), (6,0), (0,0), (0,0)], # normal state
 [(0,0), (1,0), (1,0), (1,0), (1,0), (1,0), ( 1,0), (1,0), (1,0), (1,0), (1,0)], # double-quote comment
 [(2,0), (2,0), (2,0), (2,0), (2,0), (2,0), ( 3,0), (0,0), (2,0), (2,0), (2,0)], # inside bracketed table reference
 [(3,0), (3,0), (3,0), (3,0), (3,0), (3,0), ( 4,0), (2,0), (3,0), (3,0), (3,0)], # inside double-bracketed table reference
 [(4,0), (4,0), (4,0), (4,0), (4,0), (4,0), (-1,0), (3,0), (4,0), (4,0), (4,0)], # inside triple-bracketed table reference (I don't think this exists)
 [(1,2), (0,2), (5,2), (0,0), (0,2), (0,5), ( 2,2), (0,2), (0,2), (0,2), (0,2)], # found left-paren; only wrap and insert if not empty, like =row()
 [(6,0), (6,0), (6,0), (6,0), (6,0), (6,0), ( 6,0), (6,0), (6,0), (0,0), (6,0)], # inside curly-braced array
]
# initialize the state, parenthesis depth, and output text
current_state = 0
paren_depth = 0
output_text = ""
# define the tab and new line characters
TAB_CHAR = "\t"
NEW_LINE = "\r\n"
for z in input_text:
 for i,event in enumerate(event_list):
 if z in event:
 break
 else:
 i = -1
 current_state, takeaction = state_table[current_state][i]
 if current_state == -1:
 current_state = 5 # -1 is an error and should be handled with some sort of error operation
 if takeaction == 0: # just pass the characters along
 output_text += z
 elif takeaction == 1: # place character at same depth as only one on line
 output_text += NEW_LINE + TAB_CHAR*paren_depth + z + NEW_LINE + TAB_CHAR*paren_depth
 elif takeaction == 2: # previous character was a left paren and this one is not a right-paren
 paren_depth = paren_depth + 1
 output_text += NEW_LINE + TAB_CHAR*paren_depth + z
 elif takeaction == 3: # character is a right paren not immediately following a left paren
 paren_depth = paren_depth - 1
 output_text += NEW_LINE + TAB_CHAR*paren_depth + z
 elif takeaction == 4: # character is a comma, so start a new line at the paren depth
 output_text += z + NEW_LINE + TAB_CHAR*paren_depth
 else:
 assert(takeaction == 5) # strip whitespace from the input outside of quotes
 pass # should raise error, since the state table includes more actions than are defined
# editor.replaceSel(output_text)
print(output_text)
Janne Karila
10.6k21 silver badges34 bronze badges
answered Mar 19, 2014 at 8:01
\$\endgroup\$
4
  • \$\begingroup\$ This is terrific. Thank you! enumerate() was definitely something I've been looking for, but didn't know how to find it (or even that it existed). Also, I should probably think of tuples more. On that note: shouldn't the whole state_table become nested tuples? Also the event_list? \$\endgroup\$ Commented Mar 19, 2014 at 13:41
  • \$\begingroup\$ Reusing Raymond Hettinger, I usually try to go for "Loopy lists and structy tuples". In your case, state_table and event_list are things you are iterating on so list makes perfect sense to me. \$\endgroup\$ Commented Mar 19, 2014 at 13:49
  • \$\begingroup\$ Hmm, perhaps, but on the other hand, my state_table should not be altered, so the immutable quality of tuples is perhaps more appropriate for the whole thing. Either way, I've learned several valuable lessons here. \$\endgroup\$ Commented Mar 19, 2014 at 14:25
  • \$\begingroup\$ As per stackoverflow.com/questions/1708510/… different people will give you different answers, some based on mutability, some based on the way you want to use your container (position has relevance). \$\endgroup\$ Commented Mar 19, 2014 at 14:58

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.