Avoiding regexps abuse in generation of statements to be executed

Question 1

I made a GPL project to read Shogi Western Notations that I made public in GitHub. The project works perfectly on Ubuntu. Please check README.md to check requirements.

All this code works quite well. The basic idea was fragmenting each notated move with a long regexp, which I think is the best way to optimize the process. First, I read notated games in plain text and store every move as an element into a list:

shogi_reader.py -> line 50

movs = game_data.movs.splitlines()

There are also two objects that control position of pieces to determine moves. They are defined in my project in the file managers.py. The classes are named coords_manager() and matrix_manager(). coords_manager class controls a list of coords for every kind of piece and controls the revertion of coords if R key is pressed. matrix_manager controls a matrix of boolean values to determine if a square is owned by a piece or not. lamesa is an instance of coords_manager(), which controls the arrays of coords of pieces and reverts the pieces' coords, which are a property of the class.

The program works in two steps:

1- It reads the notation and calculates how to store the data into a format revertible backwards for the user using matchobj.group(n). (shogi_reader.py, line 69:)

reg = re.compile('^\s*(\d+)\s*-\s*(\+?[P|L|N|S|G|K|R|B](\d[a-i])?)([-|x|*])(\d[a-i][=|\+]?)$')
error_cnt = 0
for move in movs:
 move_to_add = None
 kind = ''
 action = None
 promoting = False
 piece_respawn = ''
 piece_x = 0
 piece_y = 0
 frag = reg.match(move)
 destiny = [lamesa.coords_x[frag.group(5)[0]],lamesa.coords_y[frag.group(5)[1]]]
 destiny_h = frag.group(5)
 if len(frag.group(5)) > 2:
 if frag.group(5)[2] == '+':
 promoting = True
 if (len(frag.group(2)) > 1 and frag.group(2)[0] != '+') or (len(frag.group(2)) > 2 and frag.group(2)[0] == '+'):
 if frag.group(2)[0] == '+':
 kind = frag.group(2)[0:2]
 piece_x = lamesa.coords_x[frag.group(2)[2]]
 piece_y = lamesa.coords_y[frag.group(2)[3]]
 else:
 kind = frag.group(2)[0]
 piece_x = lamesa.coords_x[frag.group(2)[1]]
 piece_y = lamesa.coords_y[frag.group(2)[2]]
 else:
 kind = frag.group(2)
 if frag.group(4) == 'x':
 action = 1
 elif frag.group(4) == '*':
 action = 2
 elif frag.group(4) == '-':
 action = 0
 # [...]

2- The canvas is loaded, and the pieces are drawn into the shogi board. The notated game then can be played with arrows OR (A and D keys), and revert the board with R.

Once the second step begins, a list with the formatted moves is stored in history_pos list and then replayed with following functions assigned to arrow keys. This is the part of the algorythm where the regexp abuse takes place(Lines 84, 94 and 141):

funciones.py -> Line 65

def move_forward():
 global pos, history
 output = ''
 if pos < len(history):
 if history[pos][2] == 2:
 #In this case,
 # history[pos] structure -> [[kind_of_piece,counter_to_assign_id], destiny_coords, action, promoting, piece_respawn]
 statem = 'lamesa.lista_'+history[pos][0][0]+'['+(str)(history[pos][0][1])+']=['+(str)(lamesa.coords_x[history[pos][1][0]])+','+(str)(lamesa.coords_y[history[pos][1][1]])+']'
 exec statem
 output = 'lamesa.r'+history[pos][0][0]+' -= 1'
 else:
 # Standard case,
 # history[pos] structure -> [code_statement_to_select_piece, coords_sum, action, promoting, piece_respawn]
 statem = 'lamesa.'+history[pos][0]+'[0]+='+(str)(history[pos][1][0]*lamesa.reverted)
 exec statem
 statem = 'lamesa.'+history[pos][0]+'[1]+='+(str)(history[pos][1][1]*lamesa.reverted)
 exec statem
 if history[pos][3] == True: #Handle piece_promotion==True
 #TODO Simplify this task avoiding the regexp abuse
 prueba = re.match('^.*\[(.*)\]$', history[pos][0])
 statem = 'prueba2=lamesa.'+history[pos][0]
 exec statem
 statem = 'del lamesa.'+history[pos][0]
 exec statem
 prueba4 = re.match('^.*_(.*)\[.*\]$', history[pos][0])
 statem = 'lamesa.lista_s'+prueba4.group(1)+'['+prueba.group(1)+']=['+(str)(prueba2[0])+','+(str)(prueba2[1])+']'
 exec statem
 if history[pos][4] != '': #Piece captured that has to be respawn backwards
 # TODO Avoid regexp abuse
 frag = re.match('^(lista_(.*)\[.*\])=\[.*,.*\]$', history[pos][4])
 theobj = frag.group(1)
 piece = frag.group(2)
 if len(piece) == 3:
 piece = piece[1:]
 if piece[-1] == 'b':
 piece = piece[:-1]+'n'
 else:
 piece = piece[:-1]+'b'
 statem = 'del lamesa.'+theobj
 exec statem
 output = 'lamesa.r'+piece+' += 1'
 pos += 1
 previous_highlight(pos)
 return output

funciones.py -> Line: 129

def move_back():
 global pos, history
 output = ''
 if pos > 0:
 pos -= 1
 if history[pos][2] == 2:
 statem = 'del lamesa.lista_'+history[pos][0][0]+'['+(str)(history[pos][0][1])+']'
 exec statem
 output = 'lamesa.r'+history[pos][0][0]+' += 1'
 else:
 if history[pos][3] == True:
 #TODO Avoid regexp abuse
 prueba = re.match('^.*\[(.*)\]$', history[pos][0])
 prueba4 = re.match('^.*_(.*)\[.*\]$', history[pos][0])
 statem = 'prueba2=lamesa.lista_s'+prueba4.group(1)+'['+prueba.group(1)+']'
 exec statem
 statem = 'lamesa.'+history[pos][0]+'=['+(str)(prueba2[0])+','+(str)(prueba2[1])+']'
 exec statem
 statem = 'del lamesa.lista_s'+prueba4.group(1)+'['+prueba.group(1)+']'
 exec statem
 statem = 'lamesa.'+history[pos][0]+'[0]-='+(str)(history[pos][1][0]*lamesa.reverted)
 exec statem
 statem = 'lamesa.'+history[pos][0]+'[1]-='+(str)(history[pos][1][1]*lamesa.reverted)
 exec statem
 if history[pos][4] != '':
 frag = re.match('^lista_(.*)\[.*\]=\[.*,.*\]$', history[pos][4])
 piece = frag.group(1)
 if len(piece) == 3:
 piece = piece[1:]
 if piece[-1] == 'b':
 piece = piece[:-1]+'n'
 else:
 piece = piece[:-1]+'b'
 exec 'lamesa.'+history[pos][4]
 output = 'lamesa.r'+piece+' -= 1'
 previous_highlight(pos)
 return output

history list's elements have two different formats depending on the kind of move:

Normal move:

[code_statement_to_select_piece, coords_sum, action, promoting, piece_respawn]
Drop:

[[kind_of_piece,counter_to_assign_id], destiny_coords, action, promoting, piece_respawn]

When action==2, the structure of history[pos] elements is different in order to send data that is needed in order to revert the drop move. This transformation is made to obtain a state that can be calculated forward and backwards for the position and state of pieces:

The way I structured the algorythm forced me to make some abuse of regexps in these fragments of code.

There are the move_forward() and move_back() functions, which performs a change on the properties of coords_manager() and matrix_manager() objects which objective is to keep the pieces' state. I had to make some abuse of regexps in these lines to make the operations I needed on the state.

I think I should change the structure of objects inside history_pos list in order to prevent abuse of regexps, but I don't have good ideas for it.

Am I really abusing of regexps here? If so, which structure changes on objects inside history_pos list would be needed to prevent the abuse of regexps?

Question 2

In addition to the "regexp abuse" concern, you also have a problem with the use of exec. Actually, I would consider them both to be symptoms of a deeper problem: your modelling is underdeveloped.

Specifically, the problem is that your data are "stringly typed".

A better design would be for the parser to return a list of Move objects. Here's a rough idea:

class Move
 def __init__(self, from_coord, to_coord, promoted_to=None):
 self.from = from_coord
 self.to = to_coord
 self.piece = None
 self.captured_piece = None
 self.promoted_to = None
 def do_move(self, board):
 self.piece = board.get_piece(self.from_coord)
 self.captured_piece = board.get_piece(self.to_coord)
 if self.captured_piece is not None:
 board.remove_piece(self.to_coord)
 board.remove_piece(self.from_coord)
 board.put_piece(self.to_coord, self.promoted_to or self.piece)
 def undo_move(self, board):
 board.remove_piece(self.to_coord)
 board.put_piece(self.from_coord, self.piece)
 if self.captured_piece is not None:
 board.put_piece(self.to_coord, self.captured_piece)

The shogi_reader would return a list of these Move objects, and you can play and unplay the moves as necessary by calling do_move() or undo_move().

Question 3

Looks like a good way to prevent abuse of regexps and exec instruction. I'll check it out.

Question 4

There are four managers in managers.py: coords_manager(), sprites_manager(), matrix_manager() and input_manager(). The 1st and 3rd control the logic on squares and piece's coords, so making a Move class that handles it in a cleaner way may be factible. Late or soon, I will try it out.

Question 5

I encourage you to ask about coords_manager in a separate question.

Question 6

Just adding information about what I'm planning to do to perform a Move class. I'm not sure if you mean there's a problem with coords_manager()

Question 7

Follow-up question posted

Question 8

Considering just your regex...

reg = re.compile('^\s*(\d+)\s*-\s*(\+?[P|L|N|S|G|K|R|B](\d[a-i])?)([-|x|*])(\d[a-i][=|\+]?)$')

I see a few problems.

It's good practice to use r'raw strings' for any non-trivial regex, to prevent backslashes from having additional special meaning.
The use of character classes is incorrect. Inside a character class, the "or" is implicit. That is, [PLNSGKRB] means the same as (?:P|L|N|S|G|K|R|B).
Inside a character class, some characters, such as +, are taken literally. You don't need to put a backslash in front.

reg = re.compile(r'^\s*(\d+)\s*-\s*(\+?[PLNSGKRB](\d[a-i])?)([-x*])(\d[a-i][=+]?)$')

For readability, I suggest taking advantage of re.VERBOSE and (?P<name>...) groups.

reg = re.compile(r"""^\s*(?P<move_num>\d+)
 \s*-\s*(?P<piece>\+?[PLNSGKRB](?P<from>\d[a-i])?)
 (?P<action>[-x*])
 (?P<to>\d[a-i][=+]?)$""", re.VERBOSE)

Question 9

Thank you for the analysis. When I have somemore time I'll use this information to make the code more readable and test your VERBOSE version to see if it keeps accuracy.

Question 10

Anyway, the main issues with regexps abuse are in move_forward() and move_back() functions (funciones.py file)

score 3 · Accepted Answer · 2015-05-28 23:21:35Z

In addition to the "regexp abuse" concern, you also have a problem with the use of exec. Actually, I would consider them both to be symptoms of a deeper problem: your modelling is underdeveloped.

Specifically, the problem is that your data are "stringly typed".

A better design would be for the parser to return a list of Move objects. Here's a rough idea:

class Move
 def __init__(self, from_coord, to_coord, promoted_to=None):
 self.from = from_coord
 self.to = to_coord
 self.piece = None
 self.captured_piece = None
 self.promoted_to = None
 def do_move(self, board):
 self.piece = board.get_piece(self.from_coord)
 self.captured_piece = board.get_piece(self.to_coord)
 if self.captured_piece is not None:
 board.remove_piece(self.to_coord)
 board.remove_piece(self.from_coord)
 board.put_piece(self.to_coord, self.promoted_to or self.piece)
 def undo_move(self, board):
 board.remove_piece(self.to_coord)
 board.put_piece(self.from_coord, self.piece)
 if self.captured_piece is not None:
 board.put_piece(self.to_coord, self.captured_piece)

The shogi_reader would return a list of these Move objects, and you can play and unplay the moves as necessary by calling do_move() or undo_move().

Looks like a good way to prevent abuse of regexps and exec instruction. I'll check it out.
There are four managers in managers.py: coords_manager(), sprites_manager(), matrix_manager() and input_manager(). The 1st and 3rd control the logic on squares and piece's coords, so making a Move class that handles it in a cleaner way may be factible. Late or soon, I will try it out.
I encourage you to ask about coords_manager in a separate question.
Just adding information about what I'm planning to do to perform a Move class. I'm not sure if you mean there's a problem with coords_manager()

Stack Exchange Network

Avoiding regexps abuse in generation of statements to be executed

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Avoiding regexps abuse in generation of statements to be executed

2 Answers 2

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions