Multiple parsers for keywords in strings

Question 1

I'm writing some code that parses strings, often using simple keywords. On parsing, the code performs various actions, such a printing a response, running functions, etc., and it keeps track of whether it was able to respond.

I am actually using multiple parsers and have illustrated this in the code shown below.

What would be better ways to structure this code, particularly with a mind to scalability and code compactness? For example, imagine many more parsers being added that operate on principles more complex than simple keyword-spotting.

Minimal working example (MWE):

#!/usr/bin/python
import json
import os
import requests
import subprocess
import sys
def main():
 message = "how are you"
 #message = "ip address"
 #message = "restart"
 triggered = [
 parse_1(message = message),
 parse_2(message = message)
 ]
 if not any(triggered):
 report_help()
def parse_1(
 message = None
 ):
 def keyphrases_text_response(
 message = None,
 keyphrases = None,
 response = None
 ):
 if any(pattern in message for pattern in keyphrases):
 print(response)
 return True
 else:
 return False
 triggered = [
 keyphrases_text_response(
 message = message,
 keyphrases = [
 "image"
 ],
 response = "http://i.imgur.com/MiqrlTh.jpg"
 ),
 keyphrases_text_response(
 message = message,
 keyphrases = [
 "sup",
 "hi"
 ],
 response = "sup home bean"
 ),
 keyphrases_text_response(
 message = message,
 keyphrases = [
 "how are you",
 "are you well",
 "status"
 ],
 response = "nae bad fam"
 ),
 keyphrases_text_response(
 message = message,
 keyphrases = [
 "help",
 "what can you do"
 ],
 response = "I can report my IP address and I can restart my script."
 )
 ]
 if any(triggered):
 return True
 else:
 return False
def parse_2(
 message = None
 ):
 triggered = []
 if any(pattern in message for pattern in\
 [
 "IP",
 "I.P.",
 "IP address",
 "I.P. address",
 "ip address"
 ]
 ):
 triggered.append(True)
 report_IP()
 if any(pattern in message for pattern in\
 [
 "restart"
 ]
 ):
 triggered.append(True)
 restart()
 if any(pattern in message for pattern in\
 [
 "SSH",
 "reverse"
 ]
 ):
 triggered.append(True)
 engage_command(
 command = "ssh -R 10000:localhost:22 www.sern.ch",
 background = True
 )
 if any(triggered):
 return True
 else:
 return False
def report_IP(
 contact = None,
 country = True
 ):
 IP = "unknown"
 try:
 data_IP_website = requests.get("http://ipinfo.io/json")
 data_IP = data_IP_website.json()
 IP = data_IP["ip"]
 country = data_IP["country"]
 except:
 pass
 text = "IP address: " + IP
 if country:
 text = text + " (" + country + ")"
 print(text)
def restart():
 print("restart! (and I'm in a crazy loop!)")
 import __main__
 os.execv(__main__.__file__, sys.argv)
def report_help():
 print("I can report my IP address, I can restart my script and I can run commands.")
def engage_command(
 command = None,
 background = False
 ):
 print("engage command: {command}".format(command = command))
 if not background:
 process = subprocess.Popen(
 [command],
 shell = True,
 executable = "/bin/bash"
 )
 process.wait()
 output, errors = process.communicate()
 return output
 else:
 subprocess.Popen(
 [command],
 shell = True,
 executable = "/bin/bash"
 )
 return None
if __name__ == "__main__":
 main()

Question 2

I'd recommend checking out pep8 and flake8 to help improve readability of your code.

Question 3

Repetition

You have a huge number of clauses of the form if any(pattern in message for pattern in somelist):

keyphrases_text_response does a good job of reducing that within parse_1, but you can adapt it to improve parse_2 as well.

def keyphrases_response(
 message = None,
 keyphrases = None,
 response = None
 ):
 if any(pattern in message for pattern in keyphrases):
 response()
 return True
 else:
 return False
def keyphrases_text_response(
 message = None,
 keyphrases = None,
 response = None
 ):
 return keyphrases_response(message, keyphrases, lambda: print(response))

This allows parse_2 to have a similar structure to parse_1.

def parse_2(
 message = None
 ):
 triggered = [
 keyphrases_response(
 message = message,
 keyphrases = [
 "IP",
 "I.P.",
 "IP address",
 "I.P. address",
 "ip address"
 ],
 response = reportIP
 ),
 keyphrases_response(
 message = message
 keyphrases = [
 "restart"
 ],
 response = restart
 ),
 keyphrases_response(
 message = message
 keyphrases = [
 "SSH",
 "reverse"
 ],
 response = lambda: engage_command(
 command = "ssh -R 10000:localhost:22 www.sern.ch",
 background = True
 )
 )
 ]
 return any(triggered)

Question 4

Efficiency

These separate if statements in the parse_2 function:

if any(pattern in message for pattern in\
if any(pattern in message for pattern in\
if any(pattern in message for pattern in\

should be combined into a single if/else statement:

if any(pattern in message for pattern in\
elif any(pattern in message for pattern in\
elif any(pattern in message for pattern in\

The checks are mutually exclusive. This makes the code more efficient since you don't have to perform the 2nd check if the first is true, etc. Also, this more clearly shows the intent of the code.

Layout

In some areas, there is too much wasted vertical whitespace. For example, the blank lines in this code are not needed:

if any(triggered):
 return True
else:
 return False

There is value in having more code on the screen at once to understand its overall structure. This is much more typical:

if any(triggered):
 return True
else:
 return False

In this particular case, these 5 lines can be simplified and replaced with the following one line:

return any(triggered)

Naming

The functions named parse_1 and parse_2 need more descriptive names. The "1" and "2" suffixes should be replaced with something more meaningful.

Documentation

The PEP 8 style guide recommends adding docstrings for functions. This would aid in understanding the parse_ functions.

Tools

You could run code development tools to automatically find some style issues with your code.

ruff

Remove unused import: `json`

Caleth Caleth 1,4529 silver badges11 bronze badges · Answer 1 · 2025-02-11 12:44:45Z

Repetition

You have a huge number of clauses of the form if any(pattern in message for pattern in somelist):

keyphrases_text_response does a good job of reducing that within parse_1, but you can adapt it to improve parse_2 as well.

def keyphrases_response(
 message = None,
 keyphrases = None,
 response = None
 ):
 if any(pattern in message for pattern in keyphrases):
 response()
 return True
 else:
 return False
def keyphrases_text_response(
 message = None,
 keyphrases = None,
 response = None
 ):
 return keyphrases_response(message, keyphrases, lambda: print(response))

This allows parse_2 to have a similar structure to parse_1.

def parse_2(
 message = None
 ):
 triggered = [
 keyphrases_response(
 message = message,
 keyphrases = [
 "IP",
 "I.P.",
 "IP address",
 "I.P. address",
 "ip address"
 ],
 response = reportIP
 ),
 keyphrases_response(
 message = message
 keyphrases = [
 "restart"
 ],
 response = restart
 ),
 keyphrases_response(
 message = message
 keyphrases = [
 "SSH",
 "reverse"
 ],
 response = lambda: engage_command(
 command = "ssh -R 10000:localhost:22 www.sern.ch",
 background = True
 )
 )
 ]
 return any(triggered)

toolic toolic 14.6k5 gold badges29 silver badges204 bronze badges · Answer 2 · 2025-02-11 12:12:54Z

Efficiency

These separate if statements in the parse_2 function:

if any(pattern in message for pattern in\
if any(pattern in message for pattern in\
if any(pattern in message for pattern in\

should be combined into a single if/else statement:

if any(pattern in message for pattern in\
elif any(pattern in message for pattern in\
elif any(pattern in message for pattern in\

The checks are mutually exclusive. This makes the code more efficient since you don't have to perform the 2nd check if the first is true, etc. Also, this more clearly shows the intent of the code.

Layout

In some areas, there is too much wasted vertical whitespace. For example, the blank lines in this code are not needed:

if any(triggered):
 return True
else:
 return False

There is value in having more code on the screen at once to understand its overall structure. This is much more typical:

if any(triggered):
 return True
else:
 return False

In this particular case, these 5 lines can be simplified and replaced with the following one line:

return any(triggered)

Naming

The functions named parse_1 and parse_2 need more descriptive names. The "1" and "2" suffixes should be replaced with something more meaningful.

Documentation

The PEP 8 style guide recommends adding docstrings for functions. This would aid in understanding the parse_ functions.

Tools

You could run code development tools to automatically find some style issues with your code.

ruff

Remove unused import: `json`

Stack Exchange Network

Multiple parsers for keywords in strings

2 Answers 2

Repetition

Efficiency

Layout

Naming

Documentation

Tools

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Multiple parsers for keywords in strings

2 Answers 2

Repetition

Efficiency

Layout

Naming

Documentation

Tools

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions