Clean up Python function with too many if/else statements

Question 1

I have a working function with too many if/else statements (pylint). The code works but it's not good. I try to improve it but I have some tunnelvision. Can someone help me to improve the code?

So depending on a list of stacks, an environment (prod, nonprod or all) and a list of account IDs there can be filtered on which stacks need to be updated.

This is an example of a stack dict:

{'AccountId': '123456789101', 'StackName': 'test-nonprod'}

function:

def filter_stacks(stacks, environment, account_ids):
 """todo"""
 non_prod_regex = re.compile("(nonprod|non-prod)")
 stacks_to_update = []
 if non_prod_regex.match(environment) and account_ids:
 for stack in stacks:
 if (
 non_prod_regex.search(stack["StackName"])
 and stack["AccountId"] in account_ids
 ):
 stacks_to_update.append(stack)
 elif non_prod_regex.match(environment) and not account_ids:
 for stack in stacks:
 if non_prod_regex.search(stack["StackName"]):
 stacks_to_update.append(stack)
 elif re.compile("all").match(environment) and account_ids:
 for stack in stacks:
 if stack["AccountId"] in account_ids:
 stacks_to_update.append(stack)
 elif re.compile("all").match(environment) and not account_ids:
 stacks_to_update = stacks
 elif not non_prod_regex.match(environment) and account_ids:
 for stack in stacks:
 if (
 not non_prod_regex.search(stack["StackName"])
 and stack["AccountId"] in account_ids
 ):
 stacks_to_update.append(stack)
 elif not non_prod_regex.match(environment) and not account_ids:
 for stack in stacks:
 if not non_prod_regex.search(stack["StackName"]):
 stacks_to_update.append(stack)
 else:
 print("No match found.")
 return stacks_to_update

Question 2

Also, regexes are the same every time the function is called. Make them static (i.e. global or attached to class object or something), not compile them every time.

Question 3

Without some more context, a review will be difficult. I think that the whole "non-production" logic is a non-starter and is indicative of suboptimal design.

Question 4

Welcome to Code Review! The current question title, which states your concerns about the code, is too general to be useful here. Please edit to the site standard, which is for the title to simply state the task accomplished by the code. Please see How to get the best value out of Code Review: Asking Questions for guidance on writing good question titles.

Question 5

@PavloSlavynskyy Whereas OP should follow your advice, strictly speaking compile them every time is not what actually happens since there is a regex compilation cache.

Question 6

@Reinderien thnx gone reading

Question 7

Simplifications

re.compile("(nonprod|non-prod)") is just a complicated way to write a badly performing regex for re.compile("non-?prod"), which does the same, but better.
extract re.compile("all") into a variable like you did with non_prod_regex, or better yet, do a string comparison: if environment == "all"... Achieves the same effect, but is quicker (because no regex involved).

A different design

An alterative to having that for-loop inside of the if-statements would be to have the if-statements inside of your loop. That can be made much cleaner by converting it to a list comprehension:

 is_nonprod = bool(non_prod_regex.match(environment))
 has_accounts = bool(account_ids)
 stacks_to_update = [stack for stack in stacks if (
 (environment == "all" or (is_nonprod == bool(non_prod_regex.match(stack["StackName"]))))
 and (not has_accounts or stack["AccountId"] in account_ids)
 )]

This design makes an interesting simplification: You have two dimensions of things you want to check: The environment of the stack and the account of the stack. Both of these dimensions impact the result set, but are independent of one another.

This allows us to simplify things a bit by noticing the repeating pattern:

if there are account_ids, the account needs to be checked, otherwise we don't care
The environment must either be "all" or the 'nonprod-status' of the stack must match the environment.

These two conditions must be fulfilled to include a stack in the result set. The first one is easier, it's just an implication \$A \Rightarrow s.aid \in A\$, which is equivalent to \$\neg A \lor s.aid \in A\$ (not account_ids or ...)

The second condition is equally simple to implement and combining the two conditions with and gets all the results filtered the way we need them to.

Question 8

re: the regex... i doubt it makes a difference honestly, since unless the engine in question is bad the regex probably gets compiled to an nfa/dfa. so the decision is to choose whichever one is more maintainable... and that really depends on the user.

Question 9

well actually, even better: if environment in ["nonprod", "non-prod"]: <a> elif environment in ["all"]: <b> else: c

Question 10

@somebody except for the fact that string interning is a thing, the compilation of a regex takes time and even if it is a DFA, following a pointered graph instead of comparing elements of an array is slower. Note that the if you are suggesting won't work well in a list comprehension and is IMO easier expressed using the XOR semantic

Question 11

"following a pointered graph"??? what exactly is that referring to? also imo the list comprehension doesn't look great... plus i wasn't replying to that section at all.

Question 12

A data-oriented function should return data or raise an exception, not print. If there are no matching stacks, just return an empty list or raise.

Establish a habit of defining regular expressions with raw strings. If you always define regex patterns with raw strings, you never have to waste time thinking about whether your pattern contains things like \n, \t, and so forth. This is one example of a preemptive strategy to avoid silly bugs. Efficient developers have many habits like this.

Instead of checking everything at once, apply multiple filtering passes. Your current strategy sets up all of the conditional logic branches based on the environment and the presence/absence of account_ids and then writes the complete loop for each branch. That's a lot of code and it imposes a high mental cost on the reader who must try to discern small differences between each branch. But that's not necessary. If you break the process down into phases, everything simplifies: first filter by environment, and then filter by account IDs. Notice also that this reorganization of the code has the benefit of laying the groundwork should you decide in the future to put environment-filtering and account-id-filtering in separate functions (for example, if the filtering logic in one or both becomes more complex).

Organize code in commented paragraphs. As illustrated below, a few simple comments -- especially if they focus on strategy or purpose -- can help future readers navigate the code. The most likely future reader is you.

import re
def filter_stacks(stacks, environment, account_ids):
 # Determine whether the environment is nonprod.
 nonprod_rgx = re.compile(r'non-?prod')
 env_is_nonprod = bool(nonprod_rgx.match(environment))
 # Filter based on environment.
 if re.compile(r'all').match(environment):
 to_update = stacks
 else:
 to_update = [
 s
 for s in stacks
 if env_is_nonprod is bool(nonprod_rgx.search(s['StackName']))
 ]
 # Optionally filter based on account_ids.
 if account_ids:
 return [s for s in to_update if s['AccountId'] in account_ids]
 else:
 return to_update

Question 13

According to your code the first optimization that came to my mind is to group your if statements by the variable accounts_ids like the next example

 if accounts_ids:
 if non_prod_regex.match(environment):
 do....
 elif re.compile("all").match(environment):
 do....
 else:
 do...
 else:
 if non_prod_regex.match(environment):
 do....
 elif re.compile("all").match(environment):
 stacks_to_update = stacks
 else:
 do....

That will make the code a bit more readable, however when there is multiple if statements my recommendation is that you use a Visitor pattern to solve the problem. Hope it helps to clarify

Question 14

As other answers already pointed out, you should not compile call re.compile several times, and there is no need to try to filter according to all filters at the same time.

I would go even further and suggest to use iterators, which are great tools in Python. It would allow you to write a single function per filter, and apply a bunch of function to each stack in a single iteration.

I would also suggest to type your code, which makes any review process easier.

Here is a suggestion that will work only in python 3.8+ (due to functools.cached_property usage):

import re
from functools import cached_property
from typing import (
 Any,
 Iterable,
 Iterator,
 List,
 Pattern,
 TypeVar,
 Mapping,
)
class Patterns:
 """Wrapper regex patterns into cached properties"""
 @cached_property
 def NON_PROD(self) -> Pattern[str]:
 return re.compile(r"non-?prod")
 @cached_property
 def ALL(self) -> Pattern[str]:
 return re.compile(r"all")
# Create a global instance that can be used accross the module
PATTERNS = Patterns()
def account_ids_filter(stack: Mapping[str, Any], account_ids: List[str]) -> bool:
 """Return True is stack account ID is present in list of accoutn IDs"""
 return stack["AccountId"] in account_ids
def environment_filter(stack: Mapping[str, Any], environment: str) -> bool:
 """Return True if stack matches environment else False"""
 env_is_non_prod = True if PATTERNS.NON_PROD.match(environment) else False
 return env_is_non_prod is bool(PATTERNS.NON_PROD.search(stack["StackName"]))
# Whatever type is going into filter_stacks will also come out from filter_stacks
StackT = TypeVar("StackT", bound=Mapping[str, Any])
def filter_stacks(
 stacks: Iterable[StackT],
 environment: str,
 account_ids: List[str],
) -> Iterator[StackT]:
 """Returns an iterator of filtered stacks"""
 # Return early in case "all" pattern matches
 if PATTERNS.ALL.match(environment):
 return iter(stacks)
 # Iterate through stacks
 for stack in stacks:
 # Filter using environment
 if not environment_filter(stack, environment):
 continue
 # Filter using account IDs
 if not account_ids_filter(stack, account_ids):
 continue
 yield stack
def get_filtered_stacks(
 stacks: Iterable[StackT], environment: str, account_ids: List[str]
) -> List[StackT]:
 """Returns a list of filtered stacks"""
 return list(filter_stacks(stacks, environment=environment, account_ids=account_ids))

Because we're working with iterators, the input stacks argument is iterated on once only.

Moreover, regexp patterns are stored as cached properties on a global variable so that they are generated lazily (on first call, and not at module import)

If you filters grow more complex, you can write more filter functions, unit test them easily, and call them in the filter_stacks function

Question 15

Your account_ids_filter function could be simplified. return True if x else False is equivalent to return x (assuming x is a boolean, which it is here).

Question 16

Pattern.match returns either None if there is no match, or an instance of re.Match. So it's definitely not a boolean. I could have done bool(...) instead of True if ... else False, but I guess it's a matter of style ?

Question 17

I’m talking about stack["AccountId"] in account_ids which is a Boolean

Question 18

You're right, my bad ! I fixed it

Vogel612 Vogel612 25.5k7 gold badges59 silver badges141 bronze badges · Accepted Answer · 2022-01-14 16:24:11Z

Simplifications

re.compile("(nonprod|non-prod)") is just a complicated way to write a badly performing regex for re.compile("non-?prod"), which does the same, but better.
extract re.compile("all") into a variable like you did with non_prod_regex, or better yet, do a string comparison: if environment == "all"... Achieves the same effect, but is quicker (because no regex involved).

A different design

An alterative to having that for-loop inside of the if-statements would be to have the if-statements inside of your loop. That can be made much cleaner by converting it to a list comprehension:

 is_nonprod = bool(non_prod_regex.match(environment))
 has_accounts = bool(account_ids)
 stacks_to_update = [stack for stack in stacks if (
 (environment == "all" or (is_nonprod == bool(non_prod_regex.match(stack["StackName"]))))
 and (not has_accounts or stack["AccountId"] in account_ids)
 )]

This design makes an interesting simplification: You have two dimensions of things you want to check: The environment of the stack and the account of the stack. Both of these dimensions impact the result set, but are independent of one another.

This allows us to simplify things a bit by noticing the repeating pattern:

if there are account_ids, the account needs to be checked, otherwise we don't care
The environment must either be "all" or the 'nonprod-status' of the stack must match the environment.

These two conditions must be fulfilled to include a stack in the result set. The first one is easier, it's just an implication \$A \Rightarrow s.aid \in A\$, which is equivalent to \$\neg A \lor s.aid \in A\$ (not account_ids or ...)

The second condition is equally simple to implement and combining the two conditions with and gets all the results filtered the way we need them to.

re: the regex... i doubt it makes a difference honestly, since unless the engine in question is bad the regex probably gets compiled to an nfa/dfa. so the decision is to choose whichever one is more maintainable... and that really depends on the user.
well actually, even better: if environment in ["nonprod", "non-prod"]: <a> elif environment in ["all"]: <b> else: c
@somebody except for the fact that string interning is a thing, the compilation of a regex takes time and even if it is a DFA, following a pointered graph instead of comparing elements of an array is slower. Note that the if you are suggesting won't work well in a list comprehension and is IMO easier expressed using the XOR semantic
"following a pointered graph"??? what exactly is that referring to? also imo the list comprehension doesn't look great... plus i wasn't replying to that section at all.

Stack Exchange Network

Clean up Python function with too many if/else statements

4 Answers 4

Simplifications

A different design

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Clean up Python function with too many if/else statements

4 Answers 4

Simplifications

A different design

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions