I have a pattern that I have been using for a few lookup functions that I feel could be done in a better or more pythonic way. I am looking for guidance on how the following function could be improved. The list of patterns that I have for each type of lookup can be long - I have shortened it here but it is usually 50 - 100 patterns in each.
The goal of this function is to take an arg status
and return back a string indicating which list it matched.
I use this as such:
... process csv data normalize_license_status("Operating Approved") --> active normalize_license_status("Pending") --> inactive
and if this matched the "active" list or expression I expect to get back "active".
def _normalize_license_status(self,status,pattern):
logger.debug(f'Searching {status} for pattern {pattern}')
if re.search(rf"\b(?=\w){pattern}\b(?!\w)", status, re.IGNORECASE):
return True
def normalize_license_status(self,status):
# Status Patterns
active = "Operating|Established|Active|About to Expire"
inactive = "Returned|Void|Pending-Inspection|Pending"
missing = "Not Available|not available|Not Provided|not provided"
# Strip off leading and training spaces
status = status.strip()
# Set default status
try:
## Inactive
if self._normalize_license_status(status,inactive):
result = 'Inactive'
logger.debug(f'Matched {status} to {result}')
# Active
elif self._normalize_license_status(status,active):
result = 'Active'
logger.debug(f'Matched {status} to {result}')
## Missing or Not Provided or Not Available
elif self._normalize_license_status(status,missing):
result = 'Not Available'
logger.debug(f'Matched {status} to {result}')
## No Match
else:
# Should only get here when a license status is not matched
raise ValueError(f'Invalid status {status}. Could not find a match.')
except ValueError as value_err:
logger.error(value_err)
except Exception as e:
logger.error(e)
else:
logger.debug(f'Normalized status {status} as {result}')
return result
My thought is that there is a better way to do this. I have several functions like this that verify a string against a list of patterns and I am looking to understand how to make this better.
-
2\$\begingroup\$ Welcome to Code Review! I changed the title so that it describes what the code does per site goals: "State what your code does in your title, not your main concerns about it.". Please check that I haven't misrepresented your code, and correct it if I have. \$\endgroup\$Toby Speight– Toby Speight2022年02月11日 16:45:02 +00:00Commented Feb 11, 2022 at 16:45
2 Answers 2
Discussion
If I'm understanding this correctly, you're trying to take map a set of known strings to one of three license states (active
, inactive
, or missing
)?
If so, it feels like representing the data as enum
s seems the most natural.
The license states themselves feel like enum
s - in your implementation you're returning one of three strings.
The inputs feel like fixed values themselves (maybe I'm wrong, maybe they are highly dynamic), but if they are relatively fixed / there is a known set of the input values, they should be categorized as enum
s as well.
Proposed Implementation
from enum import Enum, auto
class LicenseState(Enum):
ACTIVE = auto()
INACTIVE = auto()
MISSING = auto()
class LicenseDescription(Enum):
OPERATING_APPROVED = auto()
PENDING = auto()
license_states_by_license_description = {
LicenseDescription.OPERATING_APPROVED: LicenseState.ACTIVE,
LicenseDescription.PENDING: LicenseState.INACTIVE
}
def calculate_license_state(license_description):
license_state = license_states_by_license_description.get(license_description, None)
if license_state is None:
raise ValueError(f"Unknown license state for license description: {license_description}")
return license_state
print(calculate_license_state(LicenseDescription.OPERATING_APPROVED)) // LicenseStatus.ACTIVE
print(calculate_license_state(None)) // ValueError
Proposed Implementation Discussion
I'll touch more on this in the review of your proposed implementation, but I think a shortcoming of the your proposed implementation is it depends on string parsing for analyzing which state a license description is in, which feels a little brittle to me.
The function inputs seem like they should be enumerable, and in my opinion, this categorization logic would be much easier to maintain if they were enumerated. As it stands, you're effectively enumerating some subset of the license descriptions by using the pipe-delimited approach.
Worse-case you could at least split the pipe-delimited strings into a set of strings and map them to the appropriate license status like
identification_patterns_by_license_status = {
LicenseStatus.ACTIVE: ('Operating', 'Established'),
// etc
}
And then iterate over the different LicenseStatus
es in some order, iterating over each element of the set and seeing if there is a match against the input string.
Code Review
Naming
Input Argument Name
The input argument to normalize_license_status
is named status
but to me, the return string
seems to be more "status"-like. "Active", "Inactive", "Not Available" seem to describe the state of a license than "Operating Approved". To me, the input seems more like a license description, or a license status description?
Helper Method
_normalize_license_status
is supposed to return a boolean
. Generally speaking, I like to name methods that return a boolean is_X
or has_X
. So in this case maybe something like is_license_description_contained_in_license_patterns
. Also, I think it's generally a good practice that if you return True
in one branch, you should explicitly return False
in the other branch and not return None
- or even simpler
return re.search(rf"\b(?=\w){pattern}\b(?!\w)", status, re.IGNORECASE)
General Thoughts
Each time the method is called, you are reallocating the same strings to the active
, inactive
, and missing
variables - these seem like they should be class variables.
The string parsing approach seems very brittle. Imagine the case where "Foo" is both in the set of active
patterns and inactive
patterns. Then you can never change the order of evaluating the inactive
patterns before evaluating the active
patterns or else the behavior of the method might change.
The logic has a lot of repetition. Inside the try
you are effectively testing each string pattern and then assigning a result if a test succeeds. Imagine something like this:
result = next(
case_and_result[1]
for case_and_result in
[(active, "Active"), (inactive, "Inactive), (missing, "Not Available)]
if _normalize_license_status(status, case_and_result[0]),
None
)
if result is None:
raise ValueError()
Logging & Exception Handling
I'm seeing a lot of logger
method invocations. There's nothing wrong with using a logger for debugging, but to have logger
methods in (theoretically) production code seems like a code smell - specifically, it smells like there aren't sufficient unit tests written to test the places that have logger
method invocations.
I don't have much context on the overall class, but I don't think this method should swallow errors. The caller should be in charge of how it wants to handle any errors that are thrown by this method.
While your current use-case might work by logging errors and swallowing them, if some other person joined your team, it would not be obvious (or expected) that any errors raised by this method would be swallowed.
-
1\$\begingroup\$ The implication that logging is inappropriate for production code is at odds with industry practice. \$\endgroup\$FMc– FMc2022年02月12日 15:09:53 +00:00Commented Feb 12, 2022 at 15:09
-
\$\begingroup\$ @FMc you're right that production code can have logging. I have mostly seen usages for error handling and telemetry. A lot of the log statements seem to be around validating logical behavior - i.e. if a certain output was produced - which seems more inline with a unit test of some kind rather than a log statement, imho. \$\endgroup\$Jae Bradley– Jae Bradley2022年02月12日 15:51:35 +00:00Commented Feb 12, 2022 at 15:51
-
\$\begingroup\$ @jaeBradley - ty for the very detailed response. This is my first time posting on this exchange and i was not certain what to expect. I am reviewing your recommendations above and am realizing that i have never used Enums in the past. I am testing various scenarios with your code sample. I am curious about the logging you mentioned. This process is part of a much larger internal ETL tool and I thought i understood logging properly and have added logging so that i can find specific issues when the pop-up. I do not have any unit testing setup at this time and am using logging to verify \$\endgroup\$Isi Lawson– Isi Lawson2022年02月14日 18:06:57 +00:00Commented Feb 14, 2022 at 18:06
I presume that there is a one-to-one mapping of input strings to normalized status. That is, the same input string can't map to more than one normalized status. If that is the case, just use a dict to map input strings to the normalized status strings?
So that you don't need to type 'Active'/'Inactive'/'Missing' 100 times, start with a dict mapping normalized status to the corresponding input strings. Update this dict whenever a user comes up with a new way to describe the license status in the CSV file. Make it easy to read and update.
_status_map = {
'Active':"""
about to expire
active
established
operating
""",
'Inactive':"""
pending
pending-inspection
returned
void
""",
'Missing':"""
not available
not provided
""",
}
But, we need a map from input strings to a normalized status string:
license_status_map = {value:key for key,value in _status_map.items()
for value in values.strip().splitlines()}
The normalizing routing should put the input string into some standard form. Here it is just lowercased, but consider replacing punctuation with spaces and collapsing a adjacent whitespace into one space. Then simply look up the input string in the license_status_map
.
def normalize_license_status(self, status):
status = status.strip().tolower()
result = license_status_map.get(status, None)
if result is None:
logger.error(f'Unmatched {status}')
return None # default_status?
logger.debug(f'Matched {status} to {result}')
return status
Note: 'Active'/'Inactive'/'Missing' can be replaced by enums
if that fits your use case.
-
\$\begingroup\$ I think it's a many-to-one mapping, but the same observations apply. \$\endgroup\$Toby Speight– Toby Speight2022年02月15日 20:33:24 +00:00Commented Feb 15, 2022 at 20:33