I've found the following code invaluable in helping me 'handle' None values, including "whitespace" strings that should be treated as None depending on the situation. I have been using this code for quite some time now:
class _MyUtils:

    def __init__(self):
        pass

    def _mynull(self, myval, myalt, mystrip=True, mynullstrings=["", "None"], mynuminstances=(int, float)):
        # if the value is None, return the alternative immediately.
        if myval is None:
            return myalt
        # if the value is a number, it is not None - so return the original
        elif isinstance(myval, mynuminstances):
            return myval
        # if the mystrip parameter is true, strip the original and test that
        else:
            if mystrip:
                testval = myval.strip()
            else:
                testval = myval
            # if mynullstrings are populated, check if the upper case of the
            # original value matches the upper case of any item in the list.
            # return the alternative if so.
            if len(mynullstrings) > 0:
                i = 0
                for ns in mynullstrings:
                    if ns.upper() == testval.upper():
                        i = i + 1
                        break
                if i > 0:
                    return myalt
                else:
                    return myval
            else:
                return myval


def main():
    x = _MyUtils()
    print(x._mynull(None, "alternative_value", True, [""]))


if __name__ == '__main__':
    main()
The code takes an input value, an alternative to return if the input is found to be None, a flag indicating whether to 'strip' the input during testing (if it is not a number), a list of string values to treat as equivalent to None, and a tuple of numeric types used to determine whether the input is numeric (and hence not None).
Essentially, many of the processes we run depend upon not having None values in the data being processed, whether those are lambda functions, custom table toolsets, etc. This code gives me the ability to handle None values predictably, but I am sure there is a better approach. Is there a more Pythonic way of doing this? How can this code be improved? How would others approach this problem?
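For illustration, here is roughly how the parameters behave with a few made-up inputs:

x = _MyUtils()
print(x._mynull(None, "n/a"))      # None -> "n/a"
print(x._mynull("   ", "n/a"))     # whitespace-only is stripped, then treated as null -> "n/a"
print(x._mynull("none", "n/a"))    # matches mynullstrings case-insensitively -> "n/a"
print(x._mynull(0, "n/a"))         # numbers are never treated as null -> 0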
- \$\begingroup\$ This kind of imprecision is a reason why JavaScript can be so brittle. JavaScript is very liberal in converting values to other types. It's better to be strict about types and the values you allow for them. An unexpected value is a bug; it should not be silently corrected. \$\endgroup\$ – usr, May 4, 2019 at 13:43
- \$\begingroup\$ The 'fault' lies not with me, but with the dataset I am being provided with for analysis. If that dataset is imperfect, I have only limited choices - I can effectively fix the data at source (using similar code) or create a toolset that works relatively predictably across multiple datasets, which was my intention here. Hope that clarifies my issue. \$\endgroup\$ – lb_so, May 5, 2019 at 9:56
- 1 \$\begingroup\$ I see. I thought this util class was supposed to be used in regular code. As a data import helper this is very useful and totally appropriate. Fix the data as it enters the system. \$\endgroup\$ – usr, May 5, 2019 at 10:05
4 Answers
Generally I don't think you should have a class for this functionality. There's no state and no particular meaning to a MyUtils object here. You can make this into a standalone function in whatever module you deem appropriate in your codebase.
I think this function as written is a code smell. It 1) doesn't cover a whole lot of types and 2) implies that where you're using it you're not going to have even a rough idea of what type of data you're expecting. In most cases you will have some idea, and even then it's not usually a good idea to do explicit type checking.
Where you're using this for numbers you can replace it with myval if myval is not None else mydefault.
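Unlike a bare or, this keeps falsy-but-valid values such as 0; for example (the variable name is made up):

count = raw_count if raw_count is not None else -1   # 0 stays 0; only None becomes -1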
A function like this may be more useful for strings, for which there is a wider range of essentially empty values. Perhaps something like this:
def safe_string(s, default="", blacklist=["None"]):
    if s is None or len(s.strip()) == 0:
        return default
    if s.upper() in [b.upper() for b in blacklist]:
        return default
    return s
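To show how that behaves, a few example calls (the input strings are arbitrary):

>>> safe_string(None, "n/a")
'n/a'
>>> safe_string("   ", "n/a")
'n/a'
>>> safe_string("none", "n/a")
'n/a'
>>> safe_string("some value", "n/a")
'some value'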
Apart from the "blacklist" feature, you can in many cases just use or to fall back to a "default" value if the first argument is falsy. Some examples:
>>> "foo" or "default"
'foo'
>>> "" or "default"
'default'
>>> None or "default"
'default'
And similar for numbers, lists, etc.
for x in list_that_could_be_none or []:
    print(x * (number_that_could_be_none or 0))
But note that any non-empty string is truthy (though you can still strip):
>>> " " or "default"
' '
>>> " ".strip() or "default"
'default'
This loop could be rewritten:
if len(mynullstrings) > 0:
    i = 0
    for ns in mynullstrings:
        if ns.upper() == testval.upper():
            i = i + 1
            break
    if i > 0:
        return myalt
    else:
        return myval
else:
    return myval
as:
if testval.upper() in [ns.upper() for ns in mynullstrings]:
    return myalt
else:
    return myval
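An equivalent variation, if you want to avoid building the intermediate list, is a generator expression with any():

if any(ns.upper() == testval.upper() for ns in mynullstrings):
    return myalt
else:
    return myval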
I would also rewrite this:
if mystrip:
    testval = myval.strip()
else:
    testval = myval
as:
if mystrip:
    myval = myval.strip()
and continue to use myval. This seems clearer to me.
Personally, I don't think prepending 'my' is a good style—variable names should be descriptive in and of themselves.
- 2 \$\begingroup\$ also note that str.casefold() is recommended for comparing strings. see for example stackoverflow.com/q/45745661/1358308 and stackoverflow.com/q/40348174/1358308 \$\endgroup\$ – Sam Mason, May 3, 2019 at 16:46
- \$\begingroup\$ I really like how you have crushed my multi-line loops into a far more pythonic version here. Thank you. \$\endgroup\$ – lb_so, May 5, 2019 at 10:00
Further to everything else that's been written, I find it's generally better for functions to raise an exception if the wrong data type is propagated; I'd therefore discourage use of code that special-cases things like your checks for ints and floats. I'd write the function as:
def replace_null(text, *, empty_is_null=True, strip=True, nulls=('NULL', 'None')):
    """Return None if text represents 'none', otherwise text with whitespace stripped."""
    if text is None:
        return None
    if strip:
        text = str.strip(text)
    if empty_is_null and not text:
        return None
    if str.casefold(text) in (s.casefold() for s in nulls):
        return None
    return text
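A few example calls to illustrate how it behaves (the input strings are arbitrary examples):

>>> print(replace_null('  NULL '))
None
>>> replace_null('  some value ')
'some value'
>>> replace_null('   ')                    # whitespace-only is treated as null
>>> replace_null('', empty_is_null=False)
''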
The asterisk (*) indicates keyword-only arguments (see PEP 3102), which I think would help future readers of the code. For example, I would probably have to look at the definition to determine what:
x = myobj._mynull(text, 'default', False)
does, especially the unqualified False, when compared to (assuming the above is saved in utils.py):
x = utils.replace_null(text, strip=False) or 'default'
which relies more on keyword arguments and standard Python semantics.
I've also added a small docstring, so that help(replace_null) works.