I've found the following code invaluable in helping me 'handle' None values, including "whitespace" strings that should be treated as None depending on the situation. I have been using this code for quite some time now:
class _MyUtils:

    def __init__(self):
        pass

    def _mynull(self, myval, myalt, mystrip=True, mynullstrings=["", "None"], mynuminstances=(int, float)):
        # if the value is None, return the alternative immediately.
        if myval is None:
            return myalt
        # if the value is a number, it is not None - so return the original
        elif isinstance(myval, mynuminstances):
            return myval
        # if the mystrip parameter is true, strip the original and test that
        else:
            if mystrip:
                testval = myval.strip()
            else:
                testval = myval
            # if mynullstrings are populated, check if the upper case of the
            # original value matches the upper case of any item in the list.
            # return the alternative if so.
            if len(mynullstrings) > 0:
                i = 0
                for ns in mynullstrings:
                    if ns.upper() == testval.upper():
                        i = i + 1
                        break
                if i > 0:
                    return myalt
                else:
                    return myval
            else:
                return myval


def main():
    x = _MyUtils()
    print(x._mynull(None, "alternative_value", True, [""]))


if __name__ == '__main__':
    main()
The code takes an input value, an alternative to return if the input is found to be None, a flag indicating whether to 'strip' the input during testing (if it is not a number), a list of string values to treat as equivalent to None, and a tuple of numeric types used to determine whether the input is numeric (and hence not None).
Essentially, many of the processes we run depend upon not having None values in the data being processed, whether those are lambda functions, custom table toolsets, etc. This code gives me the ability to handle None values predictably, but I am sure there is a better approach. Is there a more Pythonic way of doing this? How can this code be improved? How would others approach this problem?
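For illustration, here is roughly how the parameters behave with a few made-up inputs:

x = _MyUtils()
print(x._mynull(None, "n/a"))      # None -> "n/a"
print(x._mynull("   ", "n/a"))     # whitespace-only is stripped, then treated as null -> "n/a"
print(x._mynull("none", "n/a"))    # matches mynullstrings case-insensitively -> "n/a"
print(x._mynull(0, "n/a"))         # numbers are never treated as null -> 0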
- \$\begingroup\$ This kind of imprecision is a reason why JavaScript can be so brittle. JavaScript is very liberal in converting values to other types. It's better to be strict about types and the values you allow for them. An unexpected value is a bug; it should not be silently corrected. \$\endgroup\$ – usr, May 4, 2019 at 13:43
- \$\begingroup\$ The 'fault' lies not with me, but with the dataset I am being provided with for analysis. If that dataset is imperfect, I have only limited choices - I can effectively fix the data at source (using similar code) or create a toolset that works relatively predictably across multiple datasets, which was my intention here. Hope that clarifies my issue. \$\endgroup\$ – lb_so, May 5, 2019 at 9:56
- 1 \$\begingroup\$ I see. I thought this util class was supposed to be used in regular code. As a data import helper this is very useful and totally appropriate. Fix the data as it enters the system. \$\endgroup\$ – usr, May 5, 2019 at 10:05
4 Answers
Generally I don't think you should have a class for this functionality. There's no state and no particular meaning to a MyUtils object here. You can make this into a standalone function in whatever module you deem appropriate in your codebase.
I think this function as written is a code smell. It 1) doesn't cover a whole lot of types and 2) implies that where you're using it you're not going to have even a rough idea of what type of data you're expecting. In most cases you will have some idea, and even then it's not usually a good idea to do explicit type checking.
Where you're using this for numbers you can replace it with myval if myval is not None else mydefault.
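Unlike a bare or, this keeps falsy-but-valid values such as 0; for example (the variable name is made up):

count = raw_count if raw_count is not None else -1   # 0 stays 0; only None becomes -1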
A function like this may be more useful for strings, for which there is a wider range of essentially empty values. Perhaps something like this:
def safe_string(s, default="", blacklist=["None"]):
    if s is None or len(s.strip()) == 0:
        return default
    if s.upper() in [b.upper() for b in blacklist]:
        return default
    return s
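To show how that behaves, a few example calls (the input strings are arbitrary):

>>> safe_string(None, "n/a")
'n/a'
>>> safe_string("   ", "n/a")
'n/a'
>>> safe_string("none", "n/a")
'n/a'
>>> safe_string("some value", "n/a")
'some value'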
Apart from the "blacklist" feature, you can in many cases just use or to fall back to a "default" value if the first argument is falsy. Some examples:
>>> "foo" or "default"
'foo'
>>> "" or "default"
'default'
>>> None or "default"
'default'
And similar for numbers, lists, etc.
for x in list_that_could_be_none or []:
    print(x * (number_that_could_be_none or 0))
But note that any non-empty string is truthy (though you can still strip):
>>> " " or "default"
' '
>>> " ".strip() or "default"
'default'
This loop could be rewritten:
if len(mynullstrings) > 0:
    i = 0
    for ns in mynullstrings:
        if ns.upper() == testval.upper():
            i = i + 1
            break
    if i > 0:
        return myalt
    else:
        return myval
else:
    return myval
as:
if testval.upper() in [ns.upper() for ns in mynullstrings]:
    return myalt
else:
    return myval
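An equivalent variation, if you want to avoid building the intermediate list, is a generator expression with any():

if any(ns.upper() == testval.upper() for ns in mynullstrings):
    return myalt
else:
    return myval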
I would also rewrite this:
if mystrip:
    testval = myval.strip()
else:
    testval = myval
as:
if mystrip:
    myval = myval.strip()
and continue to use myval. This seems clearer to me.
Personally, I don't think prepending 'my' is a good style—variable names should be descriptive in and of themselves.
- 2 \$\begingroup\$ also note that str.casefold() is recommended for comparing strings. see for example stackoverflow.com/q/45745661/1358308 and stackoverflow.com/q/40348174/1358308 \$\endgroup\$ – Sam Mason, May 3, 2019 at 16:46
- \$\begingroup\$ I really like how you have crushed my multi-line loops into a far more pythonic version here. Thank you. \$\endgroup\$ – lb_so, May 5, 2019 at 10:00
Further to everything else that's been written, I find it's generally better for functions to raise an exception if the wrong data type is propagated; I'd therefore discourage use of code that special-cases things like your checks for ints and floats. I'd write the function as:
def replace_null(text, *, empty_is_null=True, strip=True, nulls=('NULL', 'None')):
    """Return None if text represents 'none', otherwise text with whitespace stripped."""
    if text is None:
        return None
    if strip:
        text = str.strip(text)
    if empty_is_null and not text:
        return None
    if str.casefold(text) in (s.casefold() for s in nulls):
        return None
    return text
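A few example calls to illustrate how it behaves (the input strings are arbitrary examples):

>>> print(replace_null('  NULL '))
None
>>> replace_null('  some value ')
'some value'
>>> replace_null('   ')                    # whitespace-only is treated as null
>>> replace_null('', empty_is_null=False)
''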
The asterisk (*) indicates keyword-only arguments (see PEP 3102), which I think would help future readers of the code. For example, I would probably have to look at the definition to determine what:
x = myobj._mynull(text, 'default', False)
does, especially the unqualified False, when compared to (assuming the above is saved in utils.py):
x = utils.replace_null(text, strip=False) or 'default'
which relies more on keyword arguments and standard Python semantics.
I've also added a small docstring, so that help(replace_null) works.