Pandas likes to throw cryptic errors when you feed its functions an empty DataFrame, saying nothing that would help you identify the root cause. To avoid this, I used to write conditions like this one all over the place:
def normalize_null(data: pd.DataFrame) -> pd.DataFrame:
    """Replaces pd.NaT garbage with None."""
    if data.empty:
        return pd.DataFrame()
    return data.replace({pd.NaT: None})
I don't even remember whether this particular operation fails on an empty DataFrame, but many do, so I just add the guard to every function just in case.
Then I thought: why make it so ugly? Make it a decorator! So now I do this:
@skip_if_empty(default=pd.DataFrame())
def normalize_null(data: pd.DataFrame) -> pd.DataFrame:
    """Replaces pd.NaT garbage with None."""
    return data.replace({pd.NaT: None})
with this decorator:
import inspect
from typing import Any

import pandas as pd

def skip_if_empty(default: Any | None):
    def decorate(decoratee):
        def decorator(*decoratee_args, **decoratee_kwargs):
            if len(decoratee_args) < 1:
                raise ValueError("The decorated function must have at least one argument.")
            if type(decoratee_args[0]) is not pd.DataFrame:
                raise ValueError("The first argument must be a DataFrame.")
            return default if decoratee_args[0].empty else decoratee(*decoratee_args, **decoratee_kwargs)
        decorator.__signature__ = inspect.signature(decoratee)
        return decorator
    return decorate
Now, when I chain several such functions that modify a DataFrame, I don't have to worry about any of them receiving an empty one:
data = normalize_null(data)
data = do_something_else(data)
data = ...
Would you say it's OK to use decorators this way, or maybe there is something else about it that could be improved?
Answer
functools.wraps
The triple-nested function confused me for a minute, looking at it as a stranger, particularly as the decorator is not documented. You may want to look into functools.wraps, a convenience function that does what you want, preserving the signature, docstring, name, etc., while being a standardised method.
Names
To me, default doesn't immediately tell me what it's doing; default_return would be more obvious. Also, skip_if_empty implies that this function would work for any type, but it is explicitly limited to pd.DataFrames by:

if type(decoratee_args[0]) is not pd.DataFrame

Perhaps skip_if_empty_df might be clearer. Either that, or you could make this more generically useful by allowing a second argument of permitted types:
def skip_if_empty(default: Any | None, allowed_types: tuple = (pd.DataFrame,)):
    ...
    if type(decoratee_args[0]) not in allowed_types:
Pythonic
if len(decoratee_args) < 1

Unless len(tuple) is liable to return a negative number, what you're actually asking is:

if len(decoratee_args) == 0

or, more pythonically:

if not decoratee_args
Likewise,

if type(decoratee_args[0]) is not pd.DataFrame

might want to be:

if not isinstance(decoratee_args[0], pd.DataFrame)

which would also allow subclasses of pd.DataFrame to work.
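The difference is easy to demonstrate with a hypothetical DataFrame subclass: the exact-type check rejects it, while isinstance accepts it:

```python
import pandas as pd

class AuditedFrame(pd.DataFrame):
    """A hypothetical DataFrame subclass."""

df = AuditedFrame({"a": [1]})

print(type(df) is pd.DataFrame)      # False: exact-type check rejects subclasses
print(isinstance(df, pd.DataFrame))  # True: isinstance accepts them
```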
Error messages
As a user, I would (and should) be unaware that any of the functions are in any way decorated, so seeing:
raise ValueError("The decorated function must have at least one argument.")
is unhelpful. You might instead want:
raise ValueError(f"{decoratee.__name__} must have at least one argument.")
- I like that, and I have a question about allowed_types: tuple = (pd.DataFrame,). Would this also work with a list, or are tuples preferable in cases like this? I guess probably because they are immutable, right? – t3chb0t, Mar 29, 2023
- It would work with a list, and type annotations are just hints (by default). I would argue a tuple makes most sense here due to immutability and the fact that you will be defining it as a programmer rather than a user, but it could be any collection that supports in semantics. isinstance also recommends tuples. – DeathIncarnate, Mar 29, 2023