I have built a data pipeline in Python (3.12). It typically processes hundreds of thousands of files and runs for several days. A particular pipeline function calls external web APIs that sometimes fail. I expect some exceptions, and I want to record them. However, if the exception frequency is above a threshold, I do want to abort immediately.
I implemented an exception class that I use to decorate the function above.
Implementation:
class ExceptionMonitor:
""" Monitor the number of exceptions raised and the number of calls made to a function. """
def __init__(self, error_rate_threshold: float, min_calls: int):
"""
:param error_rate_threshold: The error rate threshold above which an exception is raised.
:param min_calls: The minimum number of calls before the error rate is calculated.
"""
self.error_rate_threshold = error_rate_threshold
self.min_calls = min_calls
self.nb_exceptions = 0
self.nb_calls = 0
self.error_rate = 0
def _save_exception(self):
with open("exceptions.log", "a") as f:
f.write(f"{datetime.now()}\n")
f.write(traceback.format_exc())
f.write("\n\n")
def __call__(self, func):
@functools.wraps(func)
def wrapper(*args, **kwargs):
try:
return func(*args, **kwargs)
except Exception:
self.nb_exceptions += 1
if self.nb_calls >= self.min_calls:
self.error_rate = self.nb_exceptions / self.nb_calls
if self.error_rate > self.error_rate_threshold:
raise
self._save_exception()
finally:
self.nb_calls += 1
return wrapper
Usage:
@ExceptionMonitor(error_rate_threshold=0.03, min_calls=100)
def some_func():
call_web_api()
Is this code idiomatic?
4 Answers 4
Your code looks fine to me.
For some nitpicks:
- pass the exception to
_save_exception
to ensure you're interacting with the correct error, and - pass the path to the log file to the class, or change to a
logging.Logger
, to make logging to a different location easier to change.
It's likely to be irrelevant for most parameters, but I think that self.nb_calls += 1
belongs in the try
block before calling the function rather than a finally
block after.
To take a small example, consider
@ExceptionMonitor(error_rate_threshold=0.5, min_calls=2)
and function results like [success, success, exception]
. I would argue that this does not clear the 50% threshold, however the code as written would say that it does.
This would also address @toolic's note about division by zero.
Since the code does a division calculation:
self.error_rate = self.nb_exceptions / self.nb_calls
it would be good to add a check that the divisor (nb_calls
) is not 0. It is initialized to 0 by __init__
:
self.nb_calls = 0
You might want to check that min_calls
is greater than 0 as well, unless you want to support that case. If you do support 0, it might be worth explicitly documenting that.
-
\$\begingroup\$ Checking
nb_calls
is unnecessary as long asmin_calls > 0
, there is already a>=
check. It would be better to only checkmin_calls
in constructor IMO. \$\endgroup\$STerliakov– STerliakov2025年04月04日 13:12:49 +00:00Commented Apr 4 at 13:12
One issue that I identified later:
The error rate calculation is cumulative, so error bursts will be missed. For example, if the error threshold is 0.05 (5%) and min_calls
is 10 and there are 8 consecutive exceptions from the 20th to the 27th call, the cumulative error rate will be 8/27 which is around 3%. Although technically correct, it's more useful to calculate the error rate within a sliding window, like this:
class ExceptionMonitor:
"""Monitor the number of exceptions raised and the number of calls made to a function."""
def __init__(
self, error_rate_threshold: float, min_calls: int, window_sz: int = 100
):
"""
:param error_rate_threshold: The error rate threshold in [0,1), above which an exception is raised.
:param min_calls: The minimum number of calls before the error rate is calculated.
:param window_sz: The size of the sliding window for calculating exception rate.
"""
self.error_rate_threshold = error_rate_threshold
self.min_calls = min_calls
self.exceptions_log = "exceptions.log"
if os.path.exists(self.exceptions_log):
os.remove(self.exceptions_log)
if window_sz < min_calls:
raise ValueError(
f"Window size {window_sz} must be greater than or equal to min_calls {min_calls}."
)
self.window_sz = window_sz
self.recent_outcomes = deque(maxlen=window_sz)
def _error_rate_exceeded(self):
if len(self.recent_outcomes) < self.min_calls:
return False
return (
sum(self.recent_outcomes) / len(self.recent_outcomes)
) > self.error_rate_threshold
def _save_exception(self):
with open("exceptions.log", "a") as f:
f.write(f"{datetime.now()}\n")
f.write(traceback.format_exc())
f.write("\n\n")
def __call__(self, func):
@functools.wraps(func)
def wrapper(*args, **kwargs):
try:
result = func(*args, **kwargs)
self.recent_outcomes.append(False)
return result
except Exception:
self.recent_outcomes.append(True)
if self._error_rate_exceeded():
raise
self._save_exception()
return wrapper
-
\$\begingroup\$ Nice self-answer +1 \$\endgroup\$2025年04月12日 10:32:29 +00:00Commented Apr 12 at 10:32
nb
stand for innb_calls
ornb_exceptions
? Notebook? \$\endgroup\$