Generalization of any() function with switchable default parameter for empty iterables

Question 1

I have some use cases where I want the behaviour of the built-in any() function to be slightly different, namely to return True if the iterable is empty.

Therefore I came up with the following override:

_any = any  # Alias built-in any.
def any(iterable, *, default=False):
    """Generalization of the built-in any() function."""
    if not default:
        return _any(iterable)
    count = 0
    for count, item in enumerate(iterable, start=1):
        if item:
            return True
    return count == 0

The code I used to test this is:

EMPTY_TUPLE = ()
FALSEY_TUPLE = (False, None, 0, '')
TRUEY_TUPLE = (False, None, 1, '')
def empty_iterables():
    """Yields empty iterables."""
    yield list(EMPTY_TUPLE)
    yield EMPTY_TUPLE
    yield {item: item for item in EMPTY_TUPLE}
    yield set(EMPTY_TUPLE)
    yield (i for i in EMPTY_TUPLE)
def falsey_iterables():
    """Yields iterables that any() should assert as False."""
    yield list(FALSEY_TUPLE)
    yield FALSEY_TUPLE
    yield {item: item for item in FALSEY_TUPLE}
    yield set(FALSEY_TUPLE)
    yield (i for i in FALSEY_TUPLE)
def truey_iterables():
    """Yields iterables that any() should assert as True."""
    yield list(TRUEY_TUPLE)
    yield TRUEY_TUPLE
    yield {item: item for item in TRUEY_TUPLE}
    yield set(TRUEY_TUPLE)
    yield (i for i in TRUEY_TUPLE)
def test():
    """Tests the custom any() function."""
    # Built-in behaviour.
    for iterable in empty_iterables():
        assert any(iterable) is False
    for iterable in falsey_iterables():
        assert any(iterable) is False
    for iterable in truey_iterables():
        assert any(iterable) is True
    # Custom behaviour.
    for iterable in empty_iterables():
        assert any(iterable, default=True) is True
    for iterable in falsey_iterables():
        assert any(iterable, default=True) is False
    for iterable in truey_iterables():
        assert any(iterable, default=True) is True
if __name__ == '__main__':
    test()

I wonder whether there's a more elegant way to implement the custom any() function with the aforementioned behaviour.

Use case

I have a custom monitoring system that determines the state of a system using check records from a MariaDB database. I use peewee as an ORM:

def check_customer_system(system):
    """Returns the customer online check for the respective system."""
    end = datetime.now()
    start = end - CUSTOMER_INTERVAL
    condition = OnlineCheck.system == system
    condition &= OnlineCheck.timestamp >= start
    condition &= OnlineCheck.timestamp <= end
    query = OnlineCheck.select().where(condition)
    if query:
        return any(online_check.online for online_check in query)
    raise NotChecked(system, OnlineCheck)

In the current implementation, I need to check whether the query returns any records by doing if query:. However, this will execute the query and load all records into RAM, which I want to avoid. Peewee has the optional .iterator() method on queries, which returns an iterator that does not load all records into RAM at once. However, if I use that, if query: will always be True, since it then tests the truth value of a generator object. Hence, I came up with the idea of implementing a function that returns True if any value of an iterable is true or the iterable is empty.
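
The truth-value problem described above can be reproduced without a database. The following is a minimal sketch in which a plain generator expression stands in for the lazy iterator (an assumption for illustration; it is not Peewee's actual iterator object):

```python
records = []  # stand-in for an empty query result

lazy = (record for record in records)  # a generator object over the same data

print(bool(records))  # False: an empty list is false
print(bool(lazy))     # True: a generator object is always true, even when empty
print(any(lazy))      # False: there is nothing to iterate over
```

So a truth test on the lazy object says nothing about whether it will yield any records.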

And I just realized that this is indeed an XY problem, since I'd lose the ability to distinguish between a checked and a non-checked system. Thanks @Eric Duminil.

Question 2

@hjpotter92 To distinguish empty input from single-element input.

Question 3

You've misspelt truthy.

Question 4

It looks like an XY problem. We shouldn't be reviewing this code, because its specs are broken, and no implementation can save it. We should be looking at the code using your custom any, to see how the logic can be modified.

Question 5

@EricDuminil I clarified the use case.

Question 6

@RichardNeumann: Interesting, thanks. The question now becomes completely different, though. I'm not sure if it should be a new question, now that you've already got long answers.

Question 7

I really think this is a bad idea. A function called any returning True if there not only isn't anything true but even not anything at all... that's just wrong. It sure isn't a "generalization". Shadowing the built-in is not a good idea, either. Maybe just call yours any_or_empty (without a default)?

For the emptiness check, the counting seems like overkill and I think a Boolean is clearer (and probably faster, due to not going through enumerate):

    empty = True
    for item in iterable:
        if item:
            return True
        empty = False
    return empty

Or with a sentinel, to get rid of the assignment in the loop:

    item = sentinel = object()
    for item in iterable:
        if item:
            return True
    return item is sentinel

For best speed, you could just handle the first item yourself and then let the built-in handle the rest:

def any(iterable, *, default=False):
    """Modification of the built-in any() function."""
    it = iter(iterable)
    sentinel = object()
    first = next(it, sentinel)
    if first is sentinel:
        return default
    return bool(first) or _any(it)
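
Combining the renaming suggestion with the handle-the-first-item idea, a non-shadowing variant might look like the sketch below. The name any_or_empty comes from the suggestion above; the body is only one possible way to write it, and it calls builtins.any explicitly so it keeps working even where the name any has been rebound:

```python
import builtins


def any_or_empty(iterable):
    """Return True if the iterable is empty or contains at least one true item."""
    it = iter(iterable)
    for first in it:
        # Non-empty: decide on the first item, let the built-in scan the rest.
        return bool(first) or builtins.any(it)
    # The loop body never ran, so the iterable was empty.
    return True


print(any_or_empty([]))        # True
print(any_or_empty([0, '']))   # False
print(any_or_empty([0, 1]))    # True
```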

Your test code uses the words "truey" and "falsey". That's not Python terminology. See the Python docs, you'll never find those words. It's true and false, as you can for example see at if, Boolean operations and Truth Value Testing.

The name falsey_iterables is ambiguous if not misleading, as it doesn't yield any false iterables. They're all true. They just don't contain anything true. I'd suggest iterables_without_trues. And iterables_with_true instead of truey_iterables for consistency.

Did you have this idea because next, min and max offer a default parameter? In their cases it makes sense, as they rather need non-empty inputs. And meaningful defaults are possible there, for example min with default=float('inf') could be a useful default for the code using the result.
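
For comparison, here is a short sketch of how the default parameters of min and next behave on empty input; the variable names are purely illustrative:

```python
values = []

# Without a default, min() has no sensible answer for empty input and raises.
try:
    min(values)
except ValueError as exc:
    print('min([]) raised:', exc)

# With a default, the caller picks a value that suits the surrounding code.
smallest = min(values, default=float('inf'))
first = next(iter(values), None)

print(smallest)  # inf
print(first)     # None
```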

Question 8

Thanks for the review. Also thanks for the review of the testing code, though that was not what this review was about. I'll definitely change the name of the custom any(), since on second thought, a function called "any" should indeed never return True on an empty input.

Question 9

And now I'm curious why someone voted that "This answer is not useful"...

Question 10

You've got a recursive implementation. If you have an iterable of 100 false values, you'll use 100 stack frames, peeling off one value from the head of iterable each time. You probably wanted to call _any(it).
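
The effect described in this comment can be made visible with a small instrumented sketch. The function below deliberately calls itself where the shadowed name any would have resolved; the _depth parameter and the depths list are hypothetical instrumentation added only to count frames:

```python
def make_shadowing_any():
    """Build an any() look-alike that calls itself instead of the built-in."""
    depths = []  # records the recursion depth of every call

    def shadowing_any(iterable, *, default=False, _depth=1):
        depths.append(_depth)
        it = iter(iterable)
        sentinel = object()
        first = next(it, sentinel)
        if first is sentinel:
            return default
        # Calls itself, the way the shadowed name `any` would resolve.
        return bool(first) or shadowing_any(it, _depth=_depth + 1)

    return shadowing_any, depths


shadowing_any, depths = make_shadowing_any()
result = shadowing_any([False] * 100, default=True)
print(result, max(depths))  # False 101
```

Each call consumes one element, so 100 false values lead to 101 nested calls (the last one sees the exhausted iterator). With a few thousand elements this would exceed Python's default recursion limit.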

Question 11

@AJNeufeld Ah yes, thanks. As stated, the intention was to "let the built-in handle the rest". Fixed. Proves the point that we shouldn't shadow the built-in :-)

Question 12

Not overriding a core builtin and handing over the non-empty case to the builtin are both excellent suggestions for maintainability. Fewer surprises (for anyone already familiar with any) and less code overall.

Question 13

Any and All

If you are going to override any(...) to return a custom result when given an empty iterable, you should also override all(...) to return a custom result as well, for consistency.

Detecting the Empty Iterable

This code:

    count = 0
    for count, item in enumerate(iterable, start=1):
        if item:
            return True
    return count == 0

relies on count being reassigned to a non-zero value by the for count ... in enumerate(..., start=1) loop. This has two inefficiencies. First, it is repeatedly assigning values to count. Second, it creates and abandons int objects (once it has gone beyond the range of interned int values), which are only used for their "non equality to zero". A flag could eliminate this second inefficiency:

    is_empty = True
    for item in iterable:
        if item:
            return True
        is_empty = False
    return is_empty

But this still leaves the first inefficiency: repeated assignments to is_empty.

Better would be to extract the first value from the iterable. If that fails, then the iterable was empty.

_any = any
_all = all
def any(iterable, *, default=False):
    """Generalization of the built-in any() function."""
    if not isinstance(default, bool):
        raise TypeError("Default must be a boolean value")
    try:
        it = iter(iterable)
        return bool(next(it)) or _any(it)
    except StopIteration:
        return default
def all(iterable, *, default=True):
    """Generalization of the built-in all() function."""
    if not isinstance(default, bool):
        raise TypeError("Default must be a boolean value")
    try:
        it = iter(iterable)
        return bool(next(it)) and _all(it)
    except StopIteration:
        return default

Testing

The trio empty_iterables(), falsey_iterables() and truey_iterables() is confusing. It took me a while to determine that you were creating a series of different kinds of test iterables: a list, a tuple, a dict, a set, and a generator expression. Moreover, it is code duplication: all three functions generate the same things, just with different values.

This would be clearer:

from typing import Iterable, Any
def test_iterables_of(*values: Any) -> Iterable[Iterable[Any]]:
    """
    Generate various types of collections of the given values.
    Returns a list, tuple, dictionary, set, and generator expression,
    each containing all of the values.
    """
    yield list(values)
    yield tuple(values)
    yield dict((val, val) for val in values)
    yield set(values)
    yield (val for val in values)  # Generator

And you could use it like this:

    for iterable in test_iterables_of():
        assert any(iterable, default=True) is True
    for iterable in test_iterables_of(False, None, 0, ''):
        assert any(iterable, default=True) is False

Dictionary Key 0 / False

Just a note: Your dictionaries of FALSEY_TUPLE values have only 3 entries, not 4, because the key False and the key 0 are actually the same key.
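
This is easy to confirm with the tuple from the question; a minimal check, where as_dict and as_set are illustrative names:

```python
FALSEY_TUPLE = (False, None, 0, '')

as_dict = {item: item for item in FALSEY_TUPLE}
as_set = set(FALSEY_TUPLE)

# False and 0 compare equal and hash equal, so they collapse into one key.
print(False == 0, hash(False) == hash(0))            # True True
print(len(FALSEY_TUPLE), len(as_dict), len(as_set))  # 4 3 3
```

The same applies to the set, so those two test iterables exercise three values rather than four.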

Question 14

I'd say there was another inefficiency, the enumerate iterator being another layer on top of the underlying iterator (creating the ints is part of that, but not all). The assignment in the loop can btw be avoided rather easily, although I didn't think of it at first, either :-) (just added that to my answer)

Question 15

If any is now broken, the solution isn't to break all too for consistency, is it?

Question 16

@EricDuminil I'm not sure why you feel this is "breaking" any. The built-in function doesn't support keyword arguments; with no default= argument, the upgraded function behaves the same way as the built-in function. If this was submitted as a PEP to the design team, they might very well consider adding the default keyword to any() and all(). Where these upgrades can "break" is forward compatibility: if other keyword arguments were added in Python 3.9 or Python 4.x, this upgrade would break those. I agree the question update does make this whole mess an XY problem.
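
The claim that the built-ins accept no keyword arguments can be checked directly. In this sketch, keyword_error is a hypothetical helper that captures the resulting error message instead of letting the exception propagate:

```python
import builtins


def keyword_error(func):
    """Return the TypeError message from func([], default=True), or None."""
    try:
        func([], default=True)
    except TypeError as exc:
        return str(exc)
    return None


print(keyword_error(builtins.any))
print(keyword_error(builtins.all))
```

Both calls are rejected by the built-ins, so existing code cannot already be passing default= to them.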

Question 17

@AJNeufeld The way I see it, an any function which returns True for an empty set is broken by design, even with an extra argument, even if the implementation is correct and even if it doesn't break backward compatibility. The corresponding all should, I guess, return False for empty collections, which would contradict vacuous truth, among others. No amount of documentation could save the surprising bugs coming out of this design.

Question 18

We might have to agree to disagree, then. all(()) returning True is the expected value. The surprising value would be False, and I don't see how it would be clear for all((), default=True) to return False. Your cell phone power example is interesting, but a better function name should be chosen (any_or_empty was proposed in another answer, so a modified all could be called all_but_not_empty). ∃ and ∀ are so important and ubiquitous in math, if you insist on redefining them, you might as well say that 1 = 2.
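
The link between the two quantifiers that this comment alludes to can be spelled out as a small check. It is a sketch over a handful of sample tuples, not a proof:

```python
samples = [(), (False,), (True,), (False, True), (0, '', None)]

for xs in samples:
    # De Morgan duality: "some x is true" is "not every x is false", and vice versa.
    assert any(xs) == (not all(not x for x in xs))
    assert all(xs) == (not any(not x for x in xs))

print(any(()), all(()))  # False True
```

The duality holds for the empty tuple only because any(()) is False and all(()) is True; changing either empty-case result breaks it.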

Question 19

Benchmarks and a faster solution

The full benchmark code, including all the solutions, is at the end. Note that there is also an image further down that might be worth checking out before or while reading.

Empty iterable

First an empty iterable, as that's what motivated the question in the first place, which perhaps also suggests that it occurs non-negligibly often in what the OP is doing:

iterable = []    number = 2,000,000 (how often each function is run; times in seconds)
1.26 1.25 1.23 original
0.65 0.65 0.65 loop_bool
0.86 0.86 0.86 loop_sentinel
1.00 1.00 1.01 default_and_builtin
1.30 1.33 1.32 try_and_builtin
0.73 0.72 0.75 for_and_builtin
0.74 0.73 0.75 for_and_builtin2

The two simpler loop solutions are faster than the original, as they save the cost of enumerate creation. Slowest is the solution with try, as "catching an exception is expensive". The other solution using the built-in any instead of looping, the one trying next with a default and then checking that, is faster, but still faster is the for_and_builtin solution:

def any(iterable, *, default=False):
    it = iter(iterable)
    for first in it:
        return bool(first) or _any(it)
    return default

Its slightly less pretty variant for_and_builtin2 is equally fast, as the difference is only in the loop body, which doesn't come into play yet:

def any(iterable, *, default=False):
    it = iter(iterable)
    for first in it:
        return True if first else _any(it)
    return default

One-element iterables

iterable = [False]     number = 2,000,000     iterable = [True]
1.35 1.36 1.38    original               1.26 1.24 1.24
0.73 0.75 0.73    loop_bool              0.70 0.69 0.68
0.93 0.95 0.93    loop_sentinel          0.87 0.89 0.88
1.34 1.36 1.36    default_and_builtin    1.22 1.20 1.21
1.12 1.13 1.11    try_and_builtin        1.02 0.99 0.99
1.05 1.07 1.07    for_and_builtin        0.95 0.95 0.95
0.87 0.89 0.88    for_and_builtin2       0.79 0.76 0.79

All got slower, except try_and_builtin, which got faster since it doesn't pay the heavy price of catching an exception anymore. Still slightly slower than for_and_builtin, though. And for_and_builtin2 was impacted far less and remains the second-fastest solution.

Two elements

Not much change here.

iterable = [False, False]     number = 2,000,000     iterable = [False, True]
1.42 1.43 1.42    original               1.35 1.32 1.39
0.77 0.78 0.79    loop_bool              0.74 0.77 0.75
0.96 0.92 0.95    loop_sentinel          0.91 0.92 0.90
1.32 1.30 1.32    default_and_builtin    1.29 1.32 1.35
1.11 1.11 1.08    try_and_builtin        1.09 1.10 1.09
1.06 1.03 1.06    for_and_builtin        1.05 1.06 1.05
0.91 0.89 0.86    for_and_builtin2       0.86 0.88 0.86

Long iterables

At ten elements, the faster loop solutions are still competitive, but in the long run, they become a lot slower than the solutions that make the built-in any do the hard work. Of the three loop solutions, loop_sentinel becomes the fastest, as it has the least to do in each iteration. The four built-in users become equally fast.

iterable = [False] * 10**1 number = 2,000,000
2.03 2.02 2.02 original
1.25 1.25 1.24 loop_bool
1.30 1.26 1.26 loop_sentinel
1.43 1.42 1.37 default_and_builtin
1.19 1.17 1.19 try_and_builtin
1.18 1.15 1.13 for_and_builtin
0.96 0.96 0.99 for_and_builtin2
iterable = [False] * 10**2 number = 200,000
0.86 0.85 0.86 original
0.59 0.59 0.59 loop_bool
0.41 0.40 0.40 loop_sentinel
0.21 0.21 0.21 default_and_builtin
0.19 0.19 0.19 try_and_builtin
0.18 0.19 0.19 for_and_builtin
0.17 0.16 0.17 for_and_builtin2
iterable = [False] * 10**3 number = 20,000
0.93 0.93 0.93 original
0.52 0.51 0.53 loop_bool
0.32 0.32 0.32 loop_sentinel
0.09 0.09 0.09 default_and_builtin
0.09 0.09 0.09 try_and_builtin
0.09 0.09 0.09 for_and_builtin
0.09 0.09 0.09 for_and_builtin2
iterable = [False] * 10**6 number = 20
1.01 0.99 1.01 original
0.53 0.54 0.51 loop_bool
0.32 0.32 0.32 loop_sentinel
0.08 0.08 0.08 default_and_builtin
0.08 0.08 0.08 try_and_builtin
0.08 0.08 0.08 for_and_builtin
0.08 0.08 0.08 for_and_builtin2

Then again, how likely is it that the iterable has a million false elements but not a single true element? Probably rather unlikely. If we assume that every element has a 50% chance of being true, then there's only about a one-in-a-million chance that there's no true element among the first 20 elements (if the iterable is even that long). So let's have a closer look at [False] * n for n from 0 to 20:

[Figure: closer look at 0 to 20 elements]
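
The one-in-a-million estimate above follows from simple arithmetic, assuming (as stated) that each element is independently true with probability 0.5:

```python
# Probability that 20 independent elements are all false.
p_all_false = 0.5 ** 20

print(p_all_false)             # 9.5367431640625e-07
print(round(1 / p_all_false))  # 1048576, i.e. roughly one in a million
```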

Full benchmark code (for the textual outputs):

from timeit import repeat
from functools import partial
_any = any  # Alias built-in any.
def original(iterable, *, default=False):
    if not default:
        return _any(iterable)
    count = 0
    for count, item in enumerate(iterable, start=1):
        if item:
            return True
    return count == 0
def loop_bool(iterable, *, default=False):
    if not default:
        return _any(iterable)
    empty = True
    for item in iterable:
        if item:
            return True
        empty = False
    return empty
def loop_sentinel(iterable, *, default=False):
    if not default:
        return _any(iterable)
    item = sentinel = object()
    for item in iterable:
        if item:
            return True
    return item is sentinel
def default_and_builtin(iterable, *, default=False):
    it = iter(iterable)
    sentinel = object()
    first = next(it, sentinel)
    if first is sentinel:
        return default
    return bool(first) or _any(it)
def try_and_builtin(iterable, *, default=False):
    try:
        it = iter(iterable)
        return bool(next(it)) or _any(it)
    except StopIteration:
        return default
def for_and_builtin(iterable, *, default=False):
    it = iter(iterable)
    for first in it:
        return bool(first) or _any(it)
    return default
def for_and_builtin2(iterable, *, default=False):
    it = iter(iterable)
    for first in it:
        return True if first else _any(it)
    return default
funcs = original, loop_bool, loop_sentinel, default_and_builtin, try_and_builtin, for_and_builtin, for_and_builtin2
num = 2 * 10**6
tests = [
    ('[]', num),
    ('[False]', num),
    ('[True]', num),
    ('[False, False]', num),
    ('[False, True]', num),
    ('[False] * 10**1', num // 10**0),
    ('[False] * 10**2', num // 10**1),
    ('[False] * 10**3', num // 10**2),
    ('[False] * 10**6', num // 10**5),
    ]
for iterable, number in tests:
    print('iterable =', iterable, f' {number = :,}')
    iterable = eval(iterable)
    times = [[] for _ in funcs]
    for _ in range(3):
        for func, ts in zip(funcs, times):
            t = min(repeat(partial(func, iterable, default=True), number=number))
            ts.append(t)
    for func, ts in zip(funcs, times):
        print(*('%.2f' % t for t in ts), func.__name__, sep=' ')
    print()

Question 20

NOTE: Since you provided more information about your actual problem, this answer will be very different from the existing answers. Their advice is still valid: you really shouldn't break any().

Your actual problem

"However, this will execute the query and load all records into RAM, which I want to avoid."

As mentioned in this answer, you could simply call Peewee's query.exists() to know if there's at least one record returned by this query.

The source code makes it clear it retrieves at most 1 record from the DB, and only returns a boolean:

@database_required
def exists(self, database):
    clone = self.columns(SQL('1'))
    clone._limit = 1
    clone._offset = None
    return bool(clone.scalar())

I tried a small example on my laptop:

query = Movie.select().where((Movie.year > 1950) & (Movie.year < 1960))
query.exists()

Here's the corresponding Postgres log (please notice LIMIT 1):

SELECT 1 FROM "movies" AS "t1" WHERE (("t1"."year" > 1950) AND ("t1"."year" < 1960)) LIMIT 1

Asking for the number of corresponding records with query.count() also doesn't retrieve every row:

SELECT COUNT(1) FROM (SELECT 1 FROM "movies" AS "t1" WHERE (("t1"."year" > 1950) AND ("t1"."year" < 1960))) AS "_wrapped"

Only when I iterate over every row does peewee send a complete query:

for movie in query:
    print(movie.title)

SELECT "t1"."id", "t1"."title", "t1"."imdb_id", "t1"."year", "t1"."seen", "t1"."rating" FROM "movies" AS "t1" WHERE (("t1"."year" > 1950) AND ("t1"."year" < 1960))
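
Applied to the monitoring use case from the question, the same pattern (an existence check that fetches at most one row, followed by lazy iteration fed to the built-in any) can be sketched with the standard library's sqlite3 module instead of Peewee and MariaDB. Everything here is a hypothetical stand-in for illustration: the online_check table, its columns, the NotChecked exception, and the check_system function. The time-window condition from the question is omitted for brevity.

```python
import sqlite3


class NotChecked(Exception):
    """Raised when no check records exist for a system (hypothetical stand-in)."""


def check_system(conn, system_id):
    """Return True if any record for the system is online; raise if none exist."""
    # Existence check: fetches at most one row, like the LIMIT 1 query above.
    exists = conn.execute(
        "SELECT 1 FROM online_check WHERE system_id = ? LIMIT 1", (system_id,)
    ).fetchone()
    if exists is None:
        raise NotChecked(system_id)
    # The cursor yields rows lazily, so any() can stop at the first online one.
    rows = conn.execute(
        "SELECT online FROM online_check WHERE system_id = ?", (system_id,)
    )
    return any(online for (online,) in rows)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE online_check (system_id INTEGER, online INTEGER)")
conn.executemany(
    "INSERT INTO online_check VALUES (?, ?)", [(1, 0), (1, 1), (2, 0)]
)

print(check_system(conn, 1))  # True
print(check_system(conn, 2))  # False
```

This keeps the distinction between "checked but offline" (False) and "never checked" (NotChecked), which a modified any() would blur.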

superb rain · Accepted Answer · 2020-09-30 14:36:01Z
