I have some use cases where I want the behaviour of the built-in any() function to be slightly different, namely to return True if the iterable is empty.
Therefore I came up with the following override:
_any = any  # Alias built-in any.

def any(iterable, *, default=False):
    """Generalization of the built-in any() function."""
    if not default:
        return _any(iterable)
    count = 0
    for count, item in enumerate(iterable, start=1):
        if item:
            return True
    return count == 0
The code I used to test this is:
EMPTY_TUPLE = ()
FALSEY_TUPLE = (False, None, 0, '')
TRUEY_TUPLE = (False, None, 1, '')

def empty_iterables():
    """Yields empty iterables."""
    yield list(EMPTY_TUPLE)
    yield EMPTY_TUPLE
    yield {item: item for item in EMPTY_TUPLE}
    yield set(EMPTY_TUPLE)
    yield (i for i in EMPTY_TUPLE)

def falsey_iterables():
    """Yields iterables that any() should assert as False."""
    yield list(FALSEY_TUPLE)
    yield FALSEY_TUPLE
    yield {item: item for item in FALSEY_TUPLE}
    yield set(FALSEY_TUPLE)
    yield (i for i in FALSEY_TUPLE)

def truey_iterables():
    """Yields iterables that any() should assert as True."""
    yield list(TRUEY_TUPLE)
    yield TRUEY_TUPLE
    yield {item: item for item in TRUEY_TUPLE}
    yield set(TRUEY_TUPLE)
    yield (i for i in TRUEY_TUPLE)

def test():
    """Tests the custom any() function."""
    # Built-in behaviour.
    for iterable in empty_iterables():
        assert any(iterable) is False
    for iterable in falsey_iterables():
        assert any(iterable) is False
    for iterable in truey_iterables():
        assert any(iterable) is True
    # Custom behaviour.
    for iterable in empty_iterables():
        assert any(iterable, default=True) is True
    for iterable in falsey_iterables():
        assert any(iterable, default=True) is False
    for iterable in truey_iterables():
        assert any(iterable, default=True) is True

if __name__ == '__main__':
    test()
I wonder whether there's a more elegant way to implement the custom any()
function with the aforementioned behaviour.
Use case
I have a custom monitoring system that determines the state of a system using check records from a MariaDB database. I use peewee as an ORM:
from datetime import datetime

def check_customer_system(system):
    """Returns the customer online check for the respective system."""
    end = datetime.now()
    start = end - CUSTOMER_INTERVAL
    condition = OnlineCheck.system == system
    condition &= OnlineCheck.timestamp >= start
    condition &= OnlineCheck.timestamp <= end
    query = OnlineCheck.select().where(condition)
    if query:
        return any(online_check.online for online_check in query)
    raise NotChecked(system, OnlineCheck)
In the current implementation, I need to check whether the query returns any records at all by doing if query:.
However, this will execute the query and load all records into RAM, which I want to avoid. Peewee has the optional .iterator() method on queries, which returns an iterator that does not load all records into RAM at once. However, if I use that, if query: will always be True, since it is then testing the truth value of a generator object.
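A minimal sketch of that pitfall, using a plain generator expression as a stand-in for the iterator that peewee returns (the stand-in and the name records are assumptions for illustration; the real object may differ in type):

```python
def records():
    """Hypothetical stand-in for a lazily evaluated, empty query result."""
    return (row for row in [])


lazy = records()
# A generator object is always true, even if it will never yield anything.
assert bool(lazy) is True
# Only consuming it reveals that it is empty.
assert list(lazy) == []
# A materialised container, by contrast, is false when empty.
assert bool([]) is False
```

So the truth test says nothing about emptiness unless the records have already been loaded.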
Hence, I came up with the idea to implement a function that returns True if any value of an iterable is True or the iterable is empty.
And I just realized that this is indeed an XY-Problem, since I'd lose the ability to distinguish between a checked and a non-checked system. Thanks @Eric Duminil.
4 Answers
I really think this is a bad idea. A function called any returning True when there not only isn't anything true but isn't anything at all... that's just wrong. It sure isn't a "generalization". Shadowing the built-in is not a good idea, either. Maybe just call yours any_or_empty (without a default)?
For the emptiness check, the counting seems like overkill and I think a Boolean is clearer (and probably faster, due to not going through enumerate):
empty = True
for item in iterable:
    if item:
        return True
    empty = False
return empty
Or with a sentinel, to get rid of the assignment in the loop:
item = sentinel = object()
for item in iterable:
    if item:
        return True
return item is sentinel
For best speed, you could just handle the first item yourself and then let the built-in handle the rest:
def any(iterable, *, default=False):
    """Modification of the built-in any() function."""
    it = iter(iterable)
    sentinel = object()
    first = next(it, sentinel)
    if first is sentinel:
        return default
    return bool(first) or _any(it)
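As a rough sketch of how this could look without shadowing the built-in, here is the same first-item approach under the any_or_empty name suggested above. The hard-coded True for the empty case is my reading of "without a default"; nothing is shadowed in this block, so any here is the real built-in.

```python
def any_or_empty(iterable):
    """Return True if the iterable is empty or contains a true element."""
    it = iter(iterable)
    sentinel = object()
    first = next(it, sentinel)
    if first is sentinel:
        # Nothing came out of the iterator: the iterable was empty.
        return True
    # Handle the first element here, let the built-in handle the rest.
    return bool(first) or any(it)


assert any_or_empty([]) is True
assert any_or_empty(x for x in ()) is True
assert any_or_empty([0, '']) is False
assert any_or_empty([0, 1]) is True
```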
Your test code uses the words "truey" and "falsey". That's not Python terminology. See the Python docs, you'll never find those words. It's true and false, as you can for example see at if, Boolean operations and Truth Value Testing.
The name falsey_iterables is ambiguous if not misleading, as it doesn't yield any false iterables. They're all true. They just don't contain anything true. I'd suggest iterables_without_trues. And iterables_with_true instead of truey_iterables for consistency.
Did you have this idea because next, min and max offer a default parameter? In their cases it makes sense, as they rather need non-empty inputs. And meaningful defaults are possible there, for example min with default=float('inf') could be a useful default for the code using the result.
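For reference, a small sketch of how those defaults behave on an empty input (the variable names are only for illustration):

```python
empty = []

# next() returns the default instead of raising StopIteration.
first = next(iter(empty), None)
assert first is None

# min() and max() return the default instead of raising ValueError.
smallest = min(empty, default=float('inf'))
largest = max(empty, default=float('-inf'))
assert smallest == float('inf')
assert largest == float('-inf')

# Without a default, an empty input is an error.
try:
    min(empty)
except ValueError:
    raised = True
else:
    raised = False
assert raised
```

In all three cases the default stands in for a result that cannot be computed, whereas any() already has a well-defined answer for an empty input.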
- Thanks for the review. Also thanks for the review of the testing code, though that was not what this review was about. I'll definitely change the name of the custom any(), since on second thought, a function called "any" should indeed never return True on an empty input. – Richard Neumann, Sep 30, 2020 at 15:07
- And now I'm curious why someone voted that "This answer is not useful"... – superb rain, Sep 30, 2020 at 15:33
- You've got a recursive implementation. If you have an iterable of 100 false values, you'll use 100 stack frames, peeling off one value from the head of iterable each time. You probably wanted to call _any(it). – AJNeufeld, Sep 30, 2020 at 16:03
- @AJNeufeld Ah yes, thanks. As stated, the intention was to "let the built-in handle the rest". Fixed. Proves the point that we shouldn't shadow the built-in :-) – superb rain, Sep 30, 2020 at 16:08
- Not overriding a core builtin and handing over the non-empty case to the builtin are both excellent suggestions for maintainability. Fewer surprises (for anyone already familiar with any) and less code overall. – l0b0, Oct 1, 2020 at 1:26
Any and All
If you are going to override any(...) to return a custom result when given an empty iterable, you should also override all(...) to return a custom result as well, for consistency.
Detecting the Empty Iterable
This code:
count = 0
for count, item in enumerate(iterable, start=1):
    if item:
        return True
return count == 0
relies on count being reassigned to a non-zero value by the for count ... in enumerate(..., start=1) loop. This has two inefficiencies. First, it is repeatedly assigning values to count. Second, it creates and abandons int objects (once it has gone beyond the range of interned int values), which are only used for their "non equality to zero". A flag could eliminate this second inefficiency:
is_empty = True
for item in iterable:
    if item:
        return True
    is_empty = False
return is_empty
But this still leaves the first inefficiency: repeated assignments to is_empty.
Better would be to extract the first value from the iterable. If that fails, then the iterable was empty.
_any = any
_all = all

def any(iterable, *, default=False):
    """Generalization of the built-in any() function."""
    if not isinstance(default, bool):
        raise TypeError("Default must be a boolean value")
    try:
        it = iter(iterable)
        return bool(next(it)) or _any(it)
    except StopIteration:
        return default

def all(iterable, *, default=True):
    """Generalization of the built-in all() function."""
    if not isinstance(default, bool):
        raise TypeError("Default must be a boolean value")
    try:
        it = iter(iterable)
        return bool(next(it)) and _all(it)
    except StopIteration:
        return default
Testing
empty_iterables(), falsey_iterables() and truey_iterables() are confusing. It took me a while to determine that you were creating a series of different kinds of test iterables: a list, a tuple, a dict, a set, and a generator expression. Moreover, it is code duplication. All 3 functions are generating the same things, just with different values.
This would be clearer:
from typing import Iterable, Any

def test_iterables_of(*values: Any) -> Iterable[Iterable[Any]]:
    """
    Generate various types of collections of the given values.

    Returns a list, tuple, dictionary, set, and generator expression,
    each containing all of the values.
    """
    yield list(values)
    yield tuple(values)
    yield dict((val, val) for val in values)
    yield set(values)
    yield (val for val in values)  # Generator
And you could use it like:
for iterable in test_iterables_of():
    assert any(iterable, default=True) is True
for iterable in test_iterables_of(False, None, 0, ''):
    assert any(iterable, default=True) is False
Dictionary Key 0 / False
Just a note: Your dictionaries of FALSEY_TUPLE values have only 3 entries, not 4, because the key False and the key 0 are actually the same key.
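A quick way to see this for yourself (a small sketch; as_dict and as_set are just illustrative names):

```python
FALSEY_TUPLE = (False, None, 0, '')

as_dict = {item: item for item in FALSEY_TUPLE}
as_set = set(FALSEY_TUPLE)

# False and 0 compare equal and hash equal, so they collide as keys.
assert False == 0 and hash(False) == hash(0)
assert len(as_dict) == 3
assert len(as_set) == 3

# The key object that was inserted first (False) is kept,
# but its value is overwritten by the later assignment (0).
assert [type(key) for key in as_dict] == [bool, type(None), str]
assert type(as_dict[False]) is int
```

The same collapse happens in the set, so both containers exercise one fewer value than the list and tuple do.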
- I'd say there was another inefficiency, the enumerate iterator being another layer on top of the underlying iterator (creating the ints is part of that, but not all). The assignment in the loop can btw be avoided rather easily, although I didn't think of it at first, either :-) (just added that to my answer) – superb rain, Sep 30, 2020 at 16:34
- If any is now broken, the solution isn't to break all too for consistency, is it? – Eric Duminil, Oct 1, 2020 at 8:38
- @EricDuminil I'm not sure why you feel this is "breaking" any. The built-in function doesn't support keyword arguments; with no default= argument, the upgraded function behaves the same way as the built-in function. If this was submitted as a PEP to the design team, they might very well consider adding the default keyword to any() and all(). Where these upgrades can "break" is forward compatibility; if other keyword arguments were added in Python 3.9 or Python 4.x, this upgrade would break those. I agree the question update does make this whole mess an XY-Problem. – AJNeufeld, Oct 1, 2020 at 21:56
- @AJNeufeld The way I see it, an any function which returns True for an empty set is broken by design, even with an extra argument, even if the implementation is correct and if it doesn't break backward compatibility. The corresponding all should, I guess, return False for empty collections, which would contradict vacuous truth, among others. No amount of documentation could save the surprising bugs coming out of this design. – Eric Duminil, Oct 1, 2020 at 22:30
- We might have to agree to disagree, then. all(()) returning True is the expected value. The surprising value would be False, and I don't see how it would be clear for all((), default=True) to return False. Your cell phone power example is interesting, but a better function name should be chosen (any_or_empty was proposed in another answer, so a modified all could be called all_but_not_empty). ∃ and ∀ are so important and ubiquitous in math, if you insist on redefining them, you might as well say that 1 = 2. – Eric Duminil, Oct 2, 2020 at 9:54
Benchmarks and a faster solution
Code including the solutions at the end. And note I also have an image further down that might be worth checking out before/while reading.
Empty iterable
First an empty iterable, as that's what motivated the question in the first place, and perhaps that also suggests that that happens non-negligibly often in what the OP is doing:
iterable = [] number = 2,000,000 = how often each function is run
1.26 1.25 1.23 original
0.65 0.65 0.65 loop_bool
0.86 0.86 0.86 loop_sentinel
1.00 1.00 1.01 default_and_builtin
1.30 1.33 1.32 try_and_builtin
0.73 0.72 0.75 for_and_builtin
0.74 0.73 0.75 for_and_builtin2
The two simpler loop solutions are faster than the original, as they save the cost of enumerate creation. Slowest is the solution with try, as "catching an exception is expensive". The other solution using the built-in any instead of looping, the one trying next with a default and then checking that, is faster, but still faster is the for_and_builtin solution:
def any(iterable, *, default=False):
    it = iter(iterable)
    for first in it:
        return bool(first) or _any(it)
    return default
Its slightly less pretty variant for_and_builtin2 is equally fast, as the difference is only in the loop body, which doesn't come into play yet:
def any(iterable, *, default=False):
    it = iter(iterable)
    for first in it:
        return True if first else _any(it)
    return default
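Given the use case in the question, the same for/return pattern could also raise instead of returning a default, which keeps "empty" distinguishable from "nothing true". This is only a sketch: any_or_raise is a name I made up, and NotChecked here is a hypothetical stand-in for the exception class in the question, not its real definition.

```python
class NotChecked(Exception):
    """Hypothetical stand-in for the exception used in the question."""


def any_or_raise(iterable):
    """Like the built-in any(), but raise NotChecked on an empty iterable."""
    it = iter(iterable)
    for first in it:
        # Reached only if there is at least one element; the built-in
        # any() then consumes the rest of the same iterator.
        return bool(first) or any(it)
    # The loop body never ran, so the iterable was empty.
    raise NotChecked('iterable was empty')


assert any_or_raise([False, True]) is True
assert any_or_raise([0, '']) is False
```

It stays single-pass, so it also works on an iterator that cannot be tested for emptiness up front.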
One-element iterables
iterable = [False] number = 2,000,000 iterable = [True]
1.35 1.36 1.38 original 1.26 1.24 1.24
0.73 0.75 0.73 loop_bool 0.70 0.69 0.68
0.93 0.95 0.93 loop_sentinel 0.87 0.89 0.88
1.34 1.36 1.36 default_and_builtin 1.22 1.20 1.21
1.12 1.13 1.11 try_and_builtin 1.02 0.99 0.99
1.05 1.07 1.07 for_and_builtin 0.95 0.95 0.95
0.87 0.89 0.88 for_and_builtin2 0.79 0.76 0.79
All got slower, except try_and_builtin, which got faster since it doesn't pay the heavy price of catching an exception anymore. Still slightly slower than for_and_builtin, though. And for_and_builtin2 was impacted far less and remains the second-fastest solution.
Two elements
Not much change here.
[False, False] number = 2,000,000 [False, True]
1.42 1.43 1.42 original 1.35 1.32 1.39
0.77 0.78 0.79 loop_bool 0.74 0.77 0.75
0.96 0.92 0.95 loop_sentinel 0.91 0.92 0.90
1.32 1.30 1.32 default_and_builtin 1.29 1.32 1.35
1.11 1.11 1.08 try_and_builtin 1.09 1.10 1.09
1.06 1.03 1.06 for_and_builtin 1.05 1.06 1.05
0.91 0.89 0.86 for_and_builtin2 0.86 0.88 0.86
Long iterables
At ten elements, the faster loop solutions are still competitive, but in the long run, they become a lot slower than the solutions making the built-in any do the hard work. Of the three loop solutions, loop_sentinel becomes the fastest, as it has the least to do in each iteration. The four built-in users become equally fast.
iterable = [False] * 10**1 number = 2,000,000
2.03 2.02 2.02 original
1.25 1.25 1.24 loop_bool
1.30 1.26 1.26 loop_sentinel
1.43 1.42 1.37 default_and_builtin
1.19 1.17 1.19 try_and_builtin
1.18 1.15 1.13 for_and_builtin
0.96 0.96 0.99 for_and_builtin2
iterable = [False] * 10**2 number = 200,000
0.86 0.85 0.86 original
0.59 0.59 0.59 loop_bool
0.41 0.40 0.40 loop_sentinel
0.21 0.21 0.21 default_and_builtin
0.19 0.19 0.19 try_and_builtin
0.18 0.19 0.19 for_and_builtin
0.17 0.16 0.17 for_and_builtin2
iterable = [False] * 10**3 number = 20,000
0.93 0.93 0.93 original
0.52 0.51 0.53 loop_bool
0.32 0.32 0.32 loop_sentinel
0.09 0.09 0.09 default_and_builtin
0.09 0.09 0.09 try_and_builtin
0.09 0.09 0.09 for_and_builtin
0.09 0.09 0.09 for_and_builtin2
iterable = [False] * 10**6 number = 20
1.01 0.99 1.01 original
0.53 0.54 0.51 loop_bool
0.32 0.32 0.32 loop_sentinel
0.08 0.08 0.08 default_and_builtin
0.08 0.08 0.08 try_and_builtin
0.08 0.08 0.08 for_and_builtin
0.08 0.08 0.08 for_and_builtin2
Then again, how likely is it that the iterable has a million false elements but not a single true element? Probably rather unlikely. If we assume that every element has a 50% chance to be true, then there's only a one-in-a-million chance that there's no true element in the first 20 elements (if the iterable even is that long). So let's have a better look at [False] * n for n from 0 to 20:
[Image: closer look at 0 to 20 elements]
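The one-in-a-million estimate can be checked with a line of arithmetic. This is a sketch under the stated 50% assumption, additionally assuming the elements are independent:

```python
# Probability that 20 independent elements, each true with
# probability 0.5, are all false.
p_no_true = 0.5 ** 20

# 2**20 is 1,048,576, so this is slightly less than one in a million.
assert p_no_true == 1 / 1_048_576
assert p_no_true < 1e-6
print(p_no_true)
```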
Full benchmark code (for the textual outputs):
from timeit import repeat
from functools import partial

_any = any  # Alias built-in any.

def original(iterable, *, default=False):
    if not default:
        return _any(iterable)
    count = 0
    for count, item in enumerate(iterable, start=1):
        if item:
            return True
    return count == 0

def loop_bool(iterable, *, default=False):
    if not default:
        return _any(iterable)
    empty = True
    for item in iterable:
        if item:
            return True
        empty = False
    return empty

def loop_sentinel(iterable, *, default=False):
    if not default:
        return _any(iterable)
    item = sentinel = object()
    for item in iterable:
        if item:
            return True
    return item is sentinel

def default_and_builtin(iterable, *, default=False):
    it = iter(iterable)
    sentinel = object()
    first = next(it, sentinel)
    if first is sentinel:
        return default
    return bool(first) or _any(it)

def try_and_builtin(iterable, *, default=False):
    try:
        it = iter(iterable)
        return bool(next(it)) or _any(it)
    except StopIteration:
        return default

def for_and_builtin(iterable, *, default=False):
    it = iter(iterable)
    for first in it:
        return bool(first) or _any(it)
    return default

def for_and_builtin2(iterable, *, default=False):
    it = iter(iterable)
    for first in it:
        return True if first else _any(it)
    return default

funcs = original, loop_bool, loop_sentinel, default_and_builtin, try_and_builtin, for_and_builtin, for_and_builtin2

num = 2 * 10**6
tests = [
    ('[]', num),
    ('[False]', num),
    ('[True]', num),
    ('[False, False]', num),
    ('[False, True]', num),
    ('[False] * 10**1', num // 10**0),
    ('[False] * 10**2', num // 10**1),
    ('[False] * 10**3', num // 10**2),
    ('[False] * 10**6', num // 10**5),
]

for iterable, number in tests:
    print('iterable =', iterable, f' {number = :,}')
    iterable = eval(iterable)
    times = [[] for _ in funcs]
    for _ in range(3):
        for func, ts in zip(funcs, times):
            t = min(repeat(partial(func, iterable, default=True), number=number))
            ts.append(t)
    for func, ts in zip(funcs, times):
        print(*('%.2f' % t for t in ts), func.__name__, sep=' ')
    print()
NOTE: Since you provided more information about your actual problem, this answer will be very different to the already existing answers. Their pieces of advice are still valid: you really shouldn't break any().
Your actual problem
> However, this will execute the query and load all records into RAM, which I want to avoid
As mentioned in this answer, you could simply call Peewee's query.exists() to know if there's at least one record returned by this query.
The source code makes it clear it retrieves at most 1 record from the DB, and only returns a boolean:
@database_required
def exists(self, database):
    clone = self.columns(SQL('1'))
    clone._limit = 1
    clone._offset = None
    return bool(clone.scalar())
I tried a small example on my laptop:
query = Movie.select().where((Movie.year > 1950) & (Movie.year < 1960))
query.exists()
Here's the corresponding Postgres log (please notice LIMIT 1):
SELECT 1 FROM "movies" AS "t1" WHERE (("t1"."year" > 1950) AND ("t1"."year" < 1960)) LIMIT 1
Asking for the number of corresponding records with query.count() also doesn't retrieve every row:
SELECT COUNT(1) FROM (SELECT 1 FROM "movies" AS "t1" WHERE (("t1"."year" > 1950) AND ("t1"."year" < 1960))) AS "_wrapped"
Only when I iterate over every row does peewee send a complete query:
for movie in query:
    print(movie.title)
SELECT "t1"."id", "t1"."title", "t1"."imdb_id", "t1"."year", "t1"."seen", "t1"."rating" FROM "movies" AS "t1" WHERE (("t1"."year" > 1950) AND ("t1"."year" < 1960))
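Put together, the check from the question could look roughly like the sketch below. It relies only on the two query methods already mentioned on this page, exists() and iterator(). To keep it runnable without a database, FakeQuery is a hypothetical in-memory stand-in for a peewee query, NotChecked is a stand-in for the exception in the question, and check_online is a made-up name.

```python
from types import SimpleNamespace


class NotChecked(Exception):
    """Hypothetical stand-in for the exception used in the question."""


def check_online(query):
    """Return whether any record is online; raise if there are no records."""
    # Cheap existence probe first, so no rows are loaded just to test emptiness.
    if not query.exists():
        raise NotChecked('no check records found')
    # Stream the rows instead of caching them all in memory.
    return any(record.online for record in query.iterator())


class FakeQuery:
    """Hypothetical in-memory stand-in for a peewee query (illustration only)."""

    def __init__(self, records):
        self._records = list(records)

    def exists(self):
        return bool(self._records)

    def iterator(self):
        return iter(self._records)


offline = SimpleNamespace(online=False)
online = SimpleNamespace(online=True)
assert check_online(FakeQuery([offline, online])) is True
assert check_online(FakeQuery([offline])) is False
```

Note that this sends two queries to the database in the non-empty case. If that matters, a single pass over query.iterator() that raises when no first row arrives would avoid the extra round trip.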
any, to see how the logic can be modified.