Based on the excellent answers provided for my previous question, I've thoroughly revised my attempt at "deep" versions of `all`, `any`, etc. It uses the recommended encapsulated flattening function, and streamlines some unnecessary stuff.

    def flatten(*sequence, preserve=False):
        for item in sequence:
            try:
                assert hasattr(item, '__iter__') and not isinstance(item, str)
                if preserve:
                    next(iter(item))
            except (AssertionError, StopIteration):
                yield item
            else:
                for i in item:
                    yield from flatten(i, preserve=preserve)

    def dsum(*args, preserve=False, s=''):
        if any(isinstance(item, str) for item in flatten(args)):
            return s.join(map(str, flatten(args, preserve=preserve)))
        iter_ = flatten(args, preserve=preserve)
        first = next(iter_)
        return sum(iter_, first)

    def dlen(*args, preserve=False, s=False):
        return sum(len(arg) if s and isinstance(arg, str) else 1 for arg in flatten(args, preserve=preserve))

The `flatten()` function is "public" and intended to be accessed from outside the module. It handles an arbitrary number of arguments and yields each non-iterable element (strings are treated as non-iterable). If the keyword argument `preserve` is set to `True`, empty iterables are preserved.
I am not aware of any way to use `if` to test whether a generator/iterable has any items to yield - is there such a way? If not, is my use of `assert` an acceptable way of turning the `if` statements into exceptions, or is it better to use `if ... raise`, or something else entirely?
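One common way to test for emptiness without losing data is to pull the first element with a sentinel default and then chain it back on. The following is a minimal sketch, not part of the code under review; the helper name `peek` and the `_MISSING` sentinel are hypothetical.

```python
import itertools

_MISSING = object()  # hypothetical sentinel: marks "nothing was yielded"


def peek(iterable):
    """Return (is_empty, iterator) without losing the first element."""
    iterator = iter(iterable)
    first = next(iterator, _MISSING)  # the default avoids StopIteration
    if first is _MISSING:
        return True, iterator
    # Put the consumed element back in front of the remaining ones.
    return False, itertools.chain([first], iterator)


empty, items = peek(x * 2 for x in [1, 2, 3])
if not empty:
    print(list(items))  # [2, 4, 6]
```

The caller has to keep using the returned iterator rather than the original one, since the original has already advanced by one element.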
The `dsum()` function adds together all the items yielded by `flatten()`. If `preserve` is `True`, empty iterables are preserved. If at least one item is a string, everything is converted into a string and joined, connected by the optional `s` argument. If there are no strings, all items after the first are added onto the first item. If there aren't any strings and there are items that can't be added, the function fails, which I find acceptable.
The `dlen()` function finds the total length of all the items yielded by `flatten()`. If `preserve` is `True`, empty iterables are preserved and counted as a length of 1. Strings are counted as objects of length 1 unless the optional keyword argument `s` is `True`, in which case their actual lengths (number of characters) are counted.
I have removed `dany()`, `dall()`, etc. because they were just wrappers around `flatten()` with zero additional logic.
Test suite (passes):

    from deep import *
    import datetime as dt

    assert dsum([1,2], 3, '1','2','3','456', []) == '123123456'
    assert dsum([1,2], 3, '1','2','3','456', [], preserve=True) == '123123456[]'
    assert dsum('a', 'b') == 'ab'
    assert dsum(1, [2,[3,[4,[5],[]]]]) == 15
    # dsum(1, [2,[3,[4,[5],[]]]], preserve=True) # can't sum int with preserved list
    assert all(flatten(1, 1, [1, []]))
    assert not all(flatten(1, 1, [1, []], preserve=True))
    assert all(flatten(1, 1, set()))
    assert not all(flatten(1, 1, set(), preserve=True))
    assert list(flatten((1, 1, []))) == [1,1]
    assert list(flatten((1, 1, []), preserve=True)) == [1,1,[]]
    assert dsum(dt.timedelta(3), dt.timedelta(4), s=dt.timedelta()) == dt.timedelta(7)

2 Answers
A few things for the `flatten` function:
First, your code won't work when an item is an iterator and `preserve=True`: `next(iter(item))` consumes the first sub-item of the iterator, so the following `for` loop skips it. A better approach would be to run the `for` loop, then check whether `i` has been defined. If it hasn't, you can yield the item. I personally use a short-circuit `continue` to avoid nesting too deeply, but that is a matter of preference.
Second, you really shouldn't use `assert` here; you can just use an `if` test to short-circuit.
Third, this may be a personal thing, but I don't like explicitly testing whether something is a `str`. For example, if you don't want to flatten a `str`, you probably don't want to flatten a `bytes` either. I prefer to duck-type for a `str`-like object using `hasattr(item, 'lower')`, which tests for the `lower` method. This should really only be present on `str`-like objects, although I admit it isn't perfect.
Fourth, some of your other functions would be easier if you include an option to split strings.
Fifth, just because an item has the `__iter__` method does not mean it is iterable. The method may raise an exception when called. So it is better to try to iterate over the item and see whether that raises an exception.
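The third point can be illustrated with a small check. This is only a sketch; the helper name `is_stringlike` is hypothetical, and it assumes that having a `lower` method is a good enough signal for "string-like".

```python
def is_stringlike(item):
    """Duck-typed test: treat anything with a ``lower`` method as string-like."""
    return hasattr(item, 'lower')


# An explicit isinstance check against str alone does not cover bytes.
print(isinstance(b'abc', str))   # False
print(is_stringlike(b'abc'))     # True
print(is_stringlike('abc'))      # True
print(is_stringlike([1, 2, 3]))  # False
```

An explicit alternative would be `isinstance(item, (str, bytes, bytearray))`, which is stricter but has to list every type by hand.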
For the `dsum` function:
First, I don't see the point of `preserve` here. How would adding an empty iterable to anything do anything?
Second, I would really make the string handling optional. So perhaps if `s=False`, you can raise an error since the items cannot be combined. And I would use a more descriptive name (`sep` is common).
Third, again this won't work if any items are iterators, since it runs through them twice. There is no way around this other than to convert the items to a list or something like that.
Fourth, I think a generator expression is easier to understand than a `map`.
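The two-pass problem from the third point can be reproduced in isolation. The sketch below uses a hypothetical generator, `numbers_and_text`, as a stand-in for any one-shot iterator passed to `dsum`.

```python
def numbers_and_text():
    # Hypothetical generator standing in for any one-shot iterator.
    yield 1
    yield 'a'
    yield 2


source = numbers_and_text()
# any() stops at the first string, having consumed 1 and 'a' on the way.
has_str = any(isinstance(x, str) for x in source)
leftover = list(source)
print(has_str, leftover)  # True [2]

# Materializing once keeps every element available for both steps.
items = list(numbers_and_text())
has_str = any(isinstance(x, str) for x in items)
print(has_str, ''.join(str(x) for x in items))  # True 1a2
```

The list costs memory proportional to the number of flattened items, which is the trade-off for being able to inspect the data before combining it.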
For `dlen`:
If you include the option I mentioned above to flatten strings, this becomes much simpler. You just pass that argument along to the `flatten` function.
So here are the resulting functions:

    def flatten(*sequence, preserve=False, flat_str=False):
        for item in sequence:
            # Anything that cannot be turned into an iterator is yielded as-is.
            try:
                iter(item)
            except BaseException:
                yield item
                continue
            # String-like items are yielded whole unless flat_str is set;
            # single characters are always yielded to stop the recursion.
            if hasattr(item, 'lower') and (not flat_str or len(item) == 1):
                yield item
                continue
            for i in item:
                yield from flatten(i, preserve=preserve, flat_str=flat_str)
            if not preserve:
                continue
            # Referencing `i` raises NameError if no loop iteration has bound it yet.
            try:
                i = i
            except NameError:
                yield item

    def dsum(*args, preserve=False, sep='', flat_str=False):
        # sep=False disables string joining and sums lazily.
        if sep is False:
            items = flatten(args, preserve=preserve)
            first = next(items)
            return sum(items, first)
        # Materialize once so the items can be inspected and then combined.
        items = list(flatten(args, preserve=preserve, flat_str=flat_str))
        if any(hasattr(item, 'lower') for item in items):
            return sep.join(str(x) for x in items)
        return sum(items[1:], items[0])

    def dlen(*args, preserve=False, flat_str=False):
        return sum(1 for arg in flatten(args, preserve=preserve, flat_str=flat_str))

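One caveat with checking whether `i` is bound: the name stays bound for the rest of the call once any item has produced an element, so a direct call such as `flatten([1], [], preserve=True)` can leave the later empty list unreported. A possible variant, sketched here as an assumption rather than as part of the answer's code, tracks emptiness with an explicit flag. The name `flatten_flagged` is hypothetical, the `flat_str` option is omitted for brevity, and strings are detected with a plain `isinstance` check.

```python
def flatten_flagged(*sequence, preserve=False):
    """Hypothetical variant of flatten() that tracks emptiness with a flag."""
    for item in sequence:
        if isinstance(item, (str, bytes)):
            yield item
            continue
        try:
            iterator = iter(item)
        except TypeError:  # raised by iter() for non-iterable objects
            yield item
            continue
        empty = True  # reset for every item, so nothing leaks between items
        for sub in iterator:
            empty = False
            yield from flatten_flagged(sub, preserve=preserve)
        if preserve and empty:
            yield item


print(list(flatten_flagged([1], [], preserve=True)))         # [1, []]
print(list(flatten_flagged(iter([1, [2]]), preserve=True)))  # [1, 2]
```

Because the iterator is created once and consumed only by the `for` loop, one-shot iterators keep their first element.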
- 1: Oops. 2: Sounds fine. 3: Good point - any other types like these? 4: In what way? 5: I see. -- `dsum`: 1: So that `all(flatten([1],[]))` works like `all([[1],[]])`. 2: What's the purpose of `sep=False`? Just to suppress the "if there's a string, convert all to string" behavior? - `s` wasn't always a separator (and may now be `False`), so `s` might be more appropriate than `sep`. 3: How does `list` conversion affect performance here? 4: Either way, I guess. -- What's the purpose of `flat_str`? -- `if sep is False` should be `if sep == False`. – TigerhawkT3, Apr 17, 2015 at 21:11
- 3: no, that is the main one. 4: I explain that in the `dlen` section. `dsum`: 2: one-character arguments are discouraged. 3: It will hurt performance, but there is no other way I am aware of to make this work reliably. 4: I explain `flat_str` in the fourth point for the first function. Empty strings are also `False`, so `'' == False` is `True`, while `'' is False` is `False`. – TheBlackCat, Apr 20, 2015 at 12:03
- Actually, although `bool('') == False` evaluates to `True`, `'' == False` evaluates to `False`. – TigerhawkT3, Apr 20, 2015 at 18:57
That is an inappropriate use of an assertion. You should only assert conditions that you know must be true. Furthermore, you must not rely on assertions for correct execution of your program, since assertions can be disabled (for example, by running Python with the `-O` option).
I would also add that I find the execution flow in `flatten()` difficult to follow.
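The effect of disabling assertions can be seen without leaving the interpreter. This is a small sketch that uses the `optimize` argument of the built-in `compile()` to mimic a run under `-O`; the snippet being compiled is made up for the demonstration.

```python
source = "assert False, 'condition failed'\nresult = 'reached'"

# optimize=0 keeps assert statements, as in a normal run.
try:
    exec(compile(source, '<demo>', 'exec', optimize=0), {})
except AssertionError as exc:
    print('assert ran:', exc)  # assert ran: condition failed

# optimize=1 mirrors running Python with -O: assert statements are removed.
namespace = {}
exec(compile(source, '<demo>', 'exec', optimize=1), namespace)
print(namespace['result'])  # reached
```

Any control flow that depends on an `AssertionError` being raised, as in the original `flatten()`, silently changes behavior in the second case.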
- How about `raise`, replacing `assert hasattr(item, '__iter__') and not isinstance(item, str)` with `if not hasattr(item, '__iter__') or isinstance(item, str): raise StopIteration`? – TigerhawkT3, Apr 17, 2015 at 5:58
- Mixing exceptions, conditionals, and loops is very confusing. Can't you find some way that doesn't use exceptions at all? – 200_success, Apr 17, 2015 at 9:55
- I'm not sure how else to check whether an iterator/generator contains elements. Is there some equivalent to `if len(mylist)` that works even on generators like `(item*2 for item in mylist)`? – TigerhawkT3, Apr 17, 2015 at 18:57
`join()` instead of just a single iterable - in other words, I kept doing `''.join(string1, string2)` and getting a traceback. :)