I'm working through the classic "99 problems in prolog" collection. But trying to do them Pythonically. Flatten a list is problem 07. I am flattening a list of lists of arbitrary depth. I have two solutions below. Which is more Pythonic?
# helper function
def isList( x ): return isinstance( x, list )
def flatten3_h( *xs ):
if ( len(xs) == 0 ): return [] # base case, empty list/tuple
hd, tl = xs[ 0 ], xs[ 1: ]
if ( not(isList( hd ))): return [ hd ] + flatten3_h( *tl )
return flatten3_h( *hd ) + flatten3_h( *tl )
def flatten4( *xs ):
acc = []
[ acc := acc + [x] if not(isList( x )) else acc + flatten4( *x ) for x in xs ]
return acc
Here is example usage:
flatten4([ [] , [1, [2, [3, [], 4]], 5]] ) # => [1, 2, 3, 4, 5]
I'm guessing flatten4
is more Pythonic as it uses list comprehensions.
But I'm not sure since it uses an assignment expression which I just learned about. As it's a new feature in python3.8.
Any feedback is appreciated.
6 Answers 6
Welcome to Code Review.
Follow the PEP8 guide for naming conventions. Functions and variables would be
snake_case
.When checking for empty list, you don't need to find the length. Empty lists/strings/sets are all falsy values in python.
Use better/fuller names.
hd
/tl
->head
/tail
.There are a few inbuilt utilities in python, which might help you:
-
\$\begingroup\$ Great advice, the 2nd point in particular is one that's hard to get used to, though it is very pythonic from what I've seen so far. Thanks for the tips. \$\endgroup\$Hank Igoe– Hank Igoe2020年07月30日 15:38:03 +00:00Commented Jul 30, 2020 at 15:38
Representing lists as a head and a tail is a common method, but honestly not the most Pythonic way to do things. Additionally, everything you're doing is appending lists together - this is generally inefficient due to the way memory has to be reallocated and the data has to be moved. This StackOverflow question has some interesting details. (Although I talk about efficiency, generally recursion and deeply nested generators may not be the most efficient either. Anytime you're considering performance, make sure you understand the requirements of your functionality and measure your own performance before attempting to optimize too much)
Ultimately, I think the cleanest solution would be to create a generator:
def flatten(iterable):
try:
iterator = iter(iterable)
except TypeError:
# this is potentially controversial, depending on your use case
# you may want a mutually recursive function, or a keyword argument, to
# indicate if you want to yield the input vs raise an error
yield iterable
else:
first_value = next(iterator)
if first_value is iterable:
yield first_value
else:
yield from flatten(first_value)
for value in iterator:
yield from flatten(value)
def get_flat_list(iterable, constructor=list):
return constructor(flatten(iterable))
A few things to call out - explicit isinstance
checks are not always a Pythonic solution. Many libraries and functions in the Python ecosystem make heavy use of duck typing - if it iterates like a list, then it is a list as far as that functionality is concerned. You should re-evaluate whether it is a strict requirement that lists are supported, or iterables. I suspect that you want most/all iterables to be supported, and thus see if there are other ways to do this. There are generally two ways to do this:
- Just try, and handle the error
- Direct inspection of properties to see if it can be used (e.g.
hasattr(iterable, "__getitem__") or hasattr(iterable, "__iter__")
)
In this case, I'm just try
ing. You might also notice the first_value is iterable
stuff - this is to handle cases like a string, where both "abc"
and "a"
are iterable, and you would hit infinite recursion. In this case, I'm simply making sure that if the first item of an object is itself we don't recurse forever. This probably isn't comprehensive, but it handles at least the string case.
I've also stopped restricting us to returning lists - what if you want a tuple? Some other type?
Lastly, I've renamed things to match Python's PEP8 naming conventions
As Peilonrayz pointed out in the comments, isinstance
has become more "accepted" outside of core Python development recently (generally, strict types are more in vogue overall).
For example, instead of using a try/except
, you could do isinstance(foo, collections.abc.Iterable)
instead of catching the TypeError
when we do iterator = iter(iterable)
. Note - collections.abc.Iterable
will only work for objects that define the __iter__
dunder-method, so __getitem__
based iteration won't be usable with this approach. This is another good reason to avoid using isinstance
if you have a good alternative (in our case, checking for iter(iterable)
).
Alternatively, some new proposed functionality in Python is isinstance
based - see PEP 622. Theoretically, you could come up with something like this (using sequence patterns):
match iterable:
case ((inner_head, *inner_tail), *tail):
yield inner_head
yield from flatten(inner_tail)
yield from flatten(tail)
case ():
return
case _:
yield iterable
Pythonic code comes in three main veins:
- That it follows PEP 8.
- Using features of Python that make code easier to read.
This doesn't mean shorter or use new features. - That it follows Python norms.
These are more norms in the sense thatmap
,filter
andreduce
are no longer 'Pythonic' but are not mentioned in PEP 8.
In short these can be summed up as follow norms that make your code easy to read.
Your code doesn't do this.
-
You shouldn't have spaces around parentheses.
isList( x )
vsisList(x)
.You should use
snake_case
for function names.isList
vsis_list
.You should not have expressions on the same lines as
if
s ordef
sdef isList( x ): return isinstance( x, list )
vs
def is_list(x): return isinstance(x, list)
You should indent with 4 spaces.
You shouldn't have white space at the end of lines.
You should have two newlines between top level functions (and classes).
You should use descriptive variable names, what are
xs
,x
,hd
,tl
?
-
- You should not have parentheses around if expressions.
if ( len(xs) == 0 ):
vsif len(xs) == 0:
. - You can use truthy rather than
len
.if len(xs) == 0:
vsif not xs:
- You can use tuple unpacking with the
*
operator rather thanxs[ 1: ]
. - Not is not a function it is a keyword. It does not need parentheses.
- You should not have parentheses around if expressions.
-
You shouldn't abuse list comprehensions to contain mutability.
[ acc := acc + [x] ... ]
In all more Pythonic versions of these functions would look like the following:
# helper function
def is_list(value):
return isinstance(value, list)
def flatten3_h(*values):
if not values:
return [] # base case, empty list/tuple
head, *tail = values
if not is_list(head):
return [head] + flatten3_h(*tail)
return flatten3_h(*head) + flatten3_h(*tail)
def flatten4(*values):
flattened = []
for value in values:
flattened += (
[value]
if not is_list(value) else
flatten4(*value)
)
return flattened
I'm wondering which is more Pythonic
At this point they are both fairly Pythonic. But I would prefer to maintain flatten4
.
-
4\$\begingroup\$ I still find
flatten3_h
unpythonic. Head/tail is apparently a (the?) proper way to think of lists in Prolog (which OP said this is from), but it's not in Python. \$\endgroup\$superb rain– superb rain2020年07月30日 13:53:51 +00:00Commented Jul 30, 2020 at 13:53 -
2\$\begingroup\$ @Peilonrayz I don't think Pythonicity is just about readability, but also very much about how you're doing things. \$\endgroup\$superb rain– superb rain2020年07月30日 14:00:10 +00:00Commented Jul 30, 2020 at 14:00
-
5\$\begingroup\$
flatten3_h([0] * 1000)
gets meRecursionError: maximum recursion depth exceeded
, even though that's a small and benign (not even nested at all) list. I don't think that's reasonable, and thus I submit it as evidence that it's not good to do it like that in Python and thus that it's unpythonic :-) \$\endgroup\$superb rain– superb rain2020年07月30日 14:27:50 +00:00Commented Jul 30, 2020 at 14:27 -
4\$\begingroup\$ Sure. Didn't think of it as bickering. Maybe I should've just written my view as an answer... \$\endgroup\$superb rain– superb rain2020年07月30日 14:36:10 +00:00Commented Jul 30, 2020 at 14:36
-
3\$\begingroup\$ Unlike Haskell and other functional languages, the head/tail idiom is simply not something you see much of in Python. Python doesn't optimize tail recursion, it's easy to blow the call stack, and there's a ton of allocation and GC action for every stack frame to basically achieve a simple
for foo in bar
loop inflatten3
. OP is basically trying to impose functional programming style on Python, which is fundamentally not a functional language.flatten4
is much better than 3 in my opinion. \$\endgroup\$ggorlen– ggorlen2020年07月30日 15:20:04 +00:00Commented Jul 30, 2020 at 15:20
PEP-8 issues aside, I found the answer from Peilonrayz (flatten4
) the best. I would refine it further though, to make it less complicated, and more in line with EAPF programming:
def flatten4(values):
flattened = []
for value in values:
try:
flattened.extend(flatten4(value))
except:
flattened.append(value)
return flattened
As mentioned by Dan Oberlam, use of isinstance
is fairly accepted, but can mess with Duck Typing.
Note three other small changes:
I replaced
flattened += [value]
withflattened.append(value)
, which is both clearer and faster.Purely for consistency, I also replaced
flattened += flatten4(value)
withflattened.extend(flatten4(value))
.I removed the
*
from*values
, as the problem description clearly statesflattening a list of lists of arbitrary depth
i.e. flattening a single list.
-
\$\begingroup\$ Looks like some good points here. +1 \$\endgroup\$2023年08月22日 17:59:35 +00:00Commented Aug 22, 2023 at 17:59
Neither is a good idea. Note the term arbitrary depth
. It will be easy to make this explode by constructing input that is deeper than your stack, since you recurse and Python has an explicitly limited stack. Even if it didn't have a limited stack, Python does not have tail recursion optimization so this will be (relatively) slow. You should attempt to make an iterative solution instead.
-
2\$\begingroup\$ I'd say it's fairly common to have nested lists a few levels deep, but very uncommon to have lists nested more deeply than the stack limit. Though the stack limit issue truly is an issue for the first solution because of the extra recursions caused by treating it as head/tail. Even
flatten3_h([0] * 1000)
explodes for me, and that's a rather small and benign list. \$\endgroup\$superb rain– superb rain2020年07月30日 14:19:43 +00:00Commented Jul 30, 2020 at 14:19
Other answers gave you great tips for the PEP8 compliance, implicit falseyness for empty lists and naming/style conventions. These are the first things I would have said if I code reviewed this first.
I wanted to add, if you are going to stick with the isinstance
approach, that you can keep it while making your function a generator by replacing return
with yield
. You still have recursion, but now it will yield from
itself instead of return
ing itself.
Something like:
def flatten(element):
if isinstance(element, list):
for item in element:
yield from flatten(item)
else:
yield(element)
Just note that because it is a generator, you have to wrap it with list()
to print output.
list(flatten(['a','b',['c','d']]))
--> ['a', 'b', 'c', 'd']
flatten_2
? Or, what do the names of your functions signify? \$\endgroup\$