Pythonic way to flatten lists or arbitrary depth using assignment expressions

Question 1

I'm working through the classic "99 problems in prolog" collection. But trying to do them Pythonically. Flatten a list is problem 07. I am flattening a list of lists of arbitrary depth. I have two solutions below. Which is more Pythonic?

# helper function
def isList( x ): return isinstance( x, list )
def flatten3_h( *xs ): 
 if ( len(xs) == 0 ): return [] # base case, empty list/tuple
 hd, tl = xs[ 0 ], xs[ 1: ]
 if ( not(isList( hd ))): return [ hd ] + flatten3_h( *tl )
 return flatten3_h( *hd ) + flatten3_h( *tl )
def flatten4( *xs ):
 acc = []
 [ acc := acc + [x] if not(isList( x )) else acc + flatten4( *x ) for x in xs ]
 return acc

Here is example usage:

flatten4([ [] , [1, [2, [3, [], 4]], 5]] ) # => [1, 2, 3, 4, 5]

I'm guessing flatten4 is more Pythonic as it uses list comprehensions. But I'm not sure since it uses an assignment expression which I just learned about. As it's a new feature in python3.8.

Any feedback is appreciated.

Question 2

Where's flatten_2? Or, what do the names of your functions signify?

Question 3

Sorry if that was confusing, they're just different versions. flatten4 = "Flatten using list comprehension," whereas flatten3_h = "Flatten using a more direct recursive style, like Scheme for example." I did have 2 other previous versions based on ideas from another post on stackoverflow, but they didn't work on any list that had sublists nested deeper than 2 levels. I hope that makes sense.

Question 4

Welcome to Code Review.

Follow the PEP8 guide for naming conventions. Functions and variables would be snake_case.
When checking for empty list, you don't need to find the length. Empty lists/strings/sets are all falsy values in python.
Use better/fuller names. hd/tl -> head/tail.
There are a few inbuilt utilities in python, which might help you:

Question 5

Great advice, the 2nd point in particular is one that's hard to get used to, though it is very pythonic from what I've seen so far. Thanks for the tips.

Question 6

Representing lists as a head and a tail is a common method, but honestly not the most Pythonic way to do things. Additionally, everything you're doing is appending lists together - this is generally inefficient due to the way memory has to be reallocated and the data has to be moved. This StackOverflow question has some interesting details. (Although I talk about efficiency, generally recursion and deeply nested generators may not be the most efficient either. Anytime you're considering performance, make sure you understand the requirements of your functionality and measure your own performance before attempting to optimize too much)

Ultimately, I think the cleanest solution would be to create a generator:

def flatten(iterable):
 try:
 iterator = iter(iterable)
 except TypeError:
 # this is potentially controversial, depending on your use case
 # you may want a mutually recursive function, or a keyword argument, to 
 # indicate if you want to yield the input vs raise an error
 yield iterable 
 else:
 first_value = next(iterator)
 if first_value is iterable:
 yield first_value
 else:
 yield from flatten(first_value)
 for value in iterator:
 yield from flatten(value)
 
def get_flat_list(iterable, constructor=list):
 return constructor(flatten(iterable))

A few things to call out - explicit isinstance checks are not always a Pythonic solution. Many libraries and functions in the Python ecosystem make heavy use of duck typing - if it iterates like a list, then it is a list as far as that functionality is concerned. You should re-evaluate whether it is a strict requirement that lists are supported, or iterables. I suspect that you want most/all iterables to be supported, and thus see if there are other ways to do this. There are generally two ways to do this:

Just try, and handle the error
Direct inspection of properties to see if it can be used (e.g. hasattr(iterable, "__getitem__") or hasattr(iterable, "__iter__"))

In this case, I'm just trying. You might also notice the first_value is iterable stuff - this is to handle cases like a string, where both "abc" and "a" are iterable, and you would hit infinite recursion. In this case, I'm simply making sure that if the first item of an object is itself we don't recurse forever. This probably isn't comprehensive, but it handles at least the string case.

I've also stopped restricting us to returning lists - what if you want a tuple? Some other type?

Lastly, I've renamed things to match Python's PEP8 naming conventions

As Peilonrayz pointed out in the comments, isinstance has become more "accepted" outside of core Python development recently (generally, strict types are more in vogue overall).

For example, instead of using a try/except, you could do isinstance(foo, collections.abc.Iterable) instead of catching the TypeError when we do iterator = iter(iterable). Note - collections.abc.Iterable will only work for objects that define the __iter__ dunder-method, so __getitem__ based iteration won't be usable with this approach. This is another good reason to avoid using isinstance if you have a good alternative (in our case, checking for iter(iterable)).

Alternatively, some new proposed functionality in Python is isinstance based - see PEP 622. Theoretically, you could come up with something like this (using sequence patterns):

match iterable:
 case ((inner_head, *inner_tail), *tail):
 yield inner_head
 yield from flatten(inner_tail)
 yield from flatten(tail)
 case ():
 return
 case _:
 yield iterable

Question 7

Pythonic code comes in three main veins:

That it follows PEP 8.
Using features of Python that make code easier to read.
This doesn't mean shorter or use new features.
That it follows Python norms.
These are more norms in the sense that map, filter and reduce are no longer 'Pythonic' but are not mentioned in PEP 8.

In short these can be summed up as follow norms that make your code easy to read.

Your code doesn't do this.

- You shouldn't have spaces around parentheses. isList( x ) vs isList(x).
- You should use snake_case for function names. isList vs is_list.
- You should not have expressions on the same lines as ifs or defs
```
def isList( x ): return isinstance( x, list )
```
  vs
```
def is_list(x):
 return isinstance(x, list)
```
- You should indent with 4 spaces.
- You shouldn't have white space at the end of lines.
- You should have two newlines between top level functions (and classes).
- You should use descriptive variable names, what are xs, x, hd, tl?
- You should not have parentheses around if expressions. if ( len(xs) == 0 ): vs if len(xs) == 0:.
- You can use truthy rather than len. if len(xs) == 0: vs if not xs:
- You can use tuple unpacking with the * operator rather than xs[ 1: ].
- Not is not a function it is a keyword. It does not need parentheses.
- You shouldn't abuse list comprehensions to contain mutability.
```
[ acc := acc + [x] ... ]
```

In all more Pythonic versions of these functions would look like the following:

# helper function
def is_list(value):
 return isinstance(value, list)
def flatten3_h(*values):
 if not values:
 return [] # base case, empty list/tuple
 head, *tail = values
 if not is_list(head):
 return [head] + flatten3_h(*tail)
 return flatten3_h(*head) + flatten3_h(*tail)
def flatten4(*values):
 flattened = []
 for value in values:
 flattened += (
 [value]
 if not is_list(value) else
 flatten4(*value)
 )
 return flattened

I'm wondering which is more Pythonic

At this point they are both fairly Pythonic. But I would prefer to maintain flatten4.

Question 8

I still find flatten3_h unpythonic. Head/tail is apparently a (the?) proper way to think of lists in Prolog (which OP said this is from), but it's not in Python.

Question 9

@Peilonrayz I don't think Pythonicity is just about readability, but also very much about how you're doing things.

Question 10

flatten3_h([0] * 1000) gets me RecursionError: maximum recursion depth exceeded, even though that's a small and benign (not even nested at all) list. I don't think that's reasonable, and thus I submit it as evidence that it's not good to do it like that in Python and thus that it's unpythonic :-)

Question 11

Sure. Didn't think of it as bickering. Maybe I should've just written my view as an answer...

Question 12

Unlike Haskell and other functional languages, the head/tail idiom is simply not something you see much of in Python. Python doesn't optimize tail recursion, it's easy to blow the call stack, and there's a ton of allocation and GC action for every stack frame to basically achieve a simple for foo in bar loop in flatten3. OP is basically trying to impose functional programming style on Python, which is fundamentally not a functional language. flatten4 is much better than 3 in my opinion.

Question 13

PEP-8 issues aside, I found the answer from Peilonrayz (flatten4) the best. I would refine it further though, to make it less complicated, and more in line with EAPF programming:

def flatten4(values):
 flattened = []
 for value in values:
 try:
 flattened.extend(flatten4(value))
 except:
 flattened.append(value)
 return flattened

As mentioned by Dan Oberlam, use of isinstance is fairly accepted, but can mess with Duck Typing. Note three other small changes:

I replaced flattened += [value] with flattened.append(value), which is both clearer and faster.
Purely for consistency, I also replaced flattened += flatten4(value) with flattened.extend(flatten4(value)).
I removed the * from *values, as the problem description clearly states

flattening a list of lists of arbitrary depth

i.e. flattening a single list.

Question 14

Looks like some good points here. +1

Question 15

Neither is a good idea. Note the term arbitrary depth. It will be easy to make this explode by constructing input that is deeper than your stack, since you recurse and Python has an explicitly limited stack. Even if it didn't have a limited stack, Python does not have tail recursion optimization so this will be (relatively) slow. You should attempt to make an iterative solution instead.

Question 16

I'd say it's fairly common to have nested lists a few levels deep, but very uncommon to have lists nested more deeply than the stack limit. Though the stack limit issue truly is an issue for the first solution because of the extra recursions caused by treating it as head/tail. Even flatten3_h([0] * 1000) explodes for me, and that's a rather small and benign list.

Question 17

Other answers gave you great tips for the PEP8 compliance, implicit falseyness for empty lists and naming/style conventions. These are the first things I would have said if I code reviewed this first.

I wanted to add, if you are going to stick with the isinstance approach, that you can keep it while making your function a generator by replacing return with yield. You still have recursion, but now it will yield from itself instead of returning itself.

Something like:

def flatten(element):
 if isinstance(element, list):
 for item in element:
 yield from flatten(item)
 else:
 yield(element)

Just note that because it is a generator, you have to wrap it with list() to print output.

list(flatten(['a','b',['c','d']])) 
--> ['a', 'b', 'c', 'd']

hjpotter92 hjpotter92 8,9011 gold badge26 silver badges49 bronze badges · Answer 1 · 2020-07-30 13:19:30Z

Welcome to Code Review.

Follow the PEP8 guide for naming conventions. Functions and variables would be snake_case.
When checking for empty list, you don't need to find the length. Empty lists/strings/sets are all falsy values in python.
Use better/fuller names. hd/tl -> head/tail.
There are a few inbuilt utilities in python, which might help you:

Great advice, the 2nd point in particular is one that's hard to get used to, though it is very pythonic from what I've seen so far. Thanks for the tips.

Dan Oberlam Dan Oberlam 8,0492 gold badges33 silver badges74 bronze badges · Answer 2 · 2020-07-30 13:32:25Z

Representing lists as a head and a tail is a common method, but honestly not the most Pythonic way to do things. Additionally, everything you're doing is appending lists together - this is generally inefficient due to the way memory has to be reallocated and the data has to be moved. This StackOverflow question has some interesting details. (Although I talk about efficiency, generally recursion and deeply nested generators may not be the most efficient either. Anytime you're considering performance, make sure you understand the requirements of your functionality and measure your own performance before attempting to optimize too much)

Ultimately, I think the cleanest solution would be to create a generator:

def flatten(iterable):
 try:
 iterator = iter(iterable)
 except TypeError:
 # this is potentially controversial, depending on your use case
 # you may want a mutually recursive function, or a keyword argument, to 
 # indicate if you want to yield the input vs raise an error
 yield iterable 
 else:
 first_value = next(iterator)
 if first_value is iterable:
 yield first_value
 else:
 yield from flatten(first_value)
 for value in iterator:
 yield from flatten(value)
 
def get_flat_list(iterable, constructor=list):
 return constructor(flatten(iterable))

A few things to call out - explicit isinstance checks are not always a Pythonic solution. Many libraries and functions in the Python ecosystem make heavy use of duck typing - if it iterates like a list, then it is a list as far as that functionality is concerned. You should re-evaluate whether it is a strict requirement that lists are supported, or iterables. I suspect that you want most/all iterables to be supported, and thus see if there are other ways to do this. There are generally two ways to do this:

Just try, and handle the error
Direct inspection of properties to see if it can be used (e.g. hasattr(iterable, "__getitem__") or hasattr(iterable, "__iter__"))

In this case, I'm just trying. You might also notice the first_value is iterable stuff - this is to handle cases like a string, where both "abc" and "a" are iterable, and you would hit infinite recursion. In this case, I'm simply making sure that if the first item of an object is itself we don't recurse forever. This probably isn't comprehensive, but it handles at least the string case.

I've also stopped restricting us to returning lists - what if you want a tuple? Some other type?

Lastly, I've renamed things to match Python's PEP8 naming conventions

As Peilonrayz pointed out in the comments, isinstance has become more "accepted" outside of core Python development recently (generally, strict types are more in vogue overall).

For example, instead of using a try/except, you could do isinstance(foo, collections.abc.Iterable) instead of catching the TypeError when we do iterator = iter(iterable). Note - collections.abc.Iterable will only work for objects that define the __iter__ dunder-method, so __getitem__ based iteration won't be usable with this approach. This is another good reason to avoid using isinstance if you have a good alternative (in our case, checking for iter(iterable)).

Alternatively, some new proposed functionality in Python is isinstance based - see PEP 622. Theoretically, you could come up with something like this (using sequence patterns):

match iterable:
 case ((inner_head, *inner_tail), *tail):
 yield inner_head
 yield from flatten(inner_tail)
 yield from flatten(tail)
 case ():
 return
 case _:
 yield iterable

Peilonrayz ♦Peilonrayz 44.4k7 gold badges80 silver badges157 bronze badges · Answer 3 · 2020-07-30 13:21:26Z

Pythonic code comes in three main veins:

That it follows PEP 8.
Using features of Python that make code easier to read.
This doesn't mean shorter or use new features.
That it follows Python norms.
These are more norms in the sense that map, filter and reduce are no longer 'Pythonic' but are not mentioned in PEP 8.

In short these can be summed up as follow norms that make your code easy to read.

Your code doesn't do this.

- You shouldn't have spaces around parentheses. isList( x ) vs isList(x).
- You should use snake_case for function names. isList vs is_list.
- You should not have expressions on the same lines as ifs or defs
```
def isList( x ): return isinstance( x, list )
```
  vs
```
def is_list(x):
 return isinstance(x, list)
```
- You should indent with 4 spaces.
- You shouldn't have white space at the end of lines.
- You should have two newlines between top level functions (and classes).
- You should use descriptive variable names, what are xs, x, hd, tl?
- You should not have parentheses around if expressions. if ( len(xs) == 0 ): vs if len(xs) == 0:.
- You can use truthy rather than len. if len(xs) == 0: vs if not xs:
- You can use tuple unpacking with the * operator rather than xs[ 1: ].
- Not is not a function it is a keyword. It does not need parentheses.
- You shouldn't abuse list comprehensions to contain mutability.
```
[ acc := acc + [x] ... ]
```

In all more Pythonic versions of these functions would look like the following:

# helper function
def is_list(value):
 return isinstance(value, list)
def flatten3_h(*values):
 if not values:
 return [] # base case, empty list/tuple
 head, *tail = values
 if not is_list(head):
 return [head] + flatten3_h(*tail)
 return flatten3_h(*head) + flatten3_h(*tail)
def flatten4(*values):
 flattened = []
 for value in values:
 flattened += (
 [value]
 if not is_list(value) else
 flatten4(*value)
 )
 return flattened

I'm wondering which is more Pythonic

At this point they are both fairly Pythonic. But I would prefer to maintain flatten4.

I still find flatten3_h unpythonic. Head/tail is apparently a (the?) proper way to think of lists in Prolog (which OP said this is from), but it's not in Python.
@Peilonrayz I don't think Pythonicity is just about readability, but also very much about how you're doing things.
flatten3_h([0] * 1000) gets me RecursionError: maximum recursion depth exceeded, even though that's a small and benign (not even nested at all) list. I don't think that's reasonable, and thus I submit it as evidence that it's not good to do it like that in Python and thus that it's unpythonic :-)
Sure. Didn't think of it as bickering. Maybe I should've just written my view as an answer...
Unlike Haskell and other functional languages, the head/tail idiom is simply not something you see much of in Python. Python doesn't optimize tail recursion, it's easy to blow the call stack, and there's a ton of allocation and GC action for every stack frame to basically achieve a simple for foo in bar loop in flatten3. OP is basically trying to impose functional programming style on Python, which is fundamentally not a functional language. flatten4 is much better than 3 in my opinion.

nicolaum nicolaum 1312 bronze badges · Answer 4 · 2023-08-22 16:25:17Z

PEP-8 issues aside, I found the answer from Peilonrayz (flatten4) the best. I would refine it further though, to make it less complicated, and more in line with EAPF programming:

def flatten4(values):
 flattened = []
 for value in values:
 try:
 flattened.extend(flatten4(value))
 except:
 flattened.append(value)
 return flattened

As mentioned by Dan Oberlam, use of isinstance is fairly accepted, but can mess with Duck Typing. Note three other small changes:

I replaced flattened += [value] with flattened.append(value), which is both clearer and faster.
Purely for consistency, I also replaced flattened += flatten4(value) with flattened.extend(flatten4(value)).
I removed the * from *values, as the problem description clearly states

flattening a list of lists of arbitrary depth

i.e. flattening a single list.

\$\begingroup\$ Looks like some good points here. +1 \$\endgroup\$

Peilonrayz
– Peilonrayz ♦

2023年08月22日 17:59:35 +00:00
Commented Aug 22, 2023 at 17:59

Reinderien Reinderien 70.9k5 gold badges76 silver badges256 bronze badges · Answer 5 · 2020-07-30 13:40:49Z

Neither is a good idea. Note the term arbitrary depth. It will be easy to make this explode by constructing input that is deeper than your stack, since you recurse and Python has an explicitly limited stack. Even if it didn't have a limited stack, Python does not have tail recursion optimization so this will be (relatively) slow. You should attempt to make an iterative solution instead.

I'd say it's fairly common to have nested lists a few levels deep, but very uncommon to have lists nested more deeply than the stack limit. Though the stack limit issue truly is an issue for the first solution because of the extra recursions caused by treating it as head/tail. Even flatten3_h([0] * 1000) explodes for me, and that's a rather small and benign list.

user1717828 user1717828 2542 silver badges9 bronze badges · Answer 6 · 2020-07-31 18:14:24Z

Other answers gave you great tips for the PEP8 compliance, implicit falseyness for empty lists and naming/style conventions. These are the first things I would have said if I code reviewed this first.

I wanted to add, if you are going to stick with the isinstance approach, that you can keep it while making your function a generator by replacing return with yield. You still have recursion, but now it will yield from itself instead of returning itself.

Something like:

def flatten(element):
 if isinstance(element, list):
 for item in element:
 yield from flatten(item)
 else:
 yield(element)

Just note that because it is a generator, you have to wrap it with list() to print output.

list(flatten(['a','b',['c','d']])) 
--> ['a', 'b', 'c', 'd']

Stack Exchange Network

Pythonic way to flatten lists or arbitrary depth using assignment expressions

6 Answers 6

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Hot Network Questions

Pythonic way to flatten lists or arbitrary depth using assignment expressions

6 Answers 6

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Related

Hot Network Questions