Pythonic way to flatten nested dictionaries

Question 1

Here's my attempt to write a function that flattens a nested dictionary structure in Python 3.6.

This came about because I had a nested dictionary structure of data I wanted to visualize with the Bokeh library.

I didn't like other examples that require a recursive function call. (I find them difficult to reason about, and I believe Python has a relatively small recursion limit.)
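For comparison, here is a minimal sketch of the kind of recursive solution the question is avoiding (`flatten_recursive` is a hypothetical name, not code from the thread). Each level of nesting costs one stack frame, so very deep input runs into CPython's recursion limit, which `sys.getrecursionlimit()` reports and which is 1000 by default.

```python
import sys


def flatten_recursive(dictionary, delimiter='.', parent_key=None):
    """Flatten a nested dict by recursing into every dict value (sketch)."""
    flat = {}
    for key, value in dictionary.items():
        # Top-level keys are kept as-is; nested keys are joined with the delimiter.
        new_key = key if parent_key is None else parent_key + delimiter + key
        if isinstance(value, dict):
            # One extra stack frame per level of nesting.
            flat.update(flatten_recursive(value, delimiter, new_key))
        else:
            flat[new_key] = value
    return flat


print(flatten_recursive({'a': {'b': 1, 'c': {'d': 2}}, 'e': 3}))
# {'a.b': 1, 'a.c.d': 2, 'e': 3}
print(sys.getrecursionlimit())  # 1000 unless changed with sys.setrecursionlimit()
```

With nesting deeper than that limit, the recursive version raises `RecursionError`, while a loop-based approach like the one below does not grow the call stack.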

Please comment on the style, on how easy (or hard) it is to understand, and on what you would do differently.

Here's my code:

def flatten_dict(dictionary, delimiter='.'):
    """
    Function to flatten a nested dictionary structure
    :param dictionary: A python dict()
    :param delimiter: The desired delimiter between 'keys'
    :return: dict()
    """
    #
    # Assign the actual dictionary to the one that will be manipulated
    #
    dictionary_ = dictionary
    def unpack(parent_key, parent_value):
        """
        A function to unpack one level of nesting in a python dictionary
        :param parent_key: The key in the parent dictionary being flattened
        :param parent_value: The value of the parent key, value pair
        :return: list(tuple(,))
        """
        #
        # If the parent_value is a dict, unpack it
        #
        if isinstance(parent_value, dict):
            return [
                (parent_key + delimiter + key, value)
                for key, value
                in parent_value.items()
            ]
        #
        # If the parent_value is not a dict, leave it be
        #
        else:
            return [
                (parent_key, parent_value)
            ]
    #
    # Keep unpacking the dictionary until none of the values are dictionaries
    #
    while True:
        #
        # Loop over the dictionary, unpacking one level. Then reduce the dimension one level
        #
        dictionary_ = dict(
            ii
            for i
            in [unpack(key, value) for key, value in dictionary_.items()]
            for ii
            in i
        )
        #
        # Break when there is no more unpacking to do
        #
        if all([
            not isinstance(value, dict)
            for value
            in dictionary_.values()
        ]):
            break
    return dictionary_

Question 2

Hi, your code works like a charm. Consider making a lib/package of it.

Question 3

Your code looks good, but I noticed a few PEP8 things that could be improved:

PEP8 for comments

Why are you adding extra empty comment lines around your comments?

#
# Loop over the dictionary, unpacking one level. Then reduce the dimension one level
#

This could just be:

# Loop over the dictionary, unpacking one level. Then reduce the dimension one level

There's no need to add additional fluff to your comments.

Also, if your comment has multiple sentences in it, it looks nicer to end with a full stop:

# Loop over the dictionary, unpacking one level. Then reduce the dimension one level.

PEP257 for docstrings

Docstrings should be written like a command, not an explanation.

Also, the first line of your docstring should be on the same line as the starting three quotes, but the ending three quotes should be on a new line.

This:

"""
Function to flatten a nested dictionary structure
:param dictionary: A python dict()
:param delimiter: The desired delimiter between 'keys'
:return: dict()
"""

could be this:

"""Flatten a nested dictionary structure.

Arguments:
dictionary -- dict to flatten
delimiter -- desired delimiter between keys

Return a flattened dict.
"""

Question 4

Cheers for the review. To answer your question, I saw it done in someone else's module and thought I'd try it out. I did not know that about docstrings. I will make sure to structure them like a command in the future! Much appreciated :)

Question 5

dictionary_ = dict(
    ii
    for i
    in [unpack(key, value) for key, value in dictionary_.items()]
    for ii
    in i
)

This is probably the worst part of the code, yet it is pretty much the whole code. The variable names are terrible and the intent is a bit hidden. I understand that unpack returns lists, so your list comprehension generates a list of lists; this expression therefore flattens a list of lists of tuples and turns it into a dictionary.

For starters, flattening an iterable of iterables is easier with itertools.chain.from_iterable, so you just need dictionary_ = dict(itertools.chain.from_iterable([unpack(key, value) for key, value in dictionary_.items()])), which also gets rid of the awful variable names.
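A small sketch of what `itertools.chain.from_iterable` does here: it lazily concatenates the inner lists, so `dict()` receives one flat stream of `(key, value)` pairs. The list of lists below is made-up sample data, shaped like what the question's `unpack` calls produce for one pass.

```python
from itertools import chain

# Shaped like [unpack(key, value) for key, value in dictionary_.items()]:
# one list of (key, value) tuples per top-level key.
list_of_lists = [
    [('a.b', 1), ('a.c', {'d': 2})],  # 'a' was a dict: one pair per sub-key
    [('e', 3)],                       # 'e' was not a dict: a single pair
]

flattened = dict(chain.from_iterable(list_of_lists))
print(flattened)  # {'a.b': 1, 'a.c': {'d': 2}, 'e': 3}
```

The result is the same as the nested `for i ... for ii in i` comprehension, without having to name the intermediate variables at all.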

Second, and this also applies to the all() call, you can use a simpler generator expression instead of a full list comprehension, which is nicer on memory. This only means removing the square brackets around the expression:

dictionary_ = dict(itertools.chain.from_iterable(
 unpack(key, value) for key, value in dictionary_.items()
))
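The generator version also helps the `all()` check: `all()` stops at the first falsy item, so with a generator expression the remaining `isinstance` tests are never evaluated, whereas a list comprehension computes every result before `all()` starts. A small sketch; the `is_leaf` helper and `checked` list are hypothetical, only there to make the number of tests visible.

```python
values = [{'x': 1}, 2, 3, 4]  # the first value is still a dict
checked = []


def is_leaf(value):
    """Record each value tested, then report whether it is not a dict."""
    checked.append(value)
    return not isinstance(value, dict)


# List comprehension: every value is tested before all() even runs.
all([is_leaf(v) for v in values])
print(len(checked))  # 4

checked.clear()
# Generator expression: all() stops at the first False.
all(is_leaf(v) for v in values)
print(len(checked))  # 1
```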

Lastly, and this is highly debatable, I would use itertools.starmap instead of the whole expression:

dictionary_ = dict(itertools.chain.from_iterable(itertools.starmap(unpack, dictionary_.items())))
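`itertools.starmap(f, pairs)` calls `f(*pair)` for each pair, so it is equivalent to the generator expression `(f(key, value) for key, value in pairs)`. A small sketch with a stand-in function (`describe` is hypothetical, used only for illustration in place of `unpack`):

```python
from itertools import starmap

pairs = {'a': 1, 'b': 2}.items()


def describe(key, value):
    """Hypothetical stand-in for unpack: takes a key and a value."""
    return f'{key}={value}'


with_starmap = list(starmap(describe, pairs))
with_genexp = list(describe(key, value) for key, value in pairs)
print(with_starmap)                 # ['a=1', 'b=2']
print(with_starmap == with_genexp)  # True
```

Whether the `starmap` form reads better than the explicit generator expression is a matter of taste, which is why the suggestion above is flagged as debatable.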

I would also change unpack to fit better with these changes: make it a generator instead of building lists that you discard right away. And just to propose an alternative, I'll use an EAFP ("easier to ask for forgiveness than permission") approach:

def unpack(parent_key, parent_value):
    """A function to unpack one level of nesting in a dictionary"""
    try:
        items = parent_value.items()
    except AttributeError:
        # parent_value was not a dict, no need to flatten
        yield (parent_key, parent_value)
    else:
        for key, value in items:
            yield (parent_key + delimiter + key, value)
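Because this EAFP version only asks for `.items()`, it also unpacks mappings that are not `dict` subclasses, such as `types.MappingProxyType`, which an `isinstance(value, dict)` check would skip. A standalone sketch; here `delimiter` is passed as a parameter (an assumption so the example runs on its own), whereas in the answer it comes from the enclosing `flatten_dict`.

```python
from types import MappingProxyType


def unpack(parent_key, parent_value, delimiter='.'):
    """Yield the (key, value) pairs produced by unpacking one level."""
    try:
        items = parent_value.items()
    except AttributeError:
        # parent_value has no .items(): treat it as a leaf value
        yield (parent_key, parent_value)
    else:
        for key, value in items:
            yield (parent_key + delimiter + key, value)


print(list(unpack('a', {'b': 1, 'c': 2})))  # [('a.b', 1), ('a.c', 2)]
print(list(unpack('a', 5)))                 # [('a', 5)]
# A read-only mapping that is not a dict subclass is unpacked too.
print(list(unpack('a', MappingProxyType({'b': 1}))))  # [('a.b', 1)]
```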

You may also find that this function chokes on dictionaries whose keys are not strings. This is because parent_key + delimiter + key fails to "concatenate" a string with a number, for instance, or with any custom object.
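A quick check of that failure mode: `+` refuses to join a `str` and an `int`, so a nested integer key raises `TypeError`. The `join_key` helper is hypothetical and just isolates the expression used in `unpack`.

```python
def join_key(parent_key, key, delimiter='.'):
    """Join keys the way the answer's unpack does, with plain +."""
    return parent_key + delimiter + key


print(join_key('a', 'b'))  # a.b

try:
    join_key('a', 1)  # int key: str + int is not allowed
except TypeError as error:
    # The exact message wording differs between Python versions.
    print('TypeError:', error)
```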

There are two solutions to this problem, depending on the intended use case:

use str.format to convert each key to a string;
use a tuple as the flattened key instead of a string.
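A minimal sketch of the first option, assuming the flattened key is built with `'{}{}{}'.format(...)` (the helper name `join_key_str` is hypothetical). It handles non-string keys, but it also shows how two different paths can end up with the same flattened key.

```python
def join_key_str(parent_key, key, delimiter='.'):
    """Build a flattened key with str.format, so any key type works."""
    return '{}{}{}'.format(parent_key, delimiter, key)


print(join_key_str('a', 1))  # a.1

nested_path = join_key_str('3', '14')  # from {'3': {'14': ...}}
plain_key = '3.14'                     # from {'3.14': ...}
print(nested_path == plain_key)        # True: the two paths collide
```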

I prefer the second solution because, with the first one, a flattened key of '3.14' could come either from the nested keys '3' and '14' or from the single key '3.14'. The whole code would be:

from itertools import chain, starmap


def flatten_dict(dictionary):
    """Flatten a nested dictionary structure"""

    def unpack(parent_key, parent_value):
        """Unpack one level of nesting in a dictionary"""
        try:
            items = parent_value.items()
        except AttributeError:
            # parent_value was not a dict, no need to flatten
            yield (parent_key, parent_value)
        else:
            for key, value in items:
                yield (parent_key + (key,), value)

    # Put each key into a tuple to initiate building a tuple of subkeys
    dictionary = {(key,): value for key, value in dictionary.items()}

    while True:
        # Keep unpacking the dictionary until no value is a dictionary
        dictionary = dict(chain.from_iterable(starmap(unpack, dictionary.items())))
        if not any(isinstance(value, dict) for value in dictionary.values()):
            break

    return dictionary
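If string keys are still needed at the end (for example as Bokeh column names), the tuple keys can be joined once flattening is done. A sketch, where `flat` is hand-written sample data shaped like the output of `flatten_dict({'a': {'b': 1, 2: {'c': 3}}, 'd': 4})` above and `to_string_keys` is a hypothetical helper. Joining reintroduces the ambiguity discussed above, so it is best done only where strings are really required.

```python
def to_string_keys(flat, delimiter='.'):
    """Join each tuple key into one string, converting parts with str()."""
    return {delimiter.join(map(str, key)): value for key, value in flat.items()}


# Output of flatten_dict({'a': {'b': 1, 2: {'c': 3}}, 'd': 4}):
flat = {('a', 'b'): 1, ('a', 2, 'c'): 3, ('d',): 4}

print(to_string_keys(flat))       # {'a.b': 1, 'a.2.c': 3, 'd': 4}
print(to_string_keys(flat, '/'))  # {'a/b': 1, 'a/2/c': 3, 'd': 4}
```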

Question 6

Must admit 'terrible variable names' was tough to stomach... haha. Is there a standard variable name to indicate a non-meaningful value? I appreciate the explanation of the itertools functions. I find that once I see a working example of one in use, I start thinking about how else I could use it. Ta!

Question 7

@JamesSchinner By convention, _ is used when a variable name is needed somewhere in the code but the variable is never used anywhere else. That is what non-meaningful means to me. Otherwise you need a name that helps convey the intent. Some short names are universally understood, like i, j, k for (loop) counters/indices or x, y, z for point coordinates, but in general you want more descriptive names. unpacked and flattened could replace ii and i, but I can't find really good names. Naming is hard.
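As an illustration of the point about names, here is the question's nested generator with descriptive names (`level_pairs` and `pair` are just one possible choice, not names from the thread), plus `_` for a loop variable that is never used. The `unpack` below is a simplified stand-in for the question's helper, with `delimiter` as a parameter so the sketch runs on its own.

```python
def unpack(parent_key, parent_value, delimiter='.'):
    """Simplified stand-in for the question's unpack helper."""
    if isinstance(parent_value, dict):
        return [(parent_key + delimiter + key, value)
                for key, value in parent_value.items()]
    return [(parent_key, parent_value)]


dictionary_ = {'a': {'b': 1}, 'c': 2}

# Same logic as the question's `for i ... for ii in i`, with clearer names.
dictionary_ = dict(
    pair
    for level_pairs in (unpack(key, value) for key, value in dictionary_.items())
    for pair in level_pairs
)
print(dictionary_)  # {'a.b': 1, 'c': 2}

# `_` signals a loop variable that is deliberately unused.
placeholders = [None for _ in range(3)]
print(placeholders)  # [None, None, None]
```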

Question 8

dictionary = dict(chain.from_iterable(starmap(unpack, dictionary.items())) is missing a right paren at the end.

Accepted answer (the answer under Question 3 above) by LyricLy, posted 2017-08-20 07:09:45Z.
