Here's my attempt at a function that flattens a nested dictionary structure in Python 3.6.
This came about because I had a nested dictionary structure of data I wanted to visualize with the Bokeh library.
I didn't like other examples that require a recursive function call. (I find recursion difficult to reason about, and I believe Python has a relatively small recursion limit.)
Tell me about the styling, the ease (or lack) of understanding, and what you would do differently.
Here's my code:
def flatten_dict(dictionary, delimiter='.'):
    """
    Function to flatten a nested dictionary structure
    :param dictionary: A python dict()
    :param delimiter: The desired delimiter between 'keys'
    :return: dict()
    """

    #
    # Assign the actual dictionary to the one that will be manipulated
    #
    dictionary_ = dictionary

    def unpack(parent_key, parent_value):
        """
        A function to unpack one level of nesting in a python dictionary
        :param parent_key: The key in the parent dictionary being flattened
        :param parent_value: The value of the parent key, value pair
        :return: list(tuple(,))
        """

        #
        # If the parent_value is a dict, unpack it
        #
        if isinstance(parent_value, dict):
            return [
                (parent_key + delimiter + key, value)
                for key, value
                in parent_value.items()
            ]
        #
        # If the parent_value is not a dict, leave it be
        #
        else:
            return [
                (parent_key, parent_value)
            ]

    #
    # Keep unpacking the dictionary until all values are not dictionaries
    #
    while True:
        #
        # Loop over the dictionary, unpacking one level. Then reduce the dimension one level
        #
        dictionary_ = dict(
            ii
            for i
            in [unpack(key, value) for key, value in dictionary_.items()]
            for ii
            in i
        )
        #
        # Break when there is no more unpacking to do
        #
        if all([
            not isinstance(value, dict)
            for value
            in dictionary_.values()
        ]):
            break

    return dictionary_
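As a side note on the recursion-limit concern, here is a minimal sketch, not the code above. Both `flatten_recursive` and `flatten_with_stack` are hypothetical helpers written only for this comparison, and the second uses an explicit stack rather than the level-by-level loop from the question. It assumes the nesting depth exceeds whatever `sys.getrecursionlimit()` reports.

```python
import sys


def flatten_recursive(dictionary, prefix=""):
    """Recursive flattening (hypothetical helper, for comparison only)."""
    flat = {}
    for key, value in dictionary.items():
        full_key = prefix + "." + key if prefix else key
        if isinstance(value, dict):
            flat.update(flatten_recursive(value, full_key))
        else:
            flat[full_key] = value
    return flat


def flatten_with_stack(dictionary):
    """Iterative flattening with an explicit stack (hypothetical helper)."""
    flat = {}
    stack = [("", dictionary)]
    while stack:
        prefix, current = stack.pop()
        for key, value in current.items():
            full_key = prefix + "." + key if prefix else key
            if isinstance(value, dict):
                stack.append((full_key, value))
            else:
                flat[full_key] = value
    return flat


# Build a dictionary nested deeper than the interpreter's recursion limit.
depth = sys.getrecursionlimit() + 100
nested = {"leaf": 1}
for _ in range(depth):
    nested = {"k": nested}

# The recursive version runs out of stack frames...
recursion_failed = False
try:
    flatten_recursive(nested)
except RecursionError:
    recursion_failed = True

# ...while the iterative version is limited only by memory.
flat = flatten_with_stack(nested)
print(recursion_failed, len(flat))
```

For everyday data the limit is rarely reached, so this is mostly an argument about robustness rather than about typical inputs.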
Hi, your code works like a charm. Consider making a lib/package of it. – Nikhil VJ, Jun 24, 2019 at 3:40
2 Answers
Your code looks good, but I noticed a few PEP 8 (and, for the docstrings, PEP 257) things that could be improved:
Why are you adding extra empty comment lines around your comments?
#
# Loop over the dictionary, unpacking one level. Then reduce the dimension one level
#
This could just be:
# Loop over the dictionary, unpacking one level. Then reduce the dimension one level
There's no need to add additional fluff to your comments.
Also, if your comment has multiple sentences in it, it looks nicer to end with a full stop:
# Loop over the dictionary, unpacking one level. Then reduce the dimension one level.
Docstrings should be written like a command, not an explanation.
Also, the first line of your docstring should be on the same line as the starting three quotes, but the ending three quotes should be on a new line.
This:
"""
Function to flatten a nested dictionary structure
:param dictionary: A python dict()
:param delimiter: The desired delimiter between 'keys'
:return: dict()
"""
could be this:
"""Flatten a nested dictionary structure.

Arguments:
dictionary -- dict to flatten
delimiter -- desired delimiter between keys

Return a flattened dict.
"""
Cheers for the review. To answer, I saw it done in someone else's module and thought I'd try it out. I did not know that about docstrings. I will make sure to structure them like a command in the future! Much appreciated :) – James Schinner, Aug 20, 2017 at 7:13
dictionary_ = dict( ii for i in [unpack(key, value) for key, value in dictionary_.items()] for ii in i )
This is probably the worst part of the code, yet it is pretty much the whole code. The variable names are terrible and the intent is a bit hidden. I understand that unpack returns lists, so your list comprehension generates a list of lists. So this expression flattens a list of lists of tuples and turns it into a dictionary.
For starters, flattening an iterable of iterables can be achieved more easily using itertools.chain.from_iterable, so you just need
dictionary_ = dict(itertools.chain.from_iterable([unpack(key, value) for key, value in dictionary_.items()]))
and can get rid of the awful variable names.
Second, and this also applies to the all() call, you can use a simpler generator expression instead of a full list comprehension to go easier on memory. This only means removing the square brackets around the expression:
dictionary_ = dict(itertools.chain.from_iterable(
    unpack(key, value) for key, value in dictionary_.items()
))
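A small sketch of what `chain.from_iterable` does here, assuming some made-up sample data (the `unpacked` list below is hypothetical and stands in for what `unpack` would return for three keys):

```python
from itertools import chain

# Hypothetical sample: one list of (key, value) pairs per top-level key.
unpacked = [
    [('a.x', 1), ('a.y', 2)],
    [('b', 3)],
    [('c.z', {'w': 4})],
]

# chain.from_iterable lazily yields the pairs of each inner list in turn,
# so dict() sees one flat stream of (key, value) tuples.
flattened = dict(chain.from_iterable(unpacked))
print(flattened)
```

Only one level of list nesting is removed; the inner `{'w': 4}` value is left untouched, which is why the surrounding `while` loop is still needed.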
Lastly, though this is highly debatable, I would use itertools.starmap instead of the generator expression:
dictionary_ = dict(itertools.chain.from_iterable(itertools.starmap(unpack, dictionary_.items())))
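To make the equivalence concrete, here is a minimal sketch. The `unpack` below is a simplified stand-in with a hard-coded `'.'` delimiter, and `data` is made-up sample input; `starmap(unpack, pairs)` simply calls `unpack(*pair)` for each pair.

```python
from itertools import chain, starmap


def unpack(parent_key, parent_value):
    """Simplified stand-in for the question's unpack (delimiter fixed to '.')."""
    if isinstance(parent_value, dict):
        return [(parent_key + '.' + key, value)
                for key, value in parent_value.items()]
    return [(parent_key, parent_value)]


data = {'a': {'x': 1, 'y': 2}, 'b': 3}

# Generator-expression form: unpack each (key, value) pair explicitly.
with_genexp = dict(chain.from_iterable(
    unpack(key, value) for key, value in data.items()
))

# starmap form: each (key, value) tuple is splatted into unpack(key, value).
with_starmap = dict(chain.from_iterable(starmap(unpack, data.items())))

print(with_genexp == with_starmap)
```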
I would also change unpack to fit more nicely with these changes: make it a generator instead of building lists (that you are going to discard right away). And just to propose an alternative way, I'll go with an EAFP approach:
def unpack(parent_key, parent_value):
    """A function to unpack one level of nesting in a dictionary"""
    try:
        items = parent_value.items()
    except AttributeError:
        # parent_value was not a dict, no need to flatten
        yield (parent_key, parent_value)
    else:
        for key, value in items:
            yield (parent_key + delimiter + key, value)
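A standalone sketch of how this generator behaves. To keep it runnable on its own, `delimiter` is passed as a parameter here, which is an assumption; in the answer it comes from the enclosing `flatten_dict`.

```python
def unpack(parent_key, parent_value, delimiter='.'):
    """Yield (key, value) pairs, unpacking one level if the value has .items()."""
    try:
        items = parent_value.items()
    except AttributeError:
        # No .items() method: treat the value as a leaf.
        yield (parent_key, parent_value)
    else:
        for key, value in items:
            yield (parent_key + delimiter + key, value)


# Nothing runs until the generator is consumed, e.g. by list() or dict().
nested_pairs = list(unpack('a', {'x': 1, 'y': 2}))
leaf_pairs = list(unpack('b', 3))
print(nested_pairs, leaf_pairs)
```

One consequence of the EAFP style is that any object exposing an `.items()` method is unpacked, not only `dict` instances, which may or may not be what you want.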
You may also find that this function chokes on dictionaries whose keys are not strings. This is due to the parent_key + delimiter + key part, which will fail to "concatenate" numbers and strings, for instance, or any custom object.
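A minimal reproduction of that failure, using a hypothetical `join_keys` helper that isolates the concatenation:

```python
def join_keys(parent_key, key, delimiter='.'):
    """Hypothetical helper: the concatenation used when building flat keys."""
    return parent_key + delimiter + key


raised = False
try:
    # An integer sub-key, as in {'a': {1: 'value'}}.
    join_keys('a', 1)
except TypeError as error:
    raised = True
    print('TypeError:', error)
```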
There are two solutions to this problem depending on the intended use-case:
- use str.format to convert each key to a string;
- use a tuple as the flattened key instead of a string.
I prefer the second solution, as with the first one a key of '3.14' can ambiguously come from the keys '3' and '14' or from the single key '3.14'. The whole code would be:
from itertools import chain, starmap


def flatten_dict(dictionary):
    """Flatten a nested dictionary structure"""

    def unpack(parent_key, parent_value):
        """Unpack one level of nesting in a dictionary"""
        try:
            items = parent_value.items()
        except AttributeError:
            # parent_value was not a dict, no need to flatten
            yield (parent_key, parent_value)
        else:
            for key, value in items:
                yield (parent_key + (key,), value)

    # Put each key into a tuple to initiate building a tuple of subkeys
    dictionary = {(key,): value for key, value in dictionary.items()}

    while True:
        # Keep unpacking the dictionary until all values are not dictionaries
        dictionary = dict(chain.from_iterable(starmap(unpack, dictionary.items())))
        if not any(isinstance(value, dict) for value in dictionary.values()):
            break

    return dictionary
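The ambiguity argument for tuple keys can be sketched as follows. `join_as_string` and `join_as_tuple` are hypothetical helpers that stand in for the two key-building strategies; they are not part of the code above.

```python
def join_as_string(*keys, delimiter='.'):
    """Hypothetical: build a flat key by formatting and joining sub-keys."""
    return delimiter.join('{}'.format(key) for key in keys)


def join_as_tuple(*keys):
    """Hypothetical: build a flat key as a tuple of the original sub-keys."""
    return tuple(keys)


# Three different key paths: nested '3' -> '14', the string '3.14',
# and the float 3.14.
string_keys = {
    join_as_string('3', '14'),
    join_as_string('3.14'),
    join_as_string(3.14),
}
tuple_keys = {
    join_as_tuple('3', '14'),
    join_as_tuple('3.14'),
    join_as_tuple(3.14),
}

# String keys collapse to a single entry; tuple keys stay distinct.
print(len(string_keys), len(tuple_keys))
```

Tuple keys also keep the original key types, so nothing needs to be parsed back out of a string afterwards.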
Must admit 'terrible variable names' was tough to stomach... haha. Is there a standard variable name to indicate a non-meaningful value? I appreciate the explanation of the itertools functions. Once I see a working example of their use, I start to think about how else I could use them. Ta! – James Schinner, Aug 21, 2017 at 2:21
@JamesSchinner by convention, _ is used when there is a need for a variable name somewhere in the code but that variable is never used anywhere else. This is what non-meaningful is to me. Otherwise you need a name that helps convey the intent. Some short names are universally understood, like i, j, k for (loop) counters/indices or x, y, z as point coordinates. But in general you want more descriptive names. unpacked and flattened can be replacements for ii and i, but I fail to find really good names. Naming is hard. – 301_Moved_Permanently, Aug 21, 2017 at 6:45
dictionary = dict(chain.from_iterable(starmap(unpack, dictionary.items())) is missing a right paren at the end. – Tim Hopper, Nov 30, 2018 at 15:09