I have a question about this coding challenge for "Flatten a Dictionary":
Given a dictionary dict, write a function flattenDictionary that returns a flattened version of it .
If you’re using a compiled language such Java, C++, C#, Swift and Go, you may want to use a Map/Dictionary/Hash Table that maps strings (keys) to a generic type (e.g. Object in Java, AnyObject in Swift etc.) to allow nested dictionaries.
Example:
Input:
dict = { "Key1" : "1", "Key2" : { "a" : "2", "b" : "3", "c" : { "d" : "3", "e" : "1" } } }
Output:
{ "Key1" : "1", "Key2.a" : "2", "Key2.b" : "3", "Key2.c.d" : "3", "Key2.c.e" : "1" }
Important: when you concatenate keys, make sure to add the dot character between them. For instance concatenating Key2, c and d the result key would be Key2.c.d.
def flatten_dictionary(dictionary):
def items():
# loop through each item inside the dictionary k, v
#Appending
# check if the sub-key and sub-value are
# inside the flatten_dict(value)
# join on subkey array
# add to result
# clear out prev_keys
for key, value in dictionary.items():
if isinstance(value, dict):
for subkey, subvalue in flatten_dictionary(value).items():
if key == "":
yield subkey, subvalue
yield key + "." + subkey, subvalue
else:
yield key, value
return dict(items())
# test cases 1
dictionary2 = {
"Key1" : "1",
"Key2" : {
"a" : "2",
"b" : "3",
"c" : {
"d" : "3",
"e" : "1"
}
}
}
# output: {
# "Key1" : "1",
# "Key2.a" : "2",
# "Key2.b" : "3",
# "Key2.c.d" : "3",
# "Key2.c.e" : "1"
# }
print(flatten_dictionary(dictionary2))
1 Answer 1
1. Bugs
The special case for empty keys:
if key == "": yield subkey, subvalue yield key + "." + subkey, subvalue
is missing an
else:
and so leads to items appearing twice:>>> flatten_dictionary({"": {"a":1}}) {'a': 1, '.a': 1}
Even if we fix the bug by adding the
else:
, there's still a problem. Consider this example:>>> flatten_dictionary({"a": {"": {"b": 1}}, "": {"a": {"b": 2}}}) {'a.b': 2}
What happened to the
1
? Ignoring the empty string keys has led two different keys to be collapsed into one. It would be better to remove the special case for empty strings, so that:>>> flatten_dictionary({"a": {"": {"b": 1}}, "": {"a": {"b": 2}}}) {'a..b': 1, '.a.b': 2}
Python has a limited stack, but dictionaries can be nested arbitrarily deeply:
>>> d = {} >>> for i in range(1000): d = {'a':d} ... >>> flatten_dictionary(d) Traceback (most recent call last): File "<stdin>", line 1, in <module> File "cr186013.py", line 22, in flatten_dictionary return dict(items()) [... many line omitted ...] File "cr186013.py", line 13, in items for subkey, subvalue in flatten_dictionary(value).items(): RecursionError: maximum recursion depth exceeded
This problem can be avoided using the stack of iterators pattern.
2. Other review points
The comment does not match the code: there is nothing in the code corresponding to "check if the sub-key and sub-value are inside the flatten_dict(value)" or "join on subkey array" or "clear out prev_keys". Incorrect comments like this are worse than useless: they make it harder to understand and maintain the code.
Did this comment describe an earlier version of the code, and then you changed the code but forgot to edit the comment? It is worth getting into the habit of changing the comment first so that you don't forget.
The construction of the result keys has unnecessary quadratic runtime. For example, in this situation:
>>> flatten_dictionary({'a': {'a': {'a': {'a': {'a': {'a': 1}}}}}}) {'a.a.a.a.a.a': 1}
there is only the need to concatenate one result key (
'a.a.a.a.a.a'
), but the code concatenates five result keys: not just the one that we need, but'a.a'
,'a.a.a'
, 'a.a.a.a
' and'a.a.a.a.a'
as well. The way to avoid this is to keep a stack of dictionary keys currently being visited, and usestr.join
on the stack when you need to concatenate the result key.
3. Revised code
def flatten_dictionary(d):
result = {}
stack = [iter(d.items())]
keys = []
while stack:
for k, v in stack[-1]:
keys.append(k)
if isinstance(v, dict):
stack.append(iter(v.items()))
break
else:
result['.'.join(keys)] = v
keys.pop()
else:
if keys:
keys.pop()
stack.pop()
return result
-
\$\begingroup\$ I don't think it can pass this test cases: Input: {"":{"a":"1"},"b":"3"} Expected: {"a":"1","b":"3"} Actual: {'.a': '1', 'b': '3'} \$\endgroup\$NinjaG– NinjaG2018年01月28日 04:38:48 +00:00Commented Jan 28, 2018 at 4:38
-
\$\begingroup\$ See §1.2 for why special handling of empty strings is a bad idea. \$\endgroup\$Gareth Rees– Gareth Rees2018年01月28日 10:14:53 +00:00Commented Jan 28, 2018 at 10:14
dict
behaves likeOrderedDict
), so you can't definitively say across python versions how it should behave. Yourkey + '.' + subkey
also doesn't generalize, but keys are not always strings. Anything immutable that defines__hash__
and__eq__
can be a key. \$\endgroup\$