Flatten Dictionary Python challenge

Question 1

I have a question about this coding challenge for "Flatten a Dictionary":

Given a dictionary dict, write a function flattenDictionary that returns a flattened version of it .

If you’re using a compiled language such Java, C++, C#, Swift and Go, you may want to use a Map/Dictionary/Hash Table that maps strings (keys) to a generic type (e.g. Object in Java, AnyObject in Swift etc.) to allow nested dictionaries.

Example:

Input:
dict = {
 "Key1" : "1",
 "Key2" : {
 "a" : "2",
 "b" : "3",
 "c" : {
 "d" : "3",
 "e" : "1"
 }
 }
}
Output:
{
 "Key1" : "1",
 "Key2.a" : "2",
 "Key2.b" : "3",
 "Key2.c.d" : "3",
 "Key2.c.e" : "1"
}
Important: when you concatenate keys, make sure to add the dot character between them. For instance concatenating Key2, c and d the result key would be Key2.c.d.

def flatten_dictionary(dictionary):
 def items():
 # loop through each item inside the dictionary k, v
 #Appending
 # check if the sub-key and sub-value are 
 # inside the flatten_dict(value)
 # join on subkey array
 # add to result
 # clear out prev_keys
 for key, value in dictionary.items():
 if isinstance(value, dict):
 for subkey, subvalue in flatten_dictionary(value).items():
 if key == "":
 yield subkey, subvalue
 yield key + "." + subkey, subvalue
 else:
 yield key, value
 return dict(items()) 
# test cases 1
dictionary2 = {
 "Key1" : "1",
 "Key2" : {
 "a" : "2",
 "b" : "3",
 "c" : {
 "d" : "3",
 "e" : "1"
 }
 }
 }
# output: {
# "Key1" : "1",
# "Key2.a" : "2",
# "Key2.b" : "3",
# "Key2.c.d" : "3",
# "Key2.c.e" : "1"
# }
print(flatten_dictionary(dictionary2))

Question 2

Flattening a dictionary is a weird request. How do you handle key collisions? Overwrites really don't make sense, because they are implementation dependent (at least before Python 3.4, I believe where dict behaves like OrderedDict), so you can't definitively say across python versions how it should behave. Your key + '.' + subkey also doesn't generalize, but keys are not always strings. Anything immutable that defines __hash__ and __eq__ can be a key.

Question 3

@BaileyParker Or the problem has to be properly described. The way it stands at the moment justifies all your concerns. This is what SO would classify as too broad. Imho always.

Question 4

You should at least add a link to the problem description, where these ambiguities might be resolved already.

Question 5

I updated the problem description so it is clear what the question is asking me to do.

Question 6

Your question currently leaves some things to be desired. I'd recommend taking a look at Simon's Guide to posting a good question. In particular, you could add a reason for why you are posting your code here, what do you want from a review? Also, how much have you tested this code?

Question 7

1. Bugs

The special case for empty keys:

if key == "":
 yield subkey, subvalue
yield key + "." + subkey, subvalue

is missing an else: and so leads to items appearing twice:

>>> flatten_dictionary({"": {"a":1}})
{'a': 1, '.a': 1}

Even if we fix the bug by adding the else:, there's still a problem. Consider this example:
```
>>> flatten_dictionary({"a": {"": {"b": 1}}, "": {"a": {"b": 2}}})
{'a.b': 2}
```
What happened to the 1? Ignoring the empty string keys has led two different keys to be collapsed into one. It would be better to remove the special case for empty strings, so that:
```
>>> flatten_dictionary({"a": {"": {"b": 1}}, "": {"a": {"b": 2}}})
{'a..b': 1, '.a.b': 2}
```

Python has a limited stack, but dictionaries can be nested arbitrarily deeply:

>>> d = {}
>>> for i in range(1000): d = {'a':d}
... 
>>> flatten_dictionary(d)
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "cr186013.py", line 22, in flatten_dictionary
 return dict(items()) 
 [... many line omitted ...]
 File "cr186013.py", line 13, in items
 for subkey, subvalue in flatten_dictionary(value).items():
RecursionError: maximum recursion depth exceeded

This problem can be avoided using the stack of iterators pattern.

2. Other review points

The comment does not match the code: there is nothing in the code corresponding to "check if the sub-key and sub-value are inside the flatten_dict(value)" or "join on subkey array" or "clear out prev_keys". Incorrect comments like this are worse than useless: they make it harder to understand and maintain the code.

Did this comment describe an earlier version of the code, and then you changed the code but forgot to edit the comment? It is worth getting into the habit of changing the comment first so that you don't forget.
The construction of the result keys has unnecessary quadratic runtime. For example, in this situation:
```
>>> flatten_dictionary({'a': {'a': {'a': {'a': {'a': {'a': 1}}}}}})
{'a.a.a.a.a.a': 1}
```
there is only the need to concatenate one result key ('a.a.a.a.a.a'), but the code concatenates five result keys: not just the one that we need, but 'a.a', 'a.a.a', 'a.a.a.a' and 'a.a.a.a.a' as well. The way to avoid this is to keep a stack of dictionary keys currently being visited, and use str.join on the stack when you need to concatenate the result key.

3. Revised code

def flatten_dictionary(d):
 result = {}
 stack = [iter(d.items())]
 keys = []
 while stack:
 for k, v in stack[-1]:
 keys.append(k)
 if isinstance(v, dict):
 stack.append(iter(v.items()))
 break
 else:
 result['.'.join(keys)] = v
 keys.pop()
 else:
 if keys:
 keys.pop()
 stack.pop()
 return result

Question 8

I don't think it can pass this test cases: Input: {"":{"a":"1"},"b":"3"} Expected: {"a":"1","b":"3"} Actual: {'.a': '1', 'b': '3'}

Question 9

See §1.2 for why special handling of empty strings is a bad idea.

Gareth Rees Gareth Rees 50.1k3 gold badges130 silver badges210 bronze badges · Accepted Answer · 2018-01-27 13:07:27Z

1. Bugs

The special case for empty keys:

if key == "":
 yield subkey, subvalue
yield key + "." + subkey, subvalue

is missing an else: and so leads to items appearing twice:

>>> flatten_dictionary({"": {"a":1}})
{'a': 1, '.a': 1}

Even if we fix the bug by adding the else:, there's still a problem. Consider this example:
```
>>> flatten_dictionary({"a": {"": {"b": 1}}, "": {"a": {"b": 2}}})
{'a.b': 2}
```
What happened to the 1? Ignoring the empty string keys has led two different keys to be collapsed into one. It would be better to remove the special case for empty strings, so that:
```
>>> flatten_dictionary({"a": {"": {"b": 1}}, "": {"a": {"b": 2}}})
{'a..b': 1, '.a.b': 2}
```

Python has a limited stack, but dictionaries can be nested arbitrarily deeply:

>>> d = {}
>>> for i in range(1000): d = {'a':d}
... 
>>> flatten_dictionary(d)
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "cr186013.py", line 22, in flatten_dictionary
 return dict(items()) 
 [... many line omitted ...]
 File "cr186013.py", line 13, in items
 for subkey, subvalue in flatten_dictionary(value).items():
RecursionError: maximum recursion depth exceeded

This problem can be avoided using the stack of iterators pattern.

2. Other review points

The comment does not match the code: there is nothing in the code corresponding to "check if the sub-key and sub-value are inside the flatten_dict(value)" or "join on subkey array" or "clear out prev_keys". Incorrect comments like this are worse than useless: they make it harder to understand and maintain the code.

Did this comment describe an earlier version of the code, and then you changed the code but forgot to edit the comment? It is worth getting into the habit of changing the comment first so that you don't forget.
The construction of the result keys has unnecessary quadratic runtime. For example, in this situation:
```
>>> flatten_dictionary({'a': {'a': {'a': {'a': {'a': {'a': 1}}}}}})
{'a.a.a.a.a.a': 1}
```
there is only the need to concatenate one result key ('a.a.a.a.a.a'), but the code concatenates five result keys: not just the one that we need, but 'a.a', 'a.a.a', 'a.a.a.a' and 'a.a.a.a.a' as well. The way to avoid this is to keep a stack of dictionary keys currently being visited, and use str.join on the stack when you need to concatenate the result key.

3. Revised code

def flatten_dictionary(d):
 result = {}
 stack = [iter(d.items())]
 keys = []
 while stack:
 for k, v in stack[-1]:
 keys.append(k)
 if isinstance(v, dict):
 stack.append(iter(v.items()))
 break
 else:
 result['.'.join(keys)] = v
 keys.pop()
 else:
 if keys:
 keys.pop()
 stack.pop()
 return result

I don't think it can pass this test cases: Input: {"":{"a":"1"},"b":"3"} Expected: {"a":"1","b":"3"} Actual: {'.a': '1', 'b': '3'}
See §1.2 for why special handling of empty strings is a bad idea.

Stack Exchange Network

Flatten Dictionary Python challenge

1 Answer 1

1. Bugs

2. Other review points

3. Revised code

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Hot Network Questions

Flatten Dictionary Python challenge

1 Answer 1

1. Bugs

2. Other review points

3. Revised code

Your Answer

Sign up or log in

Post as a guest

Post as a guest

Linked

Related

Hot Network Questions