What is the most Pythonic way to take a dict of lists and produce a new dict with the list items as keys and the previous dict's keys as list items?
Here's a visual explanation:
favorite_fruits = {"alice": {"apple", "orange"}, "bob": {"apple"}, "carol": {"orange"}}
people_by_fruit = {"orange": {"carol", "alice"}, "apple": {"bob", "alice"}}
Here's the best I have at the moment:
from collections import defaultdict

favorite_fruits = {"alice": {"apple", "orange"}, "bob": {"apple"}, "carol": {"orange"}}
people_by_fruit = defaultdict(set)
for person, fruit in favorite_fruits.items():
    for fruit in fruit:
        people_by_fruit[fruit].add(person)
2 Answers
Actually, I believe your code is quite good as it is. The simple inversion listed in this similar question (from the comments) does not work when you want to split up the set values of your first dict. You could try something like a dict comprehension with a double for loop, but that doesn't work either, because the second time you encounter a fruit it overwrites the first entry.
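To make the overwriting problem concrete, here is a minimal sketch using the question's data. The names naive, fruits_seen and people_by_fruit_alt are only illustrative. The second comprehension is one possible comprehension-only workaround, not a recommendation: it rescans every person for every fruit, so the defaultdict loop remains the simpler choice.

```python
favorite_fruits = {"alice": {"apple", "orange"}, "bob": {"apple"}, "carol": {"orange"}}

# Naive double-loop dict comprehension: each fruit ends up mapped to a
# single person, because a later person replaces the earlier one.
naive = {
    fruit: person
    for person, fruits in favorite_fruits.items()
    for fruit in fruits
}
print(naive)  # alice is lost for both fruits

# Comprehension-only workaround (illustrative): collect all fruits first,
# then build the set of people for each fruit by rescanning the dict.
fruits_seen = set().union(*favorite_fruits.values())
people_by_fruit_alt = {
    fruit: {person for person, fruits in favorite_fruits.items() if fruit in fruits}
    for fruit in fruits_seen
}
print(people_by_fruit_alt)
```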
The only thing I would change in your code is to use the plural, fruits, for the set of fruits, so that you don't write for fruit in fruit, which looks kind of hairy and can easily lead to bugs because the loop variable overwrites the variable you are iterating over. Not good. In other words:
people_by_fruit = defaultdict(set)
for person, fruits in favorite_fruits.items():
    for fruit in fruits:
        people_by_fruit[fruit].add(person)
- Nice catch on the repeat variable name. Thanks for the feedback! (Trey Hunner, Nov 15, 2015 at 6:32)
First of all, my opinion is that your version is quite close to the best one. But it is possible to use a single for loop, or to write it in just one line by using map() and a list comprehension instead of nested for loops:
from collections import defaultdict

direct = {"a": [1, 2, 3], "b": [3], "c": [2, 4, 5], "d": [6]}

def invert(d):
    ret = defaultdict(set)
    for key, values in d.items():
        for value in values:
            ret[value].add(key)
    return ret

def invert_alt(d):
    ret = defaultdict(set)
    list(map(lambda h: ret[h[1]].add(h[0]), [(key, value) for key in d for value in d[key]]))
    return ret

def invert_final(d):
    ret = defaultdict(set)
    for key, value in [(key, value) for key in d for value in d[key]]:
        ret[value].add(key)
    return ret

print(invert(direct))
print(invert_alt(direct))
print(invert_final(direct))
It is clear that invert_alt() has too many issues to be usable:
- You need the list() trick only in Python 3, because there map() returns a lazy iterator that is not evaluated until the code consumes its elements; you don't need it in Python 2.
- This implementation relies on a side effect of map to do its job, and my position is to avoid using side effects for the core work.
- It is really hard to understand.
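The laziness mentioned in the first point can be seen in a small sketch, assuming Python 3. The names pairs, lazy, untouched and results are only illustrative and are not part of the functions above.

```python
from collections import defaultdict

pairs = [("a", 1), ("b", 1), ("a", 2)]
ret = defaultdict(set)

# In Python 3, map() only builds a lazy iterator; the lambda has not run yet.
lazy = map(lambda kv: ret[kv[1]].add(kv[0]), pairs)
untouched = dict(ret)  # snapshot taken before the iterator is consumed

# Consuming the iterator is what finally triggers the side effects.
results = list(lazy)

print(untouched)  # empty: nothing had been added at that point
print(dict(ret))  # now populated
print(results)    # a throwaway list of None values, one per pair
```

The throwaway list of None values is a good hint that map is being used for its side effects rather than for its result.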
For invert_final() you pay a little in clarity to remove one level of nested indentation: maybe a good compromise. Because indentation is part of Python's syntax, removing a level of nesting is usually a worthwhile goal.
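If the intermediate list in invert_final() is a concern, one possible variation is to flatten with a generator expression instead. This is only a sketch: invert_lazy is a hypothetical name and not one of the functions above.

```python
from collections import defaultdict

def invert_lazy(d):
    """Hypothetical variant of invert_final that avoids the temporary list."""
    ret = defaultdict(set)
    # Generator expression: (key, value) pairs are produced one at a time.
    pairs = ((key, value) for key in d for value in d[key])
    for key, value in pairs:
        ret[value].add(key)
    return ret

direct = {"a": [1, 2, 3], "b": [3], "c": [2, 4, 5], "d": [6]}
inverted = invert_lazy(direct)
print(dict(inverted))
```

It keeps the single level of loop indentation but still iterates twice conceptually, so the plain nested loop in invert() stays the most direct form.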
- In invert_final you don't remove a nested loop; you add a loop and a list comprehension. In invert_alt you add a list construction to force evaluation of an extra lambda/map expression. Both versions go a little backwards, which shows that the OP's version is actually a good one: readable, memory efficient and effective. (holroy, Nov 15, 2015 at 11:12)
- Sorry, I used the wrong term. I meant removing nested indentation. Removing nested indentation is always a good thing in Python. As I wrote, invert_alt is wrong, but I wrote it only because the OP asked me in a comment. I pointed out that you pay a little in readability but you remove indentation; nested indentation is a well-known Python issue, so removing one level can be a worthwhile goal. (Michele d'Amico, Nov 15, 2015 at 11:23)
- @holroy just for the record, the Python developers put a lot of effort into list comprehension syntax and tools like zip and map to make it simple to remove nested indentation and make code more compact. Try writing five nested for loops with an if inside... You will love the zip construct. (Michele d'Amico, Nov 15, 2015 at 11:31)
- Readability and reasonable line lengths are also good, if not better. Removing indentation at the cost of clarity is not a good trade-off in my book. (holroy, Nov 15, 2015 at 11:33)
- I do love the different comprehensions available, but I'm not that keen on the not-so-recommended map, though it is useful in some contexts. Choose the right tool for the right job. Nesting five for loops with an if sounds like something in need of refactoring. (holroy, Nov 15, 2015 at 11:39)
map(). But IMHO that is not really better, and it is even a trick, because in Python 3 map is lazy and is executed only when you consume its result, i.e. you have to force evaluation, for example by wrapping the map code in list().