Python list dictionary items round robin mixing

Question

I have a list of dictionaries that I want to mix in round-robin order by source.

sample = [
 {'source': 'G', '"serial"': '0'},
 {'source': 'G', '"serial"': '1'},
 {'source': 'G', '"serial"': '2'},
 {'source': 'P', '"serial"': '30'},
 {'source': 'P', '"serial"': '0'},
 {'source': 'P', '"serial"': '1'},
 {'source': 'P', '"serial"': '2'},
 {'source': 'P', '"serial"': '3'},
 {'source': 'T', '"serial"': '2'},
 {'source': 'T', '"serial"': '3'}
]

I want the result to be:

sample_solved = [
 {'source': 'G', '"serial"': '0'},
 {'source': 'P', '"serial"': '30'},
 {'source': 'T', '"serial"': '2'},
 {'source': 'G', '"serial"': '1'},
 {'source': 'P', '"serial"': '0'},
 {'source': 'T', '"serial"': '3'},
 {'source': 'G', '"serial"': '2'},
 {'source': 'P', '"serial"': '1'},
 {'source': 'P', '"serial"': '2'},
 {'source': 'P', '"serial"': '3'}
]

This is how I solved it:

import collections
from itertools import cycle, islice

def roundrobin(*iterables):
    # taken from https://docs.python.org/3/library/itertools.html#itertools-recipes
    "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
    # Recipe credited to George Sakkis
    pending = len(iterables)
    nexts = cycle(iter(it).__next__ for it in iterables)
    while pending:
        try:
            for next in nexts:
                yield next()
        except StopIteration:
            # one input is exhausted: keep cycling over the remaining ones
            pending -= 1
            nexts = cycle(islice(nexts, pending))

def solve():
    # group the items by source, keeping each source's items in order
    items_by_sources = collections.defaultdict(list)
    for item in sample:
        items_by_sources[item["source"]].append(item)
    # dicts keep insertion order on Python 3.7+, so the groups come back as G, P, T
    g, p, t = items_by_sources.values()
    print(list(roundrobin(g, p, t)))

I use a Python defaultdict to separate the items by source and then apply the roundrobin recipe from the Python docs.

How can I make the solution more Pythonic, or otherwise improve it?

Answer 1 · AChampion · 2017-03-03 04:12:33Z

Using the roundrobin() recipe from the itertools documentation is already Pythonic. The solve() function could make more use of itertools.

In particular, itertools.groupby() would do the same as your defaultdict and for loop, provided sample is already ordered by source (groupby() only groups consecutive items):

>>> import itertools as it
>>> import operator as op
>>> g, p, t = [list(v) for k, v in it.groupby(sample, key=op.itemgetter('source'))]
>>> list(roundrobin(g, p, t))
[{'"serial"': '0', 'source': 'G'},
 {'"serial"': '30', 'source': 'P'},
 {'"serial"': '2', 'source': 'T'},
 {'"serial"': '1', 'source': 'G'},
 {'"serial"': '0', 'source': 'P'},
 {'"serial"': '3', 'source': 'T'},
 {'"serial"': '2', 'source': 'G'},
 {'"serial"': '1', 'source': 'P'},
 {'"serial"': '2', 'source': 'P'},
 {'"serial"': '3', 'source': 'P'}]
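groupby() starts a new group every time the key changes, so on input that is not ordered by source it splits one source into several groups. A minimal sketch on a hypothetical three-item list (using a plain 'serial' key for brevity) that shows the split, and the usual remedy of sorting by the same key first:

```python
import itertools as it
import operator as op

# Hypothetical unsorted input: the two 'G' items are not adjacent.
unsorted = [
    {'source': 'G', 'serial': '0'},
    {'source': 'P', 'serial': '0'},
    {'source': 'G', 'serial': '1'},
]

key = op.itemgetter('source')

# groupby() opens a new group whenever the key changes, so 'G' shows up twice.
naive_keys = [k for k, _ in it.groupby(unsorted, key=key)]
print(naive_keys)  # ['G', 'P', 'G']

# sorted() is stable, so items keep their original relative order within a source.
grouped = [list(v) for _, v in it.groupby(sorted(unsorted, key=key), key=key)]
print([[d['serial'] for d in group] for group in grouped])  # [['0', '1'], ['0']]
```

Sorting puts the groups in alphabetical order of source rather than first-appearance order; the defaultdict approach keeps first-appearance order.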

You don't really need to unpack into separate names; you can pass the groups to roundrobin() with *, e.g.:

>>> x = [list(v) for k, v in it.groupby(sample, key=op.itemgetter('source'))]
>>> list(roundrobin(*x))
[{'"serial"': '0', 'source': 'G'},
 {'"serial"': '30', 'source': 'P'},
...
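For anyone who wants to run the groupby() plus * pipeline end to end, here is a self-contained sketch on the question's sample. It reuses the roundrobin() recipe quoted in the question (re-indented, with loop variables renamed so they don't shadow the built-ins next and iter); `groups` and `mixed` are illustrative names only.

```python
import itertools as it
import operator as op

def roundrobin(*iterables):
    """roundrobin('ABC', 'D', 'EF') --> A D E B F C (recipe quoted in the question)."""
    pending = len(iterables)
    # One bound __next__ method per input, visited in a repeating cycle.
    nexts = it.cycle(iter(iterable).__next__ for iterable in iterables)
    while pending:
        try:
            for next_item in nexts:
                yield next_item()
        except StopIteration:
            # Drop the exhausted input and keep cycling over the rest.
            pending -= 1
            nexts = it.cycle(it.islice(nexts, pending))

sample = [
    {'source': 'G', '"serial"': '0'},
    {'source': 'G', '"serial"': '1'},
    {'source': 'G', '"serial"': '2'},
    {'source': 'P', '"serial"': '30'},
    {'source': 'P', '"serial"': '0'},
    {'source': 'P', '"serial"': '1'},
    {'source': 'P', '"serial"': '2'},
    {'source': 'P', '"serial"': '3'},
    {'source': 'T', '"serial"': '2'},
    {'source': 'T', '"serial"': '3'},
]

# groupby() is sufficient here only because sample is already ordered by source.
groups = [list(v) for _, v in it.groupby(sample, key=op.itemgetter('source'))]
mixed = list(roundrobin(*groups))
print([(d['source'], d['"serial"']) for d in mixed])
```

The printed order matches the session output above: G 0, P 30, T 2, G 1, P 0, T 3, G 2, P 1, P 2, P 3.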

Note that roundrobin() could be rewritten using itertools.zip_longest(), which should be faster for iterables of nearly equal size, e.g.:

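import itertools as it

def roundrobin(*iterables):
    # a fresh object() is never identical to any real item, so it marks the padding
    sentinel = object()
    return (a for x in it.zip_longest(*iterables, fillvalue=sentinel)
            for a in x if a is not sentinel)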
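A quick usage check of the zip_longest() version, written here with `is not` (an identity test) for the sentinel comparison. Using a private sentinel rather than the default fillvalue of None is what lets genuine None values in the input survive the filtering, and the identity test never consults an item's own __eq__/__ne__.

```python
import itertools as it

def roundrobin(*iterables):
    # Padding value that cannot collide with any real item.
    sentinel = object()
    return (a for x in it.zip_longest(*iterables, fillvalue=sentinel)
            for a in x if a is not sentinel)

print(''.join(roundrobin('ABC', 'D', 'EF')))  # ADEBFC
print(list(roundrobin([None, 1], [2])))       # [None, 2, 1]
```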

I did a quick run with 10,000 random items in sample and found the recipe surprisingly slow (I still need to figure out why):

In [11]: sample = [{'source': random.choice('GPT'), 'serial': random.randrange(100)} for _ in range(10000)]
In [12]: x = [list(v) for k, v in it.groupby(sample, key=op.itemgetter('source'))]
In [13]: %timeit list(roundrobin_recipe(*x))
1 loop, best of 3: 1.48 s per loop
In [14]: %timeit list(roundrobin_ziplongest(*x))
100 loops, best of 3: 4.12 ms per loop
In [15]: %timeit TW_zip_longest(*x)
100 loops, best of 3: 6.36 ms per loop
In [16]: list(roundrobin_recipe(*x)) == list(roundrobin_ziplongest(*x))
True
In [17]: list(roundrobin_recipe(*x)) == TW_zip_longest(*x)
True
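One likely contributor to the slow recipe timing: the benchmark sample is random and unsorted, and groupby() only merges consecutive equal keys, so x holds thousands of tiny groups rather than three. The recipe wraps a new cycle(islice(...)) around the old one each time an input runs out, so with thousands of short inputs it does far more bookkeeping than a single zip_longest() pass. A sketch that counts the groups (the fixed seed is an assumption added for reproducibility):

```python
import itertools as it
import operator as op
import random

random.seed(0)  # assumption: fixed seed so the run is reproducible
sample = [{'source': random.choice('GPT'), 'serial': random.randrange(100)}
          for _ in range(10000)]
key = op.itemgetter('source')

# Unsorted data: a new group starts at every change of source.
unsorted_groups = [list(v) for _, v in it.groupby(sample, key=key)]
# After a stable sort: exactly one group per source.
sorted_groups = [list(v) for _, v in it.groupby(sorted(sample, key=key), key=key)]

print(len(unsorted_groups))  # thousands of groups (exact count depends on the seed)
print(len(sorted_groups))    # 3
```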

Comment: You can use operator.itemgetter('source') instead of the lambda for better performance. There should also be no need to turn v into a list, as the roundrobin recipe can work on any iterable.

Reply: Yes, itemgetter() is a good idea; I will update. You do need to turn v into a list because of how groupby works. From the docs: "The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list."

Comment: Today I learned.
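A small illustration of the groupby() caveat quoted above, on a toy string: groups copied into lists while groupby() is on them keep their data, while bare group iterators read after groupby() has moved on do not (the first one comes back empty).

```python
import itertools as it

data = 'aabbb'

# Copy each group into a list while groupby() is still positioned on it.
eager = [list(g) for _, g in it.groupby(data)]

# Keep only the group iterators and read them afterwards:
# by then groupby() has already advanced past them.
lazy_groups = [g for _, g in it.groupby(data)]
lazy = [list(g) for g in lazy_groups]

print(eager)    # [['a', 'a'], ['b', 'b', 'b']]
print(lazy[0])  # []
```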

Answer 2 · TemporalWolf · 2017-03-03 05:19:37Z

If you don't want to import itertools, you can roll your own:

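def zip_longest(*args):
    result = []
    # the longest input decides how many passes are needed
    max_index = max(len(arg) for arg in args)
    for index in range(max_index):
        # take element `index` from every input that is long enough to have one
        result.extend([arg[index] for arg in args if len(arg) > index])
    return result

zip_longest(*items_by_sources.values())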

For short lists, the two perform about identically (which actually surprised me), although if you don't call list() on the result, you'd see gains from the generator. I find this method more readable, as it uses only built-in functions.

Some further speedup is possible by removing args as they run out of elements, but it hurts readability and offers only a slight performance increase.
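A sketch of the "remove args as they run out" idea mentioned above; the name `interleave` is hypothetical and this is one possible way to do it, not the answerer's code. It keeps a shrinking list of the sequences that still have an element at the current index.

```python
def interleave(*seqs):
    """Hypothetical variant of the len()-based function above that stops
    visiting a sequence once it has no element at the current index."""
    result = []
    active = [s for s in seqs if len(s) > 0]
    index = 0
    while active:
        # One element from every sequence that is still active.
        result.extend(s[index] for s in active)
        index += 1
        # Keep only the sequences that have an element at the next index.
        active = [s for s in active if len(s) > index]
    return result

print(''.join(interleave('ABC', 'D', 'EF')))  # ADEBFC
print(interleave())                            # []
```

As a side effect, calling it with no arguments returns an empty list, whereas max() over an empty sequence in the original raises ValueError.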

Edited Mar 3, 2017 at 18:28

Comment: OP has already imported itertools; otherwise they'd not be able to use itertools.cycle or itertools.islice in roundrobin. roundrobin is not part of itertools, but is an example described in the documentation. OP also isn't using zip_longest, so I don't quite get why you're only talking about it.
– Peilonrayz ♦, Mar 3, 2017 at 10:45

Comment: @Peilonrayz zip_longest(...) is a drop-in replacement for list(roundrobin(...)) and removes the requirement for itertools, including as a dependency of the roundrobin recipe.
– TemporalWolf, Mar 3, 2017 at 18:24

Comment: This does assume that the iterables are containers or strings in order to call len() on them. It's also confusing naming the function the same as itertools.zip_longest() (Py3). The roundrobin() recipe works with any type of iterable, not sure if that is important to OP. You can also flatten an itertools.zip_longest() with a guard to implement roundrobin() without the requirement for sequences; see my update.
– AChampion, Mar 5, 2017 at 4:01
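To make the point about sequences concrete: a len()-based interleave needs objects that support len() and indexing, while the zip_longest()-with-sentinel version accepts any iterable, including one-shot iterators. Both functions below are re-typed from the two answers under hypothetical names, so they don't clash with itertools.zip_longest().

```python
import itertools as it

def interleave_sequences(*args):
    # len()-based approach from Answer 2: requires len() and indexing.
    result = []
    max_index = max(len(arg) for arg in args)
    for index in range(max_index):
        result.extend([arg[index] for arg in args if len(arg) > index])
    return result

def roundrobin_any(*iterables):
    # zip_longest() approach from Answer 1: works on any iterable.
    sentinel = object()
    return (a for x in it.zip_longest(*iterables, fillvalue=sentinel)
            for a in x if a is not sentinel)

try:
    interleave_sequences(iter('AB'), iter('C'))
    error = None
except TypeError as exc:  # iterators have no len()
    error = exc

print(type(error).__name__)                          # TypeError
print(list(roundrobin_any(iter('AB'), iter('C'))))  # ['A', 'C', 'B']
```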
