I have a list of dictionaries that I want to interleave in round-robin order, grouped by source.
sample = [
{'source': 'G', '"serial"': '0'},
{'source': 'G', '"serial"': '1'},
{'source': 'G', '"serial"': '2'},
{'source': 'P', '"serial"': '30'},
{'source': 'P', '"serial"': '0'},
{'source': 'P', '"serial"': '1'},
{'source': 'P', '"serial"': '2'},
{'source': 'P', '"serial"': '3'},
{'source': 'T', '"serial"': '2'},
{'source': 'T', '"serial"': '3'}
]
I want the result to look like this:
sample_solved = [
{'source': 'G', '"serial"': '0'},
{'source': 'P', '"serial"': '30'},
{'source': 'T', '"serial"': '2'},
{'source': 'G', '"serial"': '1'},
{'source': 'P', '"serial"': '1'},
{'source': 'T', '"serial"': '3'},
{'source': 'G', '"serial"': '2'},
{'source': 'P', '"serial"': '0'},
{'source': 'P', '"serial"': '2'},
{'source': 'P', '"serial"': '3'}
]
I solved it as follows:

    import collections
    from itertools import cycle, islice


    def roundrobin(*iterables):
        "roundrobin('ABC', 'D', 'EF') --> A D E B F C"
        # Taken from https://docs.python.org/3/library/itertools.html#itertools-recipes
        # Recipe credited to George Sakkis
        pending = len(iterables)
        nexts = cycle(iter(it).__next__ for it in iterables)
        while pending:
            try:
                for next in nexts:
                    yield next()
            except StopIteration:
                # One iterable ran out: drop it and keep cycling over the rest
                pending -= 1
                nexts = cycle(islice(nexts, pending))


    def solve():
        # Bucket the items by their 'source' value
        items_by_sources = collections.defaultdict(list)
        for item in sample:
            items_by_sources[item["source"]].append(item)
        # Sources first appear in sample in the order G, P, T
        g, p, t = items_by_sources.values()
        print(list(roundrobin(g, p, t)))
I use a collections.defaultdict to separate the items by source and then apply the roundrobin recipe from the Python docs.
How can I make the solution more Pythonic, or otherwise improve it?
2 Answers
Using the roundrobin() recipe from itertools is already pythonic. The solve() method could be replaced with more use of itertools.
In particular, itertools.groupby() would do the same as your defaultdict and for loop, as long as sample is already ordered by source (groupby() only groups consecutive items):
    >>> import itertools as it
    >>> import operator as op
    >>> g, p, t = [list(v) for k, v in it.groupby(sample, key=op.itemgetter('source'))]
    >>> list(roundrobin(g, p, t))
    [{'"serial"': '0', 'source': 'G'},
     {'"serial"': '30', 'source': 'P'},
     {'"serial"': '2', 'source': 'T'},
     {'"serial"': '1', 'source': 'G'},
     {'"serial"': '0', 'source': 'P'},
     {'"serial"': '3', 'source': 'T'},
     {'"serial"': '2', 'source': 'G'},
     {'"serial"': '1', 'source': 'P'},
     {'"serial"': '2', 'source': 'P'},
     {'"serial"': '3', 'source': 'P'}]
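Because groupby() only groups consecutive items, the approach above relies on sample already being ordered by source. A minimal sketch of what happens otherwise, using a small made-up list (shuffled is a hypothetical input, not the sample from the question):

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical, deliberately unordered input (not the question's sample).
shuffled = [
    {'source': 'P', 'serial': '0'},
    {'source': 'G', 'serial': '0'},
    {'source': 'P', 'serial': '1'},
    {'source': 'G', 'serial': '1'},
]
by_source = itemgetter('source')

# Without sorting, every change of key starts a new group.
unsorted_keys = [key for key, _ in groupby(shuffled, key=by_source)]

# sorted() is stable, so items keep their relative order within each source.
sorted_groups = [
    list(group)
    for _, group in groupby(sorted(shuffled, key=by_source), key=by_source)
]

print(unsorted_keys)       # four groups, one per run
print(len(sorted_groups))  # two groups, one per source
```

Sorting first costs O(n log n), so for unordered input the original defaultdict loop may well be the simpler choice.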
You don't really need to unpack, as you can make the call to roundrobin() using *, e.g.:

    >>> x = [list(v) for k, v in it.groupby(sample, key=op.itemgetter('source'))]
    >>> list(roundrobin(*x))
    [{'"serial"': '0', 'source': 'G'},
     {'"serial"': '30', 'source': 'P'},
     ...
Note roundrobin() could be rewritten using itertools.zip_longest(), which should be faster for near equal sized iterables, e.g.:
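    import itertools as it

    def roundrobin(*iterables):
        # A unique object can never collide with a real item
        sentinel = object()
        # Compare by identity so items with a custom __eq__/__ne__ are not dropped
        return (a for x in it.zip_longest(*iterables, fillvalue=sentinel)
                for a in x if a is not sentinel)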
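To illustrate why the version above uses a dedicated sentinel object rather than relying on zip_longest()'s default fill value of None, here is a small sketch. Both function names and the data tuple are made up for the comparison and are not from the original post:

```python
from itertools import zip_longest

def roundrobin_none(*iterables):
    # Hypothetical variant that relies on the default fillvalue (None).
    return (a for x in zip_longest(*iterables) for a in x if a is not None)

def roundrobin_sentinel(*iterables):
    # Same idea as above, with a unique fill value compared by identity.
    sentinel = object()
    return (a for x in zip_longest(*iterables, fillvalue=sentinel)
            for a in x if a is not sentinel)

data = ([1, None, 3], ['a'])
dropped = list(roundrobin_none(*data))   # [1, 'a', 3]: the genuine None is lost
kept = list(roundrobin_sentinel(*data))  # [1, 'a', None, 3]
```

For the dictionaries in the question this makes no practical difference, but it keeps the helper safe for arbitrary items.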
I did a quick run with 10000 random items in sample and found the recipe surprisingly slow (I still need to figure out why):
    In [11]: sample = [{'source': random.choice('GPT'), 'serial': random.randrange(100)} for _ in range(10000)]
    In [12]: x = [list(v) for k, v in it.groupby(sample, key=op.itemgetter('source'))]
    In [13]: %timeit list(roundrobin_recipe(*x))
    1 loop, best of 3: 1.48 s per loop
    In [14]: %timeit list(roundrobin_ziplongest(*x))
    100 loops, best of 3: 4.12 ms per loop
    In [15]: %timeit TW_zip_longest(*x)
    100 loops, best of 3: 6.36 ms per loop
    In [16]: list(roundrobin_recipe(*x)) == list(roundrobin_ziplongest(*x))
    True
    In [17]: list(roundrobin_recipe(*x)) == TW_zip_longest(*x)
    True
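One thing worth checking about the timing above: the random sample is not ordered by source, so groupby() yields one group per run of equal sources rather than three groups. The sketch below counts them; the fixed seed and the names runs and grouped are my own additions. Many short iterables may be part of why the recipe, which rebuilds its cycle every time an iterable runs out, looks so slow here, but that is a guess rather than a profiled conclusion.

```python
import itertools as it
import operator as op
import random

random.seed(0)  # fixed seed only so the sketch is repeatable
sample = [{'source': random.choice('GPT'), 'serial': random.randrange(100)}
          for _ in range(10000)]
by_source = op.itemgetter('source')

# Grouping the unsorted list: one group per run of consecutive equal sources.
runs = [list(v) for k, v in it.groupby(sample, key=by_source)]

# Grouping after a (stable) sort: one group per distinct source.
grouped = [list(v) for k, v in it.groupby(sorted(sample, key=by_source), key=by_source)]

print(len(runs), len(grouped))
```

Re-running the benchmark on grouped instead of runs would show how the three implementations compare on the three-iterable case from the question.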
- You can use operator.itemgetter('source') instead of the lambda for better performance. There should also be no need to turn v into a list, as the roundrobin recipe can work on any iterable. – 301_Moved_Permanently, Mar 3, 2017 at 8:45
- Yes, itemgetter() is good, will update. You do need to turn v into a list because of the implementation of groupby; from the docs: "The returned group is itself an iterator that shares the underlying iterable with groupby(). Because the source is shared, when the groupby() object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list" – AChampion, Mar 5, 2017 at 3:00
- Today I learned – 301_Moved_Permanently, Mar 5, 2017 at 9:13
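The point from the docs quoted in the comments above can be seen in a small sketch. The string 'AABBB' and the variable names are made up for illustration; the exact contents of the stale groups are left unstated because I am not certain they are identical across Python versions.

```python
from itertools import groupby

data = 'AABBB'

# Materialise each group while it is still the current one.
eager = [list(group) for _, group in groupby(data)]

# Advance groupby to the end first, then try to read the stale group iterators.
stale = [group for _, group in list(groupby(data))]
late = [list(group) for group in stale]

print(eager)  # [['A', 'A'], ['B', 'B', 'B']]
print(sum(len(g) for g in late), 'of', len(data), 'items recovered')
```

This is why the list(v) call in the answer is needed before the groups are handed to roundrobin().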
If you don't want to import itertools you can roll your own:
    def zip_longest(*args):
        result = []
        max_index = max(len(arg) for arg in args)
        for index in range(max_index):
            result.extend([arg[index] for arg in args if len(arg) > index])
        return result

    zip_longest(*items_by_sources.values())
For short lists, the two are about identical performance-wise (which actually surprised me), although if you don't call list() on the result you'd see gains from the generator. I find this method more readable, as it doesn't require any functions beyond the built-ins.
There is some speedup that could be found via removing args as they run out of elements, but it detracts from the readability and offers only a slight performance increase.
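For reference, the "remove args as they run out" variant mentioned above might look something like the following. This is only a sketch: the name interleave is hypothetical (chosen to avoid shadowing itertools.zip_longest), and like the version above it assumes every argument supports len() and indexing.

```python
def interleave(*args):
    # Hypothetical name; assumes each arg is a sequence (len() and indexing).
    result = []
    remaining = list(args)
    index = 0
    while remaining:
        # Keep only the sequences that still have an element at this index.
        remaining = [arg for arg in remaining if len(arg) > index]
        result.extend(arg[index] for arg in remaining)
        index += 1
    return result

print(interleave('ABC', 'D', 'EF'))  # ['A', 'D', 'E', 'B', 'F', 'C']
```

Exhausted sequences are no longer re-checked on every pass, at the cost of the extra bookkeeping the paragraph above warns about.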
- OP has already imported itertools, otherwise they'd not be able to use itertools.cycle or itertools.islice in roundrobin. roundrobin is not part of itertools, but is an example described in the documentation. OP also isn't using zip_longest, and so I also don't quite get why you're only talking about it. – Mar 3, 2017 at 10:45
- @Peilonrayz zip_longest(...) is a drop-in replacement for list(roundrobin(...)) and removes the requirement for itertools, including as a dependency for the roundrobin recipe. – TemporalWolf, Mar 3, 2017 at 18:24
- This does assume that the iterables are containers or strings in order to call len() on them. It's also confusing naming the function the same as itertools.zip_longest() (Py3). The roundrobin() recipe works with any type of iterable, not sure if that is important to OP. You can also flatten an itertools.zip_longest() with a guard to implement roundrobin() without the requirement for sequences - see my update. – AChampion, Mar 5, 2017 at 4:01