The following program starts with hundreds of permutations and shrinks them down to a small result set. Furthermore, I am only looking for the first valid result. What techniques can I use to make this more efficient?
#!/usr/bin/python3
from itertools import permutations, zip_longest
def get_result():
checked = []
for permutation in permutations_of_items_in_bins(items='aabbb', number_of_bins=3, items_per_bin=2):
if permutation in checked:
continue
checked.append(permutation)
if is_valid_result(permutation):
return permutation
# this is for debug only
for x in sorted(checked):
print(_format(x))
return None
def is_valid_result(result):
return False # in reality, this may return True
# from: http://stackoverflow.com/questions/312443/how-do-you-split-a-list-into-evenly-sized-chunks
def _get_chunks(n, iterable, padvalue=None):
"grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)
def _format(iterable):
return ','.join([''.join(c) for c in iterable])
def permutations_of_items_in_bins(items: str, items_per_bin: int, number_of_bins: int, pad='z'):
# if not enough items to fit in all the bins, pad them with a value representing "empty"
items = items.rjust(number_of_bins * items_per_bin, pad)
# get all possible sort orders of the items
_permutations = permutations(items)
# that are unique
unique_permutations = set(_permutations)
# for each sort order, split them into bins in that order
for permutation in unique_permutations:
# split items into bins
chunks = list(_get_chunks(items_per_bin, permutation))
# item sort order in bins doesn't matter, so let's always sort the items in each bin
normalized = [tuple(sorted(x)) for x in chunks]
yield normalized
get_result()
"""
aa,bb,bz
aa,bz,bb
ab,ab,bz
ab,az,bb
ab,bb,az
ab,bz,ab
az,ab,bb
az,bb,ab
bb,aa,bz
bb,ab,az
bb,az,ab
bb,bz,aa
bz,aa,bb
bz,ab,ab
bz,bb,aa
"""
Oleg Pryadko
-
1Have you tried profiling it yet? Where are the hot spots?Robert Harvey– Robert Harvey2016年10月26日 23:49:46 +00:00Commented Oct 26, 2016 at 23:49
-
Techniques are no good without knowing what you need to fix. So do that first.Mike Dunlavey– Mike Dunlavey2016年10月27日 12:29:32 +00:00Commented Oct 27, 2016 at 12:29
-
I avoided profiling it because I want to rewrite this to other programming languages. I understand Python itertools uses optimization in C code, while other languages may not have that. But I thought the problem here was pretty obvious: the processing time goes up (exponentially?) as you increase the number of items, number of bins, and size of bins. I think @idan-arye's answer solves that problem brilliantly!Oleg Pryadko– Oleg Pryadko2016年10月27日 15:25:46 +00:00Commented Oct 27, 2016 at 15:25
-
This is my first time posting an algorithm question. Maybe there is a better way I could have explained that?Oleg Pryadko– Oleg Pryadko2016年10月27日 15:32:05 +00:00Commented Oct 27, 2016 at 15:32
-
So what if you're going to use different languages? If your algorithm is doing something it doesn't need to do, it will waste time in any language. Anything you fix in python will carry over to language X. As I said do this first, and do it again in whatever language.Mike Dunlavey– Mike Dunlavey2016年10月27日 16:45:21 +00:00Commented Oct 27, 2016 at 16:45
1 Answer 1
Instead of generation everything and removing the duplicates, generate them without duplicates in the first place:
def permutations_of_items_in_bins(items, items_per_bin, number_of_bins, pad='z'):
# Convert items to a dictionary that tracks how many we have of each
pile = {}
for item in items:
try:
pile[item] += 1
except KeyError:
pile[item] = 1
# Pad to make sure we can generate combinations of the required length
num_missing = items_per_bin * number_of_bins - sum(pile.values())
if 0 < num_missing:
try:
pile[pad] += num_missing
except KeyError:
pile[pad] = num_missing
# We need this for iteration on possibilities and for ordering the bins
unique_items = list(sorted(pile.keys()))
# While each combination is yielded the pile is updated to remove the items in that combination
def combinations_consuming(size=items_per_bin, items=unique_items):
if size == 0:
yield ()
return
items = items[:] # create a new list
while items:
item = items[0]
if pile[item]:
pile[item] -= 1
for suffix in combinations_consuming(size - 1, items):
yield (item, *suffix)
pile[item] += 1
items.pop(0)
def generate_bins(num=number_of_bins):
if num == 0:
yield ()
return
for combination in combinations_consuming():
# The nested call to generate_bins will not be able to use the items already in combination
for suffix in generate_bins(num - 1):
yield (combination, *suffix)
yield from generate_bins()
answered Oct 27, 2016 at 1:47
-
This is excellent! It even yields already-sorted results. Thank you! By any chance, does it use any algorithm or concept that has a name?Oleg Pryadko– Oleg Pryadko2016年10月27日 15:18:09 +00:00Commented Oct 27, 2016 at 15:18
Explore related questions
See similar questions with these tags.
lang-py