1
\$\begingroup\$

I believe this is essentially the same method as the itertools.combinations function, but is there any way to make this code more more perfect in terms of speed, code size and readability :

def all_subsets(source,size):
 index = len(source)
 index_sets = [()]
 for sz in xrange(size):
 next_list = []
 for s in index_sets:
 si = s[len(s)-1] if len(s) > 0 else -1
 next_list += [s+(i,) for i in xrange(si+1,index)]
 index_sets = next_list
 subsets = []
 for index_set in index_sets:
 rev = [source[i] for i in index_set]
 subsets.append(rev)
 return subsets

Yields:

>>> Apriori.all_subsets(['c','r','i','s'],2)
[['c', 'r'], ['c', 'i'], ['c', 's'], ['r', 'i'], ['r', 's'], ['i', 's']]

There is probably a way to use generators or functional concepts, hopefully someone can suggest improvements.

asked Mar 2, 2013 at 19:38
\$\endgroup\$
1
  • \$\begingroup\$ Why aren't you using itertools.combinations? It's reasonable to do this for learning purposes, but if you are actually doing a bigger project you'll want to use the itertools version. \$\endgroup\$ Commented Mar 2, 2013 at 23:53

1 Answer 1

2
\$\begingroup\$

This type of problems lend themselves very well to recursion. A possible implementation, either in list or generator form could be:

def all_subsets(di, i) :
 ret = []
 for j, item in enumerate(di) :
 if i == 1 :
 ret = [(j,) for j in di]
 elif len(di) - j >= i :
 for subset in all_subsets(di[j + 1:], i - 1) :
 ret.append((item,) + subset)
 return ret
def all_subsets_gen(di, i) :
 for j, item in enumerate(di) :
 if i == 1 :
 yield (j,)
 elif len(di) - j >= i :
 for subset in all_subsets(di[j + 1:], i - 1) :
 yield (item,) + subset
>>> all_subsets(range(4), 3)
[(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]
>>> list(all_subsets_gen(range(4), 3))
[(0, 1, 2), (0, 1, 3), (0, 2, 3), (1, 2, 3)]

If you are going to run this on large sets, you may want to memoize intermediate results to speed things up.

answered Mar 2, 2013 at 23:54
\$\endgroup\$
3
  • \$\begingroup\$ RE the memorization, do you mean if the function is going to be run more than once? Otherwise I don't see how memorization would help here, since isn't there only one path down through the recursion, with each size of subsets being created just once? \$\endgroup\$ Commented Mar 3, 2013 at 16:14
  • \$\begingroup\$ @CrisStringfellow In my example above, the (0, 2, 3) and the (1, 2, 3) are both calling all_subsets([2, 3], 2) and it gets worse for larger sets, especially with a small i. \$\endgroup\$ Commented Mar 3, 2013 at 16:47
  • \$\begingroup\$ Got it. I was wrong. But couldn't that be subtly changed by making the call to 2,3 and then prepending the first element/s? Is this what you mean by memorization? \$\endgroup\$ Commented Mar 3, 2013 at 16:50

Your Answer

Draft saved
Draft discarded

Sign up or log in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

Post as a guest

Required, but never shown

By clicking "Post Your Answer", you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.