I need to loop through all possible combinations of an array for a project I'm doing. I'm doing this by looping through all integers in binary from 0
to 2^length of array - 1
. Then, a 0 represents to not include the item at that index, and 1 represents to include the item at that index. Here is my code:
def binary_combinations(array):
combinations = []
total_combinations = 2 ** len(array) - 1
for i in range(total_combinations + 1):
binary_indexes = ("{:0" + str(len(array)) + "d}").format(int(str(bin(i))[2:]))
current_combination = []
for j in range(len(binary_indexes)):
if binary_indexes[j] == "1":
current_combination.append(array[j])
combinations.append(current_combination)
return combinations
So if I called print(binary_combinations([1, 2, 3, 4])
, the output would be [[], [4], [3], [3, 4], [2], [2, 4], [2, 3], [2, 3, 4], [1], [1, 4], [1, 3], [1, 3, 4], [1, 2], [1, 2, 4], [1, 2, 3], [1, 2, 3, 4]]
.
In the context of my project, I want to check each combination as it is generated, instead of waiting for the entire array to be finished generating (so check the array []
, if it works, break
, otherwise move on to [4]
.
I am looking for the smallest possible array that satisfies a condition, so I need to loop through the sub-arrays from least to greatest length, and I don't have enough time to wait for the entire array of combinations to generate and then loop through it.
Is there a way to optimize the order of combinations I am creating, so the combinations are created in order of least to longest length?
2 Answers 2
DeathIncarnate's answer is already comprehensive in terms of code review. I would just like to address this point you made:
I am looking for the smallest possible array that satisfies a condition, so I need to loop through the sub-arrays from least to greatest length, and I don't have enough time to wait for the entire array of combinations to generate and then loop through it.
Algorithmically, this is of the highest importance, as early stopping can mitigate the effects of combinatorial explosion as you pass in longer arrays.
The standard library function itertools.combinations
already mentioned can be used to great effect here, with the use of yield from
:
import itertools
def ordered_combinations(array):
for i in range(len(array) + 1):
yield from itertools.combinations(array, i)
This returns a "flat" generator over all combinations, starting with the 0-length combination, then 1-length combinations, 2-length combinations and so on:
>>> list(ordered_combinations([1, 2, 3]):
[(), (1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]
When used in a loop that stops when your condition is met, you never need to generate combinations longer than that minimum valid combination:
for sub_array in ordered_combinations([1, 2, 3]):
if condition(sub_array):
break
Or as a generator of matching combinations:
matches = (sub_array for sub_array in ordered_combinations([1, 2, 3]) if condition(sub_array))
first_match = next(matches)
(Just remember to not accidentally cast the generator to a list (with [
, ]
), as that obviates this benefit.)
Format
Modern Python often uses f-strings for formatting, it also has specification for binary format, this means that:
binary_indexes = ("{:0" + str(len(array)) + "d}").format(int(str(bin(i))[2:]))
Becomes:
binary_indexes = f"{i:0{len(array)}b}"
As brought up by Peter Cordes in the comments, if you want to be able to extract the bits of a number, it might be better to use something like:
def bits(n, length=None):
if length is None:
length = n.bit_length()
for _ in range(length):
yield n & 1
n = n >> 1
Which yields the bits (little-endian style, though since you're doing combinations this is not an issue)
Comprehensions
Instead of building a list in a loop, it is often much more efficient to use list comprehensions to construct lists, they allow for multiple loops and predicates. It's also often better to loop over objects themselves rather than indexing repeatedly
current_combination = []
for j in range(len(binary_indexes)):
if binary_indexes[j] == "1":
current_combination.append(array[j])
becomes
current_combination = [array[j] for j, ind in enumerate(binary_indexes)
if ind == "1"]
Or better
current_combination = [elem for elem, ind in zip(array, binary_indexes)
if ind == "1"]
If using the bits
method above, this might become:
current_combination = [elem for elem, bit in zip(array, bits(i, len(array)))
if bit]
Generators
You mention not wanting to wait until all combinations are generated. Python provides a tool for this called generators
.
Generators can be a little confusing at first, the way they are constructed, they look like functions (or list comprehensions), but they behave subtly differently. When they yield
a value they pause, storing whatever state's still there and on the next
call, they run until they hit the next yield
. They do not store their history! So you can't wind them back, so once they've gone past a value it's gone and they can exhaust (run out of values to return) which usually results in StopIteration
being raise
d as an error (for
loops handle this automatically for you).
You can build lists from generators, or iterate over them as iterables in a loop. In your case, your function becomes:
def binary_combinations(array):
total_combinations = 2 ** len(array) - 1
for i in range(total_combinations + 1):
binary_indexes = f"{i:0{len(array)}b}"
current_combination = [elem for elem, ind in zip(array, binary_indexes)
if ind == "1"]
yield current_combination
We can then call this by saying:
for comb in binary_combinations([1, 2, 3, 4]):
print(comb)
or if we need a list
x = list(binary_combinations([1, 2, 3, 4]))
or
x = [comb for comb in binary_combinations([1, 2, 3, 4]) if predicate]
or
x = binary_combinations([1, 2, 3, 4]) # This means X is a generator
first_val = next(x)
second_val = next(x)
Whatever else you want to do with it.
Using stdlib
It seems like what you're trying to do is something which already exists in the Python stdlib. The itertools
module (see: itertools.combinations
). If you're not implementing this as an exercise to yourself, I'd check that out along with the other components of the Python stdlib.
-
\$\begingroup\$ Is a string really the best way to enumerate the set bits of an integer? Wouldn't it work equally well to do something like
[array[j] for j in something if ((bigint>>j) & 1)]
. Or left-shift a1
bit, likebigint & (1<<j)
. (Redoing the big shift every time might be less efficient that left or right shifting a variable by 1 every time, but I don't know how to do that in a list comprehension). \$\endgroup\$Peter Cordes– Peter Cordes2023年04月11日 01:22:59 +00:00Commented Apr 11, 2023 at 1:22 -
\$\begingroup\$ Also, if the integer is sparse and you have an efficient count-trailing-zeros,
x &= x-1
to clear the lowest set bit, andj = clz(x)
to get the position of the new LSB. \$\endgroup\$Peter Cordes– Peter Cordes2023年04月11日 01:23:09 +00:00Commented Apr 11, 2023 at 1:23 -
\$\begingroup\$ No, it's not, but I wanted to keep what they'd done as closely as possible. I'm not sure efficiency is the most important thing here. \$\endgroup\$DeathIncarnate– DeathIncarnate2023年04月11日 07:25:16 +00:00Commented Apr 11, 2023 at 7:25
-
\$\begingroup\$ They do mention a performance consideration, I don't have enough time to wait for the entire array of combinations to generate and then loop through it. If algorithmic big-O considerations are important, perhaps constant factors are at least worth mentioning. If I knew more Python, I'd probably write an answer about that myself, but unfortunately I don't. \$\endgroup\$Peter Cordes– Peter Cordes2023年04月11日 07:42:19 +00:00Commented Apr 11, 2023 at 7:42
-
\$\begingroup\$ The performance is mostly solved by using the
itertools.combinations
, yes, you can get more efficient methods, but I always feel slight concern when I see optimisation (including bitwise operations) in Python, a not terribly performant language designed in my mind for simplicity in reading/writing and in the case you can't get away with Python's speeds you can always fall back on compiled C (possibly through numpy or other libraries). \$\endgroup\$DeathIncarnate– DeathIncarnate2023年04月11日 07:47:21 +00:00Commented Apr 11, 2023 at 7:47