This is really a follow on from a question I asked earlier this year on Stack Overflow. I received a great answer to the following problem:
I'm trying to code up something simple and pythonic to identify combinations of values from a list which sum to a defined value, within some tolerance.
For example:
if A=[0.4,2,3,1.4,2.6,6.3] and the target value is 5 +/- 0.5, then the output I want is (2,3), (1.4,2.6), (2,2.6), (0.4,2,3), (0.4,3,1.4) etc. if no combinations are found then the function should return 0 or none or something similar.
I implemented the suggestion in my code. However, the method has quickly become the performance limiting step in my code. It's fairly quick to run each iteration but it is being run many, many times.
Can you see any way to optimise this function or replace it with something much faster?
def findSum(self, dataArray, target, tolerance=0.5):
for i in xrange(1, len(dataArray)+1):
results = [list(comb) for comb in list(itertools.combinations(dataArray, i))
if target-tolerance < sum(map(float, comb)) < target+tolerance]
if len(results) != 0:
return results
1 Answer 1
In keeping the same algorithm, there's a bunch of extra work you're doing that's slowing you down. I am using the word slow loosely, since the one example you provided is still more or less instantaneous and the timings I performed were based on 100,000 runs.
Leave iterables alone
You wrap your combinations()
call with list()
. This means that you have to do actually create all the combinations up front, then iterate over them. But you never actually need all of them at once. You just need them one at a time. Changing your body to:
results = [list(comb) for comb in itertools.combinations(dataArray, i)
if target-tolerance < sum(map(float, comb)) < target+tolerance]
drops runtime from 2.64s to 2.45s. Furthermore, comb
is going to be a tuple, so that's fine as is too:
results = [comb for comb in itertools.combinations(dataArray, i)
if target-tolerance < sum(map(float, comb)) < target+tolerance]
Down to 2.24s. Another 10%.
Building lists for no reason
Our numbers are already summable, no need to cast them to float
first. You're making a whole bunch of extra lists that you're immediately discarding - just to be able to do a sum. But you can already do a sum:
results = [comb for comb in itertools.combinations(dataArray, i)
if target-tolerance < sum(comb) < target+tolerance]
That alone brings us down to 1.10s. An enormous savings.
Comparisons
One comparison is faster than two comparisons, so let's just do the one:
results = [comb for comb in itertools.combinations(dataArray, i)
if abs(sum(comb) - target) < tolerance]
Down to 1.04s.
What do you actually want
This doesn't yield all of the potential partial sums. It will yield some sums if any exist, and will return None
otherwise - but it won't give you all of them. If you want all of them, you'll have to loop through all of the i
s. If you just want the first one, then this:
for i in xrange(1, len(dataArray)+1):
for comb in itertools.combinations(dataArray, i):
if abs(sum(comb) - target) < tolerance:
return comb
just got reduced to 0.67s.
As Joe Wallis suggests, you probably want:
for i in xrange(1, len(dataArray)+1):
for comb in itertools.combinations(dataArray, i):
if abs(sum(comb) - target) < tolerance:
yield comb
That way the caller can decide if one is sufficient or if all of them are needed.
-
\$\begingroup\$ You should say if they want all combinations they should use
yield comb
, as otherwise it, like the question, will fail most cases. And not fulfill the task of getting all of them. Then they can just use onenext
, if they want the first. \$\endgroup\$2015年10月15日 22:22:48 +00:00Commented Oct 15, 2015 at 22:22 -
\$\begingroup\$ Many thanks @Barry! I think I can implement all of these except the tuple output one - the functions which call this method expect a list of lists, hence list(comb). In terms of the combinations, I need all combinations I'm afraid. Oh and time wise, the example I gave was a simple case. In reality the list of values (dataArray) is much larger. \$\endgroup\$Mark– Mark2015年10月15日 22:28:47 +00:00Commented Oct 15, 2015 at 22:28
-
\$\begingroup\$ Could you please indicate how you got your timing references? I would like to do similar stuff... \$\endgroup\$holroy– holroy2015年10月15日 22:42:46 +00:00Commented Oct 15, 2015 at 22:42
-
\$\begingroup\$ @holroy I assume that Barry is using timeit. \$\endgroup\$2015年10月16日 07:44:37 +00:00Commented Oct 16, 2015 at 7:44
Explore related questions
See similar questions with these tags.
A = [4,20,30,14,26,63]
and the target range is 45 to 55? How large would your scaled target value be then in your real use case? And how many items can the list have? \$\endgroup\$